When designing a data encoding, once the decision to use conventional binary representation is made, the next fundamental decision is whether the data should be memory aligned or not. RPC, CORBA and D-BUS use memory aligned data while ICE and IDR don't.
Memory aligned data ensures that 2, 4, 8, 10 or 16 byte values are stored at an address that is a multiple of their size. For instance a 2 byte value (short integer) would then be stored at addresses 0, 2, 4 or 6. A 4 byte value (long integer) would be stored at addresses 0, 4 or 8 and an 8 byte value at addresses 0, 8, etc.
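The alignment rule can be sketched in a few lines of Python. The helper name `align_up` is only an illustration and is not taken from any of the protocols mentioned; it assumes the value size is a power of two.

```python
def align_up(offset: int, size: int) -> int:
    """Return the smallest multiple of `size` that is >= `offset`.

    Assumes `size` is a power of two (2, 4, 8, 16).
    """
    return (offset + size - 1) & ~(size - 1)


# Next valid address for a 4 byte value when the write position is 5.
print(align_up(5, 4))  # 8
# A position that is already aligned is left unchanged.
print(align_up(8, 8))  # 8
```

The bytes skipped between the current position and the returned address are the padding.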
Some processors (Itanium, RISC) can only handle aligned data, and the programmer has to add support for unaligned data by hand. The x86 processor supports unaligned data but with a performance penalty. A quick benchmark on an x86 compatible processor showed that accessing unaligned data is nearly twice as slow as accessing aligned data.
Memory aligned data requires padding space, which on average can represent about 1/3 of the used memory space. On a modern PC with multiple gigabytes of RAM, this memory overhead is not relevant. But for handheld devices, embedded computers, and data that is stored long term or transmitted, the memory overhead is much more relevant. A multipurpose encoding should thus care about memory usage as much as encoding/decoding performance.
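To get a feel for the padding cost, here is a small sketch that compares the aligned and the packed size of a record. The function name `layout_sizes` and the sample record are made up for the illustration, and the result is only one example, not a measurement of the average overhead.

```python
def layout_sizes(field_sizes):
    """Return (aligned_size, packed_size) for fields written in order.

    Each field is aligned on a multiple of its own size; no trailing
    padding is added after the last field.
    """
    aligned = 0
    for size in field_sizes:
        aligned = (aligned + size - 1) // size * size  # skip padding bytes
        aligned += size
    return aligned, sum(field_sizes)


# Hypothetical record: 1 byte flag, 8 byte integer, 2 byte and 4 byte values.
aligned, packed = layout_sizes([1, 8, 2, 4])
padding = aligned - packed
print(aligned, packed, padding)  # 24 15 9
```

For this particular record, 9 of the 24 aligned bytes are padding. The ratio depends heavily on the order and the sizes of the fields.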
It should be noted that generating aligned data may in some cases require additional computation and may thus have its own overhead. The code to marshal unaligned data is much simpler and more straightforward: one uses a simple pointer incremented by the size of the accessed data type. Getting rid of the alignment constraint also simplifies PDU message encoding and encapsulation.
So basically the only drawback of using a serialized and unaligned data encoding is the memory access overhead. But this penalty could be removed by ad hoc hardware or processor instructions. For instance, processors could introduce a pointer variant behaving like an iterator on varying size data. This iterator could take advantage of asynchronous memory prefetching.
RPC, CORBA and D-BUS also benefit from a nearly direct mapping between the encoded and the internal data structure representation. This is fine when communicating with programs using this representation. But what about Java, Python or Ruby, which have their own data and object representations? This optimization no longer holds.
This analysis has led us to the decision to use sequential unaligned data for IDR encoding. What would be your choice on this matter?
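The "simple pointer incremented by the data size" idea can be sketched as follows. This is not the IDR implementation: the `Writer` and `Reader` classes are hypothetical, and little-endian byte order is an assumption made for the example.

```python
import struct


class Writer:
    """Appends values back to back, with no padding between them."""

    def __init__(self):
        self.buf = bytearray()

    def put(self, fmt: str, value) -> None:
        # "<" selects little-endian byte order with no alignment.
        self.buf += struct.pack("<" + fmt, value)


class Reader:
    """Reads values back in the same order using a single moving offset."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def get(self, fmt: str):
        (value,) = struct.unpack_from("<" + fmt, self.data, self.pos)
        self.pos += struct.calcsize("<" + fmt)  # advance by the value size
        return value


w = Writer()
w.put("b", 1)        # 1 byte
w.put("q", 2**40)    # 8 bytes, stored at offset 1
w.put("h", -3)       # 2 bytes, stored at offset 9

r = Reader(bytes(w.buf))
values = [r.get("b"), r.get("q"), r.get("h")]
print(len(w.buf), values)  # 11 [1, 1099511627776, -3]
```

The 8 byte value sits at offset 1, which an aligned encoding would not allow, and the reader never has to compute padding.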
DIS is designed to address many shortcomings of existing internet services. But it is also subject to the network effect, which means that its value is a function of the number of users.
It's like a TV broadcasting system. Whatever the quality of the technology, without interesting broadcasts it won't attract users, without users it won't attract producers, and without producers it won't get attractive broadcasts, and then again no users. But when the feedback in such a system becomes positive we get a virtuous circle which amplifies its spinning by itself. And once the virtuous circle is spinning one can tap its energy with advertising, freemium or even a fee if the traction is strong enough.
To bootstrap it, one has to pick the right TV broadcast which has to be one that people want to see while also simple and cheap to produce. As you may guess this is a strategic choice.
Starting this type of system is not easy, but once it is rolling it is quite safe because it is difficult to stop, and it thus provides protection against competitors. A good example is Google, which failed to catch up with YouTube and ended up buying it.
Like the TV broadcasting system, DIS offers the freedom to pick the right bootstrapping application. It has to be something that people want while also being cheap and simple to produce.
The business model is also strategic. Offering a free service means not tapping energy from the virtuous circle which is good to get maximum acceleration. But at some point we need to tap energy from it so that we can invest it in development of new features and make it more attractive.
The business model and the bootstrapping application are thus key to success, but don't expect me to present them here before they are in production.
This picture of Marseille was taken from the top of Marseilleveyre, a small mountain south of Marseille. The small harbor in front is the harbor of Pointe Rouge. The Vieux Port, center of Marseille, and Notre Dame de la Garde, are not visible here.
One of the islands in the bay hosts the Château d'If, which was a prison for some time. This place was made famous by the novel "Le Comte de Monte-Cristo" by Alexandre Dumas.
I just wanted to share with you two nice pictures taken during my summer holidays.
The picture shows some vineyards and a very old church in a small village close to Blaye. The wine of this region is labeled côte de grave and is famous. The river Gironde is just behind the small hill in the background. The famous Bordeaux wine region Médoc is just on the other side of the Gironde.
Eric Schmidt, Google CEO, presented his view of the web evolution track at the Seoul Digital Forum.
DIS was conceived with this view in mind. The protocol and key service are the keystone of such a system.
There are other properties that Eric Schmidt didn't mention but that require a better response than the one provided by existing systems.
The low level C++ wrapper class for cryptographic functions is now finalized. I use XySSL as the low level C cryptographic library. XySSL is an open source project of Christophe Devine, a French computer scientist specialized in security. XySSL will support the VIA padlock cryptographic engine, which is good news since VIA servers are cheap, cool and low-power computers.
The signing algorithm is parameterized so that one can easily switch to a stronger model if needed. For now we'll use the PKCS1 2.0 OAEP signature model described in RFC3447 because it is stream friendly. The signature model described in IEEE 1363a adds a salt with the hash value. The salt is some random bytes that are hashed before the information to sign.
The problem with this is that the salt is not available when starting to decode the information. To do so we would have to put the signature in front of the information. But then it is the signature generation that would not be stream friendly. One would have to first serialize the data in some buffer so that we can compute the hash value and encode the signature. This then breaks the stream processing model.
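The buffering problem can be illustrated with a deliberately simplified salted hash. This is not the IEEE 1363a encoding method, only a sketch of "salt hashed before the data"; SHA-256, the 16 byte salt and the sample chunks are assumptions for the example.

```python
import hashlib
import os

chunks = [b"header", b"payload", b"trailer"]

# No salt: the decoder can hash each chunk as soon as it arrives.
h = hashlib.sha256()
for chunk in chunks:
    h.update(chunk)
unsalted = h.digest()

# Salt hashed first: hashing cannot start before the salt is known.
salt = os.urandom(16)
h = hashlib.sha256(salt)
for chunk in chunks:
    h.update(chunk)
salted = h.digest()

# If the salt only arrives with a trailing signature, the decoder has to
# keep the whole message in memory until then.
buffered = b"".join(chunks)
assert hashlib.sha256(salt + buffered).digest() == salted
assert salted != unsalted
```

Moving the salt in front of the data fixes the decoder, but as explained above it pushes the buffering to the side that generates the signature.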
It is not clear to me how this salt adds any security to the signature. Please add a comment if you have some hints on this. It seems that picking a stronger hash function with a longer digest, or combining the outputs of multiple hash functions, would contribute more to security than the salt value.
An enlightening read on hacking: "To those about to hack".
As people may have understood by now, I'm more of an Abe than a George...
Progress is good on multiple fronts.
- I never managed to make libgc (C++ garbage collector) work with code compiled in release mode (VC2003). I spent some time debugging it without success. Version 7.0 has just been released but the problem is still there, so I had to solve it. I finally found the cause and made a quick hack for my code to work. The author has been notified and I hope the bug will be definitively fixed in the next release.
- In the meantime I also investigated various cryptographic packages to use for the prototype. There are quite a few out there. OpenSSL is the one I'll pick because it best fits my requirements. But it needs a C++ wrapper that makes it as simple and convenient to use as other C++ cryptographic packages.
- The data encoding format for signed and multi-signed information is now finalized. It was not trivial because the requirements were quite tricky to meet. Its properties are attractive, but it must still be implemented and tested to validate its usability.
I've added a photo of myself so you can associate a face with DIS. The city where I live and work is Marseille, which is in Provence (south of France). View it with Google Maps.
Below you can see the calanque of Sugiton, just next to Marseille, that you can reach after a 30 minute walk through a pine wood from Luminy.
A modern communication protocol must be secure. And to do it right, security must have been integrated in the design from the very start. Here is a short list of security requirements for DITP:
- authenticate peers
- support exchanged data authentication and encryption
- provide access control on accessible services, objects and methods
- support single and multi-signed information of any kind
- signed information supporting polymorphism and aggregates
- allow anyone to verify any signature with minimal knowledge
Multi-signed information is when more than one person signs a given piece of information (e.g. a contract).
With a stream oriented encoding this all implies that we are able to apply a hash function (e.g. SHA) to transmitted data while it is encoded or decoded.
This is what I am currently implementing. Unfortunately, a server crash monopolized all my time this week. Murphy's law's revenge...
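Hashing data while it is being encoded can be sketched with a small wrapper around the output stream. This is not the DITP implementation: the `HashingWriter` class is hypothetical and SHA-256 is just one possible choice of hash function.

```python
import hashlib
import io


class HashingWriter:
    """Forwards bytes to an output stream and hashes them on the way."""

    def __init__(self, out):
        self.out = out
        self.hash = hashlib.sha256()

    def write(self, chunk: bytes) -> None:
        self.hash.update(chunk)  # incremental: no need to buffer the message
        self.out.write(chunk)

    def digest(self) -> bytes:
        return self.hash.digest()


out = io.BytesIO()
w = HashingWriter(out)
for chunk in (b"first field", b"second field", b"third field"):
    w.write(chunk)

# Same digest as hashing the complete message in one go.
assert w.digest() == hashlib.sha256(out.getvalue()).digest()
```

The digest is ready as soon as the last byte has been written, so a signature computed from it can simply follow the data in the stream. A decoder can use the same approach on the bytes it reads.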