Distributed Information System (DIS)

Debugging DIS

9/8/2008


While developing a prototype of DIS to check and validate it, I am frequently confronted with bugs.

One reason is that the complexity of the program is comparable to that of a compiler. The individual encoding rules are simple, but they can be combined in an unlimited number of ways.


After testing many different debuggers on Linux, my conclusion is that none of them is as good as the Visual C++ debugger on Windows. So when I have to develop new code which may require debugging, I always develop it with Visual C++. Once it is validated, I move it to Linux.

The biggest difference is in the ability to explore data structures, STL containers, and other application specific data. Were it not for Visual C++, I would have dropped Windows completely. It was thus a very smart marketing move by Microsoft to make Visual C++ available for free.

But Visual C++ is not perfect yet. So when not debugging, I prefer working on Linux. One feature I really miss in Visual C++ is provided by Eclipse.

With object oriented programming, one usually stores one class per file. I guess this is to make the class definition easy to locate: simply look for the file with the same name as the class. The downside is that the code ends up spread over many files. When browsing the code, I often come across a method call and would like to see its implementation.

To do this I have to switch to, or open, the corresponding file and locate the method definition in it. Once I have examined it, I may want to go back to where I was before.

Eclipse has the smart and powerful ability to turn an identifier into a hypertext link. One simply moves the pointer over the identifier while pressing the control key. The identifier changes into a hypertext link (underlined blue text), and a click takes you directly to the method implementation.

In Visual C++, you get a context menu when clicking on an identifier and then have to locate and click the "go to definition" menu command. Two clicks.


Making DITP flexible, versatile and simple

9/4/2008


DITP is flexible, versatile and simple because it uses the inter-object communication model, not only for the communication between users and their services, but also to set up and configure the processing applied to the connection data (e.g. authentication, encryption, compression, logging, tunneling).


Inter-object communication

By adopting the inter-object communication model, users can create any type of service (remote object) they want. They can also extend or refine a service's capabilities through inheritance and polymorphism while preserving backward compatibility.

This makes DITP versatile and flexible, but any other inter-object communication protocol could claim the same.

Configuring and setting up the connection

What makes DITP different is that the connection configuration and setup are also performed using the object oriented model. The different algorithms are controlled by specific services, and the client controls them by invoking the appropriate methods.

This makes the protocol very flexible and versatile since the algorithms can be combined and configured in any way. It is easy to add support for new algorithms, and there is no constraint on the sequence of transactions required to configure them.

This design choice basically factorizes and parameterizes the protocol. What is left for DITP to define is how to open a connection, how to exchange messages between client and services, and how to set up a new client-service binding.

Opening the DITP connection

Opening a DITP connection involves a very simple transaction in which the client and the service side exchange a four byte message. If both messages contain the expected value, the connection is considered open. It can hardly be made simpler.

When the connection is opened, the client and the service implicitly attach a channel control service to the connection. This service has very few methods. One is used to close the connection and another to request the attachment of another service whose type and identity are given as arguments.

That is all it takes to have an operational DITP server. The exchanged messages also have a very simple structure, but they will be described in another note because they have another original feature that helps minimize latency.

Once the connection is opened, if the client wants to secure it by adding authentication or encryption, it requests the attachment of the corresponding services and configures them by calling their methods.
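To make the open transaction and the channel control service more concrete, here is a minimal C++ sketch. It is not the actual DIS code. The value of the four byte message, the assumption that both sides send the same value, and the names `OpenMessage`, `connectionOpened` and `ChannelControl` are all made up for the illustration. The binding numbers start at 1 and increment for each new attachment.

```cpp
#include <array>
#include <cstdint>
#include <string>

using OpenMessage = std::array<std::uint8_t, 4>;

// Hypothetical value: the actual four bytes used by DITP are not given here.
const OpenMessage kExpectedOpen = {{'D', 'I', 'T', 'P'}};

// The connection is considered open only if both sides sent the expected
// value (assumed here to be the same value in both directions).
bool connectionOpened(const OpenMessage& fromClient, const OpenMessage& fromService) {
    return fromClient == kExpectedOpen && fromService == kExpectedOpen;
}

// Sketch of the channel control service implicitly attached to a connection.
class ChannelControl {
public:
    // Requests the attachment of a service given its type and identity and
    // returns the number identifying the new binding (1, 2, 3, ...).
    std::uint64_t attach(const std::string& type, const std::string& identity) {
        (void)type;      // a real server would use both arguments to
        (void)identity;  // locate or create the requested service
        return nextBinding_++;
    }

    // Closes the connection.
    void close() { open_ = false; }

    bool isOpen() const { return open_; }

private:
    bool open_ = true;
    std::uint64_t nextBinding_ = 1;
};
```

In this model, a client that wants encryption would call `attach` with the type of an encryption service and then configure that service through its own methods.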


This is why I claim that DITP is versatile, flexible and simple.


Optimizing DITP connection open

8/25/2008


The DITP protocol has been designed to minimize the time required to set up an operational connection. This is achieved by a simple method which is made explicit in the following figure.

In common protocols, like HTTP and SMTP, the server is expected to send a greeting before the client can respond and proceed by sending its first request.

The time the client has to wait for this greeting message is usually dominated by the round trip time. In a LAN the round trip time is less than a millisecond, but on the Internet it will take many tens of milliseconds and sometimes hundreds of milliseconds if the server is on another continent.

By simply swapping the orientation of the DITP open transaction, we save one round trip time before the client can send its first transaction request. Another advantage of this method is that the server can use a very narrow timeout window for the arrival of the DITP connection setup request. This protects against some types of DoS attacks.

There are two additional things we can observe from the previous figure.

1.- The HTTP or SMTP protocol could be optimized by allowing the client to send its first data without having to wait for the server greeting.

2.- The round trip time due to TCP could be avoided if we could combine it with the DITP connection setup and the first two requests. There is clearly room for improvement on this layer, but this is outside the scope of this project.
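The saving can be checked with a small back-of-the-envelope model. The sketch below is only an illustration under simplifying assumptions of mine: processing and transmission times are ignored, the TCP handshake costs one round trip before the client can send data, and the DITP client is assumed to send its open message and its first request without waiting for the answer of the server. The function names are hypothetical.

```cpp
// Delay, in milliseconds, before the client receives the response to its
// first request, as a function of the round trip time (RTT).

// Greeting-first protocol: TCP handshake, wait for the server greeting,
// then one request/response exchange.
double greetingFirstDelay(double rttMs) {
    const double tcpHandshake = 1.0;    // in RTT units
    const double waitForGreeting = 1.0;
    const double requestResponse = 1.0;
    return (tcpHandshake + waitForGreeting + requestResponse) * rttMs;
}

// Client-first protocol: TCP handshake, then the open message and the
// first request leave together.
double clientFirstDelay(double rttMs) {
    const double tcpHandshake = 1.0;
    const double requestResponse = 1.0;
    return (tcpHandshake + requestResponse) * rttMs;
}
```

With a round trip time of 100 ms, this model gives 300 ms for the greeting-first case and 200 ms for the client-first case, which is the one round trip saved.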


"8 Reasons why XML sucks"

7/15/2008


The choice to make IDR a binary data representation was not easy because it goes against the current trend. The following very interesting article reinforces my confidence in this choice: "8 Reasons why XML sucks".


Time value encoding in DIS

5/26/2008


One fundamental question is the encoding of a time value. A time value has two types of use. One is as a time stamp and the other is as a general time reference.

Requirements

On one hand, a time stamp requires a well defined and controlled precision, while the covered time span can be limited (e.g. +/- 200 years). On the other hand, a general time reference needs to be applicable to a very large time span, with fewer constraints on the precision.

Options

For the time reference value one could use a double precision float representation with seconds as units. All arithmetic operations are provided right out of the box and generally hardwired in the processor. Conversion to calendar time is trivial since one simply has to extract the integer part of the value and convert it to a time_t value. From there one can use the common calendar time conversion and formatting functions.
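As an illustration of this conversion, here is a minimal C++ sketch. The names `SplitTime`, `splitTime` and `formatUtc` are mine, and the code assumes that `time_t` counts seconds since January 1, 1970, as on POSIX systems. Note that `std::gmtime` returns a pointer to shared static storage, so this sketch is not thread safe.

```cpp
#include <cmath>
#include <ctime>
#include <string>

// A time reference split into whole seconds and a fraction in [0, 1).
struct SplitTime {
    std::time_t seconds;
    double fraction;
};

// Extracts the integer part of a time given in seconds as a double.
// std::floor keeps the fraction positive for times before the epoch.
SplitTime splitTime(double t) {
    const double whole = std::floor(t);
    return SplitTime{static_cast<std::time_t>(whole), t - whole};
}

// Formats the integer part as UTC calendar time with the standard functions.
// Returns an empty string if the conversion fails.
std::string formatUtc(double t) {
    const SplitTime s = splitTime(t);
    const std::tm* utc = std::gmtime(&s.seconds);
    if (utc == nullptr) return std::string();
    char buf[32];
    if (std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", utc) == 0)
        return std::string();
    return std::string(buf);
}
```

For example, `formatUtc(86400.5)` yields the calendar time one day after the epoch, and the half second remains available in `splitTime(86400.5).fraction`.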

For time stamps, using integers seems preferable. But we still have a choice between a split encoding like the timeval structure, a 64 bit fixed point encoding, or an integer with a very small time unit (e.g. nanoseconds).

Discussion

There is not much to discuss about the general time reference. Using a double precision float is an optimal solution. For time stamps, however, we have three different solutions.

In my experience, a split time encoding like the timeval structure is not convenient for time arithmetic. It is even error prone if the user has to program the operations himself.

I also tried to implement a fixed point time encoding class with the binary point between bit 29 and 30. But this is tricky to get right and some operations are not trivial to implement correctly, because fractional computation requires normalization and careful handling of rounding errors.

A 64 bit integer using nanoseconds as time unit is apparently the simplest and most straightforward time stamp encoding. Converting to seconds is done with a simple 64 bit integer division, which is hardwired in most recent processors. Conversion to other time units like microseconds, milliseconds, days or weeks is as accurate and simple. Multiplication or division by scalar values is also trivial.

Another advantage of 64 bit integer nanosecond values is that no special functions are needed for conversions or operations. A programmer can easily figure out what to do and use conventional arithmetic operations.

With a 64 bit signed integer and nanosecond units, the covered time span is slightly over +/- 292 years. One can thus afford to keep the current time_t January 1970 epoch and still push the wrapping limit far away.
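The following C++ sketch illustrates how little code this encoding needs and checks the 292 year figure. The type and function names are hypothetical and chosen for the example only. Integer division truncates toward zero, which matters for time stamps before the epoch.

```cpp
#include <cstdint>
#include <limits>

// Time stamp: signed 64 bit count of nanoseconds since the January 1970 epoch.
using TimeStamp = std::int64_t;

constexpr std::int64_t kNsPerUs  = 1000;
constexpr std::int64_t kNsPerMs  = 1000000;
constexpr std::int64_t kNsPerSec = 1000000000;

// Conversions are plain integer multiplications and divisions.
constexpr std::int64_t toSeconds(TimeStamp t) { return t / kNsPerSec; }
constexpr std::int64_t toMillis(TimeStamp t)  { return t / kNsPerMs; }
constexpr std::int64_t toMicros(TimeStamp t)  { return t / kNsPerUs; }
constexpr TimeStamp fromSeconds(std::int64_t s) { return s * kNsPerSec; }

// Largest representable offset from the epoch, in years of 365.25 days.
constexpr double spanInYears() {
    return static_cast<double>(std::numeric_limits<std::int64_t>::max())
           / static_cast<double>(kNsPerSec) / (365.25 * 86400.0);
}
```

The largest value is 2^63 - 1 nanoseconds, about 9.22e9 seconds, which divided by 31,557,600 seconds per year gives roughly 292.3 years on each side of the epoch.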

Conclusion

In DIS, we'll thus use a double precision float for the general time reference value and a 64 bit integer with nanosecond units for encoding time stamps and delays.

Note: I've seen a double precision float used for time encoding in some Windows operating system APIs. I have yet to see a 64 bit signed integer with nanosecond units in use. It would make sense as an upgrade of time_t, which is required since we are getting close to the wrapping limit.

Update: It has been brought to my attention that Java stores time values in a signed 64 bit integer with milliseconds as time unit, relative to January 1, 1970. The covered time span is thus about +/- 290 million years. I'll stay with nanosecond units for time stamps.


Progress status and cardinality encoding....

5/12/2008


It is time for a new progress report on my project. Progress is steady, but not as fast as I would like. I now use the latest version of libgc. I spent most of my time last month searching for the source of a major memory leak. I finally found out that it was caused by STL containers. I changed the code and now use gc_allocator. Even strings had to be changed. Now the client and server run without any memory leak. I thought of changing language (e.g. to D), but I didn't want to cut myself off from the C++ community as potential users. So I had to sort out the problem, and I finally did.

The client service communication model is now finalized and under test. The system is ready to support transmitted data processing (compression, authentication and encryption).

In this note I'll explain the encoding of a key data type I call cardinality. A cardinality is the number of elements in a sequence. Because of its efficient encoding I extended its use to other types of information. What makes a cardinality different from a classical unsigned integer is that small values are much more frequent than big values. Consider strings, for instance. Most strings will be less than 256 bytes long, but from time to time big strings may show up.

We could thus benefit from an encoding that is compact for small values and possibly bigger for big values. I spent some time investigating all the possible encodings I could find, and the most interesting ones were found in BER and in ICE.

BER's cardinality encoding

BER stands for Basic Encoding Rules and is used in SNMP and in the encoding of X.509 certificates. In this encoding the cardinality value is chopped into 7 bit chunks and stored in as many bytes in big endian order. The leading bytes with value 0 are dropped. The most significant bit of every byte is set to 1, except in the last byte where it is set to 0 to signal the end.

I implemented a BER encoder and decoder a long time ago and, as attractive as this encoding might be, it is not trivial to implement and it requires bit manipulations on bytes, which makes it suboptimal in terms of performance.
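For reference, here is a minimal C++ sketch of the 7 bit chunk encoding described above. It is not the encoder I wrote back then, and the function names are made up. The decoder stops after 10 bytes, the maximum needed for 64 bits, but does not otherwise check for overflow.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes a value as big endian 7 bit chunks. Every byte except the last
// has its most significant bit set to 1.
std::vector<std::uint8_t> encodeBase128(std::uint64_t value) {
    std::uint8_t chunks[10];  // 64 bits need at most 10 chunks of 7 bits
    int count = 0;
    do {                      // collect the chunks, least significant first
        chunks[count++] = static_cast<std::uint8_t>(value & 0x7F);
        value >>= 7;
    } while (value != 0);

    std::vector<std::uint8_t> out;
    while (count > 0) {       // emit them, most significant first
        --count;
        std::uint8_t byte = chunks[count];
        if (count != 0) byte |= 0x80;  // continuation flag
        out.push_back(byte);
    }
    return out;
}

// Decodes a value from the start of 'in'. Returns false if no terminating
// byte (most significant bit 0) is found within the first 10 bytes.
bool decodeBase128(const std::vector<std::uint8_t>& in, std::uint64_t& value) {
    value = 0;
    for (std::size_t i = 0; i < in.size() && i < 10; ++i) {
        value = (value << 7) | static_cast<std::uint64_t>(in[i] & 0x7F);
        if ((in[i] & 0x80) == 0) return true;
    }
    return false;
}
```

For example, the value 300 is split into the chunks 0000010 and 0101100 and is stored as the two bytes 0x82 0x2C.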

ICE's cardinality encoding

ICE is an inter-object communication protocol which is really worth looking at and possibly using until DITP is available ;). Check their web site for more information.

ICE was apparently initially inspired by IIOP, which simply doesn't have a cardinality. Sequence lengths are encoded as a straight 4 byte integer. But ICE encodes class and method names as strings, which are generally short. The overhead of the 4 byte encoding was obvious. So the designers of ICE's encoding made a small change that significantly increased its efficiency.

If the cardinality value is less than 255, the value is stored in a single byte. Otherwise a byte with the value 255 is written, followed by a 4 byte value encoding the cardinality. Encoding and decoding are trivial and much more efficient than with the BER encoding. It is not as compact for values bigger than or equal to 255, but the overhead is insignificant when considering the amount of data that follows (e.g. a string).
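The rule fits in a few lines of C++. The sketch below is my own illustration and is not taken from the ICE sources. The function name is made up, and the 4 byte value is written in little endian order, which is my understanding of what ICE uses.

```cpp
#include <cstdint>
#include <vector>

// ICE style size encoding: one byte for values below 255, otherwise the
// marker byte 255 followed by the value as a 4 byte integer.
std::vector<std::uint8_t> encodeIceSize(std::uint32_t n) {
    std::vector<std::uint8_t> out;
    if (n < 255) {
        out.push_back(static_cast<std::uint8_t>(n));
        return out;
    }
    out.push_back(255);
    for (int i = 0; i < 4; ++i)  // little endian byte order (assumed)
        out.push_back(static_cast<std::uint8_t>((n >> (8 * i)) & 0xFF));
    return out;
}
```

A 10 character string thus costs one byte of length, while a 300 byte string costs five bytes, which is small compared to the 300 bytes that follow.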

IDR's cardinality encoding

IDR extends the idea found in ICE by supporting an 8 byte encoding and the intermediate 2 byte integer size. As with ICE, if the value is smaller than 255 the cardinality is encoded as a single byte. If it is less than 65535, it is encoded in a total of three bytes: the first byte holds the value 255 and the cardinality value is stored in the following 2 bytes as a classical unsigned integer. If the value is bigger, it is followed by a 4 byte value, and so on up to the 8 byte integer.

The encoding is as trivial and efficient as ICE's, provided we take care to detect and encode small values first. It is more compact for values up to 65535, and then becomes less efficient because big values have a long encoding. But as already pointed out, this overhead is insignificant when considering the amount of data that follows.
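Here is a C++ sketch of one possible reading of this scheme. It rests on two assumptions that are not spelled out above and may differ from the actual IDR format: each size is escaped by its all-ones value before the next bigger size follows (255, then 65535, then 4294967295), and integers are written in little endian order. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends the 'bytes' low order bytes of v in little endian order (assumed).
void appendLE(std::vector<std::uint8_t>& out, std::uint64_t v, int bytes) {
    assert(bytes >= 1 && bytes <= 8);
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF));
}

// Reads 'bytes' bytes in little endian order at 'pos' and advances 'pos'.
bool readLE(const std::vector<std::uint8_t>& in, std::size_t& pos,
            int bytes, std::uint64_t& v) {
    if (in.size() - pos < static_cast<std::size_t>(bytes)) return false;
    v = 0;
    for (int i = 0; i < bytes; ++i)
        v |= static_cast<std::uint64_t>(in[pos + i]) << (8 * i);
    pos += static_cast<std::size_t>(bytes);
    return true;
}

// Small values are detected and encoded first. Under the chained escape
// assumption the encodings are 1, 3, 7 or 15 bytes long.
std::vector<std::uint8_t> encodeCardinality(std::uint64_t n) {
    std::vector<std::uint8_t> out;
    if (n < 0xFF) { appendLE(out, n, 1); return out; }
    appendLE(out, 0xFF, 1);
    if (n < 0xFFFF) { appendLE(out, n, 2); return out; }
    appendLE(out, 0xFFFF, 2);
    if (n < 0xFFFFFFFFULL) { appendLE(out, n, 4); return out; }
    appendLE(out, 0xFFFFFFFFULL, 4);
    appendLE(out, n, 8);
    return out;
}

// Decodes a cardinality from the start of 'in'; false if 'in' is too short.
bool decodeCardinality(const std::vector<std::uint8_t>& in, std::uint64_t& n) {
    std::size_t pos = 0;
    if (!readLE(in, pos, 1, n)) return false;
    if (n != 0xFF) return true;
    if (!readLE(in, pos, 2, n)) return false;
    if (n != 0xFFFF) return true;
    if (!readLE(in, pos, 4, n)) return false;
    if (n != 0xFFFFFFFFULL) return true;
    return readLE(in, pos, 8, n);
}
```

Under these assumptions a 200 byte string has a one byte length prefix, a 1000 byte string a three byte prefix, and a 100000 byte string a seven byte prefix.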

Use of cardinality in IDR

The cardinality encoding is so efficient that its use has been extended to other types of information. Here is a short list of them.

- Object references in serialized object aggregates are encoded as cardinalities because a reference is simply the index number of the object in the sequence of serialized objects. The value 0 is reserved for the null reference, and the first serialized object is thus identified by the reference value 1. Small reference values are expected to be the most frequent.

- Methods are identified by a number instead of a string, as is commonplace in inter-object communication protocols. The encoding is much more compact and efficient to handle and, on the server side, method call dispatching becomes simple and efficient because the identifier can be used as an index into a method pointer table. Encoding the method identifier as a cardinality was an obvious solution since small values will be the norm.

- A channel can host multiple concurrent client-service bindings. These bindings are identified by a unique number, starting with 1 and incrementing for each new binding. We can expect most binding identifiers to have a small value, so the cardinality encoding was the natural choice.
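The method dispatching mentioned in the list above can be sketched as follows in C++. The service, its two methods and the error string are invented for the example and do not come from the DIS code. The point is only that a small numeric identifier selects the method with a bounds check and one table lookup.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical service whose methods are identified by their index in a table.
class EchoService {
public:
    using Method = std::string (EchoService::*)(const std::string&);

    // Dispatches a call using the decoded method identifier as table index.
    std::string dispatch(std::uint64_t methodId, const std::string& arg) {
        const std::vector<Method>& methods = table();
        if (methodId >= methods.size()) return "error: unknown method";
        return (this->*methods[methodId])(arg);
    }

private:
    std::string echo(const std::string& s)  { return s; }        // method 0
    std::string shout(const std::string& s) { return s + "!"; }  // method 1

    // The method pointer table, built once on first use.
    static const std::vector<Method>& table() {
        static const std::vector<Method> methods = {
            &EchoService::echo,
            &EchoService::shout,
        };
        return methods;
    }
};
```

No string comparison or hash lookup is needed on the server side, and an identifier outside the table is rejected by the bounds check.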

The progress may be slow but the benefit is that taking the time to think and explore the possible solutions for every choice yields a better thought out product.


Climbing the Mountain [Paul Buchheit]

4/2/2008


Paul Buchheit provides additional thoughts on what makes a startup successful. A good team? A good idea? Good execution? Which one is more important, and which is less? He compares it to climbing a mountain with a pot of gold at the top. A good analogy.


How to correctly define a standard...

3/19/2008


The "Martian headset" is a long but very interesting article on software standards published on the Joel on Software blog.

I've learned that it is not enough to publish a standard specification document. At least one reference implementation is required. Java did this with its compiler and thereby managed to ensure interoperability. It could even resist attempts to break the standard.

Lesson learned!


Startup success fundamentals...;

3/10/2008


As a follow-up to the previous blog note, I found this blog note by Paul Buchheit very enlightening.


The startup equation....

3/2/2008


The Drake equation provides an estimate of the probability that extraterrestrial life exists. Note that XKCD has a joke about it here, the point being that attempting to quantify the probability is pushing the logic too far.

The value of the Drake equation is in identifying the relevant parameters, so that if any of these parameters is shown to be zero, extraterrestrial life can be excluded for sure. If none of them is zero, it is very likely that we are not alone in the universe, even if the resulting probability is extremely small. See my note on Murphy's Law to understand why.

Transposing this to startups, it is very tempting to work out an equivalent equation to estimate the probability of success of a startup.

Following Y Combinator news, I found an interesting post today that summarizes the fundamental questions from the marketing book "Selling the Invisible" by Beckwith, which help identify the profile of a startup. A book I will surely read.

My impression so far is that these questions identify the key parameters of a possible startup equation, the equivalent of the Drake equation. I have rephrased them below:

1. Who are you?
2. What do you do?
3. Whom do you serve?
4. Who are your competitors?
5. What makes you the best?
6. How do you monetize?

There are probably other parameters, though I think these are the most important ones because they tell you whether the business is sound. Take care to examine these questions with a very broad view and an open mind. Also be very accurate and objective.

For instance, regarding monetizing, I would strongly recommend reading the Wired article "Free! Why $0.00 Is the Future of Business" to make sure you address this question with a modern view. As I understand it, the fundamental logic here is to identify indirect sources of revenue.

When answering the question "what do you do", make sure to explore beyond the basic facts. If your activity is to provide a service, you are also involved in a trust relationship with your users. What do you do about this aspect? Don't overlook it, because it could be the decisive difference from your competitors.

Here is a short anecdote on this last point. I buy my computer equipment over the Internet from a big French company. One day there was a problem and I contacted them to resolve the issue. At one point during the email discussion I said that what they sell is in fact trust and not computer goods.

I recognize that I pushed it a bit far, but apparently it planted a seed, because in the following month they added various insurance offers for buyers. A kind of premium fee for priority help in case of trouble. You know, something that turns a client into a VIP. So they basically added a range of virtual anti-anxiety pills to their catalog!

So keep a very broad view on what and how to monetize.


    Author

    Christophe Meessen is a  computer science engineer working in France.

    Any suggestions to make DIS more useful ? Tell me by using the contact page.
