Distributed Information System (DIS)

The 8 fallacies of distributed computing

10/17/2009

4 Comments

 
The following two paragraphs are the introductory paragraphs of the document Fallacies of distributed computing (pdf) by Arnon Rotem-Gal-Oz, which presents the 8 fallacies of distributed computing.

"Distributed systems already exist for a long tThe software industry has been writing distributed systems for several decades. Two examples include The US Department of Defense ARPANET (which eventually evolved into the Internet) which was established back in 1969 and the SWIFT protocol (used for money transfers) was also established in the same time frame [Britton2001].

Nevertheless, in 1994, Peter Deutsch, a Sun fellow at the time, drafted 7 assumptions architects and designers of distributed systems are likely to make, which prove wrong in the long run - resulting in all sorts of troubles and pains for the solution and the architects who made the assumptions. In 1997 James Gosling added another such fallacy [JDJ2004]. The assumptions are now collectively known as "The 8 fallacies of distributed computing" [Gosling]:
  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology doesn't change
  6. There is one administrator
  7. Transport cost is zero
  8. The network is homogeneous
..."

While designing a new distributed information system, it is a good idea to check how it positions itself with regard to these 8 fallacies.

The network is reliable

DIS uses TCP, which was designed to be reliable and robust. Reliable means that data is transmitted uncorrupted to the other end, and robust means that the connection can tolerate a certain amount of errors. There is however a limit to the robustness of a TCP connection, and in some conditions connecting to a remote service may not be possible at all.

DITP, the communication protocol of DIS, is of course designed to handle connection failures. Higher level and distributed services will have to take them into account too.

Making a distributed information system robust implies anticipating connection failures at any stage of the communication. For instance, a flock of servers designed to synchronize with each other may suddenly be partitioned into two or more unconnected flocks by a network failure, and be connected back together later.
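
As an illustration, here is a minimal C++ sketch of one way to anticipate connection failures: retrying a remote operation with a bounded exponential backoff. The function name, its parameters and the use of an exception to signal failure are illustrative assumptions, not part of the DITP API.

#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Retry a remote operation a bounded number of times, doubling the
// wait between attempts. Returns true on success, false if all
// attempts failed. (Illustrative sketch only.)
bool withRetry(const std::function<void()>& op, int maxAttempts = 5)
{
    auto delay = std::chrono::milliseconds(100);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        try {
            op();                     // e.g. connect and issue a request
            return true;
        } catch (const std::runtime_error&) {
            if (attempt == maxAttempts)
                break;                // give up after the last attempt
            std::this_thread::sleep_for(delay);
            delay *= 2;               // back off before retrying
        }
    }
    return false;
}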

Latency is zero

Latency was a major focus in the design of the DITP protocol because DIS is intended to be used for Wide Area Network (WAN) applications. DITP reduces the impact of latency by supporting asynchronous requests. These requests are batched and processed sequentially by the server in the order of emission. If a request in the batch is aborted by an exception, the subsequent requests of the batch are ignored. This provides a fundamental functionality to support transactional applications.
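
The following C++ sketch illustrates the batch semantics just described: requests are processed sequentially in the order of emission, and an exception aborts the rest of the batch. The Request type and the function name are illustrative assumptions, not the actual DITP implementation.

#include <cstddef>
#include <exception>
#include <functional>
#include <iostream>
#include <vector>

using Request = std::function<void()>;   // stands for a decoded DITP request

// Process a batch of asynchronous requests sequentially. If one
// request throws, the subsequent requests of the batch are ignored.
void processBatch(const std::vector<Request>& batch)
{
    for (std::size_t i = 0; i < batch.size(); ++i) {
        try {
            batch[i]();
        } catch (const std::exception& e) {
            std::cerr << "request " << i << " aborted: " << e.what()
                      << "; ignoring the rest of the batch\n";
            return;
        }
    }
}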

In addition to this, DIS may also support the ability to send code to be executed by a remote service. This provides the same functionality as JavaScript code embedded in web pages and executed by browsers, which makes it possible to implement powerful and impressive Web 2.0 applications.

With DIS, remote code execution is handled by services the server manager may make available if he wants to support them. The services may then process different types of pseudo-code: JavaScript, Haxe, JVM bytecode, Python, ... Many different pseudo-code services may then coexist and evolve independently of DIS. Such functionality is of course also exposed to security issues. See the secure network fallacy for an insight into how DIS addresses it.

Bandwidth is infinite

This fallacy is the rationale behind the Information Data Representation (IDR) design. IDR uses a binary and native data representation. In addition to being very fast and easy to marshal, it is also very compact.

DITP also supports user defined processing of transmitted data, so that compression algorithms may be applied to it. DITP also multiplexes concurrent communication channels within the same connection, allowing different transmitted data processing to be applied to each channel. By choosing the channel, the user may decide whether or not the transmitted data is compressed.
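
As a rough illustration of the multiplexing idea, here is a C++ sketch in which each frame carries the identifier of its channel, so that differently processed streams (compressed or not) can share one connection. The frame layout and the handler type are illustrative assumptions, not the actual DITP wire format.

#include <cstdint>
#include <functional>
#include <vector>

// A frame tagged with the channel it belongs to. The payload has
// already been processed (e.g. compressed) by its channel's tasks.
struct Frame {
    std::uint16_t channel;
    std::vector<std::uint8_t> payload;
};

using ChannelHandler = std::function<void(const std::vector<std::uint8_t>&)>;

// Dispatch an incoming frame to the processing pipeline of its
// channel, which may for instance decompress and then decode it.
void dispatch(const Frame& f, const std::vector<ChannelHandler>& channels)
{
    if (f.channel < channels.size())
        channels[f.channel](f.payload);
}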

The network is secure

A distributed system designed for worldwide usage must obviously take security into account. This means securing the transmitted data by means of authentication and ciphering, as well as authenticating the communicating parties and enforcing access or action restriction rules.

Communication security is provided by the DITP protocol by means of the user specified transmitted data processing. Like data compression, this processing can also handle data authentication and ciphering. Different authentication and ciphering methods and algorithms can coexist in DIS and may evolve independently of the DITP protocol.

Authentication and access control may use conventional password methods as well as user identification certificates. But instead of using X.509 certificates, DIS uses IDR encoded certificates corresponding to instances of certificate classes. Users may then derive their own certificates with class inheritance. They may extend the information carried in the certificate or combine different certificate types together.

Authentication based on password checking or user identity certificate matching doesn't scale well for a worldwide distributed system because it needs access to a reference database. With distributed services, accessing a remote database introduces latencies, and replicating it (i.e. caches) weakens its security by multiplying the number of breach points.

The authentication mechanism favored in DIS uses member certificates. These certificates are like club or company member access cards. When trying to access a service, the user presents the corresponding certificate and the service simply needs to check the certificate's validity.

With such an authentication mechanism, the service can be scattered all over the Internet and remain lightweight, as is required for embedded applications (e.g. smart phones, car computers, ...). The authentication domain can handle billions of members as easily as a few. Member certificates may be extended to carry specific information and connection parameters.
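
A minimal C++ sketch of such a member certificate check follows. It assumes the certificate carries a signature issued by the authentication domain, so that the service only needs the domain's public key and no user database; the field names and the verifySignature() helper are illustrative assumptions, not the actual DIS certificate classes.

#include <cstdint>
#include <string>
#include <vector>

// A member certificate: the equivalent of a club member access card.
struct MemberCertificate {
    std::string domain;                    // the issuing authentication domain
    std::uint64_t memberId;
    std::uint64_t expireTime;              // seconds since the epoch
    std::vector<std::uint8_t> signature;   // issuer's signature over the fields
};

// Assumed to be provided by a standard cryptographic library
// (e.g. an RSA or ECDSA signature verification).
bool verifySignature(const MemberCertificate& c,
                     const std::vector<std::uint8_t>& domainPublicKey);

// The service simply checks the certificate's validity: no remote
// database access, so the check scales and stays lightweight.
bool checkMember(const MemberCertificate& c,
                 const std::vector<std::uint8_t>& domainPublicKey,
                 std::uint64_t now)
{
    if (c.expireTime <= now)
        return false;                      // expired certificate
    return verifySignature(c, domainPublicKey);
}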

Topology doesn't change

The ability to handle network topology changes initiated the conception of DIS in 1992. It is thus designed from the start to address this issue in a simple, robust and efficient way. It is not a coincidence that the DIS acronym resembles that of the DNS. DIS is a distributed information system as the DNS is a distributed naming system. DIS uses the proven architecture of the DNS and applies it to generic information, with additional functionality such as remote management of the information. The DNS is known to be a cornerstone of the solution to network topology changes, as DIS will be.

There is one administrator

Like the DNS, DIS supports distributed administration. Information domain administrators have full liberty and authority in the way they organize and manage their information domain, as long as the interface to DIS respects some standard rules. As for the DNS, there will be a central administration that defines the operational rules and controls their application. If DIS becomes a broadly adopted system, the central administration will be composed of democratically elected members and will be coordinated with the Internet governance administration, if such a structure happens to be created.

Transport cost is zero

The transport cost is indeed not zero, but most of it is distributed and shared by the users. There remains however a residual cost for the central services and administration, for which a revenue has to be identified. The DIS system will make it possible to obtain such a revenue, and there is a rational reason why it ought to.

Imposing a financial cost on some domains or features of DIS that are limited or artificially limited resources provides a means to apply perceptible pressure on misbehaving users (e.g. spammers).

The network is homogeneous

DITP is designed to support different types of underlying transport connections. The information published in DIS is treated like an opaque byte block and may be of any type, as may its description language. It may be XML with its DTD description, binary with a C-like description syntax, Python pickles or anything else. Of course it will also contain IDR encoded information with its Information Type Description.

Conclusion

The conclusion is that DIS, DITP and IDR have been designed without falling into any of these common fallacies. This is partly due to the long maturation process of their conception. While this may be considered a shortcoming, it may also be a strength, since it allowed all aspects to be examined carefully over time.

A Distributed Information System? Nice, but what for?

9/5/2009

0 Comments

 
Here is a (long) blog note I recommend reading: "Snakes on the web", written by Jacob Kaplan-Moss (September 4, 2009). It is a talk given at PyCon Argentina and PyCon Brazil, 2009.

It presents an analysis of the current situation of web publishing and of desirable properties of future systems.

My impression, and this is not a coincidence, is that DIS matches most of these requirements, since it was designed to address the shortcomings of current systems.

DITP and The black triangle

7/11/2009

0 Comments

 

A Hacker News submission references the "The black triangle" blog note. I can only back up the author, since I have experienced this many times.

In short, with some programs the visible part is merely a black triangle, while the invisible part may be complex or may have required a lot of effort to achieve. The black triangle is then generally just a simple visual example proving that the underlying system works.

That is the state of progress of DITP. I'm working to get the black triangle to become visible. In doing so I'm also writing the protocol specification, so that the protocol may be reviewed and implemented by third parties in other languages or libraries.

The black triangle is like the first fruit of a fruit tree that may sometimes have taken a long time to grow to the point of being able to produce fruit.


"A note on distributed computing" (1994)

7/7/2009

0 Comments

 


"A note on distributed computing"


Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall. Nov 1994.

Abstract:

We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are required because distributed systems require that the programmer be aware of latency, have a different model of memory access, and take into account issues of concurrency and partial failure.

We look at a number of distributed systems that have attempted to paper over the distinction between local and remote objects, and show that such systems fail to support basic requirements of robustness and reliability. These failures have been masked in the past by the small size of the distributed systems that have been built. In the enterprise-wide distributed systems foreseen in the near future, however, such a masking will be impossible.

We conclude by discussing what is required of both systems-level and application-level programmers and designers if one is to take distribution seriously.


What is wrong with HTTP?

6/25/2009

0 Comments

 

Here is a document presenting a review of what is good and bad about HTTP. It sheds some light on the choices I made for DIS. I couldn't identify the author's name in the text. Sorry.

What is wrong with HTTP?


In this essay, the first of a pair on browser apps, I explore how they are better than traditional desktop apps in some ways, but worse in others. Some of the disadvantages of browser apps are deeply rooted in the use of HTTP URLs for naming. In the second essay, I will present a design sketch for a new platform, a replacement for HTTP combining both styles' advantages. Right now, we're seeing a massive shift to browser apps, largely server-side browser apps. As I warned in "People, places, things, and ideas," [18] this move to server-side browser apps imperils our software freedom; I outlined how to solve this problem in "The equivalent of free software for online services." [19] This pair of essays represents more detail on this problem and proposed solution.

read more ...


Median value selection algorithm

6/9/2009

6 Comments

 

At work I'm currently working on tomographic reconstruction algorithms. I have to implement a Bayesian iterative algorithm that requires selecting the median value in a set of 27 float values from a cube of 3x3x3 voxels. This operation must be performed for each voxel and for each iteration. We have to expect 256 million voxels to process for each iteration, but "only" 60 to 100 iterations.

Trying to find the most efficient algorithm, I came up with a new algorithm after considering the one described on the selection algorithm page of Wikipedia. I then submitted a question on StackOverflow to get some feedback. And I did get valuable feedback. I was first pointed to the C++ nth_element function, which I didn't know at the time. It was also suggested to optimize by sharing intermediate information, which is indeed a smart thing to do to get a better initial guess on the median value.

After multiple tests and changes to the code I finally reached what seems to be a very efficient algorithm. The speedup is so large that I'm still unsure about it, but I checked everything. It could be due to memory caching or to particularly favorable parallelization opportunities with SSE instructions. I don't know.

Here is the outline of the algorithm. We have 27 values and have to find the value that splits the set in two, with 13 values smaller than or equal to it and 13 values greater than or equal to it. This value is called the median value.

The fundamental idea of the algorithm is to use a heap data structure, which has its smallest or biggest value at the top. Such a data structure is very efficient for adding an element or extracting its top element. It can also very easily be mapped onto an array.

The algorithm uses two heaps, initially empty, with a common top value which is the median value. Each heap has a capacity of at most 14 elements, one of which is common to the two heaps: their top value, which is also the median value.

The algorithm proceeds in two phases. In the first phase, the algorithm picks a value as the initial median value guess. Subsequent values are then compared with this median value and added to the corresponding heap until one heap becomes full and contains 14 elements.

At this point the median value is either in the full heap or in the remaining set of values to process. The second phase of the algorithm then starts, in which the remaining values are processed. Values that would not be inserted in the full heap are ignored. The other values are inserted in the full heap after deleting its top value. The heap then gets a new top value, and thus also a new median value. When all the remaining elements have been processed, the median of the 27 value set is the top value of the full heap.
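
To make the heap-selection principle concrete, here is a small C++ sketch of a closely related single-heap variant: a max-heap of capacity 14 keeps the 14 smallest values seen so far, so its top ends up being the 14th smallest of the 27 values, i.e. the median. This is an illustration of the idea, not the benchmarked HeapMedian3 code.

#include <algorithm>
#include <cstddef>

// Median of 27 floats using a bounded max-heap of the 14 smallest
// values seen so far. The heap top is the largest of those 14, and
// once all 27 values are processed it is the median. (Sketch only.)
float median27(const float v[27])
{
    float heap[14];
    std::copy(v, v + 14, heap);
    std::make_heap(heap, heap + 14);          // heap[0] = max of first 14
    for (std::size_t i = 14; i < 27; ++i) {
        if (v[i] < heap[0]) {                 // v[i] is among the 14 smallest
            std::pop_heap(heap, heap + 14);   // move current max to heap[13]
            heap[13] = v[i];
            std::push_heap(heap, heap + 14);  // restore the heap property
        }
    }
    return heap[0];                           // 14th smallest = median
}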

Here are some benchmark results. See StackOverflow for more detailed information.

HeapSort     : 2.287
QuickSort    : 2.297
QuickMedian1 : 0.967
HeapMedian1  : 0.858
NthElement   : 0.616
QuickMedian2 : 1.178
HeapMedian2  : 0.597
HeapMedian3  : 0.015  <-- best

It thus seems that HeapMedian3 is about 40 times faster than NthElement. I used a 3GHz Intel E8400 processor and the Intel C++ compiler with options -O3 and -xS for benchmarking.

Here is the code:

EDIT: The code has been removed because it was invalid. I tested it on a single random float value sequence where it gave a valid result by coincidence. A new blog post provides the correct code.


50 scientifically proven ways to be persuasive

5/17/2009

1 Comment

 

When sketching out your business model or marketing strategy, read the following blog note or the referenced book. There are easy ways to increase your effectiveness.

Yes! 50 Scientifically Proven Ways to Be Persuasive.


Object deserialization handling

2/7/2009

0 Comments

 

In the last month I rewrote the IDR prototype from scratch and translated the IDR specification document into English. During this process I made a few enhancements to the IDR encoding. I removed an ambiguity with exception decoding in some very unlikely situations. The other change was to integrate the 2008 update of the IEEE 754 specification, which now defines four sizes of floating point values: 2 bytes, 4 bytes, 8 bytes and 16 bytes. It may take some time until these types reach your desk, but IDR had better stick to the standards. So these will be the floating point encodings supported by IDR.

Besides this, there was a much bigger problem left in the API of object deserialization. The problem is to determine what to do when the decoder doesn't recognize the class type of a serialized object. The solution I came up with is very satisfying, since it matches all the requirements I had. It remains to check its usage convenience with real examples.

The problem

Object deserialization is a process in which the decoder reconstructs the serialized object aggregate. To do so it has to reconstruct each object of the aggregate and restore their pointers to each other. Objects are reconstructed by using object factories, a classic design pattern. An object factory is an object that "knows" how to reconstruct some types of objects.

The decoder thus has a collection of factories to which it delegates the reconstruction of the different types of objects found in the serialized aggregate. But what happens if the decoder can't find an appropriate factory for some type of serialized object? In some use cases this should be considered an error, but in others it might be an acceptable and even desirable situation.
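
The following C++ sketch shows what such a factory collection might look like: the decoder asks the registry for a factory by type name and must decide what to do when none is found. All names here are illustrative assumptions, not the IDR prototype API.

#include <map>
#include <memory>
#include <string>

struct Object { virtual ~Object() = default; };  // base of decoded objects
struct Decoder;                                  // stands for the IDR input stream

// An object factory "knows" how to reconstruct some type of object.
struct Factory {
    virtual ~Factory() = default;
    virtual std::unique_ptr<Object> reconstruct(Decoder& in) const = 0;
};

// The decoder's collection of factories, keyed by serialized type name.
class FactoryRegistry {
    std::map<std::string, const Factory*> factories;
public:
    void add(const std::string& typeName, const Factory* f) {
        factories[typeName] = f;
    }
    // Returns nullptr for an unknown type: the caller then decides
    // whether this is an error, a slicing or a pruning situation.
    const Factory* find(const std::string& typeName) const {
        auto it = factories.find(typeName);
        return it == factories.end() ? nullptr : it->second;
    }
};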

Consider for instance exceptions. In IDR an exception is an object and is handled as such. It is pointless, and even impossible, for a decoder to have a factory for all possible exceptions in the world. It is enough for the decoder to have a factory for the common exception base classes and, of course, the ones it has to deal with. It should then be enough to reconstruct the object as an instance of the parent class it has a factory for, a process called object slicing.

The worst case is when the decoder may not even slice the object because none of the parent classes are "known" by the decoder. In this case the best the decoder can do is to ignore the object and set all references to it to NULL. We'll call this process object pruning. As with slicing, it may be considered an acceptable and even desirable behavior in some use cases (e.g. optional properties), and an error in others, since the lobotomized data structure may end up too crippled or even invalid.

The problem is thus to define the appropriate behavior of the decoder when slicing or pruning occurs. In some cases it is an error, in others not, and in some cases it depends on what part of the aggregate the slicing or pruning took place in.

The solution

The decision whether it is an error or not is obviously context specific and has thus to be put in the hands of the user. So the problem boils down to determining how the user would be able to select the appropriate behavior.

The solution I came up with was to provide three object deserialization methods.

1. A strict object decoder that would throw an exception and abort object decoding as soon as a missing object factory is detected. With this you get an exact reconstruction or a failure.

2. A lax object decoder that would slice and prune at will and return whatever comes out of it, and nothing else. This object decoder would for instance be used for exceptions.

3. Another lax object decoder, like the previous one, but that would also return feedback on the missing object factories. The feedback on slicing would be an associative index mapping sliced object references to the list of their unrecognized class types. The feedback on pruning would be a list of the different types of pruned objects, with the list of unrecognized class types and the number of instances pruned.

The latter method would make it possible and easy for the user to determine whether slicing and pruning occurred, what the missing factories are, and to test for specific objects whether slicing took place and to what extent. Since this method gives an easy way to test whether slicing or pruning took place, the strict object decoder may seem unnecessary. The reason for its presence is that it may stop the decoding process as soon as a missing factory is detected, and thus avoid wasting resources when an exact reconstruction is required and no feedback is needed.
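
Here is a C++ sketch of how the three methods might appear in the API, together with the feedback structure of the third method; the signatures are illustrative assumptions, not the actual IDR API.

#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Object { virtual ~Object() = default; };
struct Decoder;   // stands for the IDR input stream

// Feedback returned by the third decoding method.
struct DecodeFeedback {
    // Sliced object -> list of its unrecognized class types.
    std::map<const Object*, std::vector<std::string>> sliced;
    // Unrecognized class type -> number of pruned instances.
    std::map<std::string, std::size_t> pruned;
};

// 1. Strict: throws and aborts on the first missing factory.
std::unique_ptr<Object> decodeStrict(Decoder& in);

// 2. Lax: slices and prunes at will, returns the result and nothing else.
std::unique_ptr<Object> decodeLax(Decoder& in);

// 3. Lax with feedback on the slicing and pruning that occurred.
std::unique_ptr<Object> decodeLax(Decoder& in, DecodeFeedback& feedback);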

I'm very satisfied with this solution because it keeps the API simple, with only a small effort on the decoder implementation. What I still need to validate is how convenient it is to use.


StackOverflow: general purpose binary protocols

1/10/2009

0 Comments

 

I just found a relevant question on StackOverflow asking for a good general purpose binary protocol. If you are interested in distributed information systems and their protocols, this topic might be a good read.


DIS development roadmap

11/12/2008

0 Comments

 

The following figure shows the kernel components of the Distributed Information System, the roadmap and how far I am today. The items in black are implemented and operational, and the items in gray still need to be implemented. Progress is going clockwise :).

OID An OID is to DIS what the URL is to the web. It is a unique, binary encoded and non-reusable reference to information published in the distributed information system. It was the first tile I designed and implemented. Its simplicity is inversely proportional to the time and effort required to invent it, because I had to explore and compare many different possible and existing solutions.

IDR It is to DIS what HTML or XML is to the web. IDR is the Information Data Representation used in DIS. It is a stream oriented encoding with support for object serialization and exceptions. The prototype implementation is currently being fully rewritten. It still misses the ability to specify the encoding version, as well as a formalization of data description. The latter is required to be able to display data in a human readable format, or to automatically generate data manipulation functions or containers mapped to different programming languages.

DITP It is to DIS what HTTP is to the web. It is the protocol used to exchange information or invoke remote actions in DIS. It is very simple, modular and extensible through the use of dynamically configurable data processing tasks. Support for compression, authentication or encryption is then provided by a kind of plugin. The protocol uses the object oriented model with remote method invocation. The current prototype does not yet support concurrent asynchronous method invocation.

DIS DIS stands here for Distributed Information Service and is not to be confused with Distributed Information System. It is fundamental to DIS, so the confusion is not really a problem. This service combines the properties of DNS and LDAP and would be a new kind of service on the Internet. I can't disclose more on it because it is still in development. A first prototype has been implemented, unfortunately proving the need to support data description.

SEC This part covers authentication and access control in DIS. It requires a functional DIS service. An interesting feature is that it is designed to scale up, so that a service could cope with millions of different users without having to keep track of millions of accounts and passwords.

IDX It is a service simply mapping human readable UTF8 strings to OID references. It is equivalent to the list of named entries in a directory. Like any other service, its access is controlled by ACLs, and it can thus be modified remotely with appropriate privileges. An index may be huge, with multiple alternate entry points, exactly like the DNS but exclusively as a flat name space. The OID associated with the UTF8 string is stored in an object, so that polymorphism allows images (icons) and other information to be associated with entries by extension.
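
A minimal C++ sketch of the flat IDX name space might look as follows; the OID layout and class names are illustrative assumptions, not the actual DIS types.

#include <cstdint>
#include <map>
#include <string>

struct OID { std::uint8_t bytes[16]; };   // placeholder for the real encoding

// A flat name space mapping human readable UTF8 strings to OIDs,
// like the list of named entries in a directory. (Sketch only; the
// real IDX is a remotely manageable, ACL protected service.)
class Index {
    std::map<std::string, OID> entries;
public:
    void set(const std::string& name, const OID& oid) { entries[name] = oid; }
    const OID* lookup(const std::string& name) const {
        auto it = entries.find(name);
        return it == entries.end() ? nullptr : &it->second;
    }
};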

DIR It is a graph of IDX services with one root entry. Services or information published in DIS can then be referenced by a human readable path in the IDX graph, relative to the root.



It is an ambitious project but, I am convinced, its added value is worth the effort. I wish I could work full time on this project with the help of some other developers, but this would require funding I don't have access to for now.

An application would help demonstrate the added value of the system. I'm still looking for one with an optimal balance between development effort and success potential.


    Author

    Christophe Meessen is a computer science engineer working in France.

    Any suggestions to make DIS more useful? Tell me by using the contact page.
