Distributed Information System (DIS)
  • Home
  • The blog
  • Contact

Median value selection algorithm

6/9/2009

6 Comments

 

At work I'm currently working on tomographic reconstruction algorithms. I have to implement a Bayesian iterative algorithm that requires to select the median value in a set of the 27 float values from a cube of 3x3x3 voxels. This operation must be performed for each voxel and for each iteration. We have to expect 256 million voxels to process for each iteration, but "only" 60 to 100 iterations.

Trying to find the most efficient algorithm I came up with a new algorithm considering the one described on the select algorithm page of wikipedia. I then submitted a question on StackOverflow to get some feedback. And I did get valuable feedback. I was first pointed to the C++ nth_element function I didn't know at the time. It was also suggest to optimize by sharing intermediate information which is indeed a smart thing to do to get a better initial guess on the median value.

After multiple tests and changes to the code I finally reached what seem to be a very efficient algorithm. The ratio is so good that I'm still unsure about it, but I checked everything. It could be due to memory cache or particularly favorable parallelization opportunities with sse instructions. I don't know.

Here is the outline of the algorithm. We have 27 values and have to find the value that split the set in two with 13 values smaller or equal to it and 13 values bigger or equal to it. This value is called the median value.

The fundamental idea of the algorithm is to use a heap data structure which has its smallest or biggest value at the top. Such data structure is very efficient for adding an element or to extract its top most element. It can also be very easily mapped into an array.

The algorithm uses two heaps initially empty, with a common top value which is the median value. Each heap has a capacity of at most 14 elements, where one is common to the two heaps, their top value and also the median value.
The algorithm proceed in two phases. In the first phase, the algorithm picks a value as initial median value guess. Subsequent values are then compare with this median value and added to the corresponding heap until one heap becomes full and contains 14 elements.

At this point the median value is a value in the full heap or in the remaining set of values to process. The second phase of the algorithm then starts where the remaining values are processed. Values that would not be inserted in the full heap are ignored. The other values are inserted in the full heap after deleting its top most value. The heap then gets a new top most value and thus also a new median value. When all the remaining elements have been processed, the median of the 27 value set is the top most value of the full heap.

Here are some benchmark results. See StackOverflow for more detailed information.

HeapSort        :2.287 
QuickSort       :2.297 
QuickMedian1 :0.967 
HeapMedian1  :0.858 
NthElement     :0.616 
QuickMedian2 :1.178 
HeapMedian2  :0.597 
HeapMedian3  :0.015  <-- best

It thus seem that HeapMedian3 is 33 times faster than NthElement. I used a 3GHz Intel E8400 processor and the Intel C++ compiler with options -03 and -xS for benchmarking.

Here is the code :

EDIT: The code has been removed because it was invalid. I tested it on a single random float value sequence were it gave a valid result by coincidence. A new blog post provides the correct code.

6 Comments
Rudick link
5/16/2010 03:26:05 pm

It makes me wonder if you don't have the world's fastest guaranteed O(n*log(n)) quicksort lurking in there. I.e., if you continue to fill both heaps until the end, and use that as the pivot step in quicksort, and do all this in-place...

Reply
Christophe Meessen link
5/16/2010 10:15:29 pm

Unfortunately not. With sets bigger than 27 elements quick select is apparently much faster. Check the selected answer on StackOverflow for an even faster method.

Thank you for the comment :)

Reply
karsten
10/23/2017 01:39:43 am

I get compiler warning "variable "left" may be used before its value is set":

if( val < median )
{
// move biggest value to the heap top
unsigned char child = nLeft++, parent = (child -1)/2;
while( parent && val > left[parent] ) <<<<<<<
{
left[child] = left[parent];
child = parent;
parent =(parent -1)/2;
}
left[child] = val;

(also in else branch with right[]) and I cannot see how left[0] and right[0] are set before use.
Maybe we should add left[0]=right[0] = median; before the if ?
Karsten

Reply
Christophe Meessen
10/23/2017 08:13:24 am

OMG! The code is totally wrong. I did some checking, but I kept using the same random sample for testing which by coincidence did not reported an error. By setting the srand to time(NULL) I got a very different story. I'll publish the fixed code ASAP.

I don't know if this will fix your compiler error.

Reply
karsten
12/18/2017 12:02:29 am

Hello Christophe, nice to hear this.

Reply
Jupiter Locksmith link
7/4/2022 02:05:28 am

Gratefull for sharing this

Reply



Leave a Reply.

    Author

    Christophe Meessen is a  computer science engineer working in France.

    Any suggestions to make DIS more useful ? Tell me by using the contact page.

    Categories

    All
    Business Model
    Database
    Dis
    Ditp
    Dvcs
    Git
    Gob
    Idr
    Misc
    Murphys Law
    Programming Language
    Progress Status
    Startup
    Suggested Reading
    Web Site

    Archives

    December 2017
    November 2015
    September 2015
    February 2013
    December 2012
    November 2012
    May 2012
    February 2012
    March 2010
    October 2009
    September 2009
    July 2009
    June 2009
    May 2009
    February 2009
    January 2009
    November 2008
    September 2008
    August 2008
    July 2008
    May 2008
    April 2008
    March 2008
    February 2008
    January 2008
    December 2007
    October 2007
    August 2007
    July 2007
    June 2007
    May 2007

    RSS Feed

    Live traffic feed
    You have no departures or arrivals yet. Wait a few minutes and check again.
    Powered by FEEDJIT
Powered by Create your own unique website with customizable templates.