Sunday 18 January 2009

Rounding Algorithms

There are a zillion different ways to round floating point values to integers. C and C++ provide a couple basic ones in or .
There are two general categories of rounding algorithm: those that are symmetric about zero and those that are biased in some way.

Biased Rounding
A bias is a mathematical notion used in statistics to indicate that samples are not exactly representative of their true values; they are skewed in one way or another.

For example, the floor() function is biased towards negative infinity, because it always chooses the lower integer number --that is, it always chooses the number closer to negative infinity.

floor( 7.5 ) --> 7
floor( -7.5 ) --> -8


Suppose your local big city and wanted to know how fast people are driving on a particular freeway. The first step is to gather the exact speeds that individual drivers are going, and the second would be to convert all the individual values into a single value that would represent the normal rate of speed. For simplicity here, we will just use the average value.

Let us say that the equipment which samples a motorist's speed is far more accurate than the computer's ability to store it. (This is actually not uncommon.) Again, for simplicity, we will say that the computer stores the speed as integers.

With a sampling of twelve motorists, we get the following speeds (in miles per hour)

49.087 57.901 28.500 46.738 51.270 53.096
44.795 47.218 46.347 45.989 47.582 50.563

A quick calculation shows that the average speed is 47.424 mph.

If the city were to simply use the floor() function to convert these to integers, it would get

49 57 28 46 51 53
44 47 46 45 47 50

which averages to 46.916 --> 46 mph (remember, integer arithmetic!)

Either way, the sampling is off by about a whole mile per hour. I don't think that the city would actually care about a single mile per hour, but this does illustrate the bias, or tendancy of the floor() function, to make the numbers closer to negative infinity, thuse skewing the data to an inaccurate number.

This was just a simple example that came off the top of my head, but in many sciences and statistical surveys, that difference can mean quite a lot. Suppose that the Apollo missed the moon by 1%? Suppose a pharmaceutical company put too much iron in a daily vitamin pill by 1%? Suppose a construction firm miscalculated the stresses a bridge can take by 1%? In all these scenarios the results would prove deadly. One percent is a lot.


Symmetric Rounding
A special case of bias is centered about zero. Let us fix the floor() function to tend towards zero.
1
2
3
4
5
6
7
 

double floor0( double value )
  {
  if (value < 0.0)
  return ceil( value );
  else
  return floor( value );
  }


Now, the absolute value of the result will always be the same:

floor0( -7.7 ) --> -7 floor0( 7.7 ) --> 7
floor0( -7.5 ) --> -7 floor0( 7.5 ) --> 7
floor0( -7.3 ) --> -7 floor0( 7.3 ) --> 7

Enough about that.


Unbiased Rounding
So, how do we handle these biases? By adding some rule that accounts for it.

Let us apply knowledge we all learned in gradeschool: In arithmetic rounding we round up if the next digit is 5 or more, and if it less than 5 we round down. We write ourselves a little function to do just that:
1
2
3
4
 

double round( double value )
  {
  return floor( value + 0.5 );
  }


The problem is that this is still biased. We have actually reversed the bias of floor() from negative infinity to positive infinity, because we always choose to round up when exactly halfway between two values:

round( 10.3 ) --> 10
round( 10.5 ) --> 11
round( 10.7 ) --> 11

You can actually see the bias in the above table: the result tends towards 11 and away from 10.

This brings us to the trick: which way do we round when exactly halfway between two values?

One very popular method is variously called "bankers rounding", "round to even", "convergent rounding", and even "unbiased rounding", to name a few. It works by skewing the bias itself.

Given a number exactly halfway between two values, round to the even value (zero is considered even here).

round( 1.7 ) --> 2 round( 2.7 ) --> 3
round( 1.5 ) --> 2 round( 2.5 ) --> 2
round( 1.3 ) --> 1 round( 2.3 ) --> 2

For random data this is very convenient. Bankers like it because money deposited and withdrawn is random. (There are trends, mind you, but you cannot predict exactly how much will be deposited and withdrawn.) The important point is that the bankers rounding is still biased if the data is biased. It is only unbiased with random data.

One solution is called "alternate rounding". It works by simply choosing to bias up or down every other time.

round( 1.5 ) --> 2
round( 1.5 ) --> 1
round( 1.5 ) --> 2
round( 1.5 ) --> 1
etc

This is not always useful though.

The only way to eliminate all bias is to use a random bias... This, of course, is impossible to generate in your typical PC, but it still goes toward solving the problem quite nicely.

If the sample is exactly halfway between two integers, it chooses one or the other randomly.

Of course, the Achilles heel of this method is the random number generator you use. The default pseudorandom generator for C and C++ is not that great. The Mersenne Twister is by far the most popular high-quality pseudorandom number generator, but it is non-trivial to implement so I will not include it below.

Anyway, what follows is a convenient, simple library you are free to use. I'll even permit you to cannabalize it at will (since the algorithms are so obvious...)

I may update it sometime in the future if I can figure out a way around the default epsilon issue. Feel free to make suggestions for improvements.

:-)

0 comments: