Benford Goes to the Casino

By Henk Tijms - September 13, 2018

Photo: Canadian-born American astronomer, mathematician and economist Simon Newcomb (1835-1909), taken between 1905 and 1909. Source: Wikimedia Commons.

You’re strolling past a casino and you notice an eye-grabbing sign:

The Multiplication Game, Carpe Diem!

You’re curious, so you go into the casino to find out the rules of the game. The game is played at a table between a player and a casino employee. The employee presses a button to generate a slip of paper printed with a four-digit integer on its reverse side. The player may not see this integer until he has chosen one of his own: a positive integer with as many digits as he wishes. The player’s integer is then multiplied by the four-digit casino integer. The player wins if the product of the two integers begins with a 4, 5, 6, 7, 8 or 9; otherwise, the casino wins. If the player wins, he gets 2.45 dollars for every dollar staked.

This sounds tempting, perhaps too good to be true. Before deciding whether or not to play, you go off to calculate for yourself the possible products of two four-digit integers, from 1000 to 9999, in case you should play using a four-digit integer. Your computer program tells you that 43.0% of the products begin with 4, 5, 6, 7, 8 or 9. This would mean that, in the long run, you win on average 0.43 × 2.45 − 1 = 0.0535 dollars for every dollar staked, giving you on average a winning margin of slightly more than 5% for any four-digit integer chosen. It certainly looks as though the casino has blundered, but hey, that is not your problem. You hightail it back to the casino to claim your winnings.

After playing the game a great number of times, however, you are baffled to find yourself on the losing end of things. How could this happen? Your calculations were correct and indicated a comfortable winning margin, yet there you are, losing. Unfortunately, this is, in fact, the expected outcome. No matter what strategy the player uses, in the long run the casino wins at least 60.1% of the games.

The trick is that the casino uses randomization in generating its four-digit integer. Randomization is a technique that has many applications in mathematics and computer science. Every time the game is played, the random-number generator on the casino’s computer picks a random number u between 0 and 1. The computer uses this number to calculate a = 10^u and prints the largest four-digit integer below 10^3 × a on the casino’s slip of paper, the slip generated at the start of the game.
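
The 43.0% figure from the naive calculation can be reproduced without looping over all 81 million products. The following is a minimal sketch, not the program referred to in the text: the function names are made up for illustration, and it counts, for each four-digit x, how many four-digit y put the product x·y in a range whose first digit is 4 through 9.

```python
def count_products_in(lo, hi, x, y_min=1000, y_max=9999):
    """Count integers y in [y_min, y_max] with lo <= x * y < hi."""
    first = max(y_min, -(-lo // x))   # ceil(lo / x) using integer arithmetic
    last = min(y_max, (hi - 1) // x)  # largest y with x * y <= hi - 1
    return max(0, last - first + 1)


def winning_fraction():
    """Fraction of products of two four-digit integers starting with 4..9."""
    wins = 0
    for x in range(1000, 10000):
        # Every product lies in [10**6, 10**8), so it has 7 or 8 digits.
        wins += count_products_in(4 * 10**6, 10**7, x)
        wins += count_products_in(4 * 10**7, 10**8, x)
    return wins / 9000**2


fraction = winning_fraction()
naive_margin = fraction * 2.45 - 1  # payout of 2.45 per dollar staked
print(f"fraction of products starting with 4-9: {fraction:.3f}")
print(f"naive expected profit per dollar staked: {naive_margin:.4f}")
```

The calculation silently assumes that both integers are equally likely to be any value from 1000 to 9999, which is exactly the assumption the casino's randomization breaks.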

In this way, the casino guarantees itself a winning probability of at least log_{10}(4) − 10^{-3} = 0.60106, no matter what strategy the player applies. This truly surprising outcome was discovered by American mathematician Kent Morrison.
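
The guaranteed bound can be checked directly for any particular player integer. The sketch below assumes the slip shows c = floor(1000 × 10^u) with u uniform on [0, 1), so that each c from 1000 to 9999 appears with probability log_{10}(c + 1) − log_{10}(c); the function names and the sample player integers are chosen for illustration only.

```python
import math


def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


def casino_win_probability(player_integer):
    """Exact chance that the product starts with 1, 2 or 3.

    Assumption: the casino integer is c = floor(1000 * 10**u), u uniform
    on [0, 1), hence P(c) = log10(1 + 1/c) for c = 1000, ..., 9999.
    """
    total = 0.0
    for c in range(1000, 10000):
        if first_digit(c * player_integer) <= 3:
            total += math.log10(1 + 1 / c)
    return total


for s in (1, 7, 25, 1234, 4000, 987654321):
    print(s, round(casino_win_probability(s), 5))
```

Even at the guaranteed bound, the player's expected profit is at most (1 − 0.60106) × 2.45 − 1, roughly −0.023 dollars per dollar staked.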

Elementary probability is all you need to shed light on the result. Define the random variable A as A = 10^U, where U is a randomly chosen number between 0 and 1. Then, P(A < a) = P(U < log_{10}(a)) and so P(A < a) = log_{10}(a) for 1 < a < 10. Therefore

    \[P(k \leq A < k + 1) = P(A < k + 1) - P(A < k)\]

    \[= \log_{10}(k + 1) - \log_{10}(k) = \log_{10}\left(1 + \frac{1}{k}\right)\]


That’s exactly what we’re looking for. The probability distribution

    \[\log_{10}\left(1 + \frac{1}{k}\right) \quad \text{for } k = 1, \ldots, 9\]

is the famous Benford distribution. This is precisely the probability distribution of the discrete random variable B that is defined as the value of random variable A, rounded down; in other words, B is the first digit of A. Taking the product of the random variable A and a positive, integer-valued random variable S, it can be proved that the first digit of the product also follows the Benford distribution when A and S are independent of each other. This describes what is happening in the casino game: apart from rounding down to an integer, the casino’s four-digit integer is 10^3 × A, and the random variable S corresponds to the integer chosen by the player. Keeping in mind that the Benford distribution assigns a total probability mass

    \[\sum_{k=1}^{3}\log_{10}(1 +\frac{1}{k})=0.60206\]

to the numbers 1, 2, and 3, it is no longer surprising that the player is at a disadvantage.
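
A short sketch of why multiplication leaves the first-digit law intact, stated for A = 10^U and a fixed positive integer s (the symbol s and the fractional-part notation \{x\} are introduced here for illustration). Taking logarithms gives

    \[\log_{10}(A s) = U + \log_{10}(s)\]

The first digit of As equals k exactly when the fractional part of this logarithm lies between \log_{10}(k) and \log_{10}(k + 1). Because U is uniform on [0, 1), the fractional part of U + c is again uniform on [0, 1) for every constant c, and so

    \[P(\text{first digit of } A s = k) = P\left(\log_{10}(k) \leq \{U + \log_{10}(s)\} < \log_{10}(k + 1)\right) = \log_{10}\left(1 + \frac{1}{k}\right)\]

For a random S that is independent of A, the same result follows by conditioning on S = s and averaging over the values of S.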

The Benford distribution is named after Frank Benford, an American physicist who published an article in 1938 in which he demonstrated empirically that the first nonzero digit in many types of data (lengths of rivers, metropolitan populations, universal constants in the fields of physics and chemistry, numbers appearing in front-page newspaper articles, etc.) approximately follows a logarithmic distribution. In fact, a similar empirical result had already been noted by the renowned astronomer Simon Newcomb. In 1881, Newcomb published a short article in which he observed that the initial pages of reference books containing logarithmic tables were far more worn and dog-eared than the later pages. He found that integers beginning with a 1 were looked up more often than integers beginning with 2, integers beginning with 2 were looked up more often than integers beginning with 3, etc. For digits 1 through 9, Newcomb found the relative frequencies to be

30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1%, 4.6%

which is consistent with the mathematical formula log_{10}(1 + 1/k) for k = 1, ..., 9.
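
The percentages listed above follow directly from the formula, as the following small sketch shows (the variable names are illustrative):

```python
import math

# Benford probability of each first digit k = 1, ..., 9.
benford = {k: math.log10(1 + 1 / k) for k in range(1, 10)}

for k, p in benford.items():
    print(f"{k}: {100 * p:.1f}%")

# Total probability mass on the first digits 1, 2 and 3.
mass_123 = benford[1] + benford[2] + benford[3]
print(f"P(first digit is 1, 2 or 3) = {mass_123:.5f}")
```

The nine probabilities sum to 1 because the sum telescopes to log_{10}(10).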

This result was more or less forgotten until Benford published his article in 1938, in which he presented a mountain of empirical evidence supporting the logarithmic law. Benford’s article received more attention than it otherwise might have done, partly due to a lucky placement in the journal that published it, right after the much-cited article of a famous physicist. This led to the re-discovered result being dubbed Benford’s law rather than Newcomb’s law.

Benford’s law has the remarkable characteristic of being scale invariant: if a data set conforms to Benford’s law, then it does so regardless of the physical unit in which the data are expressed. Whether river lengths are measured in kilometres or miles, or stock options are expressed in dollars or euros, it makes no difference for Benford’s law. That said, there are some types of data sets that do not conform to Benford’s law. Take, for example, Olympic 400-meter time trials. This data set will not include many qualifying times that begin with a 1! But if you collect the numbers appearing in front-page newspaper articles, you will find that Benford’s law does more or less apply. This goes against intuition. You would expect that, in a randomly formed data set, each of the digits 1 through 9 would appear as the first digit with the same frequency, but this is, surprisingly, not the case.

A satisfying mathematical explanation for Benford’s law was a long time coming. In 1996, the American mathematician Ted Hill proved that if numbers are picked randomly from various randomly occurring number sets, the numbers from the combined sample approximately conform to Benford’s law. This is a perfect description of what happens with numbers appearing in front-page newspaper articles.
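
The scale invariance mentioned above can be illustrated numerically. The sketch below uses made-up "river lengths" spread evenly on a logarithmic scale over four orders of magnitude (an assumption chosen so that the data conform to Benford's law, not real measurements) and compares the first-digit frequencies in kilometres and in miles.

```python
import math

BENFORD = [math.log10(1 + 1 / k) for k in range(1, 10)]


def first_digit(x):
    """Leading significant digit of a positive number."""
    return int(f"{x:.15e}"[0])


def digit_frequencies(values):
    """Relative frequency of each first digit 1..9 in values."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    return [c / len(values) for c in counts]


# Hypothetical data: 4000 lengths in km, evenly spaced on a log scale
# from 1 km up to (just under) 10,000 km.
n = 4000
lengths_km = [10 ** (4 * i / n) for i in range(n)]
lengths_miles = [0.621371 * x for x in lengths_km]  # km to miles

freq_km = digit_frequencies(lengths_km)
freq_miles = digit_frequencies(lengths_miles)
print("digit  Benford  km     miles")
for k in range(9):
    print(f"{k + 1}      {BENFORD[k]:.3f}    {freq_km[k]:.3f}  {freq_miles[k]:.3f}")
```

Changing the unit multiplies every value by a constant, which shifts all logarithms by the same amount and leaves the distribution of their fractional parts, and hence of the first digits, essentially unchanged.
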
Although it may seem bizarre at first glance, the Benford’s law phenomenon has important practical applications. In particular, Benford’s law can be used for investigating financial data such as income tax data, corporate expense data and corporate financial statements. Forensic accountants and tax authorities use Benford’s law to identify possible fraud in financial transactions. Many crucial bookkeeping items, from sales numbers to tax allowances, conform to Benford’s law, and deviations from the law can be quickly identified using simple statistical controls. A deviation does not necessarily indicate fraud, but it does send up a red flag that calls for further investigation to determine whether or not there is a case of fraud. Benford’s law was first used successfully in this way by a District Attorney in Brooklyn, New York. He was able to identify seven fraudulent companies and obtain convictions against them. In more recent years, the fraudulent Ponzi scheme of Bernard Madoff, the man behind the largest case of financial fraud in U.S. history, could have been stopped earlier if the tool of Benford’s law had been used. Benford’s law can also be used to identify fraud in macroeconomic data. Economists at the IMF have applied it to gather evidence in support of a hypothesis that countries sometimes manipulate their economic data to create strategic advantages, as Greece did in the time of the European debt crisis. This is a different kettle of fish altogether from the quaint application regarding dog-eared pages in old-fashioned books of logarithm tables. Nowadays, Benford’s law has multiple statistical applications on a great many fronts.
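
One simple statistical control of the kind mentioned above is a chi-square goodness-of-fit test on the first digits. This is offered as a sketch of one common choice, not as the method used in the cases described; the two ledgers are invented for illustration, and the threshold 15.507 is the 5% critical value of the chi-square distribution with 8 degrees of freedom.

```python
import math

BENFORD = [math.log10(1 + 1 / k) for k in range(1, 10)]
CHI2_CRITICAL_5PCT_8DF = 15.507  # 5% critical value, 8 degrees of freedom


def benford_chi_square(amounts):
    """Pearson chi-square statistic of first digits against Benford's law.

    amounts must be positive integers (for example, amounts in cents).
    """
    counts = [0] * 9
    for amount in amounts:
        counts[int(str(amount)[0]) - 1] += 1
    n = len(amounts)
    return sum(
        (counts[k] - n * BENFORD[k]) ** 2 / (n * BENFORD[k]) for k in range(9)
    )


# Hypothetical ledgers (illustrative, not real data).
benford_like = [int(1000 * 10 ** (i / 1000)) for i in range(1000)]
evenly_spread = list(range(100, 1000))  # every first digit equally common

for name, ledger in (("benford_like", benford_like), ("evenly_spread", evenly_spread)):
    stat = benford_chi_square(ledger)
    flag = "red flag" if stat > CHI2_CRITICAL_5PCT_8DF else "no flag"
    print(f"{name}: chi-square = {stat:.2f} ({flag})")
```

As the text notes, a large statistic is only a signal for closer inspection; it does not by itself establish fraud.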

Henk Tijms is Honorary Fellow of the Tinbergen Institute. He has written many papers on applied probability and stochastic optimization. This column is a shortened version of Chapter 5 in his book “Surprises in Probability: Seventeen Short Stories”, Chapman and Hall/CRC Press, September 2018.
