Benford’s Law

Numbers are all around us, and with numbers come patterns. One of the most interesting is Benford’s Law, also known as the Newcomb–Benford law, the law of anomalous numbers or the first-digit law. It is the observation that in many real-life sets of numerical data, the leading digit is likely to be small.

In data sets that obey the law, the number 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time. If the digits were distributed uniformly, each would occur about 11.1% of the time. Benford’s Law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on. The law is named after physicist Frank Benford, who stated it in 1938 in a paper titled “The Law of Anomalous Numbers”, although it had been previously stated by Simon Newcomb in 1881. It is similar in concept, though not identical in distribution, to Zipf’s law. In short, the first digits of numbers drawn from records of the most varied sources are not uniformly distributed: “1” is the most frequent, followed by “2”, “3”, and so on, in steadily decreasing frequency down to “9”.

The discovery of Benford’s law goes back to 1881, when the Canadian-American astronomer Simon Newcomb noticed that in logarithm tables the earlier pages, which started with 1, were much more worn than the other pages. Newcomb’s published result is the first known instance of this observation and includes a distribution for the second digit as well. Newcomb proposed that the probability of a digit N (from 1 to 9) being the first digit of a number is log(N + 1) − log(N), with the logarithms taken to base 10. The phenomenon was noted again in 1938 by the physicist Frank Benford, who tested it on data from 20 different domains and whose name became attached to it. Benford’s data set included the surface areas of 335 rivers, the sizes of 3259 US populations, 104 physical constants, 1800 molecular weights, 5000 entries from a mathematical handbook, 308 numbers contained in an issue of Reader’s Digest, the street addresses of the first 342 persons listed in American Men of Science and 418 death rates. The total number of observations used in the paper was 20,229.
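Newcomb’s formula is easy to tabulate for yourself. Here is a minimal Python sketch, taking the logarithms to base 10; the function name benford_first_digit is my own and not from any library.

```python
import math

def benford_first_digit(d):
    """Probability that d (1-9) is the leading digit: log10(d + 1) - log10(d)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(d + 1) - math.log10(d)

# Print the expected share of each leading digit as a percentage.
for d in range(1, 10):
    print(f"{d}: {benford_first_digit(d):.1%}")
```

The nine probabilities add up to exactly 1, because the sum of the differences collapses to log(10) − log(1). The digit 1 comes out at roughly 30% and the digit 9 at a little under 5%, which matches the figures quoted earlier.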

It has been shown that this result applies to a wide variety of data sets, including electricity bills, street addresses, stock prices, house prices, population numbers, death rates, lengths of rivers, and physical and mathematical constants. As with other general principles about natural data (for example, the fact that many data sets are well approximated by a normal distribution), there are illustrative examples and explanations that cover many of the cases where Benford’s law applies, though many other cases resist a simple explanation. The law tends to be most accurate when values are distributed across multiple orders of magnitude, especially if the process generating the numbers is described by a power law, which is common in nature.

As a rule of thumb, the more orders of magnitude the data evenly covers, the more accurately Benford’s law applies. For instance, one can expect Benford’s law to apply to a list of numbers representing the populations of UK settlements. But if a settlement is defined as a village with a population between 300 and 999, Benford’s law will not apply, because no number in the list can begin with 1 or 2.
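To see the effect of range, here is a small simulation sketch in Python. Both data sets are made up for illustration: the "wide" one is spread evenly on a log scale across six orders of magnitude, and the "narrow" one mimics villages restricted to populations of 300 to 999. The helper names are my own.

```python
import random
from collections import Counter

def leading_digit(x):
    """First significant digit of a positive number."""
    # Scientific notation puts the first significant digit first, e.g. '3.14e+02'.
    # 17 significant digits avoids rounding 9.9999... up to 1.0e+01.
    return int(f"{x:.16e}"[0])

def first_digit_share(values):
    """Map each digit 1-9 to the fraction of values that start with it."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical "wide" data: spread evenly on a log scale from 1 to 1,000,000.
wide = [10 ** rng.uniform(0, 6) for _ in range(100_000)]
# Hypothetical "narrow" data: village populations restricted to 300-999.
narrow = [rng.randint(300, 999) for _ in range(100_000)]

wide_share = first_digit_share(wide)
narrow_share = first_digit_share(narrow)
for d in range(1, 10):
    print(d, f"{wide_share[d]:.3f}", f"{narrow_share[d]:.3f}")
```

In the wide column the share of leading 1s should land close to 30%, while in the narrow column the digits 1 and 2 never appear at all and the remaining digits are roughly equal.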

In general, a series of numerical records tends to follow Benford’s Law when the records represent magnitudes of events, such as populations of cities, flows of water in rivers or sizes of celestial bodies; do not have pre-established minimum or maximum limits; are not made up of numbers used as identifiers, such as identity or social security numbers, bank accounts or telephone numbers; and have a mean that is greater than the median, with the data not concentrated around the mean.

This law can be used to detect patterns, or the lack of them, in naturally occurring datasets. This leads to important applications in data science, such as anomaly and fraud detection. A large set of numbers is expected to follow the law, so accountants, auditors, economists and tax professionals have a benchmark for how often each leading digit should normally appear in a set.

In the latter half of the 1990s, accountant Mark Nigrini found that Benford’s law can be an effective red-flag test for fabricated tax returns: true tax data usually follows Benford’s law, whereas made-up returns do not. Ponzi schemes can also be flagged using the law. Unrealistic returns, such as those purported by the Madoff scam, fall far from the expected Benford probability distribution.

In 1972, Hal Varian suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted in support of public planning decisions. Based on the plausible assumption that people who fabricate figures tend to distribute their digits fairly uniformly, a simple comparison of the first-digit frequency distribution from the data with the expected distribution according to Benford’s law ought to show up any anomalous results. In the United States, evidence based on Benford’s law has been admitted in criminal cases at the federal, state, and local levels.
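One way to make that comparison concrete is a chi-square statistic, which adds up how far each observed digit count sits from its Benford expectation. This is my choice of measure for illustration, not necessarily what Varian or any auditor uses, and both data sets below are invented: powers of 2 stand in for "natural" data, and the second list mimics a fabricator who spreads first digits evenly.

```python
import math
from collections import Counter

def first_digit_chi_square(numbers):
    """Chi-square distance between observed first digits and Benford's law."""
    # Assumes every entry is a positive integer, so str(n)[0] is the first digit.
    counts = Counter(int(str(n)[0]) for n in numbers)
    total = sum(counts.values())
    chi2 = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)  # Benford count for digit d
        chi2 += (counts[d] - expected) ** 2 / expected
    return chi2

# Illustrative data only, not real records.
natural = [2 ** k for k in range(1, 501)]
made_up = [d * 1000 + 7 for d in range(1, 10)] * 50

natural_chi2 = first_digit_chi_square(natural)
made_up_chi2 = first_digit_chi_square(made_up)
print(f"powers of 2:   {natural_chi2:.2f}")
print(f"evenly spread: {made_up_chi2:.2f}")
```

With nine digits there are 8 degrees of freedom, and the usual 5% cut-off for that is about 15.51. A statistic well above the cut-off is a red flag worth a closer look, not proof of fraud.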

Walter Mebane, a political scientist and statistician at the University of Michigan, was the first to apply the second-digit Benford’s law-test (2BL-test) in election forensics. Such analyses are considered a simple, though not foolproof, method of identifying irregularities in election results and helping to detect electoral fraud. Benford’s law has been used as evidence of fraud in the 2009 Iranian elections. An analysis by Mebane found that the second digits in vote counts for President Mahmoud Ahmadinejad, the winner of the election, tended to differ significantly from the expectations of Benford’s Law and that the ballot boxes with very few invalid ballots had a greater influence on the results, suggesting widespread ballot stuffing. Another study used bootstrap simulations to find that the candidate Mehdi Karroubi received almost twice as many vote counts beginning with the digit 7 as would be expected according to Benford’s law, while analysis from Columbia University concluded that the probability that a fair election would produce both too few non-adjacent digits and the suspicious deviations in last-digit frequencies as found in the 2009 Iranian presidential election is less than 0.5%. Benford’s Law has also been applied for forensic auditing and fraud detection on data from the 2003 California gubernatorial election, the 2000 and 2004 United States presidential elections, and the 2009 German federal election.
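For the curious, the expected second-digit frequencies can be worked out from the same logarithmic rule by summing over every possible first digit. The sketch below uses that standard generalisation, which this post does not spell out, and it only tabulates the expected shares. It is not Mebane’s test procedure, and the function name is my own.

```python
import math

def benford_second_digit(d):
    """Probability that d (0-9) is the second digit under Benford's law."""
    if not 0 <= d <= 9:
        raise ValueError("second digit must be between 0 and 9")
    # Add up the chance of every two-digit start 1d, 2d, ..., 9d.
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

second_digit_probs = {d: benford_second_digit(d) for d in range(10)}
for d, p in second_digit_probs.items():
    print(f"{d}: {p:.2%}")
```

The second digit is far less lopsided than the first: 0 is expected about 12% of the time and 9 about 8.5% of the time, so second-digit tests need a lot of data to tell real deviations from noise.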

Benford’s law has also been misapplied to claim election fraud. When the law was applied to Joe Biden’s election returns for Chicago, Milwaukee, and other localities in the 2020 United States presidential election, the distribution of the first digit did not follow Benford’s law, and this was presented as a sign of fraud. The misapplication came from looking at data that was tightly bounded in range, which violates the assumption inherent in Benford’s law that the range of the data is large.

Macroeconomic data that the Greek government reported to the European Union before entering the eurozone was shown to be probably fraudulent using Benford’s law, albeit years after the country had joined. In genome data, the number of open reading frames and their relationship to genome size differs between eukaryotes and prokaryotes, with the former showing a log-linear relationship and the latter a linear relationship. Benford’s law has been used to test this observation, with an excellent fit to the data in both cases. The law has also been used successfully in scientific fraud detection. A test of regression coefficients in published papers showed agreement with Benford’s law. As a comparison group, subjects were asked to fabricate statistical estimates. The fabricated results conformed to Benford’s law on first digits but failed to obey it on second digits.

So if you want to test Benford’s Law yourself, it’s very simple. Just pick up a random book or magazine and list the numbers you find in it. If the law holds, roughly 30% of the numbers you collect should start with the digit 1. Let me know in the comments section if the law fits in your experiment.
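If you would rather not count by hand, here is a rough Python sketch that pulls the numbers out of a piece of text and tallies their first digits. The pattern for what counts as a number is a simplifying assumption of mine (digits with optional commas and a decimal part), and the sample sentence is invented.

```python
import re
from collections import Counter

def first_digit_tally(text):
    """Count the first significant digit of every number found in text."""
    # Match runs of digits that may contain thousands separators or a decimal part.
    tokens = re.findall(r"\d[\d,]*(?:\.\d+)?", text)
    tally = Counter()
    for token in tokens:
        # The first non-zero digit is the leading significant digit.
        match = re.search(r"[1-9]", token)
        if match:
            tally[int(match.group())] += 1
    return tally

sample = "The town has 1,284 homes, 37 shops, 0.52 km of canal and 19 bus stops."
tally = first_digit_tally(sample)
for digit in sorted(tally):
    print(digit, tally[digit])
```

Paste in a few pages of a magazine article in place of the sample sentence and compare the share of 1s with the 30% the law predicts. A handful of numbers will not tell you much, so the more text the better.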

2022 Week 08 Update

It’s been a week now since R went back to India, and I am slowly getting into a routine of managing the housework, cooking, work and writing. But in all this, my meditation and walking seem to have suffered the most. I have not walked as much as I usually do this month, so I am sure my step count will take a hit. But next week brings a new month and the chance to redo things, so that will be my restart for my steps and my meditation.

My parents are now more or less settled in their new home and say they are far busier here than they were in Mumbai. With the stress of cooking and cleaning taken care of, they can focus more on making my father better. Also, our previous flat was on the third floor of a walk-up building, so my father, who has Parkinson’s, was unable to go out for walks. That is now easily possible because they are on the second floor of a building which has a lift. They have a nurse on-site, and any potential medical problems will be easily dealt with. This is such a huge relief for my sister and me.

Singapore is firmly in the grip of the Omicron wave and all we can do is stay masked and sanitise our hands whenever we go out. But if we do get infected, we know that since we are all triple vaccinated, we should not be impacted too much. Fingers crossed, it will be no worse than a bad cold, and we will come out of this healthy.

Today’s quote comes from the American talk show host, television producer, actress, author, and philanthropist Oprah Winfrey, who is best known for her talk show, The Oprah Winfrey Show. She says the greatest lesson of life is that you are responsible for your life. When I think about it, this is so true and resonates with me. I know I am responsible for everything that happens, both good and bad, and so I need to work to make sure that more good things than bad are happening in my life.

So, signing off this week with the usual caution of staying safe, staying masked and getting vaccinated if you are not already.

In My Hands Today…

The Sense of Style: The Thinking Person’s Guide to Writing in the 21st Century – Steven Pinker

Why is so much writing so bad, and how can we make it better? Is the English language being corrupted by texting and social media? Do the kids today even care about good writing? Why should any of us care?

In The Sense of Style, the bestselling linguist and cognitive scientist Steven Pinker answers these questions and more. Rethinking the usage guide for the twenty-first century, Pinker doesn’t carp about the decline of language or recycle pet peeves from the rulebooks of a century ago. Instead, he applies insights from the sciences of language and mind to the challenge of crafting clear, coherent, and stylish prose.

In this short, cheerful, and eminently practical book, Pinker shows how writing depends on imagination, empathy, coherence, grammatical knowhow, and an ability to savor and reverse engineer the good prose of others. He replaces dogma about usage with reason and evidence, allowing writers and editors to apply the guidelines judiciously, rather than robotically, being mindful of what they are designed to accomplish.

Filled with examples of great and gruesome prose, Pinker shows us how the art of writing can be a form of pleasurable mastery and a fascinating intellectual topic in its own right.

Friday Pinterest Fun

One of my favourite authors for BB & GG was Dr Seuss. Theodor Seuss Geisel was an American children’s author, political cartoonist, illustrator, poet, animator, and filmmaker known for writing and illustrating more than 60 books under the pen name Dr. Seuss. His work includes many of the most popular children’s books of all time, selling over 600 million copies and being translated into more than 20 languages by the time of his death.

Here are some quotes from Dr Seuss to think about this weekend!

In My Hands Today…

The Living Road – Ajit Harisinghani

A solo motorcycle ride across India, and into Bhutan, becomes much more than just a test of physical endurance when 57-year-old, Pune-based, speech therapist Ajit Harisinghani decides to go in the pursuit of that most elusive of all human desires – Happiness.

With the idea of Bhutan’s Gross National Happiness on his mind, he traverses a potpourri of terrain; riding through landscapes that change daily. From arid land to verdant fields, from jungles with glimpses of elephants and tigers to tea gardens…

Along the way, he meets a yogi and his singing goat, explores ancient caves, is frightened in a wildlife sanctuary, sees a schizophrenic bicycle and helps a police inspector overcome his stammering problem. A variety of experiences later, he is finally in Thimpu where a Buddhist monk reveals the road-map to being happy.