Benford’s law applies to some data in the blockchain industry, such as the number of blockchain companies, the number of patents, and company financial data.
Original title: “Oke Cloud Chain Research Institute: The U.S. election is settled. Has Biden’s ballot been faked?”
Written by: Ouke Yunlian Research Institute
The U.S. election took several turns before the dust finally settled. Biden currently leads Trump by 290 electoral votes to 232 and is set to become the next U.S. president. However, rumors of fraud in Biden’s vote count are also rampant. The doubts were initially based on Benford’s law, and reports of votes being counted more than once followed. Benford’s law describes the distribution of leading digits in many real-world data sets. Although it cannot serve as direct evidence, it is often used to screen for data fraud. This article introduces Benford’s law, applies it to the blockchain industry to find indicators that satisfy it, and examines the plausibility of on-chain data from its perspective.
Benford’s Law: A Widely Applicable Law of Data
A widespread law of nature
Just as Newton is said to have discovered the law of universal gravitation after watching an apple fall, Simon Newcomb and Frank Benford discovered Benford’s law while leafing through logarithm tables. According to Benford’s law, in many data sets the probability that the first digit is 1 is much greater than for any other digit, and the larger the digit, the smaller its probability of occurrence. Specifically, for the most commonly used decimal numbers, the probability of each first digit is as follows:
Figure 1: Probability of the first digit in Benford’s law, data source: Ouke Yunlian Research Institute
Data such as population, GDP, and land area have been verified to comply with Benford’s law, and even purely natural data such as the Fibonacci sequence and the half-lives of radioactive elements also follow it. But Benford’s law is largely an empirical law, without a strict general proof or derivation. Generally speaking, the conditions under which it applies are as follows:
- The sample size and the span of orders of magnitude should be as large as possible. Height data, for example, spans too narrow a range to qualify, although practice shows that the law also holds for some smaller samples;
- The data must show no traces of human manipulation. Artificially assigned numbers such as telephone numbers and postal codes do not follow Benford’s law. When data has been tampered with, it very likely no longer conforms to the law, which is why Benford’s law can be used to detect data fraud;
- Data that grows exponentially over time fits Benford’s law, and this point can be rigorously proved mathematically. In base b, the probability that the first digit is n is $P(n) = \log_b\left(\frac{n+1}{n}\right)$. This type of data grows slowly in the early stage and then faster and faster;
- Data that has a distribution law of its own may not conform to Benford’s law. For example, rates of return do not satisfy it.
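The formula in the list above can be evaluated directly. The following is a minimal sketch in Python; the function names `benford_probability` and `benford_distribution` are chosen here for illustration and do not come from the original article.

```python
import math


def benford_probability(digit: int, base: int = 10) -> float:
    """Probability that the first digit equals `digit` in the given base."""
    if not 1 <= digit < base:
        raise ValueError("digit must satisfy 1 <= digit < base")
    # P(n) = log_b((n + 1) / n)
    return math.log((digit + 1) / digit, base)


def benford_distribution(base: int = 10) -> dict[int, float]:
    """Benford probabilities for every possible first digit in the base."""
    return {d: benford_probability(d, base) for d in range(1, base)}


if __name__ == "__main__":
    for digit, p in benford_distribution().items():
        print(f"{digit}: {p:.1%}")
```

In base 10 this gives roughly 30.1% for the digit 1, falling steadily to about 4.6% for the digit 9, and the nine probabilities sum to 1.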
Quantities such as population, GDP, operating income, broadcast volume, and transaction volume are hard to double from 1 to 2 in the early stage because of scale effects or network effects, but once they reach a certain scale, growing from 8 to 9 is relatively easy. They therefore linger at small leading digits and pass quickly through large ones, so the distribution of first digits ends up following Benford’s law.
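This lingering effect can be illustrated with a small simulation. The sketch below assumes a hypothetical quantity that starts at 1 and grows by 1% per period for 5,000 periods (both numbers are arbitrary choices, not taken from the article), and records how often each leading digit is observed.

```python
import math
from collections import Counter


def leading_digit(x: float) -> int:
    """Return the first significant decimal digit of a positive number."""
    if x <= 0:
        raise ValueError("x must be positive")
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def simulate_growth(rate: float = 0.01, periods: int = 5000) -> dict[int, float]:
    """Share of periods spent at each leading digit under compound growth.

    The growth rate and number of periods are illustrative assumptions.
    """
    counts = Counter()
    value = 1.0
    for _ in range(periods):
        counts[leading_digit(value)] += 1
        value *= 1 + rate
    return {d: counts[d] / periods for d in range(1, 10)}


if __name__ == "__main__":
    shares = simulate_growth()
    for d in range(1, 10):
        benford = math.log10(1 + 1 / d)
        print(f"{d}: simulated {shares[d]:.3f}  Benford {benford:.3f}")
```

Because the simulated value takes far more periods to climb from 1 to 2 than from 8 to 9, the simulated shares should land close to the Benford values, with small differences that depend on where the series happens to stop.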
Used to detect data fraud
Benford’s law is often used to detect data fraud, especially in financial data. In a 2003 fraud case in Washington State, the accountant Darrell Dorrell used Benford’s law to spot irregularities in check remittance data. Further investigation uncovered a fraud involving up to 100 million US dollars. Similarly, Enron’s earnings per share from 2000 to 2001 deviated markedly from Benford’s law. In fact, since the 1970s, Benford’s law has been widely used to detect fraudulent accounting practices.
Beyond finance and accounting, Benford’s law has also been applied to data in other fields, such as the 2009 Iranian election, the macroeconomic data of the Greek government, public planned economic data, and Bill Clinton’s tax declaration data.
However, it is worth noting that Benford’s law cannot be used as evidence in court. It can only raise a suspicion of data falsification, which subsequent forensic investigation must confirm. Even in the Washington State fraud case, where it was applied successfully, Darrell Dorrell’s test with Benford’s law was only the beginning: it took considerable effort and three years of evidence gathering before the principal culprit, Kevin Lawrence, was sentenced to 20 years in prison. In addition, the applicability of Benford’s law in certain fields is controversial. For example, a Harvard University study showed that Benford’s law does not apply to vote data. For these reasons, the tests that netizens ran on Biden’s vote counts using Benford’s law are questionable in both applicability and persuasiveness, and cannot serve as direct, strong evidence of ballot fraud.
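As a rough illustration of what such a first-pass screen might look like, the sketch below compares observed first-digit counts with the Benford expectation using a chi-square goodness-of-fit statistic. This is one common choice of test, not necessarily the method used in the cases above. The function names are assumptions made for illustration, and the two inputs are stand-ins: the first 1,000 Fibonacci numbers as a series expected to conform, and the integers 100 to 999 as a stand-in for evenly assigned numbers.

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom at the 5% level.
CHI2_CRITICAL_DF8_5PCT = 15.507


def first_digit(n: int) -> int:
    """First decimal digit of a nonzero integer."""
    return int(str(abs(n))[0])


def chi_square_vs_benford(values) -> float:
    """Chi-square statistic of observed first digits against Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat


def fibonacci(count: int) -> list[int]:
    """The first `count` Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    a, b = 1, 1
    out = []
    for _ in range(count):
        out.append(a)
        a, b = b, a + b
    return out


if __name__ == "__main__":
    samples = {
        "fibonacci": fibonacci(1000),
        "evenly assigned (illustrative)": range(100, 1000),
    }
    for name, sample in samples.items():
        stat = chi_square_vs_benford(sample)
        flagged = stat > CHI2_CRITICAL_DF8_5PCT
        print(f"{name}: chi-square = {stat:.2f}, flag for review = {flagged}")
```

A statistic above the critical value only marks a data set for closer inspection. As the cases above show, confirming fraud still requires a separate investigation.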
The application of Benford’s law in the field of blockchain
The sections above describe the general application of Benford’s law. The following sections turn to the blockchain industry, identify which indicators satisfy Benford’s law, and discuss the plausibility of on-chain data in light of the characteristics of blockchain technology.
Data applicable to Benford’s law in the blockchain industry
As noted above, macro data such as population, GDP, and area comply with Benford’s law. In the blockchain industry, macro data such as the number of blockchain patents and the number of companies also satisfy it. The figures below show the number of blockchain patents in various provinces and cities from 2020 to the present and the number of blockchain companies in the Wind global enterprise library. In both cases the first-digit distribution is fairly close to Benford’s law.
Figure 2: The number of blockchain patents and Benford’s law, source: National Patent Statistics Bureau, Ouke Yunchain Research Institute
Figure 3: The number of blockchain companies and Benford’s law, source: Wind Global Enterprise Library, Ouke Yunchain Research Institute
In addition, the financial data in the blockchain industry is also a typical application scenario of Benford’s Law. The following data comes from the constituent stocks of the blockchain index.
Figure 4: Profit and Benford’s Law, Source: Wind, Ouke Yunchain Research Institute
Figure 5: Stock price and Benford’s law, source: Wind, Ouke Cloud Chain Research Institute
The rationality of data on the chain from the perspective of Benford’s law
Blockchain technology is distributed and transparent by design, which facilitates multi-party supervision of data. The immutability of data also raises the cost of data fraud, because any fraud leaves permanent traces. Blockchain technology can therefore effectively suppress data fraud. At present, blockchain has been applied in many fields such as finance and public welfare to help solve the pain points of data fraud.
This article first examines the general characteristics of transaction volume data, and then compares data from blockchain-based and non-blockchain-based platforms of similar volume. First, an examination of trading platform data with sample sizes above 100, 1,000, and 2,000 shows that transaction volume fits Benford’s law very well, and the larger the sample, the closer the data comes to the theoretical values. Next, transaction volume data was obtained from a blockchain-based trading platform, yielding 114 valid samples whose first-digit distribution was compared with the theoretical values of Benford’s law. The on-chain transaction volume is fairly consistent with Benford’s law, except at the digit 8. For comparison, a trading platform of similar volume that is not based on blockchain technology was selected. It has 195 valid samples, but the share of its transaction volumes beginning with 6 and 7 is higher than the theoretical values. Given that the on-chain data fits better overall despite its smaller sample size, the blockchain-based transaction volume data appears more reasonable from the perspective of Benford’s law.
Figure 6: Based on blockchain transaction volume and Benford’s law, source: Ouke Yunchain Research Institute
Figure 7: Not based on blockchain transaction volume and Benford’s law, source: Wind, Ouke Yunchain Research Institute
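A comparison of this kind can be made more concrete by measuring, digit by digit, how far each sample’s first-digit shares sit from the Benford values. The sketch below is an assumption about how one might do this, not the institute’s actual procedure. The two samples are synthetic stand-ins rather than the platform data behind Figures 6 and 7: `sample_a` is a series of powers of 2, and `sample_b` adds extra values beginning with 6 and 7 to imitate the kind of excess described above.

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x: float) -> int:
    """First significant decimal digit of a nonzero number."""
    x = abs(x)
    if x == 0:
        raise ValueError("zero has no leading digit")
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def digit_deviations(values) -> dict[int, float]:
    """Observed first-digit share minus the Benford share, per digit."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total - BENFORD[d] for d in range(1, 10)}


def mean_absolute_deviation(values) -> float:
    """Average absolute gap between observed and Benford shares."""
    return sum(abs(gap) for gap in digit_deviations(values).values()) / 9


def most_deviant_digits(values, k: int = 2) -> list[int]:
    """The k digits whose shares differ most from Benford's law."""
    gaps = digit_deviations(values)
    return sorted(gaps, key=lambda d: abs(gaps[d]), reverse=True)[:k]


# Synthetic stand-ins for illustration only, NOT the article's data.
sample_a = [2 ** k for k in range(200)]
sample_b = sample_a + [6500] * 15 + [7200] * 15

if __name__ == "__main__":
    for name, sample in (("sample_a", sample_a), ("sample_b", sample_b)):
        mad = mean_absolute_deviation(sample)
        worst = most_deviant_digits(sample)
        print(f"{name}: MAD = {mad:.4f}, largest gaps at digits {worst}")
```

The mean absolute deviation does not depend directly on sample size, which makes it convenient for comparing samples of different lengths, although small samples still produce noisier first-digit shares.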
The public welfare project Waterdrop Chips claims to be built on big data and blockchain. The following takes the blacklist of 122 dishonest fundraisers published on its official website as an example to explore the authenticity of on-chain data from the perspective of Benford’s law. As discussed earlier, artificially assigned data such as mobile phone numbers and ID numbers do not conform to Benford’s law, so the quantity studied is the number of dishonest fundraisers each month from 2017 to the present. The results are as follows.
Figure 8: The number of untrustworthy personnel and Benford’s law, source: Waterdrop Chip, Ouke Cloud Chain Research Institute
Limited by the sample size, the first-digit distribution of the number of untrustworthy people does not fully conform to Benford’s law, but it generally shows the trend that the larger the digit, the lower its probability of occurrence.
Although it has not been rigorously proven in general, a large number of practical tests have shown that Benford’s law is a widespread and interesting regularity in data, and it is used to uncover data fraud, especially in financial data. The claim that Biden’s votes in the U.S. general election do not comply with Benford’s law has problems of applicability and persuasiveness, so it cannot be used as strong evidence to overturn the election results. As this article has repeatedly emphasized, Benford’s law is only a method for raising suspicion, not sufficient evidence. It is the starting point rather than the end point of a data-fraud investigation.
Benford’s law also applies to some data in the blockchain industry, such as the number of blockchain companies, the number of patents, and company financial data. In addition, the transparency and tamper resistance of blockchain technology help to maintain the authenticity of data. A comparison using one set of actual data finds that, from the perspective of Benford’s law, the first-digit distribution of on-chain data is reasonable.