The data element market will be distributed in its overall structure, with some “data intermediaries” acting as core nodes, and blockchain can serve as an organizational tool for this market.
Original title: “Zou Chuanwei: Application of Blockchain in the Data Element Market | Wanxiang Blockchain “Integrated Innovation” Series Research Report”
Author: Zou Chuanwei, Chief Economist of Wanxiang Blockchain
This article is part of Wanxiang Blockchain's “Integrated Innovation” series of industry research reports. The author is Dr. Zou Chuanwei, Chief Economist of Wanxiang Blockchain. The article takes the approach of “breaking the whole into parts”: it first divides the data value chain into four links (data recording and acquisition; data collection, verification and storage; data analysis; and data element allocation) and then discusses in turn the role that blockchain can play in each link.
Blockchain and the data element market are currently two areas of great interest. In April this year, the “Opinions on Building a More Complete Factor Market Allocation System and Mechanism” issued by the Central Committee of the Communist Party of China and the State Council listed data as one of the production elements for the first time. The National Development and Reform Commission, in its definition of “new infrastructure”, positioned blockchain as a new type of technical infrastructure. Many professionals and scholars have discussed the application of blockchain in the data element market and have strongly affirmed its importance for protecting and using personal data and for improving the data foundation of AI development. However, unlike the applications of blockchain in central bank digital currency, stablecoins, supply chain finance, evidence storage, and anti-counterfeiting traceability, the data element market itself is at an early stage of development, and many core issues remain unsettled. This makes it difficult to discuss the application of blockchain in the data element market in depth.
Building on earlier research, this article discusses the role of blockchain in the different links of the data value chain. According to a 2018 report by the GSMA (the GSM Association), the data value chain can be divided into four main links (Figure 1). The first is data generation, which refers to data recording and acquisition. The second is data collection, verification and storage. The third is data analysis, which refers to processing and analyzing data to generate new insights and knowledge. The fourth is exchange, which refers to the use of the results of data analysis, either internally or through external transfer; this link is more appropriately called “data element allocation”. This article has five parts. The first four parts cover the four links in turn, with the emphasis on the fourth link, and the fifth part summarizes the article.
Figure 1: Main links of the data value chain
The application of blockchain in data recording and acquisition
A blockchain is a distributed ledger of tokens. A token is essentially a state variable defined in the blockchain (Part 4 discusses another meaning of “token” in the payment field). A blockchain holds both data related to tokens and their transactions and data unrelated to them.
Data related to tokens and their transactions (how many tokens each blockchain address holds, and the token transaction records between addresses) is native to the blockchain and recorded by the blockchain itself. It is a product of mathematical rules, and its authenticity and accuracy are guaranteed by cryptography, consensus algorithms, and related mechanisms. Measured by its share of on-chain storage space and by the computing resources that verification nodes (miners) devote to it, this data dominates the data in the blockchain and is also the data with the highest “value content”. For example, in central bank digital currency and stablecoin applications, this data is the basis for analyzing capital flows and for anti-money-laundering and counter-terrorist-financing supervision. As another example, in cryptocurrency pricing, on-chain transaction data is an important valuation reference.
Data unrelated to tokens and their transactions is written into the blockchain as an attachment to token transactions. Once written, it is visible to the entire network, cannot be tampered with, and is copied and propagated without error, but the blockchain itself cannot guarantee the authenticity and accuracy of this data at its source or at the point of writing. Because on-chain storage capacity is limited, in many cases this data can only be written to the blockchain as a hash digest, and only a small amount of structured information can go on chain as raw data. Consequently, of the vast sea of data that the real world generates at every moment, the proportion that can go on chain as raw data is almost negligible. This shows that blockchain is not a general-purpose ledger or database and should be used where its strengths lie. Only data of sufficiently high value is worth putting on chain in raw form.
The main function of putting a hash digest on chain is to store evidence and thereby enhance the credibility of raw data held on a local device or in the cloud. By revealing the raw data after the fact (for example, by allowing an external institution to inspect the local device that stores it), the uploader can prove two things: first, that the raw data existed at the upload time recorded on the blockchain; second, that the uploader knew the raw data. However, the role of blockchain in evidence storage and credit enhancement should not be overestimated. In particular, for data that is not native to the blockchain, credibility depends on specialized data recording and acquisition technologies and the systems around them, such as the “blockchain + Internet of Things” data management discussed next.
Internet of Things (IoT) devices continuously collect data such as geographic location, temperature and humidity, speed, and altitude from their surroundings. With current device-side anti-attack technology, the authenticity and accuracy of IoT data at the source is guaranteed to a considerable extent. IoT data is mainly stored in the cloud and locally on IoT devices. Most IoT devices can run hash algorithms and public/private-key signature operations. When IoT data goes on chain, only a small amount of structured data can be written directly to the blockchain, and most of the data goes on chain as hash digests. In “blockchain + IoT” data management, therefore, the relevant operations are executed automatically by the IoT devices, which is highly efficient and reduces human intervention.
“Blockchain + Internet of Things” provides a benchmark for understanding the application of blockchain in data recording and acquisition. Unlike IoT data, much other data is heavily affected by human factors during recording and acquisition. Whether such data is worth putting on chain requires a careful weighing of costs and benefits.
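The evidence-storage role of a hash digest can be illustrated with a minimal Python sketch. It is not taken from the report: `ToyLedger` is a hypothetical in-memory stand-in for a blockchain, and SHA-256 is an assumed choice of hash function. The raw data stays off chain, only the digest and its upload time are recorded, and revealing the raw data later lets anyone check it against the record.

```python
import hashlib
import time


def digest(data: bytes) -> str:
    """Return the SHA-256 hash digest of the raw data as a hex string."""
    return hashlib.sha256(data).hexdigest()


class ToyLedger:
    """Hypothetical stand-in for a blockchain: an append-only list of records."""

    def __init__(self):
        self._records = []  # each record is (digest, upload timestamp)

    def anchor(self, data_digest: str, timestamp: float) -> int:
        """Record a digest with its upload time and return the record index."""
        self._records.append((data_digest, timestamp))
        return len(self._records) - 1

    def verify(self, index: int, revealed: bytes) -> bool:
        """Check that the revealed raw data matches the digest at `index`."""
        anchored_digest, _ = self._records[index]
        return digest(revealed) == anchored_digest


ledger = ToyLedger()
original = b"invention notes, draft 3"  # raw data is kept off chain
receipt = ledger.anchor(digest(original), time.time())

# Later, the holder reveals the raw data to an external institution.
print(ledger.verify(receipt, original))                 # True
print(ledger.verify(receipt, original + b" (edited)"))  # False
```

The sketch only shows that the revealed data matches what was anchored. Whether the data was accurate at its source is outside what the digest can prove.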
The application of blockchain in data collection, verification and storage
Data collection, verification and storage mainly rely on database technology, and the direct role of blockchain is limited. For example, the management of personal data in the financial sector now generally emphasizes the application of API technology to generate compound value through data aggregation.
As discussed in Part 1, the data that a blockchain can store is very limited. Most data is stored on local devices or in the cloud, although its credibility can be enhanced by putting hash digests on chain. In addition, if data collection, verification and storage are carried out through a market-based division-of-labor network made up of different institutions, then in theory this network can be built on a blockchain. The distributed storage project Filecoin can be regarded as an attempt in this direction. Large-scale success in this direction requires careful design of the economic mechanisms of a distributed economy. I summarize the related economic issues under the heading of the distributed data economy (Decentralized Data Economy), which is discussed in Part 4.
The application of blockchain in data analysis
The role that blockchain can play directly in data analysis is also very limited. Because on-chain computing performance is limited, complex data analysis is generally not performed through smart contracts on the blockchain. It relies mainly on statistics, econometrics, data visualization, big data analysis, and AI, and the related computation takes place outside the blockchain.
If data analysis is also carried out through a market-based division-of-labor network made up of different institutions (for example, some institutions provide computing power and others provide algorithms), then in theory a blockchain-based distributed data economy can be introduced here as well. For example, the PlatON project is committed to building a high-performance computing network that promotes the circulation of data and computing power. Its main market participants include computing coordinators, data providers, and computing power providers.
The application of blockchain in data element allocation
As an integrated technology that bears on the relations of production, blockchain will find its main application in the data element market in the data element allocation link. We discuss this at two levels: the confirmation of rights over data elements, and the organizational form of the data element market.
Confirmation of data element rights
Economic research shows that the premise of any effective allocation of resources is the definition of property rights over those resources, and data elements are no exception. Property rights are a complex economic concept referring to an enforceable social construct that determines how resources are used or owned. Property rights have three core dimensions: first, the right to use a resource; second, the right to earn income from a resource; and third, the right to transfer a resource to others, alter it, abandon it, or destroy it. Property rights can be further divided into a “bundle of rights” that includes ownership, possession, control, use, income, and disposal rights.
Data has characteristics of both goods and services. Much data is non-excludable and non-rivalrous. The ownership of data is a complicated issue both in law and in practice, especially for personal data. In reality, the typical example of data whose ownership can be clearly defined is the patent, yet patents themselves show how complex the confirmation of data rights is.
The prerequisite for obtaining a patent is disclosing the technical content of the invention, so that the public can make further improvements and resources are not wasted on duplicate research and development. For example, the patent examination authority generally publishes the content of the patent specification about 18 months after an invention patent application is filed. The patentee enjoys exclusive rights to the patented technology, and the resulting commercial benefits, within the statutory period. This protects the rights of inventors and encourages the public to invent. When the statutory period expires, the patent right lapses, and the public can freely use the patented technology according to the content disclosed in the patent specification.
In global practice, the confirmation of rights over data elements is the product of law and technology acting together. Generally, the law first sets out the institutional framework for data property rights, and technology then makes that framework enforceable. For example, many newspapers and magazines are now behind paywalls: only paying accounts can read the articles, technology is used to restrict copying and screenshots, and if plagiarism is discovered, the law is used to protect the publisher's rights. In many cases, technology alone cannot confirm rights over data elements. Part 1 discussed the evidence-storage function of blockchain, but storing evidence of data does not amount to confirming rights over it. For example, an inventor can put the hash digest of an invention document on the blockchain to prove that he or she made the invention first, which can serve as self-supplied proof if a dispute arises later. However, without approval from the patent examination authority, putting the invention document on chain does not confer a patent right.
Some experts and scholars believe that only data with clear ownership can enter the data element market. This is a serious misunderstanding. The “clear ownership + buyout transaction” model suits only special types of data such as patents (for example, many corporate mergers and acquisitions include the pricing of patents), and it will not become the mainstream of the data element market. In practice, the premise for a data element market is effective control of data, that is, control over who (Who) can use the data, under what conditions (What), and how (How). In other words, data property rights are ultimately reflected in the effective control of data. This perspective helps in understanding the role of blockchain in confirming rights over data elements.
In a blockchain, an address can hide the identity of its actual controller, and a hash digest can hide the raw data, but the blockchain itself is not a privacy management technology. In particular, data on a public chain is visible to the entire network, and technologies such as ring signatures, coin mixing, and coin combination are needed to hide on-chain fund flows. A consortium chain can offer differentiated access to data, giving different users different permissions to read the data in the blockchain. But as discussed in Part 1, the data stored in a blockchain is limited after all, so the direct role of blockchain in data control is limited too. For example, in a “blockchain + government data sharing” project, government data is stored on local equipment (usually a confidential network inside the government department), and data calls across departments are still made through traditional methods. The raw data cannot circulate on the blockchain, but the blockchain records data applications, authorizations, calls, and accesses so that these actions cannot be repudiated, mainly for the purpose of after-the-fact auditing.
Among the various data control technologies, those most closely related to blockchain are cryptographic: verifiable computing, homomorphic encryption, and secure multi-party computation. For a complex computing task, verifiable computing produces a short proof. Verifying the short proof is enough to judge whether the task was executed correctly, with no need to repeat the computation. Under homomorphic encryption and secure multi-party computation, data is provided to outside parties as ciphertext rather than plaintext. These cryptographic techniques make data “available but invisible”, but because they demand substantial computing resources, they can only run outside the blockchain.
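The audit-trail idea in the government data sharing example can be sketched as a hash-chained access log. This Python sketch is illustrative only and is not the design of any actual project: the `AccessLog` class, its field names, and the department names are assumptions. Each entry commits to the hash of the previous entry, so an after-the-fact change to any record is detectable. Genuine non-repudiation would additionally require each department to sign its own entries, which the sketch omits.

```python
import hashlib
import json


class AccessLog:
    """Append-only log in which each entry commits to the previous entry."""

    GENESIS = "0" * 64  # placeholder hash preceding the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash(body: dict) -> str:
        """Hash the entry body in a canonical (sorted-key) JSON encoding."""
        return hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, department: str, action: str, dataset: str) -> str:
        """Append an application, authorization, call, or access record."""
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"department": department, "action": action,
                "dataset": dataset, "prev": prev_hash}
        entry_hash = self._hash(body)
        self.entries.append({**body, "hash": entry_hash})
        return entry_hash

    def is_intact(self) -> bool:
        """Recompute every hash and check that the chain links are unbroken."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev_hash or entry["hash"] != self._hash(body):
                return False
            prev_hash = entry["hash"]
        return True


log = AccessLog()
log.record("tax_bureau", "apply", "company_registry")
log.record("market_regulator", "authorize", "company_registry")
log.record("tax_bureau", "call", "company_registry")
print(log.is_intact())  # True

log.entries[1]["action"] = "deny"  # simulate tampering with a past record
print(log.is_intact())  # False
```

Note that the log holds only metadata about who asked for what; the government data itself never appears in it, which matches the division of roles described above.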
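To make “available but invisible” concrete, here is a toy Python sketch of one simple building block of secure multi-party computation, additive secret sharing, used to compute a joint sum. It is an illustration under stated assumptions rather than a production protocol: the modulus `PRIME`, the function names, and the input values are all chosen for the example, inputs are assumed to be non-negative integers whose total is below `PRIME`, and all parties are simulated in one process and assumed to follow the protocol honestly.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the shares; an illustrative choice


def share(value: int, parties: int) -> list[int]:
    """Split `value` into random shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_inputs: list[int]) -> int:
    """Compute the sum of all inputs without pooling the raw inputs."""
    n = len(private_inputs)
    # Party i splits its input into n shares and sends share j to party j.
    distributed = [share(v, n) for v in private_inputs]
    # Party j adds up the shares it received. Each share on its own is a
    # uniformly random number, so party j learns nothing about any other
    # party's raw input.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % PRIME
                    for j in range(n)]
    # Publishing the partial sums reveals only the overall total.
    return sum(partial_sums) % PRIME


print(secure_sum([120, 75, 305]))  # 500
```

Even this toy version exchanges n shares per party for a single addition, which hints at why such techniques are run off chain.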
Among the various data control technologies, the one most easily confused with blockchain is payment tokenization, so it is briefly explained here. Payment tokenization (Tokenization) means using a specific payment token (Payment Token) in place of payment elements such as bank card numbers and the payment accounts of non-bank payment institutions, and restricting the scope in which the token can be used. This reduces the risk of bank account and payment account information leaking at merchants and acquiring institutions, reduces transaction fraud, and protects the security of user transactions. There is a mapping between payment tokens and the underlying bank accounts and payment accounts, and the token service provider manages this mapping through two processes: tokenization and de-tokenization. Payment tokenization is a core building block of digital payment. For example, in mobile payment, the token number is stored on a mobile device such as a phone as the device card number, and the user can use the device to make contactless near-field payments at offline POS terminals, ATMs, and other terminals, or initiate remote payments directly from a mobile client app.
At present, UnionPay's mobile QuickPass and online payment products have fully adopted payment tokenization. As the above shows, a token in payment tokenization stands for sensitive information such as bank accounts and payment accounts; it is generated according to standardized rules and does not rely on complex cryptography. A token in the blockchain, by contrast, represents fiat reserve assets in applications such as central bank digital currency and stablecoins, but the token itself is a product of blockchain technology.
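The mapping managed by a token service provider can be sketched in a few lines of Python. This is a simplified illustration, not UnionPay's implementation or any payment standard: the class and method names, the domain labels, and the card number are hypothetical, and real payment tokens follow standardized number formats rather than the random hex strings used here.

```python
import secrets


class TokenServiceProvider:
    """Toy token vault mapping payment tokens to real account numbers."""

    def __init__(self):
        self._vault = {}  # token -> (account number, permitted domain)

    def tokenize(self, account_number: str, domain: str) -> str:
        """Issue a random token that is valid only within `domain`."""
        token = secrets.token_hex(8)
        self._vault[token] = (account_number, domain)
        return token

    def detokenize(self, token: str, domain: str) -> str:
        """Return the real account number if the token is used in its domain."""
        account_number, permitted_domain = self._vault[token]
        if domain != permitted_domain:
            raise PermissionError("token used outside its permitted domain")
        return account_number


tsp = TokenServiceProvider()
card = "6200000000000000"  # made-up account number for the example
token = tsp.tokenize(card, domain="mobile_wallet")

# The merchant and acquirer only ever handle `token`, never `card`.
print(tsp.detokenize(token, "mobile_wallet") == card)  # True

try:
    tsp.detokenize(token, "web_checkout")  # a leaked token is useless here
except PermissionError as err:
    print(err)  # token used outside its permitted domain
```

The token is a lookup key held by a central service, with no ledger or consensus involved, which is the contrast with blockchain tokens drawn above.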
The organizational form of the data element market
Because data elements are diverse in type and character, lack objective valuation standards, and on many occasions will not be traded on a buyout basis, the data element market will not become a centralized, liquid trading market like the stock market. This is borne out by the experiments with big data trading centers and big data exchanges in many provinces and cities over the past few years, none of which achieved the expected success. Insufficient policy support and immature supporting technology played a part, but the more important reason is that the economic attributes of data elements do not support a trading model with a high degree of standardization, bid-and-offer matching, and active trading.
In broad terms, the data element market will be closer to over-the-counter markets such as the bond market and the OTC derivatives market: less standardized, with peer-to-peer transactions and negotiated pricing, and with transactions that are infrequent but occur continually. This does not mean, however, that the ultimate data providers (such as individuals and IoT devices) and the ultimate data demanders (such as AI algorithm companies) will enter the market directly. The data element market will give rise to “data intermediaries” that help data flow from ultimate providers to ultimate demanders.
Therefore, the data element market will be distributed in the overall structure, but there will be some “data intermediaries” as core nodes. The application of blockchain in the organizational form of the data element market must be analyzed in this framework.
First, the main functions of “data intermediaries” are data collection, verification, storage and analysis. Parts 2 and 3 have already analyzed how these intermediaries can use blockchain. It should be added that blockchain can be used to improve the data release link. For example, in the 2018 central bank digital currency prototype system, Yao Qian proposed applying blockchain to the rights confirmation and registration of central bank digital currency. His vision is that the central bank and commercial banks jointly build a distributed rights-confirmation ledger for central bank digital currency, offer a website for external verification and inquiry over the Internet, and thereby provide an online “currency detector” function for central bank digital currency. This uses the tamper-resistant and unforgeable properties of blockchain to improve the data and system security of rights confirmation queries.
Second, as discussed earlier, most real-world data will not be stored or circulated through the blockchain, but the blockchain can record data authorization, call, and access activities, much as it does in supply chain management and commodity traceability. This application direction is valuable, but it is not especially innovative. For one thing, analyzing and using data generates new data, which makes tracing the circulation of data largely meaningless. For another, if the goal is to track data circulation for confidentiality and leak prevention, analyzing TCP/IP data packets is a more direct and effective method than blockchain.
Third, the use of blockchain as an organizational tool for the data element market corresponds to the concept of the distributed data economy introduced earlier:
- The basis of a distributed data economy is the confirmation of data rights, which is reflected in the ability of data providers to effectively control how data demanders use the data.
- The distributed data economy is a rich data ecosystem in which different participants exchange data, algorithms (data analysis methods), and computing power with one another. It is essentially large-scale collaborative computing through a market mechanism, which achieves the effective allocation of data elements while protecting data property rights, in order to promote economic development and enhance social welfare.
- Blockchain records the economic activities in the distributed data economy, not for evidence storage or traceability, but to account for those activities.
- In a distributed data economy, central bank digital currency or stablecoins serve as the medium of exchange. The reason is that some participants in a distributed data economy may be non-human, such as IoT devices acting as data providers and AI algorithms acting as data demanders. Central bank digital currency and stablecoins are compatible with the openness of the distributed data economy and can ensure safe and efficient payment.
The distributed data economy has many interesting application scenarios. For example, in “blockchain + Internet of Things”, the device ID of an IoT device is bound to a digital currency wallet address. Data storage, transmission, mining, and value exchange in the Internet of Things can then be carried out in a trusted manner, with the related economic activities accounted for in central bank digital currency or stablecoins. It is conceivable that an IoT device that keeps providing high-quality data will receive more central bank digital currency or stablecoins as “remuneration” (which in practice belongs to the owner of the device). This economic incentive will significantly promote the collection and use of IoT data.
This direction helps to realize the distributed cognitive industrial Internet proposed by Dr. Xiao Feng. The distributed cognitive industrial Internet adopts a distributed governance structure that all enterprises can join with confidence, and it is built on cognitive intelligence technology based on knowledge graphs, data collaboration based on privacy-preserving computing, and the integration of manufacturing and services based on full life cycle management.
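The IoT remuneration scenario can be sketched as a small accounting model in Python. Everything specific here is an assumption made for illustration: the report does not specify a pricing rule, so the sketch pays a fixed price per reading scaled by a quality score between 0 and 1, and the class name, device ID, and wallet address are hypothetical. The ledger is an in-memory stand-in for on-chain accounting in central bank digital currency or stablecoins.

```python
from collections import defaultdict
from decimal import Decimal


class DataEconomyLedger:
    """Toy ledger that pays IoT devices for the data they provide."""

    def __init__(self, price_per_reading: Decimal):
        self.price = price_per_reading
        self.wallet_of = {}                    # device ID -> wallet address
        self.balances = defaultdict(Decimal)   # wallet address -> balance

    def bind(self, device_id: str, wallet: str) -> None:
        """Bind a device ID to a digital currency wallet address."""
        self.wallet_of[device_id] = wallet

    def settle(self, device_id: str, readings: int, quality: Decimal) -> Decimal:
        """Pay the device's wallet for its readings, scaled by quality."""
        if not Decimal(0) <= quality <= Decimal(1):
            raise ValueError("quality must be between 0 and 1")
        payment = self.price * readings * quality
        self.balances[self.wallet_of[device_id]] += payment
        return payment


ledger = DataEconomyLedger(price_per_reading=Decimal("0.01"))
ledger.bind("sensor-001", "wallet-abc")

ledger.settle("sensor-001", readings=1000, quality=Decimal("0.9"))
ledger.settle("sensor-001", readings=1000, quality=Decimal("0.5"))
print(ledger.balances["wallet-abc"])  # 14.000
```

Under this assumed rule, the same volume of data earns less when its quality score is lower, which is the incentive effect the paragraph above describes.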
Blockchain is of great significance to the construction of the data element market. However, because the data element market itself is at an early stage of development and many core issues remain unsettled, it is difficult to discuss the application of blockchain in this market as a whole. This article therefore takes the approach of “breaking the whole into parts” and discusses the role that blockchain can play in the different links of the data value chain.
First, the data recording and acquisition link. Blockchain, as a distributed ledger of tokens, cannot serve as a general-purpose database. Data related to tokens and their transactions is native to the blockchain and recorded by it, and is the data with the highest “value content” in the blockchain. Of the massive data in the real world, however, the proportion that can go on chain as raw data is almost negligible, and most data can only be written to the blockchain as hash digests. Putting a hash digest on chain serves to store evidence and enhance the credibility of the raw data. “Blockchain + Internet of Things” manages IoT data efficiently and with little human intervention, which provides a benchmark for understanding the application of blockchain in data recording and acquisition. Whether other data is worth putting on chain requires a careful weighing of costs and benefits.
Second, the data collection, verification, storage and analysis links. The direct role of blockchain in these links is limited. But if these links are carried out through a market-based division-of-labor network made up of different institutions, the network can be built on a blockchain and become a distributed data economy.
Third, the data rights confirmation link. The confirmation of data rights is the basis of data element allocation, and it is the product of law and technology acting together. Storing evidence of data on a blockchain does not amount to confirming rights over it. In practice, data rights confirmation is mainly reflected in the ability of data providers to effectively control how data demanders use the data. In this sense, blockchain (especially a public chain) is not a privacy management technology. A consortium chain can offer differentiated access to data, giving different users different permissions to read the data in the blockchain. However, the data stored in a blockchain is limited, so the direct role of blockchain in data control is limited too. Cryptographic techniques such as verifiable computing, homomorphic encryption, and secure multi-party computation make data “available but invisible”, but because they demand substantial computing resources, they can only run outside the blockchain.
Fourth, the data element allocation link. The data element market will be distributed in its overall structure, with some “data intermediaries” acting as core nodes. The tamper-resistant and unforgeable properties of blockchain help to improve the data release link. Blockchain can record data authorization, call, and access activities, which has some value but limited significance as an innovation. The innovative value of blockchain in this link lies mainly in the distributed data economy, which is essentially large-scale collaborative computing through a market mechanism and achieves the effective allocation of data elements while protecting data property rights. The distributed data economy helps to realize the distributed cognitive industrial Internet.
 GSMA, 2018, “The Data Value Chain”.
 Another main use of a hash digest is to work together with its preimage as a multi-party coordination tool in hash time lock contracts (HTLC) and discreet log contracts (DLC). Please refer to “Hash Time Lock Application” (Wanxiang Blockchain Research Report, 2020 Issue 12), https://www.chainnews.com/articles/365768981629.htm
 The analysis of Filecoin economic model can be found in “A Brief Introduction to Filecoin Economic Model” (Wanxiang Blockchain Research Report, 2020 Issue 29), https://www.chainnews.com/articles/974219932958.htm
 Interested readers can refer to PlatON’s Blue Book of Economics:
 Tokenization is related to Encryption, but there are also big differences. Please see:
 Qian Yao, 2018, “Experimental Research on Central Bank Digital Currency Prototype System”, Journal of Software, September 2018, Volume 29, Issue 9.