The design of “data fusion infrastructure” should be based on four principles: controllable, computable, verifiable and measurable.
Speaker: Sun Lilin, founder and CEO of Matrix Yuan
On October 27th, the 6th Blockchain Global Summit hosted by Wanxiang Blockchain Lab opened in Shanghai. Sun Lilin, the founder and CEO of Matrix Yuan, delivered a speech on “Secure Multi-Party Computing and Data Fusion Infrastructure”, sharing the expansion and application of secure multi-party computing in the commercial field. The following is the full text of the speech:
Hello everyone!
I am very fortunate to be reporting to you at the Wanxiang Blockchain Summit for the sixth time. In 2015, more than 60 people squeezed into a conference room of less than 30 square meters to discuss how blockchain could solve the payment and settlement problems of financial infrastructure. In 2016, we began to study the underlying layer of blockchain in earnest. In 2017, it was proposed at the summit for the first time that MPC could be commercialized, and the first privacy computing sub-forum was held. In 2018, there was a round table on stage, where I had a good conversation with Wu Jihan of Bitmain. He asked me: secure multi-party computation is good, but how long do you think it will take? My answer at the time was that large-scale commercialization would take five years.
Last year, I introduced you to the combination of the Internet of Things and blockchain on the stage. Today I want to report to you how to understand privacy computing, blockchain and data fusion infrastructure.
I am very glad that this year many companies, institutions, and governments, including President Fan of the People’s Bank, who spoke earlier, have mentioned secure multi-party computation (MPC). What value can privacy computing deliver? What does it have to do with secure multi-party computation?
The essence of MPC is multi-party computation. Zero-knowledge proofs, homomorphic encryption and the other algorithms you see can all be regarded as MPC in a broad sense, while MPC in the narrow sense is a specially constructed setting. Whether it is blockchain or privacy computing, it is essentially multi-party computing. The future data fusion infrastructure is being built on multi-party computing.
This is how we understand the whole picture. The change brought about by multi-party computing is to turn the data Internet that everyone has grown used to over the past 20 years into a computing Internet. In today’s familiar era dominated by Internet companies, data is migrated to the cloud, in whole or in part: it is the data that moves. As a result, privacy cannot be protected, and data cannot become an asset.
Internet companies have used the data, not only stripping it of its natural ownership but also making huge profits from it. In the United States, Facebook was fined nearly 5 billion U.S. dollars, and Google faces an antitrust lawsuit joined by 11 states. In fact, these actions are all aimed at data monopoly.
However, with the secure multi-party computation we have built, change has come. The data stays local, and it is the computation that migrates. The data is not exposed during the computation.
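To illustrate the idea that data stays local while computation migrates, here is a minimal sketch of additive secret sharing, one of the standard building blocks of MPC. It is an illustration only, not the protocol Matrix Yuan uses; the names `PRIME`, `share` and `secure_sum`, and the choice of modulus, are assumptions made for this example.

```python
import secrets

# Modulus for the share arithmetic (an illustrative choice, not from the speech).
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split value into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_inputs):
    """Compute the sum of all inputs without any party revealing its own input."""
    n = len(private_inputs)
    # Each party splits its own input locally; row i holds party i's shares.
    all_shares = [share(v, n) for v in private_inputs]
    # Party j receives share j from every party and adds them up locally.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published; together they give the joint result.
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    # Three institutions, each holding a private figure.
    print(secure_sum([120, 45, 300]))
```

Each individual share is a uniformly random number, so a party that sees only some of the shares learns nothing about another party’s input; only the final aggregate is revealed.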
A very simple principle shows that the existing network model is hard to sustain. The total amount of data keeps growing, the network cannot bear the volume and cost of transmitting it at scale, and so the data can only be processed locally.
We cannot emphasize the personal attributes of data in general terms. It is difficult to prove that almost any piece of data belongs entirely to you personally. At this moment only my physical self is on this stage; my digital self is not. While you listen to my report, you may also be looking at WeChat, where one of your IDs is projected. Maybe you are also looking for a place to eat on Dianping, where another of your IDs is projected. In fact, we have already been discretized.
Many data assets are tied to specific scenarios and applications, the public attributes of data are very strong, and most individuals and institutions are unable to provide complete data storage, security protection, and computing capabilities themselves. This means that for quite a long time to come, most people and institutions will still choose data processing agents. The agents will process the data under entrustment, compute on the data in a confidential state according to their promises and contracts, and share out the resulting profit, much like today’s real estate agencies and investment banking services.
The data itself cannot be priced. What can be priced and traded is not the data but its computable part and its computable value. Only the part that an algorithm can compute on is meaningful and can be valued, rated, priced and traded. This matter should not be understood too simply.
Since “new infrastructure” was put forward this year, everyone has been talking about it, but I think it is not that easy, even though many friends have entered the track and battlefield of privacy computing and data fusion this year.
The upper-level laws have not yet been fully established. Everyone has seen the text of the Personal Information Protection Law, as well as the 3-5 upper-level laws such as the Cryptography Law and the Cybersecurity Law, which will stipulate data rights and how data may be used from different angles. Until the upper-level laws settle the legal ownership of data, regulators in every industry remain very cautious, which is why the People’s Bank of China has laid out financial data centers and credit reporting centers, and there are a large number of sandboxes.
Before last year, not many people really understood what we were talking about. This year, a large number of financial institutions have come to us for privacy computing and secure data processing, so compliance is very important. Until data formatting and standardization have been fully worked through, we tend to think that only the financial industry is relatively manageable. The problem that other fields such as healthcare and government affairs often face is that data quality is very poor. The data is hard for algorithms to process and the workload is too large, much like the relationship between “artificial intelligence” and “worker intelligence” that everyone jokes about.
Today, most people are still at the first stage: technical problems. And the difficulty of those technical problems is far beyond everyone’s imagination.
“Whole life cycle” is a very commonly used term. Data must be handled from the point of entry, whether that is IOT or IUOT. In the end, our understanding is that blockchain and privacy computing are infrastructure components at different levels of multi-party computing, which treats data as an asset. In essence it is the financial infrastructure of a new era, and it corresponds almost one-to-one to the deposit, loan and foreign exchange transaction processing of today’s banks.
I am very grateful to have taken part in the cryptography project of the People’s Bank of China’s Digital Banking Institute in 2017. Since then, we have been advancing the concept put forward by the leaders of the central bank, which applies not only to digital currency but also to data. When we analyze the problem, what is the fundamental contradiction in the era of truly new infrastructure? We have been raising this idea since 2017, and it can be summarized as three contradictions:
(1) Individual privacy vs. central supervision.
We believe that the vast majority of institutions and individuals need to complete strong real-name identity registration with institutions licensed by the state and government. Correspondingly, once you have registered there, there is no need to register again with other equivalent commercial entities. Today everyone runs into the same problem: buy a pair of shoes one day, and Douyin will push shoes to you every day afterwards. No one wants this. The solution is to handle it through a licensed business, which we call the “identity registration center of the digital age” and which solves the problem of distributed identity.
For everyone here today, the era of the single ID card has passed. Our IDs are spread across commonly used apps such as Didi, WeChat, and Meituan. These different identities together form a distributed digital version of us.
(2) Transaction privacy vs. registration confirmation.
Over the past four years this has not changed; it has only intensified. Today’s Internet companies have taken not only identity information but also transaction information, since all transactions are hosted on their platforms. By the logic of financial supervision, this should be broken up. Whether it is a technology company or an Internet company, no single company should be allowed to process my data, my identity and my transactions all at once; these should be completely split. Only then can a distributed user profile be realized. Meituan, Didi, Alipay, and WeChat each know only part of you. How, then, can financial institutions and the government obtain a complete user profile? By building a relatively accurate profile of you only when it is needed. Such profiling contributes only the computable part of the data: the counterparty computes on your data, and of course this must be done in the ciphertext state.
(3) Data privacy vs. collaborative computing.
Whether it is the banking, securities and insurance sectors (Yinzhengbao), the government big data center, or the various commissions and bureaus, none of them is willing to hand its data over to a single big data center or a single digital agency. The only way forward is to handle the problem through MPC in the broad sense, that is, multi-party computing.
These three steps are already very clear: from distributed identity, to distributed user profiles, to a distributed credit system.
In the physical world you are known by your name and ID number, but in the digital world no one can identify the whole of you: you are a hybrid, loosely coupled system.
In our communications with local governments since last year, we have recommended that, within today’s data governance structure, the original data trading platforms be decoupled according to the new structure. This is a very big change for urban and national data governance. I believe it is almost the only way.
To be a “super clearing party” for data elements, the first thing needed is a data entry point. The problem of data quality must be solved. Scale, quality, and data labeling are complex issues and not entirely technical ones. Besides obtaining data through Internet of Things and Industrial Internet portals, a great deal of it is handled manually, with a lot of manpower.
The second is the data exchange network. The essence of blockchain is a data exchange network, and blockchain is a public infrastructure. Personally, I do not think that putting all transactions on one chain is the right choice; that would be difficult to handle. Large and small, high-frequency and low-frequency transactions have different performance requirements, and there is no need to put them all on one chain. A global blockchain system similar to Ethereum can, however, become a public infrastructure, just like TCP/IP, on which specific inter-agency trading platforms can grow. Specific scenarios are implemented on specific chains, and business is carried out on business chains, rather than everything being stacked on a single chain.
There is no such thing as a chain that is useful on its own. I particularly agree with Vitalik’s point of view. Over all these years, people have kept asking me: what is the use of blockchain? Why is there a blockchain? What is the killer application? I usually do not answer this question. Blockchain itself is financial infrastructure, an entirely new species. Either you use it or you do not; there is no third option, and there is no need to ask me whether it is useful.
The third is collaborative computing. Once the chain provides a highly interactive platform for payment and settlement, computing will appear. My personal view is that we cannot talk about what trust means in general terms. Many believers say that blockchain is a network of trust, which is not accurate enough. My definition is measurable institutional transaction costs.
Take UnionPay as an example. The institutional transaction cost of the UnionPay network can be understood simply as the license cost plus UnionPay’s income per person. These are all measurable costs. What is trust? Once an institution is licensed and you trust it, how do you measure the cost of that trust? It is the cost and expense of running the network every year.
Three things are listed here: verifiable security, sustainable economic models, and measurable institutional transaction costs. On this basis, we build privacy computing and distributed AI data processing.
There is no doubt that the financial industry is the most mainstream application: it has the most standardized data, the strongest demand, the highest compliance requirements, and it is the richest. I was surprised to find that demand in Internet advertising and marketing is also very intense. In working with a large number of big data companies, I found that even for the most basic label business, among the top ten big data companies the daily label exchange exceeds 10 billion items. Even WeChat Pay handles only 10 trillion transactions. What is the data age? Just look at the quantity: a large enough quantity is proof that the era has arrived. Recently, we have been cooperating with the Ministry of Public Security on anti-fraud work. The challenges that society faces in fighting fraud today have exceeded everyone’s imagination.
We are also cooperating with many hospitals and, frankly speaking, their data standardization is far from complete.
“Data fusion infrastructure” has a basic design concept. When compliance is incomplete, legislation is incomplete, business models are incomplete, and technology is immature, what principles should be used to build the infrastructure? There are four:
(1) New and controllable underlying technology.
For example, if the Ethereum chain cannot support KYC and anti-money laundering, financial institutions will find it difficult to use, and only then will new business chains appear to solve the problem.
(2) Verifiable.
It is often said that scientific questions must be falsifiable, but that statement is not precise enough. To be more precise, every step of the system must be verifiable. As everyone familiar with blockchain knows, it is a verifiable system: the process of generating blocks is verifiable, and the same must hold for future data-based multi-party computation and privacy computing.
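The claim that block generation is verifiable can be illustrated with a minimal hash chain. This is a sketch under simplifying assumptions (no consensus, no signatures, no network); the function names `block_hash`, `append_block` and `verify_chain` and the block layout are hypothetical and chosen only for this example.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, payload):
    """Hash a block's contents deterministically with SHA-256."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block that commits to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append(
        {
            "index": index,
            "prev": prev_hash,
            "payload": payload,
            "hash": block_hash(index, prev_hash, payload),
        }
    )


def verify_chain(chain):
    """Recompute every hash and link; any step that fails invalidates the chain."""
    prev_hash = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev"], block["payload"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


if __name__ == "__main__":
    chain = []
    append_block(chain, {"tx": "A pays B 5"})
    append_block(chain, {"tx": "B pays C 2"})
    print(verify_chain(chain))
    chain[0]["payload"]["tx"] = "A pays B 500"  # tamper with history
    print(verify_chain(chain))
```

Anyone holding a copy of the chain can rerun `verify_chain` independently, which is the sense in which each step is checkable rather than taken on trust.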
(3) Computable.
If data has no computable value, then it has no value. It is like one person digging up gold while another digs up a pile of stones: the two are completely different. A larger amount of data is not necessarily more valuable. The more valuable the computable part is, the better it reflects the original cost and benefit of the data itself.
(4) Measurable.
We must create a new set of pricing and incentive models to deal with this, so we use a lot of new technologies, many of which are not yet mature. Verifiable computation (VC), for example, is very complicated.
Based on these four principles, Matrix Yuan hopes to become part of the technical operation and to connect large amounts of data. However, the network also needs access, transaction, payment, clearing, and the distribution of rights and interests. In our discussions with government departments and national institutions, we came up with a concept: contribution. When various commissions, bureaus, defense industries, and other sectors provide data, how is credit counted? How do I know how much I contributed? This requires benchmarks for allocating and pricing rights and interests. Using blockchain to do contract-based clearing in the ciphertext state is very valuable. But it is wrong to say that privacy computing plus blockchain is by itself enough to do business.
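The speech does not say how contribution is to be scored or priced, so the following is only a sketch of the simplest possible benchmark: splitting revenue in proportion to a contribution score. The function `allocate`, the party names and the scores are all hypothetical.

```python
from fractions import Fraction


def allocate(revenue, contributions):
    """Split revenue among parties in proportion to their contribution scores.

    `contributions` maps a party name to a non-negative score. How such a
    score would be measured is not specified in the speech; it is an input here.
    """
    if any(score < 0 for score in contributions.values()):
        raise ValueError("contribution scores must be non-negative")
    total = sum(Fraction(score) for score in contributions.values())
    if total == 0:
        raise ValueError("at least one contribution must be positive")
    # Exact rational arithmetic avoids rounding drift, so payouts sum to revenue.
    return {
        name: Fraction(revenue) * Fraction(score) / total
        for name, score in contributions.items()
    }


if __name__ == "__main__":
    payouts = allocate(1000, {"bureau_a": 3, "bureau_b": 1, "hospital_c": 1})
    for name, amount in payouts.items():
        print(name, amount)
```

A proportional split is only one candidate rule; the point of the sketch is that once a contribution score is measurable, the allocation itself becomes a small, checkable computation that could be written into a clearing contract.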
The manifestation of the value of data still depends on the emergence of the ecosystem, so there is a relatively long process.
This is the technical architecture of the infrastructure. Hardware and cryptography form the bottom layer, and the consortium chain system sits on top. Recently, in testing and verification by the Electronic Standards Institute, PlatONE passed 93 of the 99 standards; the other 6 are purely business-related, and we did not go into specific business scenarios. The 93 test scenarios are very detailed, and we are grateful for the support of the Standards Institute. But this is far from enough. From practical experience, at least another 100 standards have yet to be tested.
A full technology stack from top to bottom is very expensive. Without enough long-term vision and patience, there will be no good results. This is not something that can be accomplished overnight.
Old friends know that we entered the MPC field in 2017. Thanks to the trust of our colleagues and shareholders, we have kept investing in scientific research on this battlefield. In a recent test conducted by a financial infrastructure institution, our MPC performance far surpassed that of all our peers, by more than 20 times in general and by 500 to 600 times in specific business scenarios, so much so that they could hardly believe it: your advantage is too obvious, is it fake? In fact, there is nothing real or fake about it; you simply have to keep investing. If you just take open source code and modify it, it is difficult to get this effect. We rewrote everything starting from the compiler, and the process was very complicated. We truly believe in the concept of open source, and all of our code and technical architecture are open source.
Recently, I have had very good discussions with Google. One of the most important pieces of work we have done is to completely dismantle and refactor the system so that all AI developers can make full use of the secure multi-party computation system and of privacy computing.
Technological progress follows a process. The first 3-5 years are very slow, but once the turning point is passed, the acceleration exceeds anything I had personally imagined. I hope that sharing our understanding and practice of data fusion infrastructure is useful for your reference.
The company slogan we set in 2016 is “for the flow of data”. I hope this gives you a little inspiration and help, and I hope to develop all kinds of cooperation with you. Thank you!