Vitalik Buterin: Technical analysis of specific attributes and trade-offs of sharding

“Sharding” is the key to the future of Ethereum’s scalability, but it is also one of the most misunderstood concepts.

Original title: “Vitalik: Revealing the Advantages of ‘Sharding’ from a Technical Perspective”
Written by: Vitalik Buterin, co-founder of Ethereum
Translation: Eth Chinese Station

Sharding is the future of Ethereum scalability. It is the key to letting the ecosystem handle many thousands of transactions per second, so that most people can afford to use Ethereum. However, sharding is also one of the most misunderstood concepts in the Ethereum ecosystem, and in the broader blockchain ecosystem as well. It refers to a very specific set of ideas with specific properties, yet it is often conflated with techniques whose security properties are much weaker and very different. The purpose of this article is to explain which specific properties sharding provides, how it differs from technologies that are not sharding, and what sacrifices a sharded system has to make to achieve these properties.

_Figure: the Ethereum sharding system. Original image by Hsiao-wei Wang, design by Quantstamp_

The scalability trilemma

The best way to introduce sharding is to describe the problem that gave rise to it: the scalability trilemma.


The scalability trilemma says that there are three properties a blockchain tries to have, and that, if you stick to simple techniques, you can only get two of the three. The three properties are:

  • Scalability: the chain can process and verify more transactions than a single ordinary node (such as a consumer laptop) can.
  • Decentralization: the chain can run without depending on a small group of large centralized actors. This is usually interpreted to mean that there should be no trust, not even an honest-majority assumption, in any set of nodes that you cannot join with just a consumer laptop.
  • Security: the chain can resist a large percentage of participating nodes trying to attack it. Ideally it should resist 50%; anything above 25% is acceptable, but 5% is definitely not enough.

Here are three classes of “easy solutions” that achieve only two of the three properties:

  • Traditional blockchains: including Bitcoin, pre-PoS/pre-sharding Ethereum, Litecoin, and other similar chains. These rely on every participant running a full node that verifies every transaction, so they have decentralization and security, but not scalability.
  • High-TPS chains: including the DPoS family but also many others. These rely on a small number of nodes, often 10-100, maintaining consensus among themselves, with users having to trust a majority of those nodes. By the definitions above, this is scalable and secure, but not decentralized.
  • Multi-chain ecosystems: this refers to “scaling out” by having different applications live on different chains and using cross-chain communication protocols to talk between them. This is decentralized and scalable, but it is not secure, because an attacker only needs to take over the consensus of one of the chains (often less than 1% of the whole ecosystem's nodes) to cause damage there, with ripple effects potentially causing great harm to applications on the other chains.

Sharding is a technique that achieves all three properties at once. A sharded blockchain is:

  • Scalable: it can process far more transactions than a single node.
  • Decentralized: it can survive entirely on consumer laptops, with no dependence on “supernodes” whatsoever.
  • Secure: an attacker cannot target a small part of the system with a small amount of resources; they can only try to take over and dominate the whole thing.

The rest of this article will discuss how sharded blockchains can achieve these advantages.

Sharding through random sampling

The easiest kind of sharding to understand is sharding through random sampling. Compared with the form of sharding being built in the Ethereum ecosystem, randomly sampled sharding has weaker trust properties, but it uses simpler technology.

The core idea is as follows. Suppose you have a PoS blockchain with a very large number of validators, say 10,000, and a very large number of blocks that need to be verified, say 100. No single computer is powerful enough to validate all 100 blocks before the next set of blocks arrives.

To solve this problem, we split up the verification work randomly. We randomly shuffle the validator list, assign the first 100 validators in the shuffled list to verify the first block, the next 100 to verify the second block, and so on. A randomly selected group of validators that is assigned to verify a block (or perform some other task) is called a committee.
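To make the committee-selection idea concrete, here is a minimal Python sketch. It is illustrative only: the seed, the committee sizes, and the use of random.shuffle are stand-ins; the real protocol derives its randomness from RANDAO and uses a different shuffling algorithm.

```python
import random

def assign_committees(validators, num_committees, committee_size, seed):
    """Shuffle the validator list with a shared seed and slice it into committees."""
    rng = random.Random(seed)        # every node derives the same seed from the chain
    shuffled = list(validators)
    rng.shuffle(shuffled)
    return [shuffled[i * committee_size:(i + 1) * committee_size]
            for i in range(num_committees)]

validators = list(range(10_000))     # 10,000 validator indices
committees = assign_committees(validators, num_committees=100,
                               committee_size=100, seed="per-slot randomness")
# committees[0] verifies the first block, committees[1] the second, and so on.
```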


When a validator verifies a block, they publish a signature attesting that they did so. Everyone else, instead of verifying 100 full blocks, only verifies 10,000 signatures, a much smaller amount of work, especially once BLS signature aggregation is applied. Instead of every block being broadcast over the same P2P network, each block is broadcast on a different subnet, and nodes only need to join the subnets for the blocks they are responsible for (or otherwise interested in).

Consider what happens if each node's computing power doubles. Each node can now safely verify twice as many signatures, so the minimum staking deposit can be cut in half to support twice as many validators, giving 200 committees instead of 100. Hence 200 blocks can be verified per slot instead of 100, and each block can also be twice as big. The result is 2x more blocks times 2x more capacity per block: the total chain capacity grows by 4x.

We can describe this in mathematical language. Using big-O notation, let “O(C)” denote the computational capacity of a single node. A traditional blockchain can process blocks of size O(C). A sharded chain as described above can process O(C) blocks in parallel (remember, the overhead for each node to verify each block is O(1), because each node only needs to verify a fixed number of signatures), and each block has O(C) capacity, so the total capacity of the sharded chain is O(C^2). This is why this kind of sharding is called quadratic sharding, and this effect is a key reason why we think that, in the long run, sharding is the best way to scale a blockchain.
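Restating that argument as a purely illustrative formula:

```latex
% Illustrative restatement of the quadratic-sharding argument:
% per-node overhead per block is O(1), so a node can follow O(C) blocks per slot,
% and each of those blocks itself has O(C) capacity.
\text{total capacity}
  \;=\; \underbrace{O(C)}_{\text{parallel blocks per slot}}
  \times \underbrace{O(C)}_{\text{capacity per block}}
  \;=\; O(C^{2})
```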

People often ask: “What is the difference between randomly splitting validators into 100 committees and running 100 separate blockchains?”

There are two key differences:

  • Random sampling prevents the attacker from concentrating their power on one shard. In a multi-chain ecosystem of 100 chains, an attacker with only 0.5% of the total stake can wreak havoc: that is enough to launch a 51% attack on one of the chains. In a sharded chain, the attacker must have close to 30-40% of the total stake to do the same; in other words, the chain shares its security across the shards. Certainly, an attacker could wait to get lucky and randomly end up with 51% of a single committee despite having less than 51% of the total stake, but for attackers well below 51% the difficulty rises exponentially; below roughly 30% it is essentially impossible (see the probability sketch after this list).
  • Tight coupling: if even one shard gets a bad block, the entire chain reorgs to avoid it. There is a social contract (and, as later sections show, technical means of enforcing it) that a chain containing even one bad block in even one shard is unacceptable and must be discarded as soon as it is discovered. Thanks to this mechanism, from the point of view of an application the sharded chain has perfect security: contract A can rely on contract B, because if contract B misbehaves due to an attack on the chain, that entire history is reverted, including the transactions in contract A that were affected by contract B's misbehavior.
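To make the “exponential difficulty” claim concrete, here is a rough probability sketch. It is a simplification: it models committee membership as independent draws (a binomial model), and the committee size of 128 and the simple-majority threshold are hypothetical parameters rather than the protocol's actual values.

```python
from math import comb

def capture_probability(stake_fraction, committee_size, threshold):
    """P(an attacker with `stake_fraction` of stake gets >= `threshold` committee seats)."""
    return sum(comb(committee_size, k)
               * stake_fraction ** k * (1 - stake_fraction) ** (committee_size - k)
               for k in range(threshold, committee_size + 1))

for frac in (0.05, 0.30, 0.40):
    p = capture_probability(frac, committee_size=128, threshold=65)  # > half of 128
    print(f"stake {frac:.0%}: chance of capturing one committee ≈ {p:.2e}")
# The probability collapses rapidly as the attacker's share falls below one half,
# which is why random sampling forces an attacker to go after the whole chain.
```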

Together, these two differences ensure that sharding creates an environment for applications that preserves the key security properties of a single chain, in a way that a multi-chain ecosystem fundamentally cannot.

Improving sharding with better security models

I fully agree with one common refrain in the Bitcoin community: blockchains like Bitcoin (or Ethereum) do not completely rely on an “honest majority” assumption. If a 51% attack is launched on such a chain, the attacker can do some nasty things, such as reverting or censoring transactions, but they cannot insert invalid transactions. And even if they revert or censor, users running regular nodes can easily detect that behavior, so if the community wants to coordinate a fork that strips the attacker of power and fends off the attack, they can do so quickly.

The lack of this extra security is a key weakness of the more centralized high-TPS chains. Such chains do not, and cannot, have a culture of ordinary users running nodes, so the major nodes and ecosystem players can far more easily get together and impose a protocol change that the community strongly dislikes. Even worse, users' nodes would accept the change by default. After some time users would notice, but by then the change would be a fait accompli: the main coordination burden of rejecting it would fall on the users, who would have to make the painful decision to roll back a day or more of activity that everyone else had considered finalized.

Ideally, we want a form of sharding whose validation avoids the 51% trust assumption described above and preserves the strong security that traditional blockchains get from full verification. And this is exactly what much of our research over the past few years has been about.

Scalable verification of computation

We can break the problem of scalable, 51%-attack-resistant validation into two cases:

  • Validating computation: checking that some computation was done correctly, assuming you have all of the inputs to that computation.
  • Validating data availability: checking that the inputs to the computation are themselves stored in some form from which you could download them if you really needed to; this check must be performed without actually downloading all of the inputs, because the data may be too large to download for every block.

Validating a block in a blockchain involves both computation and data availability checking: you need to be convinced that the transactions in the block are valid and that the new state root hash claimed in the block is the correct result of executing those transactions, but you also need to be convinced that enough data from the block was actually published, so that users who download that data can compute the state and continue processing the chain. This second part involves a very subtle but important concept called the data availability problem; more on it below.

Scalably validating computation is relatively easy; there are two families of techniques: fraud proofs and ZK-SNARKs.

_Figure: fraud proofs allow computation to be verified scalably_

Here is a brief description of the two:

  • Fraud proofs are a system in which, to accept the result of a computation, you require someone with a staked deposit to sign a message of the form “I certify that if you perform computation C on input X, you get output Y”. You trust that message by default, but anyone else with a staked deposit gets the chance to challenge it by signing a message saying “I disagree, the output is Z, not Y”. Only when a challenge is raised do all nodes actually run the computation; whichever of the two parties was wrong loses their deposit, and all computations that depended on the incorrect result are redone (a minimal sketch of this game follows the list).
  • ZK-SNARKs are a form of cryptographic proof that directly proves the claim “performing computation C on input X gives output Y”. The proof is cryptographically “sound”: if C applied to X does not equal Y, it is computationally infeasible to produce a valid proof. The proof can also be verified quickly, even if running C itself takes an enormous amount of time. See this article for more of the mathematics behind ZK-SNARKs.
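Here is the minimal sketch of the fraud-proof challenge game promised above. Everything in it is hypothetical and greatly simplified: compute() stands in for the computation C, deposits are represented only by the returned messages, and real systems add bonds, challenge windows, and interactive bisection.

```python
from dataclasses import dataclass

def compute(x):
    """Stand-in for the computation C being claimed about."""
    return x * 2

@dataclass
class Claim:
    inputs: int
    claimed_output: int
    submitter: str

def challenge(claim: Claim, challenger: str, challenger_output: int):
    """Only when someone challenges does the chain actually re-run the computation."""
    actual = compute(claim.inputs)
    if actual == claim.claimed_output:
        return f"challenge failed: {challenger} loses their deposit"
    if actual == challenger_output:
        return f"claim was fraudulent: {claim.submitter} loses their deposit"
    return "both parties were wrong; both deposits are slashed"

claim = Claim(inputs=21, claimed_output=1000, submitter="alice")
print(challenge(claim, challenger="bob", challenger_output=42))
# -> claim was fraudulent: alice loses their deposit
```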

Fraud-proof-based verification of computation is scalable because, in the “normal case”, running a complex computation is replaced by verifying a single signature. There is the exceptional case where a challenge forces the computation to be verified on-chain, but the exceptional case is very rare, because triggering it is very expensive: either the original claimer or the challenger loses a large deposit. ZK-SNARKs are conceptually simpler, they simply replace the computation with a much cheaper proof verification, but the mathematics behind them is considerably more complex.

There is also a semi-scalable class of systems that verify computation scalably while still requiring every node to verify all of the data. Such systems can be made quite efficient by using a set of compression tricks to replace most of the data with computation. This is the realm of rollups.

Scalable verification of data availability is harder

Fraud proofs cannot be used to verify data availability. Fraud proofs for computation rely on the fact that the inputs to the computation are published on-chain the moment the original claim is submitted, so if someone raises a challenge, the challenge is executed in exactly the same “environment” as the original claim. For data availability checking this does not work, because there is far too much data to publish it all on-chain. So the key problem for data availability is that a fraud-proof scheme breaks down: someone could claim “data X is available” without publishing it, wait to be challenged, and only then publish the data, making the rest of the network believe that the challenger was in the wrong.

This is captured well by the “fisherman's dilemma”:


The core of the fisherman's dilemma is that there are two possible worlds: in one, V1 is a malicious publisher and V2 is an honest challenger; in the other, V1 is an honest publisher and V2 is a malicious challenger. Anyone who did not personally try to download that particular piece of data at that exact time cannot tell the two worlds apart. And in a scalable decentralized blockchain, each individual node can only expect to download a small portion of the data, so only a small portion of nodes would see anything beyond the bare fact that there is a disagreement.

Since it is impossible to tell which party is right, a workable fraud-proof scheme for data availability is impossible.

People often ask: “What if some data is unavailable? A ZK-SNARK can guarantee that the data is valid, isn't that enough?”

Unfortunately, validity alone is not enough to keep a blockchain running. The reason is that if the chain is valid but all of the data is unavailable, users have no way of updating the data they need to generate proofs that future blocks are valid. An attacker who can produce a block that is valid but whose data is unavailable can effectively stall the chain. Someone could even withhold a specific user's account data until the user pays a ransom, so this is not purely a liveness problem.

There are some strong information-theoretic arguments that this problem is fundamental and that there is no clever trick (such as a cryptographic accumulator) that gets around it. See this article for details.

So, how do you check that 1 MB of data is available without actually downloading it? That sounds impossible!

The key is a technology called data availability sampling. It works as follows:

  • Using a tool called erasure coding, a piece of data with N chunks is expanded into a piece of data with 2N chunks, such that any N of those chunks are enough to recover the entire data.
  • To check availability, a user does not download all of the data; instead, they pick a constant number of random positions in the block (for example, 30) and accept the block only when the chunks at all of their chosen positions are found.
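Here is a minimal sketch of what a sampling client does, using the 30-sample constant from the text. The request_chunk callable is a hypothetical network lookup; real clients query P2P subnets and verify each returned chunk against a commitment.

```python
import random

NUM_SAMPLES = 30  # constant number of random positions checked per block

def sample_availability(total_chunks, request_chunk):
    """Accept the block only if every randomly chosen chunk position can be fetched."""
    positions = random.sample(range(total_chunks), NUM_SAMPLES)
    return all(request_chunk(i) is not None for i in positions)

# Toy usage: a dishonest publisher withholds half of the 512 extended chunks.
published = {i: b"chunk-data" for i in range(512) if i % 2 == 0}
print(sample_availability(512, published.get))  # almost certainly False
# With less than half of the chunks available, each sample fails with probability > 1/2,
# so the chance that all 30 samples succeed is below (1/2)**30, roughly 1e-9.
```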


Erasure coding turns the problem from “checking 100% data availability” (every single chunk is available) into “checking 50% data availability” (at least half of the chunks are available). Random sampling solves the 50% problem: if less than 50% of the data is available, at least one of a client's checks will almost certainly fail, and if at least 50% of the data is available, then, while some nodes may fail to recognize the block as available, it takes only a single honest node to run the erasure-code reconstruction procedure and bring back the remaining 50% of the block. So instead of downloading 1 MB to check the availability of a 1 MB block, you only need to download a few kilobytes, which makes it feasible to check data availability on every block. See this article on how this checking can be implemented efficiently with P2P subnets.
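As a toy illustration of the erasure-coding step, the sketch below treats N chunks as evaluations of a polynomial over a prime field, extends them to 2N evaluations, and recovers the original data from any N of them. This is a naive Lagrange-interpolation sketch with an arbitrarily chosen prime; production systems use FFT-friendly fields and the commitment schemes discussed next.

```python
P = 2**31 - 1  # a prime modulus chosen purely for illustration

def interpolate(points, x):
    """Evaluate at x the unique degree < len(points) polynomial through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def extend(data):
    """Extend N chunks to 2N chunks: chunk i is the data polynomial evaluated at i."""
    n = len(data)
    points = list(enumerate(data))
    return data + [interpolate(points, x) for x in range(n, 2 * n)]

def recover(known_chunks, n):
    """Recover the original N chunks from any N known (index, value) pairs."""
    pts = known_chunks[:n]
    return [interpolate(pts, x) for x in range(n)]

data = [12, 34, 56, 78]                      # N = 4 original chunks
coded = extend(data)                         # 2N = 8 chunks get published
sample = [(1, coded[1]), (3, coded[3]), (5, coded[5]), (7, coded[7])]
assert recover(sample, 4) == data            # any 4 of the 8 chunks suffice
```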

A ZK-SNARK can be used to verify that the erasure coding of a piece of data was done correctly, and Merkle branches can then be used to verify individual chunks. Alternatively, one can use polynomial commitments (for example Kate, also known as KZG, commitments), which in essence perform the erasure coding, the proofs of individual elements, and the verification of correctness all within one simple component; this is what Ethereum sharding uses.

Recap: how do we make sure everything is correct?

Suppose you have 100 blocks and you want to efficiently verify the correctness of all of them without relying on committees. To do this, we need the following:

Each client performs data availability sampling on every block, verifying that the block's data is available while downloading only a few kilobytes per block, even if the block as a whole is a megabyte or larger in size. A client only accepts a block once all of its availability sampling requests have been correctly answered.

Once data availability has been verified, verifying correctness becomes easier. There are two techniques:

  • We can use fraud proofs: a few participants with staked deposits sign off on each block's correctness. Other nodes, called challengers or fishermen, randomly check blocks and attempt to process them in full. Because data availability has already been checked, it is always possible to download the data and fully process any particular block. If a node finds an invalid block, it posts a challenge that everyone can verify. If the block turns out to be bad, then that block and all blocks that build on it need to be recomputed.
  • We can use ZK-SNARKs: each block comes with a ZK-SNARK proving its correctness.

In either case, each client only needs to do a small amount of verification work per block, no matter how big the block is. In the fraud-proof case, blocks occasionally need to be fully verified on-chain, but this rarely happens because even launching a single challenge is very expensive.

And that's it! In the case of Ethereum sharding, the near-term plan is for sharded blocks to contain data only; that is, the shards are purely a “data availability engine”, and it is the job of Layer-2 rollups to use that secure data space, together with fraud proofs or ZK-SNARKs, to deliver high transaction throughput while preserving security. However, it is entirely possible to build such a system in-protocol to provide “native” high-throughput execution as well.

What are the key characteristics of the sharding system? What are the trade-offs?

The main goal of sharding is to inherit the most important security attributes of traditional non-sharded blockchains as much as possible, without requiring every node to verify every transaction.

Sharding basically achieves this. In a traditional (non-sharded) blockchain:

  • Invalid blocks cannot get into the chain, because validating nodes detect that they are invalid and ignore them.
  • Blocks with unavailable data cannot get into the chain, because validating nodes fail to download the data and ignore the blocks.

In a sharded blockchain with strong security features:

  • Invalid blocks cannot get into the chain, because either a fraud proof quickly catches the block, informs the whole network of its incorrectness, and heavily penalizes its creator, or its correctness is verified with a ZK-SNARK, and it is impossible to produce a valid ZK-SNARK for an invalid block.
  • Blocks with unavailable data cannot get into the chain, because if less than 50% of a block's data is available, at least one data availability sampling check will almost certainly fail for each client, causing the client to reject the block, and if at least 50% of the data is available, then the entire block is effectively available, because it takes only a single honest node to reconstruct the rest.

Traditional high-TPS chains, having no sharding, cannot achieve these properties. Multi-chain ecosystems cannot avoid the problem that an attacker who picks a single chain to attack can easily take control of it; the chains could share security, but if that sharing is weak, the result is no better than a traditional high-TPS chain and inherits all of its drawbacks, and if it is strong, the shared security is simply a more complicated implementation of the sharding techniques described above.

Sidechains are highly implementation-dependent: they are typically vulnerable to the weaknesses of traditional high-TPS chains if they share miners or validators, and to the weaknesses of multi-chain ecosystems if they do not. Sharded chains avoid both sets of problems.

However, there are some chinks in the sharded system's armor. Notably:

  • Sharded chains that rely only on committees are vulnerable to adaptive adversaries and offer weaker accountability. That is, if an attacker can hack into (or simply shut down) any set of nodes of their choosing in real time, they only need to attack a small number of nodes to break a single committee. Furthermore, whether an attacker is adaptive or simply holds 50% of the total stake, if they break a single committee, only a small number of nodes (those in the committee) can be publicly identified as having participated in the attack, so only a small amount of stake can be penalized. This is another key reason why data availability sampling, combined with fraud proofs or ZK-SNARKs, is an important complement to random sampling.
  • Data availability sampling is only secure if enough clients are online that, between them, they make enough sampling requests for the responses to overlap and cover at least 50% of each block. In practice this means a few hundred clients must be online, and the higher the ratio of system capacity to single-node capacity, the more clients are needed (see the coverage sketch after this list). This is a few-of-N trust model, generally quite trustworthy, but certainly not as robust as the 0-of-N trust that nodes in non-sharded chains enjoy for availability.
  • If a sharded chain relies on fraud proofs, it relies on timing assumptions: if the network is too slow, nodes might accept a block as finalized before the fraud proof arrives showing that it is bad. Fortunately, if you strictly follow the rule that all invalid blocks are reverted once the invalidity is discovered, this threshold becomes a user-set parameter: each user chooses how long to wait before considering a block final, and users who are unwilling to wait long enough take on some risk, while more cautious users are safer. Even so, this weakens the user experience. Verifying validity with ZK-SNARKs solves the problem.
  • There is a much larger amount of raw data to push around, which increases the risk of failure under extreme network conditions. Small amounts of data are easier to transmit than large amounts, and easier to hide safely if a powerful government attempts to censor the chain. Blockchain explorers that want to hold the entire chain also need to store more data.
  • Sharded chains depend on sharded P2P networks, and each individual P2P “subnet” is easier to attack because it has fewer nodes. The subnet model used for data availability sampling mitigates this because there is some redundancy between subnets, but a risk remains.
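To illustrate the “enough online clients” point from the second bullet above, here is a rough Monte Carlo sketch. The chunk count, sample count, and client counts are toy parameters, and the model ignores adversarial withholding strategies; it only estimates how many clients are needed before their combined samples cover half of an extended block.

```python
import random

def coverage_probability(num_clients, samples_per_client, total_chunks, trials=500):
    """Estimate P(the union of all clients' samples covers >= half of the chunks)."""
    needed = total_chunks // 2
    hits = 0
    for _ in range(trials):
        seen = set()
        for _ in range(num_clients):
            seen.update(random.sample(range(total_chunks), samples_per_client))
        hits += len(seen) >= needed
    return hits / trials

for clients in (10, 50, 200):
    p = coverage_probability(clients, samples_per_client=30, total_chunks=4096)
    print(f"{clients} clients: coverage probability ≈ {p:.2f}")
# In this toy setting a handful of clients covers only a small fraction of the block;
# well over a hundred are needed before the union reliably reaches the 50% threshold,
# consistent with the "hundreds of clients" requirement described above.
```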

These are genuine concerns, although in our view they are far outweighed by the reduction in user-level centralization that comes from letting more applications run on-chain instead of on centralized Layer-2 services. That said, these concerns, especially the last two, are in practice the real limits on increasing a sharded chain's throughput beyond a certain point. There is a limit to the quadraticity of quadratic sharding.

Incidentally, these safety risks, which grow as throughput becomes too high, are also the main reason the effort to push toward super-quadratic sharding has largely been abandoned; keeping quadratic sharding merely quadratic looks like the happy medium.

What about centralized block production with sharded verification?

An alternative to sharding that is often proposed is to use a structure like that of a centralized high-TPS chain, but add data availability sampling and sharded verification of validity and availability on top.

This would improve on today's centralized high-TPS blockchains, but it is still far weaker than a sharded system. A few reasons why:

  • In a high-TPS chain, it is much harder to detect censorship by block producers.

Censorship detection requires either (i) being able to see every transaction, so you can verify that no transaction that clearly deserves inclusion has inexplicably failed to get in, or (ii) a 1-of-N trust model among block producers, so you can verify that blocks are not being kept off the chain. In a centralized high-TPS chain, (i) is impossible, and (ii) is harder, because the node count is small and even a 1-of-N trust model is more easily broken; and if the chain's block time is too fast for data availability sampling (as in most centralized high-TPS chains), it is very hard to prove that a node's blocks are not being rejected simply because they are published too slowly.

  • If most block producers and ecosystem members try to force through an unpopular protocol change, users' clients will certainly detect it, but it is much harder for the community to rebel and fork away, because they would need to spin up a new set of very expensive high-throughput nodes to maintain a chain that keeps the old rules.

  • Centralized infrastructure makes it easier for external attackers to censor. Block-producing nodes with very high transaction throughput are easy to detect and easy to shut down, and it is politically and logistically much easier to crack down on dedicated high-performance computers than on individual users' laptops.

  • There is much stronger pressure for high-performance computing to move into centralized cloud services, which raises the risk that the entire chain ends up running inside the cloud services of one to three companies, and hence the risk that the chain goes down because many block producers fail simultaneously. A sharded chain whose validators all run on their own hardware is far less vulnerable to this.

Properly sharded systems are better suited as a base layer. Given a sharded base layer, you can always build a centrally-produced system (for example, a high-throughput domain with synchronous composability for DeFi) on top of it by constructing it as a rollup. But if the base layer relies on centralized block production, you cannot build a more decentralized Layer 2 on top of it.
