Can Ethereum scale 10x without sharding? A brief introduction to the Turbo-Geth client

By migrating the Ethereum implementation to other programming languages component by component, throughput could be increased at least tenfold, even without sharding.

Original title: “Introduction | Turbo-Geth Client: Past and Future”
Written by: Alexey Akhunov
Translation: Ajian

Turbo-Geth started in 2017 purely as a curiosity project (yes, during the crazy congestion caused by CryptoKitties). At first, its goal was to explore alternatives to the trie-based database schema. In March 2018, the project received a small grant ($25,000) from the Ethereum Foundation. In the first and second quarters of 2019, Turbo-Geth was used as a state analysis platform for State Rent research. In the third and fourth quarters of 2019, it was also used for backtesting Stateless Ethereum. By the time of Devcon5, I considered the concept well proven.

At Devcon5, I proposed a one-year freeze on accepting EIPs so that all implementations could converge on a similar data model. But because people were skeptical and the “core developers” group did not share this enthusiasm, my proposal was not accepted.

The skepticism mainly revolved around how to calculate and update the state root hash efficiently. At the EthCC 2020 conference in March 2020, we presented a solution: an additional data structure called “Intermediate Hashes”. Over the following months, we fully implemented it.
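The benefit of keeping intermediate hashes can be illustrated with a minimal sketch. It assumes a plain binary Merkle tree with SHA-256, which is not Turbo-Geth's actual structure (Ethereum uses a hexary Patricia trie with Keccak-256); the class name and layout are hypothetical. The point is only that when every intermediate hash is stored, changing one leaf requires rehashing a single path to the root rather than the whole tree.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class CachedMerkleTree:
    """Binary Merkle tree that stores every intermediate hash.

    Hypothetical illustration only: not the data structure used by
    Turbo-Geth, which works on Ethereum's hexary Patricia trie.
    """

    def __init__(self, leaves):
        n = len(leaves)
        assert n > 0 and n & (n - 1) == 0, "leaf count must be a power of two"
        self.n = n
        # nodes[1] is the root; nodes[n + i] holds the hash of leaf i.
        self.nodes = [b""] * (2 * n)
        for i, leaf in enumerate(leaves):
            self.nodes[n + i] = h(leaf)
        for i in range(n - 1, 0, -1):
            self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])
        self.hash_calls = 0  # hashes computed by update() calls only

    def root(self) -> bytes:
        return self.nodes[1]

    def update(self, index: int, leaf: bytes) -> None:
        """Replace one leaf and rehash only its path to the root."""
        i = self.n + index
        self.nodes[i] = h(leaf)
        self.hash_calls += 1
        i //= 2
        while i >= 1:
            self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])
            self.hash_calls += 1
            i //= 2


leaves = [f"account-{i}".encode() for i in range(1024)]
tree = CachedMerkleTree(leaves)
tree.update(5, b"account-5-changed")

# The incrementally updated root matches a full rebuild.
leaves[5] = b"account-5-changed"
assert tree.root() == CachedMerkleTree(leaves).root()
print(tree.hash_calls)  # 11 hashes, versus 2047 for rebuilding all 1024 leaves
```

The trade-off is extra storage for the cached hashes, which is why the size of this structure matters as much as the speed it buys.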

The idea of staged sync came from measuring per-table write churn. The remedy for churn is to insert data in pre-sorted order. We observed this closely at the end of 2019, but our first experimental implementation only showed a significant performance advantage in February 2020.

Staged sync is a very significant change at the architectural level (although the data model barely changed), and we implemented it between March and July 2020. It is what allowed us to cut sync time dramatically (by a factor of 10).
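Why pre-sorted insertion reduces churn can be seen in a toy model. The sketch below is not a measurement of Turbo-Geth or of any real database: it assumes that adjacent keys live on the same page (`PAGE_SIZE` is a made-up figure) and simply counts how often consecutive writes move to a different page.

```python
import random

PAGE_SIZE = 256  # hypothetical: number of adjacent keys stored on one page


def page_switches(keys):
    """Count how often consecutive writes land on a different page."""
    switches = 0
    previous_page = None
    for key in keys:
        page = key // PAGE_SIZE
        if previous_page is not None and page != previous_page:
            switches += 1
        previous_page = page
    return switches


random.seed(0)
keys = random.sample(range(1_000_000), 10_000)  # keys in arrival order

unsorted_cost = page_switches(keys)
sorted_cost = page_switches(sorted(keys))

# With sorted input every page is visited once, in order.
assert sorted_cost == len({k // PAGE_SIZE for k in keys}) - 1
assert sorted_cost < unsorted_cost
print(unsorted_cost, sorted_cost)
```

In this model, writes in arrival order jump to a new page almost every time, while sorted writes finish each page before moving on. Collecting a batch, sorting it, and only then writing is the pattern this illustrates.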

In August 2020, we discovered a way to reduce the state representation from 50 GB to 10 GB.

In September 2020, we made the “intermediate hashes” structure finer-grained, which made calculating the state root hash 4 times faster (from 200 ms down to 50 ms) and reduced its data size from 7 GB to 2.5 GB.

We are currently developing proper indexing of logs.

So, what does all of this mean?

In fact, this does not mean anything, because the current implementation has not reached the limit of efficiency.

There are also several “unsolved mysteries”:

  1. Merkle proofs of state deep in history cannot be generated efficiently (Merkle proofs for recent history are not a problem). This can be mitigated by introducing snapshots of the intermediate hashes (that data is relatively small).
  2. Some consensus computations do not fit into staged sync. Ideally, the two should be designed together.

Silkworm

The idea of creating a modular Ethereum implementation in C++, licensed under Apache 2.0, dates back to early 2019, when we saw that the “Aleth” project had been basically abandoned.

But that was not the right time.

In May and June 2020, the right time finally arrived. There were 4 major turning points:

  1. We switched from BoltDB to LMDB (a database implemented in C), which makes database compatibility between Turbo-Geth and Silkworm possible.
  2. Staged sync _naturally_ decomposes the implementation into relatively independent components. These components interact mostly through records in the database (or through pages in memory, if the interaction happens within a database transaction). This means that we can create C++ implementations component by component.
  3. Earlier EVM experiments (using the EVMC interface) exposed the huge overhead of a cross-language interface, and the two-way nature of the EVMC interface made it worse.
  4. We felt that we had enough experience to accomplish all this within a predictable time frame (within 1 year, not 5 to 10 years) and with the help of some experts.
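The decomposition described in point 2 can be sketched as follows. This is a minimal illustration, not the real stage list or table layout: the stage names, the table names and the `progress` markers are all hypothetical, and a Python dict stands in for the database. Each stage reads only what an earlier stage has written, so any single stage could be swapped for an implementation in another language as long as the record format stays the same.

```python
def stage_headers(db, network_blocks):
    """Stage 1: store headers; reads the (simulated) network, writes 'headers'."""
    for number in sorted(network_blocks):
        db["headers"][number] = network_blocks[number]["header"]
    db["progress"]["headers"] = max(db["headers"], default=0)


def stage_bodies(db, network_blocks):
    """Stage 2: store bodies for every header already in the database."""
    done = db["progress"].get("bodies", 0)
    for number in range(done + 1, db["progress"]["headers"] + 1):
        db["bodies"][number] = network_blocks[number]["txs"]
    db["progress"]["bodies"] = db["progress"]["headers"]


def stage_execute(db):
    """Stage 3: apply transfers; reads 'bodies' only, never the network."""
    done = db["progress"].get("execute", 0)
    for number in range(done + 1, db["progress"]["bodies"] + 1):
        for sender, recipient, amount in db["bodies"][number]:
            db["state"][sender] = db["state"].get(sender, 0) - amount
            db["state"][recipient] = db["state"].get(recipient, 0) + amount
    db["progress"]["execute"] = db["progress"]["bodies"]


# Two made-up blocks with simple transfers (sender, recipient, amount).
blocks = {
    1: {"header": "h1", "txs": [("alice", "bob", 5)]},
    2: {"header": "h2", "txs": [("bob", "carol", 2)]},
}
db = {"headers": {}, "bodies": {}, "state": {"alice": 10}, "progress": {}}

stage_headers(db, blocks)
stage_bodies(db, blocks)
stage_execute(db)
print(db["state"])
```

Because each stage records how far it has progressed, a stage can be rerun later and will pick up only the blocks it has not yet processed.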

Future

Starting the Silkworm project also opened up our thinking. For example, we can migrate the implementation, component by component, to other programming languages (such as Rust).

I believe that even if Ethereum 1.0 does not introduce sharding, its throughput can be increased at least 10 times. We face three main challenges:

  1. A higher block gas limit makes DoS attacks more likely. Turbo-Geth's safe limit may be 10 times higher than that of other implementations, and Silkworm's may be higher still.
  2. A higher gas limit results in larger blocks (by data volume). This in turn creates two problems:
    • Block propagation. This can be handled through pre-consensus (essentially, transaction latency is sacrificed in exchange for transaction throughput).
    • Block download and storage. This can be solved by using specialized storage networks such as BitTorrent (this work is already in progress).

Source link: github.com