Filecoin brought blockchain to the field of distributed storage, and Filecash is exploring new techniques to balance security against efficiency.
Written by: Luff
Just as the Bitcoin block-size dispute produced the Bitcoin Cash fork, three years after Filecoin was born, community members holding different ideas launched its first fork, Filecash. Filecoin was the first “crab-eater”, the pioneer that brought blockchain into the field of distributed storage. Through a series of mechanisms, it hopes to design an incentive-compatible network for storing humanity's important data.
However, the complicated system design has slowed Filecoin's progress. Its original goal of “storing valuable data” has also made Filecoin controversial, mainly on two points:
- Technical design: high proof costs create a high barrier to network participation
- Economic model: collateral, penalties and other mechanisms are too harsh on miners
As a fork of Filecoin, Filecash attempts to balance the conflicting interests in the community by improving the design. This article explains, from a technical perspective, how Filecoin operates and what Filecash is exploring.
Filecoin market mechanism
To better understand Filecoin's technical principles, we first briefly introduce its market mechanism. Filecoin has constructed two markets: a data storage market and a data retrieval market. Miners and users take part in both.
Filecoin storage market and retrieval market operation process, source Filecoin white paper
Storage market
The storage market is a market where storage miners and users with data storage needs participate. In the storage market, customers put forward data storage requirements, and storage miners provide their storage space and storage services. A complete storage cycle is as follows:
First, the storage miners provide their own prices and storage requirements to the order book. The order book is public and anyone can view it. The service price of the storage market is determined by the market.
Second, when a customer's bid price matches a storage miner's order, the deal is matched automatically.
Third, the verifier checks whether the miner is storing valid data, using non-interactive zero-knowledge proofs so that privacy is protected.
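The matching step can be illustrated with a minimal Python sketch. It assumes a simple rule (the cheapest ask that satisfies the bid wins); the Ask and Bid types, the price unit and the miner names are hypothetical and are not taken from the Filecoin protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ask:
    """A storage miner's public order (hypothetical fields)."""
    miner: str
    price: float   # assumed unit: tokens per GiB
    free_gib: int  # space the miner is offering


@dataclass
class Bid:
    """A customer's storage request (hypothetical fields)."""
    client: str
    max_price: float
    size_gib: int


def match_bid(order_book: List[Ask], bid: Bid) -> Optional[Ask]:
    """Return the cheapest ask that fits the bid, or None if nothing matches."""
    candidates = [
        ask for ask in order_book
        if ask.price <= bid.max_price and ask.free_gib >= bid.size_gib
    ]
    return min(candidates, key=lambda ask: ask.price) if candidates else None


book = [
    Ask("miner-a", 0.05, 64),
    Ask("miner-b", 0.03, 32),
    Ask("miner-c", 0.02, 8),   # cheapest, but too small for a 16 GiB request
]
deal = match_bid(book, Bid("client-1", max_price=0.04, size_gib=16))
no_deal = match_bid(book, Bid("client-2", max_price=0.01, size_gib=1))
```

Here the 16 GiB request is matched with miner-b, because miner-a is too expensive and miner-c does not have enough space. A real order book would also need to handle deal duration and collateral.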
Retrieval market
The retrieval market is an off-chain market. Users can access the data they need through the retrieval market, and retrieval miners provide retrieval services. Retrieval miners do not participate in the block generation process, and directly obtain corresponding service fees from the client.
A complete retrieval cycle is as follows:
First, the user and the retrieval miner broadcast their bids and asks, and if they find that the orders match, they initiate a transaction in the off-chain order book.
Second, after the transaction is concluded, both parties establish an off-chain payment and data transmission channel to complete the transaction.
Third, after the transaction is completed, the order and transaction are submitted to the blockchain for recording and the transaction result is verified.
How does Filecoin realize such a blockchain-based data market? The consensus mechanism is the key. It determines the basic operation logic of the blockchain and maintains network security.
Filecoin consensus mechanism: Expected Consensus as the core, supplemented by Proof of Replication + Proof of Spacetime
Filecoin adopts a hybrid consensus built on Expected Consensus and supplemented by Proof of Replication and Proof of Spacetime. Expected Consensus determines who produces the TipSet (a set of blocks in the Filecoin network) in each round, while Proof of Replication and Proof of Spacetime keep the network running stably and safely.
Expected Consensus (EC)
Expected Consensus is derived from Proof of Stake (PoS), but the token stake is replaced with storage. Each round elects one or more leader miners to create new blocks, and a miner's probability of winning the election is proportional to its current storage capacity.
In each round, the expected number of elected leaders is e (a constant, such as 5). The elected miners create new blocks and broadcast them to the network. In the Filecoin blockchain, each block height corresponds to a block set (TipSet), and each TipSet contains a variable number of blocks. This chain structure is close to a directed acyclic graph (DAG).
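One rough way to see how a storage-weighted election yields e leaders per round on average: in the sketch below, each miner wins independently with probability e times its share of total storage. This simple Bernoulli model, the function names and the sample power table are assumptions made for illustration; Filecoin's actual election procedure is more involved.

```python
import random

E = 5  # expected number of leaders per round ("e" in the text)


def win_probabilities(power, e=E):
    """Map each miner to its per-round win probability: e * storage share, capped at 1."""
    total = sum(power.values())
    return {miner: min(1.0, e * p / total) for miner, p in power.items()}


def elect_leaders(power, rng, e=E):
    """Run one election round; the winners' blocks would form that round's TipSet."""
    probs = win_probabilities(power, e)
    return [miner for miner, p in probs.items() if rng.random() < p]


# Hypothetical storage power table: 20 miners, none holding more than 1/e of the total.
power = {f"miner-{i}": 10 + i for i in range(20)}
probs = win_probabilities(power)

rng = random.Random(42)  # fixed seed so the run is repeatable
rounds = 2000
average_leaders = sum(len(elect_leaders(power, rng)) for _ in range(rounds)) / rounds
```

Because the probabilities sum to e, the average number of leaders over many rounds stays close to 5, while any single round may elect zero, one or several miners. That variability is why a block height maps to a set of blocks rather than to a single block.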
Filecoin hopes to build an open economic system that everyone can participate in without trust. The first priority is to keep the system running safely and stably and to prevent network attacks. From a storage perspective, the network faces two problems:
- Does the miner really store the copies of the data that the user requested, and can the data be accessed?
- Does the miner keep the user's data, without deleting it, for the whole validity period of the contract?
How can miners be held to these two points? Filecoin's storage proof is the solution. It is composed of two parts: Proof of Replication (PoRep) and Proof of Spacetime (PoSt). Proof of Replication addresses the first problem, while Proof of Spacetime addresses the second.
Proof of Replication
Simply put, Proof of Replication is the miner proving that it really stored the user's data.
In the words of its inventor, Ben Fisch:
“Proof of Replication (PoRep) is an interactive proof system. In this system, the storage provider provides a publicly verifiable proof showing that it has allocated unique space resources to a copy of a data file, and that the stored data is retrievable.
Furthermore, PoRep enables the prover to prove that they are using no less than the minimum space required to store the information, and that they actually use that space to store useful information. At the same time, PoRep allows any stored data to be extracted efficiently.”
During Proof of Replication, the storage miner stores user data in sectors (a sector is 32 GiB or 64 GiB). Once a sector is full, the miner seals it. Sealing is a computationally intensive process that produces a unique identifier for the data. Once the data is sealed, the storage miner generates the proof, compresses it with a zero-knowledge proof, and finally submits the compressed result to the chain as proof that the storage commitment has been fulfilled.
Proof of Replication is completed in four stages:
- Sealing PreCommit phase 1 (P1): PoRep SDR encoding takes place in this phase. It is CPU-bound and single-threaded, and is expected to take several hours. The exact time depends on the size of the sector being sealed and on the specifications of the sealing machine.
- Sealing PreCommit phase 2 (P2): The Merkle tree is generated in this phase using the Poseidon hash algorithm. The process is mainly GPU-bound; without a GPU it can run on the CPU, but much more slowly. With a GPU, this phase is expected to take 45 minutes to 1 hour.
- Sealing Commit phase 1 (C1): This is an intermediate phase that performs the preparation needed to generate the proof. It is CPU-bound and usually completes in tens of seconds.
- Sealing Commit phase 2 (C2): This final sealing phase creates a SNARK, which is used to compress the required proof before it is broadcast to the blockchain. It is a GPU-intensive process and is expected to take 20-30 minutes.
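Adding up the stage estimates gives a feel for how much of the sealing time P1 accounts for. The sketch below is only arithmetic on the figures quoted above; the 4-hour P1 input and the reading of “tens of seconds” as 10 to 90 seconds are assumptions, not measurements.

```python
# Stage estimates in minutes as (low, high).
# P2 and C2 come from the list above; the C1 range is an assumed reading
# of "tens of seconds" (10 to 90 seconds).
STAGES = {
    "P2": (45.0, 60.0),
    "C1": (10 / 60, 90 / 60),
    "C2": (20.0, 30.0),
}


def sealing_minutes(p1_hours):
    """Return (low, high) total sealing time in minutes for a given P1 duration."""
    p1 = p1_hours * 60
    low = p1 + sum(lo for lo, _ in STAGES.values())
    high = p1 + sum(hi for _, hi in STAGES.values())
    return low, high


P1_HOURS = 4  # assumed input: "several hours" taken as 4 hours
low, high = sealing_minutes(P1_HOURS)
p1_share = (P1_HOURS * 60) / high  # fraction of the slowest total spent in P1
```

Under these assumptions P1 takes up more than two thirds of the total, and a slower P1 pushes that share even higher.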
Proof of Spacetime
After Proof of Replication is complete, the storage miner must prove that it continues to store the user's data, which is done through Proof of Spacetime. Proof of Spacetime issues cryptographic challenges to storage miners, and only miners that actually hold the sealed sectors can answer them correctly. Storage miners must answer within strict time limits. Filecoin has two kinds of PoSt challenge, WindowPoSt and WinningPoSt, which are not covered in detail here.
The underlying mechanism of Proof of Spacetime. Source: Filecoin white paper
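The challenge-response idea can be sketched with a plain Merkle tree: the chain keeps only the root of the sealed data, the verifier picks a leaf index, and the miner must return that leaf together with its Merkle path. This is a simplified illustration under stated assumptions (SHA-256, a power-of-two number of leaves, hypothetical function names); it leaves out the SNARK compression and the way challenges are derived in the real protocol.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return all tree levels, from hashed leaves up to [root].

    Assumes len(leaves) is a power of two.
    """
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Miner side: collect the sibling hash at each level for the challenged leaf."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling of the current node
        index //= 2
    return path


def verify(root, leaf, index, path):
    """Verifier side: recompute the root from the leaf and its path."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


sealed = [bytes([i]) * 32 for i in range(8)]  # stand-in for sealed sector data
levels = build_tree(sealed)
root = levels[-1][0]                          # the commitment stored on chain

challenge = 5                                 # index chosen by the verifier
path = prove(levels, challenge)
honest_ok = verify(root, sealed[challenge], challenge, path)
forged_ok = verify(root, b"data the miner no longer has", challenge, path)
```

A miner that has deleted the sealed data cannot produce the challenged leaf, so the recomputed root does not match and the check fails.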
Filecoin security mechanism
The Filecoin white paper states that Proof of Replication can effectively prevent Sybil attacks, generation attacks, and outsourcing attacks. But one question needs to be considered: as a proof algorithm, can Proof of Replication itself be forged? The answer is yes. In fact, any public proof can be forged (attacked). The purpose of the mechanism design is to raise the attacker's cost until an attack costs more than it earns, so that attacking is not worthwhile.
It is not difficult to imagine an attack like this:
The miner produces a replication proof when it stores the user's data and deletes the data once the proof is complete, then produces a replication proof again whenever a Proof of Spacetime is required. In this way, the storage miner does not keep a backup of the user's data, yet can still complete the entire storage proof and collect rewards.
In fact, in Filecoin's storage proof, Proof of Replication and Proof of Spacetime depend on each other. Filecoin defends against this attack by imposing two timing requirements on the proof process: the Proof of Spacetime must be completed within a short time or it is invalid, and if the Proof of Replication cannot be completed within that short time, the attack cannot succeed. The greater the gap between these two times, the higher the security. The Filecoin network's timing requirements are:
- The Proof of Spacetime must be completed within one block time (less than 30 s); this time limit is set by the network;
- The Proof of Replication is designed to take several hours; this time cost comes from the complexity of the algorithm.
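The two timing requirements can be read as a simple inequality: the delete-and-reseal attack only works if the data can be resealed before the Proof of Spacetime deadline. The following sketch expresses that check; the function names are hypothetical and the 4-hour sealing time is an assumed stand-in for “several hours”.

```python
POST_DEADLINE_S = 30  # one block time, as stated above


def reseal_attack_possible(reseal_seconds, deadline_seconds=POST_DEADLINE_S):
    """The attack works only if resealing fits inside the PoSt deadline."""
    return reseal_seconds <= deadline_seconds


def safety_margin(reseal_seconds, deadline_seconds=POST_DEADLINE_S):
    """How many times longer resealing takes than the deadline allows."""
    return reseal_seconds / deadline_seconds


slow_seal = 4 * 3600  # assumed: "several hours" taken as 4 hours
fast_seal = 20        # hypothetical sealing time shorter than the deadline

slow_attack = reseal_attack_possible(slow_seal)
fast_attack = reseal_attack_possible(fast_seal)
margin = safety_margin(slow_seal)
```

With a 4-hour sealing time the margin is 480 times the deadline, so an attacker would need to speed up sealing by more than two orders of magnitude before the attack becomes feasible.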
From a purely security perspective, the longer the Proof of Replication takes, the safer the network is. Its duration can be extended from two angles:
- Algorithm length: an algorithm with more steps takes longer
- Algorithm parallelism: a parallel algorithm can cut computing time by adding resources, so limiting parallelism keeps the time long
Specifically, the SDR algorithm currently used in Filecoin's Proof of Replication is strong in both respects:
- Length is achieved through multi-step computation: for example, 11 layers of labels must be computed in the Proof of Replication. The number of layers is adjustable; the more layers, the more steps and the longer the time.
- Parallelism is removed by a strong dependency between the layers of computation: when labels are calculated, each step depends on the result of the previous step, so the computation time cannot be shortened by adding resources.
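The sequential dependency can be made concrete with a heavily simplified labeling loop. In this sketch each label depends on the previous label in the same layer and on the label at the same position in the previous layer, so no label can be computed before its predecessors. This is an assumption-laden toy, not the real SDR graph: the parent selection, the function name and the input encoding are invented for illustration.

```python
import hashlib


def sdr_labels(replica_id: bytes, nodes: int, layers: int = 11,
               hash_name: str = "sha256"):
    """Compute the final layer of labels for a toy layered, sequential encoding."""
    prev_layer = [b""] * nodes          # layer 0 has no labels yet
    for layer in range(1, layers + 1):
        current = []
        prev_label = b""                # first node of a layer has no predecessor
        for i in range(nodes):
            hasher = hashlib.new(hash_name)
            hasher.update(replica_id)
            hasher.update(layer.to_bytes(4, "big"))
            hasher.update(i.to_bytes(8, "big"))
            hasher.update(prev_label)     # dependency inside the layer
            hasher.update(prev_layer[i])  # dependency on the previous layer
            prev_label = hasher.digest()
            current.append(prev_label)
        prev_layer = current
    return prev_layer


labels_11 = sdr_labels(b"replica-1", nodes=16)            # 11 layers of labels
labels_8 = sdr_labels(b"replica-1", nodes=16, layers=8)   # fewer layers, fewer steps
other_replica = sdr_labels(b"replica-2", nodes=16)
```

The work grows linearly with the number of layers (layers × nodes hash calls), and because every call needs the output of the one before it, extra cores do not shorten the chain.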
However, security often comes at a high cost. While ensuring network security, Filecoin's design brings several problems:
- The high computational cost reduces the economic efficiency of the network. Like PoW, it achieves storage proof only by consuming expensive computing resources, which runs counter to the “useful consensus” advocated by the Filecoin white paper;
- The complex proof process raises the hardware requirements for mining machines, especially the cost of CPU, GPU and RAM.
Comparison of CPU and GPU configurations of mainstream Filecoin miners
Filecash’s technical trade-offs
Although Protocol Labs has been optimizing the Proof of Replication algorithm, for example by planning an upgrade to the NSE algorithm to improve cost and retrieval latency, it cannot solve the efficiency problem in the short term. In the trade-off between security and cost, Filecash chose to reduce security moderately in exchange for a lower barrier to participation.
Filecoin's computing resource consumption is concentrated in the first stage of Proof of Replication (P1), and Filecash's optimizations also focus on this stage:
- Change the P1 core algorithm from SHA256 to SHA512. Because AMD processors support the SHA256 hardware extension, they have a large advantage in Filecoin's P1 stage: P1 may take 30 hours on an Intel processor but only 4 hours on an AMD processor. As a result, many idle Intel machines cannot join the Filecoin network because doing so is not economical. By switching to SHA512, Filecash can accommodate machines with different processors.
- Reduce the number of layers computed in the P1 stage from 11 to 8. Judging from some proof-of-capacity projects, 8 layers already provide sufficient security while greatly improving economic efficiency.
- Change the sector size to 16 GiB to reduce memory usage. Filecoin's current sector sizes are 32 GiB and 64 GiB, which requires the miner's machine to have at least 64 GiB of memory. Ordinary users and home computers cannot meet this requirement and are excluded from the system. With a smaller sector size, more relatively low-end devices (such as home computers) can take part in the early network.
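The three changes can be summarized as a parameter set, and the effect of the layer and sector-size changes on P1 labeling work can be estimated with simple ratios. The sketch below is a rough illustration only: the SealParams type is hypothetical, the work model counts layer passes and ignores the speed difference between SHA256 and SHA512, and Filecoin is represented by its 32 GiB sector size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SealParams:
    """P1-related parameters as described in the text (hypothetical type)."""
    name: str
    hash_name: str
    layers: int
    sector_gib: int


FILECOIN = SealParams("Filecoin", "sha256", layers=11, sector_gib=32)
FILECASH = SealParams("Filecash", "sha512", layers=8, sector_gib=16)


def layer_passes_per_sector(params: SealParams) -> int:
    """Assumed work model: one pass over the whole sector per layer."""
    return params.layers * params.sector_gib


# Work per GiB of stored data scales with the number of layers only.
ratio_per_gib = FILECASH.layers / FILECOIN.layers

# Work needed before a single sector can be committed also scales with sector size.
ratio_per_sector = layer_passes_per_sector(FILECASH) / layer_passes_per_sector(FILECOIN)
```

Under this model Filecash does 8/11 (about 73%) of the labeling passes per GiB, and 4/11 (about 36%) of the passes per sector, which is where the lower hardware threshold comes from.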
The core logic of the Filecash technical solution is to lower the threshold for joining the ecosystem, so that a large number of idle devices and home users can participate in the network and contribute to its consensus. A strong consensus layer protects the ecosystem and attracts more developers and users to it. Only an active ecosystem can give miners a stable mining income, and a stable mining income draws more miners into mining. In this way, mining and the ecosystem form a virtuous circle.
Filecoin's Expected Consensus is equivalent to adding a complex proof process on top of PoS. The network performance seen in the first phase of the Space Race was worrying: transaction fees soared and drifted away from real application scenarios. In response to the insufficient TPS and the inability to review content effectively, Filecash adjusted the consensus mechanism and adopted a DPoS + PoRep + PoSt hybrid consensus. The network carries out a series of economic activities around miners, whose mining work provides consensus and tokens as the underlying support; DPoS nodes act as the core carriers, providing reliable high TPS and network availability. At the same time, Filecash has redesigned and optimized multiple components such as the virtual machine, cross-chain functionality and oracles, so that the Filecash network can interact seamlessly with mature blockchains such as ETH, DOT and BTC, and solve the difficult problem of application interoperability between blockchains.
The new pattern of distributed storage
Filecoin has long been famous, and it is still unclear whether Filecash, as a rising star, can shake its position. However, amid the dispute between the Filecoin team and community miners, calls for a fork have grown louder. It can be expected that with the launch of Filecoin's mainnet, many new faces will emerge in the field of distributed storage and explore paths different from Filecoin's.
Rather than saying that Filecash is challenging Filecoin, it is better to say that Filecash is exploring new possibilities in the field of distributed storage.
After all, what history remembers is not the first person to eat crab, but how delicious the crab was.