The distributed file system will give birth to brand-new Internet applications, rather than merely iterating on traditional Internet technology.
Original title: “Guosheng Blockchain | New Blockchain Infrastructure (3): What Does Distributed Storage Bring to the Internet?”
Written by: Song Jiaji, Ren Heyi
The outlook for distributed storage: distributed storage will bring change and room for innovation to Internet infrastructure and business models, opening up new storage application markets. Distributed storage draws on resources and market incentives different from those of traditional centralized storage; it can solve the security, timeliness, and cost problems of the centralized Internet architecture and will drive change in Internet infrastructure. It also creates a foundation for individual nodes to join the market for exchange, allowing the value of data content to be mined in depth and opening up new storage capacity and application markets. Distributed storage still faces technical bottlenecks such as I/O performance, data value layering, and application service quality, and in practical applications a degree of centralized organization must be introduced to compensate.
The distributed storage system represented by the IPFS protocol brings new ideas for storage and will become next-generation Internet infrastructure. The IPFS protocol is a file storage and content distribution network protocol that integrates several successful distributed-system and blockchain technologies to provide users with uniformly addressable data storage. It is essentially a P2P distributed storage system: anyone can act as a server storing files, and file resources on the network are accessed through a unique code generated from the file's content. Combined with decentralized blockchain technology, distributed storage systems can solve the data security, collaboration timeliness, and storage and bandwidth cost problems of the centralized Internet architecture, bringing change to Internet infrastructure.
Distributed storage will fully stimulate the market value of personal storage resources and content contributions, and innovate Internet business models. The explosive growth of global data volume is driving rapid development of the cloud storage market, and edge cloud computing and small data centers have become industry trends. Distributed storage is expected to open up the personal cloud storage market first: individuals can put idle storage resources into distributed storage systems for market exchange, and can securely publish, exchange, and share value on the Internet. Distributed storage drives resource allocation in the personal storage market, something impossible under the centralized cloud model controlled by traditional Internet giants.
Distributed storage is steadily converging with traditional storage; its current technical bottlenecks must be offset by introducing centralized organizations. Existing storage solutions usually combine distributed technology with traditional approaches: on the one hand, data is backed up and stored in a distributed manner, bringing it closer to the edge while guarding against physical damage and tampering; on the other hand, a degree of centralized storage and management reduces operation and maintenance costs and improves service quality. Distributed storage still faces several technical bottlenecks. First, it cannot yet stratify data by value, making effective incentives difficult; a combination of underlying architecture and application-layer strategies should be considered. Second, there is ample room for optimization from code implementation to the protocol layer, and systems are constrained by network scale, leading to I/O performance problems. Third, users whose stored data is more valuable must bear greater service-quality risk yet show weak willingness to pay, which calls for application-layer solutions. In short, given operation and maintenance costs, service quality, and regulatory considerations, future distributed storage systems will need to introduce centralized organizations to offset operating costs.
Investment advice: among current A-share listed companies, storage-related names stand to benefit in the long term from the changes distributed storage brings, including Zhaoyi Innovation, Beijing Junzheng, Tongyou Technology, Netac Technology, Ziguang Guowei, and Yangtze River Storage; see our in-depth reports on individual stocks for details.
Risk warning: The distributed storage business model is not as expected; the development of distributed storage technology does not meet expectations.
Core recommendation logic
A new storage application market will be opened up on the basis of distributed storage. Distributed storage draws on resources and market incentives different from those of traditional centralized storage. It not only makes full use of distributed node resources, but also creates a foundation for the content contributed by individual nodes to be exchanged in the market, allowing the value of data content to be mined in depth and new application markets to be developed, which is impossible when traditional Internet companies control data on centralized cloud platforms. At the same time, distributed and centralized storage will continue to merge, changing the existing Internet architecture and business models.
Our view that is different from the market
The market underestimates the room for change and innovation that distributed storage brings to Internet infrastructure and business models. The market usually regards distributed storage as just a new technology, overlooking the potential it unlocks for personal storage resources, the mining of user-contributed content value, and market exchange. On an Internet built on the distributed file system, personal storage resources can be put into the market for exchange, and the content users contribute gains a platform for market exchange on the basis of data ownership confirmation and security. The distributed file system will therefore give birth to brand-new Internet applications, rather than mere iterations of traditional Internet technology.
Distributed storage will become the next generation Internet infrastructure
At present, the Internet connects a vast number of computer (and smart mobile) terminals, allowing users to access and store massive amounts of data on other terminals. Data transmission and access are realized through Internet protocols represented by HTTP (Hypertext Transfer Protocol). Data is stored centrally and addressed by the IP address (or domain name) of the server terminal; the specific storage server node acts like a centralized warehouse that must bear enormous traffic and data-transmission pressure. Could data files be distributed across different server nodes on the network, revolutionizing Internet infrastructure?
Distributed storage protocols like IPFS have gradually emerged. As a supplement to HTTP, a global, point-to-point distributed version of the file system can be created, connecting all computing devices under the same file system. In IPFS, users look for content stored somewhere on the network (scattered across different server nodes) rather than at a particular address; they need only verify the hash of the content, yielding faster, safer, more robust, and more durable web pages.
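The core idea, content addressing, can be sketched in a few lines of Python. This is an illustrative simplification, not the real IPFS CID algorithm, which wraps the digest in a multihash and base-encodes it:

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an address from the content itself (illustrative sketch;
    real IPFS CIDs use multihash encoding, not a bare SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()

page = b"<html>hello</html>"
addr = content_address(page)

# The same content always yields the same address, no matter which
# server stores it, and the address doubles as an integrity check:
assert content_address(page) == addr
assert content_address(b"<html>tampered</html>") != addr
```

Because the address is derived from the bytes themselves, any node holding the content can serve it, and the requester can verify what it received without trusting the sender.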
We will explore how distributed storage will bring about changes in the Internet infrastructure and what new application scenarios and markets will be created.
Distributed storage represented by the IPFS protocol brings new ideas
The IPFS protocol is a file storage and content distribution network protocol that combines ideas from several successful distributed systems with blockchain to provide users with uniformly addressable data storage. IPFS (InterPlanetary File System) was proposed by Protocol Labs; the name literally means an "interplanetary file system." It is essentially a P2P distributed storage system that connects all computing devices under the same file system, with the goal of supplementing or even replacing the hypertext transfer protocol HTTP. Unlike existing web protocols, a file resource stored on the IPFS network is accessed not through a domain-name-based address but through a unique code generated from the file's content. There is no need to verify the sender's identity, only the hash of the content, which makes web pages faster and more secure. A blockchain runs on the IPFS network as a hash table for storing Internet files; each network access queries the chain for the address of the content (file). The greatest feature of the IPFS protocol is its system coupling and comprehensive design: the distributed technologies it integrates include the BitTorrent protocol, the version control system Git, the Merkle DAG, the distributed hash table (DHT), and the self-certifying file system (SFS). In the IPFS system, therefore, everyone can serve as a file storage server.
The IPFS protocol draws on the many advantages of the BitTorrent protocol and innovates on them to create a persistent, distributed network transmission protocol for storing and sharing files. BitTorrent (BT for short) is a widely used content distribution protocol whose hallmark is making full use of users' upload bandwidth, so that the more users download, the faster the download becomes. Under the FTP and HTTP protocols of centralized storage, each user downloads the required files independently, with no interaction between users. With HTTP, when too many users access and download files simultaneously, the limits of server processing capacity and bandwidth cause download speeds to drop sharply, and some users may even be unable to reach the server. Under the BT protocol, the distributor or file holder sends the file to one user, that user forwards it to others, and users forward the file pieces they hold to one another until every user's download is complete. This lets the download server handle multiple requests for large files simultaneously without consuming large amounts of bandwidth, so it is often used to release large documents and free software to reduce the burden on servers.
The IPFS team innovated on BitTorrent, adding a credit and billing system, the BitSwap protocol, to encourage each node to share data. Users who share data in BitSwap gain credit, while receiving data from other nodes reduces credit. A user who only retrieves data without sharing will see their credit score fall until other nodes ignore them.
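The incentive logic can be sketched as a toy credit ledger. This is a deliberately simplified model of the idea; the real BitSwap protocol accounts for bytes sent and received per peer, not unit "credits":

```python
class Node:
    """Toy sketch of a BitSwap-style credit ledger (illustrative only)."""

    def __init__(self, name: str):
        self.name = name
        self.credit = {}  # peer name -> credit score

    def send_block(self, peer: "Node") -> None:
        # Sharing data raises our credit in the receiving peer's ledger.
        peer.credit[self.name] = peer.credit.get(self.name, 0) + 1

    def request_block(self, peer: "Node"):
        # Peers ignore requesters with no sharing record (free-riders).
        score = peer.credit.get(self.name, 0)
        if score <= 0:
            return None
        peer.credit[self.name] = score - 1
        return "block"

alice, bob = Node("alice"), Node("bob")
alice.send_block(bob)     # alice earns credit with bob
alice.request_block(bob)  # succeeds: credit is spent
alice.request_block(bob)  # ignored: credit exhausted
```

The design choice is that reciprocity is enforced locally by each peer's own ledger, with no central accounting server.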
Similar to the seven-layer protocol model of the Internet, the IPFS architecture is divided into eight sub-protocol stacks. As a distributed storage protocol, IPFS's core functions include multi-user collaboration on file content, version tracing, tamper resistance, the discreteness, scalability, and good fault tolerance brought by DHT management, and a file naming system based on IPNS.
For content versioning, IPFS uses the distributed version control system Git, which supports multi-user collaboration and records each update under a distinct version number; if a problem occurs, a file can be traced back to any previous version. Local and centralized version control systems both rely on a single server to hold all revisions of every file, and a server failure risks losing all data. Git is a distributed version control system (DVCS): besides the latest version of the files, each client fully mirrors the code repository and its history, so any collaborating server can be restored from any local repository. Git can also compare the details of file changes to identify who changed what, making it quick and accurate to pinpoint the cause of a problem. Furthermore, many DVCS setups can interact with several different remote repositories, so users can collaborate with people in different working groups on the same project and set up different collaboration workflows as needed, which is impossible in a centralized system.
The IPFS team adapted Git's data structure, deriving the Merkle DAG from the Merkle tree; it provides three functions: content addressing, tamper resistance, and deduplication. IPFS divides a file into data blocks no larger than 256 kB, each with a unique hash value, and builds a Merkle DAG to organize all the file fragments. The Merkle DAG is the core data structure implementing the versioned file system. It has fewer restrictions than a Merkle tree but retains its two essentials: 1) the hash of a parent node is determined by the hashes of its child nodes, i.e., the concatenated child hashes are hashed again; 2) the parent node contains information pointing to its children. Any change to a lower-level node changes the hash of the node above it, and ultimately the hash of the root node. This yields the Merkle DAG's three functions: 1) content addressing: a multihash uniquely identifies the content of a data block; 2) tamper resistance: the recipient needs only the hash values along the Merkle path to check whether the data has been tampered with; 3) deduplication: data blocks with identical content have identical hashes, so duplicates can be removed to save storage space.
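The chunk-and-hash scheme above can be sketched in Python. This is a simplified illustration with a single flat parent node; real IPFS uses multihash-encoded CIDs and a richer DAG layout:

```python
import hashlib

CHUNK = 256 * 1024  # IPFS splits files into blocks of at most 256 kB

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def chunk_file(data: bytes):
    """Split data into content-addressed (hash, block) pairs."""
    blocks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    return [(h(b), b) for b in blocks]

def root_hash(block_hashes):
    """Parent hash = hash of the concatenated child hashes, so any
    change in a child propagates up to the root (tamper evidence)."""
    return h("".join(block_hashes).encode())

data = b"x" * (CHUNK + 10)  # a file spanning two blocks
blocks = chunk_file(data)
root = root_hash([bh for bh, _ in blocks])

# Flipping a single byte changes a leaf hash, and hence the root:
tampered = chunk_file(b"y" + data[1:])
assert root != root_hash([bh for bh, _ in tampered])
```

Deduplication follows for free: two files sharing an identical 256 kB block produce the same block hash, so the block only needs to be stored once.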
IPFS's routing function uses a distributed hash table (DHT) to help client nodes quickly locate the node holding the required data, with discreteness, scalability, and good fault tolerance. A DHT provides query services through stored key-value pairs: key-value pairs are stored in the DHT, nodes can retrieve the value for a given key, and the key-value mapping is maintained jointly by all nodes in the network. With no server, each node is responsible for a small part of routing and data storage, realizing addressing and storage for the whole DHT network. A node joining or leaving has little impact on the network, so a DHT can scale to very large numbers of nodes (tens of millions). A DHT has the following properties: 1) discreteness: the nodes making up the system are equal, with no central coordinating authority; 2) scalability: the system works efficiently no matter how many nodes it has; 3) fault tolerance: nodes constantly join and leave without affecting the operation of the whole system.
IPNS is the file naming system of IPFS. Like a URL in the HTTP system, the user need only look up the file name when searching for a file, unaffected by changes to the file's content. A file's hash value in IPFS depends entirely on its content, which is not only hard to remember but also changes whenever the content is modified; every update would require updating the referenced hash, which is very inconvenient. To allow content to change without breaking its link, the IPFS team adopted a naming system that tracks hash updates: the InterPlanetary Name System (IPNS). IPNS is a decentralized naming system that uses hash-like addresses to point safely to mutable content. Each file can be collaboratively given a readable name and found by search. The self-certifying file system SFS names files, while IPNS solves the distribution problem, sparing users from typing hash values to access files and building a bridge between the existing Internet and the IPFS system.
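The indirection IPNS provides, a stable name resolving to the latest content hash, can be sketched as follows. This is a hypothetical registry for illustration; real IPNS records are keyed by the publisher's public key and signed:

```python
import hashlib

def content_address(data: bytes) -> str:
    """Immutable address: changes whenever the content changes."""
    return hashlib.sha256(data).hexdigest()

class ToyNameSystem:
    """IPNS-style sketch: a stable name maps to the *latest*
    content hash, so links survive content updates."""

    def __init__(self):
        self.records = {}  # name -> current content hash

    def publish(self, name: str, data: bytes) -> str:
        cid = content_address(data)
        self.records[name] = cid
        return cid

    def resolve(self, name: str) -> str:
        return self.records[name]

ns = ToyNameSystem()
v1 = ns.publish("my-site", b"hello v1")
v2 = ns.publish("my-site", b"hello v2")  # content hash changed...
assert ns.resolve("my-site") == v2       # ...but the name still resolves
assert v1 != v2
```

The mutable layer lives only in the name registry; the content addresses themselves stay immutable and verifiable.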
Simply put, files stored under the IPFS protocol are broken into many verifiable fragments (data uniquely identified by hash-value encoding) distributed across the network; visitors locate these fragments through their content encoding and download them. Because storage is distributed (the same content may be held by multiple servers), not every node server needs to be online. In this way, IPFS aims to create a persistent, distributed protocol for storing, sharing, and transmitting files. The contrast with HTTP, representative of traditional centralized storage, is stark: HTTP files are stored centrally and accessed through the file's domain name, and the domain's file server must stay online or the file becomes inaccessible.
Distributed storage will bring about changes in Internet infrastructure
With the development of the Internet, communications, artificial intelligence, the Internet of Things, cloud computing/edge computing, and other technologies, everything can be recorded and expressed as data, and data has evolved from single, internal small data into diverse, dynamic big data. According to IDC predictions, the global datasphere will grow from 33 ZB in 2018 to 175 ZB in 2025, with unstructured data such as text, pictures, and video growing faster and taking a continually increasing share of the whole. A more advanced Internet infrastructure is therefore needed to collect, store, and utilize data.
At present, the main problems under the centralized Internet architecture are concentrated in the three aspects of security, timeliness and centralization, and the distributed storage protocol represented by IPFS will bring about changes in the Internet architecture by solving the following problems:
The traditional HTTP protocol uses an asymmetric architecture to achieve high concurrency, but the central server cannot bear transmitting too much data, which hurts the user experience, and cloud computing vendors and telecom operators pay large equipment costs for it. The IPFS protocol solves storage for hot files, but a file must be accessed continually to keep its storage effective; unpopular yet valuable files are easily lost, mainly because the lack of an incentive layer makes nodes unstable. Meanwhile, HTTPX (Grid Fission System), a distributed technology benchmarked against IPFS, is quietly emerging, providing decentralized CDN, storage, and GPU computing power services. HTTPX retains the advantages of the HTTP protocol, redefines routing and transmission logic, adopts a symmetrical architecture, and splits the network to an unprecedented degree.
HTTPX is a lighter, more flexible, and more complete P2P technology. Its technical architecture is a grid design: each node is both an independent individual and part of the global function, able to store, compute, and transmit data. A user connects to the nearest node to access the HTTPX network; that node finds neighboring nodes, retrieves information at million-record scale, locates the resource storage node, and transmits the data back to the user's neighboring node along the optimal network path. HTTPX has clear advantages over IPFS and is expected to push cloud computing services to new heights:
* High performance: the grid design greatly shortens the physical and network distance from user to node; in actual tests, TTL fell by 60%, providing higher-quality responses at lower latency;
* Low cost: serves the industry chain at low prices; highly hardware-compatible, deployable in homes, communities, and offices;
* Strong compatibility: compatible with the HTTP and HTTPS protocols while offering an advanced HTTPX open-source access mode;
* Strong capability: the P2P design gives outstanding CDN support; offers a mining mode for storage and GPU resources, truly achieving multiple functions in one machine;
* Quick release: bandwidth demand is large and the release cycle short, so resource providers need not worry about capital-turnover problems caused by project delays.
Distributed storage opens up a new pattern for the Internet infrastructure industry
Distributed storage develops a new storage market
The explosive growth of global data volume is driving rapid development of the cloud storage market. Cloud storage is a cloud computing service centered on data storage and management: through cluster applications, network technology, or distributed file systems, application software brings together a large number of different types of network storage devices to work in concert, providing data storage and business-access functions. In other words, cloud storage places resources on the cloud for access: users can connect through any networked device to reach their data conveniently anytime, anywhere.
By the nature of the service, cloud storage can be divided into public, private, and hybrid cloud. The public cloud serves a variety of customers including individuals, families, and enterprises; the private cloud is used and maintained by an enterprise or organization, giving users more control and personalization; the hybrid cloud mixes and matches public and private clouds for a relatively cost-effective solution. According to IDC forecasts, China's data volume will reach 48.6 ZB in 2025, more than 80% of it unstructured; and because China trails North America by 4-5 years, its cloud market is growing faster than the global average. In 2018, China's overall cloud computing market reached 96.28 billion yuan, growing 39.2%; within it, the public cloud market reached 43.7 billion yuan, growing 65.2%, and rapid growth is expected to continue over the next three years.
Distributed storage will open up new application scenarios, fully stimulate the market value of personal storage resources and content contributions, and innovate Internet business models. As distributed storage technology and its ecosystem develop, resource allocation in the personal storage market will be fully activated and more personal storage resources will be drawn into the market; that is, individuals can put idle storage resources into the distributed storage system for market exchange, which is impossible under the cloud model controlled by traditional Internet giants. More importantly, personal content shared on the Internet can be published, exchanged, and monetized securely. For example, D.Tube is an encrypted distributed video platform built on the STEEM blockchain and the IPFS peer-to-peer network. It aims to become an alternative to YouTube, letting users watch or upload videos over IPFS, share or comment on the immutable STEEM blockchain, and earn crypto tokens. All of D.Tube's data is public, open to analysis by anyone with an Internet connection, and it can run without advertising, providing a better user experience. Almost any existing Internet application could be migrated to the distributed file system to gain new experiences and innovative business models; the room for imagination is limitless.
Edge cloud computing and small data centers have become industry trends, and distributed storage is expected to open up the personal cloud storage market first. As of November 2019, mobile Internet users numbered 1.31 billion and monthly active users of personal cloud drives exceeded 100 million; the personal storage market retains a huge potential user base and available storage capacity. Facing rapidly growing data volumes, edge cloud computing and distributed storage have become industry trends. Using distributed file systems to put personal idle storage resources into the network for market exchange will be among the first areas distributed storage enters, and start-up applications in this area already exist.
Distributed storage has been continuously integrated with traditional storage
In practical applications of distributed storage, a degree of centralization is unavoidable, so it is often integrated with traditional storage solutions. Distributed storage carries system performance and management costs. Existing storage solutions therefore usually combine distributed technology with traditional approaches: on the one hand, data is backed up and stored in a distributed manner, bringing it closer to the edge while guarding against physical damage and tampering; on the other hand, a degree of centralized storage and management reduces system operation and maintenance costs and improves service quality.
Case 1): Chuxun’s distributed storage cloud service
Shanghai StorSwift Information Technology Co., Ltd. (StorSwift, "Chuxun") is a high-tech enterprise focused on the storage and management of enterprise production data. Its core team comes from the American storage company Rasilient and has more than 15 years of storage-industry R&D and operations experience. Chuxun holds core hardware and software technologies in large-scale storage operations, storage security, and performance optimization; to date it has deployed and stored more than 300 PB of key business data, and its storage and processing of image data leads the industry. Chuxun has delivered successful storage solutions in the security, medical, media, and other industries, and has business cooperation with many companies including Intel and China Mobile.
Chuxun provides enterprises with professional distributed data storage solutions and has rich experience in data storage optimization, I/O optimization, and large-scale system operations. Its main products include a high-performance distributed file system, distributed block storage, and a distributed object storage gateway, alongside a complete Filecoin solution spanning hardware selection, mining-program optimization, storage performance optimization, and operations scheduling. Compared with traditional centralized data-center storage, distributed storage disperses data more widely and depends less on geographic location; it can avoid multiple risks, enable asset-light operation for enterprises, and reduce operation and maintenance costs.
Case 2): CRUST link distributed cloud
CRUST is a crypto application layer based on the Meaningful Proof of Work mechanism (MPoW) and the Guaranteed Proof of Stake consensus (GPoS); it is also a new generation of blockchain technology supporting decentralized storage and computing. CRUST implements an incentive-layer protocol for decentralized storage, adapts to multiple storage-layer protocols including IPFS, and supports the application layer: the first layer quantifies resources and workload, providing the universally recognized MPoW calculation method as the foundation; the second layer uses GPoS to reach consensus and jointly maintain the network; the third layer provides users with decentralized storage and retrieval services. The CRUST architecture can also support a decentralized computing layer, building a distributed cloud ecosystem.
The biggest difference between CRUST and Filecoin is the use of a trusted execution environment (TEE), whose core idea is to use third-party hardware as a carrier so that data created and run inside it cannot be attacked or tampered with. Mainstream chip manufacturers such as Intel, AMD, and ARM provide TEE space in their CPUs, which can run open-source packages approved by CRUST community members to supervise resource quantification and then send the signed quantitative proof to the blockchain network. Filecoin proves node workload through zero-knowledge proofs and network cross-validation; that algorithm is likewise open source, but it brings hardware consumption and bandwidth requirements, and its complexity must be deliberately increased so that nodes cannot cheat within a short time window. TEE avoids these problems: the resource-certification process completes locally, reducing network resource usage and simplifying workload quantification. Moreover, because a program inside the TEE need not worry about tampering while encapsulating and saving data, the algorithm is more efficient and users get a better experience.
Technical bottlenecks and development opportunities faced by distributed storage
Data value stratification is the key to economic incentives for distributed storage
The market value of different data is different, and different individuals have different judgments on the value of the same data. When the storage node does not know the content of the data and it is difficult to judge the value of the data, how to effectively optimize the allocation of storage incentives and data market value?
Data value stratification is the key to identifying data value and realizing effective incentives. Distributed nodes store data fragments without knowing the content or value of the data. In other words, if miners' workload measurement cannot account for data value, more refined market incentives are hard to achieve. Take the Proof of Spacetime (PoSt) mechanism adopted by Filecoin: its measurement of a miner node's storage workload is unrelated to the value of the content in the file fragments. Filecoin currently does not stratify data by value; it only distinguishes junk data from verifiable data. Existing consensus mechanisms are limited to measuring miners' storage workload and cannot represent the value of data. The cost to miner nodes of physical damage or poor network service quality is measured in economic incentives, but the resulting loss to users in data service quality and data value is not equivalent: miners lose mainly the system's economic incentive, while users may suffer data loss or degraded service. For miners, after all, the core factor in the value of data is storage capacity.
A combination of the underlying architecture and the application layer should be considered to solve the data layering problem. Solving data value stratification is critical to the economic incentives of distributed storage; it is difficult to solve at the infrastructure level alone and must be implemented together with the application layer. Given concrete application scenarios, data layering can be realized at the application layer, which in turn stratifies miner nodes. For example, for data demanding higher service quality and carrying higher content value, a miner market with higher incentive prices can be delineated, with correspondingly higher requirements on miner nodes' hardware configuration and service quality; such applications are easier to implement in private networks and local area networks. Different application scenarios call for different application-layer strategies. In other words, a single broad, unified distributed storage network can hardly meet the needs of all scenarios and individual users; adopting different application-layer solutions for different scenarios to achieve data value layering is the feasible path.
I/O performance bottlenecks require joint optimization of the underlying and application layers
Distributed storage introduces system I/O performance problems. Compared with traditional storage systems, distributed storage requires files to be broken up and backed up across multiple nodes, so querying or using data means scheduling a large number of fragments, which is a substantial engineering burden. In addition, when a file is large, the hash table used for content addressing grows and addressing takes longer. More importantly, the network conditions of miner nodes introduce many uncertainties into network I/O performance, especially for streaming-media data: if the nodes holding some fragments have poor connectivity, the access quality of the whole file suffers. Under centralized storage these problems can be mitigated with CDNs and similar means, giving a better customer experience. The I/O efficiency of existing distributed storage systems is therefore one of the primary technical considerations; there is ample room for optimization from code implementation up to the protocol layer, and further breakthroughs are needed.
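The fragmentation and content addressing described above can be illustrated with a minimal sketch. Real IPFS uses multihash-based CIDs and a Merkle DAG with variable block sizes; this simplified version just hashes fixed-size chunks with SHA-256 to show why every read turns into many fragment lookups:

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # illustrative 256 KiB chunks; real IPFS block sizes vary

def chunk_and_address(data: bytes):
    """Split a file into chunks and address each by the SHA-256 of its
    content (a simplified stand-in for IPFS's multihash-based CIDs)."""
    store = {}
    chunk_ids = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        cid = hashlib.sha256(chunk).hexdigest()
        store[cid] = chunk      # in a real network each fragment may sit on a different node
        chunk_ids.append(cid)
    return chunk_ids, store

def reassemble(chunk_ids, store):
    # Every fragment must be located and fetched before the file is usable;
    # this per-fragment scheduling is where distributed I/O cost accumulates.
    return b"".join(store[cid] for cid in chunk_ids)

payload = b"example-data" * 100_000           # ~1.2 MB test payload
ids, store = chunk_and_address(payload)
assert reassemble(ids, store) == payload      # content round-trips
assert chunk_and_address(payload)[0] == ids   # same content, same addresses
```

The last assertion is the essence of content addressing: identical content always yields identical addresses, which enables deduplication but also means a large file fans out into many lookups.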
Experimental tests show that the I/O performance of IPFS needs further improvement. Because storage is distributed, file reads are affected both by the node itself and by other nodes across the network, mainly the number and stability of nodes, bandwidth, and network (geographic) location. Researchers at Fudan University tested the I/O performance of IPFS against HTTP to compare latency and throughput when processing requests. For the average latency of remote reads, on small requests of 1-4 KB, HTTP's latency is lower than IPFS's; for files of 16-256 KB, IPFS's latency beats HTTP's. On large file requests, IPFS's latency is unsatisfactory: at a request size of 16 MB its processing time approaches 20 seconds, and beyond 64 MB the delay can reach 70 seconds, seven times HTTP's 10 seconds. Of course, these are laboratory results, and there are as yet no convincing cases from actual deployments. In any case, if distributed storage solutions such as IPFS are to replace centralized methods such as HTTP, much improvement and exploration remain in the underlying technical framework, the protocol, and the application ecosystem.
The transmission efficiency of a distributed network still depends heavily on network scale, and the incentive mechanism needs improvement. P2P file transfer protocols break files apart and resume transfers from multiple points, so transmission efficiency depends heavily on the number of nodes in the network. The incentive mechanism must therefore be improved so that node users stay online and provide storage service for others even when they have nothing to download themselves. When the number of online nodes is relatively stable, transfer speeds are faster. In the long run, a decentralized storage system coordinating resumable transfers across hundreds of thousands or even millions of nodes can be expected to match the I/O efficiency of today's centralized-plus-CDN storage systems.
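The dependence of transfer speed on network scale can be shown with a toy model, not a measurement. The per-node bandwidth and the client's parallel-connection cap are assumed numbers chosen purely for illustration:

```python
def swarm_download_seconds(file_mib: float, online_nodes: int,
                           per_node_mbps: float = 2.0, max_parallel: int = 32) -> float:
    """Toy model: aggregate bandwidth grows with the number of online nodes
    holding fragments, up to the client's parallel-connection cap, so
    transfer time falls as the network scales and then plateaus."""
    aggregate_mbps = min(online_nodes, max_parallel) * per_node_mbps
    return file_mib * 8 / aggregate_mbps   # MiB -> megabits, then seconds

# A 100 MiB file under the same per-node bandwidth:
print(swarm_download_seconds(100, 4))     # 100.0 s with only 4 seeders
print(swarm_download_seconds(100, 32))    # 12.5 s with a full swarm
print(swarm_download_seconds(100, 1000))  # 12.5 s, capped by client parallelism
```

Even this crude model captures the report's point: below a critical node count, speed scales with participation, which is exactly what the incentive mechanism must sustain.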
Service quality assurance
There is still considerable room to optimize the service quality of decentralized systems. The distributed storage market has few deployed applications, and they generally face problems such as too few nodes and insufficient application-layer development. The user experience cannot yet compare with mature centralized storage products, so willingness to pay is also weak. To provide reliable storage service on top of unreliable distributed nodes, the first requirement is a commonly recognized incentive and punishment mechanism; the second is regulating miner behavior through means beyond economic incentives, such as audits of the operating mechanism.
Users storing higher-value data bear greater service-quality risk, and application-layer solutions are urgently needed. The blockchain is only responsible for verifying miners' completed workload and issuing rewards and punishments; it cannot make users whole, so service-quality issues are left to the application layer. For example, service-quality tiers can be distinguished by miners' historical penalty records, and users who need to store important data can voluntarily choose more expensive, higher-quality storage services. Only when more and more users are willing to pay, and the network gains more and more nodes, can the overall efficiency of a decentralized storage system improve and its service quality be better guaranteed. Considering system operation and maintenance costs, service quality, and macro-level regulation, distributed storage systems will therefore inevitably involve some degree of centralized management and control.
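The penalty-record selection rule above can be sketched in a few lines. The miner records, prices, and penalty threshold are all invented for illustration; no existing network exposes exactly this interface:

```python
# Hypothetical application-layer selection: rank miners by historical
# penalty record and let users trade price for reliability.
miners = [
    {"id": "m1", "penalties": 0, "price": 3.0},  # clean record, charges more
    {"id": "m2", "penalties": 2, "price": 1.5},
    {"id": "m3", "penalties": 9, "price": 0.5},  # cheap but frequently penalized
]

def pick_miner(miners: list[dict], important: bool, max_penalties: int = 3) -> dict:
    """For important data, accept only miners under the penalty threshold
    and prefer the cleanest record; otherwise take the cheapest offer."""
    if important:
        safe = [m for m in miners if m["penalties"] <= max_penalties]
        return min(safe, key=lambda m: (m["penalties"], m["price"]))
    return min(miners, key=lambda m: m["price"])

print(pick_miner(miners, important=True)["id"])   # m1: pay more for reliability
print(pick_miner(miners, important=False)["id"])  # m3: cheapest wins
```

This is the application layer absorbing the risk the base layer cannot price: the chain records penalties, but the choice of whom to trust with important data is made above it.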
Clearly, distributed storage needs to introduce centralized forms of organization to offset operating costs around I/O bottlenecks, data value layering, and application service quality. The problems above impose high operating costs on applications, so centralized organization can be introduced to absorb them, much as BitTorrent pairs its distributed hash table (DHT) with centralized trackers. Simply put, data fragments can be stored in a distributed manner while different application scenarios introduce their own constraints. On I/O bottlenecks: for applications with high I/O demands, such as streaming media, incentives can encourage nodes to sit in suitable physical locations or to upgrade their I/O performance. On data value layering: for particularly important data, a hybrid of centralized storage for core data and distributed storage for ordinary data is the more realistic solution. On service quality: restricting the physical and network locations where data files are stored and guaranteeing QoS can secure the data, and miners meeting such constraints need extra incentive compensation. Finally, at the application layer, long and complex IPNS names are hard for users to remember and operate; just as DNS manages the mapping between IP addresses and domain names, a centralized service akin to file-storage domain names could resolve IPNS's user-unfriendliness. This too is a direction for the further integration of centralized and distributed storage.
Thanks to Fenbushi Capital for research support, and to representative companies such as Chuxun Information Technology and Crust Network for their exchanges and insights.