The Swarm white paper is released: Explain the Swarm storage mechanism and API functions in detail

The Swarm 1.0 mainnet has launched; let's take a look at what it offers.

Original title: “Swarm Latest Official White Paper”
Written by: Swarm

This article is the full text of the latest white paper published by Swarm, compiled and translated by Blue Shell Cloud Storage. The content is for reference only; the official white paper is authoritative.

Introduction

Swarm's mission is to shape a self-sovereign global society and permissionless open markets by providing scalable base-layer infrastructure for a decentralized internet. Swarm's vision is to extend the blockchain with peer-to-peer storage and communication and thereby make the "world computer" a reality: an operating system and deployment environment for decentralized applications.

Swarm provides uninterrupted service and effectively resists network outages and targeted DoS attacks. As a permissionless publishing platform, Swarm promotes freedom of information, and it answers growing network-security needs with unique privacy features such as anonymous browsing, deniable storage, untraceable messaging, and file formats that leak no metadata.

Swarm's built-in incentives are designed to optimize the allocation of bandwidth and storage resources so that the network is economically self-sustaining. Swarm nodes track their relative bandwidth contribution on each peer connection and settle excess debt arising from unequal consumption in BZZ. Publishers must spend BZZ to buy the right to write data to Swarm, and prepay rent for long-term storage.

Swarm's modular design consists of clearly separable layers. Technically, layer 2, the "immutable storage overlay network", and layer 3, "high-level data access via APIs", constitute the core of Swarm.

DISC: Distributed Immutable Storage of Chunks

DISC (Distributed Immutable Storage of Chunks) is Swarm's underlying storage model. It consists of nodes that store and serve data and that collaborate in such a way that, assuming each node pursues the strategy that maximizes its operator's profit, the network as a whole exhibits the following properties:

  • Privacy-preserving, permissionless upload and download
  • Robust defenses: once content is published, blocking or revoking access is hard
  • Automatic scaling with demand
  • Integrity-protected content
  • Content that is no longer needed is eventually forgotten

Anyone with spare storage space and bandwidth can participate in DISC as a node operator and be rewarded for it. Installing and running the Swarm client software creates a new node that becomes part of the Swarm network, effectively taking care of a small part of Swarm's global hard drive.

Next, we will further define DISC and explain why it produces the above characteristics.

Connection, topology and routing

DISC's first responsibility is to build and maintain a network of nodes in which any node can send messages to any other. Messages travel over persistent, secure communication channels that exist between nodes, built on the p2p network protocol libp2p. Swarm expects nodes to maintain Kademlia connectivity: if each node connects to a specific set of other nodes, local decisions about where to forward a message yield globally optimal message routes.

Kademlia assumes every node is assigned a Swarm address, distinct from its network address. The proximity of two Swarm addresses is defined by the length of their common bit prefix. Nodes that are closest to each other form fully connected neighborhoods; in addition, each node connects to several peers from each discrete proximity class.
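
The proximity order described above can be sketched as follows; a minimal illustration (not the Bee implementation), assuming 32-byte overlay addresses and defining proximity as the number of leading bits two addresses share:

```python
def proximity(a: bytes, b: bytes) -> int:
    """Return the proximity order (PO) of two equal-length addresses:
    the number of leading bits they share (max = bit length)."""
    po = 0
    for x, y in zip(a, b):
        d = x ^ y
        if d == 0:
            po += 8
            continue
        # count the leading zero bits of the first differing byte
        po += 8 - d.bit_length()
        break
    return po

# Two addresses sharing the first 12 bits: first bytes equal,
# second bytes 0b10100000 vs 0b10101111 share 4 more bits.
a = bytes([0x5A, 0xA0]) + bytes(30)
b = bytes([0x5A, 0xAF]) + bytes(30)
print(proximity(a, b))   # 12
print(proximity(a, a))   # 256: identical 32-byte addresses
```

Nodes with a proximity order above a threshold belong to the same neighborhood; the discrete proximity classes mentioned above are simply the possible PO values.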

  • Note 1: libp2p is a network framework for building decentralized peer-to-peer applications.
  • Note 2: Kademlia is a P2P overlay network protocol designed by Petar Maymounkov and David Mazières for building distributed peer-to-peer computer networks. It is an XOR-metric-based P2P information system that lays down the structure of the network and regulates how nodes communicate and exchange information.

The resulting topology ensures that relaying moves a message at least one step closer to its destination on every hop. This technique routes messages between any two nodes even if they maintain no direct connection. The number of hops needed to deliver a message is bounded by the logarithm of the total number of nodes, so any two nodes remain reachable even in an extremely large network.

Chunks and storage

The standard storage unit in Swarm is called a chunk. A chunk consists of up to 4 kilobytes of data plus an address. Since chunk addresses and node addresses come from the same address space, proximity between them can be computed. Swarm's storage scheme stipulates that each chunk is stored by the nodes whose addresses are closest to the chunk's own address.
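
The placement rule, chunks living with the nodes whose overlay addresses are nearest, can be illustrated with the XOR distance (toy one-byte addresses here; real Swarm addresses are 32 bytes):

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """XOR metric: smaller value means closer in the address space."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_nodes(chunk_addr: bytes, node_addrs: list[bytes], k: int = 2) -> list[bytes]:
    """Pick the k nodes whose addresses are XOR-closest to the chunk:
    these form the neighborhood responsible for storing it."""
    return sorted(node_addrs, key=lambda n: xor_distance(chunk_addr, n))[:k]

nodes = [bytes([i]) for i in (0x10, 0x52, 0x57, 0xE0)]
chunk = bytes([0x55])
# 0x57 (distance 2) and 0x52 (distance 7) are nearest to 0x55.
print([hex(n[0]) for n in closest_nodes(chunk, nodes)])  # ['0x57', '0x52']
```

Because both chunks and nodes share one address space, "which nodes store this chunk" is answered purely by this distance calculation, with no central index.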

To keep data confidential, a chunk can be padded to 4 kilobytes and encrypted, making it indistinguishable from random data to anyone without the key. Even for unencrypted chunks, node operators cannot easily tell what content a given chunk belongs to. Since Swarm nodes cannot choose which chunks they store, this ambiguity of origin, together with the absence of leaked metadata, effectively shields operators from liability for the content they store.

To insert a chunk into Swarm, nodes forward it via the push-sync protocol until it reaches the neighborhood it belongs to; a storage receipt is then passed back along the same route. To retrieve a chunk, the retrieval protocol routes a request carrying the chunk's address toward the relevant neighborhood, and any node along the way that holds the chunk locally sends it back as the response.

Nodes use the pull-sync protocol to continuously synchronize their chunk stores. This ensures that every chunk belonging to a neighborhood is stored redundantly by all nodes in it. The redundancy makes data delivery more resilient, keeping chunks available even when some nodes in a neighborhood are unreachable, and the protocol keeps a neighborhood's stored content consistent as nodes go offline and new nodes join the network.

Forwarding, privacy and caching

In Swarm, a message is routed by recursively forwarding it to a node ever closer to its destination and then passing the response back along the same route. This routing algorithm has two important properties:

  • The originator of a request is ambiguous.
  • Capacity scales automatically with demand.

A message sent by the node that originated a request is indistinguishable from one sent by a node that is merely forwarding it. This ambiguity lets the originator of a request keep its privacy intact, enabling permissionless content publishing and private browsing.

Because the nodes that take part in routing a retrieval request may choose to store the chunks they forward, the distribution system becomes auto-scaling. The bandwidth incentives discussed below provide the economic motivation for this opportunistic caching.

Swarm Accounting Protocol

The Swarm Accounting Protocol (SWAP) ensures that node operators collaborate when routing messages, while protecting the network against frivolous use of bandwidth.

As nodes forward requests and responses, they track relative bandwidth consumption with each peer. Within a certain limit, services are simply exchanged service-for-service. Once the limit is reached, however, the indebted party either waits for its debt to be amortized over time or settles it by sending a cheque that can be cashed on the blockchain for BZZ.
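
The per-peer accounting can be sketched as a signed balance with a payment threshold. This is a toy model with a made-up threshold constant; real SWAP settles cheques through an on-chain chequebook contract:

```python
class SwapPeer:
    """Track relative bandwidth consumption with one peer.
    Positive balance: the peer owes us; negative: we owe the peer."""

    def __init__(self, payment_threshold: int = 10_000):
        self.balance = 0
        self.payment_threshold = payment_threshold

    def credit(self, amount: int) -> None:
        """We served the peer (they consumed our bandwidth)."""
        self.balance += amount

    def debit(self, amount: int) -> str:
        """We consumed the peer's bandwidth; settle by cheque
        once our debt exceeds the threshold."""
        self.balance -= amount
        if -self.balance >= self.payment_threshold:
            cheque = -self.balance
            self.balance = 0          # the cheque settles the debt in BZZ
            return f"cheque issued for {cheque}"
        return "within limit"

peer = SwapPeer()
peer.credit(4_000)                 # reciprocal services mostly cancel out
print(peer.debit(6_000))           # within limit (net debt only 2_000)
print(peer.debit(9_000))           # cheque issued for 11000
```

The "wait until the debt is amortized" option would correspond to periodically moving the balance back toward zero instead of issuing a cheque.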

The protocol ensures that light users can use Swarm for free: those who download or upload only small amounts, and those willing to wait until reciprocal services with their peers have earned them sufficient credit. At the same time, it offers a faster experience to users who move larger amounts of content and are willing to pay.

Nodes have an economic incentive to help forward messages: every node that successfully routes a request one step closer to its destination earns BZZ when the request is served. A node that does not hold the requested chunk itself pays a slightly smaller fee to obtain it from a node closer to the address, keeping a small profit on the transaction. This also makes caching worthwhile: once a node has bought a chunk from a closer node, every subsequent request for the same chunk is pure profit.

Insufficient capacity and garbage collection

As new content is added to Swarm, each node's finite storage capacity will sooner or later be exhausted. At that point a node needs a strategy for deciding which chunks to delete to make room for new ones.

The local storage of each Swarm node has two built-in subsystems: the "reserve" and the "cache".

The "reserve" is a fixed-size storage space dedicated to chunks that belong to the node's neighborhood. Whether a chunk stays in the reserve depends on the "postage stamp" attached to it. A contract on the blockchain allows "postage batches" to be purchased with BZZ, and the owner of a batch has the right to issue a limited number of stamps. These stamps act as a signal of value, indicating to storer nodes how much it is worth to keep the associated content in Swarm. By evicting the lowest-valued chunks from the reserve first, storer nodes maximize the utility of DISC. A stamp's value decreases over time, as storage rent is periodically deducted from the batch's balance; once a stamp's value is insufficient, the associated chunk is evicted from the reserve and moved into the cache.
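
The rent-and-evict cycle can be sketched as follows. This is a toy model with arbitrary units; the real postage-batch accounting runs in a smart contract and prices per chunk:

```python
def apply_rent(stamps: dict[str, int], rent: int) -> None:
    """Deduct one period of storage rent from every stamp's remaining value."""
    for chunk in stamps:
        stamps[chunk] -= rent

def evict_expired(reserve: dict[str, int]) -> list[str]:
    """Remove chunks whose stamp value has run out; in the real system
    these move from the reserve into the cache."""
    evicted = [c for c, value in reserve.items() if value <= 0]
    for c in evicted:
        del reserve[c]
    return evicted

# chunk address -> remaining stamp value (arbitrary units)
reserve = {"chunk-a": 30, "chunk-b": 5, "chunk-c": 12}
apply_rent(reserve, 10)            # one rent period passes
print(evict_expired(reserve))      # ['chunk-b']
print(sorted(reserve))             # ['chunk-a', 'chunk-c']
```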

The cache holds chunks that are not protected by the reserve, either because their batch value is insufficient or because they lie too far from the node's address. When capacity is reached, the cache is trimmed periodically, deleting the chunks that have gone unrequested the longest. Since the time of the last request predicts a chunk's popularity, the chunks most likely to earn SWAP revenue are retained. Combined with speculative caching, this garbage-collection strategy maximizes operators' profit from bandwidth incentives and, at the network level, automatically scales the supply of popular content.
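
Dropping the longest-unrequested chunks first is essentially a least-recently-used policy; a minimal sketch using Python's OrderedDict:

```python
from collections import OrderedDict

class ChunkCache:
    """LRU cache: the chunk unrequested for the longest is trimmed first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.chunks: OrderedDict[str, bytes] = OrderedDict()

    def request(self, address: str):
        """A retrieval hit marks the chunk as recently used."""
        if address in self.chunks:
            self.chunks.move_to_end(address)
            return self.chunks[address]
        return None

    def store(self, address: str, data: bytes) -> None:
        self.chunks[address] = data
        self.chunks.move_to_end(address)
        while len(self.chunks) > self.capacity:
            self.chunks.popitem(last=False)    # trim the least recently used

cache = ChunkCache(capacity=2)
cache.store("a", b"...")
cache.store("b", b"...")
cache.request("a")                 # 'a' becomes most recently used
cache.store("c", b"...")           # evicts 'b', the coldest chunk
print(list(cache.chunks))          # ['a', 'c']
```

A real node would additionally weight eviction by expected SWAP revenue, not recency alone.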

Chunk types

Above, we defined the chunk as the canonical unit of data in DISC. Swarm has two fundamental chunk types: content-addressed chunks and single-owner chunks.

The address of a content-addressed chunk is the hash digest of its data. Using the hash as the chunk's address makes the integrity of the chunk data verifiable. Swarm uses the BMT (Binary Merkle Tree) hash, which builds a Merkle tree over small segments of the chunk data.

The address of a single-owner chunk is obtained by hashing an identifier together with the owner's address. The integrity of a single-owner chunk is warranted by the owner's cryptographic signature, which attests the association between arbitrary chunk data and the identifier. In other words, every identity owns a part of Swarm's address space in which it is free to assign content to addresses.
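
The two addressing schemes can be sketched side by side. SHA-256 stands in for illustration only; Swarm actually uses Keccak-256 and the BMT hash, and real single-owner chunks also carry the owner's signature over the data:

```python
import hashlib

def content_address(data: bytes) -> bytes:
    """Content-addressed chunk: the address is the hash of the data,
    so anyone can verify integrity by re-hashing."""
    return hashlib.sha256(data).digest()

def single_owner_address(identifier: bytes, owner: bytes) -> bytes:
    """Single-owner chunk: the address is the hash of an identifier and
    the owner's address, independent of the payload."""
    return hashlib.sha256(identifier + owner).digest()

data = b"hello swarm"
addr = content_address(data)
print(content_address(data) == addr)   # True: same data, same address

owner = bytes(20)                      # hypothetical 20-byte account address
identifier = (1).to_bytes(32, "big")
# The owner can point this fixed address at any data it signs.
print(single_owner_address(identifier, owner).hex()[:16])
```

Note how the single-owner address never touches the payload: that is exactly what lets the owner place changing content at a stable address.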

Features of the Swarm API

Beyond chunks, Swarm exposes APIs implementing higher-level concepts such as files, hierarchical collections of files with various metadata, and even inter-node messaging. These APIs mirror those already in use on the web wherever possible. On top of these layers, ever more novel concepts and data structures can be built, bringing rich and diverse possibilities to everyone who wants to benefit from the private, decentralized substrate that DISC provides.

Files and collections

Data exceeding the 4 kilobytes allowed in a single chunk is split across multiple chunks. The resulting set of chunks is represented as a Swarm hash tree, which encodes how the file was divided into chunks during upload. The tree's leaf chunks contain the data itself and are referenced by one or more layers of intermediate chunks, each of which holds references to its children.

The content address of the whole file is then the hash digest of the root chunk, i.e. the Merkle root of the hash tree spanning the file. The address of a file thus doubles as its checksum, making content integrity verifiable. Representing the file as a balanced Merkle tree of chunks also enables efficient random access into the file and, as a result, efficient range queries.
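
Chunking a file and deriving its address as the Merkle root can be sketched as follows. SHA-256 stands in for Swarm's BMT hash, and the branching factor is shrunk for readability (Swarm uses 128 references per intermediate chunk and also encodes data spans):

```python
import hashlib

CHUNK_SIZE = 4096
BRANCHES = 4                     # toy branching factor; Swarm uses 128

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def file_address(data: bytes) -> bytes:
    """Chunk the data, then hash groups of references level by level
    until a single root reference (the file address) remains."""
    refs = [h(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)] or [h(b"")]
    while len(refs) > 1:
        refs = [h(b"".join(refs[i:i + BRANCHES]))
                for i in range(0, len(refs), BRANCHES)]
    return refs[0]

payload = b"x" * (5 * CHUNK_SIZE)            # spans five leaf chunks
addr = file_address(payload)
print(len(addr))                             # 32
# The address doubles as a checksum: any change alters the root.
print(file_address(payload + b"!") == addr)  # False
```

Random access falls out of this structure: to read an offset, a client walks one root-to-leaf path instead of downloading the whole file.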

Swarm represents collections with "manifests". A manifest encodes a generic mapping from strings to references, allowing it to model a directory tree, a key-value store, or a routing table. These in turn let Swarm implement a file system, serve as a database, and even provide virtual hosting for websites and dapps.

If the host part of a URL is interpreted as a reference to a manifest, manifests provide URL-based addressing: the URL path serves as the key to look up in the mapping the manifest represents, yielding the file reference.

Manifests encode the mapping they represent as a compacted Merkle trie, with chunks serializing the nodes of the trie. Looking up a path requires retrieving only the chunks along the traversed branch, so files and records can be looked up with latency and bandwidth logarithmic in the size of the collection.
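
The branch-only lookup can be sketched with a nested mapping standing in for the trie. This is a toy structure with hypothetical references; real manifests are compacted, chunk-serialized, and fetched from the network node by node:

```python
# A toy manifest trie: intermediate nodes map path segments to child
# nodes; leaves hold a (hypothetical) file reference.
manifest = {
    "index.html": "ref-a1b2",
    "img": {
        "logo.png": "ref-c3d4",
        "icons": {"home.svg": "ref-e5f6"},
    },
}

def lookup(trie: dict, path: str):
    """Walk the trie one path segment at a time, touching only the
    nodes along the traversed branch (one chunk fetch per level)."""
    node = trie
    for segment in path.split("/"):
        if not isinstance(node, dict) or segment not in node:
            return None
        node = node[segment]
    return node if isinstance(node, str) else None

print(lookup(manifest, "img/icons/home.svg"))  # ref-e5f6
print(lookup(manifest, "img/missing.png"))     # None
```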

The child references in the intermediate chunks of a file's hash tree, and in the nodes of a manifest trie, are positionally aligned with BMT hash segments. As a result, Swarm supports compact proofs that a particular data segment is part of the file found at a given offset under a given URL, which is the basis for publicly provable database indexing and trustless aggregation.

Tracking updates: feeds and domain resolution

A feed is an application of single-owner chunks that creates the impression of a mutable resource. A feed can represent versioned revisions of a mutable resource, sequential updates on a topic, or the messages a party publishes in a communication channel.

Feeds work by deriving the identifier of the single-owner chunk from a topic and an index. When publisher and consumer agree on how and when the index is updated, both can construct and look up the reference of a specific feed update.
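
The derivation can be sketched as two nested hashes (SHA-256 for illustration; Swarm uses Keccak-256, and the outer hash matches the single-owner chunk addressing described earlier):

```python
import hashlib

def feed_update_address(topic: bytes, index: int, owner: bytes) -> bytes:
    """Identifier = hash(topic, index); the update then lives at the
    single-owner address hash(identifier, owner)."""
    identifier = hashlib.sha256(topic + index.to_bytes(8, "big")).digest()
    return hashlib.sha256(identifier + owner).digest()

topic = hashlib.sha256(b"my-blog").digest()
owner = bytes.fromhex("ab" * 20)             # hypothetical account address

# Publisher and reader independently derive the same address per update,
# so the reader can fetch update n without any lookup service.
addr0 = feed_update_address(topic, 0, owner)
addr1 = feed_update_address(topic, 1, owner)
print(addr0 != addr1)                        # True: each index, a fresh slot
print(feed_update_address(topic, 0, owner) == addr0)  # True: deterministic
```

The agreed "update method" mentioned above is simply the rule for choosing the next index (e.g. a counter or an epoch-based scheme).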

Just as DNS resolves domain names to the IP addresses of host servers, Swarm uses the Ethereum Name Service (ENS), a set of smart contracts on the blockchain, to resolve human-readable domain names (for example, swarm.eth) into Swarm references.

Whenever a web application, or the website it represents, receives a new Swarm reference after an update, the reference registered in ENS can be updated. Alternatively, if the domain resolves to a feed, users keep the benefit of a human-readable domain name while its content can be updated without interacting with the blockchain and paying transaction fees for every change.

Messaging

PSS (Postal Service on Swarm) is a protocol for direct messaging between Swarm nodes. It works by encrypting a message for its intended recipient and wrapping it, together with a topic, in a content-addressed chunk. The chunk is constructed so that its content address falls within the recipient's neighborhood, so delivery is handled naturally by the push-sync protocol.

To any third party, such a message is indistinguishable from a random encrypted chunk, hence the name "trojan" chunk. A node expecting PSS messages attempts to decrypt and unwrap every chunk that arrives in its neighborhood. When a client node, as the legitimate recipient, successfully decrypts and unwraps a trojan chunk, it delivers the plaintext message to the applications that subscribed to the topic through the PSS API.

PSS also provides asynchronous delivery, since chunks persist and are eventually synced to all nodes of the neighborhood, including those that come online only later.
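
Constructing a chunk whose address lands in the recipient's neighborhood means mining a nonce until the content hash shares a prefix with the recipient's overlay address. A sketch with SHA-256 and a tiny difficulty (a single-byte prefix; real mining targets the recipient's neighborhood depth):

```python
import hashlib

def mine_trojan(payload: bytes, target: bytes) -> bytes:
    """Vary a nonce until the chunk's content address shares its first
    byte (8 bits) with the target overlay address."""
    for nonce in range(1 << 24):
        chunk = nonce.to_bytes(4, "big") + payload
        if hashlib.sha256(chunk).digest()[0] == target[0]:
            return chunk
    raise RuntimeError("no nonce found")

recipient = bytes.fromhex("7f" + "00" * 31)  # hypothetical overlay address
chunk = mine_trojan(b"encrypted message bytes", recipient)
addr = hashlib.sha256(chunk).digest()
print(addr[0] == recipient[0])               # True: lands in the neighborhood
```

Because the payload is encrypted and the address looks like any other content address, only a node that tries (and succeeds) to decrypt the chunk learns it was a message at all.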

Since PSS allows users to receive messages from previously unknown identities, it is an ideal communication primitive for sending anonymous messages to a public identity (for example, for registration), or for establishing secure communication channels by initiating a message flow to a contact over feeds. And since PSS requires no action (such as polling) from the recipient, it is the recommended primitive for push notifications.

Pinning and restoring

DISC will eventually forget content that is rarely accessed and not paid for. By "pinning" chunks, nodes can make sure they retain specific content locally. Such pinners can, in turn, participate in passive or active recovery of that content, to the benefit of all users.

Passive recovery relies on a recovery protocol: when a retrieval fails, a recovery request is sent via PSS to notify pinners of the missing chunk. Pinners listen for recovery requests and respond by re-uploading the missing chunks, so that downloaders find them on retry. This reactive recovery also allows original content to be seeded directly from the publisher's node, similar to the primary mode of operation of some existing file-sharing solutions (BitTorrent, IPFS).

Conversely, Swarm also offers active recovery, or data stewardship: pinners proactively check the availability of content in the network and re-upload any chunks they find missing.

Conclusion

Swarm is a peer-to-peer network whose nodes collaborate to provide decentralized storage and communication services. Permissionless and private, Swarm serves the needs of freedom of speech, data sovereignty, and open network markets, while securing its operation through integrity protection, censorship resistance, and attack resilience. This article has introduced the functionality included in the initial mainnet launch of Bee 1.0.

This is a milestone, but the journey has just begun: join Swarm and take part in the mission of bringing about digital freedom.
