Jason Ginsberg / 6.28.2022

What's decentralized storage, and how is it part of Web3?

A deep analysis of the decentralized networks IPFS, Arweave, and Storj - and how they're facilitating Web3 apps.
Skiff recently announced a collaboration with Protocol Labs, the team behind the Interplanetary File System (IPFS), to build fully decentralized storage into Skiff. In the short time since the release of Skiff’s IPFS integration, many of our users have already chosen to store the contents of their end-to-end encrypted documents using IPFS. Given the amount of interest we’ve seen from users, we wrote up this wiki to give a technical breakdown of IPFS and some of its peers in the world of decentralized storage.


What is IPFS?

IPFS stands for the InterPlanetary File System, a peer-to-peer storage network that lets you find, store, and share files. IPFS has three unique characteristics that set it apart from existing file systems:
  1. Unique identification via content addressing
  2. Content linking via directed acyclic graphs (DAGs)
  3. Content discovery via distributed hash tables (DHTs)

Unique identification via content addressing

Currently, when you want to find some form of content on the internet, you have to know where that content is stored. If, for example, you wanted to find the Wikipedia entry for “End-to-end encryption,” you would have to know the address on the internet where that Wikipedia page is stored. (That address is https://en.wikipedia.org/wiki/End-to-end_encryption.) This form of identifying content by where it's stored is called location addressing.

IPFS, on the other hand, uses content addressing. Instead of locating content by asking where it is stored, IPFS asks what content is being requested. It does this by giving every piece of content on the IPFS protocol something called a content identifier, or CID. A CID is a cryptographic hash, unique to the original piece of content it was derived from.

Many distributed systems make use of CIDs, but they do not necessarily share the same underlying data interoperability. To solve this problem, IPFS makes use of the InterPlanetary Linked Data (IPLD) project. IPLD provides libraries for combining pluggable modules (parsers for each possible type of IPLD node) to resolve a path, selector, or query across linked nodes. This allows you to explore data regardless of the underlying protocol.
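The core property of content addressing can be shown in a toy example. This sketch uses a plain SHA-256 hex digest as a stand-in identifier; real IPFS CIDs wrap the hash in a multihash/multibase envelope, but the principle is the same: the identifier depends only on the content, never on where it lives.

```python
import hashlib

def toy_cid(content: bytes) -> str:
    """Derive a toy content identifier: a SHA-256 hash of the content.

    A simplified stand-in for a real IPFS CID, which also encodes the
    hash function and an encoding prefix.
    """
    return hashlib.sha256(content).hexdigest()

# The same bytes produce the same identifier, wherever they are stored...
assert toy_cid(b"hello web3") == toy_cid(b"hello web3")

# ...while any change to the content changes the identifier.
assert toy_cid(b"hello web3") != toy_cid(b"hello web2")
```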

Content linking via directed acyclic graphs (DAGs)

A directed acyclic graph, or DAG, is a directed graph with no directed cycles. This means that a closed loop can never occur in a DAG. In the case of IPFS, the exact type of DAG used is called a Merkle DAG. In addition to all the attributes of a DAG, each node in a Merkle DAG contains a unique identifier stored as a hash of the content of the node. The Merkle DAGs used in IPFS are optimized for representing directories and files.

For storage, IPFS may split content into blocks. These blocks can be stored in different locations and authenticated quickly. This is similar to how BitTorrent allows you to fetch a file from multiple peers at once.

To bring everything together: if you have a folder stored in IPFS, the CID of that folder is a hash derived from the folder's content. Each of the files inside also has a CID, itself a hash derived from the contents of that file. The contents of any individual file can in turn be split into blocks, each block having its own CID, and so on.

One of the major benefits of storing content in this manner is that any two similar files can reference the same underlying blocks. This means that if you need to update the contents of a file, you only have to update the specific blocks that have been altered and can keep the references to unaltered blocks. This makes dealing with large amounts of data much more efficient than changing or re-creating the entire file structure each time an edit is made or new content is added.
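The block-sharing benefit described above can be sketched in a few lines. This toy example (tiny block size and a simplified hash-of-child-hashes node, not IPFS's actual chunking or DAG encoding) shows that editing one block changes the file's CID while the unchanged blocks keep theirs:

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size, purely for illustration

def block_cid(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def file_node(data: bytes):
    """Split a file into blocks and build a toy Merkle node:
    the file's CID is a hash over its children's CIDs."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    child_cids = [block_cid(b) for b in blocks]
    file_cid = hashlib.sha256("".join(child_cids).encode()).hexdigest()
    return file_cid, child_cids

v1_cid, v1_children = file_node(b"AAAABBBBCCCC")
v2_cid, v2_children = file_node(b"AAAAXXXXCCCC")  # middle block edited

assert v1_cid != v2_cid                  # the file's CID changes...
assert v1_children[0] == v2_children[0]  # ...but unaltered blocks keep their
assert v1_children[2] == v2_children[2]  # CIDs and can be shared between versions
```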

Distributed hash tables (DHTs)

When looking for content from your peers, IPFS makes use of a distributed hash table. A hash table is a data structure that stores key-value pairs; in a distributed hash table, that data structure is spread across all the peer nodes in a given network. The libp2p project is the protocol inside IPFS that handles the DHT and all the communication among peer nodes.

When retrieving content, nodes use libp2p to query the DHT twice: first to find which peers in the network are storing particular blocks, and then to find the current location of those peers in the network.

Once a node knows which peers have the blocks it wants and where those peers currently are, IPFS uses a module called Bitswap to connect and exchange blocks between peers. When requesting blocks, a node connects to a chosen peer and sends a wantlist, a list of desired blocks. Once the desired blocks have been received, they can be verified by hashing their content and comparing the result to the CID of each received block.

Protocol Labs, the makers of the IPFS protocol, are also the creators of the complementary Filecoin protocol. The difference between the two is that IPFS allows peers in a network to store, transfer, and retrieve data from one another, while Filecoin is designed as a system to incentivize persistent data storage. Filecoin allows clients to pay to store data at various levels of availability and redundancy. Storage providers are paid in Filecoin not only to continuously store the data but also to cryptographically prove they are storing the data they say they are. IPFS and Filecoin are complementary protocols: you can use them together, or each can be used on its own or in conjunction with other protocols.
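The retrieval flow above can be sketched as a toy simulation. The in-memory tables and names here (providers, peer_addrs, peer_store) are hypothetical stand-ins for the two DHT lookups and the Bitswap exchange, not the real libp2p APIs; the important step is the final hash-against-CID verification:

```python
import hashlib

def cid_of(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

# Toy stand-ins for the two DHT lookups: which peers provide a block,
# and where those peers currently are on the network.
providers = {cid_of(b"block-1"): ["peer-A", "peer-B"]}
peer_addrs = {"peer-A": "192.0.2.10", "peer-B": "192.0.2.20"}
peer_store = {"peer-A": {cid_of(b"block-1"): b"block-1"}}

def fetch_and_verify(cid: str) -> bytes:
    """Look up providers, 'connect' to one, and verify the block by rehashing."""
    peer = providers[cid][0]       # DHT query 1: who has the block?
    _addr = peer_addrs[peer]       # DHT query 2: where is that peer right now?
    block = peer_store[peer][cid]  # Bitswap-style exchange (simulated)
    if cid_of(block) != cid:       # verification: the hash must match the CID
        raise ValueError("received block does not match requested CID")
    return block

assert fetch_and_verify(cid_of(b"block-1")) == b"block-1"
```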


What is Arweave?

Arweave is another distributed storage solution, one that takes a different approach to incentives and permanence. One of the main factors differentiating Arweave from IPFS is that the Arweave protocol promises permanent storage through the creation of what it calls the permaweb. What makes Arweave's storage so permanent? The difference lies in the protocol's incentive structure: using Arweave, an end user can theoretically pay just once to store data forever.

How does Arweave work?

Arweave utilizes four core technologies to deliver a low-cost, high-throughput, permanent storage solution:
  1. Blockweave
  2. Proof of Access
  3. Wildfire
  4. Blockshadows


Blockweave

The blockweave differs from most blockchains in that it doesn't require every node in the network to hold the entire chain in order to validate a transaction. With Arweave, nodes do not have to possess the whole chain; they can still fulfill network functions by keeping a block hash list and a wallet list. The block hash list contains the hashes of all previous blocks, which allows old blocks to be verified and new blocks to be quickly evaluated. The wallet list contains all the active wallets in the system. With these two structures, transactions can be verified without possession of the most recent block.

Additionally, miners do not need to verify the entire blockchain from genesis to present. Instead, they use a system of ongoing verification: miners verify that a transaction has been signed by the wallet owner's private key.
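A minimal sketch of this partial-state validation, under stated assumptions: the node below holds only a block hash list and a wallet list (signature verification is elided; a real node would also check the wallet owner's signature). The data structures and function names are illustrative, not Arweave's actual implementation.

```python
import hashlib

def block_hash(prev_hash: str, payload: str) -> str:
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

# A node's partial state: hashes of all prior blocks, plus active wallets.
block_hash_list = ["genesis"]
wallet_list = {"wallet-1": 100}

def accept_block(prev_hash: str, payload: str, sender: str) -> bool:
    """Evaluate a new block without holding the full chain: the claimed
    parent must appear in the block hash list, and the transacting wallet
    must exist in the wallet list. (Signature checks omitted for brevity.)"""
    if prev_hash not in block_hash_list:
        return False
    if sender not in wallet_list:
        return False
    block_hash_list.append(block_hash(prev_hash, payload))
    return True

assert accept_block("genesis", "tx: wallet-1 pays 5", "wallet-1")
assert not accept_block("unknown-parent", "tx", "wallet-1")
```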

Proof of Access

The Arweave protocol uses Proof of Access alongside Proof of Work as its consensus mechanism. For miners to mine or verify a new block, their mining node needs access to that block's recall block: a historical block chosen based on the current block. This proof of access is required as part of block construction, and verifying it is how Arweave validates a new block. The requirement inherently incentivizes storage, because the recall block is chosen unpredictably: miners need access to arbitrary historical blocks in order to receive rewards for mining new ones.
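The probabilistic choice of a recall block can be sketched as deriving an index from the current block's hash. This is a simplified illustration of the idea, not Arweave's exact selection rule:

```python
import hashlib

def recall_block_index(current_block_hash: bytes, chain_height: int) -> int:
    """Pick a pseudo-random historical block from the current block's hash.

    Because the index is unpredictable until the current block exists,
    a miner's best strategy is to store as many old blocks as possible.
    """
    digest = hashlib.sha256(current_block_hash).digest()
    return int.from_bytes(digest, "big") % chain_height

idx = recall_block_index(b"hash-of-current-block", chain_height=1000)
assert 0 <= idx < 1000
```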


Wildfire

The Arweave protocol also uses a mechanism similar to the one found in BitTorrent: the Wildfire Adaptive Interactive Incentive Agent (AIIA). This creates a sort of "meta-game" on top of the $AR rewards which incentivizes pro-social behavior from miners. Being a responsive node means gaining a higher rank from peers; less responsive nodes can either choose to improve or continue dropping in rank.

What is this rank or score? It's a rolling average of bytes per second over a number of recent requests to that peer. The score allows nodes to properly choose where to use their bandwidth and have a high probability of accurate and prompt communication. It also prevents sending messages to defunct nodes in the network, enabling efficient communication given the finite bandwidth of each node.
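The rolling bytes-per-second score can be sketched directly. The window size here is an assumption for illustration; the scoring class is a toy, not Arweave's implementation:

```python
from collections import deque

class PeerScore:
    """Rank a peer by a rolling average of bytes/second over its most
    recent requests, as in the Wildfire scoring described above."""

    def __init__(self, window: int = 5):
        # only the most recent `window` requests count toward the score
        self.samples = deque(maxlen=window)

    def record(self, bytes_served: int, seconds: float) -> None:
        self.samples.append(bytes_served / seconds)

    def score(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

fast, slow = PeerScore(), PeerScore()
fast.record(1_000_000, 1.0)  # served 1 MB in 1 s
slow.record(50_000, 1.0)     # served 50 kB in 1 s
assert fast.score() > slow.score()  # the responsive peer earns the higher rank
```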


Blockshadows

Blockshadows allow full blocks to be reconstructed without sending each whole block to every node in the network when it's mined. Because blockshadows are only a few kilobytes in size, transactions can be mined into a block at the same speed at which they are distributed around the network. Blockshadows enable the Arweave protocol to support blocks of unlimited size, allowing for a network with permanent on-chain storage.


The $AR token

A winston is to $AR what a satoshi is to $BTC. In other words, a winston is the smallest denomination of $AR. $AR is a utility token, since it is used to pay for permanent data storage, but users can also use it as a means of value exchange.

55 million $AR were created in the genesis block at network launch in 2018. Maximum circulation will be 66 million $AR, as more $AR is introduced into circulation in the form of block mining rewards.

How is $AR actually used? In order to write a transaction into a block, a user has to pay some $AR as a transaction fee. Most of the fee goes towards a storage endowment, which is gradually distributed to miner wallets over time.
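As with BTC and satoshis, amounts are typically handled in the smallest unit. One $AR equals 10^12 winston (a figure from Arweave's documentation, not stated above), so conversions are simple arithmetic:

```python
WINSTON_PER_AR = 10**12  # 1 $AR = 1,000,000,000,000 winston (per Arweave docs)

def ar_to_winston(ar: float) -> int:
    return int(ar * WINSTON_PER_AR)

def winston_to_ar(winston: int) -> float:
    return winston / WINSTON_PER_AR

assert ar_to_winston(1) == 1_000_000_000_000
assert winston_to_ar(500_000_000_000) == 0.5
```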

What are Arweave’s use cases and how are they different from other decentralized storage providers?

Arweave's main value proposition is permanent storage for a one-time upfront cost, so using it as an archive makes a lot of sense: once a record is added to the blockweave, it cannot be removed. Additionally, uploaded data is signed by the uploading user, making the origins of anti-social behavior, such as misinformation, highly traceable. However, Arweave is not optimized for changes to data stored on the permaweb, nor is it designed for privacy.

Storj DCS

Another decentralized storage solution is Storj DCS (Decentralized Cloud Storage). Storj DCS is a secure, S3-compatible cloud object storage service for developers at up to 80% reduced cost. Storj DCS focuses on providing a decentralized "pay only for how much you store" secure cloud storage platform. When you use Storj DCS, you get your first 150 GB of storage free and only pay for additional capacity beyond that. For example, storing 1 TB of data using Storj DCS will cost you $4 a month.

To understand how Storj DCS works, we will use the breakdown given in the Storj DCS whitepaper to split the framework into eight individual components. These components are:
  1. Storage Nodes
  2. Peer-to-peer communication and discovery
  3. Redundancy
  4. Metadata
  5. Encryption
  6. Audits and reputation
  7. Data Repair
  8. Payments
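The pricing described above (first 150 GB free, roughly $4 per TB per month, i.e. $0.004 per GB) can be turned into a quick cost estimator. Treating the free allowance as deducted first is our interpretation, not Storj's official billing formula:

```python
FREE_GB = 150         # free storage allowance
PRICE_PER_GB = 0.004  # $4 per TB per month

def monthly_storage_cost(gb_stored: float) -> float:
    """Estimate the monthly bill, assuming the free allowance is used first
    (an interpretation of the pricing above, not an official formula)."""
    billable = max(0.0, gb_stored - FREE_GB)
    return round(billable * PRICE_PER_GB, 2)

assert monthly_storage_cost(100) == 0.0  # within the free 150 GB
assert monthly_storage_cost(650) == 2.0  # 500 billable GB at $0.004/GB
```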
1. Storage Nodes

The role of the storage node is to store and return data within the network. The storage node must also:
  • Provide network bandwidth
  • Have appropriate responsiveness
Storage nodes are selected using an explicitly non-deterministic process, because selection depends on changing variables external to the protocol, such as ping time, geographic location, and a node's history of responding accurately to audits. In return for storing content, nodes are paid by the framework. Each node also acts as its own certificate authority, which requires a public-private key pair and a self-signed certificate. The ID of any node is a hash of the node's public key.

2. Peer-to-peer communication and discovery

Each node in the network can authenticate the identity of any of its peers by validating the certificate chain and hashing the peer's certificate authority's public key. Storj DCS uses a Kademlia distributed hash table (DHT) with a basic decentralized caching service on top. The caching service exists to achieve millisecond-level response times, which are difficult to obtain using an unmodified Kademlia DHT.

3. Redundancy

Storj DCS uses Reed-Solomon erasure coding to implement redundancy for data in the network. Files in Storj DCS are split into segments of a standardized size, which brings numerous advantages, such as spreading bandwidth demands more equally across the network and allowing parallel transfers. Segments can be further split into stripes of data if the network deems it appropriate.

4. Metadata

Most of the metadata stored in Storj DCS takes the form of pointers. Individual components of the network that need certain metadata can perform desired actions as needed by retrieving a pointer from a pointer database. This pointer database is managed by the client using their preferred trusted database, such as MongoDB or SQLite.
5. Encryption

To ensure a user knows if anything (or anyone) has tampered with their data, Storj DCS uses authenticated encryption, with support for both the AES-GCM cipher and the Salsa20/Poly1305 combination.

6. Audits and reputation

To ensure that every node in the network is storing the data it says it is, Storj DCS implements an audit system. Auditors in the network send a challenge to a storage node and expect a valid response. A challenge in this context is a request to the storage node to prove it has the expected data.

7. Data repair

As nodes go offline, backups of the data stored on them must be recreated on other nodes in the network to ensure no data is lost. Storj DCS does this by using the network's cache system to keep track of which nodes have been online recently. If a node goes offline, a lookup in a reverse index within a user's metadata database can occur. Any segment pointer stored on the recently offline node can then be downloaded, reconstructed, and re-uploaded to new nodes on the network, and the pointers in the metadata database updated to reflect the change. This ensures no data is lost as nodes leave the network.

8. Payments

Clients can pay into Storj DCS through any method of payment (the native STORJ token, credit card, invoice, etc.). Payments to storage nodes, however, are made via the Ethereum-based ERC-20 STORJ token. Storage nodes are not paid for the initial transfer of data to the node; they are paid on a month-by-month basis for storing the transferred data.
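The audit exchange in component 6 can be sketched as a simple challenge-response: the auditor sends a fresh random nonce, and the node must return a hash over the nonce plus the stored data, which it could not compute without actually holding that data. This is a simplified illustration of the idea, not Storj's actual audit protocol (which audits erasure-coded pieces):

```python
import hashlib
import os

def storage_proof(nonce: bytes, data: bytes) -> str:
    """A node proves possession by hashing a fresh nonce with the stored data."""
    return hashlib.sha256(nonce + data).hexdigest()

segment = b"some stored segment bytes"
nonce = os.urandom(16)  # fresh per audit, so old answers cannot be replayed

node_answer = storage_proof(nonce, segment)  # honest node: still has the data
expected = storage_proof(nonce, segment)     # what the auditor expects to see
assert node_answer == expected

# A node that dropped the segment cannot produce the right answer.
cheater_answer = storage_proof(nonce, b"wrong data")
assert cheater_answer != expected
```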


IPFS, Arweave, and Storj DCS are only a few of the options in the exciting new field of decentralized storage, and we hope you learned something new reading this Skiff Page. In fact, Skiff allows you to enable decentralized storage on IPFS, which will store your Skiff Drive files and the content inside Skiff Pages on IPFS. If you enjoyed this content, follow us on Twitter or join our Discord community to continue the discussion and learn more!
