Jason Ginsberg / 6.28.2022
What's decentralized storage, and how is it part of Web3?
A deep analysis of the decentralized networks IPFS, Arweave, and Storj - and how they're facilitating Web3 apps.
Skiff recently announced a collaboration with Protocol Labs, the team behind the Interplanetary File System (IPFS), to build fully decentralized storage into Skiff. In the short time since the release of Skiff’s IPFS integration, many of our users have already chosen to store the contents of their end-to-end encrypted documents using IPFS. Given the amount of interest we’ve seen from users, we wrote up this wiki to give a technical breakdown of IPFS and some of its peers in the world of decentralized storage.
IPFS
IPFS, the Interplanetary File System, is a peer-to-peer storage network that lets you find, store, and share files. Three characteristics set it apart from existing file systems:
- Unique Identification via content addressing
- Content linking via directed acyclic graphs (DAGs)
- Content discovery via distributed hash tables (DHTs)
Unique Identification via content addressing
Currently, when you want to find content on the internet, you have to know where that content is stored. If, for example, you wanted to find the Wikipedia entry for “End-to-end encryption,” you would have to know the address on the internet where that page is stored. (That address is https://en.wikipedia.org/wiki/End-to-end_encryption.) Identifying content by where it's stored is called location addressing.
IPFS, on the other hand, uses content addressing. Instead of asking where content is stored, IPFS asks what content is being requested.
It does this by giving every piece of content on the IPFS protocol a content identifier, or CID. A CID is built from a cryptographic hash of the content, so it is effectively unique to the piece of content it was derived from.
Many distributed systems identify content by hash, but their underlying data structures are not necessarily interoperable. To solve this problem, IPFS makes use of the Interplanetary Linked Data (IPLD) project. IPLD provides libraries for combining pluggable modules (parsers for each possible type of IPLD node) to resolve a path, selector, or query across linked nodes. This allows you to explore data regardless of the underlying protocol.
Content linking via directed acyclic graphs (DAGs)
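The DAGs in this section are built on the content identifiers introduced above, so a small illustration of content addressing may help first. The sketch below is a deliberate simplification: it treats a bare SHA-256 hex digest as the identifier, whereas a real CID also encodes a version, a content type, and the hash function used. The `put` and `get` helpers and the in-memory `store` are hypothetical names for illustration, not IPFS APIs.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Simplified stand-in for a CID: the hex SHA-256 digest of the data."""
    return hashlib.sha256(data).hexdigest()

# A toy content-addressed store: each key is derived from its own value.
store = {}

def put(data: bytes) -> str:
    cid = content_id(data)
    store[cid] = data
    return cid

def get(cid: str) -> bytes:
    data = store[cid]
    # Whoever receives the bytes can check them against the identifier.
    if content_id(data) != cid:
        raise ValueError("content does not match its identifier")
    return data

cid_a = put(b"End-to-end encryption")
cid_b = put(b"End-to-end encryption")   # same content, same identifier
cid_c = put(b"End-to-end encryption!")  # one extra byte, different identifier
```

Because the identifier depends only on the bytes, it does not matter which machine stores them, which is the contrast with location addressing drawn above.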
A directed acyclic graph, or DAG, is a directed graph with no directed cycles. This means that a closed loop can never occur in a DAG. In the case of IPFS, the exact type of DAG used is called a Merkle DAG. In addition to all the attributes of a DAG, a Merkle DAG also contains, within each node, a unique identifier that is a hash of the node's content. The Merkle DAGs used in IPFS are optimized for representing directories and files.
For storage, IPFS may split content into blocks. These blocks can be stored in different locations and authenticated quickly. This is similar to how BitTorrent allows you to fetch a file from multiple peers at once.
To bring everything together: if you have a folder stored in IPFS, the CID of that folder is a hash derived from the folder’s content. Each file in the folder also has its own CID, derived from the contents of that file. The contents of any individual file can in turn be split into blocks, each block having its own CID.
One of the major benefits of storing content in this manner is that any two similar files can reference the same underlying blocks. If you need to update the contents of a file, you only have to store the specific blocks that have been altered and can keep the references to unaltered blocks. This makes dealing with large amounts of data much more efficient than if you had to change or re-create the entire file structure each time an edit was made or new content was added.
Distributed hash tables (DHTs)
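Before turning to discovery, the block splitting and linking described in the previous section can be sketched in a few lines. This is a toy model under stated assumptions: fixed 4-byte blocks, JSON-encoded nodes, and plain SHA-256 digests as identifiers. The names `add_file`, `add_folder`, and `blocks` are hypothetical, and real IPFS chunking and node encoding differ.

```python
import hashlib
import json

BLOCK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow

blocks = {}  # identifier -> raw bytes; stands in for block storage

def put_block(data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    blocks[digest] = data  # storing the same bytes twice is a no-op
    return digest

def add_file(content: bytes) -> str:
    """Split content into blocks and store a node that links to them."""
    links = [put_block(content[i:i + BLOCK_SIZE])
             for i in range(0, len(content), BLOCK_SIZE)]
    node = json.dumps({"type": "file", "links": links}).encode()
    return put_block(node)

def add_folder(files: dict) -> str:
    """Store a folder node that maps file names to file identifiers."""
    entries = {name: add_file(data) for name, data in sorted(files.items())}
    node = json.dumps({"type": "folder", "entries": entries}, sort_keys=True).encode()
    return put_block(node)

def file_links(file_id: str) -> list:
    return json.loads(blocks[file_id])["links"]

v1 = add_file(b"AAAABBBBCCCC")
v2 = add_file(b"AAAAXXXXCCCC")  # only the middle block changed
shared = set(file_links(v1)) & set(file_links(v2))  # the two unchanged blocks
```

In this sketch the second version adds only one new data block and one new file node; the unchanged blocks are referenced again rather than stored again.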
When looking for content held by its peers, IPFS makes use of a distributed hash table. A hash table is a data structure that stores key-value pairs. In a distributed hash table, the data structure is spread across the peer nodes in a given distributed network.
The libp2p project is the part of IPFS that handles the distributed hash table and all the communication among peer nodes.
When retrieving content using IPFS, nodes use libp2p to query the distributed hash table twice: first to find which peers in the network are storing particular blocks, and then to find the current location of those peers in the network.
Once a node has queried the DHT to find out which peers have the blocks it wants and where those peers currently are in the network, IPFS uses a module called Bitswap to connect to peers and exchange blocks. When requesting blocks, a node connects to a chosen peer and sends a wantlist, which is a list of desired blocks. Once the desired blocks have been received, each one can be verified by hashing its content and comparing the result to the expected CID.
Protocol Labs, the makers of the IPFS protocol, are also the creators of the complementary Filecoin protocol. While IPFS allows peers in a network to store, transfer and retrieve data from one another, Filecoin is designed as a system to incentivize persistent data storage.
Filecoin allows clients to pay to store data at various levels of availability and redundancy. Storage providers are paid in Filecoin not only to continuously store the data but also to cryptographically prove they are storing the data they say they are.
IPFS and Filecoin are complementary protocols: you can use them together, on their own, or in conjunction with other protocols.
Arweave
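For comparison with what follows, here is a rough sketch of the IPFS retrieval flow described above: two lookups, then a block exchange with verification. It is a single-process simulation under stated assumptions: plain dictionaries stand in for the DHT, `Peer`, `announce`, and `fetch` are hypothetical names, and none of this is the libp2p or Bitswap API.

```python
import hashlib

def cid_of(data: bytes) -> str:
    # Simplified identifier: a bare SHA-256 hex digest.
    return hashlib.sha256(data).hexdigest()

class Peer:
    def __init__(self, peer_id, address):
        self.peer_id = peer_id
        self.address = address
        self.blocks = {}

    def serve(self, wantlist):
        """Return whichever of the wanted blocks this peer holds."""
        return {cid: self.blocks[cid] for cid in wantlist if cid in self.blocks}

# Plain dicts stand in for the DHT; in a real network these records
# are spread across many nodes.
provider_records = {}  # identifier -> set of peer IDs holding the block
peer_records = {}      # peer ID -> current network address
network = {}           # address -> Peer (stands in for opening a connection)

def announce(peer, data):
    cid = cid_of(data)
    peer.blocks[cid] = data
    provider_records.setdefault(cid, set()).add(peer.peer_id)
    peer_records[peer.peer_id] = peer.address
    network[peer.address] = peer
    return cid

def fetch(wantlist):
    """Find providers, find their addresses, then exchange and verify blocks."""
    received = {}
    for cid in wantlist:
        for peer_id in sorted(provider_records.get(cid, ())):  # lookup 1: who has it?
            address = peer_records[peer_id]                     # lookup 2: where are they?
            reply = network[address].serve([cid])               # one-item wantlist
            data = reply.get(cid)
            # Re-hash the bytes and keep them only if they match the identifier.
            if data is not None and cid_of(data) == cid:
                received[cid] = data
                break
    return received

honest = Peer("peer-b", "203.0.113.5:4001")
cid = announce(honest, b"hello ipfs")

tamperer = Peer("peer-a", "203.0.113.9:4001")
announce(tamperer, b"unrelated")            # registers the peer in the records
tamperer.blocks[cid] = b"tampered bytes"    # claims the block but holds wrong bytes
provider_records[cid].add(tamperer.peer_id)

result = fetch([cid])  # the tampered reply is rejected, the honest one accepted
```

The point of the final check is that the requester does not need to trust any individual peer: bytes that do not hash to the requested identifier are simply discarded.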
Arweave is another distributed storage solution that takes a different approach to incentives and permanence.
One of the main differentiating factors between Arweave and IPFS is that the Arweave protocol promises permanent storage through the creation of what it calls the permaweb.
What makes Arweave’s storage so permanent? The difference lies in the protocol’s incentive structure. Using Arweave, an end user can theoretically pay just once to store data forever.
How does Arweave work?
Arweave uses four core technologies to deliver a low-cost, high-throughput, permanent storage solution:
- Blockweave
- Proof of Access
- Wildfire
- Blockshadows
Blockweave
The Blockweave is different from most blockchains in that it doesn’t require every block to be on hand in order to validate a transaction. With Arweave, nodes do not have to have possession of the whole chain. This is possible because nodes can still fulfill network functions by having a block hash list and a wallet list. The block hash list contains the hashes of all previous blocks. This allows for old blocks to be verified and for new blocks to be quickly evaluated. The wallet list contains all the active wallets in the system. By introducing these two concepts, transactions can be verified without possession of the most recent block.
Additionally, miners do not need to verify the entire blockchain from genesis to present. Instead, they use a system of ‘on-going verification’. Miners verify that the transaction has been signed by the wallet owner’s private key.
Proof of Access
The Arweave protocol uses Proof of Access and Proof of Work as its consensus mechanisms. For miners to mine or verify a new block, their mining node needs access to that block’s recall block. The recall block is a historical block that is selected based on the current block. This proof of access is required as part of block construction, and verifying this proof is how Arweave validates a new block. This requirement inherently incentivizes storage, since miners need access to arbitrary historical blocks in order to receive rewards for mining new blocks. Proof of Access works because it is probabilistic and incentive-driven: the more of the history a miner stores, the more likely it is to hold the recall block when it is needed.
Wildfire
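Looking back at Proof of Access for a moment, the idea of a recall block can be illustrated with a toy model. The selection rule used here (hash of the latest block, modulo the chain height) is an assumption made for the sake of the example, as are the names `recall_index`, `build_proof`, and `verify_proof`; Arweave's actual rule and block format are more involved. Proof of Work is left out entirely.

```python
import hashlib

def block_hash(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()

def recall_index(current_hash: str, height: int) -> int:
    """Pick a historical block deterministically from the current block hash.
    Assumption: hash-mod-height selection, used here only for illustration."""
    return int(current_hash, 16) % height

# Toy chain of ten blocks and the block hash list every node would hold.
chain = [f"block {i}".encode() for i in range(10)]
hash_list = [block_hash(b) for b in chain]

def build_proof(local_blocks: dict, current_hash: str, height: int):
    """A miner can only attach the recall block if it actually stores it."""
    idx = recall_index(current_hash, height)
    return idx, local_blocks.get(idx)

def verify_proof(hash_list, current_hash, height, idx, payload) -> bool:
    """Check the index is the expected one and the payload matches the hash list."""
    return (payload is not None
            and idx == recall_index(current_hash, height)
            and block_hash(payload) == hash_list[idx])

current_hash = hash_list[-1]
height = len(chain)
idx = recall_index(current_hash, height)

full_miner = dict(enumerate(chain))                                   # stores everything
partial_miner = {i: b for i, b in full_miner.items() if i != idx}     # lacks the recall block

ok = verify_proof(hash_list, current_hash, height,
                  *build_proof(full_miner, current_hash, height))
missing = verify_proof(hash_list, current_hash, height,
                       *build_proof(partial_miner, current_hash, height))
```

Note that the verifier only needs the block hash list, not the blocks themselves, which ties back to the Blockweave design described earlier.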
The Arweave protocol also uses a mechanism similar to BitTorrent's, called the Adaptive Interacting Incentive Agent (AIIA) Wildfire.
This creates a sort of “meta-game” on top of the $AR rewards which incentivizes pro-social behavior from miners. Being a responsive node means gaining a higher rank from peers. Less responsive nodes can either improve or continue dropping in rank.
What is this ranking or score? It’s a rolling average of bytes per second over a number of recent requests to that peer.
This allows nodes to choose where to spend their bandwidth with a high probability of accurate and prompt communication. It also avoids sending messages to defunct nodes in the network, enabling efficient communication given the finite bandwidth of nodes.
Blockshadows
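Looking back at Wildfire's ranking for a moment, a rolling average of bytes per second over recent requests could be tracked roughly as follows. The window size of five, the `PeerScore` class, and the `rank_peers` helper are all assumptions for illustration rather than Arweave's implementation.

```python
from collections import deque

class PeerScore:
    """Rolling average of throughput over the most recent requests to one peer."""

    def __init__(self, window=5):
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def record(self, num_bytes, seconds):
        if seconds <= 0:
            raise ValueError("seconds must be positive")
        self.samples.append(num_bytes / seconds)

    @property
    def score(self):
        # A peer with no recorded responses scores zero.
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

def rank_peers(scores):
    """Return peer names ordered from most to least responsive."""
    return sorted(scores, key=lambda name: scores[name].score, reverse=True)

scores = {"fast": PeerScore(), "slow": PeerScore(), "silent": PeerScore()}
scores["fast"].record(1000, 1)
scores["fast"].record(2000, 1)
scores["slow"].record(1000, 10)

ranking = rank_peers(scores)
```

A node using such a ranking would prefer peers near the top when deciding where to send requests, and a peer that stops responding gradually sinks as old samples fall out of the window.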
Blockshadows allow for the reconstruction of full blocks without needing to send each whole block to every node in the network when it’s mined. Since blockshadows are only a few kilobytes in size, transactions can be mined into a block at the same speed that they are distributed around the network. Blockshadows enable the Arweave protocol to support blocks of unlimited size, thus allowing for a network with permanent on-chain storage.
$AR
A winston is to $AR as a satoshi is to $BTC. In other words, a winston is the smallest denomination of $AR.
$AR is a utility token since it is used to pay for permanent data storage. However, users can also use it as a means of value exchange. 55 million $AR were created in the genesis block at network launch in 2018. Maximum circulation will be 66 million $AR as more $AR is introduced into circulation in the form of block mining rewards.
How is $AR actually used? In order to write a transaction into a block, a user has to pay some $AR as a transaction fee. Most of the fee goes towards a storage endowment, which is gradually distributed to miner wallets over time.
What are Arweave’s use cases and how are they different from other decentralized storage providers?
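As a quick aside on the $AR figures above, the denomination and supply arithmetic can be written out. The conversion factor of 10^12 winston per $AR is not stated in this article and is included here as background; the helper names are hypothetical.

```python
from decimal import Decimal

# Background fact not stated in the article: 1 AR = 10**12 winston.
WINSTON_PER_AR = 10**12

def ar_to_winston(ar: str) -> int:
    """Convert an AR amount given as a decimal string to whole winston."""
    return int(Decimal(ar) * WINSTON_PER_AR)

def winston_to_ar(winston: int) -> Decimal:
    return Decimal(winston) / WINSTON_PER_AR

# Supply figures taken from the text above.
GENESIS_SUPPLY_AR = 55_000_000
MAX_SUPPLY_AR = 66_000_000

# The remainder enters circulation as block mining rewards.
mining_rewards_ar = MAX_SUPPLY_AR - GENESIS_SUPPLY_AR
```

Working in whole winston, as with satoshis, avoids floating-point rounding when handling small fees.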
Arweave’s main value proposition is the permanence and upfront cost of its storage, so using it as an archive makes a lot of sense: once a record is added to the blockweave, it cannot be removed. Additionally, uploaded data is signed by the uploading user, which makes the origins of anti-social behavior, such as misinformation, highly traceable. However, Arweave is not optimized for changes to data stored on the permaweb, nor is it designed for privacy.
Storj DCS
Another decentralized storage solution is Storj DCS (Decentralized Cloud Storage). Storj DCS is secure, S3-compatible cloud object storage for developers at up to 80% reduced costs.
Storj DCS focuses on providing a decentralized “pay only for how much you store” secure cloud storage platform. When you use Storj DCS, you get your first 150GB of storage free and only have to pay for any additional storage capacity needed beyond 150GB. For example, storing 1TB of data using Storj DCS will cost you $4 a month.
To understand how Storj DCS works, we will use the breakdown given in the Storj DCS whitepaper to split the framework into eight individual components. These components are:
- Storage Nodes
- Peer-to-peer communication and discovery
- Redundancy
- Metadata
- Encryption
- Audits and reputation
- Data Repair
- Payments
- Provide network bandwidth
- Have appropriate responsiveness
Conclusion
IPFS, Arweave and Storj DCS are only a few of the options in the exciting new field of decentralized storage, and we hope you learned something new reading this Skiff Page. In fact, Skiff allows you to enable decentralized storage on IPFS, which will store your files in Skiff Drive and content inside Skiff Pages on IPFS.
If you enjoyed this content, follow us on Twitter or join our Discord community to continue the discussion and learn more!