Web3's Ultimate Sources of Truth
November 28, 2021
by Raunak Singh
Oracles were the unnoticed power source behind the extreme growth of web3 applications in the past couple of years. They've solved problems that most opponents of blockchain adoption thought were unsolvable a few years ago, and they've enabled the following explosive trends:
- 88x growth in DeFi in a year. DeFi generally relies on oracles for price feeds.
- Growing NFT gaming to a $2.5 billion industry. NFT games often use oracles for verifiable randomness. Oracle randomness will likely power The Metaverse as well.
- $83 billion in secured assets through Chainlink alone, currently the most popular oracle platform.
Figure 1. Oracles enable blockchain interactions with the rest of the web.
And yet, even given their enormous usage over the past few years, oracles still have a long way to go before realizing their full potential as the ultimate sources of truth. These limitations give important context in designing web3 solutions. Here are three powerful designs for how oracles can take web3 to another level through securely serving valuable data to blockchains.
What Can't Blockchains Do?
Before discussing the solutions themselves, we should first have a clear idea of the problems they solve. You can skip this section if you are already familiar with oracles. If not, here's where oracles fit into the blockchain landscape.
Why Blockchain Code Needs to be Deterministic
At its core, a blockchain is a network of nodes that maintain consensus on the network's state.
A node verifies the current state of the network by checking the history of all transactions (i.e. network state-changes). Nodes need to agree on the network's current state to process new blockchain transactions. A blockchain where nodes can't agree on the current state is similar to a government shutdown in the US - all functioning ceases until key decisions are made.
A network becomes more secure as more nodes join, since there are more nodes independently verifying the transaction history. Joining a network should be seamless to incentivize as many nodes to join as possible. All computation that is done on blockchains needs to be deterministic to minimize the chances of a network not reaching consensus and to reduce friction in node onboarding.
This reliance on purely deterministic computation severely limits the capabilities of standalone blockchains. Determinism prevents blockchains from directly integrating with external data sources or producing randomness. Any developer living in the current decade knows that any platform that can't integrate with outside data sources is hardly useful. This is a big reason why the current state of web3 is just the tip of the iceberg, despite cryptocurrencies already holding trillions of dollars in market cap.
A concrete example might be useful to accompany this explanation. Let's say we want to build an on-chain asset trading smart contract that executes trades based on real-time price data from an external API. Integrating the smart contract directly with external API calls would introduce non-deterministic code to the blockchain. If we did this, we couldn't guarantee that nodes would get the same response for each API call when calculating the next state from a given transaction. This would leave us relying on a blockchain network that can't guarantee consensus.
This is a severe limitation. To remedy this, we can first convert API data into deterministic transactions before submitting it to the smart contract. We can introduce another component, known as an oracle, to do this for us.
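To make the difference concrete, here is a minimal Python sketch rather than real smart contract code. The prices, the `PriceUpdate` type, and both function names are hypothetical. It assumes three nodes that each call the price API themselves, and compares that with three nodes replaying one oracle-submitted transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceUpdate:
    """Hypothetical oracle transaction: the price is part of the recorded payload."""
    asset: str
    price_cents: int


def next_state_direct(balance_cents: int, observed_price_cents: int) -> int:
    # Non-deterministic path: each node uses whatever its own API call returned.
    return balance_cents - observed_price_cents


def next_state_from_tx(balance_cents: int, tx: PriceUpdate) -> int:
    # Deterministic path: every node reads the same price from the transaction.
    return balance_cents - tx.price_cents


# Assumed API responses seen by three nodes calling moments apart.
observed = [10_000, 10_002, 9_999]
direct_states = {next_state_direct(50_000, price) for price in observed}

# One oracle transaction, replayed by the same three nodes.
tx = PriceUpdate(asset="ETH", price_cents=10_000)
replayed_states = {next_state_from_tx(50_000, tx) for _ in observed}

print(len(direct_states), "distinct states when nodes call the API directly")
print(len(replayed_states), "distinct state when nodes replay the oracle transaction")
```

Amounts are kept as integer cents so that the arithmetic itself is identical on every machine.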
It's often important to make a distinction between on-chain and off-chain computation. On-chain code is code run on the blockchain itself, through transactions or smart contracts. Off-chain code is code run on traditional non-blockchain machines. Gas fees make on-chain code expensive to run. As previously discussed, on-chain code needs to be deterministic. We use our oracle to integrate off-chain code (the external API) with on-chain code (our trading smart contract).
The Oracle Problem
It's easy to see why oracles have such a grandiose name. We're trusting our oracle to be accurate, available, and tamper-resistant enough to serve data that can actuate irreversible events on the blockchain. There are many ways this could go horribly wrong.
Moreover, true crypto-heads would object to relying on a centralized component in a decentralized system. Similar to how a chain is only as strong as its weakest link, a network is only as decentralized as its most centralized component. Relying on a central oracle to be the source of truth means our whole system is centralized. If that were what we truly wanted, we wouldn't be using decentralized blockchains in the first place.
To sum up, we need to use something that is both decentralized and secure to convert off-chain data to on-chain data. Luckily, there are three proposed oracle solutions that might give us what we need. These solutions vary in their degree of centralization, their costs, and their support for off-chain data processing.
1. Blockchain Nodes as Oracles
Instead of relying on blockchain nodes solely for computing deterministic states, what if nodes served as oracles themselves? This option minimizes the centralization and security risks that come from introducing additional network components.
Blockchain Nodes as Oracles Design
We can rely on the nodes that are part of the network at the time a transaction is sent to fetch the API data. Nodes would filter fetched data based on a simple consensus rule (e.g. the result of a majority vote from the nodes, or no data if a majority vote can't be established), and then store the filtered data on-chain. Nodes that join the network at a later time can access the fetched data on-chain so they aren't required to fetch previously-processed requests.
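The consensus rule described above can be sketched in a few lines of Python. This is only an illustration of a strict-majority filter. The function name `majority_value` is hypothetical, and reports are assumed to be exact, comparable values (for example, integer prices).

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_value(reports: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by more than half of the nodes, else None."""
    if not reports:
        return None
    value, count = Counter(reports).most_common(1)[0]
    # A strict majority is required; ties and pluralities yield no data.
    return value if count * 2 > len(reports) else None


print(majority_value([101, 101, 99]))   # two of three nodes agree
print(majority_value([101, 100, 99]))   # no majority, so no data is stored
```

Requiring exact agreement is a simplification. Feeds that change between calls would likely need rounding or a tolerance before voting.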
But what about API calls that time out? What if someone attacked the blockchain network by maliciously requesting an API that didn't return responses in a timely manner? Nodes can enforce a time limit on external data calls to remedy these problems.
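One way a node could enforce such a time limit is sketched below with Python's standard `concurrent.futures` module. The helper `fetch_with_deadline` and the stand-in `slow_api` function are hypothetical, and a missed deadline is reported as "no data" (`None`).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def fetch_with_deadline(fetch, timeout_s: float):
    """Run fetch() in a worker thread; return None if it misses the deadline."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None
    finally:
        # Do not wait for a slow call to finish before moving on.
        pool.shutdown(wait=False)


def slow_api():
    # Stand-in for an endpoint that responds too slowly.
    time.sleep(0.3)
    return 100


print(fetch_with_deadline(lambda: 100, timeout_s=1.0))
print(fetch_with_deadline(slow_api, timeout_s=0.05))
```

This sketch handles only timeouts. Errors raised by the call itself would still propagate and would need their own "no data" handling.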
Even with time limits on requests, nodes can still be bribed. In our trading smart contract example, an attacker is incentivized to bribe nodes to report false data if reporting that data is sufficiently profitable for them. To mitigate this incentive, we can randomly select a subset of all nodes in the network to make the API calls. We can set the likelihood of a node being selected based on its history of trustworthiness (i.e., how often it has agreed with the majority of nodes in the past). This would make node-bribing much more expensive.
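Here is a rough sketch of trust-weighted selection. The scoring formula is an assumption, not something a particular blockchain prescribes: it smooths the agreement rate so new nodes still have a chance of being picked. Sampling without replacement uses the weighted random keys technique (draw u in [0, 1), rank by u^(1/w)). All names are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def trust_score(agreed: int, total: int) -> float:
    # Assumed scoring: smoothed share of past rounds where the node matched the majority.
    return (agreed + 1) / (total + 2)


def select_reporters(history: Dict[str, Tuple[int, int]], k: int,
                     rng: random.Random) -> List[str]:
    """Pick k distinct nodes, favoring those with higher trust scores."""
    keyed = []
    for node_id, (agreed, total) in history.items():
        weight = trust_score(agreed, total)
        # Weighted random key: larger weights tend to produce larger keys.
        keyed.append((rng.random() ** (1.0 / weight), node_id))
    keyed.sort(reverse=True)
    return [node_id for _, node_id in keyed[:k]]


history = {"node-a": (99, 100), "node-b": (1, 100), "node-c": (50, 100)}
print(select_reporters(history, k=2, rng=random.Random()))
```

In a real network the randomness itself would have to come from a source that nodes cannot predict or bias, which this sketch does not address.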
This approach works efficiently for some specific use cases, but it suffers from the limitations of linking oracles with a specific blockchain. To start, we'd be limiting ourselves to blockchains that implement this protocol. If we wanted to access data in a blockchain that doesn't support oracles as nodes, we'd need to use still-nascent inter-blockchain protocols and would be relying on another blockchain to get the data.
Figure 2. Using this method for a blockchain that doesn't support this protocol imposes reliance on a blockchain that does. A user still might prefer to interact with Blockchain B despite a lack of oracle nodes if Blockchain B is more popular, cheaper, or more secure than Blockchain A.
Requiring a randomly selected node to establish a connection with a data source makes it harder to access endpoints that require authentication or authorization. Thus, we'd be limiting ourselves to publicly available API endpoints. Lastly, committing to a single blockchain means we can't pay data providers off-chain for our requests (unless they somehow accept cryptocurrency). Most importantly, this blockchain-centered approach doesn't do anything to solve the pressing data scarcity problem that is currently plaguing the blockchain landscape.
2. Data Providers as Oracles
Aside from finance, there aren't a lot of data providers that currently operate oracles themselves. If crypto is already so big, why are so few data providers invested in making their data available on-chain? Wouldn't data providers want a piece of the trillion-dollar industry?
Let's think from a data provider's perspective for a second. Running an oracle for proprietary data would be costly. It would require hiring an expensive blockchain dev. Moreover, there's uncertainty in how many people would actually use the provided data on-chain. Even if a data provider were sure there would be enough users to make it worthwhile to operate their own oracle, there's still not a good way to accept payments for their services. If the data provider used the previously discussed solution for their oracle, they'd be stuck with accepting payments in cryptocurrencies, opening up a legal and operational can of worms. This is why only data providers in the finance industry are currently operating oracles.
If we want more than just financial data available to blockchains, we should design oracles around data providers. Can we design an oracle system that makes it more worthwhile for data providers to run their own oracles?
Data Providers as Oracles Design
We can minimize operating costs for data provider oracles by designing an automated, serverless oracle. This oracle can just be a simple script that conducts the following process on a loop:
- Look for on-chain data requests. This can be done without gas fees since reading from blockchains doesn't require gas.
- Group similar data requests to avoid serving redundant API requests.
- Respond to data requests by submitting on-chain transactions of returned data.
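A single pass of this loop could look roughly like the sketch below. It is an in-memory illustration: `serve_pending` is a hypothetical name, requests are assumed to be `(request_id, query)` pairs already read from the chain, and the returned dictionary stands in for the transactions the oracle would submit.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def serve_pending(requests: Iterable[Tuple[str, str]],
                  fetch: Callable[[str], object]) -> Dict[str, object]:
    """Answer every pending request while calling the API once per distinct query."""
    # Group request ids by the query they ask for.
    groups = defaultdict(list)
    for request_id, query in requests:
        groups[query].append(request_id)

    # One API call per group; the result is fanned out to each request id.
    responses = {}
    for query, request_ids in groups.items():
        value = fetch(query)
        for request_id in request_ids:
            responses[request_id] = value
    return responses


pending = [("req-1", "ETH/USD"), ("req-2", "BTC/USD"), ("req-3", "ETH/USD")]
print(serve_pending(pending, fetch=lambda query: f"price for {query}"))
```

Here three requests trigger only two API calls, since two of them ask for the same pair.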
Those familiar with blockchain will know that a naive implementation of the last step would require data providers to purchase cryptocurrencies to fund on-chain gas fees. To relieve data providers of this, we can require the users to fund gas fees. Users would submit gas fees to wallets managed by the data provider oracle, similar to how cryptocurrency exchanges manage their users' wallets.
But wouldn't it be hard for data providers to scale up support for multiple blockchains if they manage their own oracle? Nope. That's one of the few areas where blockchains make this integration easy. Web3 libraries generally use modular application binary interfaces to send blockchain transactions. This means that we can easily use the same script with different blockchain nodes.
Being freed from committing to a single blockchain opens up more possibilities. For example, we could use a Decentralized Autonomous Organization to incentivize off-chain fiat payments to pay data providers for their services. This would relieve data providers from having to work with cryptocurrencies altogether.
Figure 3. This design makes it much simpler for Oracles to support any blockchain. Unlike the design in Figure 2, we don't have to rely on Blockchain A to interface with the API.
It's important to note that there's a danger of centralization if smart contracts only use a single data provider. To remedy this, smart contracts should aggregate multiple data providers. Data aggregation should be done with any oracle implementation, though it's most relevant here since there is no middleman doing the aggregation.
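As an illustration of aggregation, the sketch below takes the median of several providers' reports. The median is an assumed choice, picked because a single wildly wrong report cannot move it far. The provider names, the `aggregate` function, and the minimum of three providers are all hypothetical, and a real contract would implement this on-chain rather than in Python.

```python
from statistics import median
from typing import Dict


def aggregate(reports: Dict[str, float], min_providers: int = 3) -> float:
    """Combine per-provider reports into one value, refusing thin data."""
    if len(reports) < min_providers:
        raise ValueError("not enough independent providers reported")
    return median(reports.values())


reports = {"provider-a": 100, "provider-b": 101, "provider-c": 5000}
print(aggregate(reports))  # the outlier from provider-c does not dominate
```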
Lightweight data provider-run oracles are a much more robust solution than relying on blockchain nodes as oracles. They're a great fit for our example project where we simply need to integrate a smart contract with the external API. But they still don't quite have the capabilities of a modern data service. What if we wanted to expand our example project to do more than simply fetch off-chain data? What if we wanted to use a third-party service that aggregates and cleans data? Or maybe one that does fancy machine learning on trading data before it's submitted on-chain?
3. Decentralized Oracle Networks
We'll need an even more robust solution if we want to trust others to do arbitrary data transformations before submitting data on-chain. For reasons previously discussed, we can't trust a centralized source to do this. We need a whole decentralized network of oracles.
Decentralized Oracle Network Design
In a decentralized oracle network, all oracles in the network redundantly serve computation requests. Each oracle in the network queries data providers for data, runs its computations, and then submits the results on-chain. Paying for a whole network to do this redundantly isn't a cost-efficient solution. This design trades cost-efficiency for more capability.
Figure 4. Oracle Networks make room for running arbitrary computations on API data.
How do we ensure that oracles are processing data in good faith? What prevents oracles from just outputting arbitrary data if we're allowing them to do arbitrary transformations? This is where we can utilize data signing.
Data signing works by requiring oracles to send certificates issued by the data provider to ensure data hasn't been changed. If the oracle changes the data provider's data in any way, the certificate becomes invalid. Smart contracts can verify certificates on-chain in real time to ensure end-to-end security before using the data. On their own, though, certificates aren't enough for decentralized oracle networks: we're using the oracle network to transform data, and a certificate over the raw data no longer matches the transformed output.
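The tamper check can be illustrated with a short sketch. Note the loud assumption: it uses an HMAC from Python's standard library as a stand-in for the provider's certificate, which requires the verifier to share a secret key. A real deployment would use a public-key signature so that a contract can verify without holding any secret. The key and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical demo key; a stand-in for the data provider's signing key.
PROVIDER_KEY = b"demo-key-not-for-production"


def certify(data: bytes, key: bytes = PROVIDER_KEY) -> bytes:
    """Provider side: produce a tag bound to the exact bytes of the data."""
    return hmac.new(key, data, hashlib.sha256).digest()


def is_untampered(data: bytes, tag: bytes, key: bytes = PROVIDER_KEY) -> bool:
    """Verifier side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(certify(data, key), tag)


original = b"ETH/USD=4000"
tag = certify(original)

print(is_untampered(original, tag))          # data relayed unchanged
print(is_untampered(b"ETH/USD=4001", tag))   # oracle altered the data
```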
Luckily, data signing can also be done on functions of data. This is immensely useful when working with confidential or aggregated data. For example, we could use functional signatures to verify a person meets a given age requirement without knowing their actual age. We can define a function f(x) to return true if the age (represented as the input x) is greater than the threshold, and false otherwise. The oracle would query the data provider for f(x) directly, and the smart contract would check that the certificate is valid for the oracle-provided value of f(x).
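The age example might be sketched as follows. This follows the simplified flow described above (the provider evaluates f(x) and certifies only the result) and is not an implementation of a formal functional signature scheme. As a labeled assumption, an HMAC again stands in for the provider's certificate, and every name here is hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical demo key standing in for the provider's signing key.
PROVIDER_KEY = b"demo-key-not-for-production"


def _tag(payload: dict) -> str:
    message = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(PROVIDER_KEY, message, hashlib.sha256).hexdigest()


def provider_answer(age: int, threshold: int) -> dict:
    """Provider evaluates f(x) = (age > threshold) and certifies only the result."""
    payload = {"function": "age_over", "threshold": threshold,
               "result": age > threshold}
    return {"payload": payload, "tag": _tag(payload)}


def contract_accepts(answer: dict, threshold: int) -> bool:
    """Contract checks the certificate and the claimed result, never the age."""
    payload = answer["payload"]
    return (hmac.compare_digest(answer["tag"], _tag(payload))
            and payload["function"] == "age_over"
            and payload["threshold"] == threshold
            and payload["result"] is True)


answer = provider_answer(age=34, threshold=18)
print(answer["payload"])              # contains no age, only the result of f(x)
print(contract_accepts(answer, 18))
```

An oracle that flips `result` after the fact would invalidate the tag, so the contract would reject the answer.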
In our example project, we could use functional signatures to sign aggregated data across multiple data providers. We can define f(x) to return the aggregated value if valid certificates from other data sources are also submitted with the query, and false otherwise. This would ensure that the oracle is truthfully aggregating data before we use it on-chain.
Functional signing requires coordination with the data provider to implement. This imposes extra demand on data providers, which isn't great for the reasons discussed in the previous section. The high cost and specialized skills needed to run each oracle node are another downside to this design. High barriers to running oracle nodes mean fewer independent entities running them. This could potentially result in a centralized network.
Additionally, all participants in the decentralized oracle network need to be incentivized to prevent bribing. The public nature of blockchains means it would be trivial to implement a smart contract that automatically pays colluding oracles based on the data values they submit publicly on-chain. The costs of oracle incentivization scale with data importance. Extremely important data would be extremely expensive to secure in an oracle network.
Looking Forward
New use cases will continue to arise as the capabilities of web3 expand through oracle adoption. It's crucial to keep an eye on oracles to be aware of key blockchain trends. Oracles may very well play a pivotal role in the future of the web3 landscape.