Oracles — Importing Trusted Data into a Deterministic EVM - Mastering Ethereum: Building Smart Contracts and DApps

Key Principle

An oracle is "a system that can answer questions that are external to Ethereum": it brings off-chain data on-chain for contracts to consume. Because EVM execution must be totally deterministic to maintain consensus, contracts cannot call out to the web; each node could receive a different answer. Oracles are the forced workaround: trust must be imported through an on-chain bridge. Critically, injecting data on-chain only makes it agreed upon, not trustworthy: "we have just deferred the problem." This is the oracle problem, a direct consequence of consensus design rather than an oversight. [Dated: the source is from 2018. Oraclize has since been renamed Provable, and Chainlink is now the dominant oracle network.]

Why This Matters

Determinism produces two hard limits that make oracles necessary:

No intrinsic randomness. If a true RNG existed, node A might store 3 while node B stores 7 for the same contract, code, and context. The nodes would diverge, consensus would break, and "it would get much worse very quickly, because knock-on effects, including ether transfers, would build up exponentially." Pseudorandomness derived from secure hashes is deterministic and therefore available, but it is insufficient against adversaries: a miner playing an on-chain coin-flip game can win by including their own transaction only in blocks where it wins, because the miner controls inclusion.
Extrinsic data only via transaction payloads. External info (prices, weather, randomness) can only enter as a transaction's data field.
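
The inclusion attack can be made concrete with a small simulation. This is a minimal Python sketch, not contract code: `candidate_block_hash` is a hypothetical stand-in for the hash of the block being built, and the game is assumed to pay out whenever the derived bit is 1.

```python
import hashlib
from typing import Tuple


def coin_flip(block_hash: bytes) -> int:
    """Deterministic 'random' bit: every node derives the same value."""
    return hashlib.sha256(block_hash).digest()[0] & 1


def candidate_block_hash(height: int) -> bytes:
    # Hypothetical stand-in for the hash of the block built at this height.
    return hashlib.sha256(f"block-{height}".encode()).digest()


def play(num_blocks: int, controls_inclusion: bool) -> Tuple[int, int]:
    """Return (bets_placed, bets_won) for a player who always bets on 1."""
    placed = won = 0
    for height in range(num_blocks):
        outcome = coin_flip(candidate_block_hash(height))
        if controls_inclusion and outcome != 1:
            # The block producer leaves its own bet out of losing blocks.
            continue
        placed += 1
        won += outcome
    return placed, won


honest_placed, honest_won = play(200, controls_inclusion=False)
miner_placed, miner_won = play(200, controls_inclusion=True)
print(f"ordinary player: {honest_won}/{honest_placed} bets won")
print(f"block producer:  {miner_won}/{miner_placed} bets won")
```

The hash is perfectly deterministic, so consensus is never at risk; the weakness is that whoever assembles the block can evaluate the outcome before deciding whether the bet appears in it.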

The oracle is a single point of failure whose compromise is irreversible on-chain. The book's smart-will example: "If the inheritance amount controlled by such a contract is high enough, the incentive to hack the oracle and trigger distribution... before the owner dies is very high." The value a contract secures sets the incentive to attack its oracle.

Two distinct trust problems (easy to conflate):

Is the source authoritative? Attestation/subjective data (passports, diplomas, IDs) "cannot be provided trustlessly... as there is no independently verifiable objective truth." No cryptography fixes this; you trust the issuer.
Was the data tampered with in transit? Solved by data authentication (authenticity proofs or TEEs) — but these only prove the carrier didn't alter the data; they say nothing about source honesty.
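
The gap between the two questions can be shown in a few lines. The sketch below is an illustration under loud assumptions: an HMAC with a hypothetical shared key (`CARRIER_KEY`) stands in for an authenticity proof, and the payload strings are invented. Real schemes such as TLSNotary or SGX attestation work differently, but the limitation is the same.

```python
import hashlib
import hmac

# Assumption: a shared key stands in for the carrier's proof mechanism.
CARRIER_KEY = b"hypothetical-shared-key"


def attest(payload: bytes) -> bytes:
    """Tag produced by the carrier over exactly what the source returned."""
    return hmac.new(CARRIER_KEY, payload, hashlib.sha256).digest()


def verify(payload: bytes, tag: bytes) -> bool:
    """True if the payload is byte-for-byte what the carrier attested."""
    return hmac.compare_digest(attest(payload), tag)


# The source itself reports a wrong value; the carrier relays it faithfully.
false_report = b"temperature_c=99"
tag = attest(false_report)

untampered_ok = verify(false_report, tag)        # proof passes on a false value
tampered_ok = verify(b"temperature_c=21", tag)   # any in-transit change is caught
print(untampered_ok, tampered_ok)
```

The proof passes for the false report and fails for the altered one: it speaks only to what happened in transit, never to whether the source was honest.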

Good Examples

Three design patterns (chosen by data size, change frequency, and gas/bandwidth cost — an economics choice):

Immediate-read: for just-in-time decisions ("Is this person over 18?"). The data is read with a call or directly by a client, "without... incurring the gas costs of issuing a transaction." The oracle typically stores a hash rather than the raw data or, for private IDs, only a Merkle root computed over salted entries.
Publish–subscribe: for regularly changing data; a flag signals new data. "Polling is very inefficient in the world of web servers, but not so in the peer-to-peer context of blockchain platforms" since every synced client already tracks all state changes. The pattern reduces bandwidth and storage cost.
Request–response: the most complex pattern, for datasets too large to store on-chain. It is asynchronous: an EOA transacts with the DApp, the DApp sends a request to the oracle contract, the oracle emits an event or state change, an off-chain node performs the query, and the result, "signed by the oracle owner, attesting to the validity of the data at a given time," is returned to the DApp.
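
The asynchronous request–response flow can be sketched as a toy simulation. Everything here is illustrative: the class names, the `ORACLE_KEY`, the "ETH/USD" query, and the placeholder price are hypothetical, and an HMAC stands in for the oracle owner's signature (a real oracle would use a public-key signature so that verifiers never hold the signing key).

```python
import hashlib
import hmac
from dataclasses import dataclass, field

# Assumption: stands in for the oracle owner's signing key.
ORACLE_KEY = b"hypothetical-oracle-signing-key"


def sign(payload: bytes) -> bytes:
    return hmac.new(ORACLE_KEY, payload, hashlib.sha256).digest()


@dataclass
class DApp:
    results: dict = field(default_factory=dict)

    def callback(self, request_id: int, value: str, timestamp: int,
                 signature: bytes) -> None:
        """Accept a result only if it carries the oracle's signature."""
        payload = f"{request_id}|{value}|{timestamp}".encode()
        if not hmac.compare_digest(sign(payload), signature):
            raise ValueError("result not signed by the oracle")
        self.results[request_id] = (value, timestamp)


@dataclass
class OracleContract:
    """On-chain side: records each request as an event for the off-chain node."""
    events: list = field(default_factory=list)
    next_id: int = 0

    def request(self, dapp: DApp, query: str) -> int:
        request_id = self.next_id
        self.next_id += 1
        self.events.append((request_id, dapp, query))
        return request_id


def off_chain_node(oracle: OracleContract, data_source: dict, now: int) -> None:
    """Drain pending events, query the source, return signed results."""
    while oracle.events:
        request_id, dapp, query = oracle.events.pop(0)
        value = data_source[query]
        payload = f"{request_id}|{value}|{now}".encode()
        dapp.callback(request_id, value, now, sign(payload))


oracle = OracleContract()
dapp = DApp()
rid = oracle.request(dapp, "ETH/USD")    # EOA -> DApp -> oracle request
pending_before = dict(dapp.results)      # nothing is available synchronously
off_chain_node(oracle, {"ETH/USD": "hypothetical-price"}, now=1_700_000_000)
print(pending_before, dapp.results)
```

The request returns only an identifier; the answer arrives later through `callback`, which is why client code in this pattern has to be written around a deferred result rather than a return value.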

Data authentication:

Oraclize + TLSNotary proof (via PageSigner): TLS master key split between server, auditee, auditor; the auditor runs on an AWS VM verifiable as unmodified — but "requires the assumption that Amazon itself will not tamper with the VM instance."
Town Crier — TEE feed on Intel SGX: integrity, confidentiality, attestation. Trust assumption: "assuming that we trust Intel/SGX."

Computation oracles (off-chain compute, "especially useful given Ethereum's inherent block gas limit"):

TrueBit — market of incentivized solvers and verifiers; a challenged result triggers an on-chain "verification game" recursing to a trivial round where Ethereum miners rule. "In theory, this enables trustless smart contracts to securely perform any computation task."
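
The idea of recursing down to a trivial round can be illustrated with a bisection over execution traces. This is a heavily simplified sketch and not TrueBit's actual protocol: the `step` function, the trace length, and the corruption index are hypothetical, and incentives, deposits, and timeouts are omitted entirely.

```python
MODULUS = 1_000_003


def step(state: int) -> int:
    """One cheap step of a (hypothetical) long off-chain computation."""
    return (state * 3 + 1) % MODULUS


def run(initial: int, steps: int, corrupt_at=None) -> list:
    """Return the full trace, optionally injecting one wrong step."""
    trace = [initial]
    for i in range(steps):
        nxt = step(trace[-1])
        if i == corrupt_at:
            nxt = (nxt + 1) % MODULUS  # a dishonest party's error
        trace.append(nxt)
    return trace


def verification_game(solver_trace: list, challenger_trace: list):
    """Bisect to the first disputed step, then judge only that one step."""
    # Invariant: the traces agree at index lo and disagree at index hi.
    lo, hi = 0, len(solver_trace) - 1
    rounds = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if solver_trace[mid] == challenger_trace[mid]:
            lo = mid
        else:
            hi = mid
        rounds += 1
    # The only computation the on-chain "judge" ever performs.
    judged = step(solver_trace[lo])
    solver_honest = judged == solver_trace[hi]
    return lo, rounds, solver_honest


honest = run(7, 1024)
cheater = run(7, 1024, corrupt_at=600)

# A dishonest solver challenged by an honest verifier.
last_agreed, rounds, solver_honest = verification_game(cheater, honest)
# An honest solver challenged by a dishonest verifier.
_, _, honest_solver_upheld = verification_game(honest, cheater)
print(last_agreed, rounds, solver_honest, honest_solver_upheld)
```

A dispute over 1,024 steps is settled in 10 rounds of bisection plus a single on-chain step, which is how a computation far larger than the block gas limit could still be adjudicated on-chain.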

Decentralized oracles (remove the single point of failure):

ChainLink — three contracts: reputation, order-matching (selects bids by reputation, finalizes an SLA), aggregation (collects responses via a commit–reveal scheme). "The hard part: the formulation of the aggregation function." Outlier-rejection "risks penalizing correct answers over average ones."
SchellingCoin — reporters submit values; the median is truth; deposits are "redistributed in favor of values closer to the median," converging on the Schelling point.
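
A commit–reveal round followed by median aggregation can be sketched as follows. The payout rule here (an equal split among reporters within a fixed `tolerance` of the median) is an illustrative assumption, not the rule of either system, and the reporter names, salts, and values are invented.

```python
import hashlib
from statistics import median


def commit(value: int, salt: bytes) -> bytes:
    """Commitment published first, hiding the value until the reveal phase."""
    return hashlib.sha256(salt + str(value).encode()).digest()


def settle(commitments: dict, reveals: dict, deposit: int, tolerance: int):
    """Return (median_value, payouts); assumes at least one valid reveal."""
    # Keep only reveals that match the commitment made earlier.
    valid = {
        who: value
        for who, (value, salt) in reveals.items()
        if commitments.get(who) == commit(value, salt)
    }
    truth = median(valid.values())
    winners = [who for who, v in valid.items() if abs(v - truth) <= tolerance]
    if not winners:
        # Nobody is close enough to the median: refund all deposits.
        return truth, {who: deposit for who in commitments}
    pot = deposit * len(commitments)
    share = pot // len(winners)
    payouts = {who: (share if who in winners else 0) for who in commitments}
    return truth, payouts


reports = {
    "alice": (100, b"s1"),
    "bob": (101, b"s2"),
    "carol": (99, b"s3"),
    "mallory": (500, b"s4"),
}
commitments = {who: commit(v, s) for who, (v, s) in reports.items()}
truth, payouts = settle(commitments, reports, deposit=30, tolerance=2)
print(truth, payouts)
```

The same run also shows the aggregation risk: if 500 had been the correct value, the lone accurate reporter would still have forfeited the deposit to the majority.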

Oracle client pattern (usingOraclize inheritance + oraclize_query), two load-bearing guards:

Callback authentication: __callback must verify msg.sender == oraclize_cbAddress(). Without it, any account can inject a false result.
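Fee/balance check: compare oraclize_getPrice("URL") with this.balance (address(this).balance in Solidity 0.5 and later) before querying. If the price exceeds the balance, the query cannot be paid for and should not be sent; the contract should log that fact rather than let the request fail silently.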
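
The two guards can be modelled outside the EVM to show what each one prevents. This is a toy Python model, not Solidity and not the Oraclize API: `ClientContract`, its fields, and the address strings are hypothetical, with `callback_address` and `query_price` standing in for what oraclize_cbAddress() and oraclize_getPrice("URL") would return.

```python
class ClientContract:
    """Toy model of an oracle client contract with both guards in place."""

    def __init__(self, callback_address: str, balance: int, query_price: int):
        self.callback_address = callback_address  # stand-in for oraclize_cbAddress()
        self.balance = balance                    # stand-in for the contract balance
        self.query_price = query_price            # stand-in for oraclize_getPrice("URL")
        self.log = []
        self.result = None

    def update(self) -> bool:
        """Send a query only if the contract can pay the fee."""
        if self.query_price > self.balance:
            self.log.append("query NOT sent: add funds to cover the fee")
            return False
        self.balance -= self.query_price
        self.log.append("query sent")
        return True

    def callback(self, sender: str, result: str) -> None:
        """Accept a result only from the oracle's callback address."""
        if sender != self.callback_address:
            raise PermissionError("callback from unauthorized sender")
        self.result = result


client = ClientContract(callback_address="oracle-callback", balance=3, query_price=5)
sent_unfunded = client.update()   # fee exceeds balance: logged, not sent
client.balance += 10              # someone tops up the contract
sent_funded = client.update()     # now the query goes out

try:
    client.callback("attacker", "forged result")
    forged_accepted = True
except PermissionError:
    forged_accepted = False

client.callback("oracle-callback", "genuine result")
print(sent_unfunded, sent_funded, forged_accepted, client.result)
```

Removing the sender check would let the forged call overwrite `result`, and removing the fee check would leave the caller with no record that the first request never went out.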

Counterpoints

Every authentication scheme relocates trust (Amazon, Intel) rather than eliminating it — consistent with power-as-liability.
Decentralizing trades the single-point-of-failure for the unsolved problem of designing an aggregation function.
A trusted oracle = a single point of failure: "if they are trusted sources and can be compromised, they can result in compromised execution of the smart contracts they feed."

Key Quotes

"Such data simply cannot be trusted, because it comes from unverifiable sources. As such, we have just deferred the problem." — Antonopoulos & Wood, Chapter 11

"In order to maintain consensus, EVM execution must be totally deterministic and based only on the shared context of the Ethereum state and signed transactions." — Antonopoulos & Wood, Chapter 11

"Oracles bring external facts to contract execution... if they are trusted sources and can be compromised, they can result in compromised execution of the smart contracts they feed." — Antonopoulos & Wood, Chapter 11

Rules of Thumb

Never use on-chain pseudorandomness for adversarial/high-value outcomes; miners control inclusion.
Always authenticate the callback: msg.sender == oraclize_cbAddress() (or address(oracle)).
Design against silent failure: check oraclize_getPrice against this.balance before querying.
Pick the pattern by economics: immediate-read (just-in-time), publish–subscribe (frequently changing), request–response (large datasets).
Separate the two trust questions: source authority (unsolvable by crypto) vs. in-transit tampering (solvable by proofs/TEEs).
Prefer decentralized oracles (ChainLink/median schemes) to remove the single point of failure.

Related References

EVM Internals — Stack Machine, State, Gas, and Bytecode - determinism and the block gas limit that force oracles
DApps and the web3 Stack — Decentralization Across the Whole Architecture - off-chain components re-import the oracle problem
Smart Contract Vulnerability Catalog - oracle as single point of failure; tx.origin/CALLER