Hash

A fixed-length string output produced by running data through a hash function. Hashes are deterministic and one-way: the same input always yields the same hash, but the input cannot be recovered from it.

How hashes work

A hash function takes input data of any size and produces a fixed-length output. Key properties:

  • Deterministic — same input always produces the same hash.
  • One-way — given a hash, you can't reasonably reverse it to find the original input.
  • Avalanche — tiny changes in input produce completely different hashes.
  • Collision-resistant — finding two different inputs that produce the same hash should be computationally infeasible.

A typical SHA-256 hash looks like a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e — 256 bits, displayed as 64 hexadecimal characters.
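
The properties listed above can be observed directly. The following is a minimal sketch using Python's standard hashlib module; the input strings and the helper names (sha256_hex, differing_bits) are arbitrary choices made for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

a = sha256_hex(b"Hello World")
b = sha256_hex(b"Hello World")    # same input again
c = sha256_hex(b"Hello World!")   # one character appended

print(a == b)                                      # deterministic: True
print(len(a), len(sha256_hex(b"x" * 1_000_000)))   # fixed length: 64 64
print(differing_bits(a, c))                        # avalanche: typically near half of 256
```

The one-way property cannot be shown by running code; it rests on the absence of any known method faster than guessing inputs.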

What hashes do in blockchains

Hashes are used pervasively:

  • Block linking. Each block contains the hash of the previous block; modifying any historical block invalidates every hash that follows.
  • Transaction identifiers. Each transaction has a hash that uniquely identifies it.
  • Merkle trees. All transactions in a block are organized into a hash tree, with the "Merkle root" included in the block header.
  • Address generation. Wallet addresses are typically derived from hashes of public keys.
  • Proof-of-work mining. Miners compete to find a block hash below a target value, which in practice means a hash starting with a certain number of zeros.
  • Commitments. "Commit-reveal" schemes use hashes to commit to a value before revealing it.
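
Block linking and proof-of-work mining can be sketched together in a toy model. This is an illustration under stated assumptions, not how any real chain serializes blocks: the field names (prev_hash, data, nonce), the JSON encoding, and the "leading zero hex digits" difficulty rule are all hypothetical simplifications.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents, which include the previous block's hash."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def mine(prev_hash: str, data: str, difficulty: int = 3) -> dict:
    """Try nonces until the block hash starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        block = {"prev_hash": prev_hash, "data": data, "nonce": nonce}
        if block_hash(block).startswith("0" * difficulty):
            return block
        nonce += 1

def is_valid(chain: list, difficulty: int = 3) -> bool:
    """Check every block's proof-of-work and its link to the preceding block."""
    for i, block in enumerate(chain):
        if not block_hash(block).startswith("0" * difficulty):
            return False
        if i > 0 and block["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

# Build a three-block chain (toy payloads, assumed for illustration).
chain = [mine("0" * 64, "genesis")]
for payload in ["alice pays bob 5", "bob pays carol 2"]:
    chain.append(mine(block_hash(chain[-1]), payload))

print(is_valid(chain))   # True

# Tampering with a historical block breaks its proof-of-work or the next link.
chain[1]["data"] = "alice pays bob 500"
print(is_valid(chain))   # False
```

To make the tampered chain valid again, an attacker would have to re-mine the modified block and every block after it, which is the cost that block linking imposes.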

Common hash functions

Different cryptocurrencies use different hash functions:

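  • SHA-256 — used by Bitcoin (applied twice, as double SHA-256) for mining, block hashes, and transaction IDs. Designed by the NSA and standardized by NIST.
  • Keccak-256 — used by Ethereum for most operations. This is the original Keccak design on which SHA-3 was based; it uses different padding from the finalized SHA-3 standard, so its output differs from SHA3-256.
  • Ethash — formerly used by Ethereum for mining (pre-Merge). A memory-hard proof-of-work algorithm built on Keccak and designed to resist ASIC mining.
  • Scrypt — used by Litecoin for mining. A memory-hard function, chosen as an alternative to SHA-256 mining.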
  • BLAKE3, BLAKE2 — newer high-performance hash functions used in some chains.

Different functions have different performance characteristics, security analyses, and ecosystem support.
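
Several of these functions ship with Python's standard hashlib module, so their outputs can be compared side by side. The sketch below is limited to what hashlib provides: Keccak-256 as used by Ethereum, Ethash, and BLAKE3 are not included, and hashlib.sha3_256 is the finalized SHA-3, which produces different digests from Ethereum's Keccak-256. The message, the salt, and the scrypt cost parameters are illustrative assumptions only.

```python
import hashlib

message = b"same input, different functions"

# Each function maps the same input to an unrelated digest.
digests = {
    "sha256": hashlib.sha256(message).hexdigest(),
    "sha3_256": hashlib.sha3_256(message).hexdigest(),
    "blake2b": hashlib.blake2b(message).hexdigest(),
    "blake2s": hashlib.blake2s(message).hexdigest(),
}
for name, digest in digests.items():
    print(f"{name:9} {len(digest) * 4:4d} bits  {digest[:16]}...")

# scrypt is memory-hard: n, r and p set its memory and CPU cost.
# These values are small so the example runs quickly; they are not a recommendation.
key = hashlib.scrypt(message, salt=b"example-salt", n=2**10, r=8, p=1, dklen=32)
print(len(key), "bytes from scrypt")
```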

Why hashes are useful for verification

A few common patterns:

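  • File integrity. Compute the hash of a downloaded file and compare it to the published hash. Any tampering changes the hash.
  • Password storage. Don't store passwords; store salted hashes produced by a deliberately slow function such as bcrypt, scrypt, or Argon2, since fast hashes like SHA-256 are cheap to brute-force. Verify by hashing the entered password with the stored salt and comparing.
  • Digital signatures. Sign the hash of a document rather than the document itself. This is far faster and, provided the hash function is collision-resistant, just as secure.
  • Git commits. Each commit is identified by a hash of its content, which includes the hashes of its parent commits, so the whole history forms a hash chain.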
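
The first two patterns can be sketched with the Python standard library. For passwords the sketch uses PBKDF2, a salted and deliberately slow construction available in hashlib, rather than a single fast hash, because fast hashes are cheap to brute-force. The function names, the iteration count, and the sample data are assumptions made for illustration, not recommended production settings.

```python
import hashlib
import hmac
import os
import tempfile
from typing import Optional

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = 100_000) -> tuple:
    """Derive a salted, slow hash; the caller stores all three returned values."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, key

def verify_password(password: str, salt: bytes, iterations: int,
                    expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(key, expected)

# File integrity: compare a computed hash with a "published" one.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"release contents")
    path = tmp.name
published = hashlib.sha256(b"release contents").hexdigest()
file_ok = file_sha256(path) == published
os.remove(path)
print(file_ok)   # True

# Password storage: only salt, iteration count and derived key are kept.
salt, iterations, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, iterations, stored))   # True
print(verify_password("wrong guess", salt, iterations, stored))     # False
```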

Hashes and addresses

Most blockchain addresses are derived from hashes:

  • Bitcoin addresses — typically derived from RIPEMD-160(SHA-256(public key)) plus encoding.
  • Ethereum addresses — last 20 bytes of Keccak-256(public key). Displayed as 0x followed by 40 hex characters.

The hashing serves multiple purposes: it shortens addresses, hides public keys until they are first used, and provides limited quantum resistance.

Hashes and Merkle trees

A Merkle tree is a tree of hashes. Each parent node's value is the hash of its children's values combined. This structure has useful properties:

  • The root hash captures all transactions in a block. Changing any single transaction changes every hash on the path from that transaction up to the root.
  • Proofs of inclusion are compact. To prove a specific transaction is in a block of n transactions, you only need about log2(n) hashes (roughly 20 for a million transactions), not the full list.
  • Light clients can verify specific transactions without downloading the whole block.

Merkle trees are used in Bitcoin blocks, in Ethereum blocks (as a variant, the Merkle Patricia trie), in some data availability systems, and in many other contexts.
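
The root computation and a compact inclusion proof can be sketched as follows. This is a simplified illustration, assuming single SHA-256 and duplication of the last node on odd-sized levels; real chains differ in details (Bitcoin, for example, hashes twice), and all function names here are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_levels(leaves: list) -> list:
    """Build every level bottom-up; an odd node out is paired with itself."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def merkle_proof(levels: list, index: int) -> list:
    """Collect (sibling_hash, sibling_is_right) pairs from leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index            # odd node out: paired with itself
        proof.append((level[sibling], sibling >= index))
        index //= 2
    return proof

def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and the proof."""
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

txs = [f"tx-{i}".encode() for i in range(8)]   # toy transactions
levels = merkle_levels(txs)
root = levels[-1][0]
proof = merkle_proof(levels, 5)

print(len(proof))                          # 3 hashes for 8 leaves
print(verify(b"tx-5", proof, root))        # True
print(verify(b"tx-forged", proof, root))   # False
```

A verifier holding only the root needs the leaf and three sibling hashes here, rather than all eight transactions, which is what makes light-client verification practical.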

Cryptographic strength

Modern hash functions are designed for cryptographic security. Key concerns:

  • Pre-image resistance. Given a hash, finding any input that produces it should be infeasible. SHA-256 has 256-bit security; finding a pre-image would require approximately 2^256 operations.
  • Collision resistance. Finding two different inputs with the same hash should be infeasible. Birthday paradox reduces this to roughly 2^128 operations for SHA-256 — still infeasible.
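  • Quantum vulnerability. Grover's algorithm would give quantum computers a quadratic speedup on pre-image search, roughly halving the effective security level in bits (about 2^128 operations for SHA-256). The current consensus is that doubling the output size mitigates this for most uses.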
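
The birthday effect behind the collision bound can be demonstrated on a deliberately weakened hash. The sketch below truncates SHA-256 to 32 bits, an assumption made purely so the search finishes quickly; a collision then tends to appear after a number of attempts on the order of 2^16, far fewer than the 2^32 possible outputs. The function names are hypothetical.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256 to get a deliberately weak hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int) -> tuple:
    """Hash successive counters until two different inputs share a truncated hash."""
    seen = {}
    for i in count():
        data = str(i).encode()
        digest = truncated_hash(data, bits)
        if digest in seen:
            return seen[digest], data, i + 1
        seen[digest] = data

first, second, attempts = find_collision(bits=32)
print(first, second, attempts)
print(truncated_hash(first, 32) == truncated_hash(second, 32))   # True
```

Each extra output bit multiplies the expected work by about the square root of two, which is why the same search against the full 256 bits is out of reach.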

The cryptographic community continuously evaluates hash functions. Older functions (MD5, SHA-1) have known weaknesses and are deprecated; modern blockchains use functions still considered secure.

Hash collisions in practice

Collision attacks on older hash functions are practical, not merely theoretical, while none are known for the functions modern blockchains rely on:

  • MD5 — collisions can be generated trivially. Insecure for cryptographic use.
  • SHA-1 — practical collisions demonstrated in 2017. Being phased out.
  • SHA-256 — no known practical attacks; considered secure for the foreseeable future.
  • Keccak-256 — likewise no known practical attacks; considered secure for the foreseeable future.

Bitcoin's continued reliance on SHA-256 since 2009 reflects ongoing confidence in its security. If this changed, the protocol would face existential pressure to upgrade.

Why hashes matter beyond crypto

Hashes appear throughout computing:

  • Hash tables — fundamental data structure mapping keys to values.
  • Caching — content-addressable storage uses hashes as keys.
  • Distributed systems — consistency checking uses hash comparisons.
  • Software distribution — package managers verify downloads against published hashes.
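
Content-addressable storage, mentioned in the list above, is simple to sketch: the key of each item is the hash of its bytes. The ContentStore class below is a hypothetical in-memory toy written for illustration, not the design of any particular system.

```python
import hashlib

class ContentStore:
    """A toy in-memory store whose keys are SHA-256 hashes of the stored bytes."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data        # identical content always maps to one key
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hash on read so silent corruption is detected.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored content does not match its key")
        return data

store = ContentStore()
key_a = store.put(b"hello")
key_b = store.put(b"hello")            # duplicate content, same key
print(key_a == key_b)                  # True
print(store.get(key_a))                # b'hello'
```

Because the key is derived from the content, deduplication and integrity checking come for free, the same idea that package managers rely on when verifying downloads.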

Crypto's specific use of hashes for blockchain construction is one application; the underlying primitive is much broader.