Hash Function

A cryptographic algorithm that maps input data of any size to a fixed-length output. Good hash functions are fast, deterministic, and collision-resistant. Bitcoin uses SHA-256.

What a hash function does

The function takes input of any size — a single byte, a kilobyte, a terabyte — and produces output of fixed size:

  • SHA-256 — 256-bit output (32 bytes, displayed as 64 hex characters).
  • SHA-512 — 512-bit output.
  • Keccak-256 (used by Ethereum) — 256-bit output.
  • BLAKE2/BLAKE3 — variable output sizes.

The same input always produces the same output. Different inputs almost always produce different outputs: collisions must exist, because there are more possible inputs than outputs, but for a secure function they are computationally infeasible to find.
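As a minimal sketch of the fixed-size and deterministic behavior described above, the following uses SHA-256 from Python's standard hashlib module. The helper name sha256_hex is made up for this illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

small = sha256_hex(b"a")                 # one byte of input
large = sha256_hex(b"a" * 1_000_000)     # one megabyte of input

# Both digests are 64 hex characters (32 bytes), regardless of input size.
print(len(small), len(large))            # prints: 64 64

# Hashing the same input again gives the same digest.
print(sha256_hex(b"a") == small)         # prints: True

# Different inputs give different digests.
print(small != large)                    # prints: True
```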

Required properties

For cryptographic use, hash functions must satisfy:

  • Deterministic. Identical inputs produce identical outputs.
  • Fast computation. Computing a hash is cheap, so checking that data matches a known hash costs only one more hash computation.
  • Pre-image resistance. Given a hash output, finding any input that produces it should require approximately 2^n operations (where n is the output bit length).
  • Second-pre-image resistance. Given an input and its hash, finding a different input with the same hash should be infeasible.
  • Collision resistance. Finding any two different inputs that hash to the same output should be infeasible (limited by the birthday paradox to ~2^(n/2) operations).
  • Avalanche effect. Small changes in input produce drastically different outputs. Flipping one bit changes about half the output bits on average.

A function that fails any of these is unsuitable for cryptographic use.
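Two of these properties can be observed directly. The sketch below is an illustration, not a security test: it flips one input bit to show the avalanche effect, then truncates SHA-256 to 24 bits (an assumption chosen so the search finishes quickly) to show that a birthday-style collision search needs only a few thousand attempts, roughly 2^(n/2) for n = 24. All helper names are hypothetical.

```python
import hashlib
from itertools import count

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the bit at position index flipped."""
    out = bytearray(data)
    out[index // 8] ^= 1 << (index % 8)
    return bytes(out)

# Avalanche effect: flip a single input bit and compare the two digests.
message = b"hash functions"
d1 = hashlib.sha256(message).digest()
d2 = hashlib.sha256(flip_bit(message, 0)).digest()
changed = bit_difference(d1, d2)
print(f"{changed} of 256 output bits changed")

def truncated_hash(data: bytes, n_bytes: int) -> bytes:
    """SHA-256 cut down to n_bytes, purely to make collisions findable."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int):
    """Hash successive messages until two share the same truncated digest."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        tag = truncated_hash(msg, n_bytes)
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg

# With a 24-bit output, a collision is expected after about 2^12 attempts.
m1, m2, tries = find_collision(3)
print(f"{m1!r} and {m2!r} collide after {tries} attempts")
```

The same search against the full 256-bit output would need on the order of 2^128 attempts, which is why output length matters for collision resistance.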

Common hash functions

A few worth knowing:

  • SHA-256 — designed by NSA, standardized by NIST. Used by Bitcoin and many other systems.
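  • Keccak (SHA-3) — different mathematical structure than SHA-2. Ethereum uses the original Keccak-256, which differs from the standardized SHA3-256 in its padding, so the two produce different digests for the same input.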
  • MD5 — older, broken. Collisions can be generated; should not be used cryptographically.
  • SHA-1 — also broken. Practical collisions demonstrated in 2017.
  • BLAKE2/BLAKE3 — modern, fast, secure.
  • RIPEMD-160 — 160-bit output; applied after SHA-256 when deriving several Bitcoin address types.

Different functions suit different purposes; the choice weighs performance against security margin, output size, and ecosystem compatibility.

How hash functions are constructed

Modern cryptographic hash functions use one of several construction paradigms:

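  • Merkle-Damgård — splits the padded input into fixed-size blocks and feeds them one at a time through a compression function, carrying the state forward. Used by MD5, SHA-1, and the SHA-2 family.
  • Sponge construction — absorbs input into a large internal state, then squeezes out output of any requested length. Used by Keccak/SHA-3.
  • HAIFA — a Merkle-Damgård variant whose compression function also takes a salt and a count of the bits hashed so far. Used by BLAKE, the predecessor of BLAKE2.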

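These designs differ in their security properties and in the attacks they resist. For example, plain Merkle-Damgård hashes such as SHA-256 allow length-extension attacks, while sponge-based hashes such as SHA-3 do not.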
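To make the iterated structure concrete, here is a toy Merkle-Damgård hash. It is a sketch under stated assumptions, not how SHA-2 is actually implemented: the compression function is a stand-in built from SHA-256 itself, and the all-zero IV and the names compress, md_pad, and toy_md_hash are invented for this example. The final lines use SHAKE-256 from hashlib, a sponge-based function, to show output of a caller-chosen length.

```python
import hashlib

BLOCK_SIZE = 64          # bytes per message block (same as SHA-256)
IV = bytes(32)           # toy initial state (assumption; real IVs are fixed constants)

def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in compression function: maps (state, block) to a new 32-byte state."""
    return hashlib.sha256(state + block).digest()

def md_pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, and the 64-bit message length in bits."""
    bit_length = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK_SIZE)
    return padded + bit_length.to_bytes(8, "big")

def toy_md_hash(message: bytes) -> bytes:
    """Chain the compression function over each padded block in turn."""
    state = IV
    padded = md_pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state

print(toy_md_hash(b"abc").hex())

# Sponge-based SHAKE-256 can produce output of any requested length.
short = hashlib.shake_256(b"abc").digest(16)
long = hashlib.shake_256(b"abc").digest(64)
print(len(short), len(long))   # prints: 16 64
```

Encoding the message length in the padding is what lets collision resistance of the compression function carry over to the whole hash in this construction.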

What "broken" means for a hash function

A hash function is "broken" when:

  • Practical attacks exist that violate its security properties.
  • The attacks are feasible with current computing resources.
  • Known weaknesses suggest more attacks may follow.

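MD5 was broken in 2004, when practical collision attacks were published. SHA-1 was effectively broken in 2017, when the first full collision (the SHAttered attack) was demonstrated. Both are still used in some non-security-critical contexts, such as file checksums that only need to detect accidental corruption, but should not be used cryptographically.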

SHA-256, Keccak-256, and BLAKE family functions remain unbroken as of 2025. The cryptographic community continues to evaluate them; no successful practical attacks have been demonstrated.

In blockchains

Hash functions are foundational to blockchain operation:

  • Block linking — each block references the previous via its hash.
  • Mining — proof-of-work requires finding a block whose hash falls below a target value, which can only be done by trial and error.
  • Transaction IDs — each transaction is identified by its hash.
  • Merkle trees — efficient verification of large data sets.
  • Address derivation — wallet addresses derived from public-key hashes.

Different chains' choice of hash function reflects their security model and design philosophy.
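The roles listed above can be sketched in a few lines. This is a simplified illustration, not any real chain's block format: the header layout (previous hash, Merkle root, 8-byte nonce), the 12-bit difficulty, and the transaction strings are all assumptions made for the example. It follows Bitcoin's convention of applying SHA-256 twice and duplicating the last node on odd Merkle levels.

```python
import hashlib

DIFFICULTY_BITS = 12  # toy difficulty (assumption); real networks use far more

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Pairwise-hash a list of leaf hashes down to a single root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def mine(prev_hash: bytes, root: bytes, difficulty_bits: int = DIFFICULTY_BITS):
    """Try nonces until the header hash, read as an integer, is below the target."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        header = prev_hash + root + nonce.to_bytes(8, "big")
        block_hash = double_sha256(header)
        if int.from_bytes(block_hash, "big") < target:
            return nonce, block_hash
        nonce += 1

# Transaction IDs are hashes of the transaction data.
txs = [b"alice->bob:1", b"bob->carol:2", b"carol->dave:3"]
tx_ids = [double_sha256(tx) for tx in txs]
root = merkle_root(tx_ids)

# Block 1 points at an all-zero "previous hash"; block 2 points at block 1.
nonce1, hash1 = mine(bytes(32), root)
nonce2, hash2 = mine(hash1, root)
print(hash1.hex(), nonce1)
print(hash2.hex(), nonce2)

# Changing one transaction changes the Merkle root, which invalidates the block hash.
tampered_root = merkle_root([double_sha256(b"alice->bob:100")] + tx_ids[1:])
print(tampered_root != root)   # prints: True
```

Because block 2's header contains block 1's hash, altering any transaction in block 1 would force both blocks to be mined again.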

Quantum resistance

Quantum computers could potentially weaken some cryptographic primitives, including some hash functions. The current consensus:

  • Symmetric primitives (including hash functions) are believed to be relatively quantum-resistant. Grover's algorithm would reduce a pre-image search on SHA-256 from ~2^256 to ~2^128 operations (the square root of the classical cost), which is still infeasible.
  • Asymmetric primitives (public-key cryptography, including signatures) are more vulnerable. Bitcoin's elliptic-curve signatures could be forged by a sufficiently powerful quantum computer running Shor's algorithm.

Most blockchain communities are evaluating post-quantum signature schemes. Hash functions themselves likely don't need replacement, just possibly larger output sizes for additional margin.
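The arithmetic behind "larger output sizes for additional margin" can be written out as a small helper. This is a rough sketch that only considers generic attacks (brute-force pre-image search, the birthday bound, and a quantum square-root speedup on pre-image search such as Grover's algorithm); the function name security_bits is hypothetical, and the figures ignore any weakness specific to a given hash function.

```python
def security_bits(output_bits: int) -> dict[str, int]:
    """Approximate work exponents (log2 of operations) for generic attacks."""
    return {
        "classical_preimage": output_bits,       # ~2^n
        "classical_collision": output_bits // 2, # birthday bound, ~2^(n/2)
        "quantum_preimage": output_bits // 2,    # square-root speedup, ~2^(n/2)
    }

for n in (256, 512):
    print(n, security_bits(n))
# prints:
# 256 {'classical_preimage': 256, 'classical_collision': 128, 'quantum_preimage': 128}
# 512 {'classical_preimage': 512, 'classical_collision': 256, 'quantum_preimage': 256}
```

Doubling the output from 256 to 512 bits restores a 256-bit margin against the quantum pre-image search.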

Performance considerations

Hash function speed varies significantly:

  • BLAKE3 — among the fastest, often 10x+ faster than software-only SHA-256 on large inputs, in part because its tree structure can be computed in parallel.
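  • SHA-256 (with hardware acceleration) — fast on CPUs that provide dedicated SHA instructions, such as the Intel SHA extensions or the ARMv8 cryptography extensions. (AES-NI accelerates AES, not SHA-256.)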
  • SHA-256 (software-only) — slower, but still fast enough for most uses.
  • Keccak/SHA-3 — typically slower than SHA-256 in software.

Performance matters for some applications (high-throughput databases, content-addressable storage). For typical blockchain use, the differences are often immaterial since the hash computation isn't the bottleneck.
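Relative speed is easy to measure on a given machine. The sketch below times a few functions available in Python's standard hashlib; BLAKE3 is not part of the standard library, so BLAKE2b stands in for the BLAKE family here. The 8 MiB buffer, the repeat count, and the helper name throughput_mb_s are arbitrary choices for illustration, and results depend heavily on the CPU and on how the underlying library was built.

```python
import hashlib
import time

def throughput_mb_s(name: str, data: bytes, repeats: int = 5) -> float:
    """Hash data several times and return the best throughput in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6

data = bytes(8 * 1024 * 1024)   # 8 MiB of zero bytes

for name in ("sha256", "sha512", "sha3_256", "blake2b"):
    print(f"{name:10s} {throughput_mb_s(name, data):8.0f} MB/s")
```

Taking the best of several runs reduces noise from other processes; a serious benchmark would also vary the input size.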

Beyond crypto

Hash functions appear throughout computing:

  • Hash tables — map keys to values by hashing each key to a bucket index.
  • Caching — cache keys and content-addressable storage, where data is located by its hash.
  • Deduplication — identify identical data across distributed systems.
  • Database indexing — efficient lookup structures.
  • Software distribution — verify download integrity.
  • Password hashing — store hashes, not passwords, using a per-user salt and a deliberately slow function (bcrypt, scrypt, Argon2, PBKDF2) rather than a fast general-purpose hash.

The cryptographic-grade hash functions used in blockchains are the strongest variety. Lighter hash functions (FNV, MurmurHash) are used for non-security-critical applications where performance matters more than collision resistance.
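As one example of the "additional considerations" for password storage, the following sketch uses PBKDF2 from Python's standard library with a random per-password salt and a constant-time comparison. The iteration count of 600,000 is an assumption for illustration and should be tuned to current guidance and hardware; the function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your hardware and policy

def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; the password itself is never stored."""
    salt = os.urandom(16)  # a fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key

def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(key, expected)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # prints: True
print(verify_password("wrong guess", salt, stored))                   # prints: False
```

The salt ensures that two users with the same password get different stored values, and the high iteration count makes each guess expensive for an attacker who obtains the stored hashes.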