Data Integrity in Peer-to-Peer Systems

December 30, 2025

Bitcoin is a peer-to-peer system that depends on strong data integrity. Participants have a monetary incentive to tamper with transaction data to gain an advantage, so nodes need to verify what they receive from the network. The most straightforward approach is to download the entire data (for example, an entire block) and recompute its cryptographic hash. But that makes verification expensive: it is bound by bandwidth, storage, and compute.

To make verification more efficient, Bitcoin uses an elegant data structure called Merkle trees. This post walks through how Merkle trees work, how to build them, and how to verify inclusion proofs without downloading everything.

The Key Idea

Since our goal is to avoid downloading all the data, we divide it into chunks. Let denote the entire dataset, with representing the chunk. In Bitcoin, each chunk corresponds to a block.

A Merkle tree is a binary tree of hashes. Each data chunk is hashed to form a leaf node.

A Leaf in a Merkle Tree

In this case a cryptographic hash function is a function that maps arbitrary data to a fixed-size output. For our purposes, it needs two properties:

  • Collision resistance: It should be computationally infeasible to find two different inputs such that .
  • Preimage resistance: Given a hash output , it should be computationally infeasible to find any input such that .

Parent nodes are created by hashing the concatenation of their children. If a level has an odd number of nodes, the last one is duplicated.

A Node in a Merkle Tree

This process repeats until a single root hash remains, committing to the entire dataset.

A Merkle Tree

This structure gives us two key properties:

  • Compact proofs: To prove a chunk is included, you only need the sibling hashes along the path to the root, about hashes.
  • Tamper detection: Changing any chunk invalidates the root, making verification cheap.

Proving Data Inclusion

An inclusion proof is a list of hashes starting from the leaf to the root. The list of hashes can be use to verify that a chunk is part of the tree.

The proof for is generated by the hashes in green and orange in the diagram below. The key advantage is that the proof is tiny: instead of sending an entire data chunk, we only send a handful of fixed-size hashes (32 bytes each for SHA-256 or SHA3-256).

Verification path
Proof (sibling hashes)

Inclusion Proof for D₀

To verify a proof, we start with the hash of the data block and iteratively combine it with each sibling hash in the proof. If the final result matches the known root hash, the proof is valid. This requires only hash operations and (roughly ) bytes of proof data, regardless of the dataset size. Since both number of operations and size of proof data scale logarithmically this ends up super efficient even for very large datasets.

Implementation

Here's a brief implementation in Go.

We add a 1-byte identifier 0x00 when hashing the leaf data and 0x01 to internal nodes to prevent a second preimage attack.
package main

import (
	"bytes"
	"crypto/sha3"
	"fmt"
)

// We start by creating a Node type that has the hash and points to two children nodes. 
// For a leaf the children are simply nil.
type Node struct {
	Hash   []byte
	Left   *Node
	Right  *Node
	Parent *Node
}

// A Merkle tree is then simply a struct that points to the root and leaves.
// The leaves allow us to generate a proof for any leaf.
type MerkleTree struct {
	Root   *Node
	Leaves []*Node
}

// The hash function wraps SHA3. 
// We use it to hash both leaf data and internal nodes.
func hash(data []byte) []byte {
	h := sha3.New256()
	h.Write(data)
	return h.Sum(nil)
}

// We add 0x00 as an identifier for each leaf.
func Leaf(data []byte) []byte {
	return hash(append([]byte{0x00}, data...))
}

// A parent node concatenates its children's hashes and hashes the result.
// We add 0x01 as an identifier for each node.
func Combine(leftHash, rightHash []byte) []byte {
	buf := make([]byte, 0, 1+len(leftHash)+len(rightHash))
	buf = append(buf, 0x01)
	buf = append(buf, leftHash...)
	buf = append(buf, rightHash...)
	return hash(buf)
}

// Generating a merkle tree from chunked data
func New(chunks [][]byte) *MerkleTree {
	if len(chunks) == 0 {
		return &MerkleTree{}
	}

	level := make([]*Node, len(chunks))
	for i, c := range chunks {
		level[i] = &Node{Hash: Leaf(c)}
	}
	leaves := level

	for len(level) > 1 {
		n := len(level)
		next := make([]*Node, (n+1)/2)

		for i := 0; i+1 < n; i += 2 {
			l, r := level[i], level[i+1]
			p := &Node{Hash: Combine(l.Hash, r.Hash), Left: l, Right: r}
			l.Parent, r.Parent = p, p
			next[i/2] = p
		}

		if n%2 == 1 {
			l := level[n-1]
			p := &Node{Hash: Combine(l.Hash, l.Hash), Left: l, Right: l}
			l.Parent = p
			next[n/2] = p
		}

		level = next
	}

	return &MerkleTree{Root: level[0], Leaves: leaves}
}

// Inclusion proof

// The struct we can pass to a client for verification
type Proof struct {
	SiblingHash   []byte
	SiblingOnLeft bool // if true: parent = H(sib || cur), else H(cur || sib)
}

// Generates a short proof of data inclusion
func (t *MerkleTree) Generate(leafIndex int) ([]byte, []Proof) {
	if leafIndex < 0 || leafIndex >= len(t.Leaves) || t.Root == nil {
		return nil, nil
	}

	leaf := t.Leaves[leafIndex]
	cur := leaf

	var proof []Proof
	for cur.Parent != nil {
		p := cur.Parent
		if p.Left == cur {
			// sibling is on the right
			proof = append(proof, Proof{SiblingHash: p.Right.Hash, SiblingOnLeft: false})
		} else {
			// sibling is on the left
			proof = append(proof, Proof{SiblingHash: p.Left.Hash, SiblingOnLeft: true})
		}
		cur = p
	}

	return leaf.Hash, proof
}

// Verifying the proof of data
func Verify(leafHash []byte, proof []Proof, expectedRoot []byte) bool {
	cur := append([]byte(nil), leafHash...)
	for _, step := range proof {
		if step.SiblingOnLeft {
			cur = Combine(step.SiblingHash, cur)
		} else {
			cur = Combine(cur, step.SiblingHash)
		}
	}
	return bytes.Equal(cur, expectedRoot)
}

func main() {
	chunks := [][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d")}
	t := New(chunks)

	fmt.Printf("Root = %x, ", t.Root.Hash)

	leafHash, proof := t.Generate(2)
	c := Verify(leafHash, proof, t.Root.Hash)
	fmt.Println("c in tree?", c)
}