Back to blog

UUIDv4 vs ULID vs TSID: Impact on PostgreSQL B-Tree Indexes After 100M Records

|
| postgresql, uuid, ulid, tsid, performance, indexing

We switched IDs twice before we understood the real tradeoff. “Why is INSERT so slow? We only have 100 million rows.” The table looked normal. Schema was simple. Indexes were obvious. But inserts were crawling at 8,000 per second when they should have been 50,000+. I checked the primary key—UUIDv4. Then I checked the index size: 15GB for 100 million rows. Then the cache hit ratio: 60%. Found it.

Random UUIDs are one of the most common performance anti-patterns in PostgreSQL, and one of the least visible. UUIDv4 values are random, which means inserts scatter across the entire B-Tree index randomly. Each insert has to find a random leaf page, which is almost never in cache because there’s no locality. The database spends its time doing random I/O instead of serving requests.

The contrast with sequential IDs is stark. A BIGSERIAL primary key means every insert goes to the rightmost leaf page—the same page, over and over. That page stays hot in cache. Writes are sequential. The index stays compact. You get 5× the insert rate with 7× less index storage.

But sequential IDs have their own problems: they’re predictable (security concern), they don’t work in distributed systems (coordination required), and they leak information about your data volume. The solution isn’t to go back to BIGSERIAL—it’s to use time-sorted unique identifiers like ULID or UUIDv7 that give you uniqueness and distribution-friendliness while maintaining enough sequentiality to keep your indexes efficient.

Tested on: PostgreSQL 16.1, 32GB RAM, NVMe SSD, 100M rows

Why Random UUIDs Hurt

B-Tree Index Structure

B-Tree Index for Sequential ID:
[Page 1: 1-1000]
[Page 2: 1001-2000]  ← Inserts always here (at the end)
[Page 3: 2001-3000]

B-Tree Index for Random UUID:
[Page 1: 0a... - 0f...]  ← Insert here
[Page 2: 10... - 1f...]  ← Or here
[Page 3: 20... - 2f...]  ← Or here
...
[Page 1000: f0... - ff...]  ← Or here

Consequences of Random Inserts

  1. Cache miss: Each insert needs different page → RAM isn’t enough
  2. Random I/O: Disk has to jump across entire index
  3. Page splits: Pages need to split more frequently
  4. WAL bloat: More WAL records due to page splits

Alternatives to UUIDv4

ULID (Universally Unique Lexicographically Sortable Identifier)

Format: 01ARZ3NDEKTSV4RRFFQ69G5FAV
        |------||--------------|
        Timestamp   Random

- First 48 bits: millisecond timestamp
- Remaining 80 bits: random
- Lexicographically sortable
- Compatible with UUID storage (128 bit)

TSID (Time-Sorted Unique Identifier)

Format: 0HJVCTP5S8N7Z
        |----||-----|
        Time  Random

- 42 bits: timestamp (milliseconds since epoch)
- 22 bits: random + node ID
- Shorter than UUID (13 chars vs 36)
- Sortable by time

UUIDv7 (new standard)

Format: 018e5e3c-8000-7xxx-xxxx-xxxxxxxxxxxx
        |---------|    |-------------------|
        Timestamp         Random

- RFC 9562 (2024)
- First 48 bits: Unix timestamp (ms)
- Native support in PostgreSQL 17+

Reproducible Benchmark

Setup

-- Tables with different PK strategies
CREATE TABLE test_uuid (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    data text,
    created_at timestamp DEFAULT now()
);

CREATE TABLE test_bigserial (
    id bigserial PRIMARY KEY,
    data text,
    created_at timestamp DEFAULT now()
);

-- ULID as UUID (sorted)
CREATE OR REPLACE FUNCTION generate_ulid() RETURNS uuid AS $$
DECLARE
    timestamp  BYTEA = E'\\000\\000\\000\\000\\000\\000';
    unix_time  BIGINT;
    ulid       BYTEA;
BEGIN
    unix_time = (EXTRACT(EPOCH FROM CLOCK_TIMESTAMP()) * 1000)::BIGINT;
    timestamp = SET_BYTE(timestamp, 0, (unix_time >> 40)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 1, (unix_time >> 32)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 2, (unix_time >> 24)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 3, (unix_time >> 16)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 4, (unix_time >> 8)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 5, unix_time::BIT(8)::INTEGER);
    ulid = timestamp || gen_random_bytes(10);
    RETURN ENCODE(ulid, 'hex')::uuid;
END
$$ LANGUAGE plpgsql;

CREATE TABLE test_ulid (
    id uuid PRIMARY KEY DEFAULT generate_ulid(),
    data text,
    created_at timestamp DEFAULT now()
);

Benchmark Script

# pk_benchmark.py
import psycopg2
import time
import os

def benchmark_inserts(conn, table_name, num_inserts):
    cur = conn.cursor()
    data = "x" * 100  # 100 byte payload

    start = time.time()
    for i in range(num_inserts):
        cur.execute(f"INSERT INTO {table_name} (data) VALUES (%s)", (data,))
        if i % 10000 == 0:
            conn.commit()
    conn.commit()
    elapsed = time.time() - start

    # Get stats
    cur.execute(f"""
        SELECT
            pg_size_pretty(pg_relation_size('{table_name}'::regclass)) as table_size,
            pg_size_pretty(pg_indexes_size('{table_name}'::regclass)) as index_size
    """)
    sizes = cur.fetchone()

    return {
        'time': elapsed,
        'table_size': sizes[0],
        'index_size': sizes[1],
        'inserts_per_sec': num_inserts / elapsed
    }

conn = psycopg2.connect("postgresql://localhost/testdb")

for table in ['test_uuid', 'test_bigserial', 'test_ulid']:
    print(f"\nBenchmarking {table}...")
    result = benchmark_inserts(conn, table, 1000000)
    print(f"  Time: {result['time']:.2f}s")
    print(f"  Inserts/sec: {result['inserts_per_sec']:.0f}")
    print(f"  Table size: {result['table_size']}")
    print(f"  Index size: {result['index_size']}")

Results: 100M Rows

MetricUUIDv4BIGSERIALULID
Insert rate8,500/s45,000/s38,000/s
Index size15.2 GB2.1 GB2.3 GB
Table size12.8 GB12.1 GB12.8 GB
Cache hit ratio62%99.8%98.5%
WAL generated48 GB18 GB21 GB

Buffer Cache Analysis

-- How many index pages are in cache
SELECT
    c.relname,
    pg_size_pretty(count(*) * 8192) as cached_size,
    round(100.0 * count(*) /
        (pg_relation_size(c.oid) / 8192), 2) as cache_hit_pct
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE c.relname LIKE 'test_%_pkey'
GROUP BY c.relname, c.oid;
relname              | cached_size | cache_hit_pct
---------------------+-------------+--------------
test_uuid_pkey       | 1.2 GB      | 8%
test_bigserial_pkey  | 2.1 GB      | 100%
test_ulid_pkey       | 2.0 GB      | 87%

Index Bloat Comparison

-- After 100M inserts + 10M updates
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT
    relname,
    (pgstatindex(relname)).tree_level as btree_depth,
    (pgstatindex(relname)).leaf_pages,
    (pgstatindex(relname)).avg_leaf_density
FROM pg_class
WHERE relname LIKE 'test_%_pkey';
IndexB-Tree DepthLeaf PagesAvg Density
test_uuid_pkey41,892,34167%
test_bigserial_pkey3274,51291%
test_ulid_pkey3298,45689%

Application Implementation

Java - ULID

// pom.xml
// <dependency>
//     <groupId>com.github.f4b6a3</groupId>
//     <artifactId>ulid-creator</artifactId>
//     <version>5.2.0</version>
// </dependency>

import com.github.f4b6a3.ulid.UlidCreator;
import java.util.UUID;

public class Order {
    // ULID as UUID
    private UUID id = UlidCreator.getMonotonicUlid().toUuid();
}

Java - TSID

// <dependency>
//     <groupId>io.hypersistence</groupId>
//     <artifactId>hypersistence-tsid</artifactId>
//     <version>2.1.1</version>
// </dependency>

import io.hypersistence.tsid.TSID;

public class Order {
    private long id = TSID.Factory.getTsid().toLong();  // 64-bit!
}

Go - ULID

import "github.com/oklog/ulid/v2"

func NewOrderID() string {
    entropy := ulid.Monotonic(rand.Reader, 0)
    return ulid.MustNew(ulid.Timestamp(time.Now()), entropy).String()
}

PostgreSQL 17+ - UUIDv7

-- Native support in PostgreSQL 17
CREATE TABLE orders (
    id uuid PRIMARY KEY DEFAULT uuidv7(),
    ...
);

Migration Strategy

From UUIDv4 to ULID

-- 1. Add new column
ALTER TABLE orders ADD COLUMN new_id uuid;

-- 2. Backfill with ULID (for new records)
-- Old records stay with UUIDv4

-- 3. Application starts using ULID for new records
-- (code change)

-- 4. After stabilization:
-- CREATE INDEX CONCURRENTLY idx_orders_new_id ON orders(new_id);

-- 5. Swap (with downtime):
-- BEGIN;
-- ALTER TABLE orders DROP CONSTRAINT orders_pkey;
-- ALTER TABLE orders ADD PRIMARY KEY (new_id);
-- ALTER TABLE orders DROP COLUMN id;
-- ALTER TABLE orders RENAME COLUMN new_id TO id;
-- COMMIT;

When to Use What

ScenarioRecommendation
Internal ID, single DBBIGSERIAL
Distributed systemsULID / TSID
Existing UUID systemULID (UUID compatible)
PostgreSQL 17+UUIDv7
Short URL-friendly IDTSID

Checklist

## Primary Key Strategy Checklist

### Analysis
- [ ] Measure current index bloat
- [ ] Check cache hit ratio for PK index
- [ ] Evaluate WAL volume during inserts

### Migration
- [ ] Choose appropriate format (ULID/TSID/UUIDv7)
- [ ] Implement in application
- [ ] Plan backfill strategy
- [ ] Test performance before production

### Monitoring
- [ ] Track index size growth rate
- [ ] Alert on cache hit ratio < 90%
- [ ] Compare insert latency before/after

Conclusion

Random UUIDv4 as a primary key is one of the most impactful performance anti-patterns in PostgreSQL—and one of the hardest to fix once you’re at scale. You choose UUIDv4 early because it’s the obvious choice: universally unique, no coordination required, well-supported. Then your tables grow to hundreds of millions of rows, and you discover your indexes are 7× larger than they should be, your cache hit ratio is terrible, and your insert rate has plateaued.

The fundamental issue is that B-Tree indexes are optimized for sequential access patterns. When you insert random keys, you touch random pages. When you touch random pages, you can’t keep them in cache. When you can’t keep them in cache, you do random disk I/O. Random I/O is orders of magnitude slower than sequential I/O. The math is unavoidable.

Time-sorted identifiers—ULID, TSID, or the new UUIDv7—give you the best of both worlds. They’re unique across distributed systems, they don’t require coordination, they don’t leak information about data volume. But because the first bits encode the timestamp, values generated close in time sort close together. Inserts cluster on recent pages, those pages stay hot in cache, and your index behaves like it has sequential keys.

Key principles:

  1. Random UUIDs cause 5× larger indexes compared to sequential IDs due to poor page utilization
  2. Random UUIDs cause 5× slower inserts due to random I/O patterns
  3. Cache hit ratio drops dramatically because there’s no locality in random access
  4. ULID/TSID/UUIDv7 preserve uniqueness while gaining sequential characteristics
  5. PostgreSQL 17+ has native UUIDv7 support—use it for new projects

For new projects, choose time-sorted IDs from the start. For existing projects with random UUIDs, migration is painful but possible—the performance gains are worth it at scale.


Related posts

Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "UUIDv4 vs ULID vs TSID: Impact on PostgreSQL B-Tree Indexes After 100M Records". https://www.michal-drozd.com/en/blog/uuid-ulid-tsid-postgresql/ (Published October 14, 2025).