Back to blog

GIN Index Pending List Overflow: Fast Writes, Slow Searches

|
| postgresql, debugging, indexes, full-text-search, performance

This one looked like random stalls until the kernel admitted it was the pending list. “Full-text search went from 5ms to 500ms after a data migration.” The team had just loaded 100,000 documents from a legacy system, and now their search feature was unusable. They suspected something wrong with the data—maybe bad UTF-8 encoding or documents that were too large. But the data was fine. The index was fine. The problem was something most developers never think about: the GIN index’s pending list.

GIN (Generalized Inverted Index) indexes are brilliant for full-text search, JSONB queries, and array operations. They map each word (or key) to a list of documents containing that word. This makes searches fast—you look up the word, get the list of documents, done. But this structure makes inserts slow, because inserting a single document might update hundreds of index entries (one per word in the document).

To solve the insert performance problem, GIN indexes use a clever trick: the pending list. Instead of immediately updating the main index for each insert, GIN appends entries to an unsorted list. Periodic cleanup merges the pending list into the main index. This makes inserts fast, but there’s a catch: searches must scan the entire pending list in addition to the main index. If the pending list grows large, searches become slow.

Environment: PostgreSQL 14+, GIN indexes for full-text search or JSONB, bulk data loads or high-write workloads

The Problem

The Slow Search Incident

The pattern is always the same: searches were fast before a bulk operation, slow after:

-- Before bulk insert: Fast search
EXPLAIN ANALYZE SELECT * FROM documents
WHERE tsv @@ to_tsquery('postgresql');
-- Index Scan using documents_tsv_gin
-- Execution Time: 5.234 ms

-- After bulk insert of 100,000 documents
EXPLAIN ANALYZE SELECT * FROM documents
WHERE tsv @@ to_tsquery('postgresql');
-- Index Scan using documents_tsv_gin
-- Execution Time: 523.891 ms  -- 100x slower!

-- What happened?

The query plan looks identical—it’s still using the index. But the execution time jumped 100x. Something changed inside the index that the query plan doesn’t reveal.

The Hidden Pending List

The pg_stat_user_indexes view doesn’t show pending list statistics directly, but the pageinspect extension lets you peek inside the GIN index’s metadata page:

-- Check pending list size
SELECT
    indexrelid::regclass AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    n_pending_pages,
    n_pending_tuples
FROM pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
LEFT JOIN pg_class c ON c.oid = indexrelid
CROSS JOIN LATERAL (
    SELECT * FROM gin_metapage_info(get_raw_page(indexrelid::regclass::text, 0))
) meta
WHERE indexprs IS NOT NULL OR indkey::text LIKE '%gin%';

-- Output:
-- index_name        | index_size | n_pending_pages | n_pending_tuples
-- documents_tsv_gin | 250 MB     | 1523            | 98,234
--                                  ^^^^              ^^^^^^
--                         These pending entries make search slow!

Over 98,000 pending tuples spread across 1,500 pages. Every search must scan all of these in addition to the main index structure. That’s where the 100x slowdown comes from.

Root Cause

How GIN Indexes Work

Understanding the pending list requires understanding GIN’s structure:

GIN Index Structure:
┌─────────────────────────────────────────────────────────────┐
│ Main B-tree: word → list of document IDs                   │
│                                                             │
│ "postgresql" → [doc1, doc5, doc42, doc789, ...]            │
│ "database"   → [doc1, doc2, doc3, doc5, ...]               │
│ "index"      → [doc5, doc42, doc100, ...]                   │
│                                                             │
│ Inserting a document means updating MANY entries            │
│ Document with 100 words = 100 index insertions!             │
└─────────────────────────────────────────────────────────────┘

To speed up inserts, GIN uses a "pending list":
┌─────────────────────────────────────────────────────────────┐
│ Pending List (unsorted, fast to append):                   │
│                                                             │
│ doc1001: "postgresql", "rocks", "performance"              │
│ doc1002: "gin", "index", "pending", "list"                 │
│ doc1003: "full", "text", "search", "postgresql"            │
│ ...                                                        │
│                                                             │
│ During INSERT: Just append to pending list (fast!)          │
│ During SEARCH: Must scan ALL pending entries (slow!)        │
│ During VACUUM: Pending list merged into main B-tree        │
└─────────────────────────────────────────────────────────────┘

The main index is highly optimized for lookup—it’s a B-tree where you can jump directly to “postgresql” and find all documents containing that word. But the pending list is just a sequential list of recent inserts. To search it, you must scan every entry and check if it matches your search term.

This trade-off makes sense for typical workloads. A few thousand pending entries add microseconds to searches while making inserts much faster. Problems arise when the pending list grows to tens or hundreds of thousands of entries.

When Pending List Gets Too Big

Under normal operation, the pending list stays small because autovacuum periodically merges it into the main index:

-- Default: pending list auto-cleans at ~4MB of data
-- But during bulk inserts, it can grow much larger

-- Check gin_pending_list_limit
SHOW gin_pending_list_limit;
-- 4096kB (4MB)

-- During bulk insert:
-- - Inserts are fast (just append to pending list)
-- - Pending list grows to 100MB+
-- - Every search must scan 100MB of unsorted data
-- - VACUUM eventually cleans it, but may not run during bulk load

The 4MB default is a trigger, not a hard limit. When the pending list reaches 4MB, the next insert will trigger a merge. But during rapid bulk inserts, autovacuum might not keep up—each merge takes time, and new data arrives faster than it can be processed.

The VACUUM Delay

Several factors can prevent autovacuum from cleaning the pending list:

-- Autovacuum settings affect GIN cleanup
SHOW autovacuum_naptime;           -- 1min (default)
SHOW autovacuum_vacuum_threshold;  -- 50 (rows)

-- But during bulk insert:
-- 1. You might disable autovacuum for speed
-- 2. Autovacuum might not keep up
-- 3. Long transaction blocks vacuum
-- Result: Pending list grows unbounded

Many bulk load procedures disable autovacuum for performance, which is reasonable—you don’t want vacuum fighting your inserts for I/O. But forgetting to run vacuum after the load leaves the pending list bloated.

Long-running transactions also block vacuum. If your bulk load runs as a single transaction, vacuum can’t clean the pending list until the transaction commits. Meanwhile, the pending list grows larger with every insert.

Diagnosis

Check Pending List Size

The pageinspect extension provides visibility into GIN internals:

-- Use pageinspect extension
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Check GIN meta page for pending info
SELECT *
FROM gin_metapage_info(get_raw_page('documents_tsv_gin', 0));
-- pending_head: 1234
-- pending_tail: 2345
-- n_pending_pages: 1523  -- High = problem
-- n_pending_tuples: 98234

-- Rule of thumb:
-- n_pending_pages > 100 = likely causing slowdown
-- n_pending_tuples > 10000 = definitely slow

The n_pending_pages and n_pending_tuples values tell you how much work every search must do beyond the normal index lookup. If these numbers are high, searches will be slow regardless of what the query plan says.

Compare Query Plans

The query plan itself doesn’t reveal pending list overhead, but buffer statistics can hint at it:

-- Disable pending list for comparison
SET gin_fuzzy_search_limit = 0;  -- (doesn't help directly)

-- Force index rescan
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM documents
WHERE tsv @@ to_tsquery('postgresql');

-- Look for:
-- "Buffers: shared hit=15234"  -- Lots of buffers = scanning pending list

High buffer hit counts during what should be a simple index lookup suggest that PostgreSQL is scanning lots of data—likely the pending list.

Monitor Pending List Growth

For ongoing visibility, track the pending list size over time:

-- Track pending list over time
CREATE TABLE gin_pending_stats (
    ts timestamp DEFAULT now(),
    index_name text,
    pending_pages int,
    pending_tuples int
);

-- Periodic capture
INSERT INTO gin_pending_stats (index_name, pending_pages, pending_tuples)
SELECT 'documents_tsv_gin',
       (SELECT pending_pages FROM gin_metapage_info(get_raw_page('documents_tsv_gin', 0))),
       (SELECT n_pending_tuples FROM gin_metapage_info(get_raw_page('documents_tsv_gin', 0)));

If you see pending tuples growing steadily without dropping, autovacuum isn’t keeping up. If you see spikes after bulk operations, you need explicit vacuums in your load procedures.

The Fix

Option 1: Force Pending List Cleanup

The immediate fix is to run VACUUM:

-- Clean pending list without full vacuum
VACUUM documents;

-- Or more aggressive:
VACUUM (VERBOSE) documents;

-- Check result
SELECT n_pending_pages, n_pending_tuples
FROM gin_metapage_info(get_raw_page('documents_tsv_gin', 0));
-- Should show 0 or very low numbers now

Regular VACUUM merges the pending list into the main index. It’s fast—much faster than VACUUM FULL—and doesn’t lock the table. After vacuum, searches return to normal speed.

Option 2: Tune gin_pending_list_limit

Adjusting the pending list threshold trades insert speed for search consistency:

-- Lower limit = more frequent cleanup, slower inserts, faster searches
SET gin_pending_list_limit = '1MB';  -- Stricter

-- Or disable pending list entirely (for small indexes)
SET gin_pending_list_limit = 0;
-- Now every INSERT immediately updates main index
-- Inserts slower, searches always fast

-- Per-index setting (PostgreSQL 14+):
ALTER INDEX documents_tsv_gin SET (gin_pending_list_limit = 1024);

For read-heavy workloads where search latency matters more than insert throughput, a lower limit (or zero) makes sense. For write-heavy workloads with occasional searches, the default or higher might be better.

Option 3: Explicit Cleanup After Bulk Load

Integrate vacuum into your bulk load procedures:

-- Bulk load pattern with cleanup
BEGIN;
-- Disable autovacuum during load
ALTER TABLE documents SET (autovacuum_enabled = false);

-- Do bulk insert
COPY documents FROM '/data/import.csv';

-- Re-enable autovacuum
ALTER TABLE documents SET (autovacuum_enabled = true);
COMMIT;

-- Immediate cleanup
VACUUM documents;
REINDEX INDEX CONCURRENTLY documents_tsv_gin;

The vacuum cleans the pending list. The reindex is optional but ensures the main index is optimally organized after a large data addition.

Option 4: Use fastupdate = off for Predictable Performance

For workloads where consistent search performance matters more than insert speed, disable the pending list entirely:

-- Create GIN index without pending list
CREATE INDEX documents_tsv_gin ON documents USING gin(tsv)
WITH (fastupdate = off);

-- Or alter existing
ALTER INDEX documents_tsv_gin SET (fastupdate = off);
REINDEX INDEX CONCURRENTLY documents_tsv_gin;

-- Trade-off:
-- fastupdate = on:  Inserts fast, searches may be slow
-- fastupdate = off: Inserts slow, searches always fast

With fastupdate = off, every insert immediately updates the main index. This makes inserts slower (roughly 3-10x depending on document size), but guarantees consistent search performance regardless of insert patterns.

This is the right choice for systems where search latency is critical and insert volume is moderate. It’s the wrong choice for high-throughput logging systems where you’re writing thousands of documents per second.

Monitoring

Set up alerts to catch pending list growth before it affects users:

groups:
  - name: postgresql-gin
    rules:
      - alert: GINPendingListLarge
        expr: |
          pg_gin_pending_tuples > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "GIN index has large pending list - may slow searches"

      - alert: GINPendingPagesHigh
        expr: |
          pg_gin_pending_pages > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "GIN index pending pages high - consider VACUUM"

These alerts catch the problem before users notice slow searches. When they fire, run VACUUM to clean the pending list.

Checklist

## GIN Index Pending List Overflow

### Symptoms
- [ ] Full-text search suddenly slow
- [ ] Slowdown correlates with bulk inserts
- [ ] VACUUM speeds things up temporarily
- [ ] Query plans show high buffer reads

### Diagnosis
- [ ] Check n_pending_pages in gin_metapage_info
- [ ] Compare pre/post insert query performance
- [ ] Check autovacuum activity during bulk loads
- [ ] Monitor pending list size over time

### Fixes
- [ ] Run VACUUM after bulk inserts
- [ ] Lower gin_pending_list_limit
- [ ] Consider fastupdate = off for read-heavy workloads
- [ ] Include VACUUM in bulk load scripts

### Prevention
- [ ] Monitor GIN pending list metrics
- [ ] Schedule VACUUM after known bulk operations
- [ ] Test search performance after migrations
- [ ] Document expected pending list behavior

Conclusion

GIN indexes trade search speed for insert speed via the pending list. This trade-off is usually invisible—searches stay fast because the pending list stays small. But during bulk operations, the pending list can grow large enough to dominate search time.

The key insight is that “index scan” in the query plan doesn’t mean “fast.” It means “using the index,” which includes scanning the entire pending list. When the pending list is large, even index scans are slow.

Key points to remember:

  1. Pending list is searched linearly—thousands of entries = slow searches
  2. VACUUM cleans pending list—run it after bulk loads, always
  3. fastupdate = off gives predictable search performance at the cost of slower inserts
  4. Monitor n_pending_pages as a leading indicator of search performance

The next time you bulk load data into a GIN-indexed table, add VACUUM to your script. Your users’ search experience depends on it.


Related posts

Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "GIN Index Pending List Overflow: Fast Writes, Slow Searches". https://www.michal-drozd.com/en/blog/gin-index-pending-list-overflow/ (Published April 17, 2025).