GIN Index Pending List Overflow: Fast Writes, Slow Searches
This one looked like random stalls until the kernel admitted it was the pending list. “Full-text search went from 5ms to 500ms after a data migration.” The team had just loaded 100,000 documents from a legacy system, and now their search feature was unusable. They suspected something wrong with the data—maybe bad UTF-8 encoding or documents that were too large. But the data was fine. The index was fine. The problem was something most developers never think about: the GIN index’s pending list.
GIN (Generalized Inverted Index) indexes are brilliant for full-text search, JSONB queries, and array operations. They map each word (or key) to a list of documents containing that word. This makes searches fast—you look up the word, get the list of documents, done. But this structure makes inserts slow, because inserting a single document might update hundreds of index entries (one per word in the document).
To solve the insert performance problem, GIN indexes use a clever trick: the pending list. Instead of immediately updating the main index for each insert, GIN appends entries to an unsorted list. Periodic cleanup merges the pending list into the main index. This makes inserts fast, but there’s a catch: searches must scan the entire pending list in addition to the main index. If the pending list grows large, searches become slow.
Environment: PostgreSQL 14+, GIN indexes for full-text search or JSONB, bulk data loads or high-write workloads
The Problem
The Slow Search Incident
The pattern is always the same: searches were fast before a bulk operation, slow after:
-- Before bulk insert: Fast search
EXPLAIN ANALYZE SELECT * FROM documents
WHERE tsv @@ to_tsquery('postgresql');
-- Index Scan using documents_tsv_gin
-- Execution Time: 5.234 ms
-- After bulk insert of 100,000 documents
EXPLAIN ANALYZE SELECT * FROM documents
WHERE tsv @@ to_tsquery('postgresql');
-- Index Scan using documents_tsv_gin
-- Execution Time: 523.891 ms -- 100x slower!
-- What happened?
The query plan looks identical—it’s still using the index. But the execution time jumped 100x. Something changed inside the index that the query plan doesn’t reveal.
The Hidden Pending List
The pg_stat_user_indexes view doesn’t show pending list statistics directly, but the pageinspect extension lets you peek inside the GIN index’s metadata page:
-- Check pending list size
SELECT
indexrelid::regclass AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
n_pending_pages,
n_pending_tuples
FROM pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
LEFT JOIN pg_class c ON c.oid = indexrelid
CROSS JOIN LATERAL (
SELECT * FROM gin_metapage_info(get_raw_page(indexrelid::regclass::text, 0))
) meta
WHERE indexprs IS NOT NULL OR indkey::text LIKE '%gin%';
-- Output:
-- index_name | index_size | n_pending_pages | n_pending_tuples
-- documents_tsv_gin | 250 MB | 1523 | 98,234
-- ^^^^ ^^^^^^
-- These pending entries make search slow!
Over 98,000 pending tuples spread across 1,500 pages. Every search must scan all of these in addition to the main index structure. That’s where the 100x slowdown comes from.
Root Cause
How GIN Indexes Work
Understanding the pending list requires understanding GIN’s structure:
GIN Index Structure:
┌─────────────────────────────────────────────────────────────┐
│ Main B-tree: word → list of document IDs │
│ │
│ "postgresql" → [doc1, doc5, doc42, doc789, ...] │
│ "database" → [doc1, doc2, doc3, doc5, ...] │
│ "index" → [doc5, doc42, doc100, ...] │
│ │
│ Inserting a document means updating MANY entries │
│ Document with 100 words = 100 index insertions! │
└─────────────────────────────────────────────────────────────┘
To speed up inserts, GIN uses a "pending list":
┌─────────────────────────────────────────────────────────────┐
│ Pending List (unsorted, fast to append): │
│ │
│ doc1001: "postgresql", "rocks", "performance" │
│ doc1002: "gin", "index", "pending", "list" │
│ doc1003: "full", "text", "search", "postgresql" │
│ ... │
│ │
│ During INSERT: Just append to pending list (fast!) │
│ During SEARCH: Must scan ALL pending entries (slow!) │
│ During VACUUM: Pending list merged into main B-tree │
└─────────────────────────────────────────────────────────────┘
The main index is highly optimized for lookup—it’s a B-tree where you can jump directly to “postgresql” and find all documents containing that word. But the pending list is just a sequential list of recent inserts. To search it, you must scan every entry and check if it matches your search term.
This trade-off makes sense for typical workloads. A few thousand pending entries add microseconds to searches while making inserts much faster. Problems arise when the pending list grows to tens or hundreds of thousands of entries.
When Pending List Gets Too Big
Under normal operation, the pending list stays small because autovacuum periodically merges it into the main index:
-- Default: pending list auto-cleans at ~4MB of data
-- But during bulk inserts, it can grow much larger
-- Check gin_pending_list_limit
SHOW gin_pending_list_limit;
-- 4096kB (4MB)
-- During bulk insert:
-- - Inserts are fast (just append to pending list)
-- - Pending list grows to 100MB+
-- - Every search must scan 100MB of unsorted data
-- - VACUUM eventually cleans it, but may not run during bulk load
The 4MB default is a trigger, not a hard limit. When the pending list reaches 4MB, the next insert will trigger a merge. But during rapid bulk inserts, autovacuum might not keep up—each merge takes time, and new data arrives faster than it can be processed.
The VACUUM Delay
Several factors can prevent autovacuum from cleaning the pending list:
-- Autovacuum settings affect GIN cleanup
SHOW autovacuum_naptime; -- 1min (default)
SHOW autovacuum_vacuum_threshold; -- 50 (rows)
-- But during bulk insert:
-- 1. You might disable autovacuum for speed
-- 2. Autovacuum might not keep up
-- 3. Long transaction blocks vacuum
-- Result: Pending list grows unbounded
Many bulk load procedures disable autovacuum for performance, which is reasonable—you don’t want vacuum fighting your inserts for I/O. But forgetting to run vacuum after the load leaves the pending list bloated.
Long-running transactions also block vacuum. If your bulk load runs as a single transaction, vacuum can’t clean the pending list until the transaction commits. Meanwhile, the pending list grows larger with every insert.
Diagnosis
Check Pending List Size
The pageinspect extension provides visibility into GIN internals:
-- Use pageinspect extension
CREATE EXTENSION IF NOT EXISTS pageinspect;
-- Check GIN meta page for pending info
SELECT *
FROM gin_metapage_info(get_raw_page('documents_tsv_gin', 0));
-- pending_head: 1234
-- pending_tail: 2345
-- n_pending_pages: 1523 -- High = problem
-- n_pending_tuples: 98234
-- Rule of thumb:
-- n_pending_pages > 100 = likely causing slowdown
-- n_pending_tuples > 10000 = definitely slow
The n_pending_pages and n_pending_tuples values tell you how much work every search must do beyond the normal index lookup. If these numbers are high, searches will be slow regardless of what the query plan says.
Compare Query Plans
The query plan itself doesn’t reveal pending list overhead, but buffer statistics can hint at it:
-- Disable pending list for comparison
SET gin_fuzzy_search_limit = 0; -- (doesn't help directly)
-- Force index rescan
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM documents
WHERE tsv @@ to_tsquery('postgresql');
-- Look for:
-- "Buffers: shared hit=15234" -- Lots of buffers = scanning pending list
High buffer hit counts during what should be a simple index lookup suggest that PostgreSQL is scanning lots of data—likely the pending list.
Monitor Pending List Growth
For ongoing visibility, track the pending list size over time:
-- Track pending list over time
CREATE TABLE gin_pending_stats (
ts timestamp DEFAULT now(),
index_name text,
pending_pages int,
pending_tuples int
);
-- Periodic capture
INSERT INTO gin_pending_stats (index_name, pending_pages, pending_tuples)
SELECT 'documents_tsv_gin',
(SELECT pending_pages FROM gin_metapage_info(get_raw_page('documents_tsv_gin', 0))),
(SELECT n_pending_tuples FROM gin_metapage_info(get_raw_page('documents_tsv_gin', 0)));
If you see pending tuples growing steadily without dropping, autovacuum isn’t keeping up. If you see spikes after bulk operations, you need explicit vacuums in your load procedures.
The Fix
Option 1: Force Pending List Cleanup
The immediate fix is to run VACUUM:
-- Clean pending list without full vacuum
VACUUM documents;
-- Or more aggressive:
VACUUM (VERBOSE) documents;
-- Check result
SELECT n_pending_pages, n_pending_tuples
FROM gin_metapage_info(get_raw_page('documents_tsv_gin', 0));
-- Should show 0 or very low numbers now
Regular VACUUM merges the pending list into the main index. It’s fast—much faster than VACUUM FULL—and doesn’t lock the table. After vacuum, searches return to normal speed.
Option 2: Tune gin_pending_list_limit
Adjusting the pending list threshold trades insert speed for search consistency:
-- Lower limit = more frequent cleanup, slower inserts, faster searches
SET gin_pending_list_limit = '1MB'; -- Stricter
-- Or disable pending list entirely (for small indexes)
SET gin_pending_list_limit = 0;
-- Now every INSERT immediately updates main index
-- Inserts slower, searches always fast
-- Per-index setting (PostgreSQL 14+):
ALTER INDEX documents_tsv_gin SET (gin_pending_list_limit = 1024);
For read-heavy workloads where search latency matters more than insert throughput, a lower limit (or zero) makes sense. For write-heavy workloads with occasional searches, the default or higher might be better.
Option 3: Explicit Cleanup After Bulk Load
Integrate vacuum into your bulk load procedures:
-- Bulk load pattern with cleanup
BEGIN;
-- Disable autovacuum during load
ALTER TABLE documents SET (autovacuum_enabled = false);
-- Do bulk insert
COPY documents FROM '/data/import.csv';
-- Re-enable autovacuum
ALTER TABLE documents SET (autovacuum_enabled = true);
COMMIT;
-- Immediate cleanup
VACUUM documents;
REINDEX INDEX CONCURRENTLY documents_tsv_gin;
The vacuum cleans the pending list. The reindex is optional but ensures the main index is optimally organized after a large data addition.
Option 4: Use fastupdate = off for Predictable Performance
For workloads where consistent search performance matters more than insert speed, disable the pending list entirely:
-- Create GIN index without pending list
CREATE INDEX documents_tsv_gin ON documents USING gin(tsv)
WITH (fastupdate = off);
-- Or alter existing
ALTER INDEX documents_tsv_gin SET (fastupdate = off);
REINDEX INDEX CONCURRENTLY documents_tsv_gin;
-- Trade-off:
-- fastupdate = on: Inserts fast, searches may be slow
-- fastupdate = off: Inserts slow, searches always fast
With fastupdate = off, every insert immediately updates the main index. This makes inserts slower (roughly 3-10x depending on document size), but guarantees consistent search performance regardless of insert patterns.
This is the right choice for systems where search latency is critical and insert volume is moderate. It’s the wrong choice for high-throughput logging systems where you’re writing thousands of documents per second.
Monitoring
Set up alerts to catch pending list growth before it affects users:
groups:
- name: postgresql-gin
rules:
- alert: GINPendingListLarge
expr: |
pg_gin_pending_tuples > 10000
for: 5m
labels:
severity: warning
annotations:
summary: "GIN index has large pending list - may slow searches"
- alert: GINPendingPagesHigh
expr: |
pg_gin_pending_pages > 100
for: 5m
labels:
severity: warning
annotations:
summary: "GIN index pending pages high - consider VACUUM"
These alerts catch the problem before users notice slow searches. When they fire, run VACUUM to clean the pending list.
Checklist
## GIN Index Pending List Overflow
### Symptoms
- [ ] Full-text search suddenly slow
- [ ] Slowdown correlates with bulk inserts
- [ ] VACUUM speeds things up temporarily
- [ ] Query plans show high buffer reads
### Diagnosis
- [ ] Check n_pending_pages in gin_metapage_info
- [ ] Compare pre/post insert query performance
- [ ] Check autovacuum activity during bulk loads
- [ ] Monitor pending list size over time
### Fixes
- [ ] Run VACUUM after bulk inserts
- [ ] Lower gin_pending_list_limit
- [ ] Consider fastupdate = off for read-heavy workloads
- [ ] Include VACUUM in bulk load scripts
### Prevention
- [ ] Monitor GIN pending list metrics
- [ ] Schedule VACUUM after known bulk operations
- [ ] Test search performance after migrations
- [ ] Document expected pending list behavior
Conclusion
GIN indexes trade search speed for insert speed via the pending list. This trade-off is usually invisible—searches stay fast because the pending list stays small. But during bulk operations, the pending list can grow large enough to dominate search time.
The key insight is that “index scan” in the query plan doesn’t mean “fast.” It means “using the index,” which includes scanning the entire pending list. When the pending list is large, even index scans are slow.
Key points to remember:
- Pending list is searched linearly—thousands of entries = slow searches
- VACUUM cleans pending list—run it after bulk loads, always
- fastupdate = off gives predictable search performance at the cost of slower inserts
- Monitor n_pending_pages as a leading indicator of search performance
The next time you bulk load data into a GIN-indexed table, add VACUUM to your script. Your users’ search experience depends on it.
Related Articles
- PostgreSQL HOT Updates Index Trap - Another index pitfall
- work_mem Parallel Workers OOM - PostgreSQL memory issues
Related posts
PostgreSQL Partial Index: Planner Ignores Your Index
Query scans full table despite perfect partial index. The cause: query's WHERE clause doesn't match the index predicate exactly, or statistics mislead the planner.
ICU Collation Version Drift: When Database Upgrades Break Your Indexes
Query returns wrong results after OS upgrade. The cause: ICU library version changed, collation rules shifted, and indexes are now sorted inconsistently with the new sort order.
PostgreSQL Read Replica Conflicts: Why Your Queries Get Canceled
Queries on read replicas fail with 'canceling statement due to conflict with recovery'. The fix depends on which of the 5 conflict types you have - here's how to diagnose and solve each one.
PostgreSQL OOM by Design: work_mem × Parallel Workers × Plan Nodes
work_mem looks small at 256MB, but a parallel hash join with 4 workers across 3 plan nodes uses 3GB. Here's how to prevent PostgreSQL from legitimately OOMing your container.
Cite this article
If you reference this post, please link to the original URL and credit the author.