PostgreSQL TOAST Strategy: Why Your JSON Column Kills Query Performance
TOAST was invisible to me until storage exploded. “Our API response is slow even though the query takes 5ms.” We were profiling an API endpoint that was supposed to be fast—a simple query fetching a list of orders. The database query showed 5ms execution time. But the API response took 850ms. Where was the missing 845ms?
The answer was TOAST. Each order had a JSONB column storing the full order payload—line items, shipping details, customer preferences—averaging 50KB per document. When we ran SELECT *, PostgreSQL dutifully fetched every one of those 50KB documents from TOAST storage, decompressed them, and sent them to the application. For 100 orders, that was 5MB of JSONB data being read from disk, decompressed, and transferred—even though our API only needed the order ID and status.
TOAST (The Oversized-Attribute Storage Technique) is PostgreSQL’s solution to a fundamental problem: how do you store values larger than a page (8KB) when pages are your basic unit of storage? The answer is to store large values separately, in a special TOAST table, and keep only a pointer in the main row. This works great for storage efficiency but creates a performance trap: every time you read a row, PostgreSQL must make additional I/O to fetch the TOAST data.
The insidious part is that the query planner time doesn’t reflect TOAST fetches. Your EXPLAIN ANALYZE shows 5ms because that’s how long the index lookup and heap access took. The TOAST fetches happen during result streaming and don’t appear in execution time. Your query looks fast, but your actual data transfer is slow.
Tested on: PostgreSQL 16.1, tables with 100K+ rows, JSONB columns 10-100KB
How TOAST Works
Storage Threshold
Default page size: 8KB
TOAST threshold: ~2KB per column value
If value > 2KB:
1. Compress value (if compressible)
2. If still > 2KB: store in separate TOAST table
3. Main table stores pointer to TOAST data
TOAST Table Structure
-- Main table
CREATE TABLE orders (
id bigserial PRIMARY KEY,
customer_id int,
order_data jsonb -- Large JSON stored in TOAST
);
-- PostgreSQL automatically creates:
-- pg_toast.pg_toast_<oid> - TOAST data table
-- pg_toast.pg_toast_<oid>_index - TOAST index
-- Check TOAST table
SELECT relname, reltoastrelid::regclass
FROM pg_class
WHERE relname = 'orders';
TOAST Strategies
-- EXTENDED (default for varlena types)
-- Allows compression and out-of-line storage
ALTER TABLE orders ALTER COLUMN order_data SET STORAGE EXTENDED;
-- EXTERNAL
-- No compression, out-of-line storage
-- Faster for pre-compressed data (already compressed JSON)
ALTER TABLE orders ALTER COLUMN order_data SET STORAGE EXTERNAL;
-- MAIN
-- Compression, avoid out-of-line if possible
-- Good for data you always need
ALTER TABLE orders ALTER COLUMN order_data SET STORAGE MAIN;
-- PLAIN
-- No compression, no out-of-line (will error if too large)
-- For data that must fit in main tuple
ALTER TABLE orders ALTER COLUMN small_data SET STORAGE PLAIN;
The Performance Problem
SELECT * with TOAST
-- Each row fetches from TOAST table
SELECT * FROM orders WHERE customer_id = 123;
-- Behind the scenes:
-- 1. Find row in main table: 0.1ms
-- 2. Fetch TOAST data for order_data: 2-10ms per row!
-- 3. Decompress if needed: 0.5ms
-- 100 rows × 5ms = 500ms just for TOAST fetches!
Benchmark: TOAST Impact
-- Create test table
CREATE TABLE orders_test (
id serial PRIMARY KEY,
customer_id int,
status text,
order_data jsonb -- 50KB average
);
-- Insert 100K rows with large JSON
INSERT INTO orders_test (customer_id, status, order_data)
SELECT
random() * 1000,
'active',
jsonb_build_object(
'items', (SELECT jsonb_agg(jsonb_build_object(
'product_id', i,
'name', repeat('x', 100),
'price', random() * 100
)) FROM generate_series(1, 100) i)
)
FROM generate_series(1, 100000);
-- Query with TOAST fetch
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders_test WHERE customer_id = 42;
-- Execution Time: 850ms
-- Buffers: shared hit=12000 (lots of TOAST reads)
-- Query without TOAST column
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, customer_id, status FROM orders_test WHERE customer_id = 42;
-- Execution Time: 8ms
-- Buffers: shared hit=120
Solutions
1. Don’t SELECT *
-- BAD: Fetches all TOAST data
SELECT * FROM orders WHERE customer_id = 123;
-- GOOD: Only fetch what you need
SELECT id, customer_id, status, created_at
FROM orders WHERE customer_id = 123;
-- Fetch large column only when needed
SELECT order_data FROM orders WHERE id = 456;
2. Use EXTERNAL for Pre-compressed Data
-- If storing already compressed data (gzipped JSON, images)
ALTER TABLE documents
ALTER COLUMN compressed_data SET STORAGE EXTERNAL;
-- Skips PostgreSQL compression (faster insert/read)
-- Still stores out-of-line if large
3. Consider Separate Table
-- Instead of one wide table
CREATE TABLE orders (
id bigserial PRIMARY KEY,
customer_id int,
status text,
order_data jsonb -- Large!
);
-- Split into two tables
CREATE TABLE orders (
id bigserial PRIMARY KEY,
customer_id int,
status text
);
CREATE TABLE order_details (
order_id bigint PRIMARY KEY REFERENCES orders(id),
order_data jsonb
);
-- Common queries don't touch large data
SELECT id, customer_id, status FROM orders WHERE customer_id = 123;
-- Large data only when needed
SELECT o.*, od.order_data
FROM orders o
JOIN order_details od ON od.order_id = o.id
WHERE o.id = 456;
4. Use JSONB Partial Indexing
-- Index specific JSON paths
CREATE INDEX idx_orders_status ON orders ((order_data->>'status'));
-- Query uses index, doesn't need full TOAST fetch
SELECT id FROM orders WHERE order_data->>'status' = 'shipped';
5. Normalize Hot Fields
-- Instead of querying JSON for common filters
SELECT * FROM orders WHERE order_data->>'status' = 'active';
-- Add regular column for hot fields
ALTER TABLE orders ADD COLUMN status text;
UPDATE orders SET status = order_data->>'status';
CREATE INDEX idx_orders_status ON orders(status);
-- Now queries are fast
SELECT id, status FROM orders WHERE status = 'active';
Monitoring TOAST
Check TOAST Size
-- TOAST table size per table
SELECT
c.relname AS table_name,
pg_size_pretty(pg_relation_size(c.oid)) AS table_size,
pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast_size,
pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
WHERE c.relkind = 'r'
AND c.reltoastrelid != 0
ORDER BY pg_relation_size(c.reltoastrelid) DESC
LIMIT 10;
TOAST Statistics
-- TOAST access statistics
SELECT
schemaname,
relname,
seq_scan,
seq_tup_read,
idx_scan,
n_tup_ins,
n_tup_upd
FROM pg_stat_all_tables
WHERE relname LIKE 'pg_toast_%'
ORDER BY seq_tup_read DESC
LIMIT 10;
Column Storage Strategy
-- Check current storage strategy
SELECT
a.attname,
t.typname,
CASE a.attstorage
WHEN 'p' THEN 'PLAIN'
WHEN 'e' THEN 'EXTERNAL'
WHEN 'm' THEN 'MAIN'
WHEN 'x' THEN 'EXTENDED'
END AS storage
FROM pg_attribute a
JOIN pg_type t ON a.atttypid = t.oid
WHERE a.attrelid = 'orders'::regclass
AND a.attnum > 0;
TOAST and Indexes
Index-Only Scans Blocked
-- This won't use index-only scan if order_data is TOAST'd
CREATE INDEX idx_orders_customer ON orders(customer_id) INCLUDE (order_data);
-- PostgreSQL can't return TOAST'd data from index alone
-- Must visit heap for detoasting
Covering Index for Small Columns
-- Works for non-TOAST columns
CREATE INDEX idx_orders_customer_covering
ON orders(customer_id)
INCLUDE (status, created_at); -- Small columns only
-- Can use index-only scan
EXPLAIN SELECT status, created_at FROM orders WHERE customer_id = 123;
-- -> Index Only Scan
TOAST Compression
Check Compression Ratio
-- Actual vs stored size
SELECT
id,
pg_column_size(order_data) AS stored_size,
length(order_data::text) AS raw_size,
round(100.0 * pg_column_size(order_data) / length(order_data::text), 1) AS compression_pct
FROM orders
LIMIT 10;
When to Disable Compression
-- Already compressed data (images, pre-compressed JSON)
ALTER TABLE media ALTER COLUMN image_data SET STORAGE EXTERNAL;
-- Random/encrypted data (won't compress anyway)
ALTER TABLE secrets ALTER COLUMN encrypted_data SET STORAGE EXTERNAL;
Checklist
## TOAST Optimization
### Analysis
- [ ] Identify tables with large TOAST sizes
- [ ] Find queries doing SELECT * on TOAST tables
- [ ] Check TOAST access frequency in pg_stat_all_tables
### Query Optimization
- [ ] Replace SELECT * with specific columns
- [ ] Only fetch large columns when needed
- [ ] Add regular columns for hot JSON fields
### Storage Strategy
- [ ] EXTENDED: compressible data (default)
- [ ] EXTERNAL: pre-compressed data
- [ ] MAIN: always-needed medium data
- [ ] Consider table splitting for very large columns
### Monitoring
- [ ] Track TOAST table sizes
- [ ] Monitor TOAST sequential scans
- [ ] Alert on excessive TOAST growth
Conclusion
TOAST is one of those PostgreSQL internals that works perfectly until it doesn’t. For tables with small rows, you never think about it—large values are transparently stored out-of-line and transparently fetched when needed. But as your JSONB columns grow, that transparency becomes a performance liability. The query looks fast because the planner doesn’t account for TOAST fetches; the actual response is slow because of all the hidden I/O.
The fundamental fix is simple: don’t fetch data you don’t need. Replace SELECT * with explicit column lists that exclude large TOAST’d columns unless you actually need them. This single change can be the difference between a 5ms response and an 850ms response—a 170× improvement for zero application logic changes.
When you do need the large data, consider architectural changes: split the large columns into a separate table that’s joined only when needed, extract hot fields from JSON into regular columns, or use EXTERNAL storage for pre-compressed data to skip PostgreSQL’s compression overhead.
Key principles:
- **Never SELECT *** on tables with large columns—TOAST fetches are hidden but costly
- Use EXTERNAL storage for pre-compressed data to skip redundant compression
- Split tables when large columns are rarely accessed in the hot path
- Normalize hot fields out of JSON so you can query efficiently without TOAST
- Monitor TOAST table sizes—they reveal where your hidden I/O costs live
Your 50KB JSON column in a listing query is your problem. The query planner won’t tell you; only the end-to-end response time will.
Related Articles
- PostgreSQL HOT Updates - Storage optimization
- UUIDv4 vs ULID vs TSID - Index efficiency
Related posts
PostgreSQL HOT Updates + FILLFACTOR: How to Reduce Index Bloat by 60%
Vacuum runs successfully but disk keeps growing and cache hit ratio drops. I'll show how to quantify HOT-update eligibility using pgstattuple and optimize fillfactor.
GIN Index Pending List Overflow: Fast Writes, Slow Searches
Full-text search was fast, now it's slow. The cause: GIN index pending list grew huge during bulk inserts, and every search must now scan the unsorted pending entries.
PostgreSQL Autovacuum SLO Tuning: How to Configure Vacuum for 200M Rows and 5k UPSERT/s
Autovacuum is either ignored or cargo-cult tuned. I'll show how to turn it into an SLO-driven system with specific numbers, pg_stat metrics, and reproducible tests.
When Prepared Statements Make PostgreSQL 10× Slower: Generic Plan Trap
Same query, same params, but prod is slow and staging works fine. I'll show how to reproduce the generic plan problem with pgBouncer, Java/Go and how to fix it.
Cite this article
If you reference this post, please link to the original URL and credit the author.