Back to blog

The Soft Delete Trap: Why is_deleted Kills Your Database (And What To Do)

Soft deletes feel harmless until your indexes melt. “We never delete data, we just set deleted_at.” I heard this 8 years ago and it seemed like a reasonable approach. It was a small e-commerce project, maybe 10,000 orders. Fast-forward 3 years: 15 million orders, 9 million of them “deleted” (order cancelled, customer requested removal, fraud cleanup), and queries taking 10x longer than at launch. The production database was on its knees.

The insidious thing about soft delete is that it feels like a free feature. Instead of deleting rows, you just set a flag. You get an audit trail! You can undo deletions! Foreign keys don’t break! But what you’ve actually done is keep all your dead data in your live tables, forcing every query to filter through rows that should never have been there in the first place.

Soft delete is technical debt that compounds daily. Every day you don’t delete that cancelled order, it sits in your indexes, gets touched by your queries, takes up space in your buffer cache, and makes your VACUUM slower. The interest rate on this debt is proportional to your table size and query volume.

The worst part is that everyone does it. Every framework defaults to soft delete. Every “enterprise” pattern includes is_deleted. It’s so normalized that questioning it feels like heresy. But the math doesn’t lie: if 60% of your rows are “deleted,” 60% of your index scans are wasted work.

Tested on: PostgreSQL 14-16, tables from 1M to 50M rows. All benchmarks run on real data.

Why Soft Delete Seems Like a Good Idea

At the start of a project, soft delete has many advantages:

  • Audit trail - you can see what was deleted and when
  • Undo functionality - user changed their mind? Restore with one UPDATE
  • Referential integrity - FKs don’t break
  • Legal requirements - GDPR retention periods
-- Seemingly innocent column
ALTER TABLE orders ADD COLUMN deleted_at TIMESTAMP NULL;

-- "Deleting" = update
UPDATE orders SET deleted_at = NOW() WHERE id = 123;

-- Select = add WHERE
SELECT * FROM orders WHERE deleted_at IS NULL;

Where It Starts to Hurt

Problem 1: Indexes Contain Dead Data

Every index on the table contains deleted rows too. If you have 10M rows and 6M are “deleted”:

-- This index has 10M entries, not 4M
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Query must scan the entire index then filter
SELECT * FROM orders
WHERE user_id = 123 AND deleted_at IS NULL;

Solution? Partial index:

CREATE INDEX idx_orders_user_id_active ON orders(user_id)
WHERE deleted_at IS NULL;

But now you need EVERY index as partial. And what if you forget?

Problem 2: Unique Constraints Don’t Work

-- User has unique email
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE(email);

-- Delete user
UPDATE users SET deleted_at = NOW() WHERE email = '[email protected]';

-- New user wants the same email - FAIL!
INSERT INTO users (email) VALUES ('[email protected]');
-- ERROR: duplicate key value violates unique constraint

Solution? Partial unique index:

CREATE UNIQUE INDEX users_email_unique_active ON users(email)
WHERE deleted_at IS NULL;

But beware - now you can have 10 deleted users with the same email. Is that OK?

Problem 3: Foreign Keys Are Complicated

-- Order references user
ALTER TABLE orders ADD FOREIGN KEY (user_id) REFERENCES users(id);

-- What happens when you delete a user?
UPDATE users SET deleted_at = NOW() WHERE id = 123;

-- Orders still reference "deleted" user
-- Is this a bug or a feature?

Problem 4: Query Planner Struggles

PostgreSQL query planner estimates selectivity based on statistics. If you have 60% deleted rows:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123 AND deleted_at IS NULL;

The planner may:

  • Underestimate row count (if statistics aren’t current)
  • Choose wrong join type
  • Ignore partial index

Benchmark: Soft Delete vs Partitioning

I created a test with 10M rows, 50% marked as deleted:

Setup

-- Table with soft delete
CREATE TABLE orders_soft (
    id SERIAL PRIMARY KEY,
    user_id INT,
    amount DECIMAL(10,2),
    created_at TIMESTAMP,
    deleted_at TIMESTAMP
);

-- Partitioned table
CREATE TABLE orders_partitioned (
    id SERIAL,
    user_id INT,
    amount DECIMAL(10,2),
    created_at TIMESTAMP,
    is_deleted BOOLEAN DEFAULT FALSE
) PARTITION BY LIST (is_deleted);

CREATE TABLE orders_active PARTITION OF orders_partitioned
    FOR VALUES IN (FALSE);
CREATE TABLE orders_deleted PARTITION OF orders_partitioned
    FOR VALUES IN (TRUE);

Results

QuerySoft DeletePartitionedDifference
SELECT active (user_id)245ms89ms2.7x faster
COUNT active1.2s0.4s3x faster
Index size847MB412MB51% smaller
VACUUM time45s12s3.7x faster

Solution: Partitioning or Archive Table

Option 1: Partitioning (PostgreSQL 10+)

-- New partitioned table
CREATE TABLE orders_new (
    id SERIAL,
    user_id INT NOT NULL,
    amount DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    is_archived BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (id, is_archived)
) PARTITION BY LIST (is_archived);

-- Active partition - all indexes here
CREATE TABLE orders_active PARTITION OF orders_new
    FOR VALUES IN (FALSE);

-- Archive partition - minimal indexes
CREATE TABLE orders_archived PARTITION OF orders_new
    FOR VALUES IN (TRUE);

-- Indexes only on active partition
CREATE INDEX idx_orders_active_user ON orders_active(user_id);
CREATE INDEX idx_orders_active_created ON orders_active(created_at);

Option 2: Separate Archive Table

-- Main table - only active records
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL,
    amount DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Archive table - same structure + metadata
CREATE TABLE orders_archive (
    id INT PRIMARY KEY,
    user_id INT NOT NULL,
    amount DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP,
    archived_at TIMESTAMP DEFAULT NOW(),
    archived_by VARCHAR(100)
);

-- Deleting = moving
CREATE OR REPLACE FUNCTION archive_order(order_id INT, archived_by VARCHAR)
RETURNS VOID AS $$
BEGIN
    INSERT INTO orders_archive (id, user_id, amount, created_at, archived_by)
    SELECT id, user_id, amount, created_at, archived_by
    FROM orders WHERE id = order_id;

    DELETE FROM orders WHERE id = order_id;
END;
$$ LANGUAGE plpgsql;

Migrating from is_deleted Without Downtime

Phase 1: Add New Structure

-- Create archive table
CREATE TABLE orders_archive (LIKE orders INCLUDING ALL);
ALTER TABLE orders_archive ADD COLUMN archived_at TIMESTAMP DEFAULT NOW();

-- Trigger for new "deletes"
CREATE OR REPLACE FUNCTION archive_on_soft_delete()
RETURNS TRIGGER AS $$
BEGIN
    IF NEW.deleted_at IS NOT NULL AND OLD.deleted_at IS NULL THEN
        INSERT INTO orders_archive
        SELECT *, NOW() FROM orders WHERE id = NEW.id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tr_archive_on_delete
    AFTER UPDATE ON orders
    FOR EACH ROW EXECUTE FUNCTION archive_on_soft_delete();

Phase 2: Move Existing (Batched)

BATCH_SIZE = 10000

def migrate_deleted_orders():
    while True:
        with transaction():
            # Find batch to migrate
            result = db.execute("""
                WITH to_archive AS (
                    SELECT id FROM orders
                    WHERE deleted_at IS NOT NULL
                    LIMIT %s
                    FOR UPDATE SKIP LOCKED
                )
                INSERT INTO orders_archive
                SELECT o.*, NOW()
                FROM orders o
                JOIN to_archive t ON o.id = t.id
                RETURNING id
            """, [BATCH_SIZE])

            if result.rowcount == 0:
                break

            # Delete from main table
            db.execute("""
                DELETE FROM orders
                WHERE id IN (SELECT id FROM to_archive)
            """)

        time.sleep(0.1)  # Let DB rest

Phase 3: Cleanup and Remove Column

-- When all deleted rows are moved
ALTER TABLE orders DROP COLUMN deleted_at;

-- Drop trigger
DROP TRIGGER tr_archive_on_delete ON orders;
DROP FUNCTION archive_on_soft_delete();

When Soft Delete Is OK

Soft delete isn’t always bad. It’s OK when:

  1. Few deleted - less than 10% of rows
  2. Short retention - hard delete after 30 days
  3. Small tables - thousands, not millions of rows
  4. You need undo - but only short-term

Checklist Before Using Soft Delete

## Soft Delete Checklist

### Before Implementation
- [ ] What's the expected ratio of deleted vs active rows?
- [ ] How long will deleted rows exist?
- [ ] Do I need UNIQUE constraints on the table?
- [ ] Do I have foreign keys from other tables?

### If Yes to Soft Delete
- [ ] All indexes are PARTIAL (WHERE deleted_at IS NULL)
- [ ] UNIQUE constraints are also partial
- [ ] I have a background job for hard deleting old records
- [ ] Monitoring for "dead" data size

### Alternatives
- [ ] Have I considered partitioning?
- [ ] Have I considered a separate archive table?
- [ ] Have I considered event sourcing for audit trail?

Conclusion

Soft delete is like technical debt you pay every day. It’s cheap at the start, but interest grows exponentially with table size.

My recommendation: For new projects, use partitioning or archive table from the start. For existing projects, plan the migration sooner rather than later - the more data, the more painful.

FAQ

What if I need an audit trail?

Use event sourcing or a separate audit log table. Audit and active data don’t need to be in the same table.

Won’t VACUUM help?

VACUUM frees space but doesn’t change the fact that indexes contain deleted rows too. Partial indexes are the only solution.

What about ORMs that automatically add deleted_at?

Most ORMs (Django, Rails, Hibernate) support custom delete strategies too. Check documentation for “archive” or “hard delete” modes.


Related posts

Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "The Soft Delete Trap: Why is_deleted Kills Your Database (And What To Do)". https://www.michal-drozd.com/en/blog/soft-delete-trap/ (Published March 23, 2025).