Transactional Outbox: Solving the Dual Write Problem Without 2PC
The first outbox I built felt like overkill, until it saved a deploy. “Save the order and send an email.” One sentence that caused me three months of debugging. Customers complained they didn’t receive order confirmation, but the order existed in the system. Or the opposite - they got the email, but the order was nowhere to be found.
The problem? Dual Write - two independent systems (database + email/message queue) without a shared transaction. The solution? Transactional Outbox pattern.
Tested on: PostgreSQL 14+, Node.js 20+, TypeScript 5.x. Production-proven on systems handling 50k+ messages daily.
What is the Dual Write Problem
Look at this code:
async function createOrder(order: Order): Promise<void> {
// Step 1: Save to DB
await db.orders.insert(order);
// Step 2: Send event to Kafka/RabbitMQ
await messageQueue.publish('order.created', order);
// Step 3: Send notification
await emailService.send(order.customerEmail, 'Order confirmed');
}
What can go wrong?
- DB OK, Kafka FAIL - order exists, but no other system knows about it
- DB OK, Kafka OK, Email FAIL - customer doesn’t receive confirmation
- DB timeout after Kafka - message sent, but transaction rolled back
Race Condition in Practice
Even worse - even when everything succeeds:
// Request 1 (slow network)
await db.orders.insert(order); // t=0ms
// ... network delay ...
await kafka.publish(event); // t=500ms
// Request 2 (fast)
await db.orders.update(order); // t=100ms
await kafka.publish(updateEvent); // t=150ms
// Kafka receives: updateEvent (t=150ms), createEvent (t=500ms)
// Consumer sees UPDATE before CREATE!
Why 2PC is Not the Answer
Two-Phase Commit (2PC) theoretically solves distributed transactions:
Coordinator: "Prepare?"
DB: "Ready"
Kafka: "Ready"
Coordinator: "Commit!"
DB: "Committed"
Kafka: "Committed"
Problems:
- Kafka doesn’t support 2PC - most message brokers don’t
- Performance - blocking on the slowest system
- Availability - if one system goes down, everything stops
- Complexity - coordinator failures, timeouts, recovery
Transactional Outbox Pattern
Instead of writing to two systems, write ONLY to the database:
┌─────────────────────────────────────┐
│ PostgreSQL │
│ ┌─────────────┐ ┌──────────────┐ │
│ │ orders │ │ outbox │ │
│ │ │ │ │ │
│ │ id, data... │ │ id, payload │ │
│ │ │ │ processed_at │ │
│ └─────────────┘ └──────────────┘ │
└─────────────────────────────────────┘
│
▼ (async worker)
┌─────────┐ ┌─────────┐
│ Kafka │ │ Email │
└─────────┘ └─────────┘
Principle:
- In a single DB transaction, save order + outbox event
- Separate worker reads outbox and sends to external systems
- Worker marks event as processed
TypeScript Implementation
Outbox Table
CREATE TABLE outbox (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
aggregate_type VARCHAR(100) NOT NULL,
aggregate_id VARCHAR(100) NOT NULL,
event_type VARCHAR(100) NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
processed_at TIMESTAMP WITH TIME ZONE,
retry_count INT DEFAULT 0,
last_error TEXT
);
-- Index for worker
CREATE INDEX idx_outbox_unprocessed
ON outbox(created_at)
WHERE processed_at IS NULL;
-- Index for cleanup
CREATE INDEX idx_outbox_processed
ON outbox(processed_at)
WHERE processed_at IS NOT NULL;
Business Logic with Outbox
interface OutboxEvent {
aggregateType: string;
aggregateId: string;
eventType: string;
payload: unknown;
}
class OrderService {
constructor(
private db: Pool,
private outbox: OutboxRepository
) {}
async createOrder(order: CreateOrderInput): Promise<Order> {
return await this.db.transaction(async (tx) => {
// 1. Save order
const savedOrder = await tx.orders.insert(order);
// 2. Save event to outbox - in the same transaction!
await this.outbox.insertInTransaction(tx, {
aggregateType: 'Order',
aggregateId: savedOrder.id,
eventType: 'order.created',
payload: {
orderId: savedOrder.id,
customerId: savedOrder.customerId,
items: savedOrder.items,
total: savedOrder.total,
}
});
// 3. Another event for email
await this.outbox.insertInTransaction(tx, {
aggregateType: 'Order',
aggregateId: savedOrder.id,
eventType: 'notification.send',
payload: {
type: 'email',
to: order.customerEmail,
template: 'order-confirmation',
data: { orderId: savedOrder.id }
}
});
return savedOrder;
});
}
}
Outbox Worker with LISTEN/NOTIFY
PostgreSQL LISTEN/NOTIFY enables real-time notifications without polling:
-- Trigger for new outbox events
CREATE OR REPLACE FUNCTION notify_outbox_insert()
RETURNS TRIGGER AS $$
BEGIN
PERFORM pg_notify('outbox_events', NEW.id::text);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tr_outbox_notify
AFTER INSERT ON outbox
FOR EACH ROW
EXECUTE FUNCTION notify_outbox_insert();
Worker in TypeScript:
import { Pool, Client } from 'pg';
class OutboxWorker {
private listenerClient: Client;
private processingPool: Pool;
private isRunning = false;
constructor(
private handlers: Map<string, EventHandler>,
connectionString: string
) {
this.listenerClient = new Client(connectionString);
this.processingPool = new Pool({
connectionString,
max: 5 // Parallel processing
});
}
async start(): Promise<void> {
this.isRunning = true;
await this.listenerClient.connect();
await this.listenerClient.query('LISTEN outbox_events');
this.listenerClient.on('notification', async (msg) => {
if (msg.channel === 'outbox_events' && msg.payload) {
await this.processEvent(msg.payload);
}
});
// Process existing unprocessed events on startup
await this.processBacklog();
console.log('Outbox worker started');
}
private async processEvent(eventId: string): Promise<void> {
const client = await this.processingPool.connect();
try {
// Lock row for exclusive processing
const result = await client.query(`
SELECT * FROM outbox
WHERE id = $1
AND processed_at IS NULL
FOR UPDATE SKIP LOCKED
`, [eventId]);
if (result.rows.length === 0) {
return; // Already processed by another worker
}
const event = result.rows[0];
const handler = this.handlers.get(event.event_type);
if (!handler) {
console.error(`No handler for event type: ${event.event_type}`);
return;
}
try {
await handler.handle(event.payload);
// Mark as processed
await client.query(`
UPDATE outbox
SET processed_at = NOW()
WHERE id = $1
`, [eventId]);
} catch (error) {
// Retry logic
await client.query(`
UPDATE outbox
SET retry_count = retry_count + 1,
last_error = $2
WHERE id = $1
`, [eventId, error.message]);
// Dead letter after X attempts
if (event.retry_count >= 5) {
await this.moveToDeadLetter(client, event);
}
}
} finally {
client.release();
}
}
private async processBacklog(): Promise<void> {
const result = await this.processingPool.query(`
SELECT id FROM outbox
WHERE processed_at IS NULL
ORDER BY created_at
LIMIT 100
`);
for (const row of result.rows) {
await this.processEvent(row.id);
}
}
async stop(): Promise<void> {
this.isRunning = false;
await this.listenerClient.query('UNLISTEN outbox_events');
await this.listenerClient.end();
await this.processingPool.end();
}
}
Event Handlers
interface EventHandler {
handle(payload: unknown): Promise<void>;
}
class KafkaEventHandler implements EventHandler {
constructor(private kafka: KafkaProducer) {}
async handle(payload: OrderCreatedPayload): Promise<void> {
await this.kafka.send({
topic: 'orders',
messages: [{ value: JSON.stringify(payload) }]
});
}
}
class EmailEventHandler implements EventHandler {
constructor(private emailService: EmailService) {}
async handle(payload: NotificationPayload): Promise<void> {
await this.emailService.sendTemplate(
payload.to,
payload.template,
payload.data
);
}
}
// Register handlers
const handlers = new Map<string, EventHandler>();
handlers.set('order.created', new KafkaEventHandler(kafka));
handlers.set('notification.send', new EmailEventHandler(emailService));
const worker = new OutboxWorker(handlers, DATABASE_URL);
await worker.start();
Guaranteed Event Ordering
Outbox pattern guarantees ordering for a single aggregate:
// In transaction - ordering guaranteed
await this.outbox.insert(tx, { eventType: 'order.created', ... });
await this.outbox.insert(tx, { eventType: 'order.confirmed', ... });
await this.outbox.insert(tx, { eventType: 'order.shipped', ... });
Worker processes in created_at order:
SELECT * FROM outbox
WHERE aggregate_id = $1
AND processed_at IS NULL
ORDER BY created_at
FOR UPDATE SKIP LOCKED
Idempotency and At-Least-Once Delivery
Outbox guarantees at-least-once delivery (event will be delivered at least once). Consumers must be idempotent:
class OrderCreatedConsumer {
async handle(event: OrderCreatedEvent): Promise<void> {
// Idempotency check
const existing = await this.db.processedEvents.findOne({
eventId: event.id
});
if (existing) {
console.log(`Event ${event.id} already processed, skipping`);
return;
}
// Process event
await this.db.transaction(async (tx) => {
await this.processOrder(tx, event);
// Mark as processed
await tx.processedEvents.insert({ eventId: event.id });
});
}
}
Production Checklist
## Outbox Pattern Checklist
### Database
- [ ] Outbox table with worker indexes
- [ ] LISTEN/NOTIFY trigger
- [ ] Monitoring outbox table size
- [ ] Cleanup job for old processed events
### Worker
- [ ] Graceful shutdown (completes in-progress work)
- [ ] Health check endpoint
- [ ] Retry with exponential backoff
- [ ] Dead letter queue for failed events
- [ ] Metrics: processing time, queue depth, error rate
### Monitoring
- [ ] Alert: outbox queue > N unprocessed
- [ ] Alert: event processing time > X ms
- [ ] Alert: retry_count > threshold
- [ ] Dashboard: throughput, latency, errors
### High Availability
- [ ] Multiple worker instances (SKIP LOCKED)
- [ ] Worker restart policy (K8s, systemd)
- [ ] Database connection pool sizing
When Not to Use Outbox
- You don’t need guaranteed delivery - fire-and-forget is OK
- Single service - you don’t need distributed transactions
- You have an event store - Event Sourcing already solves this
- Real-time requirements < 100ms - LISTEN/NOTIFY has latency
Alternatives
Change Data Capture (CDC)
Debezium reads WAL log and generates events:
PostgreSQL WAL → Debezium → Kafka
Pros: No code changes, no outbox table Cons: Additional infrastructure, events are DB-level (not domain events)
Transaction Log Tailing
Similar to CDC, but custom implementation.
Conclusion
Transactional Outbox is a battle-tested pattern for reliable messaging. Key principles:
- Single write - everything to database in one transaction
- Async processing - worker reads and sends
- Idempotent consumers - at-least-once delivery requires deduplication
- Monitoring - queue depth is your best problem indicator
For Node.js projects, I recommend PostgreSQL with LISTEN/NOTIFY - minimal latency without polling.
FAQ
What if the worker crashes during processing?
Event remains processed_at = NULL, worker will process it again after restart. That’s why consumers must be idempotent.
How to scale workers?
FOR UPDATE SKIP LOCKED enables parallel processing. Each worker locks and processes different events. Watch out for ordering - if you need strict ordering, use one worker per aggregate.
How many events to keep in outbox?
Delete processed events after 7-30 days (depending on debugging needs). Unprocessed should not exceed hundreds - if there are thousands, you have a problem.
Why not Kafka Transactions?
Kafka Transactions solve exactly-once within Kafka, not between DB and Kafka. Outbox pattern solves DB → anywhere.
Related Articles
- Zero-Downtime PostgreSQL Migrations - How to safely migrate the outbox table
- The Soft Delete Trap - Why you should archive old outbox events
Related posts
The Soft Delete Trap: Why is_deleted Kills Your Database (And What To Do)
A practical analysis of why soft delete destroys database performance over time. Benchmarks, partitioning solution, and migration checklist.
API Idempotency: Designing Endpoints Resistant to Retries
Complete guide to implementing idempotent APIs. From Idempotency-Key through Redis locking to request processing state diagram.
Double Charges From Idempotency Keys: The Replica Lag Trap
Perfect idempotency logic, but customers still get charged twice. The cause: checking idempotency keys against a read replica that's seconds behind the primary during traffic spikes.
Architectural Linting: Automated Protection Against Spaghetti Code
How to enforce architectural rules in CI/CD. Dependency Cruiser for JS/TS, ArchUnit for Java, and practical configuration examples.
Cite this article
If you reference this post, please link to the original URL and credit the author.