Adaptive Concurrency Limits: Stop Guessing Thread Pool Sizes

Tags: concurrency, performance, java, go, netflix, rate-limiting, adaptive

I learned the hard way that concurrency limits are not a knob you can set once and forget. “Set thread pool to 200.” Why 200? “That’s what we’ve always used.” Two weeks later: latency spikes because 200 is too high for this dependency.

Netflix’s concurrency-limits library adjusts the limit at runtime based on measured latency and dropped requests. No more guessing.

Tested on: Java 21, concurrency-limits library 0.4, Spring Boot 3.2

The Problem with Fixed Limits

Static Configuration

# "Standard" configuration
server:
  tomcat:
    threads:
      max: 200

spring:
  task:
    execution:
      pool:
        max-size: 100

Why Fixed Limits Fail

Scenario 1: Limit too high
- Dependency slows down (100ms → 2s)
- 200 threads all blocked waiting
- Memory pressure, GC pauses
- Cascading failure

Scenario 2: Limit too low
- Dependency is fast (5ms)
- Only 50 threads, could handle 4x more
- Wasted capacity, unnecessary queueing

The right limit depends on:
- Current latency of dependencies
- Available CPU
- Network conditions
- Time of day

Adaptive Concurrency: Little’s Law Again

The Algorithm

Little's Law: L = λ × W

L = number of concurrent requests
λ = request rate (throughput)
W = average latency

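If we know the latency of an unloaded dependency (W_optimal), we can calculate the optimal L:
L_optimal = λ × W_optimal

Beyond L_optimal, extra in-flight requests only queue: throughput (λ) stays flat while W grows.
As latency rises above W_optimal (W ↑), we should reduce concurrency (L ↓)
As latency returns to W_optimal (W ↓), we can increase concurrency (L ↑)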
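
To make the arithmetic concrete, here is a minimal sketch of Little's Law in both directions. The class and method names are illustrative only and the numbers are made-up inputs, not measurements from the benchmark in this post.

```java
public class LittlesLaw {

    // L = λ × W: concurrency needed to sustain a request rate at a given latency
    public static double requiredConcurrency(double requestsPerSecond, double latencySeconds) {
        return requestsPerSecond * latencySeconds;
    }

    // λ = L / W: throughput ceiling when concurrency is capped at a fixed limit
    public static double maxThroughput(int concurrencyLimit, double latencySeconds) {
        return concurrencyLimit / latencySeconds;
    }

    public static void main(String[] args) {
        // 500 requests/s at 100 ms latency keeps about 50 requests in flight
        System.out.println(requiredConcurrency(500, 0.100));

        // A fixed limit of 200 with a healthy dependency (100 ms) vs a degraded one (2 s)
        System.out.println(maxThroughput(200, 0.100));
        System.out.println(maxThroughput(200, 2.0));
    }
}
```

With the limit fixed at 200, a dependency that slows from 100ms to 2s cuts the throughput ceiling by a factor of 20, which is why Scenario 1 ends with every thread blocked.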

AIMD (Additive Increase, Multiplicative Decrease)

Algorithm:
1. Start with low limit (e.g., 10)
2. If requests succeed with good latency:
   → limit = limit + 1 (additive increase)
3. If latency degrades or errors occur:
   → limit = limit × 0.9 (multiplicative decrease)
4. Repeat continuously

Result: the limit climbs until the dependency pushes back, then backs off, so it hovers around the highest value the dependency can sustain
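
The steps above can be sketched in a few lines of plain Java. This is a simplified illustration, not the library's implementation: `AimdSketch`, `onSuccess`, and `onOverload` are hypothetical names, and the class is not thread-safe.

```java
public class AimdSketch {

    private final int minLimit;
    private final int maxLimit;
    private final double backoffRatio;
    private int limit;

    public AimdSketch(int initialLimit, int minLimit, int maxLimit, double backoffRatio) {
        this.limit = initialLimit;
        this.minLimit = minLimit;
        this.maxLimit = maxLimit;
        this.backoffRatio = backoffRatio;
    }

    // Step 2: a healthy sample raises the limit by one, capped at maxLimit
    public void onSuccess() {
        limit = Math.min(maxLimit, limit + 1);
    }

    // Step 3: an error or degraded latency shrinks the limit by the backoff ratio,
    // never dropping below minLimit
    public void onOverload() {
        limit = Math.max(minLimit, (int) (limit * backoffRatio));
    }

    public int limit() {
        return limit;
    }

    public static void main(String[] args) {
        AimdSketch aimd = new AimdSketch(10, 5, 200, 0.9);
        for (int i = 0; i < 40; i++) {
            aimd.onSuccess();   // 10 -> 50, one step per healthy sample
        }
        aimd.onOverload();      // 50 -> 45, a single 10% cut
        System.out.println(aimd.limit());
    }
}
```

The asymmetry is the point: it takes 40 healthy samples to gain 40 slots, but one overload signal gives up 10% of the limit at once, so the limiter sheds load much faster than it adds it.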

Implementation

Netflix concurrency-limits Library

// pom.xml
// <dependency>
//     <groupId>com.netflix.concurrency-limits</groupId>
//     <artifactId>concurrency-limits-core</artifactId>
//     <version>0.4.1</version>
// </dependency>

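import java.util.Optional;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Supplier;

import com.netflix.concurrency.limits.Limiter;
import com.netflix.concurrency.limits.limit.AIMDLimit;
import com.netflix.concurrency.limits.limiter.SimpleLimiter;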

public class AdaptiveLimiter {

    private final Limiter<Void> limiter;

    public AdaptiveLimiter() {
        // AIMD limit with min 10, max 200
        var limit = AIMDLimit.newBuilder()
            .initialLimit(20)
            .minLimit(10)
            .maxLimit(200)
            .backoffRatio(0.9)  // Decrease by 10% on failure
            .build();

        this.limiter = SimpleLimiter.newBuilder()
            .limit(limit)
            .build();
    }

    public <T> T execute(Supplier<T> action) {
        Optional<Limiter.Listener> listener = limiter.acquire(null);

        if (listener.isEmpty()) {
            throw new RejectedExecutionException("Limit reached");
        }

        try {
            T result = action.get();
            listener.get().onSuccess();  // Limit may increase
            return result;
        } catch (Exception e) {
            listener.get().onDropped();  // Limit decreases
            throw e;
        }
    }

    public int currentLimit() {
        return ((SimpleLimiter<?>) limiter).getLimit();
    }
}

Gradient-Based Limit

// More sophisticated: adjusts based on latency gradient
import com.netflix.concurrency.limits.limit.Gradient2Limit;

var limit = Gradient2Limit.newBuilder()
    .initialLimit(20)
    .minLimit(10)
    .maxConcurrency(200)   // Upper bound; this builder uses maxConcurrency, not maxLimit (verify against your library version)
    .smoothing(0.2)        // Smooth latency measurements
    .longWindow(600)       // Long-term baseline (samples)
    .rttTolerance(1.5)     // Allow 50% latency increase before limiting
    .build();

// Gradient2 tracks:
// - Long-term average latency (baseline)
// - Short-term latency (current)
// - Adjusts limit based on gradient (baseline/current, scaled by rttTolerance)
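
To show how a latency gradient turns into a limit, here is a simplified sketch of one update step. It follows the general shape of a gradient-based update but is not a copy of the library's code: the clamping range, the `QUEUE_SIZE` headroom, and all names are assumptions chosen for illustration.

```java
public class GradientSketch {

    static final double RTT_TOLERANCE = 1.5; // same value as rttTolerance in the post
    static final double SMOOTHING = 0.2;     // same value as smoothing in the post
    static final double QUEUE_SIZE = 4;      // assumed headroom that lets the limit grow

    // One update, given the long-term (baseline) and short-term (current) RTT
    public static double nextLimit(double limit, double longRtt, double shortRtt,
                                   double minLimit, double maxLimit) {
        // Gradient is 1.0 while current latency stays within the tolerance,
        // and falls toward 0.5 as current latency pulls away from the baseline
        double gradient = Math.max(0.5, Math.min(1.0, RTT_TOLERANCE * longRtt / shortRtt));

        // Scale the limit by the gradient, then add a little queueing headroom
        double target = limit * gradient + QUEUE_SIZE;

        // Exponential smoothing damps the reaction to a single noisy sample
        double smoothed = limit * (1 - SMOOTHING) + target * SMOOTHING;

        return Math.max(minLimit, Math.min(maxLimit, smoothed));
    }

    public static void main(String[] args) {
        // Current latency equals the baseline: the limit creeps upward
        System.out.println(nextLimit(100, 10, 10, 10, 200));
        // Current latency is 3x the baseline: the limit is pulled down
        System.out.println(nextLimit(100, 10, 30, 10, 200));
    }
}
```

Unlike AIMD, the size of the cut here depends on how far latency has drifted from the baseline, so a mild slowdown produces a mild correction.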

Spring Boot Integration

// ConcurrencyConfig.java
@Configuration
public class ConcurrencyConfig {

    @Bean
    public Limiter<String> httpClientLimiter() {
        var limit = Gradient2Limit.newBuilder()
            .initialLimit(20)
            .minLimit(5)
            .maxConcurrency(100)  // Gradient2Limit's builder uses maxConcurrency, not maxLimit
            .build();

        return SimpleLimiter.<String>newBuilder()
            .named("http-client")
            .limit(limit)
            .metricRegistry(new SpectatorMetricRegistry())
            .build();
    }
}

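// AdaptiveHttpClient.java
@Component
public class AdaptiveHttpClient {

    private final RestTemplate restTemplate;
    private final Limiter<String> limiter;

    // Constructor injection; without it the final fields are never assigned
    public AdaptiveHttpClient(RestTemplate restTemplate, Limiter<String> limiter) {
        this.restTemplate = restTemplate;
        this.limiter = limiter;
    }

    public <T> T get(String url, Class<T> responseType) {
        Optional<Limiter.Listener> listener = limiter.acquire(url);

        if (listener.isEmpty()) {
            // ServiceUnavailableException is assumed to be an application-defined unchecked exception
            throw new ServiceUnavailableException("Service overloaded");
        }

        // No manual timing needed: the listener measures RTT from acquire() to onSuccess()
        try {
            T result = restTemplate.getForObject(url, responseType);
            listener.get().onSuccess();
            return result;
        } catch (Exception e) {
            if (isServerError(e)) {
                listener.get().onDropped();
            } else {
                listener.get().onIgnore();  // Client error, don't adjust limit
            }
            throw e;
        }
    }

    // Treat 5xx responses and I/O failures (timeouts, refused connections) as overload signals
    private boolean isServerError(Exception e) {
        return e instanceof HttpServerErrorException
            || e instanceof ResourceAccessException;
    }
}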

Go Implementation

// adaptive_limiter.go
package limiter

import (
    "sync"
    "sync/atomic"
    "time"
)

type AdaptiveLimiter struct {
    limit       int64
    inFlight    int64
    minLimit    int64
    maxLimit    int64
    backoff     float64
    mu          sync.Mutex
    latencies   []time.Duration
    windowSize  int
}

func NewAdaptiveLimiter(initial, min, max int64) *AdaptiveLimiter {
    return &AdaptiveLimiter{
        limit:      initial,
        minLimit:   min,
        maxLimit:   max,
        backoff:    0.9,
        windowSize: 100,
        latencies:  make([]time.Duration, 0, 100),
    }
}

func (l *AdaptiveLimiter) Acquire() bool {
    for {
        current := atomic.LoadInt64(&l.inFlight)
        limit := atomic.LoadInt64(&l.limit)

        if current >= limit {
            return false
        }

        if atomic.CompareAndSwapInt64(&l.inFlight, current, current+1) {
            return true
        }
    }
}

func (l *AdaptiveLimiter) Release(latency time.Duration, success bool) {
    atomic.AddInt64(&l.inFlight, -1)

    l.mu.Lock()
    defer l.mu.Unlock()

    l.latencies = append(l.latencies, latency)
    if len(l.latencies) > l.windowSize {
        l.latencies = l.latencies[1:]
    }

    current := atomic.LoadInt64(&l.limit)
    if success && l.isLatencyGood() {
        // Additive increase, capped at maxLimit
        if current < l.maxLimit {
            atomic.StoreInt64(&l.limit, current+1)
        }
        return
    }

    // Multiplicative decrease on errors or degraded latency, clamped to minLimit
    newLimit := int64(float64(current) * l.backoff)
    if newLimit < l.minLimit {
        newLimit = l.minLimit
    }
    atomic.StoreInt64(&l.limit, newLimit)
}

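// isLatencyGood must be called with l.mu held.
func (l *AdaptiveLimiter) isLatencyGood() bool {
    if len(l.latencies) < 10 {
        return true
    }
    // Simplified stand-in for a p99-vs-baseline comparison: the mean of the
    // 10 newest samples must stay within 1.5x of the mean over the whole window.
    recent := l.latencies[len(l.latencies)-10:]
    return mean(recent) <= mean(l.latencies)*3/2
}

// mean returns the average of a non-empty slice of durations.
func mean(ds []time.Duration) time.Duration {
    var sum time.Duration
    for _, d := range ds {
        sum += d
    }
    return sum / time.Duration(len(ds))
}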

Benchmarks

Test Setup

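// Simulated service with variable latency
// (requires java.util.concurrent.atomic.AtomicInteger)
private final AtomicInteger activeCounter = new AtomicInteger();

@GetMapping("/api")
public Response api() throws InterruptedException {
    // Count this request as active for the duration of the call
    int activeRequests = activeCounter.incrementAndGet();
    try {
        // Simulate latency based on load
        int baseLatency = 10;
        int latencyPerRequest = activeRequests > 50 ? (activeRequests - 50) * 2 : 0;

        Thread.sleep(baseLatency + latencyPerRequest);
        return new Response("ok");
    } finally {
        activeCounter.decrementAndGet();
    }
}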

Results: Fixed vs Adaptive

Load test: 500 RPS for 10 minutes, dependency degrades at 5 min mark

Fixed limit = 200:
  Before degradation (0-5 min):
    p50: 12ms, p99: 45ms, errors: 0%
  After degradation (5-10 min):
    p50: 850ms, p99: 4200ms, errors: 35%
    Thread pool exhausted, cascading failure

Adaptive limit (10-200):
  Before degradation (0-5 min):
    p50: 12ms, p99: 42ms, errors: 0%
    Limit stabilized at: 180
  After degradation (5-10 min):
    p50: 85ms, p99: 280ms, errors: 5%
    Limit dropped to: 25
    System remained stable!

Monitoring

Prometheus Metrics

// Custom metrics
@Bean
MeterBinder adaptiveLimiterMetrics(Limiter<?> limiter) {
    return registry -> {
        Gauge.builder("adaptive_limiter.limit", limiter,
            l -> ((SimpleLimiter<?>) l).getLimit())
            .register(registry);

        Gauge.builder("adaptive_limiter.inflight", limiter,
            l -> ((SimpleLimiter<?>) l).getInflight())
            .register(registry);
    };
}

Grafana Dashboard

# Current limit
adaptive_limiter_limit

# Inflight requests
adaptive_limiter_inflight

# Utilization
adaptive_limiter_inflight / adaptive_limiter_limit

# Limit changes over time (for tuning)
changes(adaptive_limiter_limit[5m])

When to Use Adaptive Limits

Good Fit

✅ HTTP client calls to dependencies
✅ Database connection pools
✅ Message queue consumers
✅ Any external service call

Why: External systems have variable latency

Not a Good Fit

❌ Request rate limiting (use token bucket)
❌ Memory-bound operations (use fixed pool)
❌ CPU-bound operations (size the pool from the CPU core count)

Why: These have predictable, fixed capacity
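
For the CPU-bound case, a fixed pool sized from the core count is usually enough. The sketch below uses only the JDK; `CpuBoundPool` is an illustrative name, and one thread per core is a common starting point rather than a rule.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CpuBoundPool {

    // One thread per core: for CPU-bound work, more threads mostly add context switching
    public static ThreadPoolExecutor create() {
        int cores = Runtime.getRuntime().availableProcessors();
        return new ThreadPoolExecutor(
            cores, cores,                  // fixed size: core == max
            0L, TimeUnit.MILLISECONDS,     // keep-alive is irrelevant for a fixed pool
            new LinkedBlockingQueue<>());  // unbounded queue; bound it if backpressure matters
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create();
        System.out.println("pool size: " + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```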

Checklist

## Adaptive Concurrency Setup

### Implementation
- [ ] Add concurrency-limits library
- [ ] Choose algorithm (AIMD vs Gradient2)
- [ ] Set min/max bounds appropriately
- [ ] Wrap external calls with limiter

### Configuration
- [ ] Initial limit: start low (10-20)
- [ ] Min limit: enough for health checks (5-10)
- [ ] Max limit: reasonable upper bound (100-500)

### Monitoring
- [ ] Track current limit over time
- [ ] Track inflight count
- [ ] Alert on limit hitting min (dependency issues)

### Testing
- [ ] Load test with dependency degradation
- [ ] Verify limit decreases under stress
- [ ] Verify limit recovers after recovery

Conclusion

Stop guessing concurrency limits:

  1. Fixed limits fail when conditions change
  2. Adaptive limits adjust based on actual latency
  3. AIMD algorithm is simple and effective
  4. Gradient2 is more sophisticated for complex scenarios

Let the algorithm find the optimal limit for you.


Cite this article

If you reference this post, please link to the original URL and credit the author.

Michal Drozd. "Adaptive Concurrency Limits: Stop Guessing Thread Pool Sizes". https://www.michal-drozd.com/en/blog/adaptive-concurrency-limits/ (Published April 11, 2025).