Batching requests we don't control
How pushing dataloader up to the HTTP controller — out of the resolver layer it usually lives in — cut downstream database load by 83% on a read-heavy endpoint.
When you own both ends of a client-server boundary, batch endpoints work like they should — clients accumulate work and send it as one fat request. The interesting case is when you own the server but not the client, and the client is sending you one tiny request per entity, hundreds of times a second, in a pattern you can’t change.
That was the shape of a recent problem on a service I work on. A read-heavy endpoint that already supported batching — multiple user IDs and entity IDs in one POST — was getting hammered by a third-party email platform that fires one request per recipient when sending a campaign. We couldn’t change how the vendor batched (or didn’t), and the path from the vendor into our service ran through another system we weren’t in a position to change either. The endpoint that was designed for bulk traffic was instead processing 2.2 million single-entity requests in a 24-hour period, peaking around 170 requests per second.
Every one of those requests carried the full fixed cost of an HTTP round-trip — request parsing, controller dispatch, several SQL queries across two databases, downstream service calls, response serialisation. The batch path next to it was sitting unused.
A quick terminology aside
Two terms get used interchangeably here and they mean different things.
Request coalescing is collapsing multiple identical requests into one. Often handled at a layer above the application — an API gateway noticing ten near-simultaneous calls to /user/123 and only forwarding one. Caches do a degenerate version of this.
Request batching is taking distinct requests for different entities and combining them into one bulk call. /user/1, /user/2, /user/3 becomes /user?ids=1,2,3.
What I ended up implementing is closer to batching, with a deduplication step inside the batch that gives it a coalescing flavour. More on that shortly.
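To make the distinction concrete, here is a minimal sketch of the two operations applied to a list of requested paths. The function names are illustrative, and the `/user` path format is borrowed from the examples above rather than from any real gateway.

```typescript
// Coalescing: identical requests collapse into one.
function coalesce(paths: string[]): string[] {
  return [...new Set(paths)]; // a Set keeps first-seen order
}

// Batching: distinct /user/<id> requests combine into one bulk call.
function batch(paths: string[]): string {
  const ids = paths.map((p) => p.split("/").pop() ?? "");
  return `/user?ids=${ids.join(",")}`;
}

const inbound = ["/user/1", "/user/2", "/user/2", "/user/3"];
const coalesced = coalesce(inbound); // ["/user/1", "/user/2", "/user/3"]
const batched = batch(coalesced);    // "/user?ids=1,2,3"
```

Running the coalescing step first and the batching step second is the "batching with a coalescing flavour" combination: duplicates drop out, then whatever remains travels as one call.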
The shape of the fix
Three constraints worth noting before the solution:
- The endpoint already supported a bulk shape: POST /lookups, taking arrays of user IDs and company IDs and returning the Cartesian product.
- The inbound traffic couldn’t be reshaped at the source.
- The fixed per-request cost was real and measurable: independent SQL queries, separate service calls, separate downstream HTTP calls.
The fix is to hold incoming requests for a short window at the HTTP controller, group whatever shows up in that window, deduplicate the IDs across the group, and dispatch one bulk call through the existing batch path. Each original request then gets its own slice of the result, returned to its caller as if nothing happened.
In effect: turn the unwanted “one entity per request” pattern back into the bulk shape the endpoint was designed for, before it reaches anything expensive.
The settings I landed on:
- Window length: 50ms
- Max batch size: 25 requests
- Anything that arrives while a batch is in flight gets queued into the next one
At 170 requests per second, 50ms gives you ~8.5 requests per window on average — well under the cap, which means the limit only ever kicks in during spikes.
Implementation
This is where things get nice. The mechanics — collect requests within a window, cap batch size, dispatch in bulk, return individual results to individual callers — are exactly what dataloader does. It’s the same library that solves the GraphQL N+1 problem at the resolver layer; I wrote about that a couple of years back. It’s not really a GraphQL library though. It’s a generic request-batching primitive that happens to have shipped with GraphQL.
That genericness is the bit worth flagging. There’s no rule that says dataloader has to live next to a resolver. Pushed up to the HTTP controller, the same primitive batches inbound HTTP requests instead of outbound database calls. The shape is symmetric.
A stripped-down sketch:
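// Assumes NestJS decorators and the dataloader npm package.
// Request and Response stand for the endpoint's own DTO types (not shown).
import { Body, Controller, Post } from '@nestjs/common';
import DataLoader from 'dataloader';

@Controller()
class LookupController {
  // One loader for the controller's lifetime, so concurrent requests share batches.
  private readonly loader: DataLoader<Request, Response>;

  constructor() {
    this.loader = new DataLoader<Request, Response>(
      // Receives every request collected in one window and must return
      // one response per request, in the same order.
      async (requests) => {
        return batchHandleRequests(requests);
      },
      {
        cache: false, // every HTTP request is distinct, so never memoise by key
        maxBatchSize: 25, // cap on requests per dispatched batch
        batchScheduleFn: (callback) => setTimeout(callback, 50), // 50ms window
      },
    );
  }

  @Post('lookups')
  public lookup(@Body() request: Request): Promise<Response> {
    return this.loader.load(request);
  }
}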
batchHandleRequests is where the real work goes: union the user IDs and company IDs across every request in the batch, run the bulk path once with the deduplicated lists, then map the response back to each original request based on what it asked for. That mapping is where the coalescing-flavoured part lives — if ten requests in the same batch all ask for the same user, the underlying lookup runs once and the result is fanned back out to all ten.
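A minimal sketch of what such a handler could look like, not the production code. The `LookupRequest` and `LookupResult` shapes, the injected `bulkLookup` function, and the `userId:companyId` index key are all assumptions made for illustration; the key format in particular only works if IDs never contain a colon.

```typescript
// Hypothetical shapes: the real DTOs are not shown in this post.
interface LookupRequest { userIds: string[]; companyIds: string[]; }
interface LookupResult { userId: string; companyId: string; value: string; }
type BulkLookup = (userIds: string[], companyIds: string[]) => Promise<LookupResult[]>;

// Union and deduplicate the IDs across every request in the batch.
function unionIds(requests: readonly LookupRequest[]) {
  const userIds = new Set<string>();
  const companyIds = new Set<string>();
  for (const r of requests) {
    r.userIds.forEach((id) => userIds.add(id));
    r.companyIds.forEach((id) => companyIds.add(id));
  }
  return { userIds: [...userIds], companyIds: [...companyIds] };
}

// Give one request its share of the bulk result: only the pairs it asked for.
function sliceFor(request: LookupRequest, index: Map<string, LookupResult>): LookupResult[] {
  const out: LookupResult[] = [];
  for (const u of request.userIds) {
    for (const c of request.companyIds) {
      const hit = index.get(`${u}:${c}`);
      if (hit) out.push(hit);
    }
  }
  return out;
}

async function batchHandleRequests(
  requests: readonly LookupRequest[],
  bulkLookup: BulkLookup, // stands in for the existing bulk path
): Promise<LookupResult[][]> {
  const { userIds, companyIds } = unionIds(requests);
  const all = await bulkLookup(userIds, companyIds); // one bulk call for the whole batch
  const index = new Map<string, LookupResult>(
    all.map((r): [string, LookupResult] => [`${r.userId}:${r.companyId}`, r]),
  );
  // One entry per input, in input order, which is what dataloader expects.
  return requests.map((r) => sliceFor(r, index));
}
```

One design consequence to be aware of: because the bulk path returns a Cartesian product, the unioned call can compute pairs that no individual request asked for. The slicing step discards them, but whether that extra work is acceptable depends on how expensive each pair is.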
The dataloader part of the work is trivial. The non-trivial part is making sure the rest of the code path is genuinely batch-friendly: SQL queries that use WHERE IN rather than firing one per ID, downstream HTTP calls that themselves batch where possible, work that parallelises where it can’t.
On that last point: it pays to write code in a batch-friendly shape from the start even when there’s no immediate consumer for it. Retrofitting a single-entity service path into a bulk one is consistently more work than writing the bulk shape upfront and calling it with arrays of length one.
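As an illustration of both points, here is a small sketch of a bulk-first query builder, with the single-entity variant written as a call with an array of length one. The `users` table, its columns, and the PostgreSQL-style `$1, $2` placeholders are assumptions; the service's real schema and database driver are not described here.

```typescript
interface Query { text: string; values: string[]; }

// Build one parameterised query for many IDs instead of one query per ID.
function selectUsersByIds(ids: string[]): Query | null {
  const unique = [...new Set(ids)];
  if (unique.length === 0) return null; // "IN ()" is not valid SQL, so skip the query
  const placeholders = unique.map((_, i) => `$${i + 1}`).join(", ");
  return {
    text: `SELECT id, name FROM users WHERE id IN (${placeholders})`,
    values: unique,
  };
}

// The single-entity path is just the bulk path with one ID.
function selectUserById(id: string): Query | null {
  return selectUsersByIds([id]);
}

const bulk = selectUsersByIds(["7", "9", "7"]);
// bulk.text   -> SELECT id, name FROM users WHERE id IN ($1, $2)
// bulk.values -> ["7", "9"]
const single = selectUserById("7");
// single.text -> SELECT id, name FROM users WHERE id IN ($1)
```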
Results
During peak periods, the total volume of downstream work from the controller dropped by 83%. Broken down:
- Peak request rate against the upstream database: 101/s → 34.1/s, a 66% drop.
- Peak request rate against the service’s own database: 296/s → 34/s, an 89% drop.
- Cumulative database requests across a one-hour peak window: 1,429,200 → 245,160, an 83% drop.
The other thing worth noting is what this does for burst tolerance. When per-request fixed cost dominates, an application’s capacity is roughly bounded by how many of those fixed costs it can pay per second. Batching collapses many of those costs into one, so spikes that would previously have started backing up the queue or saturating connection pools instead get absorbed into the next 50ms window.
The headroom is bounded by the configured batch size and window. At 25 requests per batch and 50ms per window, the upper bound is 1000ms ÷ 50ms × 25 = 500 batched requests per second — roughly 3x the current peak. Both knobs are tunable if it ever stops being enough.
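The same arithmetic as a small helper, handy for trying other window and batch-size combinations. The names are illustrative; the numbers are the ones quoted in this post.

```typescript
interface BatchSettings {
  windowMs: number;
  maxBatchSize: number;
}

// Average number of requests that land in one window at a given arrival rate.
function averageRequestsPerWindow(ratePerSecond: number, s: BatchSettings): number {
  return (ratePerSecond * s.windowMs) / 1000;
}

// Upper bound on requests per second if every window dispatches one full batch.
function ceilingPerSecond(s: BatchSettings): number {
  return (1000 / s.windowMs) * s.maxBatchSize;
}

const settings: BatchSettings = { windowMs: 50, maxBatchSize: 25 };
const perWindow = averageRequestsPerWindow(170, settings); // 8.5
const ceiling = ceilingPerSecond(settings);                // 500
const headroom = ceiling / 170;                            // about 2.9
```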
Caveats
Two worth flagging.
The biggest is added latency on the way in. Every request now waits up to the window length before being dispatched. The worst-case added latency is the full 50ms, for a request unlucky enough to arrive right at the start of a fresh window. The average is half that, around 25ms, assuming roughly uniform arrivals. Interestingly, the impact on tail latency (P90, P95, P99) was negligible in relative terms, because those slow requests were already taking long enough that an extra 25ms barely registers as a percentage. The visible effect was a clean step up on the baseline (P50) only.
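A rough model of that added wait, as a sketch rather than a measurement. It assumes a window opens when the first request arrives and closes a fixed time later, which is how a `setTimeout`-based `batchScheduleFn` behaves, and it ignores the batch size cap. The function name and the sample arrival times are made up for illustration.

```typescript
// For each arrival time (ms, ascending), return how long that request waits
// before its window is dispatched.
function addedWaits(arrivalsMs: number[], windowMs: number): number[] {
  const waits: number[] = [];
  let windowStart = -Infinity;
  for (const t of arrivalsMs) {
    // An arrival after the current window has closed opens a new one.
    if (t >= windowStart + windowMs) windowStart = t;
    waits.push(windowStart + windowMs - t);
  }
  return waits;
}

const waits = addedWaits([0, 10, 40, 60], 50); // [50, 40, 10, 50]
```

Under this model the request that opens a window always waits the full 50ms, so at low arrival rates the mean sits somewhat above half the window and approaches 25ms as windows fill up.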
The other is complexity. There’s a new failure mode where a poison-pill request can break a whole batch unless the batch handler is careful to isolate errors per-input. Worth knowing about and worth testing for — I wrote an end-to-end test that fires concurrent requests and verifies the responses get sliced correctly back to their callers.
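One way to isolate errors per input, sketched under assumptions: dataloader lets the batch function return an `Error` instance in a slot of the result array, and it rejects only the matching `load()` call. The helper below converts a throw in any per-request step into such an `Error` value. `isolateErrors`, `validate`, and the `LookupRequest` shape are hypothetical names, not part of the service described here.

```typescript
interface LookupRequest { userIds: string[]; companyIds: string[]; }

// Run a per-request step for every input, turning a throw into an Error
// value in that slot instead of letting it fail the whole batch.
function isolateErrors<K, V>(inputs: readonly K[], handle: (input: K) => V): (V | Error)[] {
  return inputs.map((input) => {
    try {
      return handle(input);
    } catch (err) {
      return err instanceof Error ? err : new Error(String(err));
    }
  });
}

// Example per-request step: reject malformed input before it joins the bulk call.
function validate(r: LookupRequest): LookupRequest {
  if (r.userIds.length === 0) throw new Error("userIds must not be empty");
  return r;
}

const incoming: LookupRequest[] = [
  { userIds: ["u1"], companyIds: ["c1"] },
  { userIds: [], companyIds: ["c1"] }, // the poison pill
];
const outcomes = isolateErrors(incoming, validate);
// outcomes[0] is the valid request; outcomes[1] is an Error for the bad one.
```

The valid entries can then go through the bulk path as usual, while each `Error` stays in its original slot so only that caller sees a failure.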
Where this fits
The pattern works wherever you’ve got:
- A read endpoint with a fixed per-request cost that dwarfs the variable cost.
- A bulk-capable code path already underneath it (or worth building).
- An inbound traffic pattern you can’t reshape at the source.
- Some tolerance for ~25ms of added baseline latency.
The third condition is the one that makes it worth doing. If you can fix the client, fix the client. When you can’t — when the client is a vendor, a legacy system, or simply someone else’s problem — pushing dataloader up to the controller boundary is a small change for a large effect.