martyw.dev

Write batch-shaped functions by default

Why I write repository methods that take arrays and return Maps, even when the only caller — today — is asking for one thing.


When I write a repository method, the first version takes an array and returns a Map keyed by whatever the array contained. Even when the only caller — today — is asking for exactly one thing.

public getCommentCounts(articleIds: string[]): Promise<Map<string, number>>

Not getCommentCount(articleId: string): Promise<number>. The single-key version, if I bother writing it at all, is a three-line wrapper over the array version.

The rest of this post is why.

The shape

Two things to notice about the signature above.

The input is an array. That means the SQL query is WHERE article_id = ANY($1) in Postgres (or WHERE article_id IN (...) elsewhere) instead of WHERE article_id = $1. That is one round-trip to the database, regardless of how many keys the caller has.

The output is a Map, not a flat array. This matters more than people think. A flat array forces every caller to write a lookup loop — find the row whose ID matches the input ID, handle the case where it’s missing, decide what to return if nothing comes back. A Map<inputKey, result> puts that work in the function that already knows the shape, and gives callers an O(1) lookup with a clean missing-key story:

const counts = await repo.getCommentCounts(['a', 'b', 'c']);
const countForA = counts.get('a') ?? 0;

The single-key sugar, if it’s worth having:

public async getCommentCount(articleId: string): Promise<number> {
  const counts = await this.getCommentCounts([articleId]);
  return counts.get(articleId) ?? 0;
}

The batch shape is the real implementation. The scalar shape exists for ergonomics where the caller genuinely only has one key.
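To make the shape concrete, here is a minimal sketch of what the batch method might look like against Postgres. The Db interface, the comments table, and the comment_count alias are assumptions made for illustration, not any particular library's API; swap in whatever client and schema you actually have.

```typescript
// Hypothetical minimal database interface; stands in for your real client.
interface Db {
  query<Row>(sql: string, params: unknown[]): Promise<Row[]>;
}

// Assumed row shape for the query below.
interface CommentCountRow {
  article_id: string;
  comment_count: number;
}

class CommentRepository {
  constructor(private readonly db: Db) {}

  public async getCommentCounts(articleIds: string[]): Promise<Map<string, number>> {
    // Pre-fill with zero so every requested key has an entry.
    // Using a Map here also de-duplicates repeated IDs.
    const counts = new Map<string, number>();
    articleIds.forEach((id) => counts.set(id, 0));
    if (counts.size === 0) return counts; // no keys, no round-trip

    const rows = await this.db.query<CommentCountRow>(
      // The ::int cast is there because some drivers return COUNT(*) as a string.
      `SELECT article_id, COUNT(*)::int AS comment_count
         FROM comments
        WHERE article_id = ANY($1)
        GROUP BY article_id`,
      [Array.from(counts.keys())],
    );

    // Articles with no comments produce no row, so their zero stays in place.
    rows.forEach((row) => counts.set(row.article_id, row.comment_count));
    return counts;
  }

  public async getCommentCount(articleId: string): Promise<number> {
    const counts = await this.getCommentCounts([articleId]);
    return counts.get(articleId) ?? 0;
  }
}
```

Pre-filling zeros is one possible choice for the missing-key story. With it, the ?? 0 at the call site is a harmless safety net rather than the thing doing the work.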

Why default to it

The pushback I get is some variant of “this is premature optimisation for a method that’s only ever called with one ID.”

Two things going the other way.

First, the overhead is negligible. Writing the array-shaped version takes maybe ten extra lines compared to the scalar version. You allocate a Map, you build the WHERE clause for N keys instead of one, you populate the map from the result rows. Most of that work is the same either way; the difference is that an array goes through your query builder where a single value would have.

Second — and this is the one that actually matters — retrofitting a scalar function into a batch one is consistently more painful than writing the batch one upfront. Not because the function itself is hard to rewrite, but because by the time you need the batch version, you have N callers depending on the scalar shape, and “I now need to call this 500 times” is the kind of problem that arrives during an incident rather than during planning.

A batch-shaped method gives every future caller — including the GraphQL field resolver someone will write next year — somewhere to land. A scalar one is a wall.
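As a rough illustration of what "somewhere to land" can mean for a per-field resolver: the sketch below collects the keys requested within one tick and hands them to a batch function in a single call. createBatcher is a hypothetical helper written for this post, not a library API; established batching libraries handle caching and error cases far more thoroughly.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<Map<K, V>>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V | undefined) => void;
  reject: (reason: unknown) => void;
}

// Hypothetical helper: turns a batch-shaped function into a per-key loader
// that coalesces all calls made in the same tick into one batch call.
function createBatcher<K, V>(batchFn: BatchFn<K, V>): (key: K) => Promise<V | undefined> {
  let pending: Pending<K, V>[] = [];

  const flush = (): void => {
    const batch = pending;
    pending = [];
    // De-duplicate keys before handing them to the batch function.
    const keys = Array.from(new Set(batch.map((p) => p.key)));
    batchFn(keys).then(
      (results) => batch.forEach((p) => p.resolve(results.get(p.key))),
      (error) => batch.forEach((p) => p.reject(error)),
    );
  };

  return (key: K) =>
    new Promise<V | undefined>((resolve, reject) => {
      pending.push({ key, resolve, reject });
      // The first caller in a tick schedules the flush; later callers join the queue.
      if (pending.length === 1) Promise.resolve().then(flush);
    });
}
```

A resolver could then call something like loadCommentCount(article.id) once per article, and the repository would still see one getCommentCounts call per tick. None of this works if the only method on offer is scalar.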

A real example

You write getArticle(id) because you need to fetch one article on a page. Six months later, someone wires up a feed that needs the titles of fifty articles. The path of least resistance is Promise.all(ids.map(id => getArticle(id))), and now there are fifty concurrent queries pulling fifty single rows from a table that would happily have served them in one.

You might catch that in code review. Often it ships. Often it ships and works fine for another six months until traffic doubles and the database starts complaining, and then someone has to go fix not just this caller but every caller that learned the same trick.

The batch-by-default version makes the right thing the easy thing. getArticles(ids) returning a Map<id, Article> is exactly the same shape whether the caller has one ID or fifty, and there’s no second function to maintain in parallel.
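The difference is easy to see with a counter. The sketch below uses a hypothetical in-memory FakeArticleTable in place of a real database, purely to count round-trips; the names and the Article shape are assumptions for illustration.

```typescript
interface Article {
  id: string;
  title: string;
}

// Hypothetical in-memory stand-in for a table; each call counts as one round-trip.
class FakeArticleTable {
  public queryCount = 0;
  constructor(private readonly rows: Article[]) {}

  public async selectByIds(ids: string[]): Promise<Article[]> {
    this.queryCount += 1;
    return this.rows.filter((row) => ids.indexOf(row.id) !== -1);
  }
}

class ArticleRepository {
  constructor(private readonly table: FakeArticleTable) {}

  public async getArticles(ids: string[]): Promise<Map<string, Article>> {
    const articles = new Map<string, Article>();
    if (ids.length === 0) return articles;
    const rows = await this.table.selectByIds(Array.from(new Set(ids)));
    rows.forEach((row) => articles.set(row.id, row));
    return articles;
  }

  // Scalar sugar over the batch method.
  public async getArticle(id: string): Promise<Article | undefined> {
    return (await this.getArticles([id])).get(id);
  }
}

async function compareRoundTrips(): Promise<{ scalar: number; batch: number }> {
  const rows = Array.from({ length: 50 }, (_, i) => ({ id: `a${i}`, title: `Title ${i}` }));
  const ids = rows.map((row) => row.id);

  // The Promise.all path: one query per ID.
  const scalarTable = new FakeArticleTable(rows);
  const scalarRepo = new ArticleRepository(scalarTable);
  await Promise.all(ids.map((id) => scalarRepo.getArticle(id)));

  // The batch path: one query for all IDs.
  const batchTable = new FakeArticleTable(rows);
  const batchRepo = new ArticleRepository(batchTable);
  await batchRepo.getArticles(ids);

  return { scalar: scalarTable.queryCount, batch: batchTable.queryCount };
}

compareRoundTrips().then((result) => console.log(result)); // { scalar: 50, batch: 1 }
```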

When to break the rule

A few cases where I don’t bother:

  • Mutations against a single row that are inherently scalar, e.g. markArticlePublished(id). There’s no plural version of “publish this specific article” that makes sense for the caller’s domain, and the batch wrapper would just be ceremony. (If a real bulk-publish use case shows up later, that’s a different method.)
  • One-shot operations — a migration script, a CLI command, anything where there’s exactly one call site and it will never grow another.
  • When the underlying datastore only offers per-key fetches. Even there, though, most stores have a bulk variant (MGET, BatchGetItem, etc.) and the same logic applies the moment you expect more than one call at a time.
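For the key-value case, the same array-in, Map-out shape might look like the sketch below. KeyValueStore is a hypothetical interface modelled on how Redis MGET behaves (values come back in key order, with null for misses), and the article:title: key prefix is invented for the example.

```typescript
// Hypothetical store interface modelled on Redis MGET semantics:
// one value per key, in the same order, with null where the key is missing.
interface KeyValueStore {
  mget(keys: string[]): Promise<Array<string | null>>;
}

async function getCachedTitles(
  store: KeyValueStore,
  articleIds: string[],
): Promise<Map<string, string>> {
  const titles = new Map<string, string>();
  const ids = Array.from(new Set(articleIds));
  if (ids.length === 0) return titles; // no keys, no round-trip

  // One bulk read instead of one GET per article.
  const values = await store.mget(ids.map((id) => `article:title:${id}`));

  // Pair each ID with the value at the same position; skip cache misses.
  ids.forEach((id, index) => {
    const value = values[index];
    if (value != null) titles.set(id, value);
  });
  return titles;
}
```

Here a missing key is simply absent from the Map, which leaves the caller free to decide whether a miss means "fall back to the database" or "show nothing".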

Outside those, I default to batch. The cost is small, the upside compounds, and the alternative is a slow drift into a codebase full of single-row functions wrapped in Promise.all because that’s the only door anyone left open.
