martyw.dev

Stable URLs without a redirect table

A short random prefix plus a throwaway slug gives you URLs that survive title changes — no UUID ugliness, no slug-history table. Plus why the collision-retry loop matters more than the keyspace suggests.

A good entity URL wants two things that pull against each other. It should be readable — a slug built from the title, so a human glancing at the link has some idea what’s on the other end. And it should be stable — the same URL should resolve forever, even after someone edits that title for the third time. Readability ties the URL to mutable content. Stability demands it never move. You can’t have both for free, and every way I’ve tried to square it has left an ugly corner somewhere.

The version I’ve landed on: a short random prefix, then whatever slug you like.

/articles/9gy4am0w-momentum-in-the-product-cycle-will-offset-headwinds

Only the 9gy4am0w is real. It’s the entity’s identity — generated once, never changed. Everything after that first dash is decoration, pulled from the current title or an excerpt or whatever reads well today. The backend matches on the prefix and ignores the rest entirely. So the URL above and

/articles/9gy4am0w-this-thesis-aged-badly

resolve to the same thing. Change the title, regenerate the tail, and old links keep working, because the only load-bearing part of the URL never moved.
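A minimal sketch of that lookup, assuming the prefix is always the first eight characters of the last path segment. The names extractPrefix, resolveArticle and PREFIX_PATTERN are hypothetical, and the in-memory Map stands in for whatever table the real backend queries.

```javascript
// Hypothetical helper: pull the 8-character identity prefix out of a path.
// Assumes prefixes use the lowercase alphanumeric alphabet described in this post.
const PREFIX_PATTERN = /^[a-z0-9]{8}(?=-|$)/;

function extractPrefix(path) {
  // Take the last path segment, e.g. "9gy4am0w-this-thesis-aged-badly".
  const segment = path.split("/").filter(Boolean).pop() ?? "";
  const match = PREFIX_PATTERN.exec(segment);
  return match ? match[0] : null; // null means "no prefix found"
}

// Stand-in for the articles table, keyed by prefix (illustrative data only).
const articles = new Map([
  ["9gy4am0w", { title: "Momentum in the product cycle will offset headwinds" }],
]);

function resolveArticle(path) {
  const prefix = extractPrefix(path);
  // Everything after the prefix is ignored; only the prefix is looked up.
  return prefix ? articles.get(prefix) ?? null : null;
}
```

Both example URLs above resolve to the same record here, because the slug tail never reaches the lookup.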

What it replaced

The first instinct is to put a UUID in the path: /articles/3f9a2c1e-7b.... Stable, certainly: the key never changes. But it’s ugly, miserable to read out over the phone, and it throws away the slug, which is free real estate for both humans and search engines. A URL with no keywords in it tells a crawler nothing and tells a person even less. You’ve bought stability by giving up everything else the URL was good for.

So we went the other way: a slug-history table. Store every slug a piece of content has ever had in a side table, and on each request, look the incoming slug up, find the current canonical one, and redirect if they differ. This is the approach we ran widely before pivoting away from it. It works. It’s also a lot of machinery: the table grows with every edit, every page load costs a lookup against it, and you need server-side rendering (or an edge function) to detect the mismatch and issue the redirect before the client settles. That’s storage, a query, and a round-trip’s worth of complexity to answer one question, “what is this thing?”, which the prefix answers with a single lookup on the first eight characters of the path segment.

The prefix approach collapses all of it. There’s no history to store, because there’s no history that matters: the prefix is the only thing that ever identified the entity, and it doesn’t change. There’s no redirect to issue, because every variation of the slug is already valid; you just serve the page. The single concession to tidiness is a canonical link element (link rel="canonical") in the page head pointing at the current, correct slug, so search engines don’t index forty permutations of one article as forty separate pages. Humans can hit any variant; crawlers are told which one counts.
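As a rough sketch of that canonical tag, assuming the current slug is derived from the title at render time: slugify, canonicalPath and canonicalLinkTag are hypothetical helpers, and the example.com origin is a placeholder.

```javascript
// Hypothetical slugifier: lowercase, then collapse every run of
// non-alphanumeric characters into a single dash and trim the ends.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// The one canonical path for an entity: fixed prefix plus the slug of the
// title as it stands today.
function canonicalPath(article) {
  return `/articles/${article.prefix}-${slugify(article.title)}`;
}

// Emit the <link rel="canonical"> element for the page head.
// The default origin is an assumption; substitute the site's real one.
function canonicalLinkTag(article, origin = "https://example.com") {
  return `<link rel="canonical" href="${origin}${canonicalPath(article)}">`;
}
```

Whatever slug the visitor arrived on, the tag is built only from the stored prefix and the current title, so every variant of the URL advertises the same canonical address. This slugifier drops non-ASCII characters outright, which may or may not suit real titles.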

The prefix itself

Generation is unremarkable, which is the point:

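const { randomBytes } = require("node:crypto");

const CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"; // 36 symbols
// One random byte per character, mapped into the alphabet by modulo.
const prefix = Array.from(randomBytes(8))
  .map((byte) => CHARS[byte % CHARS.length])
  .join("");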

Eight characters from a 36-symbol alphabet. On insert, try to commit it; if that prefix is already taken, generate another and try again, a handful of times before giving up. That retry loop is the part worth sitting with, because my first instinct was to treat it as defensive boilerplate that would never actually fire. With 36⁸ ≈ 2.8 trillion possible prefixes, a collision feels impossible.
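The retry loop described above might look something like this. It is a sketch under assumptions: insertWithRetry is a hypothetical name, store is a Map standing in for a table with a unique constraint on the prefix column, and generatePrefix is any function that returns a fresh candidate, such as the snippet above wrapped in a function.

```javascript
// Try up to maxAttempts candidate prefixes; commit the first one not yet taken.
function insertWithRetry(store, record, generatePrefix, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const prefix = generatePrefix();
    if (!store.has(prefix)) {
      store.set(prefix, record);
      return prefix; // committed; this is the entity's identity from now on
    }
    // Collision: fall through and draw another candidate.
  }
  throw new Error(`no free prefix after ${maxAttempts} attempts`);
}

// Usage with a deterministic generator that collides once, then succeeds.
const table = new Map([["aaaaaaaa", { title: "existing" }]]);
const candidates = ["aaaaaaaa", "bbbbbbbb"];
const assigned = insertWithRetry(table, { title: "new" }, () => candidates.shift());
console.log(assigned); // "bbbbbbbb"
```

Against a real database the check-then-set pair would be racy. The more usual shape is to attempt the insert and treat a unique-constraint violation as the signal to retry.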

Per insert, it nearly is. With a million entities already in the table, the odds that the next prefix collides are about one in 2.8 million. Fine.

But that’s the wrong question. The right one is the birthday question: across the whole life of the table, how likely is it that any two entities ever collide? That number climbs much faster. At a million entities you’re already at roughly a 16% chance of having hit at least one collision somewhere along the way. At five million it’s about 99%. Past ten million it stops being a maybe — collisions happen, and they keep happening.
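Both figures are easy to check. The sketch below uses the standard birthday approximation, 1 - exp(-n(n-1) / 2N), and assumes prefixes are drawn uniformly; the function names are illustrative.

```javascript
const KEYSPACE = 36 ** 8; // 2,821,109,907,456 possible prefixes

// Chance that one new prefix hits any of `existing` prefixes already stored.
function perInsertCollision(existing) {
  return existing / KEYSPACE;
}

// Birthday approximation: chance that at least two of `n` uniformly random
// prefixes coincide somewhere in the table.
function anyCollision(n) {
  return 1 - Math.exp(-(n * (n - 1)) / (2 * KEYSPACE));
}

console.log("per insert at 1M: 1 in", Math.round(1 / perInsertCollision(1e6)));
for (const n of [1e6, 5e6, 1e7]) {
  console.log(n, "entities:", anyCollision(n).toFixed(4));
}
```

The per-insert odds shrink linearly with the keyspace, but the any-collision probability depends on n squared, which is why it overtakes intuition so quickly.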

So the retry loop isn’t decoration. It’s the thing that makes the scheme correct. At any real scale the birthday paradox all but guarantees you’ll generate a duplicate eventually; the loop makes sure that when you do, nobody notices. Framed that way, “36⁸ is so big it’ll never collide” is exactly the kind of confident-and-wrong I’d want caught in review. The keyspace is big enough that collisions are rare per write and trivially cheap to handle, but not big enough that they never happen.

If you want to be pedantic, there’s one more wrinkle: byte % 36 over 256 possible byte values isn’t perfectly uniform. 256 is 7 × 36 + 4, so the first four letters of the alphabet (a to d) each have eight byte values mapping to them while every other symbol has seven, and they turn up about 14% more often. It shaves a sliver off the effective keyspace and is completely irrelevant for cosmetic IDs at this scale. But if you were generating these for something adversarial rather than decorative, you’d discard bytes 252 to 255 and draw again instead of taking the modulo of every byte.
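For completeness, a sketch of the unbiased variant using rejection sampling. This is not what the scheme above uses; unbiasedPrefix and LIMIT are hypothetical names, and the CommonJS require is an assumption about the module setup.

```javascript
const { randomBytes } = require("node:crypto");

const CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"; // 36 symbols
// Largest multiple of 36 that fits in a byte is 252. Bytes 252..255 are the
// ones that would give a, b, c and d their extra share, so they are discarded.
const LIMIT = 256 - (256 % CHARS.length);

function unbiasedPrefix(length = 8) {
  let out = "";
  while (out.length < length) {
    // Draw a batch with some slack so a second round is rarely needed.
    for (const byte of randomBytes(length * 2)) {
      if (byte < LIMIT && out.length < length) {
        out += CHARS[byte % CHARS.length];
      }
    }
  }
  return out;
}

console.log(unbiasedPrefix()); // eight characters, each symbol equally likely
```

Only 4 of every 256 bytes are rejected, so the extra cost is negligible. The trade is a loop with no fixed upper bound in exchange for an exactly uniform distribution.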

Why this is the one that stuck

The trick is just keeping two jobs in two halves of the same string. The prefix is identity; the slug is presentation. UUIDs collapse them by making identity the whole URL and giving up presentation. History tables collapse them the other way, treating presentation as authoritative and then spending storage, lookups and redirects to paper over the fact that presentation changes. The prefix approach refuses to mix them at all. Identity is eight characters that never move. Presentation is everything after, free to change as often as the title does. Links never break, the URLs still read like something a person wrote, and there’s nothing in the request path beyond slicing off the prefix and looking it up.

It’s not novel — it’s roughly how Stack Overflow question URLs and Notion page links already work, an opaque ID doing the real work with human-readable cruft wrapped around it. But arriving at it the long way, through the ugly URLs and the clunky history table, is what made me appreciate why this is the shape that keeps showing up.
