21 November 2022

Thoughts on GraphQL After 2 Years

What two years of running a production GraphQL service taught me about where it earns its keep and where it doesn't.


Two years on a single technology gives you enough rope to either hang yourself or learn what it’s actually good for. Here’s what I’ve taken from running a production GraphQL service alongside an existing REST API for that long.

The original setup was a proof of concept — Node.js, Apollo Server, Express, a small slice of public data exposed to show what GraphQL could do. It went over well, got promoted to production, and grew slowly for about four months until a major piece of new product functionality landed on top of it. That was the inflection point. Contributions sped up. Today it’s a critical service, on par with the PHP API it sits alongside.

I came in after the proof of concept but became the person most invested in it. I refactored it into a federated graph using Apollo Federation — multiple subgraphs, a gateway, separate deployments, each one a Nest.js application. I ported core functionality across, wrote most of the documentation, and made it safe enough that the rest of engineering could build on top without asking me first.

The point of this post is what I learned about what GraphQL is good for and what it isn’t.

The pitch you’ll see first

If you’re new to GraphQL, the first two selling points you’ll encounter are these.

Clients query just the data they need and get it in fewer requests than REST. Smaller payloads, fewer round trips, particularly nice on mobile.

The server describes the data it provides instead of designing dozens of endpoints. You define Article, Author, Tag and the relationships between them, and the graph handles the rest. A REST API has to make decisions endpoint by endpoint about what to include, and either bloats payloads or invents a non-standard inclusion mechanism. GraphQL gets that for free.

Both true. Both barely scratch the surface. A new developer who stops here either underestimates the learning curve of running a performant GraphQL server, or dismisses GraphQL as a thing REST can replicate with enough effort and never engages with it properly.

The actual shape of it

The thing GraphQL gives you that REST doesn’t is the realisation that you’re building one graph, not predefined sets of data at predefined URLs. The graph grows naturally over years because all you do is add entities, fields, and relationships. Existing consumers that don’t care about the new fields see no change to their payloads. Consumers that want the new data make a small tweak to their query and get it — no new endpoint, no manual reconciliation.

What REST forces you into

Here’s the standard example. A REST endpoint that returns news articles:

GET /articles?limit=10&offset=0

{
  "total": 55,
  "limit": 10,
  "offset": 0,
  "articles": [
    {
      "title": "Example Article",
      "published": "2022-05-18T09:24:18.536Z",
      "body": "<p>This article is about...</p>",
      "slug": "example-article"
    },
    ...
  ]
}

Fine for a basic article list. The consumer is stuck with whatever data the endpoint returns, downloads the full payload even if they only want titles, and has no way to ask for related entities. Say we want to add the author and some tags. Three options, all with costs.

Embed the related data

The most common move. Each article now includes author and tags:

{
  "title": "Example Article",
  "published": "2022-05-18T09:24:18.536Z",
  "body": "<p>This article is about...</p>",
  "author": {
    "name": "Mr Author",
    "photo": "https://some.s3bucket.com/mr-author.png"
  },
  "tags": ["example", "featured"]
}

The data is available with no consumer effort. But every consumer pays for it whether they want it or not: payload sizes grow, and the database runs a more complex query with joins on every request. Time to first byte (TTFB) suffers for everyone, including the consumers who never wanted the extra data.

Return references

Reference data goes into the response and consumers fetch related entities separately:

{
  "title": "Example Article",
  "published": "2022-05-18T09:24:18.536Z",
  "body": "<p>This article is about...</p>",
  "authorId": 6,
  "tagIds": [1, 3]
}

Consumers resolve authorId with a GET /authors/:id request, and the tag IDs the same way. Those endpoints need bulk support so consumers aren’t making one request per tag. Consumers get selectivity, the API becomes more granular, and future consumers can pick smaller buckets of data. The cost is that reconciliation moves onto every consumer, who now has to bulk-load related data, deduplicate it, and cache it themselves. The first time a consumer accidentally fires off 100 GET /authors/:id requests because someone forgot to write a batching layer, you’ll know what I mean.
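Here is a rough TypeScript sketch of the reconciliation a consumer ends up owning in this model. It assumes a hypothetical bulk endpoint that accepts comma-separated IDs (/authors?ids=2,6). The function and type names are illustrative and don’t come from any real client.

```typescript
// Shapes follow the REST example above; the bulk endpoint and its ids parameter are hypothetical.
interface ArticleRef {
  title: string;
  authorId: number;
  tagIds: number[];
}

interface Author {
  id: number;
  name: string;
}

// Collect each distinct author ID once, so one bulk request replaces one request per article.
function buildBulkAuthorPath(articles: ArticleRef[]): string {
  const uniqueIds = Array.from(new Set(articles.map((article) => article.authorId))).sort((a, b) => a - b);
  return `/authors?ids=${uniqueIds.join(',')}`;
}

// Stitch the bulk response back onto each article: reconciliation the consumer now owns.
function attachAuthors(articles: ArticleRef[], authors: Author[]) {
  const authorsById = new Map(authors.map((author) => [author.id, author] as const));
  return articles.map((article) => ({ ...article, author: authorsById.get(article.authorId) }));
}
```

Caching and tag loading are left out. Each of those adds another copy of the same deduplicate-and-stitch step in every consumer.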

Roll your own include/exclude syntax

The advanced version. The API supports something like:

GET /articles?limit=10&include=author,tags
                       ^^^^^^^^^^^^^^^^^^^

The server parses this and decides whether to include related data. There are frameworks for it, and parts of the community have adopted vaguely consistent conventions. My read is that it’s non-standard, inconsistent, and you end up reimplementing it every project. The engineering team builds and maintains custom logic to parse the include directive and defer loading. The complexity scales with the depth and breadth of relationships. Each endpoint owns its own include support, so the article endpoint might let you include authors but the author endpoint won’t let you include articles unless someone manually wired it up.
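The following TypeScript sketches the per-endpoint logic this implies. The allowlist, function names, and table names are hypothetical. The point is that every endpoint carries its own copy of something like this, and the article endpoint only knows about the relations someone wired up for it.

```typescript
// Hypothetical allowlist: the relations someone wired up for the article endpoint, and nothing more.
const ARTICLE_INCLUDES = ['author', 'tags'] as const;
type ArticleInclude = (typeof ARTICLE_INCLUDES)[number];

// Parse a raw include value such as "author,tags" into a validated set of relation names.
function parseInclude(raw: string | undefined): Set<ArticleInclude> {
  const result = new Set<ArticleInclude>();
  for (const part of (raw ?? '').split(',')) {
    const name = part.trim();
    if (name === '') continue;
    if ((ARTICLE_INCLUDES as readonly string[]).indexOf(name) === -1) {
      throw new Error(`Unsupported include: ${name}`); // would become a 400 response
    }
    result.add(name as ArticleInclude);
  }
  return result;
}

// Each endpoint then maps the parsed set onto its own joins by hand (table names are hypothetical).
function articleJoins(include: Set<ArticleInclude>): string[] {
  const joins: string[] = [];
  if (include.has('author')) joins.push('JOIN authors ON authors.id = articles.author_id');
  if (include.has('tags')) {
    joins.push('JOIN article_tags ON article_tags.article_id = articles.id');
    joins.push('JOIN tags ON tags.id = article_tags.tag_id');
  }
  return joins;
}
```

Supporting include on the author endpoint means writing a second allowlist and a second join mapper. Nested includes (author.articles) would need a recursive version of both.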

These approaches can work for small projects. They all start to fall apart as the dataset grows, and none of them scale gracefully to an API exposing hundreds or thousands of fields.

How GraphQL handles the same problem

GraphQL drops the “endpoint” framing entirely. You declare entities, their fields, and the relationships between them, and the GraphQL executor walks each request down to the resolvers your application defines. There’s no articles endpoint; there’s an articles resolver that takes arguments and returns Article entities:

query {
  articles(limit: 10) {
    title
    slug
  }
}

Response:

{
  "data": {
    "articles": [
      {
        "title": "Example Article",
        "slug": "example-article"
      },
      ...
    ]
  }
}

Out of the box: omit any fields you don’t want, payload stays minimal. The author and tags become additional fields on Article, with their own field resolvers:

@ResolveField(() => Author)
public async author(@Root() article: Article): Promise<Author> {
  return this.authorRepository.findOne(article.authorId);
}

The eagle-eyed will spot the N+1 problem this creates. Hold that thought — I’ll get back to it.

If the consumer asks for author, this resolver runs. Because the field returns an Author entity, any fields on Author are now reachable in the same query tree. Want an author’s three most popular articles on hover? One additional resolver, one nested query:

query {
  articles(limit: 10) {
    title
    slug
    author {
      name
      photo
      popularArticles(limit: 3) {
        title
        slug
      }
    }
  }
}
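To make the “resolvers only run when asked for” behaviour concrete, here is a toy executor in TypeScript. It is not how graphql-js is implemented. It assumes every object carries a __typename and that the query has already been parsed into a plain selection tree. The counter in the usage check shows the author resolver firing zero times when author isn’t selected and once per article when it is. That per-article call is the N+1 shape flagged above.

```typescript
// Toy executor (not graphql-js): walks a parsed selection tree and calls a
// custom field resolver only for fields the query actually selects.
interface Entity {
  __typename: string;
  [field: string]: unknown;
}

// true = scalar leaf; a nested object = sub-selection on the returned entity
type SelectionSet = { [field: string]: true | SelectionSet };
type FieldResolver = (parent: Entity) => unknown;
// Hypothetical resolver map keyed by type name, then field name
type ResolverMap = Record<string, Record<string, FieldResolver>>;

function execute(value: unknown, selection: SelectionSet, resolvers: ResolverMap): unknown {
  // Lists are resolved item by item with the same selection
  if (Array.isArray(value)) {
    return value.map((item) => execute(item, selection, resolvers));
  }
  if (value === null || value === undefined) {
    return null;
  }
  const entity = value as Entity;
  const result: Record<string, unknown> = {};
  for (const field of Object.keys(selection)) {
    // Use a custom resolver if one is registered, otherwise read the property (the "default resolver")
    const resolver = resolvers[entity.__typename]?.[field];
    const raw = resolver ? resolver(entity) : entity[field];
    const sub = selection[field];
    result[field] = sub === true ? raw : execute(raw, sub, resolvers);
  }
  return result;
}
```

Selecting only title never touches the author resolver. Adding author { name } calls it once for each article in the list.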

Custom directives extend this further. Here, a @transform directive on the photo field lets the consumer request specific dimensions:

query {
  author(id: 6) {
    name
    photo @transform(width: 36, height: 36)
  }
}

Where it actually pays off

Letting consumers pick their data shape with minimal backend effort is good. The bigger thing for me has been federation.

A federated graph lets you break the graph into subgraphs that compose into a single supergraph. Two consequences worth caring about.

Subgraphs are their own projects, deployments, and scaling units. Each can run a different framework version, runtime, set of dependencies, or infrastructure profile.

Each subgraph can be owned by a different team. A large engineering organisation can ship into one unified API without teams stepping on each other’s code.

Because federation lets you extend any part of the supergraph from your subgraph, existing entities pick up new fields owned by a different team. Concretely: there’s a team that owns news articles, authors, and tags. Your team needs to attach products to articles. With REST, you build a new service at products.api.yourcompany.com, consumers make multiple calls, the two services diverge in subtle ways. With GraphQL, you create a subgraph that extends Article with a products field. Consumers ask for it as part of the article query:

query {
  article(slug: "example-article") {
    title
    published
    body
    products {
      name
      link
      photo @transform(width: 120, height: 120)
      price
    }
  }
}

Your service owns the products data, the mapping table, and the implementation. If a consumer later wants the inverse — articles related to a product — it’s trivial. Consumers stick to one endpoint and get a consistent experience across every team that contributes to the graph.
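The products subgraph might contain something like the following TypeScript sketch. It assumes Apollo Federation 2 and that Article’s key field is id. The mapping table is an in-memory stand-in, and the product data is made up. With Apollo’s subgraph library and no custom __resolveReference, the field resolver receives the entity representation the gateway sends: just __typename and the key fields.

```typescript
// Hypothetical products subgraph. Assumes Apollo Federation 2 and that Article's key is `id`.
const typeDefs = /* GraphQL */ `
  extend schema @link(url: "https://specs.apollo.dev/federation/v2.0", import: ["@key"])

  # Article is owned by another team; this subgraph only contributes the products field.
  type Article @key(fields: "id") {
    id: ID!
    products: [Product!]!
  }

  type Product {
    name: String!
    link: String!
    photo: String!
    price: Float!
  }
`;

interface Product {
  name: string;
  link: string;
  photo: string;
  price: number;
}

// In-memory stand-in for this team's article-to-product mapping table.
const productsByArticleId = new Map<string, Product[]>([
  ['42', [{ name: 'Example Kettle', link: 'https://example.com/kettle', photo: 'https://example.com/kettle.png', price: 29.99 }]],
]);

const resolvers = {
  Article: {
    // The parent is the entity representation from the gateway, e.g. { __typename: 'Article', id: '42' }.
    products: (article: { id: string }): Product[] => productsByArticleId.get(article.id) ?? [],
  },
};
```

The article team’s subgraph never needs to know this field exists. The gateway composes both contributions into the single Article type that consumers query.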

Things to watch out for

It’s transport-agnostic

GraphQL doesn’t ride on HTTP status codes. A request whose resolvers fail will typically still return 200 OK, with an errors field in the payload:

{
  "data": null,
  "errors": [ ... ]
}

Two practical consequences. Logging and monitoring tools that group by HTTP status to find errors won’t catch any of these — you need a different observability strategy. Generic XHR clients like axios won’t surface errors the way they would for a non-2xx REST response. Consumers either need their own error-checking layer or should reach for a GraphQL-aware client like graphql-request.
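Below is a minimal TypeScript sketch of such an error-checking layer, assuming the standard data/errors response shape shown above. The names are hypothetical. You would call unwrapData on the parsed body, for example response.data from axios.

```typescript
// Minimal shapes from the standard GraphQL response format.
interface GraphQLErrorEntry {
  message: string;
  path?: Array<string | number>;
}

interface GraphQLResponseBody<T> {
  data?: T | null;
  errors?: GraphQLErrorEntry[];
}

class GraphQLRequestError extends Error {
  readonly errors: GraphQLErrorEntry[];

  constructor(errors: GraphQLErrorEntry[]) {
    super(errors.map((e) => e.message).join('; '));
    this.name = 'GraphQLRequestError';
    this.errors = errors;
  }
}

// Treat any entry in `errors` as a failure, even though the HTTP status was 200 OK.
function unwrapData<T>(body: GraphQLResponseBody<T>): T {
  if (body.errors && body.errors.length > 0) {
    throw new GraphQLRequestError(body.errors);
  }
  if (body.data === null || body.data === undefined) {
    throw new Error('GraphQL response contained neither data nor errors');
  }
  return body.data;
}
```

This sketch treats a response carrying both data and errors as a failure. If your consumers can make use of partial data, return it alongside the errors instead.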

Performance is your problem

A REST endpoint has a known shape. You can write one SQL query with the right joins and indexes, cache the response when it doesn’t change often, and ship. GraphQL gives you flexibility at the cost of that simplicity.

The N+1 problem I flagged earlier is the canonical example. A consumer asks for 100 articles, each with its author. The naive implementation runs one query for the articles and 100 more for the authors. That doesn’t scale. Anyone working on a GraphQL backend has to get comfortable with batch loading — from databases, microservices, third-party APIs, wherever data comes from.

The standard tool is dataloader. Roughly how it works: you declare a loader with a batch function that takes an array of keys and returns an array of the corresponding entities in the same order. The resolvers ask the loader instead of the database, and DataLoader collects every load() call made within a single tick of the event loop into one batch. The author loader looks like this:

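import DataLoader from 'dataloader';

// Receives every author ID requested during one tick; results must come back in the same order
const authorLoader = new DataLoader(async (ids: readonly number[]) => {
  const authors = await authorRepository.findByIds([...ids]); // one WHERE id IN (...) query
  const authorsById = new Map(authors.map((author) => [author.id, author] as const));

  return ids.map((id) => authorsById.get(id));
});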

And the resolver:

@ResolveField(() => Author)
public async author(@Root() article: Article): Promise<Author> {
  return this.authorLoader.load(article.authorId);
}

The 100-article query now runs two queries:

SELECT ... FROM articles LIMIT 100
SELECT ... FROM authors WHERE id IN (...)
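The batching idea itself is small enough to sketch. The TypeScript below is a toy, not the dataloader package. It has no cache or key deduplication, and it dispatches on a microtask, so every load() made in the same synchronous pass lands in one batch. The in-memory repository and its query counter are hypothetical stand-ins for the database.

```typescript
// Toy batch loader (NOT the dataloader package): no caching or key deduplication.
class TinyBatchLoader<K, V> {
  private queue: Array<{ key: K; resolve: (value: V | undefined) => void; reject: (reason: unknown) => void }> = [];

  constructor(private readonly batchFn: (keys: K[]) => Promise<Array<V | undefined>>) {}

  load(key: K): Promise<V | undefined> {
    return new Promise<V | undefined>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load() of a batch schedules one dispatch on the microtask queue,
      // so every load() made in the same synchronous pass joins that batch.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.dispatch());
      }
    });
  }

  private async dispatch(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

interface AuthorRow {
  id: number;
  name: string;
}

// Hypothetical in-memory "authors table" plus a counter standing in for database round trips.
let queryCount = 0;
const authorTable: AuthorRow[] = Array.from({ length: 20 }, (_, i) => ({ id: i + 1, name: `Author ${i + 1}` }));

async function findAuthorById(id: number): Promise<AuthorRow | undefined> {
  queryCount += 1; // SELECT ... FROM authors WHERE id = ?
  return authorTable.find((author) => author.id === id);
}

async function findAuthorsByIds(ids: number[]): Promise<AuthorRow[]> {
  queryCount += 1; // SELECT ... FROM authors WHERE id IN (...)
  return authorTable.filter((author) => ids.indexOf(author.id) !== -1);
}

const authorLoader = new TinyBatchLoader<number, AuthorRow>(async (ids) => {
  const rows = await findAuthorsByIds(ids);
  const rowsById = new Map(rows.map((row) => [row.id, row] as const));
  return ids.map((id) => rowsById.get(id)); // results must line up with the keys
});

// 100 articles written by 20 authors; the articles query itself counts as query #1.
async function compare(): Promise<{ naive: number; batched: number }> {
  const authorIds = Array.from({ length: 100 }, (_, i) => (i % 20) + 1);

  queryCount = 1;
  await Promise.all(authorIds.map((id) => findAuthorById(id))); // one query per article
  const naive = queryCount;

  queryCount = 1;
  await Promise.all(authorIds.map((id) => authorLoader.load(id))); // one query per batch
  const batched = queryCount;

  return { naive, batched };
}
```

compare() resolves to 101 queries for the naive path and 2 for the batched one, which matches the two SELECT statements above.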

This is the start, not the finish. Getting performance right across a graph that touches multiple databases, microservices, and external APIs takes work — but it’s a satisfying problem when you land it.

New attack surface

GraphQL can be exploited in ways REST can’t. The textbook example is DoS by query nesting. A circular relationship — articles have authors, authors own articles — lets a consumer write something like this:

query {
  articles(limit:10) {
    author {
      articles(limit:10) {
        author {
          articles(limit:10) {
            author { ... }
          }
        }
      }
    }
  }
}

A naive implementation runs thousands of SQL queries. You don’t even need circular relationships — aliases let a consumer hit the same resolver as many times as they want in a single request:

query {
  list1: articles(offset: 1, limit: 10) { title }
  list2: articles(offset: 2, limit: 10) { title }
  list3: articles(offset: 3, limit: 10) { title }
  ...
}

The defence is query depth and complexity limits, which the developer has to wire up themselves, typically via a plugin.
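The kind of check such a limit performs can be sketched without a plugin. The TypeScript below works on a hypothetical, already-parsed selection tree rather than the real GraphQL AST. The cost model (each field costs one per parent item, and a limit argument multiplies everything beneath it) and the thresholds are illustrative assumptions. Aliased fields count separately, because each alias is its own entry in the tree.

```typescript
// Hypothetical, already-parsed selection tree; a real plugin walks the graphql-js AST instead.
interface FieldSelection {
  name: string; // field name or alias
  limit?: number; // list size requested via a limit argument, if any
  children?: FieldSelection[];
}

// Assumed cost model: each field costs 1 per parent item, and a list field
// multiplies the cost of everything selected beneath it by its limit.
function estimateCost(fields: FieldSelection[], parentCount = 1): number {
  let total = 0;
  for (const field of fields) {
    total += parentCount;
    total += estimateCost(field.children ?? [], parentCount * (field.limit ?? 1));
  }
  return total;
}

// Longest chain of nested fields.
function maxDepth(fields: FieldSelection[]): number {
  let deepest = 0;
  for (const field of fields) {
    deepest = Math.max(deepest, 1 + maxDepth(field.children ?? []));
  }
  return deepest;
}

// Reject a query before any resolver runs; the default thresholds are arbitrary examples.
function assertWithinLimits(fields: FieldSelection[], maxCost = 1000, maxAllowedDepth = 5): void {
  const cost = estimateCost(fields);
  const depth = maxDepth(fields);
  if (cost > maxCost || depth > maxAllowedDepth) {
    throw new Error(`Query rejected: cost ${cost} (max ${maxCost}), depth ${depth} (max ${maxAllowedDepth})`);
  }
}
```

Under this model, articles(limit: 10) { title slug } costs 21. Nesting articles three levels deep down to a title costs 1,221 at depth 6. A hundred aliased article lists cost 1,100. Both of the last two are rejected. Depth alone would not catch the alias attack, which is why the cost estimate matters.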

It’s overkill for small projects

This post has been hard on REST, but for small projects REST is the right call. A small dataset, a dozen endpoints, one or two consumers: there’s no reason to take on the complexity of GraphQL. Even our own graph sits in front of several REST-based microservices that own domain-specific data. If you’re building a small API serving one or two clients, write REST.

After two years

GraphQL is my favourite technology to work with in a twelve-year career. I’d recommend it to any organisation building an API large enough to support multiple teams and many consumers; it scales with the team and the business, and the upside compounds over years. The caveat is that you need senior engineers to do it well. Performance tuning and schema design have real depth to them, and a graph built by people who haven’t done either before will hurt you down the line.