The incident
We'd built a new authentication flow that validated a JWT audience claim against third-party identity providers. The work happened over about a week on a feature branch. Locally, everything worked — login succeeded, the audience check passed, tokens validated correctly. The PR went through code review, CI ran green, and we merged and deployed to production through our normal pipeline. No errors. No failed health checks. No warnings in the deployment log.
Within minutes of the deploy, our error tracking lit up. Every authenticated request was failing with SecurityTokenInvalidAudienceException. Production users couldn't log in. We rolled back within ten minutes, but by then the damage was done — a chunk of users had hit a broken login flow during a peak traffic window.
What had actually happened
During development, we'd added a new key to appsettings.Development.json to configure the expected JWT audience for our local identity provider sandbox:
// appsettings.Development.json — had the new key
{
  "Jwt": {
    "Issuer": "https://localhost:5001",
    "Audience": "devtoolshub-dev",
    "ExpiryMinutes": 60
  }
}
Nobody had added the corresponding key to appsettings.Production.json. The PR diff only showed the Development file changing — reviewers (myself included) approved it without checking whether the production config needed the same update, because nothing in the GitHub diff suggested it was missing. Production's file looked like this:
// appsettings.Production.json — Audience key never added
{
  "Jwt": {
    "Issuer": "https://api.devtoolshub.info"
  }
}
Why this produces no error until the worst possible moment
ASP.NET Core's default configuration layers appsettings.json, then appsettings.{Environment}.json, then environment variables, then command-line arguments, with each layer overriding matching keys from the layer before it. (On Azure, App Service Application Settings reach the app as environment variables, so they belong to that layer rather than forming a separate one.) Critically, it's a merge, not a validated schema. If Jwt:Audience exists in appsettings.Development.json but not in appsettings.Production.json, and no other layer supplies it, the key is simply absent from the merged configuration in Production and reads back as null. There's no startup error, because IConfiguration has no concept of "this key should exist."
// Our JWT bearer setup — the code that read the missing key
builder.Services.AddAuthentication()
    .AddJwtBearer(options =>
    {
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidIssuer = builder.Configuration["Jwt:Issuer"],
            ValidAudience = builder.Configuration["Jwt:Audience"], // null in Production
            ValidateAudience = true,
            // ...
        };
    });
// ValidAudience = null compiles fine, starts fine, and the app boots successfully.
// The exception only fires on the FIRST incoming request that presents a token,
// because that's the first time TokenValidationParameters.ValidAudience is actually used.
The application started cleanly. Health checks passed, because our health check endpoint didn't require authentication. The container reported healthy. Nothing failed until a real user with a real token hit an authenticated endpoint. At that point audience validation was switched on but had no expected audience to compare the token's aud claim against, so it rejected every single request.
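To make the merge behavior concrete, here is a minimal sketch that reproduces it outside ASP.NET Core. It assumes the Microsoft.Extensions.Configuration package; the LayeringDemo class and its Build helper are hypothetical names for illustration, and in-memory dictionaries stand in for the JSON files so the sketch needs nothing on disk.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;

public static class LayeringDemo
{
    // Hypothetical helper: later layers override matching keys from earlier ones,
    // the same way appsettings.{Environment}.json overrides appsettings.json.
    public static IConfiguration Build(
        IDictionary<string, string?> baseLayer,
        IDictionary<string, string?> environmentLayer) =>
        new ConfigurationBuilder()
            .AddInMemoryCollection(baseLayer)         // stands in for appsettings.json
            .AddInMemoryCollection(environmentLayer)  // stands in for appsettings.Production.json
            .Build();

    public static void Main()
    {
        var shared = new Dictionary<string, string?> { ["Jwt:ExpiryMinutes"] = "60" };
        var production = new Dictionary<string, string?>
        {
            ["Jwt:Issuer"] = "https://api.devtoolshub.info",
            // Jwt:Audience deliberately absent, as in the incident
        };

        IConfiguration config = Build(shared, production);

        Console.WriteLine(config["Jwt:Issuer"]);           // value from the environment layer
        Console.WriteLine(config["Jwt:Audience"] is null); // True: a missing key is null, not an error
    }
}
```

Nothing in this path can throw for a missing key, which is why the problem stayed invisible until token validation actually ran.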
The fix: fail fast with options validation
The first fix was making this class of bug impossible to deploy silently, by binding configuration to a strongly-typed, validated options class instead of reading raw strings:
using System.ComponentModel.DataAnnotations;

public class JwtOptions
{
    [Required(AllowEmptyStrings = false)]
    public string Issuer { get; set; } = string.Empty;

    [Required(AllowEmptyStrings = false)]
    public string Audience { get; set; } = string.Empty;

    [Range(1, 1440)]
    public int ExpiryMinutes { get; set; } = 60;
}
// Program.cs
builder.Services
    .AddOptions<JwtOptions>()
    .Bind(builder.Configuration.GetSection("Jwt"))
    .ValidateDataAnnotations()
    .ValidateOnStart(); // <-- fails at container startup, not on first request

// Consume the validated options instead of raw IConfiguration reads.
// AddJwtBearer only accepts an Action<JwtBearerOptions>, so the dependency on
// IOptions<JwtOptions> is injected through the options pipeline for the scheme.
builder.Services.AddAuthentication()
    .AddJwtBearer();

builder.Services
    .AddOptions<JwtBearerOptions>(JwtBearerDefaults.AuthenticationScheme)
    .Configure<IOptions<JwtOptions>>((options, jwtOptions) =>
    {
        var jwt = jwtOptions.Value;
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidIssuer = jwt.Issuer,
            ValidAudience = jwt.Audience,
            ValidateAudience = true,
        };
    });
ValidateOnStart() means a missing or empty Jwt:Audience now throws an OptionsValidationException the moment the container starts, instead of waiting for the first authenticated request. That's a real improvement — but it still only catches the bug after you've already deployed. The container fails to start, the deployment slot doesn't swap, and you find out from a failed deployment rather than from user-facing errors. Better, but not good enough.
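The same validation can be exercised without a web host, which is a quick way to see what ValidateOnStart() would report for a given configuration. This is a sketch under assumptions: it needs the Microsoft.Extensions.DependencyInjection, Microsoft.Extensions.Options.ConfigurationExtensions and Microsoft.Extensions.Options.DataAnnotations packages, it repeats the JwtOptions class so it is self-contained, and ValidationDemo and TryLoad are hypothetical names. Without a host, validation runs when the options value is first read.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;

public class JwtOptions
{
    [Required(AllowEmptyStrings = false)]
    public string Issuer { get; set; } = string.Empty;

    [Required(AllowEmptyStrings = false)]
    public string Audience { get; set; } = string.Empty;

    [Range(1, 1440)]
    public int ExpiryMinutes { get; set; } = 60;
}

public static class ValidationDemo
{
    // Hypothetical helper: returns false and the failure text if the "Jwt" section is invalid.
    public static bool TryLoad(IConfiguration config, out string failure)
    {
        var services = new ServiceCollection();
        services.AddOptions<JwtOptions>()
            .Bind(config.GetSection("Jwt"))
            .ValidateDataAnnotations();

        using var provider = services.BuildServiceProvider();
        try
        {
            // Reading Value triggers binding and validation.
            _ = provider.GetRequiredService<IOptions<JwtOptions>>().Value;
            failure = string.Empty;
            return true;
        }
        catch (OptionsValidationException ex)
        {
            failure = string.Join("; ", ex.Failures);
            return false;
        }
    }

    public static void Main()
    {
        // Mirrors the Production file from the incident: Issuer only.
        var config = new ConfigurationBuilder()
            .AddInMemoryCollection(new Dictionary<string, string?>
            {
                ["Jwt:Issuer"] = "https://api.devtoolshub.info",
            })
            .Build();

        bool ok = TryLoad(config, out var failure);
        Console.WriteLine(ok);      // False
        Console.WriteLine(failure); // names the Audience member as the failing field
    }
}
```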
The fix that actually prevents the deploy
What I wanted was to catch this during code review, before the PR even merges — by diffing the two config files directly and seeing the missing key as a visible, obvious red flag. I couldn't find a tool that did this anywhere, so I built one: an appsettings.json structural diff that's specifically aware of which differences matter and which don't.
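// The kind of structural check the tool performs, conceptually:
// 1. Parse both files as JsonDocument
// 2. Walk both trees, building the union of all key paths
// 3. For each path, classify:
//    - exists in Dev only                       -> MISSING IN PROD (critical)
//    - exists in Prod only                      -> informational (often legitimate)
//    - exists in both, different JsonValueKind  -> TYPE MISMATCH
//    - exists in both, same kind                -> OK (value differences are expected and ignored)

// The configuration JSON provider tolerates comments and trailing commas;
// JsonDocument rejects both unless told otherwise.
var parseOptions = new JsonDocumentOptions
{
    CommentHandling = JsonCommentHandling.Skip,
    AllowTrailingCommas = true,
};
using var devDoc = JsonDocument.Parse(devJson, parseOptions);
using var prodDoc = JsonDocument.Parse(prodJson, parseOptions);
// ... recursive comparison keyed on JsonValueKind, not literal value equality
// (JsonValueKind.True and JsonValueKind.False count as the same boolean kind)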
The key design decision: the diff ignores value differences entirely and only flags structural differences. Your Development and Production connection strings are supposed to be different. Your log levels are supposed to be different. What's not supposed to happen is a key existing in one file's structure and not the other's — that's the actual bug class, and structural diffing surfaces it instantly without false positives from legitimate environment-specific values.
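For readers who want to see the shape of such a comparison, here is a minimal sketch of a structural diff over two JSON documents. It is not the tool's actual source: StructuralDiff, Finding and FindingKind are hypothetical names, arrays are treated as opaque values, and key paths use the colon separator that IConfiguration uses.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public enum FindingKind { MissingInProd, ProdOnly, TypeMismatch }

public sealed record Finding(string Path, FindingKind Kind);

public static class StructuralDiff
{
    private static readonly JsonDocumentOptions ParseOptions = new()
    {
        CommentHandling = JsonCommentHandling.Skip, // appsettings files may carry comments
        AllowTrailingCommas = true,
    };

    public static List<Finding> Compare(string devJson, string prodJson)
    {
        using var dev = JsonDocument.Parse(devJson, ParseOptions);
        using var prod = JsonDocument.Parse(prodJson, ParseOptions);
        var findings = new List<Finding>();
        Walk(dev.RootElement, prod.RootElement, "", findings);
        return findings;
    }

    private static void Walk(JsonElement dev, JsonElement prod, string path, List<Finding> findings)
    {
        if (Normalize(dev.ValueKind) != Normalize(prod.ValueKind))
        {
            findings.Add(new Finding(path, FindingKind.TypeMismatch));
            return;
        }
        if (dev.ValueKind != JsonValueKind.Object) return; // same kind: values are ignored

        // Configuration keys are case-insensitive, so compare names the same way.
        var prodProps = new Dictionary<string, JsonElement>(StringComparer.OrdinalIgnoreCase);
        foreach (var p in prod.EnumerateObject()) prodProps[p.Name] = p.Value;

        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (var p in dev.EnumerateObject())
        {
            seen.Add(p.Name);
            var child = Join(path, p.Name);
            if (prodProps.TryGetValue(p.Name, out var prodValue))
                Walk(p.Value, prodValue, child, findings);
            else
                findings.Add(new Finding(child, FindingKind.MissingInProd));
        }
        foreach (var name in prodProps.Keys)
            if (!seen.Contains(name))
                findings.Add(new Finding(Join(path, name), FindingKind.ProdOnly));
    }

    // true and false are distinct JsonValueKind values; fold them into one boolean kind.
    private static JsonValueKind Normalize(JsonValueKind kind) =>
        kind == JsonValueKind.False ? JsonValueKind.True : kind;

    private static string Join(string path, string name) =>
        path.Length == 0 ? name : path + ":" + name;

    public static void Main()
    {
        const string dev = """
            { "Jwt": { "Issuer": "https://localhost:5001", "Audience": "devtoolshub-dev", "ExpiryMinutes": 60 } }
            """;
        const string prod = """
            { "Jwt": { "Issuer": "https://api.devtoolshub.info" } }
            """;
        foreach (var f in Compare(dev, prod))
            Console.WriteLine($"{f.Path}: {f.Kind}");
        // Jwt:Audience: MissingInProd
        // Jwt:ExpiryMinutes: MissingInProd
    }
}
```

Run against the two files from the incident, this sketch flags Jwt:ExpiryMinutes as well as Jwt:Audience. The former is harmless here because JwtOptions gives it a default, so a dev-only key is a prompt to check, not automatically a bug. The raw string literals in Main assume C# 11 or later.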
The generalizable lesson
Code review catches logic bugs because the diff shows you the logic. Code review does not catch configuration drift, because a PR that only touches appsettings.Development.json shows you a diff of that one file — there's nothing in a standard git diff that tells you a sibling file failed to get the same update. The bug isn't in what changed; it's in what didn't change somewhere else.
ValidateOnStart() on your IOptions<T> bindings is still worth doing — it's the right safety net for catching this in deployment rather than in production traffic. But the cheapest place to catch a missing config key is before you deploy at all, by diffing the environment files directly. Paste your appsettings.Development.json and appsettings.Production.json into the appsettings.json Environment Diff tool before every release that touches configuration — it takes ten seconds and would have caught this exact incident.