Your acceptance criteria aren't missing. They're just lying.

You've been in this meeting. The sprint is two weeks old. The developer and the PO are looking at the same ticket. The acceptance criteria are right there - someone wrote them, they passed refinement, the story got pointed. But the feature that was built and the feature that was meant are different things, and now the conversation is about who misunderstood what.
Neither side is wrong. That's the part nobody says out loud.
The pattern I see most often isn't missing acceptance criteria. It's acceptance criteria that look complete - specific enough to feel confident in planning, structured enough to pass review - but that leave every meaningful decision open. Not as obvious gaps. As ambiguity that's invisible until someone tries to build from them.
That's a different problem than absence. An obvious gap gets flagged in refinement. A criterion that looks finished gets implemented - three different ways, depending on who reads it.
The meeting that happens in every sprint
Take this acceptance criterion:
"The system should validate the email address."
It's a verb, a subject, a target. It passes the smell test. A developer reads it and implements format validation - checks for an @ sign, a domain suffix, basic syntax. That's email validation. A QA engineer reads it and tests whether the system rejects a syntactically valid address on a non-existent domain. That's also email validation. The PO wrote it while thinking about the error state - specifically that the user should see a message below the field, not a toast, not a silent block, and the form should not submit.
Three readers. One criterion. Three different features in progress.
Nobody interpreted it incorrectly. The criterion genuinely supports all three readings. That's the problem.
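The divergence between the developer's reading and the QA engineer's reading is easy to make concrete. Here is a minimal sketch of the format-only validation the developer would plausibly build - the regex and function name are illustrative, not a complete RFC 5322 check - and the exact test case the QA engineer would throw at it:

```python
import re

# Reading 1: format validation only -- what the developer built.
# The regex is illustrative, not a complete RFC 5322 check.
EMAIL_FORMAT = re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$")

def is_valid_email(address: str) -> bool:
    return bool(EMAIL_FORMAT.match(address))

# Reading 2: the QA engineer's test case. Syntactically valid,
# but the domain does not exist -- format validation accepts it.
assert is_valid_email("user@definitely-not-a-real-domain.example")

# Both readings agree only on the obvious cases.
assert not is_valid_email("not-an-email")
```

Both readers can point at this criterion and say their work satisfies it. The PO's reading - the error state and the blocked submission - doesn't appear in either.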
The opposite failure is just as common. Take a criterion that errs toward over-specification:
"The user sees a red error message directly below the email input field."
That one looks very precise. But it isn't testing the right thing. QA now validates color and position. Whether the email was actually rejected for the right reason - whether the validation logic itself works correctly - quietly slips out of scope. The tests go green. The logic might still be wrong. And when the design changes and the error moves to a toast, every test breaks for a reason that has nothing to do with email validation. Too-specific criteria don't just constrain developers. They redirect attention away from what actually matters.
False precision is harder to fix than an obvious gap
When a ticket has no acceptance criteria at all - when it says "user can manage their settings" and nothing else - that gap gets caught. Someone asks in refinement. Someone flags it in planning. The team knows the story isn't ready.
When a ticket has acceptance criteria that look complete, nobody asks. The developer builds. The story moves to in-progress. The questions that should have been asked before the sprint get answered during it, by whoever happens to be building at the time, with whatever context they have in the moment.
I call this the precision illusion. The criterion is there. It feels done. And because it feels done, nobody examines whether it actually answers anything.
The result is quiet variance: sometimes the developer's interpretation matches what was meant, sometimes it doesn't, and the delta surfaces in the sprint review.
Two kinds of acceptance criteria
There's a structural distinction that changes how teams write. It's not about Gherkin or templates or any particular methodology.
The first kind describes what the system should do. "The system should validate the email." "Errors should be clearly communicated." "The user should receive confirmation." These are intent statements. They sound like criteria. But there's no observation you can make that confirms or denies them without already knowing what the author had in mind. They're not testable - they're translatable. Each reader translates them differently.
The second kind describes what you can observe when the feature works correctly. Not what the system intends. What a person watching it would see.
The test is simple: can a QA engineer who wasn't in any of the planning conversations execute this criterion without asking a single clarifying question? If not, you have the first kind.
Most teams write the first kind. Not because they're careless - writing observable criteria requires thinking through specifics before the sprint, which always feels premature. But that thinking isn't premature. It's just happening at the wrong time.
"Observable" gets misapplied in practice, so it's worth being precise about what it means. A criterion that says "the error message appears below the input field in 14px red text" hasn't solved the problem - it's stepped into the solution space instead. Developers need technical freedom. What they don't need - and in my experience, don't want - is domain freedom: the latitude to decide what the feature should accomplish for the user.
The line sits here: AC close every domain decision - what the system must do for the user, under what conditions, with what outcome - while leaving every technical decision open. "The user cannot complete registration with an invalid email format, and receives feedback that their input needs correction" is observable domain behavior. Where the feedback appears, what it looks like, how the validation runs - that's implementation, whether it's technical or UI design. The AC defined the boundary. The developer works inside that boundary.
That boundary has a natural shape: domain behavior is what the system looks like from the outside - what the user can or cannot do, and what they observe as a result. The technical solution is what happens on the inside - which services run, how data moves, where state lives. AC belong entirely to the outside view. Everything on the inside stays open.
Picture the same feature requirement at three levels of specificity: the bare intent statement, the over-specified UI detail, and the target in between - an observable domain outcome that closes every domain question while leaving the technical implementation to the developer.
The formulation move
Getting from intent to observable domain behavior doesn't require adopting a framework. It requires answering two questions before finalizing any criterion:
What must be true for the user when this works correctly? What must be true when it doesn't?
Those two questions close the domain space. "The system should validate the email address" becomes: given a user submits the registration form with an invalid email format, their registration attempt does not complete, and they receive feedback that their email input needs correction. That leaves exactly one reading. The developer still decides where the feedback appears, what it says verbatim, and how the validation runs. The domain outcome - registration blocked, user informed - is fixed.
The rule: If you can hand this criterion to someone who wasn't in your planning meeting and they'd know exactly what the feature must do for the user - it's done. If they'd need to ask whether the user is actually stopped or just warned, rewrite it.
Gherkin (Given/When/Then) is a useful format for forcing this level of specificity. What makes it work isn't the syntax - it's the discipline of completing three distinct thoughts: the precondition, the trigger, and the domain outcome. When teams skip that structure, they tend to stop at intent. The format makes incompleteness visible.
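The rewritten email criterion, expressed in that structure, might look like this - the wording is illustrative, not a prescribed template:

```gherkin
Feature: Registration email validation

  Scenario: Registration is blocked for an invalid email format
    Given a user is on the registration form
    When they submit the form with an email in an invalid format
    Then their registration attempt does not complete
    And they receive feedback that their email input needs correction
```

Note what the scenario does not say: nothing about where the feedback renders, what color it is, or how the validation is implemented. The three clauses close the domain questions and stop there.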
The cost you're already paying
Writing observable acceptance criteria doesn't add work to the sprint cycle. It relocates work that's already there.
The interpretation happens either way. The question is when. If it happens before the sprint - in planning, in refinement, in the ten minutes it takes to write a proper criterion - it costs an argument, maybe a clarifying conversation, maybe a decision about which edge case to handle first. Those are cheap.
If it happens during the sprint - as a Slack message to the PO, a PR comment about the error state, a sprint review conversation that starts with "that's not quite what I meant" - it costs build time, rework, and velocity. Those are expensive.
An acceptance criterion that allows multiple interpretations isn't a criterion. It's a deferred decision with a done-checkbox attached. The checkbox gets ticked. The sprint starts. The decision gets made anyway - by the developer, in the middle of building something else, with whatever context they happen to have.
The formulation isn't the documentation of the decision. It is the decision. Before you write it precisely, the decision hasn't been made. You've just agreed to make it later, in a worse context, at a higher cost.

