Gherkin basics: Given, When, Then is not a template - it's a thinking structure

Most developers first encounter Gherkin in a test file. It looks like structured English. A runner like Cucumber or SpecFlow parses it and executes it against the code. So the assumption is: Gherkin is a test format.
It isn't. Not primarily.
The test is a side effect. What Gherkin actually does is force you to reason about behavior in a specific structure before you write any code. That structure - Given, When, Then - is not a formatting choice. It's a logic constraint. One that most developers skip past, and most teams misuse.
What Gherkin actually is
Gherkin is a plain-language specification format used in Behavior-Driven Development (BDD). It describes what a system should do in terms of observable behavior, not implementation. A Gherkin scenario has a name and a sequence of steps, each prefixed with a keyword.
A simple scenario looks like this:
Scenario: User logs in with valid credentials
  Given the user is on the login page
  When the user submits the form with valid credentials
  Then the user is redirected to their dashboard
Anyone on the team can read it. Developer, product owner, tester - that's by design. Gherkin was built to close the gap between "what the feature should do" and "what gets implemented." Whether you run it as an automated test is almost beside the point.
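For readers who have never looked behind a runner, here is a minimal sketch of what "executing" a scenario amounts to: each step line is looked up and bound to a function. This is not how Cucumber or SpecFlow are implemented. The `step` decorator, the `STEPS` registry, and the `world` dictionary are hypothetical names made up for this illustration.

```python
import re

# Hypothetical step registry: pairs of (compiled pattern, function).
STEPS = []

def step(pattern):
    """Register a function as the implementation of steps matching `pattern`."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register

@step(r"the user is on the login page")
def on_login_page(world):
    world["page"] = "login"

@step(r"the user submits the form with valid credentials")
def submit_valid_credentials(world):
    # Stand-in for driving the real application.
    if world["page"] == "login":
        world["page"] = "dashboard"

@step(r"the user is redirected to their dashboard")
def redirected_to_dashboard(world):
    assert world["page"] == "dashboard"

def run_scenario(text):
    """Run each step line against the registered step functions, in order."""
    world = {}
    for line in text.strip().splitlines():
        keyword, _, body = line.strip().partition(" ")
        if keyword not in ("Given", "When", "Then", "And", "But"):
            continue  # skips the "Scenario:" title line
        for pattern, func in STEPS:
            if pattern.fullmatch(body):
                func(world)
                break
        else:
            raise LookupError(f"no step definition for: {body}")
    return world

SCENARIO = """
Scenario: User logs in with valid credentials
  Given the user is on the login page
  When the user submits the form with valid credentials
  Then the user is redirected to their dashboard
"""
```

Calling `run_scenario(SCENARIO)` returns `{"page": "dashboard"}`. Notice how little the runner does: it matches text to functions. The reasoning happened when the three lines were written.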
Given, When, Then is a logic gate, not a label
Each keyword has a strict semantic role. Not a label you apply to describe what happens - a constraint on what kind of thing you're allowed to say.
Given is a precondition. It describes a state of the world that is already true when the scenario starts. Something observable. Something verifiable. Not something the user does - something that is.
When is a single action. One event. The user does one thing. Not a sequence of steps. Not "and then." One.
Then is an observable outcome. What the world looks like after the action - something a person could verify by looking at the screen, not what the system does internally.
The sequence matters because it maps to the logic of any state change: starting condition, trigger, observable result. In that order.
Robert C. Martin ("Uncle Bob") has described this structure as a specification of a finite state machine: Given a state, When an event, Then a transition to a new state. That framing is more useful than "template." It makes clear why each keyword is a constraint - not a label you can apply loosely, but a slot for exactly one kind of thing.
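The state machine framing can be made literal. The sketch below is one possible reading, not a canonical encoding: a scenario becomes one row in a transition table, keyed by the Given state and the When event. The state and event names, the `TRANSITIONS` table, and the `then` function are all invented for this example.

```python
# Hypothetical transition table: (given state, when event) -> then state.
TRANSITIONS = {
    ("on login page", "submit valid credentials"): "on dashboard",
    ("on login page", "submit wrong password"): "on login page with error",
}

def then(given, when):
    """Return the state the system should reach when `when` happens in `given`."""
    try:
        return TRANSITIONS[(given, when)]
    except KeyError:
        # No row means nobody has decided this behavior yet.
        raise KeyError(f"unspecified behavior: {when!r} while {given!r}") from None
```

`then("on login page", "submit valid credentials")` returns `"on dashboard"`. Asking about a pair nobody specified, such as submitting credentials while already on the dashboard, raises an error instead of guessing. Each slot takes exactly one kind of thing, and a missing row is a missing decision.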
When you write these three things in order, you're forced to separate ideas that are easy to blur in a ticket or a user story. That separation is the thinking work. The test is just a record of it.
Each keyword has exactly one job. The most common Gherkin mistakes happen when these boundaries blur - a Given that contains an action, a Then that describes the database instead of the screen.
Three ways to get it wrong
Treating Given as an action
"Given the user clicks the login button." That's not a state - it's an action. Given describes what is already true before anything happens. "Given the user is on the login page" is a state. The click belongs in the When.
This mistake is common because it's easy to write scenarios as narratives: the user does X, then Y, then Z. But Given/When/Then isn't a narrative. It's a logical structure. If your Given contains a verb that implies user interaction, it's wrong.
Writing implementation into the Then
"Then a session token is inserted into the users table." That's what the system does internally. A product owner can't verify it by looking at the screen. A tester without database access can't either.
Then should describe behavior that is observable from the outside: "Then the user sees their dashboard." If your Then requires knowledge of the implementation to verify, you've crossed the line from behavior into implementation.
Stacking actions in the When
"When the user fills in their email, clicks the next button, and waits for the confirmation screen to load." That's three actions. A When has one.
Compound When clauses usually signal a missing scenario - or a missing Given. If you need multiple user interactions to reach an interesting outcome, either split them into separate scenarios or move the earlier steps into the Given as preconditions. The discomfort of writing a single clean When is the signal that something isn't agreed on yet.
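The three mistakes above are regular enough that a crude check can flag them. What follows is a heuristic sketch, not a real Gherkin linter: it only looks for words from two small lists. Both lists and the `lint_step` function are assumptions made for this example, and any real vocabulary would need tuning.

```python
import re

# Hypothetical word lists; tune them to your own domain vocabulary.
ACTION_VERBS = {"clicks", "submits", "fills", "types", "enters", "presses", "selects"}
INTERNAL_TERMS = {"table", "database", "token", "api", "cache", "queue"}

def words(text):
    """Return the set of lowercase words in a step."""
    return set(re.findall(r"[a-z]+", text.lower()))

def lint_step(keyword, text):
    """Return a list of warnings for one step; an empty list means no finding."""
    found = words(text)
    warnings = []
    if keyword == "Given" and found & ACTION_VERBS:
        warnings.append("Given describes an action, not a state")
    if keyword == "Then" and found & INTERNAL_TERMS:
        warnings.append("Then describes internals, not an observable outcome")
    if keyword == "When" and len(found & ACTION_VERBS) > 1:
        warnings.append("When contains more than one action")
    return warnings
```

Each of the three bad examples in this section produces exactly one warning, and "Given the user is on the login page" produces none. A word list can't tell you what the right scenario is. Treat a warning as a prompt for a conversation, not a verdict.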
What a clean scenario looks like
Scenario: User submits login form with valid credentials
  Given the user is on the login page
  When the user submits the form with a valid email and password
  Then the user is redirected to their dashboard
  And their name is displayed in the navigation header

Scenario: User submits login form with invalid password
  Given the user is on the login page
  When the user submits the form with a valid email and wrong password
  Then an error message appears below the password field
  And the user stays on the login page
The And keyword inherits the role of the keyword above it. Here it extends the Then - still an observable outcome, not an action. Use it for additional conditions in the same step type. Don't use it to chain actions in the When.
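The inheritance rule is mechanical, and a few lines of code make it concrete. This is a sketch of the rule only, not a full Gherkin parser. The `resolve_roles` function is a hypothetical name, and `But` is treated the same way as `And`.

```python
PRIMARY = ("Given", "When", "Then")

def resolve_roles(lines):
    """Pair each step with its effective role, resolving And/But to the last primary keyword."""
    role = None
    resolved = []
    for line in lines:
        keyword, _, text = line.strip().partition(" ")
        if keyword in PRIMARY:
            role = keyword
        elif keyword in ("And", "But"):
            if role is None:
                raise ValueError("And/But cannot be the first step")
        else:
            continue  # not a step line, e.g. the Scenario title
        resolved.append((role, text))
    return resolved

steps = [
    "Given the user is on the login page",
    "When the user submits the form with a valid email and password",
    "Then the user is redirected to their dashboard",
    "And their name is displayed in the navigation header",
]
```

In `resolve_roles(steps)`, the last entry comes back as `("Then", "their name is displayed in the navigation header")`. The And line is held to the same standard as the Then above it: an observable outcome, nothing else.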
Nothing in these scenarios touches the database, mentions an API, or specifies session management. That's on purpose. The scenarios describe what a user can observe, not how the system makes it happen.
The decision you didn't know you were making
Here's what explains most bad Gherkin: in practice, scenarios get written by automation engineers, not product owners. The automation engineer knows the test runner and the code. They don't know what the product should do - that's someone else's job. So they write Gherkin that mirrors the implementation rather than specifying behavior. The keywords show up. The thinking doesn't.
Gherkin was designed to be written by the person who understands the feature - the product owner, the BA, the team in a Three Amigos session. When it ends up with the automation engineer instead, it becomes an abstraction layer on top of tests that already exist. Readable syntax on top of decisions already made.
A clean Gherkin scenario pulls up questions the ticket never asked.
What happens if the user is already logged in? Is a suspended account considered valid credentials? Does the error message include a link to reset the password?
These aren't test questions. They're product decisions. Without a Gherkin scenario, they get made silently - by whoever writes the code, during the sprint, with no one else in the room.
Writing a Gherkin scenario isn't writing a test. It's flushing out the decisions your ticket didn't make.
The structure puts hidden assumptions on the table early enough to argue about them. That friction belongs in planning, where it costs a conversation. Let it slide, and it resurfaces in the sprint, where it costs a rework.
That's the actual reason to learn Gherkin - not the test automation, not the runner. The format forces precision. The tests just prove it.
Speclr generates Given/When/Then scenarios directly from the discovery conversation - Gherkin isn't written after the fact, it emerges as the output of structured requirements work.

