Thursday, April 16, 2026

Gherkin basics: Given, When, Then is not a template - it's a thinking structure

Jonas

Beginner - 5 min read

Behavior-Driven Development, Requirements Engineering

Most developers first encounter Gherkin in a test file. It looks like structured English. A runner like Cucumber or SpecFlow parses it and executes it against the code. So the assumption is: Gherkin is a test format.

It isn't. Not primarily.

The test is a side effect. What Gherkin actually does is force you to reason about behavior in a specific structure before you write any code. That structure - Given, When, Then - is not a formatting choice. It's a logic constraint. One that most developers skip past, and most teams misuse.

What Gherkin actually is

Gherkin is a plain-language specification format used in Behavior-Driven Development (BDD). It describes what a system should do in terms of observable behavior, not implementation. A Gherkin scenario has a name and a sequence of steps, each prefixed with a keyword.

A simple scenario looks like this:

Scenario: User logs in with valid credentials
  Given the user is on the login page
  When the user submits the form with valid credentials
  Then the user is redirected to their dashboard

Anyone on the team can read it. Developer, product owner, tester - that's by design. Gherkin was built to close the gap between "what the feature should do" and "what gets implemented." Whether you run it as an automated test is almost beside the point.
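The "name plus keyword-prefixed steps" structure is simple enough to sketch in a few lines of Python. This is only an illustration: `parse_scenario` and `Step` are hypothetical names, the function handles a tiny subset of Gherkin, and real runners such as Cucumber use a full parser.

```python
from typing import NamedTuple

# Step keywords recognized by this sketch (a subset of Gherkin).
KEYWORDS = ("Given", "When", "Then", "And")


class Step(NamedTuple):
    keyword: str
    text: str


def parse_scenario(source: str) -> tuple[str, list[Step]]:
    """Split one scenario into its name and its keyword-prefixed steps."""
    name = ""
    steps: list[Step] = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("Scenario:"):
            name = line[len("Scenario:"):].strip()
            continue
        keyword, _, text = line.partition(" ")
        if keyword not in KEYWORDS:
            raise ValueError(f"Unexpected line: {line!r}")
        steps.append(Step(keyword, text))
    return name, steps


SCENARIO = """
Scenario: User logs in with valid credentials
  Given the user is on the login page
  When the user submits the form with valid credentials
  Then the user is redirected to their dashboard
"""

name, steps = parse_scenario(SCENARIO)
print(name)
for step in steps:
    print(f"{step.keyword:>5} | {step.text}")
```

The parser knows nothing about logins or dashboards. All it sees is a name and a list of (keyword, text) pairs, which is the whole format.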

Given, When, Then is a logic gate, not a label

Each keyword has a strict semantic role. Not a label you apply to describe what happens - a constraint on what kind of thing you're allowed to say.

Given is a precondition. It describes a state of the world that is already true when the scenario starts. Something observable. Something verifiable. Not something the user does - something that is.
When is a single action. One event. The user does one thing. Not a sequence of steps. Not "and then." One.
Then is an observable outcome. What the world looks like after the action - something a person could verify by looking at the screen, not what the system does internally.

The sequence matters because it maps to the logic of any state change: starting condition, trigger, observable result. In that order.

Robert C. Martin ("Uncle Bob") has described this structure as a specification of a finite state machine: Given a state, When an event, Then a transition to a new state. That framing is more useful than "template." It makes clear why each keyword is a constraint - not a label you can apply loosely, but a slot for exactly one kind of thing.
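The state machine reading can be made concrete with a small transition table. The sketch below is an assumption-laden illustration, not how any runner works: the state and event names and the `run_scenario` helper are invented for this example.

```python
# Each entry reads: Given <state>, When <event>, Then <new state>.
# State and event names are hypothetical.
TRANSITIONS = {
    ("on_login_page", "submit_valid_credentials"): "on_dashboard",
    ("on_login_page", "submit_wrong_password"): "on_login_page_with_error",
}


def run_scenario(given: str, when: str) -> str:
    """Return the state reached from `given` when the event `when` occurs."""
    try:
        return TRANSITIONS[(given, when)]
    except KeyError:
        # No entry means nobody has specified this behavior yet.
        raise LookupError(
            f"No behavior specified for {when!r} in state {given!r}"
        ) from None


then = run_scenario("on_login_page", "submit_valid_credentials")
print(then)  # on_dashboard
```

Each scenario fills in one cell of the table. A lookup that raises, such as submitting credentials while already on the dashboard, marks a cell nobody has decided on yet.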

When you write these three things in order, you're forced to separate ideas that are easy to blur in a ticket or a user story. That separation is the thinking work. The test is just a record of it.

Each keyword has exactly one job. The most common Gherkin mistakes happen when these boundaries blur - a Given that contains an action, a Then that describes the database instead of the screen.

Three ways to get it wrong

Treating Given as an action

"Given the user clicks the login button." That's not a state - it's an action. Given describes what is already true before anything happens. "Given the user is on the login page" is a state. The click belongs in the When.

This mistake is common because it's easy to write scenarios as narratives: the user does X, then Y, then Z. But Given/When/Then isn't a narrative. It's a logical structure. If your Given contains a verb that implies user interaction, it almost certainly belongs in the When.
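A rough way to catch this in review is to scan Given steps for interaction verbs. The check below is a heuristic sketch only: the verb list is a hypothetical, incomplete assumption, and `given_looks_like_action` is an invented helper, not part of Cucumber or SpecFlow.

```python
import re

# Hypothetical, deliberately incomplete list of user-interaction verbs.
ACTION_VERBS = ("clicks", "submits", "types", "enters", "fills", "presses", "selects")
_ACTION_RE = re.compile(r"\b(?:" + "|".join(ACTION_VERBS) + r")\b", re.IGNORECASE)


def given_looks_like_action(step_text: str) -> bool:
    """Return True if a Given step contains a verb that implies user interaction."""
    return _ACTION_RE.search(step_text) is not None


print(given_looks_like_action("the user clicks the login button"))  # True
print(given_looks_like_action("the user is on the login page"))     # False
```

A word list will miss cases and flag some legitimate states, so treat a hit as a prompt to reread the step, not as a verdict.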

Writing implementation into the Then

"Then a session token is inserted into the users table." That's what the system does internally. A product owner can't verify it by looking at the screen. A tester without database access can't either.

Then should describe behavior that is observable from the outside: "Then the user sees their dashboard." If your Then requires knowledge of the implementation to verify, you've crossed the line from behavior into implementation.
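The same boundary shows up in the code behind a Then step. In the sketch below, `FakeLoginApp` is a hypothetical stand-in for the system under test, and the password and token values are made up. The point is only which attribute the assertion touches.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLoginApp:
    """Hypothetical stand-in for the system under test."""

    visible_page: str = "login"                    # what the user can see
    _sessions: dict = field(default_factory=dict)  # internal detail

    def submit_login(self, email: str, password: str) -> None:
        # Made-up credential check, for illustration only.
        if password == "correct-password":
            self._sessions[email] = "token-123"
            self.visible_page = "dashboard"


def then_user_sees_dashboard(app: FakeLoginApp) -> None:
    # Asserts only on what is observable from the outside.
    assert app.visible_page == "dashboard"


app = FakeLoginApp()                                       # Given the user is on the login page
app.submit_login("user@example.com", "correct-password")  # When the user submits valid credentials
then_user_sees_dashboard(app)                              # Then the user sees their dashboard
```

The Then step never reads `_sessions`. If session storage moved somewhere else entirely, this check would still pass, because the observable behavior would be unchanged.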

Stacking actions in the When

"When the user fills in their email, clicks the next button, and waits for the confirmation screen to load." That's three actions. A When has one.

Compound When clauses usually signal a missing scenario - or a missing Given. If you need multiple user interactions to reach an interesting outcome, either split them into separate scenarios or move the earlier steps into the Given as preconditions. The discomfort of writing a single clean When is the signal that something isn't agreed on yet.
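Stacked actions can be flagged with a similarly crude check: count the interaction verbs in a When step. This is a heuristic sketch under stated assumptions. The verb list is hypothetical and incomplete, and `count_actions` is an invented helper.

```python
import re

# Hypothetical, incomplete list of user-interaction verbs.
ACTION_VERBS = ("fills", "clicks", "waits", "submits", "types", "enters", "presses", "selects")
_ACTION_RE = re.compile(r"\b(?:" + "|".join(ACTION_VERBS) + r")\b", re.IGNORECASE)


def count_actions(when_text: str) -> int:
    """Count how many known interaction verbs appear in a When step."""
    return len(_ACTION_RE.findall(when_text))


stacked = (
    "the user fills in their email, clicks the next button, "
    "and waits for the confirmation screen to load"
)
clean = "the user submits the form with a valid email and password"

print(count_actions(stacked))  # 3
print(count_actions(clean))    # 1
```

Counting verbs rather than splitting on "and" is a deliberate choice here: the clean step contains "and" ("email and password") but describes a single action.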

What a clean scenario looks like

Scenario: User submits login form with valid credentials
  Given the user is on the login page
  When the user submits the form with a valid email and password
  Then the user is redirected to their dashboard
  And their name is displayed in the navigation header

Scenario: User submits login form with invalid password
  Given the user is on the login page
  When the user submits the form with a valid email and wrong password
  Then an error message appears below the password field
  And the user remains on the login page

The And keyword inherits the role of the keyword above it. Here it extends the Then - still an observable outcome, not an action. Use it for additional conditions in the same step type. Don't use it to chain actions in the When.
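The inheritance rule can be written down as a short pass over the steps. This is a minimal sketch with a hypothetical `resolve_roles` helper. It covers only And, and it is not taken from any runner's implementation.

```python
PRIMARY = ("Given", "When", "Then")


def resolve_roles(steps: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Replace each And with the role of the nearest primary keyword above it."""
    resolved: list[tuple[str, str]] = []
    current = None
    for keyword, text in steps:
        if keyword in PRIMARY:
            current = keyword
        elif keyword == "And":
            if current is None:
                raise ValueError("And cannot be the first step")
        else:
            raise ValueError(f"Unknown keyword: {keyword!r}")
        resolved.append((current, text))
    return resolved


steps = [
    ("Given", "the user is on the login page"),
    ("When", "the user submits the form with a valid email and password"),
    ("Then", "the user is redirected to their dashboard"),
    ("And", "their name is displayed in the navigation header"),
]

for role, text in resolve_roles(steps):
    print(f"{role:>5} | {text}")
```

After resolution the last step carries the role Then, so it is held to the same rule as any other Then: it must describe an observable outcome.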

Nothing in these scenarios touches the database, mentions an API, or specifies session management. That's on purpose. The scenarios describe what a user can observe, not how the system makes it happen.

The decision you didn't know you were making

Here's what explains most bad Gherkin: in practice, scenarios get written by automation engineers, not product owners. The automation engineer knows the test runner and the code. They don't know what the product should do - that's someone else's job. So they write Gherkin that mirrors the implementation rather than specifying behavior. The keywords show up. The thinking doesn't.

Gherkin was designed to be written by the person who understands the feature - the product owner, the BA, the team in a Three Amigos session. When it ends up with the automation engineer instead, it becomes an abstraction layer on top of tests that already exist. Readable syntax on top of decisions already made.

A clean Gherkin scenario pulls up questions the ticket never asked.

What happens if the user is already logged in? Is a suspended account considered valid credentials? Does the error message include a link to reset the password?

These aren't test questions. They're product decisions. Without a Gherkin scenario, they get made silently - by whoever writes the code, during the sprint, with no one else in the room.

Writing a Gherkin scenario isn't writing a test. It's flushing out the decisions your ticket didn't make.

The structure puts hidden assumptions on the table early enough to argue about them. That friction belongs in planning, where it costs a conversation. Let it slide, and it resurfaces in the sprint, where it costs a rework.

That's the actual reason to learn Gherkin - not the test automation, not the runner. The format forces precision. The tests just prove it.

Speclr generates Given/When/Then scenarios directly from the discovery conversation - Gherkin isn't written after the fact, it emerges as the output of structured requirements work.
