While I have written a lot about the concept behind the tool Behat in an earlier blog post (“BDD as an agile method”), this time I want to deal with how to put automated web acceptance testing with Behat and Mink into practice. However, I give an introduction to neither Behat nor Mink here, as there are already plenty of resources dedicated to that, most prominently the Quick Intro to Behat and the documentation on Mink’s website. Instead, I want to present my top 5 challenges of practicing BDD with Behat/Mink, which I have accumulated from several projects I was involved in as a “BDD-ish” developer.

If you like pictures more than text… the top 5 challenges elaborated here are also included in these slides (at the very end):

#1: Developers write scenarios

While there is no standardized classification of agile testing artifacts, there is at least a common perception that tests either “support programming” or “critique the product”, and that they are either “technology-facing” or “business-facing”. Brian Marick proposed this taxonomy with his agile testing matrix as early as 2003.

Agile testing matrix by Brian Marick

To date, automated tests in general are nonetheless regarded as a “developer-only thing”. Unit tests are still the best-known kind of automated test, and they are indisputably technology-facing artifacts that support programming, so stakeholders might be tempted to put automated acceptance tests in the same corner (or “quadrant”, strictly speaking in terms of the matrix). Even though automated acceptance tests created through BDD do support programming, they are definitely more business-facing than technology-facing. After all, we employ a DSL for describing scenarios in the “language of the customer”, not for the sake of the developer’s affection for exotic programming languages. But even with that in mind, many stakeholders (and some developers) speak of integration testing when they think of any testing practice beyond ordinary unit testing. Thus, they still do not see that the involvement of the customer (or a representative) is as essential as that of the developer in the creation of automated acceptance tests.

Further, the customer’s involvement is pivotal to the question of whether an automated acceptance test should be created at all. In contrast to unit tests, acceptance tests should not be created on the developer’s initiative. Instead, they should be artifacts the customer explicitly asked for. Of course, as a developer you can try to convince your customer of BDD by presenting an initial acceptance test suite that you have created completely on your own. You then not only implement the “executability” of the scenarios but also write the scenarios themselves. In the case of StoryBDD (Behat), however, this way of convincing will hardly achieve more than an ephemeral enthusiasm for the tests on the customer’s side. And establishing BDD as an agile method is virtually impossible this way. Actually, you will most likely fail much earlier: if the developer writes the scenario, it will look notably different from the scenario the customer would have written. Even though some developers might be capable of thinking in the proper domain language, they cannot create this language from scratch. In the worst case, their scenarios might resemble source code instead of natural language.

#2: Scenario abstraction level

As said before, scenarios written by developers look different from those written by the customer. Given the typical developer’s mindset, which is shaped by a deliberate use of words, the developer’s description is probably more fine-grained than the customer’s. Just consider this (fictitious) example, with the developer’s variant first and the customer’s variant second:

When I fill my first name into the field "first name"
When I fill my last name into the field "last name"
When I fill ...


When I fill the form

Of course, we have already found that developers should not write scenarios on their own. But our previous separation of project stakeholders into developers and “customer” (and additional stakeholders) has been way too black-and-white: except for the ideal case in which your “customer” is the actual customer and also acts in the role of the user (which cannot be taken for granted), you probably have to deal with the problem of different scenario description styles as well. Just imagine your “customer” is the UX expert within your organization because no other representative is available who comes closer to the actual customer/user. For form-related browser interaction as exemplified above, the UX expert might, like the developer, favor the fine-grained variant over the coarse one.

And even if a representative is available who speaks the customer’s domain language properly, it is still uncertain which abstraction level to use for describing the scenarios. The level certainly has to be higher than that of your unit/integration tests, and it is definitely a best practice, if not a necessity, that all scenarios belonging to a certain feature or user story are described on the same abstraction level. Beyond these two prerequisites, though, the stakeholders have to agree on which concrete abstraction level to start with. And it is not unlikely that it has to be changed later, when more and more scenarios are added. Konstantin Kudryashov’s presentation Behat by example provides good examples that show how to revise scenarios in order to unify and elevate the abstraction level of a description.
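One way to picture the two abstraction levels from the form example is to let the coarse step delegate to the fine-grained ones inside the step-definition layer. The following is only a minimal sketch in Python rather than Behat's PHP; `FakePage`, the function names, and the sample person are hypothetical and not part of Behat or Mink.

```python
# Hypothetical sketch: one coarse step delegating to fine-grained actions.
# FakePage stands in for a browser session; it is not part of Behat or Mink.

class FakePage:
    """Records field values instead of driving a real browser."""

    def __init__(self):
        self.fields = {}

    def fill_field(self, name, value):
        self.fields[name] = value


def fill_first_name(page, value):
    # Fine-grained step: 'When I fill my first name into the field "first name"'
    page.fill_field("first name", value)


def fill_last_name(page, value):
    # Fine-grained step: 'When I fill my last name into the field "last name"'
    page.fill_field("last name", value)


def fill_the_form(page, person):
    # Coarse step: 'When I fill the form'
    # The detail lives here, in the step-definition layer, not in the scenario.
    fill_first_name(page, person["first name"])
    fill_last_name(page, person["last name"])


page = FakePage()
fill_the_form(page, {"first name": "Ada", "last name": "Lovelace"})
print(page.fields)
```

With such a layering, the scenario text can stay at the level the customer prefers, while the field-by-field detail remains available to the developer.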

#3: Redundancy

Behat’s DSL (named Gherkin) offers several possibilities to reduce redundancy in the scenario descriptions, such as arguments, backgrounds, scenario outlines, and tables. From the perspective of a developer, these “reuse capabilities” are very helpful when creating maintainable automated acceptance tests: it is well known that redundancy in source code impedes its maintenance. Certainly, the same does not necessarily apply to testing artifacts expressed in prose. But in the case of BDD it does, because each step within a scenario description is matched against the available step definitions (i.e. the implementation of the steps in PHP code). Thus, less redundancy in the description leads to less redundancy in the PHP code and, therefore, to potentially better maintainability of the acceptance test constituted by the scenario. This blog post provides examples of how to apply Gherkin’s reuse capabilities in order to avoid redundancy (it is about Cucumber’s implementation of Gherkin, but it is valid for Behat’s as well).
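The matching of step text against step definitions, and the effect of arguments on redundancy, can be illustrated with a small sketch. It is written in Python and is not Behat's actual implementation; the `step` decorator, `run_step`, and the registry are assumptions made purely for illustration.

```python
import re

# Hypothetical registry of (compiled pattern, handler) pairs.
STEP_DEFINITIONS = []


def step(pattern):
    """Register a function as the definition for steps matching `pattern`."""
    def decorator(func):
        STEP_DEFINITIONS.append((re.compile(pattern), func))
        return func
    return decorator


def run_step(text, context):
    """Run the first definition whose pattern matches the whole step text."""
    for pattern, func in STEP_DEFINITIONS:
        match = pattern.fullmatch(text)
        if match:
            return func(context, *match.groups())
    raise LookupError(f"No step definition matches: {text!r}")


@step(r'I fill "([^"]*)" into the field "([^"]*)"')
def fill_field(context, value, field):
    # One parameterized definition replaces one definition per field.
    context[field] = value


context = {}
run_step('I fill "Ada" into the field "first name"', context)
run_step('I fill "Lovelace" into the field "last name"', context)
print(context)
```

Two differently worded steps would need two definitions here, whereas two steps that differ only in their arguments share one; this is the sense in which less redundancy in the description carries over to the code.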

Nonetheless, a developer who is used to sophisticated programming languages might still consider Gherkin’s possibilities to reduce redundancy limited. So why not augment the DSL’s reuse capabilities to make it a more “proper” programming language? Well, the problem of limited reuse capabilities might certainly be relevant. But much more relevant is the problem that the customer’s understanding of them is limited as well. And who defines what to write in the DSL and how? It is the customer, after all. Even with the reuse capabilities that were given to Gherkin “by birth”, you can imagine scenario descriptions that the customer cannot understand at all, just due to heavy use of one or more of the reuse capabilities mentioned above.
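To get a feeling for the mental work that a reuse capability demands from the reader, consider what a scenario outline implies: every row of its examples table has to be substituted into the step templates. The sketch below does this expansion in Python; the steps, the table, and `expand_outline` are made up for illustration and do not reflect how Behat parses Gherkin.

```python
# Hypothetical outline: step templates plus an examples table.
OUTLINE_STEPS = [
    'Given I am logged in as "<role>"',
    'When I open the page "<page>"',
    'Then I should see "<message>"',
]

EXAMPLES = [
    {"role": "admin", "page": "/users", "message": "User list"},
    {"role": "guest", "page": "/users", "message": "Access denied"},
]


def expand_outline(steps, examples):
    """Return one concrete list of steps per examples row."""
    scenarios = []
    for row in examples:
        concrete = []
        for template in steps:
            line = template
            for name, value in row.items():
                # Replace each <placeholder> with the value from this row.
                line = line.replace(f"<{name}>", value)
            concrete.append(line)
        scenarios.append(concrete)
    return scenarios


for scenario in expand_outline(OUTLINE_STEPS, EXAMPLES):
    print("\n".join(scenario))
    print()
```

A developer performs this substitution without noticing; a customer reading an outline with many placeholders and rows may not.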


#4: Time-consuming refactoring

Agile testing pyramid by Mike Cohn

Creating and maintaining automated acceptance tests is very costly compared to unit or integration tests. In this context, Mike Cohn’s agile testing pyramid serves as a rule of thumb. The pyramid represents all automated tests on a quantitative basis. It is divided into three layers representing different kinds of automated tests: the unit layer at the bottom, the service layer in the middle, and the UI layer at the top.

While there are several interpretations of the concrete meaning of each layer, it is commonly accepted to subsume unit and integration tests in the two lower layers, whereas acceptance tests can be assigned to the top layer as long as they are UI-related. The higher a layer resides in the pyramid, the more expensive its tests are in terms of the effort for creating and maintaining them. As it is a pyramid and not a cube, the rule of thumb is that you should have considerably fewer acceptance tests than unit/integration tests if you want to achieve a favorable cost-benefit ratio.

The costliness of automated acceptance tests is further reinforced by the disproportion between the effort for creating them and the effort for maintaining them. While both activities are more or less evenly balanced in the case of unit/integration tests, this is not true for automated acceptance tests (regardless of whether they were created through BDD or not): extending existing tests or debugging broken tests in order to fix them will probably require you to run the acceptance test suite several times. In our case, running the acceptance test suite means executing all the scenarios. This execution takes place in real time, i.e. all the interaction between user and browser is simulated at the same speed as in reality. Thus, maintaining automated acceptance tests becomes an incredibly time-consuming process compared with maintaining unit/integration tests, where programmatic techniques such as mocking and dependency injection enable you to bypass time-consuming parts of the subject under test.
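For comparison, this is roughly what bypassing a slow collaborator looks like at the unit level. The sketch is in Python and entirely hypothetical (`Checkout`, `SlowPaymentGateway`, and `FakePaymentGateway` are invented names); it only illustrates the general idea of dependency injection with a test double, an option that a browser-driven scenario does not have in the same way.

```python
import time


class SlowPaymentGateway:
    """Stands in for a time-consuming collaborator (hypothetical)."""

    def charge(self, amount):
        time.sleep(2)  # simulates network latency
        return True


class FakePaymentGateway:
    """Test double that answers immediately and records calls."""

    def __init__(self):
        self.charged = []

    def charge(self, amount):
        self.charged.append(amount)
        return True


class Checkout:
    def __init__(self, gateway):
        # The gateway is injected, so a test can pass in a fast fake.
        self.gateway = gateway

    def pay(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        return self.gateway.charge(amount)


# The unit test never touches the slow gateway.
gateway = FakePaymentGateway()
checkout = Checkout(gateway)
assert checkout.pay(42) is True
assert gateway.charged == [42]
```

In production code, `Checkout(SlowPaymentGateway())` would be wired up instead; the test above finishes without waiting for the simulated latency.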

In contrast to a manual execution of the automated acceptance tests, an automatically triggered execution with Continuous Integration (CI) in mind could of course mitigate the maintenance effort, as it reveals a broken test earlier. Not using CI at all for automated tests is even denominated “sloth” in the 7 Deadly Sins of Automated Software Testing. Nonetheless, the effort for extending acceptance tests remains the same. And in Agile, you do not extend code in a sloppy way but in a sustainable way, by following the common agile practice of code refactoring. Yet precisely that practice is impeded by the time-consuming execution of your scenarios: if it takes so much time to implement a test, a developer will most likely not be motivated to refactor for the sake of having “clean code” instead of code that just works somehow.

#5: Complex infrastructure

An infrastructure for running an acceptance test suite based on the Behat/Mink framework has to comprise the following components:

  • Behat, which consists of the Gherkin DSL and the method matching functionality
  • Mink, which provides a generic API to simulate interaction between user and browser
  • A so-called Driver, which maps Mink’s API to that of a particular browser emulator
  • The browser emulator, either being a headless browser emulator or a browser controller
  • And in the case of the latter: an actual browser that is controlled by the browser controller

It is obvious that such an infrastructure is more complex than one for running unit tests, as you have to wire together various components. There are two immediate consequences of that complexity. First, setting up the infrastructure is feasible but certainly not as easy as for most other kinds of testing frameworks. Second, and much worse, the infrastructure adds a certain degree of fragility to the acceptance test suite. Just imagine what could go wrong: the Driver you use could be implemented incompletely, such that it does not provide the full set of Mink’s API methods (this actually is the case for Mink’s Selenium Driver). Or the browser controller might be unable to conduct the interaction that is to be simulated, just because an “Are you sure you want to navigate away from this page?” dialog stands in its way.
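The layering described above, and the way an incomplete Driver surfaces only at run time, can be sketched as follows. This is a simplified Python model, not Mink's real class hierarchy; `Session`, `Driver`, `PartialDriver`, and `UnsupportedDriverAction` are hypothetical names chosen for the illustration.

```python
class UnsupportedDriverAction(Exception):
    """Raised when a driver does not implement a requested action."""


class Driver:
    """Hypothetical driver interface; every action is unsupported by default."""

    def visit(self, url):
        raise UnsupportedDriverAction("visit")

    def click(self, locator):
        raise UnsupportedDriverAction("click")

    def drag_to(self, source, target):
        raise UnsupportedDriverAction("drag_to")


class PartialDriver(Driver):
    """Implements only part of the interface, like an incomplete driver."""

    def __init__(self):
        self.log = []

    def visit(self, url):
        self.log.append(("visit", url))

    def click(self, locator):
        self.log.append(("click", locator))


class Session:
    """Generic API that scenarios talk to; it only delegates to the driver."""

    def __init__(self, driver):
        self.driver = driver

    def visit(self, url):
        self.driver.visit(url)

    def click(self, locator):
        self.driver.click(locator)

    def drag_to(self, source, target):
        self.driver.drag_to(source, target)


session = Session(PartialDriver())
session.visit("/signup")
session.click("Submit")
try:
    session.drag_to("#item", "#basket")
except UnsupportedDriverAction as error:
    print(f"driver cannot perform: {error}")
```

The scenario that needs `drag_to` looks perfectly valid at the generic API level; the gap only shows once that particular step is executed against that particular driver.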

Fortunately, there are ways to diminish this fragility: you could outsource the vast part of your infrastructure to a service provider like SauceLabs, which offers you a Selenium test execution environment as IaaS (even though you would then restrict the generality of Mink to Selenium). Alternatively, headless browser emulators such as Zombie.js are becoming more and more sophisticated. Thus, you can consider relying solely on those and consequently discarding any browser controller, even if your simulated interaction relies on features not supported by conventional browser emulators (think of the missing JavaScript support of Goutte, for example).


Now it is your turn: What do you think about my top 5 challenges? After all, they are highly subjective. Perhaps you know of any best practices that could be helpful? Or do you have your own top N challenges of practicing BDD (regardless of whether with Behat/Mink or not)?

1 Comment

  1. Great article! My biggest beef with Behat, and PHP BDD in general is that the learning curve for entry is basically impossible. I consider myself an expert PHP/Wordpress programmer and I’m barely able to handle the frustration level associated with these tools. Codeception has made the situation even worse. Anyway, I’m probably the last PHP dev trying to actually do BDD. It seems everyone else gave up three years ago! Keep the dream alive folks… BDD and scripting languages like PHP are how humans are going to speak to powerful a.i.. It’s coming!

