From the culture shift to differences in Agile, Dave Farley and Michael Hackett discuss the nitty-gritty of Testing in DevOps.
For this issue of LogiGear Magazine, our very own Michael Hackett sat down with one of the godfathers of Continuous Delivery, David Farley. In this exclusive interview, David discusses how test teams and automation are being impacted by CD/DevOps. He also clarifies the meaning of current industry buzzwords and offers some advice for organizations beginning their Continuous Delivery voyage.
Michael: Testers have become more integrated with developers with Agile teams. The big change in CD/DevOps is for the Ops side and deployment to be integrated as well. How does integrating Ops with Test teams help? Would it remove bottlenecks that many test teams have without having proper environments and data to test with?
Dave: I think that there are a couple of different ways in which CD impacts this relationship, the general and the specific.
“In general, CD works best when teams work in a very collaborative way.”
All of the people that have an impact on creating high-quality software and delivering it to users work closely with the same goal in mind: “placing valuable software into the hands of our users.”
Testers, product owners, developers and operations people are all working closely together to achieve that goal. The best teams do this with very fine-grained collaboration in nearly everything that they do.
On the specific relationship between Ops and testers, an important aspect of CD is the use of automation. We use automation to remove repetitive work of all kinds, like regression testing, environment setup, and deployment, to name just a few. When you start to take this kind of automation seriously you very quickly gain a great deal of flexibility.
This impacts testers (and others) directly by putting them in the driving seat. Instead of waiting for some operations or admin person to prepare a test environment, you choose the release-candidate that you want to test, choose the environment where you want to test it and push a button, and it is deployed and is up and running in a matter of a few seconds.
This is a very liberating experience the first time that it happens. It changes your relationship with the code that you are working with. It also changes how you think about ideas like what a “release-candidate” is. In CD, we consider that every commit to a version control system gives birth to a “release-candidate,” and the job of the process and the automation is to prove that that candidate is NOT fit to make it into production.
Michael: When people talk about shift-left, are they talking about developers doing more unit testing or is this shifting Test Automation suites to an earlier phase in the pipeline?
Dave: “Shift-left” is a bit more general than both of those things, I think. The idea is to learn faster! We want to improve our feedback so that we can more quickly understand if we have made a mistake. By “we” I mean everyone on the team. Another way of saying this is that we want to “fail fast.” In CD, our objective is to get definitive feedback on the quality of our work multiple times per day. Ideally we want to have viable release-candidates produced every hour or so.
In order to achieve that we work very hard to “shift-left” on all of our activities. We will design our systems to help us catch errors as soon as possible. We will use unit-tests to give us fast feedback and a reasonably high level of confidence in just a few minutes. We will design automated functional tests, which we call “acceptance tests,” to assert that the code does what our users want it to do in life-like test environments, running realistic scenarios, and we expect results from those tests in tens of minutes. We will also test other characteristics of our software, like its performance, scalability, resilience and security, again in tens of minutes.
We use a “deployment pipeline” to organize our testing to ensure that the testing is all performed, and to allow us to optimize its performance. We will often run lots of tests in parallel to get results quickly. We will also organize our tests so that the most common errors are caught sooner in the process. This too is part of the “shift-left” idea.
Our approach is based on the assumption that if we can learn these things about the behavior of our systems very quickly, it will have a significant impact on the quality of our work, and the quality of our systems.
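To make the shape of such a deployment pipeline concrete, here is a minimal Python sketch. It is an illustration only, not a description of any particular tool: the stage names (commit, acceptance, nonfunctional), the run_pipeline function, and the stand-in checks are all assumptions. It shows stages ordered so that the fastest checks run first, checks within a stage running in parallel, and the run stopping at the first failing stage.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# A check is any callable returning True when the candidate passes.
Check = Callable[[str], bool]
# A stage is a name plus the checks that may run in parallel within it.
Stage = Tuple[str, List[Check]]


def run_pipeline(candidate: str, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order; stop at the first stage with a failing check.

    Returns (passed, names_of_stages_that_ran).
    """
    ran: List[str] = []
    for name, checks in stages:
        ran.append(name)
        # Checks inside one stage are independent, so run them in parallel.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda check: check(candidate), checks))
        if not all(results):
            return False, ran  # fail fast: later, slower stages never start
    return True, ran


# Hypothetical checks standing in for real test suites.
def unit_tests(candidate: str) -> bool:
    return True


def acceptance_tests(candidate: str) -> bool:
    return "broken" not in candidate


def performance_tests(candidate: str) -> bool:
    return True


# Fast, cheap stages first; slower, more expensive stages later.
STAGES: List[Stage] = [
    ("commit", [unit_tests]),
    ("acceptance", [acceptance_tests]),
    ("nonfunctional", [performance_tests]),
]
```

With these stand-in checks, a candidate named "broken-build" would be rejected at the acceptance stage, and the slower nonfunctional stage would never start.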
Michael: How are things different for Agile test teams as organizations move to full pipeline automation?
Dave: I think that Agile testing has always been pretty collaborative; in CD we just turn up the dial on that. I advise people to avoid manual regression testing. Instead, aim to automate ALL regression testing. I believe that this enables test professionals to do more interesting things.
They can help in the design of better automated testing, working embedded in the development team and adding their expertise to ensure that we can “build quality into” our software rather than inspect things after they are complete. They can also focus on the more creative exploratory testing of systems and advise on their quality in a much broader sense of “is it nice to use?” rather than “does it work?” If we adopt the levels of automated testing that I am describing, we are pretty sure that mostly the systems “work.”
My experience of teams like this is that testers become a much more involved, integral part of the team as a result. I believe that the professional testers that I have worked with on these kinds of teams have found their role more fulfilling.
Michael: What is/are the biggest culture shifts teams need to make in CD?
Dave: CD is difficult to adopt. It is a technically demanding, highly disciplined approach to development. That sounds a bit daunting, but it is also very liberating. There are many new things to learn and new ways to think about things for everyone on a team adopting CD. These teams are intensively collaborative, highly experimental and very focused on the customer and the use of measurement and data to understand what works and what doesn’t.
All of these things take a bit of adapting to. I think that perhaps the biggest shift is towards making changes in small, incremental ways. Instead of designing solutions for our users, we tend to “evolve” solutions that better fit their needs. We are continually learning and adapting based on what we learn. This approach to continual learning and continuous improvement is completely pervasive and applies to every aspect of how we work.
If it sounds scary, it is not, it is delightful, but you do need to drink some Kool-Aid 😉
Michael: Manual testing and exploratory testing are sometimes considered bad words in software engineering. Some people see them as naïve, unnecessary, slow, unmanageable, etc. But recently they seem to be getting more popular again. Do you see a place for exploratory testing and manual testing in CD?
Dave: There are some things that human beings are better at than machines. Creative and exploratory thinking is what we are good at; we are poor at things that demand that we be reliably repeatable, like regression-testing. So let’s use computers for what they are good at (repeatable, reliable testing) and let people do the more interesting and creative stuff.
Michael: Automated acceptance testing has been a common practice, from what I can remember—since XP, but the focus for them is narrow. There are many automated tests that may have nothing to do with acceptance criteria or any documented acceptance. They could be about workflows or scenarios that customers do, or simply automated tests written to verify functions written from bad or non-existent user stories. If we have automated suites focused only on acceptance testing, isn’t that too narrow of a scope?
Dave: My definition of “acceptance test” is fairly specific. It is not really the same as “user acceptance testing”; it goes further than that. The approach that I advocate is to use automated “acceptance tests” to drive the development process. We aim to create an “Executable Specification” (an “acceptance test”) for pretty much every behavior of our system.
We start with the requirements as Agile “stories” focused on describing observable behavior of the system from a user’s perspective. We then identify one or more examples (“acceptance criteria”) that we should be able to observe if that behavior exists in the system. We create a minimum of one automated test per acceptance criterion.
I recommend that we do all of this before we start writing code. Then we use these “executable specifications” to drive the development of the code. When these tests all pass, the story is complete—the “specification” is met!
As part of the development work to meet these specifications, we do lower-level, more technical testing, using the techniques of Test Driven Development to create unit-tests. So we evaluate the quality of our code in two different dimensions: from the perspective of users, in our acceptance tests, and from the perspective of the development team, in our unit tests.
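As a rough illustration of a story, its acceptance criteria, and one executable specification per criterion, here is a minimal Python sketch. Everything in it is hypothetical: the shopping-basket story, the Basket class, and the spec_ function names are assumptions chosen for the example. In practice the specifications would be written first and would fail until the code exists; the Basket shown here is the small amount of code needed to make them pass.

```python
# Hypothetical story: "As a shopper, I can add items to my basket and see the total."
# Acceptance criteria (examples of observable behaviour):
#   1. An empty basket has a total of 0.
#   2. Adding an item increases the total by that item's price.


class Basket:
    """Minimal system under test, written only far enough to meet the specs."""

    def __init__(self) -> None:
        self._prices = []

    def add(self, name: str, price_cents: int) -> None:
        self._prices.append(price_cents)

    def total_cents(self) -> int:
        return sum(self._prices)


def spec_empty_basket_total_is_zero() -> None:
    # Criterion 1
    assert Basket().total_cents() == 0


def spec_adding_item_increases_total() -> None:
    # Criterion 2
    basket = Basket()
    basket.add("book", 1250)
    assert basket.total_cents() == 1250


# One executable specification per acceptance criterion.
SPECS = [spec_empty_basket_total_is_zero, spec_adding_item_increases_total]


def story_is_complete() -> bool:
    """The story is done when every executable specification passes."""
    for spec in SPECS:
        try:
            spec()
        except AssertionError:
            return False
    return True
```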
Michael: Many test teams have been working for years to add more and more automated tests to build very large automated UI regression suites. In many organizations, these suites have become slow, difficult to maintain, costly, and ineffective. Making matters worse, these bloated suites are notorious for giving not-useful/inconsistent feedback. Organizations have often grown to rely on these dubious, large suites. Many test teams see them as bottlenecks but are afraid to cut their size. Have you seen this, or experienced it? Do you have any recommendations for companies with very large automated regression suites trying to automate their Dev pipelines?
Dave: Yes, I have seen many organizations in this position. I think that it is the result of a few different things. The first is technical. These sorts of tests are usually written in a way that confuses “what” the system needs to do with “how” the system achieves it. These are, or at least in my opinion should be, separate concerns.
I like to construct my test infrastructure so that I can treat these two problems separately. I advise my clients to implement a “Domain Specific Language” for expressing the ideas in test-cases, and to let the development team do the “plumbing” to make it work in the lower levels of the infrastructure. This separation means that the test-cases themselves are very abstract and are not tightly coupled to the system under test, allowing it to change without invalidating test-cases.
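One way to picture this separation is the following minimal Python sketch. It is an assumption-laden illustration, not a specific framework: ShoppingDsl, ShopDriver, InMemoryDriver, and the shopping example are all hypothetical names. The test case speaks only in domain terms (the “what”), while a replaceable driver holds the plumbing (the “how”).

```python
from typing import Dict, Protocol


class ShopDriver(Protocol):
    """The 'how': plumbing that talks to the system under test."""

    def add_to_basket(self, item: str, price_cents: int) -> None: ...

    def basket_total(self) -> int: ...


class InMemoryDriver:
    """Hypothetical driver; a real one might drive a UI or call an HTTP API."""

    def __init__(self) -> None:
        self._basket: Dict[str, int] = {}

    def add_to_basket(self, item: str, price_cents: int) -> None:
        self._basket[item] = price_cents

    def basket_total(self) -> int:
        return sum(self._basket.values())


class ShoppingDsl:
    """The 'what': domain-level vocabulary used by test cases."""

    def __init__(self, driver: ShopDriver) -> None:
        self._driver = driver

    def buy(self, item: str, price_cents: int) -> "ShoppingDsl":
        self._driver.add_to_basket(item, price_cents)
        return self  # allow chaining so cases read like a scenario

    def expect_total(self, price_cents: int) -> None:
        actual = self._driver.basket_total()
        assert actual == price_cents, f"expected {price_cents}, got {actual}"


def shopper_sees_total_case(shop: ShoppingDsl) -> None:
    # Reads as behaviour only; nothing here names a screen, URL, or table.
    shop.buy("book", 1250).buy("pen", 200).expect_total(1450)
```

If the system under test changes, only the driver should need to change; a case such as shopper_sees_total_case stays as it is and can be run against any driver that offers the same two operations.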
The next problem is organizational. Most traditional organizations think of testing as the responsibility of a separate team, professional testers, or “QA” teams. I think that this is a mistake. I believe that testing works best when it drives the process of development.
We create our tests as “executable specifications for the behavior of our system.” We use these specifications to drive the whole development process. Deming famously said: “You can’t inspect quality into a product.” You build quality into a product, so quality should be treated as a first-order concern in the development process.
Finally, the last brick in the wall of successful, large-scale, automated testing is speed. Organizations underestimate the vital importance of fast feedback. Without fast feedback, it is impossible for development teams to keep the tests passing. A simple example: suppose I work on a team that runs overnight builds, so I only get my test results the next day. If I commit a change that breaks something, it will take a day for me to see the failure, some time to understand what I did and commit a fix, and another day to see if my fix worked.
Best case scenario, the build will be broken for a whole day, maybe more! If I can get my test results after an hour, I will get 4 or 5 attempts, during the course of a working day, to correct my mistake. My observation is that teams with long-running test-suites always have more failing tests than teams with fast feedback.
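The arithmetic behind this comparison can be sketched in a few lines of Python. The function name and the default figures (an 8-hour working day and 45 minutes to diagnose and fix a failure) are assumptions made for illustration, not numbers from the interview; only the one-hour and overnight feedback times come from the example above.

```python
def fix_attempts_per_day(feedback_hours: float,
                         fix_hours: float = 0.75,
                         working_hours: float = 8.0) -> int:
    """Whole fix-and-verify cycles that fit into one working day.

    One cycle is the time spent diagnosing and fixing a failure plus the
    time spent waiting for the test results that confirm the fix.
    The defaults are illustrative assumptions.
    """
    cycle = feedback_hours + fix_hours
    if cycle <= 0:
        raise ValueError("a cycle must take a positive amount of time")
    return int(working_hours // cycle)


hourly = fix_attempts_per_day(feedback_hours=1.0)      # results after an hour
overnight = fix_attempts_per_day(feedback_hours=16.0)  # results the next morning
```

Under these assumptions, one-hour feedback leaves room for four attempts in a day, while overnight feedback leaves room for none: the verdict on each fix arrives the following day.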
I think that we should work to treat testing as a cornerstone of the development process. We invest in automation, hardware, engineering, and ingenuity to do all of this with speed, quality, and efficiency. We focus on maximizing the quality and frequency of the feedback that we get from our tests. We architect our systems to be more testable (which, by some very nice accident, also generally means that they are architecturally of higher quality) and optimize the whole process to get this fast, high-quality feedback. When we do all of those things we see some quite remarkable results in terms of the speed, quality, and productivity of our work to create useful software.
Michael: Along the same line, some teams have lean and mean regression suites that they trust and that give good feedback. These suites often take a night to run: the team does a big daily build and runs thousands or tens of thousands of tests overnight, over and over again, as its version of Continuous Testing. How can test teams in this situation move to leaner and faster automated suites, resulting in better and more immediate feedback?
Dave: I think that, although pretty good by industry standards, an overnight test run is too slow. My experience has been that you need to get as close as you can to definitive feedback at least twice per working day, ideally faster than once per hour. This is often challenging to achieve, but it maximizes your chances to keep things working.
In some industries this may be impossible because of hardware demands. If you are creating physical devices, chips and other hardware as part of your Dev process, you can’t burn silicon every hour. Under those circumstances, you do the vast majority of your testing in simulators, simulating the hardware accurately enough that you almost never find a bug in real hardware later on. This can never be perfect, but it can be VERY good.
For software-only products, I think that as extreme as they may sound, my numbers for fast feedback are achievable with ingenuity, belief in the value of the feedback, and with money invested in good engineering and hardware to achieve it.
The numbers are on my side. Google does this at MASSIVE scale. Volvo Trucks evaluate tens of millions of lines of code, running in simulators, and produce multiple “release-candidates” every day.
Data from the industry indicates that organizations that do this kind of thing make software more efficiently, not less. These kinds of organizations are more profitable than comparable organizations that don’t practice this level of engineering discipline.
So even though this practice sounds expensive, and we do spend a lot of time and money on testing and test hardware, the data says that it pays for itself in terms of efficiency, quality, productivity, and bottom-line company performance. (See the “State of DevOps Report 2017” and “Accelerate” by Jez Humble, Nicole Forsgren, and Gene Kim.)
Michael: At the same time as the above situations, there are very many teams who struggle with Test Automation. Maybe they have a solid, small CI automated smoke test but…that is it. They have little to no other useful, repeatable Test Automation, but they need to modernize. How do you suggest they start or jump-start progress into CD?
Dave: As I have suggested in some of my answers to other questions, there are some tricks and techniques that can help a lot. (See my conference talk on “Acceptance Testing”)
Step one is to ensure that software developers, and not just testers, are monitoring, and crucially are responsible for, test failures. Make automated testing a normal part of everyday development practice.
When adopting this practice in traditional organizations with little or no automated testing, start by using “acceptance testing” techniques to automate any manual regression testing. If you get the abstraction in the domain-specific-language correct, this is not a very expensive exercise. As a rough approximation, automating the tests will take about as long as performing a single manual regression test run. This approach has the huge advantage that once you have done it, you can re-use the tests in future.
Run the automated tests that this produces every day, ideally continuously throughout the day. If tests are slow, figure out how to make them faster. (See my talk on “Optimizing Continuous Delivery.”)
Finally, the real aim here is for the whole team to adopt a stronger engineering discipline around software development. You know that you are winning when the first response of anyone on the team to any problem is “how should we test that?” I believe that we can make enormous advances in productivity and quality if we start thinking of our work as a series of small, often automated, experiments.
Dave, thank you so much for taking the time to do this interview with us. It exceeds anything that we anticipated; it is spot-on and conclusive. Your answers clear up a lot of the confusion that surrounds industry buzzwords by breaking down the terms and applying them to real-life situations. This interview definitely provides readers with tremendous insight into Continuous Delivery. Once again, thank you.