Engineering Transformations - Adopting Automated Testing As A Practice
Leadership, in a way, is about finding and using levers that help the business take step jumps towards the company's vision. Engineering leadership is similar, but scoped to engineering and product development. As engineering leaders, one of our major responsibilities is to help our teams take step jumps in how quickly we innovate, how we ship high-quality software at breakneck speed, and how we operate that software in production. I call such step jumps Engineering Transformations.
I use the term transformation at the risk of sounding jargony (digital transformation, IT transformation, etc.), as such terms are often used by consultants, and I am one right now. But that's just due to the lack of a better term in my vocabulary. I don't intend to make this article about expensive consulting that nobody likes (and that perhaps does not deliver what most businesses need). I intend to talk about leadership, specifically at startups and growth-stage companies.
Bringing about transformations is not easy. To transform and uplevel an organization, you need to intervene at many levels - skills, culture, mindsets and discipline, to name a few. The journey is hard because the challenges are partly technical and mostly sociological.
Every organization's journey is unique, and the challenges and techniques involved in these transformations are worth discussing. I hope to share some of the transformation experiences I have led through a series of articles, starting with this one.
Adopting Automated Testing as a practice
It's 2023. As an industry, we have been talking about Automated Testing and Continuous Integration for almost two decades, yet most teams continue to struggle with quality, Automated Testing, Continuous Integration and Continuous Delivery. So, as the first case study, I want to discuss the challenges of adopting a culture of automated testing, which improves quality and speed of execution and eventually enables practising Continuous Integration and Continuous Delivery. In this article, I will cover patterns from my personal experience of helping teams adopt automated testing as a regular practice.
This article is not here to sell automated testing or shifting left on quality. That's a different topic, written about many times by many accomplished people in the industry; entire books exist on it. So, I am assuming that the value of automated testing as a practice is known to the reader. I am only going to talk about the struggles of adopting it as a practice - struggles a team faces only if it has actually tried, which means it already understands the value at some level. Hopefully, this article will help such teams attempt it again.
Failure Modes
Teams that try to adopt the practice of writing automated tests usually attempt it several times and fail: they fail to adopt it as a lasting practice and to get value out of it. Let's unpack that first.
What does it mean to realise the value of automated testing?
- Better user experience - automated tests help prevent regressions, meaning that known user flows of the application are tested on every change. Developers end up testing all the known ways their application can be used, so users don't face functional problems while using it. And since the tests are automated, they can be run repeatedly at little extra cost.
- Confident releases - Because teams can test every change automatically, they can ship changes with a high degree of confidence. No more late-night deployments. No more gated PR merges by senior engineers only. No more batching a large set of changes and then testing them manually. Manual testing is expensive. The cost of it introduces bias in deciding which scenarios to manually test. With automated testing (theoretically), you can test all the known ways a user uses the application. Hence, automated testing also helps remove bias in the testing process.
- Continuous and on-demand releases - confident releases lead to the ability to release frequently and on-demand, whenever the team wants. Continuous delivery of small batches of work means customers get the latest work frequently. A positive side effect is that engineering teams also get to work on the latest, well-tested version of the codebase, which reduces the chances of rework and integration hell (merge conflicts).
- Shipping new changes and system maintenance become easier - confidence in releases opens the door to more frequent releases. More frequent releases mean faster value delivery to customers, faster learning from customer feedback and faster iterations. So existing features can be improved faster, and new features can be built faster. Other work, like system maintenance, also happens faster. For example, most teams struggle to update their dependencies to newer versions because they are not sure the application will work after the upgrades. Security vulnerabilities don't get patched on time, but automated tests make it possible to confidently patch software without breaking the customer experience.
There are several failure modes when a team tries to adopt automated testing:
- Juniors working on automated testing - change starts from the top. Handing the initiative to juniors is usually a side effect of the organization's leaders not understanding the value of the transformation themselves, and hence not leading the change. If leaders want to see a change in their organization, they should drive it themselves (being hands-on with writing the code is not necessary). Otherwise, transformations and organisational changes are doomed to fail and should be suspended. Why suspend? As a leader, when you commit to something in front of your teams but don't make it happen, your commitments stop meaning anything and become just words.
- Not focused - organization-wide changes are hard. They require focus, especially if you are also learning the skills related to the change: in the case of automated testing, how to write tests, what kinds of tests to write, how to architect the test infrastructure, and so on. Learning requires focus. Driving a big change, especially getting everyone else in the organization to learn a new thing and change their behaviours, requires focus on training, coaching, resolving anxiety and providing whatever support is needed to see the change through. Not being focused also means having no clear, achievable goals and milestones. Without those, the value of automated testing will not be realised.
- Too few resources for the change - if you have no tests today, the gap to a level of coverage that provides high confidence during releases is large. If enough resources are not dedicated to increasing automated tests, achieving "meaningful test coverage" might take forever, and so might realising the value of your automated testing initiative.
- Not taking everyone along - you will likely start with a few people to get to "meaningful test coverage". But what then? How do you scale the practice to other teams and make it an actual "practice"?
- Poor feedback loops - not running tests frequently enough. A common anti-pattern: running tests is automated via a pipeline, but the pipeline runs only on-demand, when a developer chooses to trigger it. Leaving that choice to developers adds friction to getting valuable feedback from your automated tests. Tests should run while developing locally, on every push to the remote repository, and before merging a PR (a minimal sketch of the local loop follows this list). The latter two are the controls that ensure tests always run before a change reaches production. Enforcing that everyone runs tests locally might not be feasible depending on the size of your org (you can't monitor everyone, or maybe you can?), but if possible, absolutely get everyone to run tests locally. At the very least, run tests before merging the PR. Poor feedback loops lead to some important failure modes:
  - Tests are run on-demand when someone wants to test something, instead of automatically in a CI loop. This delays feedback, which leads to rework in the later stages of building software, and adds extra decision work for the team. It can even lead to changes reaching production untested, because developers can skip running tests (that's just extra work, after all).
  - Tests are flaky because of environment-related issues: sharing an environment with manual testing, poor test data sanity left over from previous test runs, and a lack of automation to set up the application's infrastructure, to name a few. Flaky tests reduce confidence in the test suite, so teams start to skip running them. Think about it - if you use a product that often does not work, especially when you need it the most, you will stop using it and find an alternative. In this case, the alternative is simply not running tests.
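As referenced above, here is a minimal sketch of the local part of the feedback loop: a git pre-push hook that runs the test suite before every push. It assumes the project uses pytest; the test command is the only thing to swap for your own stack.

```python
#!/usr/bin/env python3
# .git/hooks/pre-push (must be executable): run the test suite before
# every push so obviously broken changes never reach the remote.
# Assumes pytest; replace the command with your own test runner.
import subprocess
import sys

result = subprocess.run(["pytest", "--quiet", "--exitfirst"])
if result.returncode != 0:
    print("Tests failed - push aborted. Fix the suite before pushing.")
    sys.exit(1)  # a non-zero exit makes git abort the push
```

The hook is a convenience, not a control - developers can bypass it with --no-verify - which is exactly why the same tests must also run in the pipeline on every push and before every merge.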
Strategies for Success
Transformations are difficult. They are less about the technology and more about the people and what they must collectively do to transform. Understandably, the strategies for success mirror the failure modes.
- Change starts from the top - changes are hard. They don't just require technical skills but also organizational buy-in. Leaders are best positioned in an organization to drive change, and when they lead the change themselves, they motivate everyone. They are also better suited to start, build momentum and then monitor progress in later stages, because by then they understand the problem space well enough to iterate without being hands-on. Leaders should lead the change themselves, especially if the org has been failing to drive it bottom-up, which is common with difficult changes like adopting automated testing.
  - If the senior leaders cannot be hands-on in the transformation initiative, they should still be involved enough to understand the problem space and monitor execution and progress. In that case, the next layer of seniors - senior engineers, tech leads or engineering managers - should be hands-on in the initiative. If they don't know how to write tests, they should learn by doing, and then learn what it takes to maintain a good test suite.
- Start by learning - if you are starting a transformation of any kind, make sure there is "enough" knowledge in the organisation to drive the change. I say "enough" because knowledge requirements are contextual: the knowledge and infrastructure-building skills needed for automated testing look very different at Google scale than at a startup. The fundamentals, however, are universal. For automated testing, it is important to understand the different kinds of automated tests, the value each brings, their implementation complexity, and their associated costs. Most teams start with unit tests. However, if you have a large existing codebase, unit tests alone are not going to provide enough value; you are going to need integration tests that test user behaviours (a sketch contrasting the two kinds follows this list).
  - If you have the required knowledge and skills on the team, you can skip this. However, knowing where you stand in terms of technical knowledge is important for planning the strategy and its execution.
  - If you haven't driven a transformation like this before and lack the required knowledge and skills, start by learning. Follow SMEs, read the literature, and talk to experts who have done this before. Don't underestimate how little you may know about a domain you are venturing into. Most teams that try to adopt automated testing as a practice struggle exactly here - not knowing which tests to write to provide value in the team's current context.
- Create focus - solving any tough problem in business requires focus. Most teams manage to create focus on innovation when solving a product problem (though not all do), but they struggle to create focus for driving transformations. Create the focus to learn, experiment and learn by doing, analyse your current context, and then build a plan to execute. In the case of automated testing, first make it a clear goal (OKR, objective, deliverable, SMART goal - whatever language works for you). Then set aside a person or a team for it. There are two ways to go about this:
  - Dedicate a team that does just this (as it would work on a new problem or product) and drives the change centrally. This works well when the team also needs to acquire the knowledge to drive the change, learn the skills, clearly define the objective, and execute to the first few milestones themselves before scaling to other parts of the organisation. It also works well when a certain amount of up-front work is needed to lay the foundation and build momentum. I think this structure works best for most teams. For automated testing, it looks like learning how to write automated tests, writing the first set of tests to set examples, building infrastructure like CI pipelines and test data infrastructure, and achieving "meaningful test coverage" to confidently release on-demand.
  - Set up a loose team (or a working group) that meets at a certain cadence (say, weekly or bi-weekly) and supports and drives the change across the org. This works well when the org already has the knowledge and needs to drive the change incrementally while still delivering results in a finite time period. For automated testing, this might look like splitting the overall project across teams: one team solves the test data infrastructure, another documents the behaviours to test, another learns how to write tests in the existing tech stack, another teaches engineers how to write tests by pairing with them, and so on.
- Provide adequate resources to meet timelines - initiatives and projects with one or two people usually suffer. Two people are not a team, but sometimes that's all you have. The initiative may require learning, architecture work, technical execution, evangelism, coaching, support, documentation, continuous assessment of progress, and more. An org-wide behavioural change is already an uphill battle. Be aware that if enough people, and the right people (seniors, but also people with the right motivation), are not tasked to work on the initiative, it might be doomed to fail. So, as a leader, it is also important to motivate the team working on the initiative. Automated testing is exactly such a practice: one for everyone on the team. A change like that requires the leaders of the org to become experts in the practice themselves so that they can coach others, motivate them and hold them accountable. Beyond building that thought leadership, there is also pure execution work, like writing a ton of automated test cases. Having some juniors on the initiative to help execute is necessary to move fast in the direction set by its leaders.
- Make the transformation goals finite and time-limited - vague goals are the most likely to fail. Anything like "we will increase test coverage to 90%" is a bad goal. 90% coverage of what? In what time frame? Make the goals clear, finite and time-limited. "Write automated test cases to cover 60% of user behaviours, with 100% coverage of critical flows, in 3 months, so that we can confidently release on-demand at any time of the day" is a much clearer goal. It drives execution towards writing test cases for the features and user flows that matter to the business instead of arbitrary ones. The next goal could be to "add automated tests for every bugfix and hotfix without fail as a practice over the next 3 months, increasing test coverage from X% to Y%". The one after that could be "every feature must be released with automated tests, increasing test coverage from Y% to Z%".
- Make meaningful progress to prove value - transformation initiatives are about improving how work happens. Often, teams get stuck in a loop of incremental improvement that does not deliver value in a reasonable time period or, worse, makes it impossible to catch up with the speed of everything else happening. In the case of automated testing, I have seen teams commit to adding a "few tests every sprint". If those few tests are negligible compared to all the other code changes landing, it is mathematically impossible to catch up, and in a legacy codebase the debt is already daunting. To pay down that debt of automated tests, momentum needs to be built: everyone developing software has to write automated tests. For everyone to commit to learning and writing tests, they first have to truly believe in doing it, because change is hard and motivation is probably the biggest factor in turning it into reality. One of the best ways to get everyone to believe in being part of the change is to show them its value. And to show them value, work towards a "finite and time-limited goal" of writing enough automated tests to provide value. I have already defined value earlier in this article; to summarise, enough automated tests should give the entire team the confidence to ship changes on-demand multiple times a day without breaking the core customer experience, and should provide feedback quickly enough to course-correct development in its earlier stages. In other words, work towards "meaningful test coverage": enough test cases covering core business and user flows to enable the entire team to release on-demand multiple times a day.
- The right way is the only way to work; stick by it and make it easy to follow - if you expect a change in the organization, you have to make sure everyone follows it, even if that means checking manually. If you want your team to run tests before shipping changes to production, make sure they do, even if you have to check with your teams by hand. Too much work? Of course it is. So, automate it. Make sure tests run automatically on every change before it is released to production, and make sure there are no backdoors (like SSHing into servers or skipping the release pipeline) for releasing changes, and now you don't have to check manually. Ensure that the right way (running tests and deploying via a pipeline) is the only way to release changes. Then make the right way easy to follow: run tests automatically on every push to the remote repository so developers don't even have to think about it. Test runs should be fast; otherwise, nobody will want to wait for the suite and people will look for ways to skip the pipeline. In my opinion, 10 minutes should be the upper limit for a test run, and ideally it should be much faster (a small sketch of treating that limit as an enforced budget follows this list). When the pipeline or the test automation setup breaks, stop all other work and fix it; otherwise, your developers will find another way to release changes, and if it can be done once, it can be done again. High-performing engineering teams treat pipeline issues the same way they treat production incidents.
- Evangelise, coach and support - a central team driving the change and doing a lot of the work hands-on is a great way to build momentum and make meaningful progress to prove value, but what's next? As I said, in the case of automated testing, every engineer has to write tests to keep up with the pace of software changes, and that requires bringing the entire team along. People might need training, coaching and continuous support, especially when they have to learn alongside regular feature delivery work; doing both at once is not easy. So empathise with their situation and extend help to make the transition easy. Some strategies that can scale execution: go team-by-team, allowing each team to go deep on the topic, or start with all teams together (more or less) on relatively easier tasks and let them build expertise over a finite time. Some tactics for extending help: organise a boot camp to bring teams up to speed, hold office hours to resolve queries and doubts, prefer synchronous discussions over asynchronous back-and-forth, drive discussions that surface the thought process rather than (say) just doing PR reviews, pair program on a few tasks, and dedicate bandwidth of the central initiative team to provide proactive and reactive support to teams adopting the practice.
- Build systems for continuous progress - while supporting teams is important, you have to build systems that make progress almost guaranteed. Making progress in transformation initiatives is hard, and there will be opposing forces; in the case of automated testing, the opposing force is delivering new features. So, what kind of systems ensure continuous progress? As leaders, hold all teams accountable for the change and their related tasks (like writing tests every sprint to increase coverage). Remember I said that change starts from the top so that leadership can be in the weeds and see changes through? This is where the organization's leaders can hold their team leads and managers accountable for meeting their teams' objectives for automated testing. Leaders can monitor weekly how many tests each team is adding and whether the defined sprint objectives for improving automated testing as a practice are being met. Track the progress of test coverage against a finite, well-defined set of tasks that need to be completed. Weekly monitoring is a good starting point but can be slow (a long feedback loop), so consider automating whatever you can to simplify monitoring over shorter periods of time. For example, one approach that has worked really well for me in the past is enforcing that new tests are added (or existing tests are modified) for every bugfix, detected by a consistent branch naming strategy for bugfixes, like bugfix/name-of-the-branch. You can practically enforce this via automation in the CI pipeline so that a pull request cannot be merged if no test case is added or modified (a sketch of such a check also follows this list). Tactics like these scale accountability for the transformation into day-to-day work without seniors keeping an eye out and micromanaging. Read more about the philosophy of using DevOps for scaling engineering management.
- Monitor progress - if you can't measure it, you can't improve it. Set up metrics to track outputs and outcomes. Tracking does not have to be fully automated; full automation (AKA perfection) may get in the way of making progress. Just start with weekly tracking; that might be enough to begin with. Then automate depending on which feedback loops need to be shorter. Some things worth tracking periodically: the number of automated tests, coverage of business-critical flows in the product, regressions caught before escaping to production, on-demand releases in a day, the number of reported bugs in a week, the number of hotfixes in a week, and how many engineers can confidently write automated tests. Then work towards improving these metrics. When you see success stories (my favourite is when potential regressions are caught before code changes are released), celebrate them publicly and widely to build confidence and momentum.
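As promised above, here is a small, self-contained sketch contrasting a unit test with a behaviour-level (integration-style) test. It assumes pytest, and the tiny checkout functions are hypothetical stand-ins for your real codebase.

```python
# A hypothetical slice of a shop codebase, inlined so the example runs.
def apply_member_discount(price: float) -> float:
    return round(price * 0.9, 2)

def checkout(items: list[float], is_member: bool) -> dict:
    """Toy end-to-end flow: total the cart, apply any member discount."""
    total = sum(items)
    if is_member:
        total = apply_member_discount(total)
    return {"status": "ok", "total": total}

# Unit test: one function in isolation. Fast and cheap, but it says
# nothing about whether the overall user flow works.
def test_member_discount_unit():
    assert apply_member_discount(100.0) == 90.0

# Behaviour-level test: "a member checks out and pays the discounted
# total". In a large existing codebase, tests at this level are what
# build release confidence.
def test_member_checkout_behaviour():
    result = checkout([60.0, 40.0], is_member=True)
    assert result["status"] == "ok"
    assert result["total"] == 90.0
```

In a real system, the behaviour-level test would drive the application through its public interface (an HTTP endpoint, a CLI, a UI) rather than a direct function call, but the trade-off is the same: slower tests, much stronger release signal.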
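Next, the test-run budget mentioned above. A minimal sketch of a pipeline step that hard-stops the suite at the 10-minute limit, so slowness surfaces as a red build instead of quietly eroding the feedback loop. It assumes pytest; the budget value is just the upper limit argued for earlier.

```python
#!/usr/bin/env python3
# CI step: run the tests, but kill the run if it exceeds the budget.
# Assumes pytest; swap in your own test command as needed.
import subprocess
import sys

BUDGET_SECONDS = 600  # 10 minutes, the suggested upper limit

try:
    result = subprocess.run(["pytest", "--quiet"], timeout=BUDGET_SECONDS)
except subprocess.TimeoutExpired:
    print(f"Test suite exceeded the {BUDGET_SECONDS}s budget and was stopped.")
    print("Treat this like a pipeline incident: make the suite faster.")
    sys.exit(1)

sys.exit(result.returncode)
```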
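And finally, a sketch of the bugfix-branch check, written as a script a CI job could run before allowing a merge. The CI_BRANCH and CI_BASE_REF environment variable names are assumptions (every CI system exposes equivalents under its own names), and the test-file convention used below is one you would adapt to your repository.

```python
#!/usr/bin/env python3
# CI check: on bugfix/* branches, fail unless the change touches tests.
import os
import subprocess
import sys

branch = os.environ.get("CI_BRANCH", "")             # assumed variable name
base = os.environ.get("CI_BASE_REF", "origin/main")  # assumed variable name

if not branch.startswith("bugfix/"):
    sys.exit(0)  # the policy only applies to bugfix branches

# Files changed on this branch relative to the base branch.
changed = subprocess.run(
    ["git", "diff", "--name-only", f"{base}...HEAD"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

# Assumed convention: test files live under tests/ or are named test_*.py.
touched_tests = [
    path for path in changed
    if path.startswith("tests/") or os.path.basename(path).startswith("test_")
]

if not touched_tests:
    print(f"Branch '{branch}' is a bugfix but adds or modifies no tests.")
    print("Add a regression test for the bug before this PR can merge.")
    sys.exit(1)

print(f"OK: bugfix touches {len(touched_tests)} test file(s).")
```

Wire a script like this into the PR pipeline as a required status check, and the accountability becomes automatic rather than dependent on reviewers' vigilance.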
Too much to do?
Yes, it is. I will repeat myself: change is not easy. While I discussed automated testing in this article, this applies to any organizational change. This is my (perhaps incomplete) model of driving change. It feels like a lot because it is new. You already run your teams with some said or unsaid rules, but those don't feel like a lot because you are comfortable with them. The idea is to do this enough that it becomes comfortable too and stops feeling like too much.
Getting a team to work in sync requires commonly agreed-upon principles and core values, and sticking by those values. It's not easy, but I don't think there is a better way either. And it feels better the more you do it.
What's next?
I am interested in discovering and building generic models for transformation that work well and scale for growing startups. I have been meaning to write about transformations for a while. I might not have all the answers, but I can share the experiences that have worked for me in the past. I have more ideas along similar lines that I want to explore, and I want to write about them to test my own models. If there is a topic you would like me to cover, please feel free to reach out to me on Twitter.