
What are we talking about?

A CTO might be tempted to cut costs, save resources, and increase speed of delivery by cutting the test environment. It’s yet another bureaucratic layer in the way of delivering software that will make the company even more profitable.

A CEO might put the same pressures on their CTO, adding that they are confident they hired the best of the best who use the latest AI and SDLC best practices. After all, they are paying these staff so much.

Why waste time?

Why waste money?

Why delay yet again a release that is already late?

Why indeed… Read on.

Why should your CEO care?

“That’s nice. What is a test environment for?” — Your CEO, maybe?

While these arguments may seem compelling in the short term, the long-term risks of not having a test environment often outweigh the benefits.

Data centre on fire

First, a single major production outage or security breach can cause reputational damage, financial loss, and customer churn. How much does it cost when your service is down for half a day, or for a few days? Does your SLA guarantee 99% uptime? Once is forgivable; twice is negligence, and customers will move away.
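To put a number on the SLA question, here is a minimal sketch of the arithmetic. The 30-day month and the function name `allowed_downtime_hours` are assumptions made for illustration; the 99% figure and the half-day outage come from the paragraph above.

```python
def allowed_downtime_hours(sla_percent: float, period_hours: float = 30 * 24) -> float:
    """Hours of downtime an uptime SLA permits over a period (default: a 30-day month)."""
    return period_hours * (1 - sla_percent / 100)


budget = allowed_downtime_hours(99.0)   # roughly 7.2 hours per 30-day month
outage = 12.0                           # the "half a day" outage from the text

print(f"Downtime budget at 99%: {budget:.1f} h")
print(f"Half-day outage breaches the SLA: {outage > budget}")
```

Under these assumptions, a single half-day outage uses up more than the whole monthly downtime budget.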

Second, who do you want to find bugs in your product? Your paying customers? Would they be thrilled by this? Are you thrilled when you have to report a bug to a vendor you use? Or would it be better if those issues were found before customers ever saw them? This holds even if the bug is as simple as a spelling mistake or a link leading to a 404 page. Some potential customers might use just that as an excuse to go to your competitors.

Finally, knowing that your latest features can be deployed on schedule gives the sales and marketing teams confidence that when they say something will be available, it will be. Predictability is a major selling point.

Investing in a test environment is a strategic move to ensure reliability, security, and customer satisfaction. Or, as we are keen on saying, become drama free.

What Makes a Good Test Environment?

As far back as 2013, DevOps research from Gartner identified live-like test environments as one of the main factors correlated with high software velocity (the other was feature switches). Gartner had not changed its mind by 2024, stating that well-defined test environments are a prerequisite to delivering high-quality software using agile and DevOps practices.

Many things have to go right to have a good test environment.

It needs environment parity with production: your test environment should be a clone of production. This means the same database, the same configuration, and the same pods/containers. The scale might be much smaller (after all, there are not going to be hundreds of thousands of people using it at the same time), but it should have the same components as production.
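One way to keep an eye on parity is to compare the two environments' settings while ignoring the keys that are allowed to differ because of scale. The sketch below is only an illustration: the function `parity_drift`, the `SCALE_KEYS` set, and all the configuration keys and values are hypothetical, not taken from any real deployment.

```python
# Keys that may legitimately differ between production and test (assumed for illustration).
SCALE_KEYS = {"replicas", "cpu_limit", "memory_limit"}


def parity_drift(prod: dict, test: dict, ignore: set = SCALE_KEYS) -> dict:
    """Return {key: (prod_value, test_value)} for every non-ignored key that differs."""
    keys = (prod.keys() | test.keys()) - ignore
    return {
        key: (prod.get(key), test.get(key))
        for key in sorted(keys)
        if prod.get(key) != test.get(key)
    }


# Hypothetical settings for the two environments.
prod = {"db_engine": "postgres-16", "image": "shop-api:2.4.1", "replicas": 40}
test = {"db_engine": "postgres-15", "image": "shop-api:2.4.1", "replicas": 2}

drift = parity_drift(prod, test)
print(drift)  # only db_engine is reported; replicas is ignored as a scale setting
```

An empty result means the two environments match on everything except scale; anything else is drift worth investigating before the next release.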

It has to be isolated and independent. It should be in a different namespace (do not share TLS certificates!), but run on the same bare metal. If something catastrophic happens there, it should not affect production or development or anything else. And something catastrophic will happen there! This is where you see what happens when it does, so that you can fix it before it hits production.

Because it will break in unexpected (and sometimes expected) ways, it should have automated provisioning and teardown. Infrastructure as code is a great concept that is perfect for this. This is especially true when you have to create the same environment on different providers.
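The provision-then-always-tear-down idea can be sketched with a context manager. This is a simplified illustration, not a real infrastructure-as-code tool: `ephemeral_environment`, `fake_provision`, and `fake_teardown` are hypothetical stand-ins for whatever provisioning tooling you actually use.

```python
from contextlib import contextmanager

events = []  # records what happened, in order


def fake_provision(name):
    """Stand-in for a real provisioning call (hypothetical)."""
    events.append(f"provision {name}")
    return name


def fake_teardown(handle):
    """Stand-in for a real teardown call (hypothetical)."""
    events.append(f"teardown {handle}")


@contextmanager
def ephemeral_environment(name, provision, teardown):
    """Create an environment and guarantee teardown, even if the tests blow up."""
    handle = provision(name)
    try:
        yield handle
    finally:
        teardown(handle)


try:
    with ephemeral_environment("test-42", fake_provision, fake_teardown) as env:
        events.append(f"run tests on {env}")
        raise RuntimeError("something catastrophic")
except RuntimeError:
    events.append("failure recorded")

print(events)
```

The point of the design is that teardown runs whether the tests pass or fail, so a broken run never leaves a half-alive environment behind.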

Realistic test data is essential. This is probably the hardest thing to get right. Testing with realistic, anonymised, or synthetic data that mimics production data ensures accurate validation of features, performance, and edge cases. Yet such data is hard to make. Thankfully, this is a place where generative AI can do wonders for you. This is the only place where hallucinations are a good thing.
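As a simpler complement to AI-generated data, production records can be pseudonymised before they are copied into the test environment. The sketch below uses a keyed hash from the Python standard library; the field names, the `SECRET_KEY` value, and the helper functions are assumptions made for illustration. Note that keyed hashing is pseudonymisation rather than full anonymisation, so treat this as a starting point only.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-and-keep-out-of-test"  # hypothetical key, never stored with the test data


def pseudonymise_email(email: str, key: bytes = SECRET_KEY) -> str:
    """Map an email to a stable fake address so joins across tables still line up."""
    normalised = email.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalised, hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.test"


def anonymise_record(record: dict) -> dict:
    """Return a copy with personal fields replaced and business fields untouched."""
    cleaned = dict(record)
    cleaned["email"] = pseudonymise_email(record["email"])
    cleaned["name"] = "Test User"
    return cleaned


# Hypothetical production row.
row = {"name": "Ada Lovelace", "email": "Ada@Example.com", "order_total": 129.5}
safe_row = anonymise_record(row)
print(safe_row)
```

Because the mapping is deterministic, the same customer gets the same fake address everywhere, which keeps relationships in the data realistic.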

You need even more than comprehensive monitoring and logging: there should be an excessive amount of data. Run all the code in debug mode (or at least the new features), with full audit logs. When things go wrong, you want as much information as you possibly can get. Of course, it means that you can drown in a sea of data, but we have AI for that…
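In Python, for instance, the "debug everywhere in test" rule can be as small as choosing the log level from the environment name. This is a minimal sketch: the environment names and the `configure_logging` helper are assumptions, and a real setup would also ship the logs somewhere searchable.

```python
import logging


def configure_logging(environment: str) -> int:
    """Use verbose DEBUG logging in the test environment, quieter WARNING elsewhere."""
    level = logging.DEBUG if environment == "test" else logging.WARNING
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return level


configure_logging("test")
log = logging.getLogger("checkout")  # hypothetical component name
log.debug("cart contents before pricing: %s", {"sku-1": 2})
```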

However, all these things are utterly irrelevant if you cannot get funding. This is where you should re-read the previous section: why should your CEO care that you have a test environment?

Continuous Delivery where Test is Production

In the last few years, a lot of companies have been going the continuous deployment (CD) route, where deployments to production happen many times per day. DORA 2024 has data that correlates CD with highly performant software [1]. For this to happen, one needs an automated, efficient, robust, and complete continuous integration and delivery (CI/CD) pipeline.

Netflix went even further and runs Chaos Monkey, which randomly terminates virtual machine instances and containers running in the production environment. Exposing engineers to failures more frequently incentivises them to build resilient services. Understand that such monkeys are the final step on a long road.

However, CI/CD is not suitable for every release. Enclave software has its own patching cycles determined by the client, not the vendor. And in some cases, even those releases get tested by the client before being accepted: anyone running safety-critical systems in hospitals, the military, or aerospace is unlikely to accept multiple releases per day.

Conclusion

If you are a CEO, we hope that you now have a better idea of why test environments are so important. And if you are a CTO, we hope you have the reasoning to get the budget to set one up. There are lots of details that we glossed over. But this is a start. Sometimes, you have to slow down to go faster.

Imagine, if you will, a day when all the drama is removed from your software production: no panic, no crisis, just smooth software releases that exceed your customers’ expectations. This is what we have done in the past and can do for you.

How about getting in touch to see how we can help you?


  1. Note that correlation is not causation and the data might be interpreted differently. For example, robust and performant software allows for safe continuous deployment.
