
Monitoring Is Testing
Monitoring your services is an overhead; there are no two ways about it. It takes effort, insight, and resources. Consider a simple analogy - having a phone call. There are three levels of effort:
1. Having the phone call
2. Having the call and also transcribing what is said
3. Having the call, transcribing it, and performing real-time word usage analysis
Obviously, that third choice is a lot more effort - it needs more equipment, voice-to-text software, and analysis software. You also need to tell it what you care about, which means you need to decide what you care about (and that last part can be the single hardest piece).
But there is a simple reality in the provision of software services: if you are not monitoring something, then it’s probably broken some of the time and you don’t know it.
You may have heard an analogous statement about testing your software: if there’s not a passing test for something, that something is probably broken.
Just as tests check the functionality of software, monitoring checks the performance and reliability of software. Monitoring is testing - continuous, real-time testing of the live experience. Your users provide the activity, but you still need that little green tick.
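To make the analogy concrete, here is a minimal sketch of monitoring checks written like test assertions. The function names, the 1% error-rate limit, and the 500 ms latency limit are all illustrative assumptions, not recommendations; a real service would pick thresholds from its own requirements.

```python
import math
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool   # the "little green tick"
    detail: str


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_error_rate(errors, requests, max_rate=0.01):
    """Pass while the share of failed requests stays at or under max_rate.

    Assumption: a window with no requests counts as passing.
    """
    rate = errors / requests if requests else 0.0
    return CheckResult("error_rate", rate <= max_rate,
                       f"{rate:.2%} of {requests} requests failed")


def check_p95_latency(samples_ms, max_ms=500):
    """Pass while the 95th-percentile response time is within max_ms."""
    if not samples_ms:
        # Assumption: no traffic in the window is not treated as a failure.
        return CheckResult("p95_latency", True, "no samples in window")
    p95 = percentile(samples_ms, 95)
    return CheckResult("p95_latency", p95 <= max_ms, f"p95 = {p95} ms")
```

Run on a schedule against recent live data, checks like these play the role a test suite plays at build time: each one either passes or tells you what went wrong.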
Just as you must write your software to be testable - limiting side effects in functions, thinking about functionality in certain ways, breaking it up in certain ways - software must also be written to be monitorable.
This means identifying the key paths that represent user-observable behaviour - page load time, service response time, error presentation - and then building them so that they are easy to count and measure.
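As a rough sketch of what "easy to count and measure" can look like, the following wraps one key path in a counter and a timer. The `Metrics` class is a hypothetical in-memory stand-in for whatever metrics client you actually use, and `render_page` is an invented example of a user-observable action.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Metrics:
    """Tiny in-memory metrics store (hypothetical stand-in for a real client)."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name, amount=1):
        """Count one occurrence of a named event."""
        self.counts[name] += amount

    @contextmanager
    def timed(self, name):
        """Record how long the wrapped block took, in seconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the block raises, so slow failures are visible.
            self.timings[name].append(time.perf_counter() - start)


metrics = Metrics()


def render_page(page_type):
    """Hypothetical key path: one user-visible page load."""
    with metrics.timed(f"page_load.{page_type}"):
        metrics.incr(f"page_view.{page_type}")
        return f"<html>{page_type}</html>"
```

The point of the design is that the measurement lives at the boundary of the user-visible action, named after that action, rather than being scattered across internal function calls.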
It is not enough to just slap some instrumentation library into your build and let it spit out every count and timing of every function call. That will get you a lot of data but you will need expert attention, and a lot of it, to turn that into usable insight. It’s way too easy to have a huge amount of this kind of auto-generated data without being able to answer the simple question - how is your software doing?
Instrumentation libraries are useful in development, so they are what developers find themselves drawn to. But services are run for a purpose, and you must be able to easily observe that purpose being fulfilled. How many pages, of which type? How many purchases, for how much? How responsive is the payment gateway? How many payment errors? Upload speeds. Password changes. Login failures. User signups. Slow searches. 404s. WebSocket connects…
Not to mention any number of metrics about server and network behaviour that your system admins care about.
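A small sketch of purpose-level reporting, assuming the service emits one simple event per user-visible outcome. The `Event` shape and the event names are hypothetical; the aim is only to show that questions like "how many purchases, for how much?" become trivial once the events exist.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str            # e.g. "purchase", "payment_error", "login_failure"
    value: float = 0.0   # monetary amount for purchases, unused otherwise


def summarize(events):
    """Answer the business questions directly from the emitted events."""
    counts = Counter(e.kind for e in events)
    purchase_total = sum(e.value for e in events if e.kind == "purchase")
    return {"counts": dict(counts), "purchase_total": purchase_total}
```

For example, `summarize([Event("purchase", 20.0), Event("payment_error")])` reports one purchase, one payment error, and a purchase total of 20.0.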
So, when should you start considering what metrics to bake into your software’s self-reporting? At requirements time - right at the start. Monitoring output should be a functional requirement, added to whatever features you care about.
It’s tempting to let these things slip when the time crunch happens, but consider the most important thing about monitoring your services - you aren’t the only one doing it!
Your customers are also monitoring your service, by using it. If they find it in the red too often, they don’t throw an alert to your on-call teams… they just migrate to your competition.
If you need more guidance, why not contact us to see if we can help?