Observability is about far more than monitoring your network. Ian Tinney outlines why it is key to success in the cloud and why it matters to the CIO.
Observability is synonymous with visibility – giving you an end-to-end unified view of the stack and complete understanding of how your business is performing – and is fast becoming the key to success in the cloud.
As businesses have moved to multiple cloud environments and adopted distributed architectures, it has become harder and harder to monitor and maintain operational effectiveness. Microservices have made the situation even more complex, with developers working on abstracted cloud infrastructure and monitoring becoming siloed as a result. This fragmentation created gaps, and a way was needed to monitor workflows across the entire stack. Monitoring, and in particular Application Performance Monitoring (APM), made sense when we scaled horizontally, but with the adoption of microservices we now scale deep, and that calls for a different way of doing things. Observability is that new way.
Observability enables more dynamic monitoring of all events, making the business more responsive, but it is a continual exercise rather than an end in itself. It measures how well your legacy and cloud systems are functioning based on the data they output, but unlike monitoring it goes beyond checking events and detecting what is broken, focusing instead on what is happening and why.
How it works
Modern cloud environments are dynamic and constantly increasing in scale and complexity, making it much harder to predict problems in advance. Most problems are neither known nor monitored, so you can’t pre-configure a dashboard to look for them. Observability addresses this challenge of the ‘unknown unknowns’ by allowing you to continuously and automatically look for issues and to understand new problems as they arise. In effect, it allows you to answer the questions you didn’t know you needed to ask when you built the system!
Observability collects, explores, alerts on, and correlates all telemetry data types across application and infrastructure environments, looking for patterns, trends and behaviours. This means it can help cross-functional teams such as IT, DevOps and Site Reliability Engineers (SREs) work together to understand what’s slow, what’s broken, and what needs to be done to improve performance in distributed systems.
But as well as capturing when things go wrong, observability also captures when things go right, enabling you to innovate or improve the user experience (UX). Consequently, observability can really contribute to the effectiveness of the business, which is why it’s now also top of the CIO’s agenda. By collecting a lot more data on a continual basis, you can improve operational efficiency and gain a competitive edge.
What observability gives you
The value of observability lies in the insights organisations can gather through data analysis of the problems occurring within their systems and software. According to Splunk’s State of Observability 2021 report, organisations with established observability were 6.1 times more likely to have accelerated root cause identification and 4.5 times more likely to describe their digital transformation initiatives as successful than beginners, and they launched 60% more new products and services over the past year.
Overall, 48% of those with established observability said they were completely confident they could meet application availability and performance demands, compared with just 10% of those only starting to use observability.
Yet despite these gains, very few organisations have fully embraced observability. A recent survey of 700 CIOs found that just 11% have full observability, while 13% have observability of the UX of their apps and websites. This seems surprising given that observability confers real operational advantages by allowing the business to:
Troubleshoot and resolve issues faster
Ensure uptime and performance
Accelerate time to market
Gain greater operating efficiency and produce high-quality software at scale
Understand the real-time fluctuations of your digital business performance
Optimise investments
Build a culture of innovation
Observing all of your data means you can fix and improve both systems and people-driven functions such as operations, stock management or product pricing. It allows you to understand buyer behaviours, find ways to improve the bottom line, better differentiate your product offering, and gather insights to improve your marketing and sales campaigns.
Barriers to observability
The pandemic has fuelled the need for observability, creating greater complexity and a much wider IT estate across distributed architectures. A recent report shows that 68% of technologists believe they waste time trying to isolate where performance issues are happening, while 66% believe they lack the tools to determine how technology decisions are impacting the business.
It’s a problem made more acute by the way systems are set up. Each application has its own agent, which consumes resources, yet these agents often have overlapping functionality, so it makes sense to consolidate or reuse them. Systems are also built to send all the data they generate rather than sampling, suppressing or aggregating the feed to minimise data throughput, and if you need more granular data to follow up on an incident, reconfiguration can take hours.
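By way of illustration, here is a minimal, hypothetical Python sketch of what sampling and aggregating a feed can look like before events are forwarded downstream; the event fields, levels and sample rate are assumptions for the example, not any particular agent’s configuration.

```python
import random
from collections import defaultdict

SAMPLE_RATE = 0.1  # assumed: keep roughly 1 in 10 verbose debug events

def reduce_feed(events):
    """Sample noisy events and aggregate repetitive ones before forwarding."""
    forwarded = []
    error_counts = defaultdict(int)

    for event in events:
        level = event.get("level", "info")
        if level == "debug":
            # Suppress most debug events; a sample is enough for trend analysis.
            if random.random() < SAMPLE_RATE:
                forwarded.append(event)
        elif level == "error":
            # Aggregate identical errors into a count instead of sending each one.
            error_counts[event["message"]] += 1
        else:
            forwarded.append(event)

    # Emit one summary event per distinct error message.
    for message, count in error_counts.items():
        forwarded.append({"level": "error", "message": message, "count": count})
    return forwarded

if __name__ == "__main__":
    feed = (
        [{"level": "debug", "message": "cache miss"}] * 100
        + [{"level": "error", "message": "timeout calling payments"}] * 20
        + [{"level": "info", "message": "order placed"}]
    )
    reduced = reduce_feed(feed)
    print(f"{len(feed)} events in, {len(reduced)} events out")
```

The trade-off is that the reduced feed is cheaper to ship and store but coarser, which is exactly why reconfiguring for full granularity after an incident can be so slow.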
The current situation leaves IT, DevOps and SREs caught in a firefighting stance, so it will fall to senior business leaders to demonstrate the value and prove the business case for observability. The problem they face is that, at first sight, observability looks like a costly exercise: it continually generates exponential amounts of data that must be collected using specific tools, then analysed and stored. The business therefore needs to be able to control this data flow, extract value, and minimise analysis and storage costs.
How to achieve observability
The general consensus is that there are three pillars to achieving observability – event logs, metrics and traces – though there has since been some debate over whether this focuses too heavily on back-end applications and systems. Some now believe this telemetry data should be complemented with data derived from the user experience to give a more realistic view of performance.
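As a rough illustration of the three pillars, the sketch below builds one example of each telemetry type as plain Python dictionaries; the field names and values are illustrative assumptions rather than any vendor’s schema.

```python
import time
import uuid

now = time.time()
trace_id = uuid.uuid4().hex  # assumed correlation key shared by all three signals

# Event log: a discrete, timestamped record of something that happened.
log_event = {
    "timestamp": now,
    "level": "error",
    "service": "checkout",
    "trace_id": trace_id,
    "message": "payment gateway timed out after 3 retries",
}

# Metric: a numeric measurement aggregated over time, cheap to store and query.
metric = {
    "name": "checkout.request.duration_ms",
    "value": 1840.0,
    "timestamp": now,
    "tags": {"service": "checkout", "status": "error"},
}

# Trace span: one timed step in a request as it crosses service boundaries.
span = {
    "trace_id": trace_id,
    "span_id": uuid.uuid4().hex,
    "name": "POST /checkout",
    "start": now - 1.84,
    "end": now,
    "attributes": {"service": "checkout", "downstream": "payments"},
}

for signal in (log_event, metric, span):
    print(signal)
```

Because all three signals carry the same trace_id in this sketch, a spike in the metric can be tied back to the span that overran and the log line that explains why.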
Collecting all of these events means more data to process and store, which in turn means more expense, so we need to look for more efficient ways to store and use that data.
Observational insights are usually derived from your analytics platform (whether that is Splunk or another similar tool), and you can minimise the cost of this data processing by using an Event Stream Processor (ESP).
Cribl LogStream is an ESP that operates as an observability pipeline, unifying data processing across all types (metrics, logs and traces), collecting the data required, enriching it, eliminating noise and waste, and delivering that data to any tool in the organisation designed to work with observability data. This means you no longer need to determine what data to send from containers, virtual machines or infrastructure – it simply all gets sent through the pipeline, filtered and forwarded to the right destination.
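As a conceptual sketch only (not Cribl’s actual configuration or API), the hypothetical Python below shows the kind of work an observability pipeline stage performs: enriching each event, dropping noise, and routing what remains to the tool best placed to consume it. The destination names and event fields are assumptions for the example.

```python
# Hypothetical sketch of an observability pipeline stage: enrich, filter, route.
# Destination names and event fields are illustrative, not LogStream's real API.

DESTINATIONS = {
    "metrics-store": [],
    "log-analytics": [],
    "cold-archive": [],
}

def process(event):
    # Enrich: attach context every downstream tool will want.
    event.setdefault("env", "production")

    # Eliminate noise: drop health-check chatter outright.
    if event.get("path") == "/healthz":
        return

    # Route: send each event type to the tool designed to consume it.
    if event.get("type") == "metric":
        DESTINATIONS["metrics-store"].append(event)
    elif event.get("level") in ("warn", "error"):
        DESTINATIONS["log-analytics"].append(event)
    else:
        DESTINATIONS["cold-archive"].append(event)  # cheap storage for the rest

if __name__ == "__main__":
    for e in (
        {"type": "log", "level": "error", "message": "db connection lost"},
        {"type": "metric", "name": "cpu.util", "value": 0.71},
        {"type": "log", "level": "info", "path": "/healthz"},
        {"type": "log", "level": "info", "message": "user signed in"},
    ):
        process(e)
    for name, events in DESTINATIONS.items():
        print(name, len(events))
```

Running the example shows the health-check noise being dropped while the metric and the error log each land in their respective destinations and everything else goes to cheap storage.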
LogStream unlocks the value of machine data by giving you the power to make choices that best serve your business without the negative trade-offs around licence and storage costs and platform lock-in. It can generate massive savings in overall observability spend, often reducing data volumes by 30% to 50%. It can reuse existing agents to feed other systems, greatly reducing system overheads and saving on infrastructure costs. And it avoids vendor lock-in, allowing you to easily try out new tools without a huge investment in deploying new agents.
To find out how you can use Cribl LogStream as an observability pipeline and to explore the potential cost savings, contact us today for a demo or a free trial.