Slashing data analysis costs by reducing data

Written by Ian Tinney

July 30, 2021


Reducing your data flow can generate significant cost savings, as Ian Tinney explains.

Reducing data volumes can cut your data analysis costs to such a degree that your licence practically becomes free. Using the Cribl Logstream Event Stream Processor (ESP) to drop, sample or suppress events can see ingest volumes fall by between 25% and 50%, which translates into massive cost savings. It’s so effective that one of our customers saw a 93% reduction in their licensing costs and an immediate return on investment.

Many data analysis tools today charge by the volume of data you ingest, the number of events processed (events per second), or the infrastructure and workload used for processing. At the very least, someone will be charging you for the expensive, fast storage these tools need. And the more you use, the more licences you’ll require and the higher your storage costs will be.

It’s the answers you derive from that data that are important. The more data you send to your analysis platform for indexing, the higher the costs and the longer it will take to arrive at those answers. If you can reduce and optimise that data by pre-processing, you can slash analysis costs and get the most out of your licence, protecting your investment.

Reduce to increase gains

Reducing data is a win-win. Streamlining the process results in speedier processing, so you spend less time crunching through the data, and the data flow itself is pared down. This not only makes it easier to manage but also means you can increase your throughput via the analytics platform, effectively optimising your use of the data licence you hold to get more bang for your buck.

Let’s say you hold a licence for 1TB but find yourself bumping up against that threshold. By filtering, parsing and reformatting the data before it reaches the platform, you can stay under the threshold. The data itself will also be leaner and richer in signal, making it far easier to analyse.
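
The arithmetic behind this is simple. As a rough illustration (the daily volume and reduction rate below are invented for the example, not taken from a real deployment):

```python
# Hypothetical figures: a 1 TB/day licence and a raw feed that exceeds it.
LICENCE_TB_PER_DAY = 1.0
raw_ingest_tb = 1.4          # daily volume before any reduction
reduction_rate = 0.35        # 35% removed by filtering, parsing and reformatting

reduced_ingest_tb = raw_ingest_tb * (1 - reduction_rate)
print(f"Reduced ingest: {reduced_ingest_tb:.2f} TB/day")
print("Within licence:", reduced_ingest_tb <= LICENCE_TB_PER_DAY)
```

Even a modest 35% reduction turns a feed that breaches the licence into one that fits comfortably inside it.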

Speedier processing also means you stand to benefit from quicker time to insight, particularly as you’ve identified your parameters further upstream, allowing you to make decisions more rapidly. And filtering the data gives you more control, so you’re not battling to keep on top of compliance requirements or resorting to custom workarounds to solve problems.

The concern many have is that, by slimming down their data, they could find it difficult to get it back to amplify and scrutinise at a later stage. But crucially, you don’t need to discard the data. You can subject it to various reduction processes for analysis purposes while keeping a copy of the original archived on cheaper object storage.
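
That archive-and-reduce pattern can be sketched in a few lines of Python. This is an illustration only, not Cribl Logstream’s API: the `route` function, the in-memory `archive` list standing in for object storage, and the field names are all invented for the example.

```python
import json

def route(events, archive, reduce_fn):
    """Archive every raw event in full, then yield only the reduced copy onward."""
    for event in events:
        archive.append(json.dumps(event))   # full-fidelity copy to cheap storage
        reduced = reduce_fn(event)
        if reduced is not None:             # None means the event was dropped
            yield reduced

# Example reduction: keep only the fields the analysis platform needs.
def keep_essentials(event):
    return {k: event[k] for k in ("ts", "status") if k in event}

events = [{"ts": 1, "status": "ok", "debug": "x" * 100},
          {"ts": 2, "status": "fail", "debug": "y" * 100}]
archive = []
reduced = list(route(events, archive, keep_essentials))
```

The analysis platform only ever sees the slimmed-down events, while the archive retains everything needed to rehydrate and re-examine the originals later.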

Six ways to save

There are six quick ways to cull data volumes that will instantly save you money:

1. Filtering events – Filter out whole events from specific sources or that match prescribed conditions, and drop any events that contain no useful data.
2. Filtering fields – Drop fields based on their name or value (i.e. value=“null”, “n/a” or “-”), unwanted headers or footers, or fields that are simply empty.
3. Deduplication – Identify duplicate feeds or events, summarise the quantity of similar events and only supply the information once to your analysis platform. This can reduce data ingest by a staggering 93%.
4. Throttling – Sometimes a spike of data needs to be tamed. By throttling the data source, you can protect the destination from becoming overloaded, avoiding bill shock in the analysis platform.
5. Convert to metrics – If the events you are collecting contain mostly numeric values, you can convert them to metrics in flight. For example, if each event carries a series of header fields and one or more counters, but only the counters are needed to identify trends, those events can be turned into much smaller metrics in the pipeline. These metrics are far less verbose than raw events.
6. Sampling – If you only need a sense of the current situation (i.e. for statistical analysis), you can take a representative sample of the data and send only aggregated statistics rather than ingesting everything.
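
Several of the techniques above can be sketched as a simple pipeline. This is illustrative Python, not Cribl Logstream configuration; the field names, actions and sampling rate are invented for the example.

```python
import random
from collections import Counter

def drop_uninformative(events):
    """Techniques 1 & 2: drop contentless events and null-valued fields."""
    for e in events:
        if e.get("action") == "start":      # e.g. firewall 'start' logs with no info
            continue
        yield {k: v for k, v in e.items() if v not in (None, "", "null", "n/a", "-")}

def suppress_duplicates(events, key=("src", "action")):
    """Technique 3: forward each distinct event once, annotated with a repeat count."""
    events = list(events)
    counts = Counter(tuple(e.get(k) for k in key) for e in events)
    seen = set()
    for e in events:
        sig = tuple(e.get(k) for k in key)
        if sig not in seen:
            seen.add(sig)
            yield {**e, "repeat_count": counts[sig]}

def sample(events, rate=0.1, rng=None):
    """Technique 6: keep notable events, plus a representative fraction of the rest."""
    rng = rng or random.Random(0)           # seeded for reproducibility
    for e in events:
        if e.get("action") == "deny" or rng.random() < rate:
            yield e

raw = [
    {"src": "10.0.0.1", "action": "start", "bytes": None},
    {"src": "10.0.0.1", "action": "end", "bytes": 120, "note": "n/a"},
    {"src": "10.0.0.1", "action": "end", "bytes": 120, "note": "n/a"},
    {"src": "10.0.0.2", "action": "deny", "bytes": 0},
]
reduced = list(sample(suppress_duplicates(drop_uninformative(raw))))
```

Here four raw events shrink to one: the ‘start’ event is dropped, the duplicate ‘end’ events collapse into a single annotated event that is then sampled out, and the notable ‘deny’ event passes through intact.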

These processes can be set up in minutes using the Cribl Logstream solution, which routes and parses the data in flight, ensuring the original is sent to storage and the stripped-down data to the analysis platform.

The results can be instantaneous. On a recent project, the business wanted to optimise its use of Splunk and process more data, but without increasing licensing costs. Having experienced rapid growth, the business was also retaining indexed logs on the platform even though it was only using the last 24 hours of data. By dropping the ‘start’ firewall log events, which contained no information, and retaining the ‘end’ log events, sampling events using specific filters, and trimming Windows log event descriptions, the business reduced its firewall logs by 62.5%. Overall, the measures put in place saved the business £55,000 in Splunk licensing costs in just an hour.

Using a single tool to reduce your data also confers benefits because you no longer need to use and maintain multiple log forwarders. Consolidating these reduction methods gives you one view of the dataflow, allowing you to ingest, pre-process and forward rich data to the analysis platform of your choice.

To discover how others have saved money by reducing their data, see our Splunk and Cribl Logstream datasheet. Or, to find out the kind of cost savings you can expect to make on your own processes, contact us for a one-to-one consultation.
