The ability to route the same data to multiple destinations is important, but only if you can transform it into the format it needs to arrive in, as Ian Tinney explains.

To get the most from your data, you’ll need to draw it from multiple sources and send it to a variety of destinations, often using different data formats. You may also need to alter the data along the way and, certainly, you will want to do all of this as quickly, simply and cost-effectively as possible.

Transforming your data is therefore a must, not only to make it intelligible but also to help you reduce the total cost of ownership (TCO), protect sensitive data, meet your compliance requirements, and streamline your data processing.

Lowering TCO

One of the challenges faced by the modern enterprise is maximising the value of its existing investment in legacy systems while moving to a cloud-based architecture. You’ll want to span both worlds by sending data from your on-premises solutions to cutting-edge cloud applications. This causes issues because legacy systems often use verbose formats, such as XML, making it difficult to transfer data to cloud-based systems, which typically prefer more efficient formats such as JSON, or metrics.

What’s needed is a way to transform the data in real time, in-flight, as it passes between the source and destination tools and systems. Using an event stream processor (ESP) such as Cribl LogStream, you can take the data generated by your legacy systems and convert it into a format or protocol the destination tool is capable of reading. This lets you take advantage of emerging cloud technologies while extending the life of your legacy systems, effectively reducing TCO.
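
To make the idea concrete, here is a minimal sketch in plain Python (not LogStream configuration) of the kind of conversion involved; the single-level XML layout and field names are invented for illustration:

    # Illustrative only: convert a simple, flat XML event into JSON
    # before forwarding it to a cloud destination.
    import json
    import xml.etree.ElementTree as ET

    def xml_event_to_json(xml_text: str) -> str:
        """Flatten a single-level XML event into a JSON string."""
        root = ET.fromstring(xml_text)
        record = {child.tag: child.text for child in root}
        return json.dumps(record)

    sample = "<event><host>legacy01</host><severity>warn</severity><msg>disk at 85%</msg></event>"
    print(xml_event_to_json(sample))
    # {"host": "legacy01", "severity": "warn", "msg": "disk at 85%"}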

Shaping data

There are also other ways to save costs by transforming or shaping your data. You can compress it, for example, to limit its volume. Let’s say you need to write the data in your analytics platform to object storage such as S3 or Azure Blob Storage. By compressing it in transit, you can store the data at a fraction of its original size, ensuring you only pay for the storage you need.
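
As a rough illustration of the saving, this generic Python snippet (not LogStream itself) gzips a batch of JSON events before they would be handed to an object-storage client; the event fields are invented:

    # Illustrative only: compress events in transit so the object written
    # to S3 or Blob storage is a fraction of its original size.
    import gzip
    import json

    events = [{"host": f"web{i:02d}", "status": 200, "path": "/index.html"} for i in range(1000)]
    raw = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    compressed = gzip.compress(raw)

    print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
    # The compressed payload is what you would pass to your object-storage
    # client, so you only pay to store the compressed bytes.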

In addition to transforming data for efficiency purposes, you may also need to do so to meet a business need or a compliance dictum.

Log data can be highly sensitive, so it will often need to be encrypted, with role-based access privileges applied to determine who can decrypt it. In some cases, data may even need to be redacted in real time. Compliance regulations, such as those relating to data privacy (GDPR, Schrems II, GoBD etc.) or financial transactions (PCI), often require personally identifiable information (PII) to be encrypted or obfuscated and to be sent in a certain format.
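
The sort of masking involved can be sketched in ordinary Python; the patterns and sample event below are purely illustrative, not LogStream’s built-in functions:

    # Illustrative only: redact card numbers and hash email addresses in-flight.
    import hashlib
    import re

    CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def redact(event: str) -> str:
        event = CARD.sub("****-****-****-****", event)
        # Hash rather than drop the email so it can still be used as a join key.
        return EMAIL.sub(lambda m: hashlib.sha256(m.group().encode()).hexdigest()[:12], event)

    print(redact("order=991 card=4111 1111 1111 1111 email=jane.doe@example.com"))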

We recently worked with a large retailer that needed customer data to be dynamically recognised and encrypted. Using the Cribl LogStream solution, we ensured the data was suitably shaped before being sent to the retailer’s data analysis platform, where it could be decrypted by those with the appropriate access privileges.

Simplifying data

It’s also possible to shape data to make it easier for the data analysis platform to assimilate, thereby reducing complexity and saving processing time. For example, Amazon Kinesis Data Firehose can send data to Splunk and, in the event of a failure, Firehose will send the events to a backup S3 bucket.

When this happens, the original event data is placed in a base64-encoded field, a process that Splunk then needs to reverse. That’s horribly complicated. Why not just make all the data accessible to Splunk? So that’s exactly what we did, using the Cribl LogStream solution to reformat the failed Firehose data back to its original format, in-flight, before it reached Splunk.
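
In outline, the reversal looks something like the following Python sketch; the failure-record field names are illustrative rather than a definitive rendering of the Firehose backup format:

    # Illustrative only: unwrap the base64-encoded payload from a
    # Firehose-style failure record and recover the original event.
    import base64
    import json

    failed_record = {
        "attemptsMade": 4,
        "errorCode": "Splunk.ConnectionTimeout",
        "rawData": base64.b64encode(b'{"host":"web01","status":500}').decode(),
    }

    original_event = base64.b64decode(failed_record["rawData"]).decode("utf-8")
    print(json.loads(original_event))  # {'host': 'web01', 'status': 500}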

You can also simplify events that have a deeply nested JSON structure, or split an XML, JSON or other multi-line event into separate events.
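
A simple illustration of the flattening case, again in generic Python rather than LogStream configuration, with an invented event:

    # Illustrative only: flatten a deeply nested JSON event into dotted keys
    # so the analysis platform can index it as simple fields.
    def flatten(obj, prefix=""):
        flat = {}
        for key, value in obj.items():
            name = f"{prefix}{key}"
            if isinstance(value, dict):
                flat.update(flatten(value, f"{name}."))
            else:
                flat[name] = value
        return flat

    nested = {"http": {"request": {"method": "GET", "path": "/"}, "status": 200}}
    print(flatten(nested))
    # {'http.request.method': 'GET', 'http.request.path': '/', 'http.status': 200}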

Similarly, it’s also possible to add metadata to your log records by enriching them with third-party data, or to transform data from your existing sources into multiple schemas.

You can add context to your data with look-up values such as IP location, CMDB or threat-feed information, all achieved in-flight between the sources and destinations of your data.
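
As an illustration of this kind of enrichment, here is a generic Python sketch using a hypothetical geo-IP lookup table; the addresses and fields are invented:

    # Illustrative only: join each event against a lookup table keyed on source IP.
    GEO_LOOKUP = {
        "203.0.113.7": {"country": "DE", "city": "Berlin"},
        "198.51.100.23": {"country": "US", "city": "Austin"},
    }

    def enrich(event: dict) -> dict:
        extra = GEO_LOOKUP.get(event.get("src_ip"), {})
        return {**event, **extra}

    print(enrich({"src_ip": "203.0.113.7", "action": "login"}))
    # {'src_ip': '203.0.113.7', 'action': 'login', 'country': 'DE', 'city': 'Berlin'}

In a stream processor, of course, this sort of join happens as the events pass through the pipeline, so the enriched records arrive at the destination ready to analyse.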