When to use it

Once high-volume log sources start to increase costs dramatically, we recommend moving these logs to a data lake in S3 and indexing them with Scanner for fast search.

Problem - modern log scale can become unsustainable

Scanner was designed to solve the problem of modern log scale. In our opinion, traditional log management tools and SIEMs like Splunk Enterprise Security become far too expensive once logs reach high volume.

If you are ingesting 100GB of logs per day into a log management tool like Splunk, you might be spending on the order of $100k per year. That is pricey, but manageable.

However, as your company grows, it's easy to reach the point where you are generating 1TB of logs per day. In most SIEM tools, this can cost $1M per year, which is extremely painful.

At this scale, teams often split their logs into two categories: low-volume log sources and high-volume log sources.

Half of the total ingestion volume often comes from only 3-5 high-volume log sources: web application firewall logs, VPC flow logs, CloudTrail logs, Cloudflare DNS and HTTP logs, and so on.

These high-volume logs tend to be less critical, but they are still incredibly helpful for investigations and for detecting threats.

Solution - Move high-volume logs to a data lake and index it with Scanner

Here's what we propose. Teams can continue to ingest their low-volume log sources into Splunk, but they should move their high-volume logs to a data lake in S3.

They can then use Scanner to index the data lake for fast search from Scanner's UI. Additionally, thanks to Scanner's custom search command for Splunk, the data lake is also searchable directly from Splunk.
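
As a concrete illustration, a log shipper that writes high-volume events to the S3 data lake might look like the sketch below. The bucket name, key layout, and gzipped newline-delimited JSON format are assumptions made for this example, not requirements; check Scanner's documentation for the layouts and formats it indexes.

```python
import gzip
import json
from datetime import datetime, timezone

import boto3  # pip install boto3

# Assumption: the bucket name and date-partitioned key layout are
# placeholders for this example, not requirements imposed by Scanner.
BUCKET = "example-security-data-lake"
s3 = boto3.client("s3")

def ship_batch(source: str, events: list[dict]) -> None:
    """Write a batch of log events to S3 as gzipped newline-delimited JSON."""
    now = datetime.now(timezone.utc)
    key = f"{source}/{now:%Y/%m/%d}/{now:%H%M%S}-{now.microsecond:06d}.json.gz"
    body = gzip.compress("\n".join(json.dumps(e) for e in events).encode("utf-8"))
    s3.put_object(Bucket=BUCKET, Key=key, Body=body)

# Example: two VPC-flow-log-like records.
ship_batch("vpc-flow-logs", [
    {"srcaddr": "10.0.0.12", "dstaddr": "10.0.0.45", "action": "ACCEPT"},
    {"srcaddr": "10.0.0.99", "dstaddr": "10.0.0.45", "action": "REJECT"},
])
```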

Cost improvement

Before Scanner

Here is how the costs change. Before Scanner, ingesting 1TB of logs per day into the SIEM costs around $1M/year: low-volume logs account for $500k, and high-volume logs account for the other $500k.

| | # of log sources | Ingest volume | Ingest cost |
| --- | --- | --- | --- |
| Low-volume log sources in Splunk | 25-100 | 500GB/day | $500k/year |
| High-volume log sources in Splunk | 3-5 | 500GB/day | $500k/year |
| Total | | 1TB/day | $1M/year |

After Scanner

After moving the high-volume logs to an S3 data lake and indexing them with Scanner, their cost drops by 80% to $100k per year, reducing the overall cost from $1M to $600k.

| | # of log sources | Ingest volume | Ingest cost |
| --- | --- | --- | --- |
| Low-volume log sources in Splunk | 25-100 | 500GB/day | $500k/year |
| High-volume log sources in data lake indexed by Scanner | 3-5 | 500GB/day | $100k/year |
| Total | | 1TB/day | $600k/year |

By moving high-volume logs to a data lake and indexing them with Scanner, overall costs drop by 40%, which can free up meaningful budget for other projects.
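
The arithmetic behind these figures is simple; the short sketch below just reproduces the totals from the tables above.

```python
# Yearly figures from the cost tables above.
low_volume_cost = 500_000      # low-volume sources, staying in Splunk
high_volume_before = 500_000   # high-volume sources in Splunk
high_volume_after = 100_000    # high-volume sources in S3 + Scanner (80% drop)

before = low_volume_cost + high_volume_before  # $1,000,000/year
after = low_volume_cost + high_volume_after    # $600,000/year
savings = 1 - after / before

print(f"Before: ${before:,}/yr  After: ${after:,}/yr  Savings: {savings:.0%}")
# -> Before: $1,000,000/yr  After: $600,000/yr  Savings: 40%
```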

What are the tradeoffs?

Moving the high-volume log sources out of a tool like Splunk and into an S3 data lake indexed by Scanner can deliver strong cost savings, and search in Scanner stays fast, but there are some practical tradeoffs to consider.

  • You can run queries supported by Scanner's query language, which may not be exactly the same as the queries supported by your prior log tool. See Scanner's query syntax documentation for more information about the kinds of queries it supports.

  • For Splunk users, you can query your data lake directly from Splunk via the custom search command. It is a generating command, so it can be used for ad-hoc search, dashboards, saved searches, and correlation searches. However, since only streaming commands can be fed into accelerated data models, you cannot use accelerated data models with your data lake logs. See the documentation on the custom Splunk search command for more information; a rough sketch of invoking such a command from Python follows this list.
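
For orientation, the sketch below runs a Splunk search containing a generating custom command from Python via the Splunk SDK. The command name `scanner` and its arguments are hypothetical placeholders, as are the connection details; see Scanner's Splunk app documentation for the actual command syntax.

```python
import splunklib.client as client      # pip install splunk-sdk
import splunklib.results as results

# Placeholder connection details for your Splunk instance.
service = client.connect(
    host="splunk.example.com",
    port=8089,
    username="admin",
    password="changeme",
)

# Hypothetical invocation of Scanner's generating custom search command.
# The command name and arguments here are illustrative, not actual syntax.
query = '| scanner query="eventName: DeleteTrail" earliest=-24h | head 100'

job = service.jobs.create(query, exec_mode="blocking")
for item in results.JSONResultsReader(job.results(output_mode="json")):
    if isinstance(item, dict):  # skip diagnostic messages
        print(item)
```

Because the command is generating, it must appear at the start of the search pipeline (hence the leading pipe). That is what lets it back ad-hoc searches, dashboards, and saved searches, and also why it cannot feed accelerated data models, which accept only streaming commands.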
