Architecture

Learn how Scanner works

High-level architecture

Scanner indexes and searches log files stored in your S3 buckets. Compute and storage are decoupled: compute runs in a Scanner instance, and your S3 buckets are used for storage.

A Scanner instance is a dedicated AWS account containing compute resources. The instance reads log files from your S3 buckets and writes Scanner index files to an S3 bucket in your AWS account. You can deploy one Scanner instance per AWS region where you operate.

To allow your Scanner instance to access S3 buckets in your account, you can run our CloudFormation, Terraform, or Pulumi template. The template will do the following:

  • Create a new S3 bucket named scanner-index-files-<id>.

  • Create an IAM role named ScannerRole that has permission to read objects in specified S3 buckets and write Scanner index files to the scanner-index-files-<id> S3 bucket.
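To make the permissions above concrete, here is a minimal sketch of what a policy document for a role like ScannerRole might look like. The exact actions, statement IDs, and bucket names are assumptions for illustration; the real template may grant a different set of permissions.

```python
import json


def scanner_role_policy(source_buckets, index_bucket):
    """Build an illustrative IAM policy document (not the actual template output)."""
    read_resources = []
    for bucket in source_buckets:
        read_resources.append(f"arn:aws:s3:::{bucket}")    # the bucket, for listing
        read_resources.append(f"arn:aws:s3:::{bucket}/*")  # its objects, for reading
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumed action set for "read objects in specified S3 buckets".
                "Sid": "ReadLogFiles",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": read_resources,
            },
            {
                # Assumed action set for "write Scanner index files".
                "Sid": "WriteIndexFiles",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{index_bucket}/*"],
            },
        ],
    }


# Hypothetical bucket names, used only for this example.
policy = scanner_role_policy(["my-log-bucket"], "scanner-index-files-example")
print(json.dumps(policy, indent=2))
```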

When you execute a query, your web browser sends a request to the Scanner instances in all of your regions and renders the aggregated search results.

Indexing

When a new log file is created in your S3 bucket, a bucket notification subscription sends a message to your Scanner instance's SQS queue. Indexers receive the SQS message, read the log file from your S3 bucket, and write a Scanner index file to the scanner-index-files-<id> bucket. Indexers will merge small index files into larger ones over time, which keeps query performance high.
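As a rough illustration of the first indexing step, the sketch below extracts the bucket and key of each new log file from an SQS message body. It assumes the bucket notification is delivered directly to SQS in the standard S3 event notification JSON format; the function name is hypothetical and this is not Scanner's actual indexer code.

```python
import json
from urllib.parse import unquote_plus


def log_files_from_sqs_body(body):
    """Return (bucket, key) pairs for newly created objects in one SQS message body."""
    event = json.loads(body)
    files = []
    # Messages without "Records" (for example the S3 test event) yield nothing.
    for record in event.get("Records", []):
        # Only object-creation events describe new log files.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        files.append((bucket, key))
    return files
```

An indexer would then read each returned object from S3 and write the corresponding index file to the scanner-index-files-<id> bucket.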

Querying

When you execute a query in your web browser, each of your Scanner instances will invoke Lambda functions to traverse Scanner index files at high speed. The Lambda functions report their results to the Scanner instance's aggregator, which summarizes the results and returns them to your web browser.
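The fan-out and aggregation pattern described above can be sketched as follows. This is a toy model under stated assumptions: a thread pool stands in for Lambda invocations, an index file is represented as a plain list of lines, and the names scan_index_file and run_query are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_index_file(index_file, needle):
    """Stand-in for one Lambda invocation: return matching lines from one index file."""
    return [line for line in index_file["lines"] if needle in line]


def run_query(index_files, needle, max_workers=8):
    """Fan the query out across index files, then aggregate the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as index_files.
        partials = list(pool.map(lambda f: scan_index_file(f, needle), index_files))
    # The aggregator flattens the per-worker results into one summary.
    hits = [line for partial in partials for line in partial]
    return {"hit_count": len(hits), "hits": hits}


files = [
    {"lines": ["GET / 10.0.0.1", "GET /a 10.0.0.2"]},
    {"lines": ["POST /b 10.0.0.1"]},
]
print(run_query(files, "10.0.0.1"))
```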

Scanner index files

During queries, Lambda functions traverse your Scanner index files at a scan speed that is typically hundreds of gigabytes per second and can reach one terabyte per second. The Lambda functions quickly narrow the search space down to the data regions that contain hits, so queries finish without scanning data that cannot match.

For example, in a needle-in-haystack search for an IP address that appears 100 times in a 25TB data set, the Scanner Lambda functions need to scan only a few hundred megabytes. Other S3-based tools like Amazon Athena or CloudWatch take 10-20 minutes to complete this query, but Scanner completes it in less than 5 seconds because the index files let it skip the data regions that contain no hits.
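To put the example in proportion, the short calculation below works out what share of the data set is actually scanned. The 300 MB figure is an assumed stand-in for "a few hundred megabytes", and decimal units are assumed.

```python
TB = 10**12
MB = 10**6

data_set_bytes = 25 * TB
scanned_bytes = 300 * MB  # assumption: stands in for "a few hundred megabytes"

fraction_scanned = scanned_bytes / data_set_bytes
print(f"Share of data set scanned: {fraction_scanned:.4%}")
```

Under these assumptions, the query touches roughly one part in a hundred thousand of the data set.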

For every 1TB of uncompressed logs indexed, your Scanner instance will write roughly 150GB of Scanner index files into your scanner-index-files-<id> S3 bucket.
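For capacity planning, the ratio above can be turned into a quick estimate. This is a back-of-the-envelope sketch using the rough 150GB-per-1TB figure; the function name is hypothetical and actual index sizes will vary with your data.

```python
INDEX_GB_PER_LOG_TB = 150  # rough figure: ~150 GB of index files per 1 TB of logs


def estimated_index_gb(uncompressed_log_tb):
    """Estimate index file storage (GB) for a given volume of uncompressed logs (TB)."""
    return uncompressed_log_tb * INDEX_GB_PER_LOG_TB


# Example: the 25TB data set from the search example above.
print(estimated_index_gb(25))
```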

Getting started

Once a Scanner instance has been deployed to a region where you operate, you can run our CloudFormation, Terraform, or Pulumi template in each AWS account that has S3 buckets in that region that you want Scanner to index. See Getting started for more info.
