stats()

stats(f [, ...fs]) by col [, ...cols] produces an aggregation table.

stats computes the aggregations specified by f [, ...fs] for each group of col [, ...cols].

Any function that returns a single row and single column can be used in place of f:

  • avg()

  • count()

  • countdistinct()

  • max()

  • min()

  • percentile()

  • sum()

  • var()

Each argument can also be aliased with the as keyword, which will rename it in the output.

In addition, f is optional. stats always calculates the count, e.g. stats by foo will aggregate counts by foo.

by columns are also optional. If no by columns are specified, stats will aggregate over the input datastream. For example, stats max(a), max(b) will calculate the maximum of column a and the maximum of column b.

Technical Notes

  • If there are fewer than 1000 distinct groups, stats() will return those groups, their exact counts, and the statistics for them.

  • If there are >1000 distinct groups, it will return rows for the 1000 most numerous groups.

    • These 1000 groups are selected using a variant of the count-min sketch—specifically the count-mean-min sketch detailed here.

      • For most inputs, this has a p99 error of 0.1%.

    • The actual values returned for each group/row may be exact depending on the aggregation; only the process of selecting these groups is approximate.

Returns

A table with one row for each distinct value of col [, ...cols], with the following columns:

  • One column for each col provided, containing the value of col.

  • @q.count, containing the number of occurrences of that value.

  • One column for each of f [, ...fs], named according to the as field, or, if no as was used, named the same as f.

Examples

# a table with columns named `@q.col`, `@q.count`, `elapsed_ms`, and `max(timestamp)`
* | stats max(elapsed_ms) as elapsed_ms, max(timestamp) by hostname

# get the average, median, and 90th percentile S3 request counts by IAM user
userIdentity.type: "IAMUser" and eventSource: "s3.amazonaws.com"
| stats count() as numReqs, userIdentity.arn by userIdentity.arn
| stats avg(numReqs), percentile(50, numReqs), percentile(90, numReqs)

Last updated