countdistinct()

countdistinct(col [, ...cols]) counts the number of distinct values of the provided column(s) in the input stream. If multiple columns are provided, their values are treated as a tuple.

Technical Notes

  • If there are at most 1000 distinct values in the input stream, it will return the exact value.

  • If there are >1000 distinct values, it will fall back to a variant of the hyperloglog algorithm—specifically the HLL+ algorithm detailed here with a precision of 14.

    • For most inputs, this has an average error of 0.5% and a p95 error of 1.5%.

Returns

A table with one row and one column, called @q.uniq.

Example

Analyze AWS CloudTrail logs to count the number of distinct IAM users making AWS API calls.

%ingest.source_type: "aws:cloudtrail"
and userIdentity.type: "IAMUser"
| countdistinct(userIdentity.arn)

Last updated