Return to main CLIChart documentation page.
Aggregate is a Python script for extracting aggregate (or summary) data from numeric columns in tabular data, such as CSV files.
A typical use for Aggregate is generating data to show long-term trends, where you already have a number of tabular data files covering shorter periods. For example, we use Aggregate to extract day-by-day transaction rates and total volumes, memory and thread use etc. from daily stats for a number of servers - this allows a quick glance to pick up any anomaly that should be investigated further.
Aggregate can:
- Accept data from stdin or from one or more files
- Accept and output comma-separated or whitespace-separated data
- Output minimum, maximum, average, total, count, first or last value for any numeric column
- Ignore the first line of the file, e.g. when it contains column headings.
The output of the script will be 1 row of data for each input file (or only one row if stdin is used). Output is in the same format as the input, i.e. whitespace or comma-separated.
You use the tool like this:
aggregate.py [options] [inputFile*]
If no input file is specified, reads from stdin. Output is always to stdout.
-c Data is CSV (default is whitespace-separated)
-f First row of file is a header row, and should be skipped
-h Show help (this information) and exit
-l <columns> A comma-separated ordered list of columns and aggregate types
to include in the output (required). The column index is the
0-based number of the column in the file, while the aggregate
type is 'min', 'av', 'max', 'tot', 'cnt', 'first' or 'last'.
Note that the count is just the number of lines in the file,
not counting any header row.
For example, 1:max,2:av,2:max,0:cnt,4:tot
Simple expressions can also be used, e.g. '1:max / 3:cnt / 60'
-p A prefix column value to write the aggregate columns for each
line. Pass 1 or more of these, e.g. to provide a date column
value. Example: -p 12/03/2007
-s If no rows of data were found, silently output nothing.
--nojit Disable Psyco Just-In-Time compiler
Aggregate requires Python - see the installation page
Every column specified with the -l option must be numeric
Expressions can use:
- Any column aggregate, e.g. '3:av'
- Integer and floating point numbers
- Standard mathematical operators: '+', '-', '*', '/', as well as '%' (remainder after integer division)
- Standard Python functions (if you know some Python). However, you won't be able to use any function requiring more than one argument, since the comma is used to separate the column expressions.
Expressions must return a single numericial value.
Using data from the samples directory, we can extract the minimum, average and maximum CPU temperature and fan speed:
$ aggregate.py -cfl 1:min,1:av,1:max,3:min,3:av,3:max samples/SystemTemps.csv 57.80, 64.73, 68.20, 2376, 2448, 2510
If we're only interested in what the values were at the start and end of the day:
$ aggregate.py -cfl 1:first,1:last,3:first,3:last samples/SystemTemps.csv 58.50, 65.40, 2376, 2471
And if we want the date as the first column in the output:
$ aggregate.py -cfl 1:first,1:last,3:first,3:last -p 12/03/2007 samples/SystemTemps.csv 12/03/2007, 58.50, 65.40, 2376, 2471