Aggregate, Running Aggregate, Time Series, and Reporting Functions

This section contains information about aggregate functions, running aggregate functions, and time series functions:

Aggregate Functions
Running Aggregate Functions
Time Series Functions

Aggregate Functions

Aggregate functions perform operations on multiple values to create summary results.

Aggregate functions include:

AGGREGATE AT
AVG
AVGDISTINCT
BOTTOMN
COUNT
COUNTDISTINCT
COUNT(*)
MAX
MEDIAN
MIN
NTILE
PERCENTILE
RANK
STDDEV
STDDEV_POP
SUM
SUMDISTINCT
TOPN

AGGREGATE AT

This function aggregates columns based on the level or levels you specify. Using AGGREGATE AT guarantees that the aggregate for the measure always occurs at the levels specified after the keyword AT, regardless of the WHERE clause.

Syntax

AGGREGATE(expr AT level [, level1, levelN])

Where:

expr is any expression that references at least one measure column

level is the level at which you want to aggregate. You can optionally specify multiple levels.

You cannot specify a level from a dimension that contains levels that are being used as the measure level for the measure you specified in the first argument. For example, you cannot write the function as AGGREGATE(yearly_sales AT month) because "month" is from the same time dimension that is being used as the measure level for "yearly_sales."

Example

The following example shows the AGGREGATE AT function and example results:

SELECT month, year, AGGREGATE(sales AT Year)FROM timeseriestestingWHERE year = 1994 AND month = 12

Result:

Month    Year    AGGREGATE AT year12       1994    7396Row count: 1

Because the AGGREGATE AT operator is always executed before the predicates, it always returns the correct total for the time level specified after the keyword AT.

AVG

This function calculates the average (mean) value of an expression in a result set. It must take a numeric expression as its argument.

Note that the denominator of AVG is the number of rows aggregated. For this reason, it is usually a mistake to use AVG(x) in a calculation in Oracle Business Intelligence. Instead, write the expression manually so that you can control both the numerator and denominator (x/y).

Syntax

AVG(numExpr)

Where:

numExpr is any expression that evaluates to a numeric value.

AVGDISTINCT

This function calculates the average (mean) of all distinct values of an expression. It must take a numeric expression as its argument.

Syntax

AVG(DISTINCT numExpr)

Where:

numExpr is any expression that evaluates to a numeric value.

BOTTOMN

This function ranks the lowest n values of the expression argument from 1 to n, 1 corresponding to the lowest numeric value. The BOTTOMN function operates on the values returned in the result set. A request can contain only one BOTTOMN expression.

Syntax

BOTTOMN(numExpr, integer)

Where:

numExpr is any expression that evaluates to a numeric value.

integer is any positive integer. Represents the bottom number of rankings displayed in the result set, 1 being the lowest rank.

COUNT

This function calculates the number of rows having a nonnull value for the expression. The expression is typically a column name, in which case the number of rows with nonnull values for that column is returned.

Syntax:

COUNT(expr)

Where:

expr is any expression.

COUNTDISTINCT

This function adds distinct processing to the COUNT function.

Syntax

COUNT(DISTINCT expr)

Where:

expr is any expression.

COUNT(*)

This function counts the number of rows.

Syntax

COUNT(*)

Example

For example, if a table named Facts contained 200,000,000 rows, the sample request would return the results shown:

SELECT COUNT(*) FROM Facts

Result:

200000000

MAX

This function calculates the maximum value (highest numeric value) of the rows satisfying the numeric expression argument.

Syntax

MAX(numExpr)

Where:

numExpr is any expression that evaluates to a numeric value.

MEDIAN

This function calculates the median (middle) value of the rows satisfying the numeric expression argument. When there are an even number of rows, the median is the mean of the two middle rows. This function always returns a double.

Syntax

MEDIAN(numExpr)

Where:

numExpr is any expression that evaluates to a numeric value.

MIN

This function calculates the minimum value (lowest numeric value) of the rows satisfying the numeric expression argument.

Syntax

MIN(numExpr)

Where:

numExpr is any expression that evaluates to a numeric value.

NTILE

This function determines the rank of a value in terms of a user-specified range. It returns integers to represent any range of ranks. In other words, the resulting sorted data set is broken into several tiles where there are roughly an equal number of values in each tile.

NTile with numTiles = 100 returns what is commonly called the "percentile" (with numbers ranging from 1 to 100, with 100 representing the high end of the sort). This value is different from the results of the Oracle BI PERCENTILE function, which conforms to what is called "percent rank" in SQL 92 and returns values from 0 to 1.

Syntax

NTILE(numExpr, numTiles)

Where:

numExpr is any expression that evaluates to a numeric value.

numTiles is a positive, nonnull integer that represents the number of tiles.

If the numExpr argument is not null, the function returns an integer that represents a rank within the requested range.

PERCENTILE

This function calculates a percent rank for each value satisfying the numeric expression argument. The percentile rank ranges are from 0 (1st percentile) to 1 (100th percentile), inclusive.

The percentile is calculated based on the values in the result set.

Syntax

PERCENTILE(numExpr)

Where:

numExpr is any expression that evaluates to a numeric value.

RANK

This function calculates the rank for each value satisfying the numeric expression argument. The highest number is assigned a rank of 1, and each successive rank is assigned the next consecutive integer (2, 3, 4,...). If certain values are equal, they are assigned the same rank (for example, 1, 1, 1, 4, 5, 5, 7...).

The rank is calculated based on the values in the result set.

Syntax

RANK(numExpr)

Where:

numExpr is any expression that evaluates to a numeric value.

STDDEV

This function returns the standard deviation for a set of values. The return type is always a double. STDEV_SAMP is a synonym for STDDEV.

Syntax

STDDEV([ALL | DISTINCT] numExpr)

Where:

numExpr is any expression that evaluates to a numeric value.

If ALL is specified, the standard deviation is calculated for all data in the set.

If DISTINCT is specified, all duplicates are ignored in the calculation.

If nothing is specified (the default), all data is considered.

STDDEV_POP

This function returns the standard deviation for a set of values using the computational formula for population variance and standard deviation.

Syntax

STDDEV_POP([ALL | DISTINCT] numExpr)

Where:

numExpr is any expression that evaluates to a numeric value.

If ALL is specified, the standard deviation is calculated for all data in the set.

If DISTINCT is specified, all duplicates are ignored in the calculation.

If nothing is specified (the default), all data is considered.

SUM

This function calculates the sum obtained by adding up all values satisfying the numeric expression argument.

Syntax

SUM(numExpr)

Where:

numExpr is any expression that evaluates to a numeric value.

SUMDISTINCT

This function calculates the sum obtained by adding all of the distinct values satisfying the numeric expression argument.

Syntax

SUM(DISTINCT numExpr)

Where:

numExpr is any expression that evaluates to a numeric value.

TOPN

This function ranks the highest n values of the expression argument from 1 to n, 1 corresponding to the highest numeric value. The TOPN function operates on the values returned in the result set. A request can contain only one TOPN expression.

Syntax

TOPN(numExpr, integer)

Where:

numExpr is any expression that evaluates to a numeric value.

integer is any positive integer. Represents the top number of rankings displayed in the result set, 1 being the highest rank.

Running Aggregate Functions

Running aggregate functions are similar to functional aggregates in that they take a set of records as input, but instead of outputting the single aggregate for the entire set of records, they output the aggregate based on records encountered so far.

This section describes the running aggregate functions supported by the Oracle BI Server. Functions include:

MAVG
MSUM
RSUM
RCOUNT
RMAX
RMIN

MAVG

This function calculates a moving average (mean) for the last n rows of data in the result set, inclusive of the current row.

The average for the first row is equal to the numeric expression for the first row. The average for the second row is calculated by taking the average of the first two rows of data. The average for the third row is calculated by taking the average of the first three rows of data, and so on until you reach the nth row, where the average is calculated based on the last n rows of data.

Syntax

MAVG(numExpr, integer)

Where:

numExpr is any expression that evaluates to a numeric value.

integer is any positive integer. Represents the average of the last n rows of data.

MSUM

This function calculates a moving sum for the last n rows of data, inclusive of the current row.

The sum for the first row is equal to the numeric expression for the first row. The sum for the second row is calculated by taking the sum of the first two rows of data. The sum for the third row is calculated by taking the sum of the first three rows of data, and so on. When the nth row is reached, the sum is calculated based on the last n rows of data.

Syntax

MSUM(numExpr, integer)

Where:

numExpr is any expression that evaluates to a numeric value.

integer is any positive integer. Represents the average of the last n rows of data.

Example

This example shows a query that uses the MSUM function, along with example query results.

select month, revenue, MSUM(revenue, 3) as 3_MO_SUM from sales_subject_area

Result:

MONTH    REVENUE    3_MO_SUM
JAN      100.00     100.00
FEB      200.00     300.00
MAR      100.00     400.00
APRIL    100.00     400.00
MAY      300.00     500.00
JUNE     400.00     800.00
JULY     500.00     1200.00
AUG      500.00     1400.00
SEPT     500.00     1500.00
OCT      300.00     1300.00
NOV      200.00     1000.00
DEC      100.00     600.00

RSUM

This function calculates a running sum based on records encountered so far. The sum for the first row is equal to the numeric expression for the first row. The sum for the second row is calculated by taking the sum of the first two rows of data. The sum for the third row is calculated by taking the sum of the first three rows of data, and so on.

Syntax

RSUM(numExpr)

Where:

numExpr is any expression that evaluates to a numeric value.

In Answers, you can also use the following alternate syntax:

RSUM(expression1 [BY expression2[, expression3[, ...]]])

Where:

expression1, expression2, expression3 ... can be any column reference, or an arithmetic expression on column references.

The BY clause causes the RSUM computation to restart at the row where the value of any of the BY columns differs from the previous row.

Example

This example shows a query that uses the RSUM function, along with example query results.

SELECT month, revenue, RSUM(revenue) as RUNNING_SUM from sales_subject_area

Result:

MONTH    REVENUE    RUNNING_SUM
JAN      100.00     100.00
FEB      200.00     300.00
MAR      100.00     400.00
APRIL    100.00     500.00
MAY      300.00     800.00
JUNE     400.00     1200.00
JULY     500.00     1700.00
AUG      500.00     2200.00
SEPT     500.00     2700.00
OCT      300.00     3000.00
NOV      200.00     3200.00
DEC      100.00     3300.00

RCOUNT

This function takes a set of records as input and counts the number of records encountered so far.

Syntax

RCOUNT(expr)

Where:

expr is an expression of any data type.

In Answers, you can also use the following alternate syntax:

RCOUNT(expression1 [BY expression2[, expression3[, ...]]])

Where:

expression1, expression2, expression3 ... can be any column reference, or an arithmetic expression on column references.

The BY clause causes the RCOUNT computation to restart at the row where the value of any of the BY columns differs from the previous row.

Example

This example shows a query that uses the RCOUNT function, along with example query results.

select month, profit, RCOUNT(profit) from sales_subject_area where profit > 200

Result:

MONTH    PROFIT    RCOUNT(profit)
MAY      300.00    2
JUNE     400.00    3
JULY     500.00    4
AUG      500.00    5
SEPT     500.00    6
OCT      300.00    7

RMAX

This function takes a set of records as input and shows the maximum value based on records encountered so far. The specified data type must be one that can be ordered.

Syntax

RMAX(expr)

Where:

expr is an expression of any data type. The data type must be one that has an associated sort order.

In Answers, you can also use the following alternate syntax:

RMAX(expression1 [BY expression2[, expression3[, ...]]])

Where:

expression1, expression2, expression3 ... can be any column reference, or an arithmetic expression on column references.

The BY clause causes the RMAX computation to restart at the row where the value of any of the BY columns differs from the previous row.

Example

This example shows a query that uses the RMAX function, along with example query results.

SELECT month, profit, RMAX(profit) from sales_subject_area

Result:

MONTH    PROFIT    RMAX(profit)
JAN      100.00    100.00
FEB      200.00    200.00
MAR      100.00    200.00
APRIL    100.00    200.00
MAY      300.00    300.00
JUNE     400.00    400.00
JULY     500.00    500.00
AUG      500.00    500.00
SEPT     500.00    500.00
OCT      300.00    500.00
NOV      200.00    500.00
DEC      100.00    500.00

RMIN

This function takes a set of records as input and shows the minimum value based on records encountered so far. The specified data type must be one that can be ordered.

Syntax

RMIN(expr)

Where:

expr is an expression of any data type. The data type must be one that has an associated sort order.

In Answers, you can also use the following alternate syntax:

RMIN(expression1 [BY expression2[, expression3[, ...]]])

Where:

expression1, expression2, expression3 ... can be any column reference, or an arithmetic expression on column references.

The BY clause causes the RMIN computation to restart at the row where the value of any of the BY columns differs from the previous row.

Example

This example shows a query that uses the RMIN function, along with example query results.

select month, profit, RMIN(profit) from sales_subject_area

Result:

MONTH    PROFIT    RMIN(profit)
JAN      400.00    400.00
FEB      200.00    200.00
MAR      100.00    100.00
APRIL    100.00    100.00
MAY      300.00    100.00
JUNE     400.00    100.00
JULY     500.00    100.00
AUG      500.00    100.00
SEPT     500.00    100.00
OCT      300.00    100.00
NOV      200.00    100.00
DEC      100.00    100.00

Time Series Functions

Time series functions operate on time-oriented dimensions. The time series functions calculate AGO, TODATE, and PERIODROLLING functions based on user supplied calendar tables, not on standard SQL date manipulation functions.

To use time series functions on a particular dimension, you have to designate the dimension as a Time dimension and set one or more keys at one or more levels as chronological keys. See Metadata Repository Builder's Guide for Oracle Business Intelligence Enterprise Edition for more information.

Functions include:

AGO
PERIODROLLING
TODATE

AGO

This function is a time series aggregation function that calculates the aggregated value from the current time back to a specified time period. For example, AGO can produce sales for every month of the current quarter and the corresponding quarter-ago sales.

Time series functions operate on members of time dimensions which are at or below the level of the function. Because of this, one or more columns that uniquely identify members at or below the given level must be projected in the query. Alternatively, you can apply a filter to the query that specifies a single member at or below the given level. See "About the AGO Function Level" for more information about the level of the function.

Multiple AGO functions can be nested if all the AGO functions have the same level argument. You can nest exactly one TODATE and multiple AGO functions if they each have the same level argument.

Syntax

AGO(expr, [time_level], offset)

Where:

expr is an expression that references at least one measure column.

time_level is an optional argument that specifies the type of time period, such as quarter, month, or year.

In Answers, specify a presentation level from a presentation hierarchy for time_level.

offset is an integer literal that represents the time shift amount.

Example

The following example returns last year's sales:

SELECT Year_ID, AGO(sales, year, 1)

About the AGO Function Level

It is recommended that you explicitly specify the level of the AGO function using the [time_level] argument.

If you do not explicitly specify the [time_level] argument, the default level is determined as follows:

If the measure used in the expression is a level-based measure in the time dimension (as set in the Administration Tool), then that same level is considered the default AGO level.
Otherwise, the grain of the measure used in the expression, as determined by the BY clause of the measure shown in the logical request, is the default Ago level.

For example, the result of the query:
```
SELECT year, AGO(sales, 1) WHERE quarter=1
```
is the same as:
```
SELECT year, AGO(sales, year_level, 1) WHERE quarter=1
```

You can see the default AGO level for a given query in the Logical Request section of the query log.

PERIODROLLING

This function computes the aggregate of a measure over the period starting x units of time and ending y units of time from the current time. For example, you can use PERIODROLLING to compute sales for a period that starts at a certain quarter before and ends at a certain quarter after the current quarter.

Time series functions operate on members of time dimensions which are at or below the level of the function. Because of this, one or more columns that uniquely identify members at or below the given level must be projected in the query. Alternatively, you can apply a filter to the query that specifies a single member at or below the given level. See "Determining the Level Used by the PERIODROLLING Function" for more information about the level of the function.

You cannot nest AGO and TODATE functions within a PERIODROLLING function. Also, you cannot nest PERIODROLLING, FIRST, FIRST_PERIOD, LAST, and LAST_PERIOD functions.

If you embed other aggregate functions (like RANK, TOPN, PERCENTILE, FILTER, or RSUM) inside PERIODROLLING, the PERIODROLLING function is pushed inward. For example, PERIODROLLING(TOPN(measure)) is executed as TOPN(PERIODROLLING(measure)).

Syntax

PERIODROLLING(measure, x ,y [,hierarchy])

Where:

measure is the name of a measure column.

x is an integer that specifies the offset from the current time. Precede the integer with a minus sign (-) to indicate an offset into the past.

y specifies the number of time units over which the function will compute. To specify the current time, enter 0.

hierarchy is an optional argument that specifies the name of a hierarchy in a time dimension, such as yr, mon, day, that you want to use to compute the time window. This option is useful when there are multiple hierarchies in a time dimension, or when you want to distinguish between multiple time dimensions.

If you want to roll back or forward the maximum possible amount, use the keyword UNBOUND. For example, the function PERIODROLLING (measure, -UNBOUND, 0) sums over the period starting from the beginning of time until now.

You can combine PERIODROLLING and AGGREGATE AT functions to specify the level of the PERIODROLLING function explicitly. For example, if the query level is day but you want to find the sum of the previous and current months, use the following:

SELECT year, month, day, PERIODROLLING(AGGREGATE(sales AT month), -1)

Examples

SELECT Month_ID, PERIODROLLING(monthly_sales, -1, 1)

SELECT Month_ID, PERIODROLLING(monthly_sales, -UNBOUND, 2)

SELECT Month_ID, PERIODROLLING(monthly_sales, -UNBOUND, UNBOUND)

Determining the Level Used by the PERIODROLLING Function

The unit of time (offset) used in the PERIODROLLING function is called the level of the function. This value is determined by the measure level of the measures in its first argument and the query level of the query to which the function belongs. The measure level for the measure can be set in the Administration Tool. If a measure level has been set for the measure used in the function, the measure level is used as the level of the function. The measure level is also called the storage grain of the function.

If a measure level has not been set in the Administration Tool, then the query level is used. The query level is also called the query grain of the function. In the following example, the query level is month, and the PERIODROLLING function computes the sum of the last, current, and next month for each city for the months of March and April:

SELECT year, month, country, city, PERIODROLLING(sales, -1, 1)
WHERE month in ('Mar', 'Apr') AND city = 'New York'

When there are multiple hierarchies in the time dimension, you must specify the hierarchy argument in the PERIODROLLING function. For example:

SELECT year, fiscal_year, month, PERIODROLLING(sales, -1, 1, "fiscal_time_hierarchy")

In this example, the level of the PERIODROLLING function is fiscal_year.

TODATE

This function is a time series aggregation function that aggregates a measure from the beginning of a specified time period to the currently displayed time. For example, this function can calculate Year to Date sales.

Time series functions operate on members of time dimensions which are at or below the level specified in the function. Because of this, one or more columns that uniquely identify members at or below the given level must be projected in the query. Alternatively, you can apply a filter to the query that specifies a single member at or below the given level.

A TODATE function may not be nested within another TODATE function. You can nest exactly one TODATE and multiple AGO functions if they each have the same level argument.

TODATE is different from the TO_DATE SQL function supported by some databases. Do not use TO_DATE to change to a DATE data type. Instead, use the CAST function. See "CAST" for more information.

Syntax

TODATE(expr, time_level)

Where:

expr is an expression that references at least one measure column.

time_level is the type of time period, such as quarter, month, or year.

Example

The following example returns the year-to-month sales:

SELECT Year_ID, Month_ID, TODATE(sales, year)