Enhance Analytics with SQL Window Framing

Window Framing

Window framing is a concept of accessing a range of data while executing window functions. Window frame comes into the picture whenever you do calculations using window functions. It enables you to decide the range of data to perform the window function calculation in a particular partition.

Window Frame

A set of rows where the window function is used for calculation. A Window Frame is used to specify how many rows are around the current row (calculation happens row-wise in SQL). The window function should be used for calculation.

The Window Frame is evaluated separately for each partition.

Syntax: ROWS|RANGE BETWEEN Lower_bound and Upper_Bound

The bounds can be any of the five options.

UNBOUNDED PRECEDING
n PRECEDING
CURRENT ROW
n FOLLOWING
UNBOUNDED FOLLOWING

UNBOUNDED PRECEDING means all the rows before the current row in a particular partition.

n PRECEDING means n rows before the current row in a particular partition.

Current Row includes only the current row, which is evaluated by the window function.

n Following means all the rows after the current rows in a particular partition.

UNBOUNDED FOLLOWING means all the rows after the current rows in a particular partition.

We’ll see some query-based examples on the monthly_car_sales data of a car showroom.

The above data is at the model level of a car, which means the revenue column shows data for each model of a car company in a month from the car showroom.

Note: – We only have data for the year 2021.

If we toned to calculate the total revenue produced by a particular car company for the showroom, then we can use the sum () window function with partition by company, which will give you the total revenue produced by a car company for that showroom in the year 2021.

Code:

select month,

company,

model,

revenue,

sum(revenue) over (partition by company) as total_revenue

from monthly_car_sales

In the above query result, each partition consider a single frame.

We all know that SQL performs row-wise calculations, so when it is at the first row of the partition, then it has access to the whole data of the partition.

That is why the total revenue shows the same value for every row in a partition.

Use Order By subclause in the window function.

Code:

select month,

company,

model,

revenue,

sum(revenue) over (partition by company order by month) as total_revenue_order_by_month

from monthly_car_sales

Query result:

when we include order by month in our window function, it is similar as

sum(revenue) over (partition by company order by month RANGE BETWEEN UNBOUNDED PRECEDING and CURRENT ROW)

Code:

select month,

company,

model,

revenue,

sum(revenue) over (partition by company order by month RANGE BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as total_revenue_order_by_month

from monthly_car_sales

Query result:

The widow function accesses the rows before the current row in a partition.

So, when it is at the first row in a partition, it has access to the first row only, and when it is at the second row, it has access to the first row and second row.

When execution is at the first row, then it has access to the first row only, so, total_revenue_order_by_month column should show a value as 2,500,000 for the first row, but it is showing 3,510,000 which is a sum of the first and second row.
The reason behind this is RANGE.

RANGE looks value logically which means when execution is at the first row, then it has access to the first row only, but when you see the value for the month column, it has the same value for the first row and second row, so it will have access to both rows when execution happens on the first row.

Now if you see the value of the 5^th row for total_revenue_order_by_month it is equal to the sum of the first five rows because it has access to all the rows before the current row and current row in a partition.

Now let’s see one example using ROWS BETWEEN.

Code:

select month,

company,

model,

revenue,

sum(revenue) over (partition by company order by month ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as total_revenue_order_by_month_Rows_between

from monthly_car_sales

Query result:

If you closely observe the output of total_revenue_order_by_month_Rows_between column, it is a cumulative sum of the revenue column.

The ROWS method does not look at the value at the logical level. So, when execution is at the first row, it only has access to the first row despite having the same values in the month column for the second row as the first row.

Aman Verma