Window framing in SQL window functions

What are window functions in SQL?

Window functions, also known as windowing or analytical functions, are a powerful feature in SQL that allows you to perform calculations across a "window" of rows related to the current row. These functions can help you gain deeper insights into your data by partitioning, sorting, and analyzing it in various ways without the need for complex subqueries or self-joins.

Unlike regular aggregate functions that collapse multiple rows into a single result, window functions generate new values for each row based on the data within the defined window.
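
To make this contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module, assuming the bundled SQLite is version 3.25 or newer (the release that added window functions). The sales table is hypothetical: only customer_id and order_date appear in this post, while the amount column and every value are invented for illustration.

```python
import sqlite3

# Hypothetical sample data: the amount column and all values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (101, "2023-01-05", 250.0),
        (101, "2023-02-10", 100.0),
        (102, "2023-01-20", 300.0),
    ],
)

# GROUP BY collapses each customer's rows into one summary row.
grouped = conn.execute(
    """
    SELECT customer_id, SUM(amount) AS customer_total
    FROM sales
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()

# The window version keeps every row and attaches the customer total to each.
windowed = conn.execute(
    """
    SELECT customer_id, order_date, amount,
           SUM(amount) OVER (PARTITION BY customer_id) AS customer_total
    FROM sales
    ORDER BY customer_id, order_date
    """
).fetchall()

print(grouped)   # two rows, one per customer
print(windowed)  # three rows, one per order
```

The grouped query returns one row per customer, while the windowed query returns all three order rows, each carrying its customer's total.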

Different types of window functions.

In SQL, there are several types of window functions that you can use to perform various calculations and analyses across a window of rows. We will group them into three main categories based on their functionality.

  1. Ranking Functions

  2. Aggregate Functions

  3. Analytical Functions

Before we dive into the different types of window functions, we need to understand some basic terminology and answer a few questions:

1) What is a window frame?

2) How do you create a window frame?

3) How do you sort the table to get a specific outcome from the window function?

4) What happens if you don't provide any window specification to the function?

And many more. Let's start.

What is a window?

In the context of window functions, a "window" is like a virtual frame or group of rows that the function looks at to perform calculations. It's like looking through a specific "window" of data.

Imagine you have a dataset with rows of information, and you want to do some calculations on each row, but not just based on that row alone. Instead, you want to take into account some neighboring rows as well.

To understand windows and window functions, let's take the example of the sales table below.

Have a look at the table above. Suppose you want to see the order history customer-wise. To do that, you would order the entire table by the customer_id column, which would look like the table below.

As you can see, the entire table is ordered on the customer_id column, and the set of rows associated with each customer is framed as a window. You may think that this is the same as grouping data with the GROUP BY clause, but there are differences in how the windows and groups are created, which we will cover in "Differences between window function and group by" in a later section of the blog.

How to create the window frame in a window function?

Now that we understand what a window frame is, let's see how to create one. But before we look at how window functions are written, we need to learn about the OVER() clause, which accompanies every window function.

Let's have a look at the syntax of any window function:

The above picture gives the generic syntax of window functions, along with some examples of window functions that we will study in upcoming topics. As you can observe, every window function has two main components:

  1. FUNC(): This is the actual calculation or operation that you want to perform on the data. It can be an aggregate function such as SUM or AVG, or a dedicated window function such as RANK, ROW_NUMBER, LEAD, or LAG. The window function generates a value for each row based on the data within the window.

  2. OVER() Clause: The OVER() clause is where you define the window or subset of rows that the window function will operate on. It allows you to specify the window's boundaries and control how the function should partition and order the rows.

Let's go in depth on the OVER() clause to understand how the window frame is created.

Working of the OVER() clause: The OVER() clause in SQL defines the window, or subset of rows, that a window function operates on. When you use a window function with the OVER() clause, the function performs its calculation on a specific set of rows related to the current row and returns a value for every row. The window is determined by the clauses specified inside OVER().

The OVER() clause can include two main sub-clauses: PARTITION BY and ORDER BY. Both are optional; if you omit both and write an empty OVER(), the entire result set is treated as a single window.
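
As a small illustration of leaving the window specification empty, the sketch below runs SUM() with an empty OVER() through Python's built-in sqlite3 module (assuming SQLite 3.25 or newer). The table contents and the amount column are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical sample data: the amount column and all values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (101, "2023-01-05", 250.0),
        (101, "2023-02-10", 100.0),
        (102, "2023-01-20", 300.0),
    ],
)

# With an empty OVER(), every row in the result set belongs to one window,
# so each row sees the same grand total.
rows = conn.execute(
    """
    SELECT customer_id, amount,
           SUM(amount) OVER () AS grand_total
    FROM sales
    ORDER BY customer_id, order_date
    """
).fetchall()

for row in rows:
    print(row)
```

Every row keeps its own amount, and the grand_total column repeats the sum of all three amounts.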

💡
Certain window functions, such as RANK(), DENSE_RANK(), LEAD(), LAG(), FIRST_VALUE(), and LAST_VALUE(), among others, rely heavily on the ORDER BY clause to determine the order of rows within the window. We will explore each window function to see which depend on it and which do not.
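
To see why ordering matters for these functions, here is a hedged sketch with LAG() and LEAD(), run through Python's built-in sqlite3 module (assuming SQLite 3.25 or newer). The "previous" and "next" rows only have meaning once ORDER BY fixes a sequence. The amount column and all values are hypothetical.

```python
import sqlite3

# Hypothetical sample data: the amount column and all values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (101, "2023-01-05", 250.0),
        (101, "2023-02-10", 100.0),
        (102, "2023-01-20", 300.0),
        (102, "2023-03-15", 150.0),
    ],
)

# LAG/LEAD look one row back/forward in the order given by ORDER BY,
# restarting for each customer because of PARTITION BY.
rows = conn.execute(
    """
    SELECT customer_id, order_date, amount,
           LAG(amount)  OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount,
           LEAD(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_amount
    FROM sales
    ORDER BY customer_id, order_date
    """
).fetchall()

for row in rows:
    print(row)
```

The first order of each customer has no previous row, so prev_amount is NULL (None in Python). Likewise, the last order of each customer has no next_amount.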

OVER() clause with PARTITION BY: When you include the PARTITION BY clause in OVER(), you divide the result set into partitions, or groups, based on the values of one or more columns. Each partition acts as a separate window, and the window function's calculations are performed independently within each window.

OVER() clause with PARTITION BY and ORDER BY: When you include both PARTITION BY and ORDER BY in OVER(), the result set is partitioned based on the specified columns, and within each partition the rows are sorted according to the ORDER BY criteria.

OVER() clause with only ORDER BY (no PARTITION BY): When you include only the ORDER BY clause in OVER(), the window function treats the entire result set as a single window. The rows are sorted according to the ORDER BY criteria, and the function performs its calculations on the ordered rows.
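
The three variants above can be compared side by side. The sketch below applies SUM() with each form of OVER() using Python's built-in sqlite3 module (assuming SQLite 3.25 or newer). Note that once ORDER BY is present, the default frame runs from the start of the partition to the current row, so SUM() becomes a running total. The amount column and the data are hypothetical.

```python
import sqlite3

# Hypothetical sample data: the amount column and all values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (101, "2023-01-05", 250.0),
        (101, "2023-02-10", 100.0),
        (102, "2023-01-20", 300.0),
    ],
)

rows = conn.execute(
    """
    SELECT customer_id, order_date, amount,
           -- PARTITION BY only: total per customer
           SUM(amount) OVER (PARTITION BY customer_id) AS customer_total,
           -- PARTITION BY + ORDER BY: running total per customer
           SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS customer_running,
           -- ORDER BY only: running total over the whole table
           SUM(amount) OVER (ORDER BY order_date) AS overall_running
    FROM sales
    ORDER BY customer_id, order_date
    """
).fetchall()

for row in rows:
    print(row)
```

For customer 101's second order, customer_running is 350.0 (250 + 100), while overall_running is 650.0. By date that order comes after customer 102's order, so the running total already includes it.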

Let's look at the ROW_NUMBER() function in detail to understand how the window frame is created and how PARTITION BY and ORDER BY work.

Let's say you want to look at each customer's orders in order of order_date. That means you want to partition the entire table by customer_id and then order each partition by the order_date column.

SELECT *,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS row_no
FROM sales;

Output:

Let's break down the window function in detail:

a. ROW_NUMBER(): A window function that assigns a unique sequential integer to each row within the specified window. In this case, it calculates the row number for each row within the partition defined by PARTITION BY (explained below).

b. OVER(): The OVER() clause defines the window within which the ROW_NUMBER() function operates. In this query, it includes both PARTITION BY and ORDER BY clauses.

c. PARTITION BY customer_id: The PARTITION BY clause divides the table into partitions or groups based on the distinct values in the customer_id column. Each partition acts as a separate window for the ROW_NUMBER() function. Rows with the same customer_id will be assigned row numbers starting from 1 within their partition.

d. ORDER BY order_date: The ORDER BY clause specifies the sorting order of rows within each partition. The rows are ordered based on the values in the order_date column. The ROW_NUMBER() function assigns row numbers based on this ordering within each partition.

Have a look at customers 104, 105, 106, and 107: since each of them has only a single row in its window, the row_no assigned is 1.
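
If you want to try the query yourself, here is a minimal runnable sketch using Python's built-in sqlite3 module (assuming SQLite 3.25 or newer). It does not reproduce the table from this post: the rows and the amount column are hypothetical, chosen so that one customer (104) has only a single order.

```python
import sqlite3

# Hypothetical sample data: the amount column and all values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (101, "2023-02-10", 100.0),
        (101, "2023-01-05", 250.0),
        (102, "2023-01-20", 300.0),
        (104, "2023-02-01", 75.0),
    ],
)

# Same window specification as the query above; the outer ORDER BY only
# makes the printed order predictable.
rows = conn.execute(
    """
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS row_no
    FROM sales
    ORDER BY customer_id, order_date
    """
).fetchall()

for row in rows:
    print(row)
```

Customer 101 gets row numbers 1 and 2 in date order, while customers 102 and 104 each get a single row numbered 1.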

What happens when we do not have a PARTITION BY clause?

SELECT *,
       ROW_NUMBER() OVER (ORDER BY order_date) AS row_no
FROM sales;

Output:

As you can see, when the PARTITION BY clause is not present, the entire table is treated as a single window. The table is ordered by the order_date column, and a unique row_no is assigned to each row.
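
A runnable version of this query, again as a sketch with Python's built-in sqlite3 module (assuming SQLite 3.25 or newer) and hypothetical data, shows the numbering running across customers:

```python
import sqlite3

# Hypothetical sample data: the amount column and all values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (101, "2023-01-05", 250.0),
        (101, "2023-02-10", 100.0),
        (102, "2023-01-20", 300.0),
        (104, "2023-02-01", 75.0),
    ],
)

# No PARTITION BY: the whole table is one window, numbered by order_date.
rows = conn.execute(
    """
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY order_date) AS row_no
    FROM sales
    ORDER BY row_no
    """
).fetchall()

for row in rows:
    print(row)
```

Here customer 101's two orders receive row numbers 1 and 4, because orders from customers 102 and 104 fall between them by date.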

💡
Keep in mind that some functions depend on the ORDER BY clause while others do not. ROW_NUMBER() is one that depends on it: without ORDER BY the numbering is not deterministic, and some databases, such as SQL Server, reject the query outright.

Different types of window functions in each category:

In SQL, there are several types of window functions that you can use to perform various calculations and analyses across a window of rows. Some of the commonly used window functions include:

  1. Ranking Functions:

    • ROW_NUMBER(): Assigns a unique sequential integer to each row within the window.

    • RANK(): Assigns a rank to each row within the window based on the ORDER BY clause, with ties receiving the same rank, and leaving gaps in the ranking sequence.

    • DENSE_RANK(): Similar to RANK(), but it does not leave gaps in the ranking sequence for tied rows.

  2. Aggregate Functions:

    • SUM(): Calculates the sum of a specified column over the window.

    • AVG(): Calculates the average of a specified column over the window.

    • MIN(): Returns the minimum value of a specified column within the window.

    • MAX(): Returns the maximum value of a specified column within the window.

    • COUNT(): Counts the number of rows in the window.

  3. Analytical Functions:

    • LEAD(): Retrieves the value from a specified offset after the current row within the window.

    • LAG(): Retrieves the value from a specified offset before the current row within the window.

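    • FIRST_VALUE(): Returns the first value in the window frame.

    • LAST_VALUE(): Returns the last value in the window frame. With ORDER BY and the default frame, the frame ends at the current row (and its peers), so this is not necessarily the last row of the partition.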

    • NTILE(): Divides the result set into a specified number of "tiles" or buckets.

  4. Other Window Functions:

    • PERCENT_RANK(): Calculates the relative rank of a row within the window as a value between 0 and 1.

    • CUME_DIST(): Calculates the cumulative distribution of a value within the window, that is, the fraction of rows that sort at or before the current row.

    • NTH_VALUE(): Returns the value of the expression in the Nth position of the window frame.
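
As a quick illustration of how the ranking functions treat ties, the sketch below compares RANK() and DENSE_RANK() using Python's built-in sqlite3 module (assuming SQLite 3.25 or newer). The amount column and the data are hypothetical, with two rows deliberately tied on amount.

```python
import sqlite3

# Hypothetical sample data: the amount column and all values are invented.
# Customers 101 and 102 are tied on amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (101, "2023-01-05", 300.0),
        (102, "2023-01-20", 300.0),
        (103, "2023-02-01", 150.0),
        (104, "2023-02-10", 100.0),
    ],
)

rows = conn.execute(
    """
    SELECT customer_id, amount,
           RANK()       OVER (ORDER BY amount DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY amount DESC) AS dense_rnk
    FROM sales
    ORDER BY amount DESC, customer_id
    """
).fetchall()

for row in rows:
    print(row)
```

Both tied rows receive rank 1. RANK() then skips to 3 for the next row, while DENSE_RANK() continues with 2.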

No need to worry: we will cover each type of function in an upcoming blog post.

If you paid attention, you will have noticed that the aggregate functions are also available with the GROUP BY clause.

The upcoming blog will cover the difference between using an aggregate function as a window function and using it with the GROUP BY clause.

I hope you now understand the basic building blocks of window functions.

Find a GitHub repository for creating the table.

THANK YOU FOR READING !!!!!

HAPPY ANALYSIS.