Traps and Tricks: The Art of Filtering Joined Data in SQL

Traps and Tricks: 
The Art of Filtering Joined Data in SQL

As an SQL developer, you may be well-versed in the power and versatility of JOINs, essential for combining data from multiple tables. While JOINs are indispensable tools, filtering joined data introduces a set of challenges and pitfalls that even seasoned professionals may encounter. Filtering joined data plays a pivotal role in shaping the accuracy and specificity of your SQL queries. It enables you to retrieve precisely the data you need, narrowing down results based on specific criteria. We'll start with a concise overview of JOINs, both INNER JOIN and LEFT JOIN, to lay a solid foundation. Then, we'll delve into the challenges that arise when applying filtering conditions to your JOINed data.

Table Introduction: Instagram Examples

In this table introduction, we present two essential tables from our Instagram examples: the users table and the posts_upload table.

In users table, we store key attributes of users, such as their unique user_id, chosen username, number of followers and following, and an intriguing measure of their online presence - the reputation.

In posts_upload table each entry corresponds to a distinct post and is linked to its creator through the user_id attribute, which establishes a connection with the users table. Alongside the caption, the likes, comments, and creation_date provide vital insights into the post's popularity and the timeline of its creation.

Part 1: The Solid Foundation - INNER JOIN vs. LEFT JOIN

1. INNER JOIN:

  • INNER JOIN combines rows from two tables based on a common column, known as the join condition.

  • It filters out rows that do not have matching values in both tables.

  • The result contains only the rows that meet the matching criteria from both tables.

      SELECT u.*,pu.post_id,pu.creation_date
      FROM users as u JOIN  posts_upload as pu 
      ON u.user_id = pu.user_id;
    

    output:

When we did Inner Join, user_id the present in both the table are matched.

2. LEFT JOIN:

  • LEFT JOIN returns all rows from the left table and matching rows from the right table.

  • When there's no match in the right table, it fills the result with NULL values for the right table columns.

  • This makes LEFT JOIN perfect for keeping all rows from the left table, even if there are no corresponding matches in the right table.

      SELECT u.*,pu.post_id,pu.creation_date
      FROM users as u LEFT JOIN  posts_upload as pu 
      ON u.user_id = pu.user_id;
    

    output:

The user_id 100 has not created any posts yet.

Part 2: Accidental INNER JOIN - Beware the WHERE Clause!

Now that we've established the foundation, let's dive into the hidden intricacies of LEFT JOIN. Brace yourself for the accidental INNER JOIN!

Assume you want to see the users who have not created any posts between the '2023-07-16' and '2023-07-18'

SELECT u.*,pu.post_id,pu.creation_date
FROM users as u LEFT JOIN  posts_upload as pu 
ON u.user_id = pu.user_id
WHERE pu.creation_date NOT  BETWEEN '2023-07-16' AND '2023-07-18';

output:

There is a problem with this query because if you pay attention to the user with the user_id 100 is missing from the above output. yes, the user_id 100 has not yet created any posts which means he has to be in the above output but is not present.

This happened because SQL doesn't evaluate value comparisons to TRUE when one of the values being compared is NULL. Which makes the above query accidentally behave like an INNER JOIN.

💡
Have a look at the INNER join we created in the Part:1 section

Part 3: Preserving LEFT JOIN - Filter Conditions Done Right

It's time to correct our mistakes and discover the art of preserving the LEFT JOIN behavior while still applying filtering conditions.

1. Filter Conditions in the JOIN:

  • To preserve the LEFT JOIN and filter rows in the right table, move the condition from the WHERE clause to the JOIN condition.

  • This ensures the filtering occurs before the LEFT JOIN, retaining NULL values in the result.

      SELECT u.*,pu.post_id,pu.creation_date
      FROM users as u LEFT JOIN  posts_upload as pu 
      ON u.user_id = pu.user_id
      AND pu.creation_date NOT  BETWEEN '2023-07-16' AND '2023-07-18';
    

    output:

    Now have a look at the user with user_id 100, who is present in the above output and has not created any posts between the '2023-07-16' and '2023-07-18'.

2. Add NULL filter in the WHERE clause:

  • We can add a second condition that checks for rows that pu.user_id is NULL. This condition ensures that any users without corresponding posts in the posts_upload table are included in the result set.

      SELECT u.*,pu.post_id,pu.creation_date
      FROM users as u LEFT JOIN  posts_upload as pu 
      ON u.user_id = pu.user_id
      WHERE (pu.creation_date NOT  BETWEEN '2023-07-16' AND '2023-07-18')
      OR (pu.user_id IS NULL) ;
    

    output:

    The result will include all users, along with their corresponding posts except for those uploaded between '2023-07-16' and '2023-07-18'. Users without any posts (with NULL user_id in posts_upload) will also be included.

    Conclusion:

    In this blog post, we explored the challenges and pitfalls of filtering joined data, and we learned two crucial techniques to preserve the desired LEFT JOIN behavior.

    First, by moving the filtering conditions to the JOIN clause, we ensure that filtering occurs before the JOIN, avoiding accidental exclusions of rows. Second, adding a NULL filter in the WHERE clause enables us to include rows without matching data in the right table.

    Mastering these approaches empowers SQL developers to retrieve accurate and meaningful results, enhancing the precision and reliability of data analysis and reporting. By understanding the art of filtering joined data, we can make the most of SQL's power and versatility while avoiding common traps and pitfalls.

    I hope you found this blog post useful. In the next section, we will see the concept of Granularity in SQL. Make sure to check it.

    GitHub link for code

    Thank you for reading!

    HAPPY ANALYSIS !!!!!!!!!!!