SQL Joins on NULL and Duplicate Values

SQL joins are a powerful way to combine data from multiple tables. But what happens when the join columns contain NULL or duplicate values? In this blog post, we will explore how SQL joins behave with NULL data and duplicate data.

What is Null Data?

NULL is a special marker that represents a missing or unknown value. It is not the same as zero, and it is not the same as an empty string. NULL simply means that the value is unknown.

How to use SQL joins with Null Data?

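When you join two tables on columns that contain NULL, the NULL values will not match each other, because the comparison NULL = NULL does not evaluate to true. In an INNER JOIN, rows with NULL in the join column are therefore left out of the result set. An outer join still returns such rows from the table it preserves, but without a match from the other side.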
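Here is a minimal sketch of that comparison rule, run through Python's built-in sqlite3 module. The choice of SQLite is an assumption made only to keep the example runnable; other databases follow the same rule for the = operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), not TRUE; sqlite3 returns it as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the predicate that actually tests for a missing value.
null_is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

print(null_equals_null)  # None
print(null_is_null)      # 1
conn.close()
```

Because a join condition only keeps rows where the comparison is true, an unknown result means no match.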

How does Duplicate Data affect SQL joins?

When you join two tables that have duplicate data, each matching row is repeated once for every matching row in the other table. If a value appears m times in one table and n times in the other, the join returns m × n rows for that value. We will see this with the example below.
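The multiplication effect can be sketched with two tiny tables. The names left_t, right_t and k are hypothetical and exist only for this illustration; the example uses Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_t (k INTEGER);
    CREATE TABLE right_t (k INTEGER);
    INSERT INTO left_t VALUES (7), (7);          -- key 7 appears 2 times
    INSERT INTO right_t VALUES (7), (7), (7);    -- key 7 appears 3 times
""")

# Every left row pairs with every matching right row: 2 * 3 = 6 rows.
row_count = conn.execute(
    "SELECT COUNT(*) FROM left_t l INNER JOIN right_t r ON l.k = r.k"
).fetchone()[0]

print(row_count)  # 6
conn.close()
```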

The Scenario:

Let's say we have two tables: table1 and table2. We want to join table1 and table2 on their respective columns id1 and id2.

As you can see, we have duplicate and NULL values in both table1 and table2.

We will look at the different types of join using this example.

INNER JOIN:

The INNER JOIN combines rows from both tables where the join condition is satisfied. It returns only the matching rows.

SELECT *
FROM table1 t1 INNER JOIN table2 t2
ON t1.id1 = t2.id2;

Output:

Explanation of the above output:

As you can see from the picture above, each 1 in table1 is matched with every 1 in table2. So in total we get 6 output rows (3 pink arrows + 3 purple arrows). The NULL values are not matched even though they are present in both tables, because NULL cannot be compared to NULL with the = operator.
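To make the count concrete, here is a minimal sketch using Python's built-in sqlite3 module. The rows inserted into table1 and table2 are hypothetical, since the original tables are only shown in the picture. They were chosen so that table1 has two 1s and table2 has three 1s, which fits the arrow counts described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id1 INTEGER);
    CREATE TABLE table2 (id2 INTEGER);
    -- Hypothetical rows: two 1s, two unmatched values, two NULLs.
    INSERT INTO table1 VALUES (1), (1), (2), (3), (NULL), (NULL);
    -- Hypothetical rows: three 1s, one unmatched value, two NULLs.
    INSERT INTO table2 VALUES (1), (1), (1), (4), (NULL), (NULL);
""")

inner_rows = conn.execute("""
    SELECT t1.id1, t2.id2
    FROM table1 t1 INNER JOIN table2 t2
    ON t1.id1 = t2.id2
""").fetchall()

# Only the 1s match: 2 rows * 3 rows = 6 pairs. The NULLs never pair up.
print(len(inner_rows))  # 6
conn.close()
```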

LEFT JOIN:

The left join returns all rows from the left table (the table specified before the JOIN keyword) and the matching rows from the right table. If no match is found, NULL values are returned for the columns of the right table.

SELECT *
FROM table1 t1 LEFT JOIN table2 t2
ON t1.id1 = t2.id2;

Output:

Explanation of the above output:

As you can see from the picture above, each 1 in table1 is matched with every 1 in table2, and the remaining rows of the left table are returned as well. So in total we get 10 output rows (3 pink arrows + 3 purple arrows + 4 boxes of table1).
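The same count can be reproduced with a small sketch in Python's built-in sqlite3 module. The table contents are hypothetical stand-ins for the pictured data, chosen so that table1 has two 1s plus four rows without a partner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id1 INTEGER);
    CREATE TABLE table2 (id2 INTEGER);
    -- Hypothetical rows, not the original post's data.
    INSERT INTO table1 VALUES (1), (1), (2), (3), (NULL), (NULL);
    INSERT INTO table2 VALUES (1), (1), (1), (4), (NULL), (NULL);
""")

left_rows = conn.execute("""
    SELECT t1.id1, t2.id2
    FROM table1 t1 LEFT JOIN table2 t2
    ON t1.id1 = t2.id2
""").fetchall()

# Rows from table1 that found no partner carry NULL (None) in the table2 column.
unmatched_left = [row for row in left_rows if row[1] is None]

print(len(left_rows))       # 10
print(len(unmatched_left))  # 4
conn.close()
```

Note that the two NULL rows of table1 are kept by the left join, but they are kept as unmatched rows, not as matches with the NULLs of table2.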

💡

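This may seem confusing for programmers who are used to comparing null values, but in a database NULL = NULL does not evaluate to true. The result of the comparison is unknown (NULL), so the rows do not match.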

RIGHT JOIN:

The right join returns all rows from the right table (the table specified after the JOIN keyword) and the matching rows from the left table. If no match is found, NULL values are returned for the columns of the left table.

SELECT *
FROM table1 t1 RIGHT JOIN table2 t2
ON t1.id1 = t2.id2;

Output:

Explanation of the above output:

As you can see from the picture above, each 1 in table1 is matched with every 1 in table2, and the remaining rows of the right table are returned as well. So in total we get 9 output rows (3 pink arrows + 3 purple arrows + 3 boxes of table2).
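Here is a hedged sketch of the same count in Python's built-in sqlite3 module. Two assumptions apply: the table contents are hypothetical, and the right join is written as a LEFT JOIN with the tables swapped, because SQLite versions before 3.39 do not support RIGHT JOIN. The two forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id1 INTEGER);
    CREATE TABLE table2 (id2 INTEGER);
    -- Hypothetical rows, not the original post's data.
    INSERT INTO table1 VALUES (1), (1), (2), (3), (NULL), (NULL);
    INSERT INTO table2 VALUES (1), (1), (1), (4), (NULL), (NULL);
""")

# Equivalent to: table1 t1 RIGHT JOIN table2 t2 ON t1.id1 = t2.id2
right_rows = conn.execute("""
    SELECT t1.id1, t2.id2
    FROM table2 t2 LEFT JOIN table1 t1
    ON t1.id1 = t2.id2
""").fetchall()

# Rows from table2 that found no partner carry NULL (None) in the table1 column.
unmatched_right = [row for row in right_rows if row[0] is None]

print(len(right_rows))       # 9
print(len(unmatched_right))  # 3
conn.close()
```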

FULL OUTER JOIN:

The Full Outer Join, also known as Full Join, combines the result sets of both the Left Join and Right Join. It returns all rows from both tables and includes NULL values for non-matching rows. Full Outer Join is helpful when you need to retrieve all records from both tables, regardless of matches.

SELECT *
FROM table1 t1 FULL JOIN table2 t2
ON t1.id1 = t2.id2;

Output:

Explanation of the above output:

As you can see from the picture above, each 1 in table1 is matched with every 1 in table2, and the remaining rows of both table1 and table2 are returned as well. So in total we get 13 output rows (3 pink arrows + 3 purple arrows + 4 boxes of table1 + 3 boxes of table2).
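The arithmetic can be checked with a small sketch in Python's built-in sqlite3 module. The table contents are hypothetical, and the full join is emulated as a left join plus the unmatched rows of table2, an assumption made so the example also runs on SQLite versions without FULL JOIN support. With this data the parts add up to 6 matches + 4 unmatched rows of table1 + 3 unmatched rows of table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id1 INTEGER);
    CREATE TABLE table2 (id2 INTEGER);
    -- Hypothetical rows, not the original post's data.
    INSERT INTO table1 VALUES (1), (1), (2), (3), (NULL), (NULL);
    INSERT INTO table2 VALUES (1), (1), (1), (4), (NULL), (NULL);
""")

# Part 1: all table1 rows with their matches (the left join).
# Part 2: table2 rows that matched nothing (t1.id1 is NULL after the join).
full_rows = conn.execute("""
    SELECT t1.id1, t2.id2
    FROM table1 t1 LEFT JOIN table2 t2 ON t1.id1 = t2.id2
    UNION ALL
    SELECT t1.id1, t2.id2
    FROM table2 t2 LEFT JOIN table1 t1 ON t1.id1 = t2.id2
    WHERE t1.id1 IS NULL
""").fetchall()

matched = [row for row in full_rows if row[0] is not None and row[1] is not None]

print(len(matched))    # 6
print(len(full_rows))  # 13
conn.close()
```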

💡

In real-world databases, it is not ideal to have NULL or duplicate values in the joining columns of a table. NULL values prevent rows from matching, and duplicate values multiply the rows in the result, which can skew counts and totals.

Handling Null Values

Use the ISNULL() function: In MySQL, the ISNULL() function takes a value as input and returns 1 if the value is NULL, or 0 if it is not. SQL Server's ISNULL() is different: it takes two arguments and returns the second one when the first is NULL. The portable way to check for NULL values in your queries is the IS NULL predicate.
Use the COALESCE() function: The COALESCE() function takes a list of values as input and returns the first non-NULL value. You can use the COALESCE() function to replace NULL values with default values.
Use the NULLIF() function: The NULLIF() function takes two values as input and returns NULL if the two values are equal, or the first value if they are not. You can use the NULLIF() function to turn a placeholder value, such as an empty string, into NULL.
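A quick sketch of these helpers, run through Python's built-in sqlite3 module. SQLite has no ISNULL() function, so the sketch uses the IS NULL predicate for the check; the literal values are arbitrary and only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

is_null, coalesced, nullif_equal, nullif_diff = conn.execute("""
    SELECT NULL IS NULL,                      -- 1: the value is NULL
           COALESCE(NULL, NULL, 'fallback'),  -- first non-NULL argument
           NULLIF(5, 5),                      -- equal inputs give NULL
           NULLIF(5, 3)                       -- different inputs give the first one
""").fetchone()

print(is_null)       # 1
print(coalesced)     # fallback
print(nullif_equal)  # None
print(nullif_diff)   # 5
conn.close()
```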

Handling Duplicate Values

Use the DISTINCT keyword: The DISTINCT keyword tells SQL to return only the unique rows from a result set. You can use the DISTINCT keyword to remove duplicate rows from your query results.
Use the GROUP BY clause: The GROUP BY clause tells SQL to group rows together based on a common value. You can use the GROUP BY clause with COUNT(*) to count how many times each value occurs in a table.
Use the HAVING clause: The HAVING clause lets you filter the groups produced by GROUP BY. You can use the HAVING clause with a condition such as COUNT(*) > 1 to find the values that occur more than once.
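The three tools above can be sketched as follows with Python's built-in sqlite3 module. The rows in table1 are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id1 INTEGER);
    -- Hypothetical rows: 1 and NULL each appear twice.
    INSERT INTO table1 VALUES (1), (1), (2), (3), (NULL), (NULL);
""")

# DISTINCT collapses repeated values; the two NULLs also collapse into one row.
distinct_ids = conn.execute("SELECT DISTINCT id1 FROM table1").fetchall()

# GROUP BY + HAVING keeps only the values that occur more than once.
duplicated = conn.execute("""
    SELECT id1, COUNT(*) AS occurrences
    FROM table1
    GROUP BY id1
    HAVING COUNT(*) > 1
""").fetchall()

print(len(distinct_ids))   # 4
print(sorted(duplicated, key=lambda row: row[0] is not None))
conn.close()
```

One detail worth noticing: although NULL = NULL is not true in a join condition, DISTINCT and GROUP BY treat all NULLs as one group, so NULL shows up here as a duplicated value.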

Conclusion:

Here are some key takeaways from the blog post:

Null and duplicate data can affect the results of SQL joins.
There are different ways that SQL joins handle null and duplicate data.
There are a number of functions and clauses that can be used to handle null and duplicate data in SQL joins.
It is important to choose the right functions and clauses for the data you are joining.

I hope you found this blog post useful. In the next section, we will see how to filter joined tables. Make sure to check it out.

GitHub link for code

Thank you for reading!

HAPPY ANALYSIS !!!!!!!!!!!