Using SQL CASE and UNION statements in data analysis

SQL_CASE_view1

The SQL CASE statement

The SQL CASE statement is a powerful conditional expression used to create or modify new columns based on specified conditions. It allows you to add logic to your SQL queries, enabling the transformation and categorization of data within the query itself.

 

Understanding SQL CASE

The CASE statement goes through conditions and returns a value when the first condition is met (like an if-else statement). Once a condition is true, it stops reading and returns the result. If no conditions are true, it returns the value in the ELSE clause. If there is no ELSE part and no conditions are true, it returns NULL.

Syntax: 

CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2

ELSE resultN
END

Example 1:

SELECT [invoice_no],
[price],
CASE
WHEN [price] > 1000.00 THEN ‘High’
WHEN [price] BETWEEN 500.00 AND 1000.00 THEN ‘Medium’
ELSE ‘Low’
END AS sale_category
FROM [brazecommerce].[dbo].[SalesData]

SQL_CASE_example1

Example 2:

SELECT [EmployeeKey],
[MaritalStatus],
[Gender],
[SickLeaveHours],
CASE
WHEN [SickLeaveHours] <= 10 THEN ‘excellent’
WHEN [SickLeaveHours] <= 20 THEN ‘good’
WHEN [SickLeaveHours] <= 40 THEN ‘fair’
WHEN [SickLeaveHours] <= 60 THEN ‘poor’
ELSE ‘very poor’
END AS health
FROM [AdventureWorksDW2022].[dbo].[DimEmployee]

SQL_CASE_example2

Example 3:

SELECT [EmployeeKey],
[StartDate],
CASE
   WHEN [DepartmentName] IS NULL THEN ‘Unknown’
   ELSE [DepartmentName]
END AS [DepartmentName]
FROM [AdventureWorksDW2022].[dbo].[DimEmployee];
 
SQL_CASE_example3

Example 4:

SELECT [EmployeeKey],
[FirstName],
[DepartmentName],
CASE
WHEN [HireDate] < ‘2004-01-01’ THEN ‘Legacy’
WHEN [HireDate] < ‘2014-01-01’ THEN ‘Veteran’
WHEN [HireDate] < ‘2020-01-01’ THEN ‘Seasoned’
WHEN [HireDate] < ‘2022-01-01’ THEN ‘Established’
WHEN [HireDate] < ‘2024-03-01’ THEN ‘Newcorner’
ELSE ‘Fresh Hire’
END AS employee_status
FROM [AdventureWorksDW2022].[dbo].[DimEmployee];
 
SQL_CASE_example4

The SQL UNION statement

The SQL UNION operator is a powerful tool used in data analysis to combine the results of two or more SELECT statements into a single result set. This is particularly useful when dealing with data from multiple tables or queries that need to be aggregated into a comprehensive view. 

Understanding SQL UNION

The UNION operator merges the results of two or more SELECT queries. Each SELECT statement within the UNION must have the same number of columns in the result sets with similar data types.

Key Characteristics:

  1. Combines Result Sets: UNION combines rows from different queries into a single result set.
  2. Eliminates Duplicates: By default, UNION removes duplicate rows. If you want to include duplicates, you can use UNION ALL.
  3. Column Compatibility: The corresponding columns in each SELECT statement must have compatible data types.
  4. Ordering: The ORDER BY clause can be used at the end of the UNION statement to sort the final result set.

Syntax: 

SELECT column1, column2, …
FROM table1
UNION
SELECT column1, column2, …
FROM table2;

Example 1:

SELECT ‘Country’ AS Country_language, [id], [first_name], [country]
FROM [AW_SSIS_DEV_DB].[dbo].[Sport_club_members]
UNION
SELECT ‘Language’, [id], [first_name], [language]
FROM [AW_SSIS_DEV_DB].[dbo].[Members];

SQL_UNION_sytnax_example2

UNION vs. UNION ALL

The choice between UNION and UNION ALL depends on whether you want to include duplicate rows.

  • UNION: Removes duplicate rows from the combined result set.
  • UNION ALL: Includes all rows from the combined result set, including duplicates. This is faster because it doesn’t require the database to check for and remove duplicates.

Understanding SQL UNION ALL

Syntax: 

SELECT column1, column2, …
FROM table1
UNION ALL
SELECT column1, column2, …
FROM table2;

Example 1:

SELECT ‘Customer’ AS Type, [customer_id], [customer_city]
FROM [brazecommerce].[dbo].[customers]
WHERE [customer_city] = ‘sao paulo’
UNION ALL
SELECT ‘Seller’, [seller_id], [seller_city]
FROM [brazecommerce].[dbo].[sellers]
WHERE [seller_city] = ‘sao paulo’
ORDER BY [customer_city];

SQL_UNION_ALL_sytnax_example3

Performance Considerations

Using UNION and UNION ALL can impact performance, especially with large datasets. Here are some tips to optimize performance:

  • Indexes: Ensure appropriate indexes on columns used in the SELECT statements to speed up query execution.
  • Filter Early: Apply WHERE conditions in individual SELECT statements to reduce the number of rows before applying UNION.
  • UNION ALL: Use UNION ALL when duplicates are not a concern, as it is faster than UNION.

Practical Applications in Data Analysis

  1. Data Consolidation: Combine similar data from different sources or tables into a single result set for comprehensive analysis.
  2. Historical Data Analysis: Merge data from different periods (e.g., monthly, yearly) to perform trend analysis.
  3. Reporting: Generate unified reports from various subsets of data, such as merging sales data from different regions.

Trademark Disclaimer:

All trademarks, logos, and brand names are the property of their respective owners. All company, product, and service names used in this website are for identification purposes only. Use of these names trademarks, and brands do not imply endorsement.

Leave a Reply