Advanced SQL Queries you must know
Published on: December 24, 2024
> The Top 8 Advanced SQL commands for Data Analysis and Data Engineering!
If you are into Data Analysis for a while now, you must be already aware of the basic commands such as SELECT,INSERT, UPDATE and DELETE etc, though they may be the most used it is also good to know the below queries for some deep diving into data.
1. Window Functions:
Window functions are used for calculations across a set of rows related to the current row. Let’s consider an example where we calculate the running total of sales using the SUM( ) function with the OVER( ) clause.Let us imagine we have a sales data table ‘Sales_Data’ that records the sales amounts on various dates. We want to calculate the running total of sales for each date, which means the total sales up to and including each date.
SELECT date,sales,SUM(sales) OVER (ORDER BY date) AS running_total FROM sales_data;
This query provides a running total of sales over time, making it easy to track cumulative sales growth.
Output:
date |sales |running_total
2023–01–01 | 100 | 100
2023–01–02 | 150 | 250
2023–01–03 | 200 | 450
2023–01–04 | 250 | 700
Window functions can be used for various tasks such as calculating running totals, moving averages, ranks, and more, without collapsing the result set into a single row per group.
2. Common Table Expressions (CTEs):
CTEs give a way to create temporary result sets that can be referenced within a query. They improve
readability and simplify complex queries. Here’s how we can use a CTE to calculate the total revenue for
each product category.
WITH category_revenue AS (
SELECT category, SUM(revenue) AS total_revenue
FROM sales
GROUP BY category
)
SELECT * FROM category_revenue;
The query defines a CTE named ‘category_revenue’. It calculates the total revenue for each category by summing up the revenue from the sales table and grouping the results by the category column.The main query selects all columns from the ‘category_revenue’ CTE, effectively displaying the calculated total revenue for each category.
Output:
category |total_revenue
A |5000
B |7000
C|4500
3. Recursive Queries:
Recursive queries enable travel of hierarchical data structures like organisational charts or bill of materials. Suppose we have a table representing employee relationships, and we want to find all the subordinates of a given manager.
WITH RECURSIVE subordinates AS (
SELECT employee_id, name, manager_id
FROM employees
WHERE manager_id = ‘manager_id_of_interest’
UNION ALL
SELECT e.employee_id, e.name, e.manager_id
FROM employees e
JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT * FROM subordinates;
This recursive CTE finds all employees who report directly or indirectly to a specific manager ‘manager_id_of_interest’. It starts with employees directly reporting to the manager and then recursively finds their subordinates, building the hierarchy.
Output:
employee_id |name |manager_id
2|Alice |manager_id_of_interest
3|Bob |2
4|Charlie |3
4. Pivot Tables:
Pivot tables transform rows into columns, summarising data in a tabular format. Let’s say we have a table containing sales data, and we want to pivot the data to display total sales for each product in different months.
SELECT product,
SUM(CASE WHEN month = ‘Jan’ THEN sales ELSE 0 END) AS Jan,
SUM(CASE WHEN month = ‘Feb’ THEN sales ELSE 0 END) AS Feb,
SUM(CASE WHEN month = ‘Mar’ THEN sales ELSE 0 END) AS Mar
FROM sales_data
GROUP BY product;
This query aggregates sales data for each product by month using conditional aggregation. It sums the sales for January, February, and March separately for each product, resulting in a table showing total sales per product for these months.
Output:
Product | Jan | Feb | Mar
Product A | 100 | 200 | 150
Product B | 80 | 190 | 220
Product C | 60 | 140 | 130
5.Analytic Functions
Analytic functions compute aggregate values based on a group of rows. For example, we can use the ROW_NUMBER( ) function to assign a unique row number to each record in a dataset.
SELECT customer_id, order_id,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rank
FROM orders;
This query assigns a unique rank to each order per customer based on the order date using the ROW_NUMBER( ) window function. The result shows the sequence of orders placed by each customer.
Output:
customer_id | order_id | order_rank
1 | 101 | 1
1 | 102 | 2
2 | 201 | 1
2 | 202 | 2
2 | 203 | 3
6. Conditional Aggregation:
Conditional aggregation involves applying aggregate functions conditionally based on specified criteria. For example, we might want to calculate the average sales amount only for orders placed by repeat customers.
SELECT customer_id,
AVG(CASE WHEN order_count > 1 THEN order_total ELSE NULL END) AS avg_sales_repeat_customers
FROM (
SELECT customer_id, COUNT(*) AS order_count, SUM(order_total) AS order_total
FROM orders
GROUP BY customer_id
) AS customer_orders;
This query calculates the average order total for customers who have placed more than one order. It aggregates the order count and total order amount for each customer, then computes the average for repeat customers.
Output:
customer_id | avg_sales_repeat_customers
1 | 250
2 | 150
3 | 300
7. Merge Statements
Merge statements (also known as UPSERT or ON DUPLICATE KEY UPDATE) enable us to insert, update, or delete records in a target table based on the results of a join with a source table. Let’s say we want to synchronise two tables containing customer data.
MERGE INTO customers_target
USING customers_source s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
UPDATE SET t.name = s.name, t.email = s.email
WHEN NOT MATCHED THEN
INSERT (customer_id, name, email) VALUES (s.customer_id, s.name, s.email);
customers_target (before merge):
customer_id | name | email
1 | John Doe | john@example.com
2 | Jane Smith | jane@example.com
customers_source
customer_id | name | email
2 | Jane Johnson | jane.j@example.com
3 | Alice Brown | alice@example.com
Output:
customers_target (after merge):
customer_id | name | email
1 | John Doe | john@example.com
2 | Jane Johnson | jane.j@example.com
3 | Alice Brown | alice@example.com
8. Grouping Sets
Grouping sets allow for the aggregation of data at multiple levels of granularity in a single query. Let’s say we want to calculate the total sales revenue by month and year.
SELECT YEAR(order_date) AS year, MONTH(order_date) AS month, SUM(sales_amount) AS total_revenue
FROM sales
GROUP BY GROUPING SETS ((YEAR(order_date), MONTH(order_date)), YEAR(order_date), MONTH(order_date));
Example Data in the sales table:
order_date | sales_amount
2023–01–15 | 1000
2023–01–20 | 1500
2023–02–10 | 2000
2023–03–05 | 2500
2024–01–10 | 3000
2024–01–20 | 3500
2024–02–25 | 4000
Output:
year | month | total_revenue
2023 | 1 | 2500
2023 | 2 | 2000
2023 | 3 | 2500
2024 | 1 | 6500
2024 | 2 | 4000
2023 | NULL | 7000
2024 | NULL | 10500
NULL | 1 | 9000
NULL | 2 | 6000
NULL | 3 | 2500
The query groups sales data by year and month, by year only, and by month only using GROUPING SETS. This results in subtotals for each month of each year, overall totals for each year, and overall totals for each month across all years.
Using these 8 advanced SQL techniques, you can solve complex data problems with ease and accuracy. Whether you’re a data analyst, engineer, or scientist, improving your SQL skills will make you much better at handling data.
Happy Data Analysis!