CONTENT
Title Component
Advanced SQL
Master advanced SQL techniques like EXISTS, CASE, and RECURSIVE with practical examples using Customers and Orders tables.
Saartje Ly
Data Engineering Intern
August 13, 2024
INTRO
Let's say we have a table called Customers and a table called Orders.
EXISTS
EXISTS is a logical operator used to test for the existence of any record in a subquery. It returns TRUE if the subquery returns one or more rows, and FALSE if the subquery returns no rows.
Suppose you want to find all customers who have placed at least one order.
SELECT
c.customer_id,
c.customer_name,
c.country
FROM
Customers c
WHERE
EXISTS (
SELECT 1
FROM Orders o
WHERE o.customer_id = c.customer_id
);
The outer query selects data from the Customers table
The EXISTS clause contains a subquery that checks for the existence of rows in the Orders table where customer_id matches the customer_id in the Customers table.
For each row in Customers, the subquery runs to see if there is a matching order
If a matching order exists, that customer is included in the result.
NOT EXISTS
NOT EXISTS is a logical operator to test for the absence of any records in a subquery. It returns TRUE if the subquery returns no rows, and FALSE if the subquery returns one or more rows.
Suppose you want to find customers from a specific country who have not placed any orders.
SELECT
c.customer_id,
c.customer_name,
c.country
FROM
Customers c
WHERE
c.country = 'USA'
AND NOT EXISTS (
SELECT 1
FROM Orders o
WHERE o.customer_id = c.customer_id
);
This query selects customers from the USA who do not have any corresponding entries in the Orders table. The subquery inside NOT EXISTS checks for the existence of orders with a matching customer_id.
EXCEPT
EXCEPT helps us filter out specific customers based on the difference between two sets.
Suppose you want to find all customers who have placed an order, but only those who are not from the USA.
SELECT
customer_id,
customer_name,
country
FROM
Customers
WHERE
customer_id IN (SELECT DISTINCT customer_id FROM Orders)
EXCEPT
SELECT
customer_id,
customer_name,
country
FROM
Customers
WHERE
country = 'USA';
The first part of the query selects customers who have placed orders. The EXCEPT operator removes those customers who are from the USA from the result.
COMBINING NOT EXISTS WITH EXCEPT
The combination of NOT EXISTS with EXCEPT achieves the "for all" effect.
Suppose we want to find customers from the USA who have placed orders for all available products.
SELECT
customer_name, country
FROM
Customers c
WHERE
country = 'USA'
AND NOT EXISTS (
SELECT product_id
FROM Products p
WHERE product_id = p.product_id
EXCEPT
(SELECT product_id
FROM Orders o
WHERE o.customer_id = c.customer_id)
);
In the outer query we select the customer_name and country from the Customers table for customers in the USA.
SELECT product_id FROM Products p selects all product IDs in the Products table
The EXCEPT operation subtracts the products that the customer has ordered from the complete list of products
If the result of EXCEPT is empty (meaning there are no products left that the customer hasn't ordered), NOT EXISTS returns TRUE indicating that the customer has ordered all available products
RANKING
Suppose we want to rank customers based on the total amount they've spent on orders.
SELECT
c.customer_id,
c.customer_name,
c.country,
SUM(o.amount) AS total_spent,
RANK() OVER (ORDER BY SUM(o.amount) DESC) AS spending_rank
FROM
Customers c
JOIN
Orders o
ON
c.customer_id = o.customer_id
GROUP BY
c.customer_id, c.customer_name, c.country;
This query sums the amount for each customer and ranks them based on their total spending. The RANK function creates a rank based on the total spent.
COMBINING THE WHERE AND HAVING CLAUSES
WHERE filters rows before any grouping or aggregation occurs, and operates on individual rows of the data.
HAVING filters groups after aggregation has occurred, and operates on grouped data (e.g., after GROUP BY)
Suppose we want to find out which countries have more than 3 customers who have placed orders totaling more than $100.
SELECT
country, COUNT(*)
FROM
Customers c
WHERE
c.customer_id IN (
SELECT
o.customer_id
FROM
Orders o
GROUP BY
o.customer_id
HAVING
SUM(o.amount) > 100
)
GROUP BY
country
HAVING
COUNT(*) > 3;
The outer query selects the country and counts the number of customers from that country who meet the criteria defined in the subquery
The WHERE clause filters the Customers table to include only those customers whose total order amount exceeds $100.
GROUP BY country
groups the results by country, allowing us to count the number of customers per country who meet the spending criterion.HAVING COUNT(*) > 3
filters out any countries with 3 or fewer customers who meet the criteria. Only countries with more than 3 such customers will be included in the final result.
CASE AND UPDATE
CASE functions like an "if-then-else" statement, allowing you to apply logic based on the values in your data.
UPDATE
Orders
SET
priority_level = CASE
WHEN amount < 50 THEN 'Low'
WHEN amount >= 50 AND amount < 100 THEN 'Medium'
WHEN amount >= 100 AND amount < 500 THEN 'High'
WHEN amount >= 500 THEN 'Critical'
END;
Updates the Orders table, setting a priority_level column
The CASE statement evaluates the amount column for each order, setting a value for the priority_level
RECURSIVE
RECURSIVE: A SQL query is considered recursive when it references itself in the query definition. This self-reference allows the query to perform repetitive operations until a termination condition is met.
WITH RECURSIVE CustomerSequence AS (
SELECT
customer_id
FROM
Customers
WHERE
customer_id = 1 -- Start with customer_id = 1
UNION
SELECT
customer_id + 1
FROM
CustomerSequence
WHERE
customer_id < (SELECT MAX(customer_id) FROM Customers)
)
SELECT
customer_id
FROM
CustomerSequence;
The base case starts with a specific customer_id, in this case 1.
The recursive part of the query adds 1 to the pervious customer_id value and continues until it reaches the maximum customer_id in the Customers table.
The final SELECT retrieves the sequence of customer IDs generated by the recursive query
CREATE ASSERTION
CREATE ASSERTION specifies a query that selects any tuples that violate the desired condition. Unlike a CHECK constraint which applies to a single table, an assertion can apply to multiple tables.
Suppose you want to enforce a rule that ensures no customer from the Customers table is allowed to place an order in the Orders table if their country is Australia.
CREATE ASSERTION no_australian_orders
CHECK (
NOT EXISTS (
SELECT 1
FROM Customers c
JOIN Orders o
ON c.customer_id = o.customer_id
WHERE c.country = 'Australia'
)
);
NOT EXISTS checks if there are any rows where a customer from Australia has placed an order.
SELECT 1 returns 1 for each row that meets this criteria
If NOT EXISTS returns FALSE (meaning the subquery found one or more rows), the assertion condition is violated.
CREATE TRIGGER
CREATE TRIGGER specifies the type of action to be taken when certain events occur and when certain conditions are satisfied.
Suppose we want to ensure that every time a new order is inserted into the Orders table, the order_status is automatically set to 'Pending' if no status is provided.
CREATE TRIGGER set_default_order_status
BEFORE INSERT ON Orders
FOR EACH ROW
BEGIN
IF NEW.order_status IS NULL THEN
SET NEW.order_status = 'Pending';
END IF;
END;
CREATE VIEW
CREATE VIEW makes a virtual table that is based on a SQL query. It doesn't store data itself but presents data from one or more tables in a specific way. Once the view is created, you can query it like a regular table.
Suppose you want to create a VIEW that shows a summary of each customer's total spending, including their name, country, and the total amount they have spent on all their orders.
CREATE VIEW CustomerSpending AS
SELECT
c.customer_id,
c.customer_name,
c.country,
SUM(o.amount) AS total_spent
FROM
Customers c
LEFT JOIN
Orders o ON c.customer_id = o.customer_id
GROUP BY
c.customer_id,
c.customer_name,
c.country;
INTRO
Let's say we have a table called Customers and a table called Orders.
EXISTS
EXISTS is a logical operator used to test for the existence of any record in a subquery. It returns TRUE if the subquery returns one or more rows, and FALSE if the subquery returns no rows.
Suppose you want to find all customers who have placed at least one order.
SELECT
c.customer_id,
c.customer_name,
c.country
FROM
Customers c
WHERE
EXISTS (
SELECT 1
FROM Orders o
WHERE o.customer_id = c.customer_id
);
The outer query selects data from the Customers table
The EXISTS clause contains a subquery that checks for the existence of rows in the Orders table where customer_id matches the customer_id in the Customers table.
For each row in Customers, the subquery runs to see if there is a matching order
If a matching order exists, that customer is included in the result.
NOT EXISTS
NOT EXISTS is a logical operator to test for the absence of any records in a subquery. It returns TRUE if the subquery returns no rows, and FALSE if the subquery returns one or more rows.
Suppose you want to find customers from a specific country who have not placed any orders.
SELECT
c.customer_id,
c.customer_name,
c.country
FROM
Customers c
WHERE
c.country = 'USA'
AND NOT EXISTS (
SELECT 1
FROM Orders o
WHERE o.customer_id = c.customer_id
);
This query selects customers from the USA who do not have any corresponding entries in the Orders table. The subquery inside NOT EXISTS checks for the existence of orders with a matching customer_id.
EXCEPT
EXCEPT helps us filter out specific customers based on the difference between two sets.
Suppose you want to find all customers who have placed an order, but only those who are not from the USA.
SELECT
customer_id,
customer_name,
country
FROM
Customers
WHERE
customer_id IN (SELECT DISTINCT customer_id FROM Orders)
EXCEPT
SELECT
customer_id,
customer_name,
country
FROM
Customers
WHERE
country = 'USA';
The first part of the query selects customers who have placed orders. The EXCEPT operator removes those customers who are from the USA from the result.
COMBINING NOT EXISTS WITH EXCEPT
The combination of NOT EXISTS with EXCEPT achieves the "for all" effect.
Suppose we want to find customers from the USA who have placed orders for all available products.
SELECT
customer_name, country
FROM
Customers c
WHERE
country = 'USA'
AND NOT EXISTS (
SELECT product_id
FROM Products p
WHERE product_id = p.product_id
EXCEPT
(SELECT product_id
FROM Orders o
WHERE o.customer_id = c.customer_id)
);
In the outer query we select the customer_name and country from the Customers table for customers in the USA.
SELECT product_id FROM Products p selects all product IDs in the Products table
The EXCEPT operation subtracts the products that the customer has ordered from the complete list of products
If the result of EXCEPT is empty (meaning there are no products left that the customer hasn't ordered), NOT EXISTS returns TRUE indicating that the customer has ordered all available products
RANKING
Suppose we want to rank customers based on the total amount they've spent on orders.
SELECT
c.customer_id,
c.customer_name,
c.country,
SUM(o.amount) AS total_spent,
RANK() OVER (ORDER BY SUM(o.amount) DESC) AS spending_rank
FROM
Customers c
JOIN
Orders o
ON
c.customer_id = o.customer_id
GROUP BY
c.customer_id, c.customer_name, c.country;
This query sums the amount for each customer and ranks them based on their total spending. The RANK function creates a rank based on the total spent.
COMBINING THE WHERE AND HAVING CLAUSES
WHERE filters rows before any grouping or aggregation occurs, and operates on individual rows of the data.
HAVING filters groups after aggregation has occurred, and operates on grouped data (e.g., after GROUP BY)
Suppose we want to find out which countries have more than 3 customers who have placed orders totaling more than $100.
SELECT
country, COUNT(*)
FROM
Customers c
WHERE
c.customer_id IN (
SELECT
o.customer_id
FROM
Orders o
GROUP BY
o.customer_id
HAVING
SUM(o.amount) > 100
)
GROUP BY
country
HAVING
COUNT(*) > 3;
The outer query selects the country and counts the number of customers from that country who meet the criteria defined in the subquery
The WHERE clause filters the Customers table to include only those customers whose total order amount exceeds $100.
GROUP BY country
groups the results by country, allowing us to count the number of customers per country who meet the spending criterion.HAVING COUNT(*) > 3
filters out any countries with 3 or fewer customers who meet the criteria. Only countries with more than 3 such customers will be included in the final result.
CASE AND UPDATE
CASE functions like an "if-then-else" statement, allowing you to apply logic based on the values in your data.
UPDATE
Orders
SET
priority_level = CASE
WHEN amount < 50 THEN 'Low'
WHEN amount >= 50 AND amount < 100 THEN 'Medium'
WHEN amount >= 100 AND amount < 500 THEN 'High'
WHEN amount >= 500 THEN 'Critical'
END;
Updates the Orders table, setting a priority_level column
The CASE statement evaluates the amount column for each order, setting a value for the priority_level
RECURSIVE
RECURSIVE: A SQL query is considered recursive when it references itself in the query definition. This self-reference allows the query to perform repetitive operations until a termination condition is met.
WITH RECURSIVE CustomerSequence AS (
SELECT
customer_id
FROM
Customers
WHERE
customer_id = 1 -- Start with customer_id = 1
UNION
SELECT
customer_id + 1
FROM
CustomerSequence
WHERE
customer_id < (SELECT MAX(customer_id) FROM Customers)
)
SELECT
customer_id
FROM
CustomerSequence;
The base case starts with a specific customer_id, in this case 1.
The recursive part of the query adds 1 to the pervious customer_id value and continues until it reaches the maximum customer_id in the Customers table.
The final SELECT retrieves the sequence of customer IDs generated by the recursive query
CREATE ASSERTION
CREATE ASSERTION specifies a query that selects any tuples that violate the desired condition. Unlike a CHECK constraint which applies to a single table, an assertion can apply to multiple tables.
Suppose you want to enforce a rule that ensures no customer from the Customers table is allowed to place an order in the Orders table if their country is Australia.
CREATE ASSERTION no_australian_orders
CHECK (
NOT EXISTS (
SELECT 1
FROM Customers c
JOIN Orders o
ON c.customer_id = o.customer_id
WHERE c.country = 'Australia'
)
);
NOT EXISTS checks if there are any rows where a customer from Australia has placed an order.
SELECT 1 returns 1 for each row that meets this criteria
If NOT EXISTS returns FALSE (meaning the subquery found one or more rows), the assertion condition is violated.
CREATE TRIGGER
CREATE TRIGGER specifies the type of action to be taken when certain events occur and when certain conditions are satisfied.
Suppose we want to ensure that every time a new order is inserted into the Orders table, the order_status is automatically set to 'Pending' if no status is provided.
CREATE TRIGGER set_default_order_status
BEFORE INSERT ON Orders
FOR EACH ROW
BEGIN
IF NEW.order_status IS NULL THEN
SET NEW.order_status = 'Pending';
END IF;
END;
CREATE VIEW
CREATE VIEW makes a virtual table that is based on a SQL query. It doesn't store data itself but presents data from one or more tables in a specific way. Once the view is created, you can query it like a regular table.
Suppose you want to create a VIEW that shows a summary of each customer's total spending, including their name, country, and the total amount they have spent on all their orders.
CREATE VIEW CustomerSpending AS
SELECT
c.customer_id,
c.customer_name,
c.country,
SUM(o.amount) AS total_spent
FROM
Customers c
LEFT JOIN
Orders o ON c.customer_id = o.customer_id
GROUP BY
c.customer_id,
c.customer_name,
c.country;
INTRO
Let's say we have a table called Customers and a table called Orders.
EXISTS
EXISTS is a logical operator used to test for the existence of any record in a subquery. It returns TRUE if the subquery returns one or more rows, and FALSE if the subquery returns no rows.
Suppose you want to find all customers who have placed at least one order.
SELECT
c.customer_id,
c.customer_name,
c.country
FROM
Customers c
WHERE
EXISTS (
SELECT 1
FROM Orders o
WHERE o.customer_id = c.customer_id
);
The outer query selects data from the Customers table
The EXISTS clause contains a subquery that checks for the existence of rows in the Orders table where customer_id matches the customer_id in the Customers table.
For each row in Customers, the subquery runs to see if there is a matching order
If a matching order exists, that customer is included in the result.
NOT EXISTS
NOT EXISTS is a logical operator to test for the absence of any records in a subquery. It returns TRUE if the subquery returns no rows, and FALSE if the subquery returns one or more rows.
Suppose you want to find customers from a specific country who have not placed any orders.
SELECT
c.customer_id,
c.customer_name,
c.country
FROM
Customers c
WHERE
c.country = 'USA'
AND NOT EXISTS (
SELECT 1
FROM Orders o
WHERE o.customer_id = c.customer_id
);
This query selects customers from the USA who do not have any corresponding entries in the Orders table. The subquery inside NOT EXISTS checks for the existence of orders with a matching customer_id.
EXCEPT
EXCEPT helps us filter out specific customers based on the difference between two sets.
Suppose you want to find all customers who have placed an order, but only those who are not from the USA.
SELECT
customer_id,
customer_name,
country
FROM
Customers
WHERE
customer_id IN (SELECT DISTINCT customer_id FROM Orders)
EXCEPT
SELECT
customer_id,
customer_name,
country
FROM
Customers
WHERE
country = 'USA';
The first part of the query selects customers who have placed orders. The EXCEPT operator removes those customers who are from the USA from the result.
COMBINING NOT EXISTS WITH EXCEPT
The combination of NOT EXISTS with EXCEPT achieves the "for all" effect.
Suppose we want to find customers from the USA who have placed orders for all available products.
SELECT
customer_name, country
FROM
Customers c
WHERE
country = 'USA'
AND NOT EXISTS (
SELECT product_id
FROM Products p
WHERE product_id = p.product_id
EXCEPT
(SELECT product_id
FROM Orders o
WHERE o.customer_id = c.customer_id)
);
In the outer query we select the customer_name and country from the Customers table for customers in the USA.
SELECT product_id FROM Products p selects all product IDs in the Products table
The EXCEPT operation subtracts the products that the customer has ordered from the complete list of products
If the result of EXCEPT is empty (meaning there are no products left that the customer hasn't ordered), NOT EXISTS returns TRUE indicating that the customer has ordered all available products
RANKING
Suppose we want to rank customers based on the total amount they've spent on orders.
SELECT
c.customer_id,
c.customer_name,
c.country,
SUM(o.amount) AS total_spent,
RANK() OVER (ORDER BY SUM(o.amount) DESC) AS spending_rank
FROM
Customers c
JOIN
Orders o
ON
c.customer_id = o.customer_id
GROUP BY
c.customer_id, c.customer_name, c.country;
This query sums the amount for each customer and ranks them based on their total spending. The RANK function creates a rank based on the total spent.
COMBINING THE WHERE AND HAVING CLAUSES
WHERE filters rows before any grouping or aggregation occurs, and operates on individual rows of the data.
HAVING filters groups after aggregation has occurred, and operates on grouped data (e.g., after GROUP BY)
Suppose we want to find out which countries have more than 3 customers who have placed orders totaling more than $100.
SELECT
country, COUNT(*)
FROM
Customers c
WHERE
c.customer_id IN (
SELECT
o.customer_id
FROM
Orders o
GROUP BY
o.customer_id
HAVING
SUM(o.amount) > 100
)
GROUP BY
country
HAVING
COUNT(*) > 3;
The outer query selects the country and counts the number of customers from that country who meet the criteria defined in the subquery
The WHERE clause filters the Customers table to include only those customers whose total order amount exceeds $100.
GROUP BY country
groups the results by country, allowing us to count the number of customers per country who meet the spending criterion.HAVING COUNT(*) > 3
filters out any countries with 3 or fewer customers who meet the criteria. Only countries with more than 3 such customers will be included in the final result.
CASE AND UPDATE
CASE functions like an "if-then-else" statement, allowing you to apply logic based on the values in your data.
UPDATE
Orders
SET
priority_level = CASE
WHEN amount < 50 THEN 'Low'
WHEN amount >= 50 AND amount < 100 THEN 'Medium'
WHEN amount >= 100 AND amount < 500 THEN 'High'
WHEN amount >= 500 THEN 'Critical'
END;
Updates the Orders table, setting a priority_level column
The CASE statement evaluates the amount column for each order, setting a value for the priority_level
RECURSIVE
RECURSIVE: A SQL query is considered recursive when it references itself in the query definition. This self-reference allows the query to perform repetitive operations until a termination condition is met.
WITH RECURSIVE CustomerSequence AS (
SELECT
customer_id
FROM
Customers
WHERE
customer_id = 1 -- Start with customer_id = 1
UNION
SELECT
customer_id + 1
FROM
CustomerSequence
WHERE
customer_id < (SELECT MAX(customer_id) FROM Customers)
)
SELECT
customer_id
FROM
CustomerSequence;
The base case starts with a specific customer_id, in this case 1.
The recursive part of the query adds 1 to the pervious customer_id value and continues until it reaches the maximum customer_id in the Customers table.
The final SELECT retrieves the sequence of customer IDs generated by the recursive query
CREATE ASSERTION
CREATE ASSERTION specifies a query that selects any tuples that violate the desired condition. Unlike a CHECK constraint which applies to a single table, an assertion can apply to multiple tables.
Suppose you want to enforce a rule that ensures no customer from the Customers table is allowed to place an order in the Orders table if their country is Australia.
CREATE ASSERTION no_australian_orders
CHECK (
NOT EXISTS (
SELECT 1
FROM Customers c
JOIN Orders o
ON c.customer_id = o.customer_id
WHERE c.country = 'Australia'
)
);
NOT EXISTS checks if there are any rows where a customer from Australia has placed an order.
SELECT 1 returns 1 for each row that meets this criteria
If NOT EXISTS returns FALSE (meaning the subquery found one or more rows), the assertion condition is violated.
CREATE TRIGGER
CREATE TRIGGER specifies the type of action to be taken when certain events occur and when certain conditions are satisfied.
Suppose we want to ensure that every time a new order is inserted into the Orders table, the order_status is automatically set to 'Pending' if no status is provided.
CREATE TRIGGER set_default_order_status
BEFORE INSERT ON Orders
FOR EACH ROW
BEGIN
IF NEW.order_status IS NULL THEN
SET NEW.order_status = 'Pending';
END IF;
END;
CREATE VIEW
CREATE VIEW makes a virtual table that is based on a SQL query. It doesn't store data itself but presents data from one or more tables in a specific way. Once the view is created, you can query it like a regular table.
Suppose you want to create a VIEW that shows a summary of each customer's total spending, including their name, country, and the total amount they have spent on all their orders.
CREATE VIEW CustomerSpending AS
SELECT
c.customer_id,
c.customer_name,
c.country,
SUM(o.amount) AS total_spent
FROM
Customers c
LEFT JOIN
Orders o ON c.customer_id = o.customer_id
GROUP BY
c.customer_id,
c.customer_name,
c.country;
CONTENT
Title Component
SHARE