Advanced SQL

Master advanced SQL techniques like EXISTS, CASE, and RECURSIVE with practical examples using Customers and Orders tables.

Saartje Ly

Data Engineering Intern

August 13, 2024

INTRO

Let's say we have a table called Customers and a table called Orders.



EXISTS

EXISTS is a logical operator used to test for the existence of any record in a subquery. It returns TRUE if the subquery returns one or more rows, and FALSE if the subquery returns no rows.

Suppose you want to find all customers who have placed at least one order.

SELECT 
    c.customer_id,
    c.customer_name,
    c.country
FROM 
    Customers c
WHERE 
    EXISTS (
        SELECT 1
        FROM Orders o
        WHERE o.customer_id = c.customer_id
    );
  • The outer query selects data from the Customers table

  • The EXISTS clause contains a subquery that checks for the existence of rows in the Orders table where customer_id matches the customer_id in the Customers table.

  • For each row in Customers, the subquery runs to see if there is a matching order

  • If a matching order exists, that customer is included in the result.


NOT EXISTS

NOT EXISTS is a logical operator to test for the absence of any records in a subquery. It returns TRUE if the subquery returns no rows, and FALSE if the subquery returns one or more rows.

Suppose you want to find customers from a specific country who have not placed any orders.

SELECT 
    c.customer_id,
    c.customer_name,
    c.country
FROM 
    Customers c
WHERE 
    c.country = 'USA'
    AND NOT EXISTS (
        SELECT 1
        FROM Orders o
        WHERE o.customer_id = c.customer_id
    );
  • This query selects customers from the USA who do not have any corresponding entries in the Orders table. The subquery inside NOT EXISTS checks for the existence of orders with a matching customer_id.


EXCEPT

EXCEPT helps us filter out specific customers based on the difference between two sets.

Suppose you want to find all customers who have placed an order, but only those who are not from the USA.

SELECT 
    customer_id,
    customer_name,
    country
FROM 
    Customers
WHERE 
    customer_id IN (SELECT DISTINCT customer_id FROM Orders)

EXCEPT

SELECT 
    customer_id,
    customer_name,
    country
FROM 
    Customers
WHERE 
    country = 'USA';
  • The first part of the query selects customers who have placed orders. The EXCEPT operator removes those customers who are from the USA from the result.


COMBINING NOT EXISTS WITH EXCEPT

The combination of NOT EXISTS with EXCEPT achieves the "for all" effect.

Suppose we want to find customers from the USA who have placed orders for all available products.

SELECT 
    customer_name, country
FROM 
    Customers c
WHERE 
    country = 'USA'
    AND NOT EXISTS (
        SELECT product_id
        FROM Products p
        WHERE product_id = p.product_id
        EXCEPT
        (SELECT product_id
         FROM Orders o
         WHERE o.customer_id = c.customer_id)
    );
  • In the outer query we select the customer_name and country from the Customers table for customers in the USA.

  • SELECT product_id FROM Products p selects all product IDs in the Products table

  • The EXCEPT operation subtracts the products that the customer has ordered from the complete list of products

  • If the result of EXCEPT is empty (meaning there are no products left that the customer hasn't ordered), NOT EXISTS returns TRUE indicating that the customer has ordered all available products


RANKING

Suppose we want to rank customers based on the total amount they've spent on orders.

SELECT 
    c.customer_id,
    c.customer_name,
    c.country,
    SUM(o.amount) AS total_spent,
    RANK() OVER (ORDER BY SUM(o.amount) DESC) AS spending_rank
FROM 
    Customers c
JOIN 
    Orders o 
ON 
    c.customer_id = o.customer_id
GROUP BY 
    c.customer_id, c.customer_name, c.country;
  • This query sums the amount for each customer and ranks them based on their total spending. The RANK function creates a rank based on the total spent.


COMBINING THE WHERE AND HAVING CLAUSES

WHERE filters rows before any grouping or aggregation occurs, and operates on individual rows of the data.

HAVING filters groups after aggregation has occurred, and operates on grouped data (e.g., after GROUP BY)

Suppose we want to find out which countries have more than 3 customers who have placed orders totaling more than $100.

SELECT 
    country, COUNT(*)
FROM 
    Customers c
WHERE 
    c.customer_id IN (
        SELECT 
            o.customer_id
        FROM 
            Orders o
        GROUP BY 
            o.customer_id
        HAVING 
            SUM(o.amount) > 100
    )
GROUP BY 
    country
HAVING 
    COUNT(*) > 3;
  • The outer query selects the country and counts the number of customers from that country who meet the criteria defined in the subquery

  • The WHERE clause filters the Customers table to include only those customers whose total order amount exceeds $100.

  • GROUP BY country groups the results by country, allowing us to count the number of customers per country who meet the spending criterion.

  • HAVING COUNT(*) > 3 filters out any countries with 3 or fewer customers who meet the criteria. Only countries with more than 3 such customers will be included in the final result.


CASE AND UPDATE

CASE functions like an "if-then-else" statement, allowing you to apply logic based on the values in your data.

UPDATE 
    Orders
SET 
    priority_level = CASE
        WHEN amount < 50 THEN 'Low'
        WHEN amount >= 50 AND amount < 100 THEN 'Medium'
        WHEN amount >= 100 AND amount < 500 THEN 'High'
        WHEN amount >= 500 THEN 'Critical'
    END;

  • Updates the Orders table, setting a priority_level column

  • The CASE statement evaluates the amount column for each order, setting a value for the priority_level


RECURSIVE

RECURSIVE: A SQL query is considered recursive when it references itself in the query definition. This self-reference allows the query to perform repetitive operations until a termination condition is met.



WITH RECURSIVE CustomerSequence AS (
    SELECT 
        customer_id
    FROM 
        Customers
    WHERE 
        customer_id = 1  -- Start with customer_id = 1

    UNION

    SELECT 
        customer_id + 1
    FROM 
        CustomerSequence
    WHERE 
        customer_id < (SELECT MAX(customer_id) FROM Customers)
)

SELECT 
    customer_id
FROM 
    CustomerSequence;
  • The base case starts with a specific customer_id, in this case 1.

  • The recursive part of the query adds 1 to the pervious customer_id value and continues until it reaches the maximum customer_id in the Customers table.

  • The final SELECT retrieves the sequence of customer IDs generated by the recursive query


CREATE ASSERTION

CREATE ASSERTION specifies a query that selects any tuples that violate the desired condition. Unlike a CHECK constraint which applies to a single table, an assertion can apply to multiple tables.

Suppose you want to enforce a rule that ensures no customer from the Customers table is allowed to place an order in the Orders table if their country is Australia.

CREATE ASSERTION no_australian_orders
CHECK (
    NOT EXISTS (
        SELECT 1
        FROM Customers c
        JOIN Orders o
        ON c.customer_id = o.customer_id
        WHERE c.country = 'Australia'
    )
);
  • NOT EXISTS checks if there are any rows where a customer from Australia has placed an order.

  • SELECT 1 returns 1 for each row that meets this criteria

  • If NOT EXISTS returns FALSE (meaning the subquery found one or more rows), the assertion condition is violated.


CREATE TRIGGER

CREATE TRIGGER specifies the type of action to be taken when certain events occur and when certain conditions are satisfied.

Suppose we want to ensure that every time a new order is inserted into the Orders table, the order_status is automatically set to 'Pending' if no status is provided.

CREATE TRIGGER set_default_order_status
BEFORE INSERT ON Orders
FOR EACH ROW
BEGIN
    IF NEW.order_status IS NULL THEN
        SET NEW.order_status = 'Pending';
    END IF;
END;


CREATE VIEW

CREATE VIEW makes a virtual table that is based on a SQL query. It doesn't store data itself but presents data from one or more tables in a specific way. Once the view is created, you can query it like a regular table.

Suppose you want to create a VIEW that shows a summary of each customer's total spending, including their name, country, and the total amount they have spent on all their orders.

CREATE VIEW CustomerSpending AS
SELECT 
    c.customer_id,
    c.customer_name,
    c.country,
    SUM(o.amount) AS total_spent
FROM 
    Customers c
LEFT JOIN 
    Orders o ON c.customer_id = o.customer_id
GROUP BY 
    c.customer_id, 
    c.customer_name, 
    c.country;

INTRO

Let's say we have a table called Customers and a table called Orders.



EXISTS

EXISTS is a logical operator used to test for the existence of any record in a subquery. It returns TRUE if the subquery returns one or more rows, and FALSE if the subquery returns no rows.

Suppose you want to find all customers who have placed at least one order.

SELECT 
    c.customer_id,
    c.customer_name,
    c.country
FROM 
    Customers c
WHERE 
    EXISTS (
        SELECT 1
        FROM Orders o
        WHERE o.customer_id = c.customer_id
    );
  • The outer query selects data from the Customers table

  • The EXISTS clause contains a subquery that checks for the existence of rows in the Orders table where customer_id matches the customer_id in the Customers table.

  • For each row in Customers, the subquery runs to see if there is a matching order

  • If a matching order exists, that customer is included in the result.


NOT EXISTS

NOT EXISTS is a logical operator to test for the absence of any records in a subquery. It returns TRUE if the subquery returns no rows, and FALSE if the subquery returns one or more rows.

Suppose you want to find customers from a specific country who have not placed any orders.

SELECT 
    c.customer_id,
    c.customer_name,
    c.country
FROM 
    Customers c
WHERE 
    c.country = 'USA'
    AND NOT EXISTS (
        SELECT 1
        FROM Orders o
        WHERE o.customer_id = c.customer_id
    );
  • This query selects customers from the USA who do not have any corresponding entries in the Orders table. The subquery inside NOT EXISTS checks for the existence of orders with a matching customer_id.


EXCEPT

EXCEPT helps us filter out specific customers based on the difference between two sets.

Suppose you want to find all customers who have placed an order, but only those who are not from the USA.

SELECT 
    customer_id,
    customer_name,
    country
FROM 
    Customers
WHERE 
    customer_id IN (SELECT DISTINCT customer_id FROM Orders)

EXCEPT

SELECT 
    customer_id,
    customer_name,
    country
FROM 
    Customers
WHERE 
    country = 'USA';
  • The first part of the query selects customers who have placed orders. The EXCEPT operator removes those customers who are from the USA from the result.


COMBINING NOT EXISTS WITH EXCEPT

The combination of NOT EXISTS with EXCEPT achieves the "for all" effect.

Suppose we want to find customers from the USA who have placed orders for all available products.

SELECT 
    customer_name, country
FROM 
    Customers c
WHERE 
    country = 'USA'
    AND NOT EXISTS (
        SELECT product_id
        FROM Products p
        WHERE product_id = p.product_id
        EXCEPT
        (SELECT product_id
         FROM Orders o
         WHERE o.customer_id = c.customer_id)
    );
  • In the outer query we select the customer_name and country from the Customers table for customers in the USA.

  • SELECT product_id FROM Products p selects all product IDs in the Products table

  • The EXCEPT operation subtracts the products that the customer has ordered from the complete list of products

  • If the result of EXCEPT is empty (meaning there are no products left that the customer hasn't ordered), NOT EXISTS returns TRUE indicating that the customer has ordered all available products


RANKING

Suppose we want to rank customers based on the total amount they've spent on orders.

SELECT 
    c.customer_id,
    c.customer_name,
    c.country,
    SUM(o.amount) AS total_spent,
    RANK() OVER (ORDER BY SUM(o.amount) DESC) AS spending_rank
FROM 
    Customers c
JOIN 
    Orders o 
ON 
    c.customer_id = o.customer_id
GROUP BY 
    c.customer_id, c.customer_name, c.country;
  • This query sums the amount for each customer and ranks them based on their total spending. The RANK function creates a rank based on the total spent.


COMBINING THE WHERE AND HAVING CLAUSES

WHERE filters rows before any grouping or aggregation occurs, and operates on individual rows of the data.

HAVING filters groups after aggregation has occurred, and operates on grouped data (e.g., after GROUP BY)

Suppose we want to find out which countries have more than 3 customers who have placed orders totaling more than $100.

SELECT 
    country, COUNT(*)
FROM 
    Customers c
WHERE 
    c.customer_id IN (
        SELECT 
            o.customer_id
        FROM 
            Orders o
        GROUP BY 
            o.customer_id
        HAVING 
            SUM(o.amount) > 100
    )
GROUP BY 
    country
HAVING 
    COUNT(*) > 3;
  • The outer query selects the country and counts the number of customers from that country who meet the criteria defined in the subquery

  • The WHERE clause filters the Customers table to include only those customers whose total order amount exceeds $100.

  • GROUP BY country groups the results by country, allowing us to count the number of customers per country who meet the spending criterion.

  • HAVING COUNT(*) > 3 filters out any countries with 3 or fewer customers who meet the criteria. Only countries with more than 3 such customers will be included in the final result.


CASE AND UPDATE

CASE functions like an "if-then-else" statement, allowing you to apply logic based on the values in your data.

UPDATE 
    Orders
SET 
    priority_level = CASE
        WHEN amount < 50 THEN 'Low'
        WHEN amount >= 50 AND amount < 100 THEN 'Medium'
        WHEN amount >= 100 AND amount < 500 THEN 'High'
        WHEN amount >= 500 THEN 'Critical'
    END;

  • Updates the Orders table, setting a priority_level column

  • The CASE statement evaluates the amount column for each order, setting a value for the priority_level


RECURSIVE

RECURSIVE: A SQL query is considered recursive when it references itself in the query definition. This self-reference allows the query to perform repetitive operations until a termination condition is met.



WITH RECURSIVE CustomerSequence AS (
    SELECT 
        customer_id
    FROM 
        Customers
    WHERE 
        customer_id = 1  -- Start with customer_id = 1

    UNION

    SELECT 
        customer_id + 1
    FROM 
        CustomerSequence
    WHERE 
        customer_id < (SELECT MAX(customer_id) FROM Customers)
)

SELECT 
    customer_id
FROM 
    CustomerSequence;
  • The base case starts with a specific customer_id, in this case 1.

  • The recursive part of the query adds 1 to the pervious customer_id value and continues until it reaches the maximum customer_id in the Customers table.

  • The final SELECT retrieves the sequence of customer IDs generated by the recursive query


CREATE ASSERTION

CREATE ASSERTION specifies a query that selects any tuples that violate the desired condition. Unlike a CHECK constraint which applies to a single table, an assertion can apply to multiple tables.

Suppose you want to enforce a rule that ensures no customer from the Customers table is allowed to place an order in the Orders table if their country is Australia.

CREATE ASSERTION no_australian_orders
CHECK (
    NOT EXISTS (
        SELECT 1
        FROM Customers c
        JOIN Orders o
        ON c.customer_id = o.customer_id
        WHERE c.country = 'Australia'
    )
);
  • NOT EXISTS checks if there are any rows where a customer from Australia has placed an order.

  • SELECT 1 returns 1 for each row that meets this criteria

  • If NOT EXISTS returns FALSE (meaning the subquery found one or more rows), the assertion condition is violated.


CREATE TRIGGER

CREATE TRIGGER specifies the type of action to be taken when certain events occur and when certain conditions are satisfied.

Suppose we want to ensure that every time a new order is inserted into the Orders table, the order_status is automatically set to 'Pending' if no status is provided.

CREATE TRIGGER set_default_order_status
BEFORE INSERT ON Orders
FOR EACH ROW
BEGIN
    IF NEW.order_status IS NULL THEN
        SET NEW.order_status = 'Pending';
    END IF;
END;


CREATE VIEW

CREATE VIEW makes a virtual table that is based on a SQL query. It doesn't store data itself but presents data from one or more tables in a specific way. Once the view is created, you can query it like a regular table.

Suppose you want to create a VIEW that shows a summary of each customer's total spending, including their name, country, and the total amount they have spent on all their orders.

CREATE VIEW CustomerSpending AS
SELECT 
    c.customer_id,
    c.customer_name,
    c.country,
    SUM(o.amount) AS total_spent
FROM 
    Customers c
LEFT JOIN 
    Orders o ON c.customer_id = o.customer_id
GROUP BY 
    c.customer_id, 
    c.customer_name, 
    c.country;

INTRO

Let's say we have a table called Customers and a table called Orders.



EXISTS

EXISTS is a logical operator used to test for the existence of any record in a subquery. It returns TRUE if the subquery returns one or more rows, and FALSE if the subquery returns no rows.

Suppose you want to find all customers who have placed at least one order.

SELECT 
    c.customer_id,
    c.customer_name,
    c.country
FROM 
    Customers c
WHERE 
    EXISTS (
        SELECT 1
        FROM Orders o
        WHERE o.customer_id = c.customer_id
    );
  • The outer query selects data from the Customers table

  • The EXISTS clause contains a subquery that checks for the existence of rows in the Orders table where customer_id matches the customer_id in the Customers table.

  • For each row in Customers, the subquery runs to see if there is a matching order

  • If a matching order exists, that customer is included in the result.


NOT EXISTS

NOT EXISTS is a logical operator to test for the absence of any records in a subquery. It returns TRUE if the subquery returns no rows, and FALSE if the subquery returns one or more rows.

Suppose you want to find customers from a specific country who have not placed any orders.

SELECT 
    c.customer_id,
    c.customer_name,
    c.country
FROM 
    Customers c
WHERE 
    c.country = 'USA'
    AND NOT EXISTS (
        SELECT 1
        FROM Orders o
        WHERE o.customer_id = c.customer_id
    );
  • This query selects customers from the USA who do not have any corresponding entries in the Orders table. The subquery inside NOT EXISTS checks for the existence of orders with a matching customer_id.


EXCEPT

EXCEPT helps us filter out specific customers based on the difference between two sets.

Suppose you want to find all customers who have placed an order, but only those who are not from the USA.

SELECT 
    customer_id,
    customer_name,
    country
FROM 
    Customers
WHERE 
    customer_id IN (SELECT DISTINCT customer_id FROM Orders)

EXCEPT

SELECT 
    customer_id,
    customer_name,
    country
FROM 
    Customers
WHERE 
    country = 'USA';
  • The first part of the query selects customers who have placed orders. The EXCEPT operator removes those customers who are from the USA from the result.


COMBINING NOT EXISTS WITH EXCEPT

The combination of NOT EXISTS with EXCEPT achieves the "for all" effect.

Suppose we want to find customers from the USA who have placed orders for all available products.

SELECT 
    customer_name, country
FROM 
    Customers c
WHERE 
    country = 'USA'
    AND NOT EXISTS (
        SELECT product_id
        FROM Products p
        WHERE product_id = p.product_id
        EXCEPT
        (SELECT product_id
         FROM Orders o
         WHERE o.customer_id = c.customer_id)
    );
  • In the outer query we select the customer_name and country from the Customers table for customers in the USA.

  • SELECT product_id FROM Products p selects all product IDs in the Products table

  • The EXCEPT operation subtracts the products that the customer has ordered from the complete list of products

  • If the result of EXCEPT is empty (meaning there are no products left that the customer hasn't ordered), NOT EXISTS returns TRUE indicating that the customer has ordered all available products


RANKING

Suppose we want to rank customers based on the total amount they've spent on orders.

SELECT 
    c.customer_id,
    c.customer_name,
    c.country,
    SUM(o.amount) AS total_spent,
    RANK() OVER (ORDER BY SUM(o.amount) DESC) AS spending_rank
FROM 
    Customers c
JOIN 
    Orders o 
ON 
    c.customer_id = o.customer_id
GROUP BY 
    c.customer_id, c.customer_name, c.country;
  • This query sums the amount for each customer and ranks them based on their total spending. The RANK function creates a rank based on the total spent.


COMBINING THE WHERE AND HAVING CLAUSES

WHERE filters rows before any grouping or aggregation occurs, and operates on individual rows of the data.

HAVING filters groups after aggregation has occurred, and operates on grouped data (e.g., after GROUP BY)

Suppose we want to find out which countries have more than 3 customers who have placed orders totaling more than $100.

SELECT 
    country, COUNT(*)
FROM 
    Customers c
WHERE 
    c.customer_id IN (
        SELECT 
            o.customer_id
        FROM 
            Orders o
        GROUP BY 
            o.customer_id
        HAVING 
            SUM(o.amount) > 100
    )
GROUP BY 
    country
HAVING 
    COUNT(*) > 3;
  • The outer query selects the country and counts the number of customers from that country who meet the criteria defined in the subquery

  • The WHERE clause filters the Customers table to include only those customers whose total order amount exceeds $100.

  • GROUP BY country groups the results by country, allowing us to count the number of customers per country who meet the spending criterion.

  • HAVING COUNT(*) > 3 filters out any countries with 3 or fewer customers who meet the criteria. Only countries with more than 3 such customers will be included in the final result.


CASE AND UPDATE

CASE functions like an "if-then-else" statement, allowing you to apply logic based on the values in your data.

UPDATE 
    Orders
SET 
    priority_level = CASE
        WHEN amount < 50 THEN 'Low'
        WHEN amount >= 50 AND amount < 100 THEN 'Medium'
        WHEN amount >= 100 AND amount < 500 THEN 'High'
        WHEN amount >= 500 THEN 'Critical'
    END;

  • Updates the Orders table, setting a priority_level column

  • The CASE statement evaluates the amount column for each order, setting a value for the priority_level


RECURSIVE

RECURSIVE: A SQL query is considered recursive when it references itself in the query definition. This self-reference allows the query to perform repetitive operations until a termination condition is met.



WITH RECURSIVE CustomerSequence AS (
    SELECT 
        customer_id
    FROM 
        Customers
    WHERE 
        customer_id = 1  -- Start with customer_id = 1

    UNION

    SELECT 
        customer_id + 1
    FROM 
        CustomerSequence
    WHERE 
        customer_id < (SELECT MAX(customer_id) FROM Customers)
)

SELECT 
    customer_id
FROM 
    CustomerSequence;
  • The base case starts with a specific customer_id, in this case 1.

  • The recursive part of the query adds 1 to the pervious customer_id value and continues until it reaches the maximum customer_id in the Customers table.

  • The final SELECT retrieves the sequence of customer IDs generated by the recursive query


CREATE ASSERTION

CREATE ASSERTION specifies a query that selects any tuples that violate the desired condition. Unlike a CHECK constraint which applies to a single table, an assertion can apply to multiple tables.

Suppose you want to enforce a rule that ensures no customer from the Customers table is allowed to place an order in the Orders table if their country is Australia.

CREATE ASSERTION no_australian_orders
CHECK (
    NOT EXISTS (
        SELECT 1
        FROM Customers c
        JOIN Orders o
        ON c.customer_id = o.customer_id
        WHERE c.country = 'Australia'
    )
);
  • NOT EXISTS checks if there are any rows where a customer from Australia has placed an order.

  • SELECT 1 returns 1 for each row that meets this criteria

  • If NOT EXISTS returns FALSE (meaning the subquery found one or more rows), the assertion condition is violated.


CREATE TRIGGER

CREATE TRIGGER specifies the type of action to be taken when certain events occur and when certain conditions are satisfied.

Suppose we want to ensure that every time a new order is inserted into the Orders table, the order_status is automatically set to 'Pending' if no status is provided.

CREATE TRIGGER set_default_order_status
BEFORE INSERT ON Orders
FOR EACH ROW
BEGIN
    IF NEW.order_status IS NULL THEN
        SET NEW.order_status = 'Pending';
    END IF;
END;


CREATE VIEW

CREATE VIEW makes a virtual table that is based on a SQL query. It doesn't store data itself but presents data from one or more tables in a specific way. Once the view is created, you can query it like a regular table.

Suppose you want to create a VIEW that shows a summary of each customer's total spending, including their name, country, and the total amount they have spent on all their orders.

CREATE VIEW CustomerSpending AS
SELECT 
    c.customer_id,
    c.customer_name,
    c.country,
    SUM(o.amount) AS total_spent
FROM 
    Customers c
LEFT JOIN 
    Orders o ON c.customer_id = o.customer_id
GROUP BY 
    c.customer_id, 
    c.customer_name, 
    c.country;

SHARE