BLOG POSTS

MangoHost Blog / How to Use Nested Queries in SQL

How to Use Nested Queries in SQL

Nested queries, also known as subqueries, are essentially queries within queries that allow you to perform complex data retrieval operations in SQL. They’re incredibly useful when you need to filter, aggregate, or join data based on the results of another query, and mastering them will seriously level up your database skills. In this post, we’ll break down how nested queries work, walk through practical examples, cover performance considerations, and share some battle-tested best practices that’ll help you avoid common pitfalls.

How Nested Queries Work

A nested query is a SELECT statement that’s embedded inside another SQL statement, typically within the WHERE, FROM, or HAVING clauses. The inner query executes first, and its results are used by the outer query to complete the operation. Think of it like calling a function within another function – the inner function needs to return its result before the outer function can proceed.

There are three main types of nested queries:

Scalar subqueries – Return a single value
Row subqueries – Return a single row with multiple columns
Table subqueries – Return multiple rows and columns

Here’s a basic example to illustrate the concept:

SELECT employee_name, salary
FROM employees
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
);

In this case, the inner query calculates the average salary, and the outer query uses that value to find all employees earning above average.

Step-by-Step Implementation Guide

Let’s start with simple examples and work our way up to more complex scenarios. We’ll use a typical e-commerce database with customers, orders, and products tables.

Step 1: Basic WHERE Clause Subquery

Find all customers who have placed orders:

SELECT customer_id, customer_name, email
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id
    FROM orders
);

Step 2: Correlated Subqueries

These are more advanced – the inner query references columns from the outer query:

SELECT c.customer_name, c.email
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date >= '2023-01-01'
);

Step 3: Subqueries in FROM Clause

You can treat subquery results as temporary tables:

SELECT avg_order_value.customer_id, avg_order_value.avg_value
FROM (
    SELECT customer_id, AVG(total_amount) as avg_value
    FROM orders
    GROUP BY customer_id
    HAVING AVG(total_amount) > 100
) as avg_order_value;

Step 4: Multiple Level Nesting

Find products that were ordered by customers from specific cities:

SELECT product_name, price
FROM products
WHERE product_id IN (
    SELECT product_id
    FROM order_items
    WHERE order_id IN (
        SELECT order_id
        FROM orders
        WHERE customer_id IN (
            SELECT customer_id
            FROM customers
            WHERE city = 'New York'
        )
    )
);

Real-World Examples and Use Cases

Here are some practical scenarios where nested queries shine:

Finding Top Performers

Get employees whose sales are in the top 10% of their department:

SELECT e.employee_name, e.department, s.sales_amount
FROM employees e
JOIN sales s ON e.employee_id = s.employee_id
WHERE s.sales_amount > (
    SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY sales_amount)
    FROM sales s2
    JOIN employees e2 ON s2.employee_id = e2.employee_id
    WHERE e2.department = e.department
);

Data Cleanup Operations

Delete duplicate records keeping only the most recent:

DELETE FROM user_logs
WHERE log_id NOT IN (
    SELECT MAX(log_id)
    FROM user_logs
    GROUP BY user_id, DATE(created_at)
);

Dynamic Reporting

Compare current month performance against historical average:

SELECT 
    product_id,
    current_month_sales,
    historical_avg,
    ROUND((current_month_sales - historical_avg) / historical_avg * 100, 2) as growth_percent
FROM (
    SELECT 
        product_id,
        SUM(quantity) as current_month_sales,
        (SELECT AVG(monthly_sales)
         FROM (
             SELECT product_id, SUM(quantity) as monthly_sales
             FROM order_items oi2
             JOIN orders o2 ON oi2.order_id = o2.order_id
             WHERE o2.order_date < DATE_TRUNC('month', CURRENT_DATE)
             GROUP BY product_id, DATE_TRUNC('month', o2.order_date)
         ) historical
         WHERE historical.product_id = oi.product_id
        ) as historical_avg
    FROM order_items oi
    JOIN orders o ON oi.order_id = o.order_id
    WHERE o.order_date >= DATE_TRUNC('month', CURRENT_DATE)
    GROUP BY product_id
) performance_comparison;

Comparison with Alternatives

While nested queries are powerful, they’re not always the best solution. Here’s how they stack up against alternatives:

Approach	Performance	Readability	Maintainability	Best Use Case
Nested Queries	Moderate	Can be complex	Medium	Complex filtering, one-off queries
JOINs	Generally faster	Usually clearer	High	Combining related data
CTEs (Common Table Expressions)	Similar to subqueries	Very clear	High	Complex queries, recursive operations
Window Functions	Excellent	Moderate	Medium	Analytical queries, rankings

For example, this nested query:

SELECT customer_name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    WHERE order_date > '2023-01-01'
);

Could be rewritten as a more efficient JOIN:

SELECT DISTINCT c.customer_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date > '2023-01-01';

Performance Considerations and Optimization

Nested queries can be performance killers if not used carefully. Here are some key points to remember:

Correlated vs Non-Correlated Subqueries

Non-correlated subqueries execute once and cache the result
Correlated subqueries execute once for each row in the outer query
Always try to convert correlated subqueries to JOINs when possible

Indexing Strategy

Make sure columns used in subquery WHERE clauses are properly indexed:

-- If you frequently filter by customer_id in subqueries
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_date_customer ON orders(order_date, customer_id);

Query Execution Plans

Always check your execution plans. In PostgreSQL:

EXPLAIN ANALYZE
SELECT customer_name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    WHERE order_date > '2023-01-01'
);

Look for operations like “Nested Loop” or “Seq Scan” that might indicate performance issues.

Best Practices and Common Pitfalls

Do’s:

Use EXISTS instead of IN when checking for existence – it’s often faster
Limit subquery results when possible using TOP/LIMIT
Consider using CTEs for complex nested queries to improve readability
Test performance with realistic data volumes
Use proper indentation and aliasing for maintainability

Don’ts:

Don’t nest more than 3-4 levels deep – it becomes unmaintainable
Avoid using SELECT * in subqueries – be specific about columns needed
Don’t use subqueries in SELECT clauses that return multiple rows
Avoid correlated subqueries in large datasets without proper indexing

Common Gotchas:

NULL values can cause unexpected results with IN/NOT IN:

-- This might not work as expected if subquery returns NULLs
SELECT * FROM customers WHERE customer_id NOT IN (
    SELECT customer_id FROM blacklisted_customers
);

-- Better approach
SELECT * FROM customers WHERE customer_id NOT IN (
    SELECT customer_id FROM blacklisted_customers WHERE customer_id IS NOT NULL
);

Debugging Complex Nested Queries

Break them down and test each part separately:

-- Test the innermost query first
SELECT customer_id FROM customers WHERE city = 'New York';

-- Then the next level
SELECT order_id FROM orders WHERE customer_id IN (1, 2, 3);

-- Finally combine them
SELECT product_id FROM order_items WHERE order_id IN (
    SELECT order_id FROM orders WHERE customer_id IN (
        SELECT customer_id FROM customers WHERE city = 'New York'
    )
);

For more detailed information about SQL subqueries and optimization techniques, check out the PostgreSQL official documentation on subquery expressions and the MySQL subquery documentation.

Nested queries are a powerful tool in your SQL arsenal, but like any powerful tool, they need to be used thoughtfully. Start with simple examples, always consider performance implications, and don’t be afraid to refactor complex nested queries into more readable alternatives like CTEs or JOINs when it makes sense. With practice, you’ll develop an intuition for when nested queries are the right choice and when to reach for other solutions.

This article incorporates information and material from various online sources. We acknowledge and appreciate the work of all original authors, publishers, and websites. While every effort has been made to appropriately credit the source material, any unintentional oversight or omission does not constitute a copyright infringement. All trademarks, logos, and images mentioned are the property of their respective owners. If you believe that any content used in this article infringes upon your copyright, please contact us immediately for review and prompt action.

This article is intended for informational and educational purposes only and does not infringe on the rights of the copyright owners. If any copyrighted material has been used without proper credit or in violation of copyright laws, it is unintentional and we will rectify it promptly upon notification. Please note that the republishing, redistribution, or reproduction of part or all of the contents in any form is prohibited without express written permission from the author and website owner. For permissions or further inquiries, please contact us.