Page 1 of 1

Simple SQL

PostPosted: 22 Feb 2013, 08:17
by Ursego
Divide one complicated SQL query into a few simple sub-queries if that doesn't impact performance.

Caution! This tip must be used carefully. It really makes the code more elegant, but you should make sure it doesn't decrease the query execution speed. Especially avoid producing of correlated sub-queries - ones where a value of the sub-query field is compared to a value from the outer query field (in this sitiation the sub-query is executed once per each row of the main query!). So, always check execution plans!

The "divide and conquer" method works well in SQL queries. If you can take a part of a query into a sub-query - do that (unless it can impair performance)! From the readability viewpoint, it's always better to write a number of simple SELECT statements than throw everything into one unreadable heap. Let's compare two the approaches in a simple query, reporting the country name by a city appearing in an address:

*** BAD code (the "all in one heap" approach): ***

Code: Select all
 SELECT @country_name = country.country_name
   FROM country,
         city,
         address
  WHERE country.country_id = city.country_id
    AND city.city_id = address.city_id
    AND address.address_id = @address_id;

*** GOOD code (the "divide and conquer" approach): ***

Code: Select all
 SELECT @country_name = country_name
   FROM country
  WHERE country_id =

        (SELECT country_id
           FROM city
          WHERE city_id =

                (SELECT  city_id
                   FROM address
                  WHERE address_id = @address_id))

In the second method, you definitely know where to begin the investigation if the overall query has not returned the expected result - in the deepest query. OK, it returns address_id, and that converts the next-level query to something like "SELECT city_id FROM address WHERE address_id = 12345". And so on, and so on...

This example is very primitive, so breaking it down into sub-queries doesn't really help us a lot, but SQLs of real applications can be much more complicated.

There is also third method - SELECT each time INTO a variable and pass that variable to the subsequent query as retrieval argument. It is the best practice from the viewpoint of readability:

*** VERY GOOD code (the "divide into simple sub-tasks" approach + using variables): ***

Code: Select all
 SELECT @city_id = city_id
   FROM address
  WHERE address_id = @address_id

 SELECT @country_id = country_id
   FROM city
  WHERE city_id = @city_id -- select by previously populated variable

 SELECT @country_name = country_name
   FROM country
  WHERE country_id = @country_id -- select by previously populated variable

The advantages of this approach:

1. Clarity of presentation - it's easy to see what happens.

2. If the expected result (country name) is not gotten, you can see in the debugger which step exactly fails (in the previous two methods you can see only the final output).

But the last method can be used only in stored procedures. Using it in SQL, embedded in a client application's code, will worsen performance because the application will do two trips (to the database server and back) for each SELECT (it means 6 trips instead of 2 in the given example).