When writting SQL queries that will scan millions of rows it is important to optimize them. Otherwise, they could turn to be slow and processor consuming. In this post, some techniques that can be applied effectively to improve the performance of SQL queries are explained.
Avoid writing SELECT *
Select only those columns that you are actually going to use. Selecting more columns than necessary raises the time it takes to execute the query and the transfer time if the DB is on a separate server.
When making queries that are going to return many fields, it is better to define all the fields that we want to return in our queries. Using the * operator prevents the use of indexes efficiently.
SELECT id, title FROM Movies;
SELECT * FROM Movies;
For each subquery you add, your SQL management system will perform an additional query for each record of the main query. So, if you have a query that returns 100 results, it will perform an additional 100 queries for each subquery.
SELECT m.id, m.title, g.audience FROM Movies m JOIN Genres g ON g.genre = m.genre;
SELECT m.id, m.title, (SELECT audience FROM Genres WHERE g.genre = m.genre) AS 'Audience';
EXISTS instead of IN
Use EXISTS whenever possible, as this operator stops searching when it finds a match. If this cannot be done, using IN instead of NOT IN is much more efficient, since the latter does a complete scan in the table discarding options to omit.
Avoid using the percent sign (%)
The percentage sign (%) seriously penalizes the performance of the query. Therefore, it is convenient to use it only if strictly necessary.
id, title FROM Movies WHERE title = 'batman';
SELECT id, title FROM Movies WHERE title LIKE ‘%batman%’;
Use CHAR instead of VARCHAR
CHAR data type is up to 50% faster than VARCHAR, although it has two drawbacks:
– Takes up more disk space.
– It has a limit of 255 characters.
Las columnas pequeñas son más rápidas
Small columns are faster. This means that, when declaring the lenght of columns in a table, just declare the needed space. For example, if a column of a table represents the SWIFT code (Society for Worldwide Interbank Financial Telecommunication) of a bank, which is usually 8 characters long, declaring a data type of 20 characters will just penalize your quety when retrieving the information.
Use the EXPLAIN operator
The EXPLAIN operator allows you to obtain information about how the query will be executed, in order to optimize it. For example:
EXPLAIN SELECT m.id, m.title, g.audience FROM Movies m JOIN Genres g ON g.genre = m.genre;
Use the BENCHMARK function
The BENCHMARK function executes a scalar expression N times. It is used to know how fast your SQL database management system executes the expression. The result of the query is always 0 (you will not select anything). The goal of this function is to know the time it takes to execute the query.
SELECT BENCHMARK(100, (SELECT id, title FROM Movies WHERE title = 'batman'));
Do not use the GROUP BY statement
The GROUP BY clause can be used with or without an aggregate function. But if we want to get better performance, do not use the GROUP BY clause without an aggregation function. This is because using DISTINCT produces the same result and is faster.
Avoid using the DISTINCT statement
Using DISTINCT statement to exclude duplicate data is widely used by programmers to avoid database design errors. However, this is a serious mistake. First, you are hiding some duplication of information. And seconds, it is one of the statements that makes most I / O on the disk and forces the processor a lot. For this reason, if it is not completely necessary, avoid using it.
Avoid using ORDER BY statement
Use ORDER BY statement only if it is absolutely essential. That is, if it is possible to do the ordering on the client side it will always be better than doing it from the SQL Server side.
In case it is absolutely necessary to perform the ordering on the side of the SQL Server server, we must pay attention to the following recommendations:
– Keep the number of rows to sort to a minimum.
– Keep the number of columns to sort to a minimum.
– Keep the width (physical size) of the columns to be sorted to a minimum.
– Sort columns with numeric data (NOT character data types).