The SQL language seems easy to learn: the commands follow a simple syntax and don’t describe the specific algorithms used to retrieve data. That simplicity, however, can be deceptive. Not all database operations execute with the same efficiency, and two very similar queries can differ significantly in computation time. Fortunately, there are many ways to improve query performance.
Here are the ones I can think of right now:
Understand the Data
Before you decide to create an index, understand the data: its types, its distribution, and how queries retrieve it. A thorough understanding of how the data behaves will help you decide which columns should get a clustered index and which a non-clustered one.
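For example, in SQL Server you might put the clustered index on a column frequently used for range scans and a non-clustered index on a column used for point lookups. The orders table and its columns here are hypothetical:

CREATE CLUSTERED INDEX IX_orders_order_date ON orders (order_date);
CREATE NONCLUSTERED INDEX IX_orders_customer_id ON orders (customer_id);

Since a table can have only one clustered index, it pays to reserve it for the access pattern the data most often serves.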
Revisit Your Schema Definitions
Last but not least, revisit your schema definitions; keep an eye out for whether the appropriate FOREIGN KEY, NOT NULL and CHECK constraints are in place. Having the right constraint in the right place always helps query performance: a FOREIGN KEY constraint helps simplify joins by converting some outer or semi-joins to inner joins, and a CHECK constraint helps a bit by letting the optimizer remove unnecessary or redundant predicates.
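As an illustration, the hypothetical donations table below declares all three kinds of constraints, giving the optimizer facts it can exploit:

CREATE TABLE donations (
    donation_id INT PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users (user_id),
    amount DECIMAL(10, 2) NOT NULL CHECK (amount > 0)
);

Because user_id is declared NOT NULL with a FOREIGN KEY, a join from donations to users is guaranteed to find exactly one matching row, which is what allows some outer joins to be treated as inner joins.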
Create a Highly Selective Index
Selectivity is the percentage of rows in the table that qualify (the number of qualifying rows divided by the total number of rows). If this ratio is low, the index is highly selective and most useful. A non-clustered index is most useful when the ratio is around 5% or less, i.e. when the index can eliminate at least 95% of the rows from consideration. If an index returns more than 5% of the rows in a table, it probably will not be used.
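You can estimate the selectivity of a predicate before creating an index for it. For instance, assuming a users table like the one in the examples later in this article:

SELECT COUNT(CASE WHEN age > 65 THEN 1 END) * 100.0 / COUNT(*) AS pct_qualifying FROM users;

If the result is around 5 or lower, a non-clustered index on age is likely to help queries with that predicate; if it is much higher, the optimizer will probably ignore the index and scan the table instead.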
Drop Unused Indexes
Dropping unused indexes speeds up data modifications without affecting data retrieval, because every index must be maintained on each INSERT, UPDATE and DELETE.
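In SQL Server, for example, the sys.dm_db_index_usage_stats view can help identify indexes that are never read but are still maintained on every write (the thresholds and filtering will depend on your environment, and the statistics reset when the server restarts):

SELECT OBJECT_NAME(s.object_id) AS table_name, i.name AS index_name, s.user_updates
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE s.user_seeks = 0 AND s.user_scans = 0 AND s.user_lookups = 0;

An index that shows many user_updates but no seeks, scans or lookups is a candidate for DROP INDEX.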
Avoid Multiple Joins in a Single Query
Try to avoid writing a single SQL query with many joins that include outer joins, CROSS APPLY, OUTER APPLY and other complex subqueries. Each additional join reduces the optimizer's choices of join order and join type. For queries with excessively complex CROSS APPLY expressions or subqueries, the optimizer is sometimes forced to use nested loop joins, regardless of the performance consequences.
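One common workaround is to break such a query into steps, materializing an intermediate result so that each part stays simple enough for the optimizer to plan well. A sketch in T-SQL, using the hypothetical donations table:

SELECT user_id, SUM(amount) AS total_donated
INTO #donation_totals
FROM donations
GROUP BY user_id;

SELECT u.last_name, t.total_donated
FROM users u
LEFT JOIN #donation_totals t ON u.user_id = t.user_id;

The temporary table also gives the optimizer accurate row counts for the second step, which it would otherwise have to estimate.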
1. Learn How to Create Indexes Properly
Learning how to index properly is the best thing you can do to improve the performance of your SQL queries. Indexes allow quicker access to the database in typical situations. Database novices often find indexes mysterious or difficult: they either index nothing or they try to index everything. Neither approach is right. With no indexes at all, your queries are likely to be slow; if you index everything, your inserts and updates will be inefficient, since every index must be maintained on each write.
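A sensible middle ground is to index only the columns that actually appear in your WHERE clauses and join conditions. For instance, if queries frequently filter users by age, a single targeted index serves them without penalizing every write:

CREATE INDEX idx_users_age ON users (age);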
2. Only Retrieve the Data You Really Need
A common way of retrieving the desired columns is to use the * symbol even though not all the columns are really needed:
SELECT * FROM users WHERE age > 20;
If the table is small, retrieving additional columns won’t make much of a difference. For larger datasets, however, specifying the columns might save a lot of computation time:
SELECT id, last_name, sex, age FROM users WHERE age > 20;
Keep in mind, however, that many popular ORMs will not let you write a query that selects only a subset of the table's columns.
Similarly, if you only need a limited number of rows you should use the LIMIT clause (or your database’s equivalent). Take a look at the following code:
SELECT name, price FROM products;
For instance, if you only want to display the first 10 records out of 50,000 on your website, it is advisable to inform the database about it. This way, the database can stop the search after finding 10 rows rather than scanning the whole table:
SELECT name, price FROM products LIMIT 10;
The LIMIT clause is available in MySQL and PostgreSQL; other databases have ways to achieve a similar effect.
The examples above illustrate the general idea: always ask whether you really need all the rows and columns an SQL statement returns. If you don't, there is room for improvement.
4. Consider Getting Rid of Correlated Subqueries
A correlated subquery is a subquery which depends on the outer query. It uses the data obtained from the outer query in its WHERE clause. Suppose you want to list all users who have made a donation. You could retrieve the data with the following code:
SELECT user_id, last_name FROM users WHERE EXISTS (SELECT * FROM donationuser WHERE donationuser.user_id = users.user_id);
In the above case, the subquery runs once for each row of the main query, thus causing possible inefficiency. Instead, we can apply a join:
SELECT DISTINCT users.user_id, users.last_name FROM users INNER JOIN donationuser ON users.user_id = donationuser.user_id;
If there are millions of users in the database, the statement with the correlated subquery will most likely be less efficient than the INNER JOIN, because the subquery needs to run millions of times. But if you were looking for donations made by a single user, the correlated subquery might not be a bad idea. As a rule of thumb, if you are looking for many or most of the rows, try to avoid correlated subqueries. Keep in mind, however, that using them is inevitable in some cases.
5. Avoid Wildcard Characters at the Beginning of a LIKE Pattern
Whenever possible, avoid using the LIKE pattern in the following way:
SELECT * FROM users WHERE name LIKE '%bar%';
The use of the % wildcard at the beginning of the LIKE pattern prevents the database from using a suitable index, if one exists. Since the system doesn't know how the values in the name column begin, it has to perform a full table scan anyway. In many cases, this slows query execution. If the query can be rewritten in the following way:
SELECT * FROM users WHERE name LIKE 'bar%';
then the performance may be enhanced. You should always consider whether a wildcard character at the beginning is really essential.
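If a leading wildcard truly is required, some databases offer specialized indexes for this case. In PostgreSQL, for instance, a trigram index from the pg_trgm extension can serve LIKE '%bar%' patterns without a full table scan:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);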
But remember to evaluate each situation individually to see which method works best; there are no ironclad solutions.