Wednesday, June 10, 2009

Minimize the Search Overhead


You minimize the size of the result set by using the WHERE and HAVING clauses. To increase the efficiency of these clauses, you should formulate them to fit with the database table indexes.

Database Indexes

Indexes speed up data selection from the database. They consist of selected fields of a table, of which a copy is then made in sorted order. If you specify the index fields correctly in a condition in the WHERE or HAVING clause, the system only searches part of the index (index range scan).

The primary index is always created automatically in the R/3 System. It consists of the primary key fields of the database table. This means that for each combination of fields in the index, there is a maximum of one line in the table. This kind of index is also known as UNIQUE.

If you cannot use the primary index to determine the result set because, for example, none of the primary index fields occur in the WHERE or HAVING clause, the system searches through the entire table (full table scan). For this case, you can create secondary indexes, which can restrict the number of table entries searched to form the result set.

You specify the fields of secondary indexes using the ABAP Dictionary. You can also determine whether the index is unique or not. However, you should not create secondary indexes to cover all possible combinations of fields.

Only create one if you select data by fields that are not contained in another index, and the performance is very poor. Furthermore, you should only create secondary indexes for database tables from which you mainly read, since indexes have to be updated each time the database table is changed. As a rule, secondary indexes should not contain more than four fields, and you should not have more than five indexes for a single database table.

If a table has more than five indexes, you run the risk of the optimizer choosing the wrong one for a particular operation. For this reason, you should avoid indexes with overlapping contents.

Secondary indexes should contain columns that you use frequently in a selection, and that are as highly selective as possible. The fewer table entries that can be selected by a certain column, the higher that column’s selectivity. Place the most selective fields at the beginning of the index. Your secondary index should be so selective that each index entry corresponds to at most five percent of the table entries. If this is not the case, it is not worth creating the index. You should also avoid creating indexes for fields that are not always filled, where their value is initial for most entries in the table.

If all of the columns in the SELECT clause are contained in the index, the system does not have to search the actual table data after reading from the index. If you have a SELECT clause with very few columns, you can improve performance dramatically by including these columns in a secondary index.

Formulating Conditions for Indexes

You should bear in mind the following when formulating conditions for the WHERE and HAVING clauses so that the system can use a database index and does not have to use a full table scan.

Check for Equality and Link Using AND

The database index search is particularly efficient if you check all index fields for equality (= or EQ) and link the expressions using AND.

Use Positive Conditions

The database system only supports queries that describe the result in positive terms, for example, EQ or LIKE. It does not support negative expressions like NE or NOT LIKE.

If possible, avoid using the NOT operator in the WHERE clause, because it is not supported by database indexes; invert the logical expression instead.

Using OR

The optimizer usually stops working when an OR expression occurs in the condition. This means that the columns checked using OR are not included in the index search. An exception to this are OR expressions at the outside of conditions. You should try to reformulate conditions that apply OR expressions to columns relevant to the index, for example, into an IN condition.

Using Part of the Index

If you construct an index from several columns, the system can still use it even if you only specify a few of the columns in a condition. However, in this case, the sequence of the columns in the index is important. A column can only be used in the index search if all of the columns before it in the index definition have also been specified in the condition.

Checking for Null Values

The IS NULL condition can cause problems with indexes. Some database systems do not store null values in the index structure. Consequently, this field cannot be used in the index.

Avoid Complex Conditions

Avoid complex conditions, since the statements have to be broken down into their individual components by the database system.

No comments:

Blog Archive