Tech Tip: Impact of Indexing
PRODUCT: 4D | VERSION: 16 | PLATFORM: Mac & Win
Published On: January 18, 2018
Indexing a field can have a large impact on the speed of an application to pull data from a table, and should always be considered when developing a structure.
To index a field, simply right click on the field in the structure editor and select the Field Properties option to open up the menu. Default for a field is "None", but there are three index options: Automatic, B-Tree, Cluster B-Tree.
In a sample structure with one table ([Table_1]) and two fields ([Table_1]ID, [Table_1]item_name) the affect of indexing is clear when comparing the time to pull distinct values on the [Table_1]item_name with the field non-indexed versus indexed.
Test Code:
ALL RECORDS([Table_1]) ARRAY TEXT($values;0) C_TEXT($time_start;$time_end) $time_start:=String(Current time) DISTINCT VALUES([Table_1]item_name;$values) $time_end:=String(Current time) |
Pulling values when NOT indexed (~19 seconds):
Pulling values when indexed (Automatic, ~3 seconds):
As can be seen in the above screenshots, when the field is indexed the search takes only 3 seconds versus 19 seconds when the field is non-indexed, about 16% of the non-indexed runtime. This is with only 2 million records as well, and as the size of the table grows the affect of indexing becomes even more apparent!
The above example shows "Automatic" selected for the index type, however some consideration should be taken in developing an application with which is the best option. To think about this, something called the "Selectivity" of the field should be considered, which is defined by a simple equation: Selectivity = Distinct Values / Total Number Rows
B-Tree:
If the selectivity is close to 1, the "B-Tree" index type should be used. This would mean that the number of distinct values for that field are close to the total number of rows.
Cluster B-Tree:
If the selectivity is closer to 0, the "Cluster B-Tree" index type should be used. This would mean that the number of distinct values for that field are much less than the total number of rows.
From the sample table setup for this tech tip there are 7 distinct values for the [Table_1]item_name field with 2 million rows, meaning the selectivity of the table would be 7 / 2,000,000 = 0.0000035 (very low!) and thus the best option for the index of this test field would be "Cluster B-Tree".
Pulling values when indexed (Cluster B-Tree, ~1 second):
When changing this field to "Cluster B-Tree", the best option given the field's selectivity, and pulling the values again, the run time is about one second, about 33% of the runtime when the field is indexed with the "Automatic" option, and about 5% of the runtime for a non-indexed field!