Wednesday, May 6, 2026

Database indexing

Binary search is O(log n) but it only works on sorted data. Suppose you have a customer database and want to find customers whose ages are between 25 and 30. If the database is naturally ordered by customer creation date, you would first need to sort the data by age before performing a binary search to locate the rows corresponding to ages 25 and 30. For a large database, sorting the entire dataset can be time consuming.

A better approach is to create indexes on specific columns. An index on the age column, for example, would contain only two fields: age and customer ID. Because this index is smaller than the full table, it is faster to sort and search. When a query for customers aged between 25 and 30 is executed, the database can perform a binary search (or similar efficient lookup) on the index to find the relevant customer IDs, and then use those IDs to retrieve the full records from the main table.

The downside of indexing is slower write performance. Each time a user creates an account or places an order, the database must update not only the main table but also all associated indexes. Having too many indexes can make write operations, such as clicking a “Save” button, feel slower. Additionally, indexes consume storage space, and at large scale, the total size of the indexes can even exceed that of the actual data.