跪拜 Guibai
← All articles
Database

A 2700x Speedup from One Index, and Three Ways to Still Get It Wrong

By 倔强的石头_ ·
Read original on juejin.cn ↗ Google Translate ↗ Alt translation

Slow queries are the most common production performance killer, and they rarely announce themselves in logs. A developer who can read an execution plan and spot a Seq Scan on a large table can fix a 1000x slowdown in minutes instead of watching CPU sit at 90% while the business screams.

Summary

A 2-million-row transaction log table without indexes forces a parallel sequential scan that burns CPU and returns a single-row lookup in 88 milliseconds—roughly 1000x slower than acceptable for OLTP. Adding a B-Tree index on the lookup column collapses execution time to 0.032 ms, a 2700x improvement, by swapping the scan for a direct index seek.

Three common patterns still defeat the index. Wrapping the indexed column in an expression like `log_id + 1` triggers a full scan because the optimizer cannot match the transformed value against the index tree. Implicit type conversion is more nuanced: KingbaseES will safely cast a string constant to match an integer column, but reversing the types—passing an integer against a varchar column—forces a cast on the column side and kills the index. When a query retrieves nearly all rows, the cost-based optimizer deliberately ignores the index and chooses a sequential scan, since the random I/O of index lookups plus row fetches costs more than a straight disk sweep.

Takeaways
On a 2-million-row table, a single-row lookup via full table scan took 88 ms; adding a B-Tree index brought it to 0.032 ms, a 2700x improvement.
Wrapping an indexed column in any expression—even `col + 1`—forces a full table scan because the index stores the raw column values.
KingbaseES can safely cast a string constant to match an integer column and still use the index, but passing an integer against a varchar column triggers a cast on the column side and disables the index.
When a query retrieves a high percentage of rows, the cost-based optimizer skips the index and chooses a sequential scan because random I/O from index lookups costs more than a linear disk read.
Execution time of 88 ms for a single-row OLTP lookup is roughly 1000x slower than the sub-millisecond target; at concurrency, this saturates CPU and queues requests.
The `EXPLAIN ANALYZE` output shows the actual node type, estimated cost, and real execution time—three metrics that isolate whether a scan, index, or parallelism is in play.
Conclusions

Most junior developers misread 88 ms as fast because it is below human perception, but OLTP single-row lookups should be measured in microseconds; a 1000x gap is a concurrency bomb.

Index creation is not a one-time fix. The optimizer's decision to use or ignore an index depends on the query shape, data distribution, and type-matching rules that change with every release of the database engine.

The implicit-cast behavior is engine-specific and version-dependent. A rule that was true for PostgreSQL 9 may not hold for KingbaseES, so testing with `EXPLAIN` on the actual version is the only reliable answer.

High-selectivity queries that bypass indexes are not a bug; they are the optimizer correctly trading random I/O for sequential I/O. The real fix is partitioning or columnar storage, not more indexes.

Performance tuning shifts from reactive firefighting to proactive review when DBAs inspect execution plans during code review, catching full scans before they reach production.

Concepts & terms
Seq Scan (Sequential Scan)
A full table scan that reads every row from disk in order. Fast for retrieving most of a table but catastrophic for single-row lookups on large tables.
Index Scan
A lookup that traverses a B-Tree structure to find matching rows directly, then fetches the corresponding heap tuples. Orders of magnitude faster than a Seq Scan for selective queries.
B-Tree Index
The default index type in most relational databases. It stores key values in a balanced tree, enabling logarithmic-time lookups, range scans, and sorting on the indexed column.
EXPLAIN ANALYZE
A diagnostic command that executes a query and returns the actual execution plan with real timing, row counts, and loop counts, rather than just optimizer estimates.
Cost-Based Optimizer (CBO)
The database component that evaluates multiple execution strategies and picks the one with the lowest estimated cost, factoring in I/O, CPU, and data statistics.
Implicit Type Conversion
An automatic cast the database performs when comparing values of different types. Whether it breaks index usage depends on which side of the comparison gets cast.
Table Access by Index Rowid (回表)
After an index lookup finds matching row pointers, the database must fetch the full row from the heap. This random I/O becomes expensive when many rows match.
Source: juejin.cn ↗ Google Translate ↗ Backup ↗