跪拜 Guibai
← Back to the summary

A 2700x Speedup from One Index, and Three Ways to Still Get It Wrong

Insert image description here

In our previous columns, we tackled database security issues, handling things like zombie processes, row locks, deadlocks, and full disk situations. So the database no longer crashes frequently and doesn't report insufficient space. This just means it has finally stabilized in the production environment.

However, the life of an operations engineer isn't that easy. Just a few days after the system stabilized, the business side started complaining again. "The server isn't down, and there's still disk space, but when I click the query button on the front-end page, that spinner takes about 10 seconds to return a result. Why is that?"

At this point, when you log into the server to check, you'll find a very strange situation. The number of database connections looks normal, and there are no deadlocks. But the CPU usage just won't come down, staying above 90% for a long time.

This is actually the most common and technically demanding scenario in enterprise-level database operations: slow queries crashing performance. A poorly written SQL statement doesn't just run slowly itself. It also devours the entire server's CPU and memory, causing other normal business operations to grind to a halt.

As database tuning professionals, we definitely can't rely on guessing where the slowness is. This article will show you how the database query optimizer calculates things. Then, it will walk you through using the very handy diagnostic tool EXPLAIN ANALYZE. We'll set up some real data stress tests to see what an execution plan actually looks like—specifically, the difference between a full table scan and an index scan. Finally, we'll discuss those situations where an index exists but isn't used, and how to avoid these pitfalls.

@[toc]


Phase 1: Setting Up the Crime Scene — Creating a "Performance Monster" with Tens of Millions of Data Rows

In the world of performance tuning, data volume is everything. A full table scan SQL might take only 1 millisecond in a development environment with 10,000 rows, but in a production environment with 10 million rows, it could run for dozens of minutes.

To give today's practical exercise a realistic, production-level sense of pressure, we first need to use ksql in your test database to instantly create a test table with massive amounts of data.

1. One-Click Generation of a Million-Row Test Table

Please open your Win11 ksql client, connect to the database, and execute the following rapid data generation script:

-- 1. Create a simulated user transaction log table
CREATE TABLE opt_test_log (
    log_id INT,
    user_id INT,
    action_type VARCHAR(50),
    trade_amount NUMERIC(10, 2),
    create_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- 2. Frenetically inject 2 million rows of test data (execution may take a dozen seconds to half a minute, please be patient)
INSERT INTO opt_test_log (log_id, user_id, action_type, trade_amount)
SELECT 
    i, 
    (random() * 100000)::INT, -- Randomly generate user IDs within 100,000
    'PAYMENT', 
    (random() * 500)::NUMERIC(10, 2)
FROM generate_series(1, 2000000) AS i;

At this point, you have a bulky, unoptimized "primordial" table with tens of millions of rows. Now, the show begins.


Phase 2: The Prey Reveals Itself — Hardcore Dissection of the EXPLAIN ANALYZE Execution Plan

Suppose a business colleague wrote this SQL: they want to precisely query the transaction log for log_id = 1500000 (the 1.5 millionth record).

If you directly execute SELECT * FROM opt_test_log WHERE log_id = 1500000;, you'll only see the result and a long wait time, without knowing what happened under the hood. We need to equip this SQL with an "X-ray machine"—the EXPLAIN ANALYZE command.

1. The Cost of Running Naked: Full Table Scan (Seq Scan)

Add EXPLAIN ANALYZE before the query statement:

EXPLAIN ANALYZE 
SELECT * FROM opt_test_log WHERE log_id = 1500000;

Insert image description here

2. Deciphering the "Book of Heaven": Key Metrics in the Execution Plan

The screen returns a few lines of English. A novice might see it as gibberish. So, what distinguishes a veteran DBA from a newbie? It lies in the ability to quickly extract the truly useful information from this data. The following three metrics must be watched closely.

Execution Node (Node Type): Parallel Seq Scan on opt_test_log This is the most critical signal. As you can see in the screenshot, although the KingbaseES optimizer is quite smart and automatically enabled Parallel—meaning parallel multi-threading—and even allocated 2 Workers to speed things up, the root cause remains: it's still a Seq Scan, a sequential full table scan. What does this mean? To find just 1 row of data, the database mobilized several CPU cores and flipped through all 2 million rows from beginning to end. Think about it: when concurrency spikes, the server CPU instantly hits 100%. This is the culprit behind it.

Cost Estimate (Cost): cost=0.00..25231.67 This is an execution cost estimate provided by the optimizer. The first number, 0.00, is the startup cost to return the first row. The second number, 25231.67, is the total cost to complete the scan for this node. This number has no unit; it's a relative value. The larger it is, the more resources it consumes.

Actual Execution Time (Execution Time): Execution Time: 88.380 ms Often, a junior developer sees this number and thinks, "Oh, it's only 88 milliseconds, not even 0.1 seconds. That's pretty fast." Let me tell you, this kind of thinking is problematic. Why? Think about it: for an OLTP business (online transaction processing) performing an exact equality match on a single column, the normal response time should be at the 0.1 millisecond level—sub-millisecond. So, what does 88 milliseconds mean? It's roughly 1000 times slower than normal! Imagine a scenario like Singles' Day, with thousands of requests hitting per second. If each request monopolizes 3 CPU cores for 88 milliseconds, the system would instantly queue up and then crash.


Phase 3: Intervention — How Indexes Work and the Performance Change

Since the cost of a full table scan is so high, we need to use a very common database feature: the B-Tree index.

What's the principle behind an index? It's like creating a sorted directory for those 2 million rows of data. When looking for data, you check the directory first, which naturally speeds things up.

1. Creating a B-Tree Index

Next, in ksql, we create a single-column index on the log_id field:

-- Create a B-Tree index on the log_id field
CREATE INDEX idx_opt_log_id ON opt_test_log(log_id);

(Note: Creating the index will scan the entire table. Just wait for the CREATE INDEX completion prompt.)

2. Seeing the Effect: Index Scan

After the directory is ready, let's run the same diagnostic SQL again:

EXPLAIN ANALYZE 
SELECT * FROM opt_test_log WHERE log_id = 1500000;

Insert image description here

Performance Comparison Before and After:

Looking at the screenshot above, the execution plan has changed dramatically:

  1. Node Changed: The CPU-hungry Parallel Seq Scan is gone, replaced by Index Scan using idx_opt_log_id. This means the database directly queried the index tree. It followed the tree root down a few branches, found the data, and no longer needed multi-threaded scanning.
  2. Cost Dropped Significantly: The execution cost (Cost) plummeted from over twenty thousand (25231.67) to a tiny 0.43..8.45.
  3. Response Time Improved: The actual execution time (Execution Time) went from 88.380 ms to 0.032 ms. Less than 1 millisecond. Performance improved by roughly 2700 times! Typically, this is the most satisfying moment for a DBA tuning indexes in production.

Phase 4: Advanced Mine Clearing — Why Does My SQL Still "Run Naked" Even After I Created an Index?

Knowing how to read an Index Scan is just the beginning. In a real enterprise architecture, the most soul-crushing question for developers is: "DBA, I clearly created an index on this field, so why does the execution plan show it's still doing a full table scan?"

This phenomenon is called "index invalidation." The database optimizer is extremely smart, but if your SQL violates underlying architectural taboos, the optimizer will decisively abandon your index. Let's use ksql to demonstrate these three classic failure (or reversal) scenarios one by one.

Insert image description here

Combined with the full stress test record above, let's review them one by one:

Taboo 1: Performing Mathematical Operations or Wrapping Functions on the Indexed Column (Definitely Problematic)

Look closely at the results from the first SQL block in the screenshot. We intended to query data for log_id = 1500000, but deliberately wrote log_id + 1 = 1500001 on the left side.

Taboo 2: Implicit Type Conversion (Interesting Twist: The Optimizer Got Smarter)

Now look at the second SQL block in the screenshot. The log_id field is an integer (INT), but the value passed on the right side of the equation is a string: '1500000'. Many older tutorials will tell you: "Implicit type conversion will definitely invalidate the index!"

Taboo 3: The Optimizer Is Actually Smarter Than You — When Too Much Data Needs to Be Fetched

Next, look at the third SQL block in the screenshot. We created a good index on user_id and then queried for data where user_id > 0. Since the IDs in the test data are basically all greater than 0, we are essentially trying to fetch almost 100% of the table's data.

Conclusion

What did we do in this article? First, we generated about 2 million rows of data—at that volume, the performance pressure became apparent. Then, using the EXPLAIN ANALYZE command for a deeper look, we discovered it was doing a full table scan, which was the root cause of the performance issue. Next, we created a B-Tree index, and the query speed improved by tens of thousands of times, really. After that, we carefully dissected three scenarios where indexes become invalid—all architectural-level pitfalls. After this whole cycle, we've essentially covered database performance tuning, from interpreting specific execution details to optimizing holistically.

A truly qualified DBA or architect doesn't spend their days killing processes, restarting databases, and firefighting. More often, they review execution plans and nip problems in the bud before the developers' SQL becomes too terrible. Yes, it's the feeling of strangling the problem in its cradle.

Writing up to this point, this KingbaseES Moat Series column—covering network connection debugging, deadlock troubleshooting, space management, and performance tuning—has walked through these four tough areas. I sincerely hope this set of hands-on operational methods, tested on both Win and Linux, can truly help you become proficient with the domestic KingbaseES database. After all, these domestic databases are now used in more and more critical systems. If you understand the underlying operational principles and have solid troubleshooting skills, you'll have a foundation of confidence when facing new databases and new challenges in the future. Alright, see you in the next technical content.