跪拜 Guibai
← Back to the summary

AI-Generated SQL Is Crashing Production Databases, and the Guardrails Aren't Keeping Up

Hello everyone, I'm Xiao Ye. I write these lessons only so that you won't step into the same pitfalls I did!

In the first half of this year, I took on database support for three new projects. One phenomenon was particularly noticeable — slow-query tickets more than doubled compared to before. Upon investigation, more than half of the problems traced back to a single source: AI-written SQL.

After tools like Cursor, GitHub Copilot, and Tongyi Lingma became widespread, developers who don't know SQL can now generate queries directly. It looks like a good thing, but AI-written SQL has a fatal flaw — it can produce syntactically correct code, yet it has absolutely no understanding of what kind of data actually lives in your database.

A real case from last week: a team using AI-assisted development ran a stress test before going live. An AI-generated join query pushed the CPU straight to 98%. Checking the execution plan revealed — three large tables doing nested loops, with zero index hits. The developer said, "AI wrote it; the logic looked fine to me, so I committed it."

The logic was fine, but the performance collapsed. This is the new challenge DBAs face in 2026.


1. Three Fatal Blind Spots of AI-Written SQL

Blind Spot 1: It Doesn't Know What Your Data Looks Like

When AI generates SQL, it knows nothing about your data distribution, cardinality, or skew. It assumes data is uniform, but real-world data never is.

For example:

-- AI-generated query, looks perfectly reasonable
SELECT u.name, o.total 
FROM users u 
JOIN orders o ON u.id = o.user_id 
WHERE u.status = 'active' 
  AND o.created_at > '2025-01-01'
ORDER BY o.total DESC 
LIMIT 10;

The logic is flawless. But if your users table has 5 million rows, where status = 'active' accounts for 90%, and the orders table has 20 million rows without a composite index on (user_id, created_at) — this query runs lightning fast on the thousand-row test dataset in the dev environment, but becomes a disaster in production.

AI doesn't know that the cardinality of the status field is nearly the entire table, doesn't know how poor the selectivity of created_at is, and certainly doesn't know whether your orders table is partitioned by time. It only generates SQL that "looks right" based on semantics.

Blind Spot 2: It Doesn't Look at Execution Plans

This is the most critical point. After generating SQL, AI won't run EXPLAIN for you, nor will it check whether a full table scan is occurring.

Common performance traps in AI-generated SQL:

Trap 1: N+1 Query Pattern

-- AI will write the main query for you, but won't realize this needs to execute in a loop
SELECT * FROM orders WHERE user_id = ?  -- Then query again for each result

If you query user info again for each order in your code, 1000 orders means 1001 queries. AI won't remind you that "you could use a JOIN here."

Trap 2: Implicit Conversion Causing Index Failure

-- AI doesn't know user_id is a BIGINT type
SELECT * FROM users WHERE user_id = '123456';  -- String comparison, index invalidated

A type mismatch leads to a full table scan. AI can't detect this because you didn't tell it the schema.

Trap 3: ORDER BY + LIMIT on a Non-Indexed Field

SELECT * FROM logs ORDER BY created_at DESC LIMIT 10;
-- If created_at has no index, full table sort

On millions of rows, this query triggers a filesort, writing the temporary table to disk.

Blind Spot 3: It Doesn't Consider Concurrency Scenarios

AI-written SQL runs perfectly in a single-user environment but never considers concurrency.

-- AI would update status like this
UPDATE orders SET status = 'processing' 
WHERE status = 'pending' AND created_at < NOW() - INTERVAL 1 HOUR;

This SQL itself is fine. But if your system has 50 workers running this task simultaneously, lock waits or even deadlocks will occur. AI doesn't know your concurrency model; it only generated a single "logically correct" statement.


2. Why This Problem Erupted in 2026

Three reasons converged:

First, the penetration rate of AI coding tools has surged. By 2026, tools like Cursor, Copilot, and Tongyi Lingma have moved from novelty to daily use. According to industry surveys, over 60% of developers use AI-assisted coding at least once a week. But the vast majority take AI-generated SQL and commit it directly without any performance verification.

Second, the lowered barrier has created a "capability-responsibility mismatch." Previously, developers who didn't know SQL would seek help from a DBA. Now they generate it directly with AI, bypassing the DBA entirely. Code that runs is committed; performance issues are only exposed after going live.

Third, insufficient data volume in test environments. Dev environment databases typically have only a few hundred or thousand rows of test data. AI-generated slow SQL simply can't manifest problems on small datasets. The true cost of a full table scan only becomes apparent in production with tens of millions of rows.


3. How DBAs Hold the Line: Building a SQL Review Mechanism

AI-written SQL cannot be allowed directly into production. Here are several lines of defense I've implemented in real projects:

Defense Line 1: SQL Must Pass EXPLAIN Before Submission

Regardless of who wrote the SQL (including AI), an EXPLAIN result must be attached before submission. Focus on:

This rule is written into the code review process and automatically intercepted in the CI/CD pipeline.

Defense Line 2: Establish a SQL Standards Checklist

Set a few ironclad rules for the team:

  1. Forbid SELECT * — AI especially loves using SELECT *; must be changed to explicitly specify fields
  2. JOINs must use indexes — Join fields must have index coverage
  3. WHERE condition fields must have indexes or use covering indexes
  4. Forbid implicit type conversions on large tables
  5. UPDATE/DELETE must include LIMIT — Prevents AI-generated conditions from matching the entire table

Defense Line 3: Introduce Automated SQL Review Tools

Relying on manual review is unrealistic, especially as team size grows. Tools are needed for automated interception.

Current mainstream SQL review solutions fall into several categories:

Open-source solutions: Open-source SQL review platforms like Archery and Yearning support SQL syntax checking, execution plan analysis, and change approval workflows. Suitable for teams with some operational capability.

Commercial platforms: Enterprise-grade database management platforms like NineData and CloudDM have built-in intelligent SQL review capabilities, automatically identifying high-risk SQL, providing optimization suggestions, and supporting ticket-based approval workflows. Suitable for medium to large enterprises needing standardized management.

Database built-in capabilities: Some domestic databases (like KingbaseES) integrate SQL diagnostics and optimization suggestion features within their management tools like KStudio, proactively identifying potential performance issues based on execution history data models, catching AI-generated SQL pitfalls during the development phase.

Defense Line 4: "Usage Guidelines" for AI-Generated SQL

Set a few rules for AI users on the team:

  1. AI-generated SQL must run EXPLAIN in the test environment, confirming index hits before submission
  2. SQL involving multi-table JOINs must be reviewed by a DBA
  3. Batch operations (INSERT/UPDATE/DELETE exceeding 100 rows) must go through an approval process
  4. Before going live, must validate with production-scale data volumes in a stress-test environment

4. Advice for Developers: Use AI Well, But Don't Get Burned by AI

AI is a great tool, but it must be used correctly:

Tell AI your schema first. Don't just say "look up the user's orders." Instead, give AI your table structure and index details, letting it generate SQL based on your actual environment.

My table structure:
users(id BIGINT PK, name VARCHAR(50), status TINYINT, INDEX idx_status)
orders(id BIGINT PK, user_id BIGINT, total DECIMAL(10,2), created_at DATETIME, INDEX idx_user_created(user_id, created_at))

Please write a query to find the top 10 active users with the highest order amounts in 2025.

The quality of SQL generated from such a prompt will be much higher.

Learn to read EXPLAIN. You don't need to be an expert, but at least understand whether type is using an index, whether rows is within a reasonable range, and whether Extra shows filesort.

If it can't pass the DBA, it can't pass AI. If your SQL runs slowly in the test environment, tune it yourself first. If you can't fix it, find a DBA. Don't just commit it hoping the production environment can handle it.


Summary

The efficiency dividend of AI-written SQL is real, but the performance debt it brings is equally real. DBAs in 2026 need to complete a role transformation — from "helping people write SQL" to "helping people review SQL."

The core is three things:

  1. Establish a review mechanism — EXPLAIN must be passed, standards must be followed, tools must be used
  2. Elevate team awareness — AI is not omnipotent; it doesn't understand your data
  3. Guard the go-live gate — Validate with production-scale data in the stress-test environment; don't let small datasets deceive you

With Xiao Ye at hand, SQL worries are banned.

If there's anything else you'd like to know, feel free to leave a comment! Xiao Ye will surely tell all she knows... See you next time~