Don't Assume Your SQL WHERE Conditions Work — Run Them First

Don't Write WHERE Conditions by Habit — Run Common Queries First

When switching to a new database, don't rush into complex syntax first. In daily development, the first things you'll encounter are still those ordinary conditions: =, <>, like, in, between, is null, order by, limit, group by. These look simple, but when actually migrating SQL or debugging interface data, they're the most likely to be overlooked because "it should be about the same."

This time, a very small test table is used to run through common query conditions in one go. The environment is an already initialized app_db, with the table placed under app_schema. The current instance was initialized with --enable-ci, so case-sensitive results are only recorded for this environment and cannot be directly applied to all installation modes.

Prepare Enough Data First

The test table is called app_schema.t_filter_demo. The fields aren't deliberately complex: name for names, category for categories, status for statuses, score and price for range queries, sorting, and aggregation, and remark is specifically left for NULL checks.

There are 12 rows of data, mixing several types: mysql, Kingbase, sql, oracle, pgsql; statuses include online, offline, draft; name deliberately includes values with different cases like MySQL-basic, mysql-limit, MYSQL-LIKE. One row has a NULL score, and 4 rows have NULL remark. Every subsequent query relies on this data.

Prepare condition query test table

This table isn't a business model; it's just to exercise query conditions. For example, category is good for in and group by, status for inequality and grouping statistics, score for between, sorting, and aggregate functions, and remark for NULL checks.

The terminal prompt is app_db=>, indicating the current connection database is app_db, and the current user isn't under the admin prompt #. Such small details are often overlooked but are useful when writing query experiments — at least you can confirm you're not connected to the installation's test database or still using system for all operations.

Confirm Equality and Inequality First

The most basic equality query is straightforward:

select id, name, category, status, score
from app_schema.t_filter_demo
where category = 'mysql'
order by id;

Returns rows 1, 2, 3, all in the mysql category. First, get where category = 'mysql' working, then layer on more conditions.

Inequality uses two syntaxes:

where status <> 'online'

And:

where status != 'online'

Both return the same 5 rows: 3, 6, 8, 9, 11 — data with offline and draft statuses. In this environment, both <> and != work. When writing migration scripts, if the original MySQL SQL used !=, at least this scenario doesn't block it.

Verify equality and inequality

However, for long-term SQL writing, <> is still closer to the standard; != feels more like a habit from MySQL. Whether it works is one thing, but team consistency is another.

One thing confirmed here: status <> 'online' returns all explicit non-online values and doesn't mix in NULLs. The test table's status field is not null, so the result is clean. If a business table's status field allows NULL, inequality conditions need rechecking — don't assume NULL counts as "not online."

AND, OR — Parentheses Are Crucial

First, a combined condition with and:

select id, name, category, status, score
from app_schema.t_filter_demo
where category = 'Kingbase'
  and status = 'online'
order by id;

Returns rows 4 and 5. The table stores category as kingbase, but the query uses Kingbase — it still matches due to --enable-ci. Case insensitivity isn't an assumption; it's visible in the query results.

Now add or:

where (category = 'mysql' or category = 'Kingbase')
  and status = 'online'

Returns rows 1, 2, 4, 5. Parentheses are essential here. Without them, anyone reading the SQL has to mentally recalculate and/or precedence; with parentheses, the grouping is explicit.

Verify AND and OR

SQL migrated from MySQL often has many filter conditions like this: one main condition, then several status, category, and time range conditions. KingbaseES running these basic conditions isn't surprising; what matters is developing the habit of writing complex conditions unambiguously. Especially when or is mixed with a string of ands, parentheses are more reliable than explanations.

LIKE Case Sensitivity Must Be Tested

For LIKE, names with different cases were deliberately used. First, exclude names containing sql:

select id, name, category
from app_schema.t_filter_demo
where name not like '%sql%'
order by id;

Returns rows 4, 6, 10, 11. Kingbase-start, KINGBASE-schema, Oracle-mode, oracle-sequence are not matched by %sql%.

Then query separately:

where name like 'MYSQL%'

And:

where name like 'mysql%'

Both return rows 1, 2, 3: MySQL-basic, mysql-limit, MYSQL-LIKE. This shows that in the current instance, LIKE is case-insensitive for these character comparisons.

LIKE and NOT LIKE case behavior

Don't overgeneralize this conclusion. It only applies to this environment and initialization mode. If the instance doesn't enable case insensitivity, or if the character set or collation changes, retest. When migrating from MySQL, LIKE often hides in search interfaces, so it's best to confirm case behavior early.

One more detail: not like '%sql%' excludes MySQL-basic, mysql-limit, MYSQL-LIKE, SQL-where, sql-order, and SQL-null. This result reveals case sensitivity more than like 'mysql%' because %sql% matches anywhere in the string — SQL inside MySQL is also counted.

IN, NOT IN, BETWEEN — All Smooth

IN is a good replacement for a string of ors. Query for mysql and Kingbase:

select id, name, category, status
from app_schema.t_filter_demo
where category in ('mysql', 'Kingbase')
order by category, id;

Returns 6 rows: 3 kingbase and 3 mysql. Again, due to case insensitivity, Kingbase matches the stored kingbase.

Using not in returns the remaining oracle, pgsql, and sql categories, 6 rows total.

Range condition uses between 85 and 95:

select id, name, score, price
from app_schema.t_filter_demo
where score between 85 and 95
order by score desc, id;

Returns 7 rows, including 95 and 85. between includes both endpoints, consistent with MySQL. Sorted by score desc, id, the highest score is Kingbase-start at 95, the lowest is Oracle-mode at 85.

IN, NOT IN, and BETWEEN

These conditions aren't hard themselves; problems arise when data contains NULLs. not in with a NULL subquery can be tricky, but here only fixed lists are used. NULL handling is separate.

order by category, id also clarifies the return order. in results don't come back in the list order 'mysql', 'Kingbase'; they follow the SQL order by. If an API requires a specific order, write order by explicitly — don't rely on insertion order or in list order.

NULL — Don't Guess with Equals

The remark field has 4 NULL rows. Use the correct syntax:

where remark is null

Returns rows 3, 6, 9, 12. Conversely:

where remark is not null

Returns 8 rows.

Now deliberately write:

where remark = null

Returns 0 rows. This is a good reminder: NULL isn't a normal value; don't use = to match it. To check for nulls, use is null; to exclude them, use is not null.

score is null was also run separately, returning only row 9 SQL-null. This will be used in aggregation later, because count(*) and count(score) give different results.

NULL checks

When migrating from MySQL to KingbaseES, NULL handling doesn't require learning new things — just follow standard SQL. The real change is breaking bad habits: don't concatenate = null into interface conditions, and don't treat NULL as a normal string in dynamic SQL.

Both remark and score have NULLs, but with different effects: remark is text, good for verifying null filtering; score is numeric and will participate in avg. The same NULL affects counting and averages differently in aggregate functions, which is more likely to cause issues in reports than simply finding empty rows.

Sorting and LIMIT Can Be Combined Directly

Get the top 5 scores:

select id, name, category, score
from app_schema.t_filter_demo
where score is not null
order by score desc, id
limit 5;

Returns rows 4, 11, 6, 1, 5, with scores 95, 92, 91, 90, 88. First exclude NULLs with score is not null, then sort descending by score, and add id as a tiebreaker.

Multi-field sorting with LIMIT

The limit syntax was verified earlier; using it with conditions is closer to daily usage. Real APIs rarely just write limit 5; they filter, sort, then limit. With only one sort field, ties can make pagination unstable; adding a unique field as a tiebreaker makes results more reproducible.

In this data, the top 5 have no ties, so the id tiebreaker seems minor, but it should still be written. As data grows, ties become common. Without a stable sort field, pagination issues — duplicate or missing rows — won't appear immediately but will surface during page turns.

Aggregate Functions Skip NULL

First, basic statistics on the whole table:

select
  count(*) as total_count,
  count(score) as score_count,
  min(score) as min_score,
  max(score) as max_score,
  avg(score) as avg_score
from app_schema.t_filter_demo;

Results: total_count is 12, score_count is 11. The missing row is row 9 with NULL score. min(score) is 70, max(score) is 95, avg(score) is 84.45454545454545. The average is calculated over 11 non-NULL scores, not divided by 12.

Then group by status:

select status, count(*) as row_count
from app_schema.t_filter_demo
group by status
order by status;

Returns draft=2, offline=3, online=7.

Basic aggregation statistics

This is similar to common MySQL usage. The key thing to watch is count(*) vs count(field). The former counts rows; the latter counts only non-NULL values in that field. Mixing them up in APIs will skew statistics.

The status statistics also verify the inserted data: 2 drafts, 3 offline, 7 online, totaling 12 rows. In small experiments, grouping results serve as a cross-check, preventing further analysis on already incorrect data.

HAVING — Filtering After Grouping

Finally, statistics by category:

select category, count(*) as row_count, avg(score) as avg_score
from app_schema.t_filter_demo
group by category
order by category;

Results show 5 categories: kingbase 3 rows, avg ~91.33; mysql 3 rows, avg ~82.67; oracle 2 rows, avg 88.50; pgsql 1 row, avg 87; sql 3 rows, avg 71.50. The sql category has one NULL score, so the average is calculated only from 70 and 73.

Add:

having count(*) >= 2

pgsql is filtered out, leaving only categories with 2 or more rows. where filters detail rows before grouping; having filters aggregate results after grouping. This distinction is common in report SQL.

GROUP BY and HAVING

After running these queries, the basics of WHERE conditions are covered. Equality, inequality, and/or, like, in, between, NULL checks, sorting, limiting, aggregation, and having all work as expected in this V009R001C010 environment. Three points need special marking: the current instance is case-insensitive, so LIKE and category = 'Kingbase' behavior can't be generalized outside this environment; NULL must use is null/is not null; aggregate functions handle NULL differently, so watch the difference between count(*) and count(field).

having count(*) >= 2 filters by row count after grouping, so the sql category — despite having one NULL score — is retained because it has 3 detail rows. avg(score) only counts non-NULL scores; row count and average are independent, as seen with sql's 3 rows and avg 71.50.

Once these basic conditions are clear, moving on to JOINs, subqueries, window functions, or backup and recovery will feel much more solid. The biggest risk in SQL migration isn't complex syntax — it's assuming "it's probably the same" without actually testing.