Agent · LLM · Architecture

How a Q&A Agent Uses Hierarchical Clustering to Route Customer Questions Without Retraining

By 久愿Y · Jul 4, 2026

Read original on juejin.cn

Most production Q&A agents need to route questions to the right handler without forcing users through multi-turn menus. This clustering approach gives a concrete, low-dependency recipe—TF-IDF, Ward linkage, cosine similarity, and a UUID trick—that slots into existing Python stacks and avoids the cost and latency of embedding models for straightforward intent routing.

Summary

Customer questions vary wildly in phrasing but collapse into a handful of real topics—refunds, shipping, repairs. A B2B support agent built on TF-IDF vectorization and Ward-linkage hierarchical clustering discovers those topics automatically from historical logs. Each cluster gets a UUID instead of a numeric index, which avoids collisions when the system runs in distributed environments or persists results across batches.
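The fitting step described above can be sketched as follows. This is a minimal sketch that assumes scikit-learn's `TfidfVectorizer` and SciPy's `linkage`/`fcluster`, since the article does not name its libraries. The function name `fit_clusters`, the `distance_threshold` cut height, and the dictionary layout are hypothetical. The examples use English text; Chinese questions would first need a word segmenter, because the default tokenizer only splits on word boundaries.

```python
import uuid

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.feature_extraction.text import TfidfVectorizer


def fit_clusters(questions, distance_threshold=1.0):
    """Hypothetical helper: cluster historical questions, return vectorizer + UUID-keyed clusters."""
    vectorizer = TfidfVectorizer()  # scikit-learn default: each row is L2-normalized
    X = vectorizer.fit_transform(questions).toarray()

    # Ward linkage over Euclidean distances between the TF-IDF rows.
    Z = linkage(X, method="ward")
    # Cut the dendrogram at a height instead of fixing the number of clusters.
    labels = fcluster(Z, t=distance_threshold, criterion="distance")

    clusters = {}
    for label in np.unique(labels):
        members = np.where(labels == label)[0]
        # A random UUID replaces the batch-local integer label from fcluster.
        clusters[str(uuid.uuid4())] = {
            "members": [questions[i] for i in members],
            "centroid": X[members].mean(axis=0),
        }
    return vectorizer, clusters
```

Each cluster key is a fresh `uuid.uuid4()` string, so keys produced by different batches or workers will not collide the way the 1, 2, 3... labels returned by `fcluster` would.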

New questions are handled incrementally: the system transforms them using the original vocabulary, measures cosine similarity against each cluster's centroid, and either assigns them to an existing group or spawns a new cluster when similarity drops below a threshold. The centroid updates with a moving average, so clusters drift to stay current without full retraining.
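The incremental path can be sketched like this, assuming a vectorizer already fitted on the historical questions and a dict of UUID-keyed centroids. The class `IncrementalRouter` and its `alpha` parameter are hypothetical. The article says only that the centroid follows a moving average, so the exponential form used here is an assumption; a running mean with a per-cluster count would be an equally valid reading.

```python
import uuid

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class IncrementalRouter:
    """Hypothetical helper: assign a question to the nearest centroid or open a new cluster."""

    def __init__(self, vectorizer, centroids, threshold=0.3, alpha=0.1):
        self.vectorizer = vectorizer      # already fitted on historical questions
        self.centroids = dict(centroids)  # {uuid_str: 1-D numpy array}
        self.threshold = threshold        # minimum cosine similarity to join a cluster
        self.alpha = alpha                # assumed moving-average weight of the new point

    def route(self, question):
        """Return (cluster_id, created_new_cluster)."""
        # transform, not fit_transform: keep the original vocabulary and IDF weights.
        v = self.vectorizer.transform([question]).toarray()[0]
        if self.centroids:
            ids = list(self.centroids)
            matrix = np.vstack([self.centroids[i] for i in ids])
            sims = cosine_similarity(v.reshape(1, -1), matrix)[0]
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                cid = ids[best]
                # Exponential moving average: the centroid drifts toward recent questions.
                self.centroids[cid] = (1 - self.alpha) * self.centroids[cid] + self.alpha * v
                return cid, False
        # Below threshold (or no clusters yet): seed a new cluster with this question.
        cid = str(uuid.uuid4())
        self.centroids[cid] = v
        return cid, True
```

One consequence worth knowing: a question made entirely of words outside the fitted vocabulary vectorizes to all zeros, scores 0 against every centroid, and therefore always opens a new cluster. The new-cluster event surfaces it, but the vocabulary itself only grows when the vectorizer is refit.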

The pipeline includes TextRank keyword extraction so each cluster carries a human-readable label, and the UUID scheme keeps identifiers stable across restarts and database writes. The approach works for small-to-medium text sets where O(n²) clustering cost is acceptable and where the team needs interpretable, hierarchical groupings rather than a flat K-Means partition.

Takeaways

— TF-IDF vectorization paired with Ward-linkage hierarchical clustering groups customer questions that share vocabulary, without needing a pre-set number of clusters; the dendrogram is cut at a chosen height instead.

— Each cluster is assigned a UUID instead of a numeric index, preventing collisions across runs and making results safe for databases and distributed systems.

— Incremental prediction reuses the original TF-IDF vocabulary to vectorize new questions, then assigns them to the nearest cluster centroid via cosine similarity.

— A similarity threshold (default 0.3) decides whether a question joins an existing cluster or spawns a new one; the centroid updates with a moving average on each assignment.

— TextRank extracts representative keywords per cluster, giving each group a human-readable label for downstream routing or FAQ generation.

— Questions that fall below the threshold trigger a new-cluster event, which can alert operators to emerging issue types that need manual handling.

Conclusions

Using UUIDs as cluster identifiers is a small but sharp design choice: numeric indices are ephemeral and ambiguous across runs, while UUIDs make the clustering output directly serializable and safe for distributed coordination.

The incremental prediction step calls `transform` rather than `fit_transform` on new text—a detail that is easy to miss but critical for keeping the vector space consistent with the original clustering.
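A small check shows why the distinction matters. It assumes scikit-learn's `TfidfVectorizer` (the `transform`/`fit_transform` names match its API), and the sample questions are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

history = ["refund request money", "shipping package delayed", "repair broken screen"]
vectorizer = TfidfVectorizer().fit(history)

new = ["refund for broken screen"]

# Correct: reuse the fitted vocabulary, so columns line up with the stored centroids.
X_ok = vectorizer.transform(new)

# Wrong: refitting builds a fresh vocabulary from the new text alone.
X_bad = TfidfVectorizer().fit_transform(new)

print(X_ok.shape, X_bad.shape)  # (1, 9) (1, 4)
```

Refitting yields 4 columns in a different order, so any cosine against the stored 9-dimensional centroids would be meaningless or fail on a shape mismatch. Note also that "for", unseen during fitting, is silently dropped by `transform`.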

Ward linkage is chosen specifically because each merge joins the pair of clusters whose union adds the least within-cluster variance, which yields compact, similarly sized clusters; that matters when downstream handlers expect roughly uniform group sizes.

The 0.3 similarity threshold is presented as an empirical starting point, not a universal constant; tuning it controls the trade-off between cluster fragmentation and noise tolerance.

TF-IDF ignores synonymy and word order, so the system will struggle with paraphrases that share no vocabulary—an explicit limitation that points toward embedding-based upgrades for teams that outgrow this baseline.
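A short illustration of that limitation, with made-up questions and scikit-learn assumed: two phrasings of the same refund request share no tokens, so their TF-IDF cosine similarity is zero and the router would treat them as unrelated.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["refund my order", "money back please"]  # same intent, disjoint vocabulary
X = TfidfVectorizer().fit_transform(docs)

print(cosine_similarity(X[0], X[1])[0, 0])  # 0.0
```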

Concepts & terms

Hierarchical Clustering

A clustering method that builds a tree of nested groups (a dendrogram). The agglomerative (bottom-up) form used with Ward linkage starts from single points and repeatedly merges the closest pair of points or clusters. It does not require pre-specifying the number of clusters and allows cutting the tree at different heights for varying granularity.
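A toy illustration with SciPy, using six made-up one-dimensional points that form three obvious pairs: cutting the same Ward dendrogram at increasing heights yields progressively coarser groupings.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Three tight pairs; the first two pairs are closer to each other than to the third.
points = np.array([[0.0], [0.1], [1.0], [1.1], [5.0], [5.2]])
Z = linkage(points, method="ward")

for height in (0.5, 2.0, 10.0):
    labels = fcluster(Z, t=height, criterion="distance")
    print(height, len(set(labels)))
# 0.5 3
# 2.0 2
# 10.0 1
```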

Ward Linkage

A linkage criterion for hierarchical clustering that merges the two clusters that produce the smallest increase in total within-cluster variance. It tends to create compact, similarly sized clusters.
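In symbols, writing c_A for the mean vector (centroid) of cluster A and |A| for its size, the quantity Ward's criterion minimizes at each merge is the increase in total within-cluster sum of squares:

```latex
\Delta(A, B) = \sum_{x \in A \cup B} \lVert x - c_{A \cup B} \rVert^2
             - \sum_{x \in A} \lVert x - c_A \rVert^2
             - \sum_{x \in B} \lVert x - c_B \rVert^2
             = \frac{|A|\,|B|}{|A| + |B|} \, \lVert c_A - c_B \rVert^2
```

For a fixed centroid distance, the size factor makes merging two large clusters costlier than absorbing a small one, which is a common explanation for Ward's tendency toward similarly sized clusters.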

TF-IDF

Term Frequency-Inverse Document Frequency. A numerical statistic that reflects how important a word is to a document in a corpus. TF counts word occurrences; IDF down-weights words that appear in many documents.
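The textbook form, with N the number of documents and df(t) the number of documents containing term t, followed by the smoothed IDF that scikit-learn's `TfidfVectorizer` uses by default (it then L2-normalizes each document vector):

```latex
\operatorname{tfidf}(t, d) = \operatorname{tf}(t, d) \cdot \operatorname{idf}(t),
\qquad
\operatorname{idf}(t) = \log \frac{N}{\operatorname{df}(t)}

\operatorname{idf}_{\text{sklearn}}(t) = \ln \frac{1 + N}{1 + \operatorname{df}(t)} + 1
```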

Cosine Similarity

A measure of similarity between two non-zero vectors that computes the cosine of the angle between them. Values range from -1 to 1, with 1 meaning identical direction. It is widely used in text analysis; with TF-IDF vectors, whose entries are never negative, scores fall between 0 and 1.
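The definition, plus one identity that connects the pieces of this pipeline: for unit-length vectors, such as L2-normalized TF-IDF rows (scikit-learn's default), squared Euclidean distance falls as cosine similarity rises, so a Euclidean method like Ward and a cosine-based router rank pairs of questions the same way.

```latex
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert},
\qquad
\lVert \mathbf{a} - \mathbf{b} \rVert^2
  = \lVert \mathbf{a} \rVert^2 + \lVert \mathbf{b} \rVert^2 - 2\,\mathbf{a} \cdot \mathbf{b}
  = 2 - 2\cos(\mathbf{a}, \mathbf{b})
\quad \text{if } \lVert \mathbf{a} \rVert = \lVert \mathbf{b} \rVert = 1
```

The identity holds between individual normalized rows. Averaged centroids are generally shorter than unit length, so it does not carry over to question-to-centroid comparisons.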

TextRank

A graph-based ranking algorithm for keyword extraction, inspired by PageRank. It builds a word co-occurrence graph from a text and runs a PageRank-style iteration over it, so words linked to many other well-connected words score highest and surface as the most representative terms.
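A minimal pure-Python sketch of the idea, assuming the text is already tokenized with stopwords removed (Chinese text would need segmentation first). The function name `textrank_keywords`, the adjacent-word co-occurrence window, the 0.85 damping factor borrowed from PageRank, and the fixed iteration count are illustrative choices, not details from the article.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Hypothetical helper: rank words in a pre-tokenized list with a TextRank-style walk."""
    # Undirected co-occurrence graph: link distinct words within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)

    # PageRank-style power iteration; each pass reads only the previous scores.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

To label a cluster, one would pass the concatenated tokens of its member questions and keep the top few words.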

Source: juejin.cn