How a Q&A Agent Uses Hierarchical Clustering to Route Customer Questions Without Retraining
Most production Q&A agents need to route questions to the right handler without forcing users through multi-turn menus. This clustering approach gives a concrete, low-dependency recipe—TF-IDF, Ward linkage, cosine similarity, and a UUID trick—that slots into existing Python stacks and avoids the cost and latency of embedding models for straightforward intent routing.
Customer questions vary wildly in phrasing but collapse into a handful of real topics—refunds, shipping, repairs. A B2B support agent built on TF-IDF vectorization and Ward-linkage hierarchical clustering discovers those topics automatically from historical logs. Each cluster gets a UUID instead of a numeric index, which avoids collisions when the system runs in distributed environments or persists results across batches.
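The batch step described above can be sketched as follows. This is a minimal illustration assuming scikit-learn; the sample questions, `n_clusters=3`, and the English stop-word list are assumptions chosen for the toy data, not values from the original pipeline.

```python
import uuid

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical historical log; a real system would load far more questions.
questions = [
    "How do I get a refund for my order?",
    "Can I request a refund on a damaged item?",
    "Where is my shipment right now?",
    "When will my shipment arrive?",
    "My device needs a repair under warranty.",
    "How do I book a repair for a broken screen?",
]

# Fit the vocabulary once; later questions must reuse this same vectorizer.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(questions).toarray()  # Ward needs a dense array

# n_clusters=3 is an assumption for this toy data.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

# Replace each ephemeral numeric label with a UUID and keep its centroid.
label_to_uuid = {label: str(uuid.uuid4()) for label in sorted(set(labels))}
centroids = {
    label_to_uuid[label]: X[labels == label].mean(axis=0)
    for label in label_to_uuid
}
assignments = [label_to_uuid[label] for label in labels]
```

Keeping `vectorizer` and `centroids` around after this step is what makes the later incremental assignment possible without refitting.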
New questions are handled incrementally: the system transforms them using the original vocabulary and measures cosine similarity against each cluster's centroid. If the best similarity meets a threshold, the question joins that cluster; otherwise it spawns a new one. The matched centroid updates with a moving average, so clusters drift to stay current without full retraining.
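A rough sketch of that assign-or-spawn step, using only NumPy, might look like this. The function name `assign_or_spawn`, the exponential form of the moving average, and the weight `alpha=0.1` are assumptions made for illustration; the source specifies only that a moving average and a similarity threshold are used.

```python
import uuid

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity that returns 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def assign_or_spawn(vec, centroids, threshold=0.3, alpha=0.1):
    """Return the cluster id for `vec`, updating `centroids` in place.

    `alpha` (the moving-average weight) is an assumed parameter.
    """
    vec = np.asarray(vec, dtype=float)
    best_id, best_sim = None, -1.0
    for cluster_id, centroid in centroids.items():
        sim = cosine(vec, centroid)
        if sim > best_sim:
            best_id, best_sim = cluster_id, sim

    if best_id is not None and best_sim >= threshold:
        # Moving average: nudge the matched centroid toward the new point.
        centroids[best_id] = (1 - alpha) * centroids[best_id] + alpha * vec
        return best_id

    # Nothing is similar enough, so seed a new cluster with this vector.
    new_id = str(uuid.uuid4())
    centroids[new_id] = vec
    return new_id


refund_id = str(uuid.uuid4())
centroids = {refund_id: np.array([1.0, 0.0, 0.0])}
close_id = assign_or_spawn([0.9, 0.1, 0.0], centroids)  # similar to refunds
far_id = assign_or_spawn([0.0, 0.0, 1.0], centroids)    # orthogonal vector
```

In this toy run the first vector is absorbed into the existing cluster and shifts its centroid slightly, while the orthogonal vector has zero similarity and receives a fresh UUID.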
The pipeline includes TextRank keyword extraction so each cluster carries a human-readable label, and the UUID scheme keeps identifiers stable across restarts and database writes. The approach suits small-to-medium text sets where the O(n²) time and memory cost of pairwise-distance clustering is acceptable and where the team needs interpretable, hierarchical groupings rather than a flat K-Means partition.
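To make the labeling step concrete, here is a simplified TextRank-style sketch in pure Python. It ranks words by PageRank over an unweighted co-occurrence graph; the stop-word set, window size, and function name `textrank_keywords` are hypothetical, and a production pipeline would more likely rely on an established implementation.

```python
import re
from collections import defaultdict

# Tiny illustrative stop list; an assumption, not part of the original system.
STOP = {"a", "an", "the", "i", "my", "do", "is", "for", "how", "can", "to", "on"}


def textrank_keywords(texts, top_k=3, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank over a co-occurrence graph (TextRank-style)."""
    neighbors = defaultdict(set)
    for text in texts:
        words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]
        for i, word in enumerate(words):
            # Link each word to the next `window` words in the same text.
            for other in words[i + 1 : i + 1 + window]:
                if other != word:
                    neighbors[word].add(other)
                    neighbors[other].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's score.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    # Sort by score, breaking ties alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


cluster_texts = [
    "How do I get a refund for my order?",
    "Can I request a refund on a damaged item?",
    "Is a refund possible for a late order?",
]
keywords = textrank_keywords(cluster_texts)
```

The resulting keywords can be stored next to the cluster's UUID, so the identifier stays machine-stable while the label stays readable.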
Using UUIDs as cluster identifiers is a small but sharp design choice: numeric indices are ephemeral and ambiguous across runs, while UUIDs make the clustering output directly serializable and safe for distributed coordination.
The incremental prediction step calls `transform` rather than `fit_transform` on new text—a detail that is easy to miss but critical for keeping the vector space consistent with the original clustering.
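The difference is easy to see in a short scikit-learn sketch; the sample sentences below are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

history = [
    "how do I get a refund",
    "where is my shipment",
    "my device needs repair",
]
new_question = ["can I get a refund for my order"]

vectorizer = TfidfVectorizer()
X_history = vectorizer.fit_transform(history)

# Correct: reuse the fitted vocabulary, so columns line up with the centroids.
X_new = vectorizer.transform(new_question)

# Wrong for incremental use: refitting builds a different vocabulary,
# so the columns no longer correspond to the original clustering.
X_refit = TfidfVectorizer().fit_transform(new_question)

same_space = X_new.shape[1] == X_history.shape[1]
refit_space = X_refit.shape[1] == X_history.shape[1]
```

Here `same_space` is true and `refit_space` is false: the refit matrix has a column per word in the new question only, so comparing it with stored centroids would be meaningless.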
Ward linkage is chosen specifically because each merge joins the pair of clusters that least increases total within-cluster variance. This tends to produce compact, similarly sized clusters, which matters when downstream handlers expect roughly uniform group sizes.
The 0.3 similarity threshold is presented as an empirical starting point, not a universal constant. Raising it spawns new clusters more readily and risks fragmenting one topic across several groups; lowering it absorbs loosely related or noisy questions into existing clusters.
TF-IDF ignores synonymy and word order, so the system will struggle with paraphrases that share no vocabulary—an explicit limitation that points toward embedding-based upgrades for teams that outgrow this baseline.
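A two-sentence example shows how stark this limitation can be. The sketch assumes scikit-learn with its default tokenizer; the sentences are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Two phrasings of the same intent that share no vocabulary.
paraphrases = ["I want my money back", "please issue a refund"]

vectors = TfidfVectorizer().fit_transform(paraphrases)
similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]
```

Because the two sentences have no tokens in common, their TF-IDF vectors are orthogonal and `similarity` is zero. Under a 0.3 threshold the second phrasing would spawn its own cluster even though a human would file both under refunds.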