Building Production AI into WordPress: Lessons from the BetterDocs Chatbot

Welcome to my blog. I’ll be writing here about web engineering, AI products and building software at scale. To kick things off, I want to share some of what I learned building a production-grade AI chatbot inside a WordPress product.

The problem

Documentation is only useful if people can find answers fast. A traditional search box matches keywords; it doesn’t understand intent. The goal for the BetterDocs AI Chatbot was simple to state and hard to deliver: let a user ask a question in plain language and get an accurate answer grounded in their own documentation — not a hallucination.

That meant building a retrieval-augmented system, not just wiring up an LLM API.

Embeddings and semantic retrieval

The core idea is to represent each chunk of documentation as a vector (an embedding) that captures its meaning. When a user asks a question, we embed the question too, then find the documentation chunks whose vectors are closest to it. Those chunks become the context we hand to the model.
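Here is a minimal sketch of that retrieval step, written in Python for brevity. It assumes the embeddings already exist, produced by some embedding model that isn't shown. It uses cosine similarity as the closeness measure and scans every chunk by brute force. A large index would more likely live in a vector database or an approximate nearest-neighbor index. The chunk ids and the three-dimensional vectors are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: List[float], index: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Brute-force search: score every chunk, return the k most similar."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings; real models produce hundreds or thousands of dimensions.
index = {
    "install-guide#1": [0.9, 0.1, 0.0],
    "billing-faq#2": [0.0, 0.2, 0.95],
    "install-guide#2": [0.7, 0.3, 0.1],
}
query = [0.8, 0.2, 0.05]  # pretend embedding of "How do I install the plugin?"

for chunk_id, score in top_k_chunks(query, index, k=2):
    print(f"{chunk_id}: {score:.3f}")
```

With these toy vectors, the two install-guide chunks rank above the billing FAQ. Those two chunks are the ones that would be handed to the model as context.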

A few things mattered more than I expected:

Chunking strategy. Too large and retrieval gets noisy; too small and you lose context. Splitting on semantic boundaries (headings, paragraphs) beat naive fixed-size splitting.
Keeping embeddings in sync. Docs change. Every edit has to trigger a re-embed of the affected content, and that work can’t block the editor.
Grounding the prompt. Structured prompts that explicitly instruct the model to answer only from the provided context dramatically reduced hallucinations.
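To make the chunking point concrete, here is a hypothetical chunker. It splits on blank lines, starts a new chunk at every heading, and packs paragraphs together up to a size limit. It assumes the post content has already been converted to Markdown-style text, where headings start with a # character. The name chunk_document, the max_chars default of 800, and the character-based size measure are illustrative choices, not tuned values.

```python
import re
from typing import List


def chunk_document(text: str, max_chars: int = 800) -> List[str]:
    """Split on blank lines, start a new chunk at every heading, and pack
    consecutive paragraphs together until adding one would exceed max_chars.

    Sizes are approximate (separators are not counted), and a paragraph
    longer than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0

    def flush() -> None:
        nonlocal current, size
        if current:
            chunks.append("\n\n".join(current))
        current, size = [], 0

    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if not block:
            continue
        is_heading = block.startswith("#")
        # Only flush for size once the chunk already has body text, so a
        # heading is never separated from the paragraph that follows it.
        has_body = any(not b.startswith("#") for b in current)
        if is_heading or (has_body and size + len(block) > max_chars):
            flush()
        current.append(block)
        size += len(block)

    flush()
    return chunks


sample = (
    "# Installing\n\nDownload the plugin.\n\nActivate it from the Plugins screen.\n\n"
    "# Billing\n\nInvoices are sent monthly."
)
for i, chunk in enumerate(chunk_document(sample), start=1):
    print(f"--- chunk {i} ---\n{chunk}")
```

Starting a new chunk at every heading keeps a section’s paragraphs together. The size cap stops one long section from turning into a single noisy chunk.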
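A grounded prompt can be as simple as numbered context passages plus an explicit instruction to stay inside them. The sketch below is illustrative. The instruction wording, the fallback sentence, and the name build_grounded_prompt are all assumptions. With a chat-style API, the instructions would usually go in a system message rather than one concatenated string.

```python
from typing import List

# Illustrative wording; tune the instructions for your own model and docs.
SYSTEM_INSTRUCTIONS = (
    "You are a documentation assistant. Answer ONLY from the numbered context "
    "passages below. If they do not contain the answer, reply exactly: "
    "\"I couldn't find that in the documentation.\" "
    "Cite the passages you used, e.g. [1]."
)


def build_grounded_prompt(question: str, chunks: List[str]) -> str:
    """Combine the grounding instructions, numbered retrieved chunks and the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        f"{SYSTEM_INSTRUCTIONS}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_grounded_prompt(
    "How do I install the plugin?",
    ["Download the plugin.", "Activate it from the Plugins screen."],
))
```

Numbering the passages also lets the model cite them, which makes it easier to check which chunk an answer came from.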

Background processing is the hard part

The AI calls are the flashy bit, but the engineering challenge in WordPress is everything around them. Embedding a large knowledge base can’t happen inside a single web request, which would hit execution time limits long before the work finished. It has to run in the background, survive failures, and resume cleanly.
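One way to get “survive failures and resume cleanly” is to process jobs in fixed-size batches and retry each batch a few times. A checkpoint advances only after a batch succeeds. The Python sketch below shows that pattern. Everything in it is hypothetical: process_in_batches, the in-memory checkpoint dict, and the flaky embedder that simulates a timeout. In WordPress, a background mechanism such as WP-Cron would typically drive the loop, and the checkpoint would be stored in the database rather than in memory.

```python
import time
from typing import Callable, Dict, List


def process_in_batches(
    job_ids: List[str],
    checkpoint: Dict[str, int],
    embed_batch: Callable[[List[str]], None],
    batch_size: int = 20,
    max_attempts: int = 3,
    backoff_seconds: float = 0.5,
) -> None:
    """Embed jobs in batches, resuming from checkpoint["done"].

    The checkpoint only advances after a batch succeeds, so a crash or a
    batch that exhausts its retries leaves it pointing at unfinished work.
    """
    start = checkpoint.get("done", 0)
    for offset in range(start, len(job_ids), batch_size):
        batch = job_ids[offset:offset + batch_size]
        for attempt in range(1, max_attempts + 1):
            try:
                embed_batch(batch)
                break
            except Exception:
                # A real worker would retry only transient errors (timeouts, rate limits).
                if attempt == max_attempts:
                    raise
                time.sleep(backoff_seconds * 2 ** (attempt - 1))  # exponential backoff
        checkpoint["done"] = offset + len(batch)  # a real system persists this durably


# Demo: an embedder that times out once, on its second call.
processed: List[str] = []
calls = {"n": 0}


def flaky_embed(batch: List[str]) -> None:
    calls["n"] += 1
    if calls["n"] == 2:
        raise TimeoutError("simulated embedding API timeout")
    processed.extend(batch)


jobs = [f"chunk-{i}" for i in range(5)]
state: Dict[str, int] = {}
process_in_batches(jobs, state, flaky_embed, batch_size=2, backoff_seconds=0.0)
print(processed, state)

# Resuming from a saved checkpoint skips work that already finished.
resumed: List[str] = []
process_in_batches(jobs, {"done": 4}, resumed.extend, batch_size=2)
print(resumed)  # → ['chunk-4']
```

A batch that failed partway may be sent again. For that reason, the step that writes embeddings should be idempotent, for example an upsert keyed by chunk id, so that a retry overwrites entries rather than duplicating them.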

I leaned on a queue-and-sync pipeline: detect changed content, enqueue embedding jobs, process them in batches, and reconcile state so the index never drifts from the source. This is the kind of unglamorous systems work that makes the difference between a demo and a product real users trust.
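One way to implement the detect and reconcile steps is to store a content hash next to each embedded chunk, then diff those hashes against the current docs. This hash-based design is an assumption for illustration, and plan_sync, content_hash, and the chunk ids are hypothetical names. In WordPress, the diff could be run from a hook such as save_post, with the results handed to the embedding queue.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(text: str) -> str:
    """Fingerprint of a chunk's text; any edit changes the hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(
    source_chunks: Dict[str, str], indexed_hashes: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Diff the current docs against what the vector index was built from.

    Returns (to_embed, to_delete): chunks that are new or changed, and chunks
    still in the index whose source no longer exists. Unchanged chunks are skipped.
    """
    to_embed = [cid for cid, text in source_chunks.items()
                if indexed_hashes.get(cid) != content_hash(text)]
    to_delete = [cid for cid in indexed_hashes if cid not in source_chunks]
    return sorted(to_embed), sorted(to_delete)


docs = {
    "install#1": "Download the plugin.",
    "install#2": "Activate it from the Plugins screen.",
}
indexed = {
    "install#1": content_hash("Download the plugin."),      # unchanged
    "install#2": content_hash("Old wording."),              # edited since last sync
    "legacy#1": content_hash("A page that was deleted."),   # source removed
}

to_embed, to_delete = plan_sync(docs, indexed)
print(to_embed, to_delete)  # → ['install#2'] ['legacy#1']

# Stand-in for the queue workers: embed + upsert changed chunks, drop stale ones.
for cid in to_embed:
    indexed[cid] = content_hash(docs[cid])
for cid in to_delete:
    del indexed[cid]

print(plan_sync(docs, indexed))  # → ([], [])
```

Deleting index entries for chunks that no longer exist keeps the index from drifting. Unchanged chunks hash the same, so re-running the plan after a successful sync returns empty lists and enqueues no new work.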

Takeaways

Treat the LLM as one component in a larger system, not the system itself.
Invest early in the data pipeline — retrieval quality is mostly a data problem.
Background processing and idempotency aren’t optional once you’re at scale.

I’ll go deeper on each of these in future posts. If you’re working on something similar, I’d love to hear about it — find me on GitHub or LinkedIn.