Case Study: AI Assistant for a WordPress Site

Project: eslwriting.org RAG System
Role: AI Content Architect & Technical Writer
Stack: Python, OpenAI API (text-embedding-3-small, GPT-4o-mini), Pinecone, Flask, WordPress REST API, Render, GitHub

About the project

eslwriting.org is a 12-year-old ESL teaching resource site with 1,019 published posts. This portfolio project demonstrates the full design and deployment of a RAG-based AI knowledge assistant built on the site’s existing content archive, from content ingestion and vector indexing to live deployment and documentation.

Challenge

A large content archive creates a discoverability problem. Visitors searching for specific teaching techniques must navigate 1,000+ posts manually. The challenge was to transform a static content library into a dynamic, queryable knowledge system without modifying the existing site or content workflow.

Three constraints shaped the design:

  • Scale: 1,019 posts spanning 12 years of content required a systematic ingestion and indexing approach, not manual curation.
  • Accuracy: Answers had to be grounded exclusively in published site content — no hallucination from general model training data.
  • Maintainability: The system needed to be updatable as new content is published, without rebuilding from scratch.

Strategic execution

1. Content ingestion & knowledge architecture

Designed and implemented a content pipeline that fetches all published posts via the WordPress REST API, strips HTML using BeautifulSoup, and chunks content into 500-word segments optimised for semantic retrieval.

  • Chunking strategy: Fixed 500-word chunks balance retrieval precision with contextual completeness. Chunks under 50 words are discarded to eliminate navigational and promotional text.
  • Metadata design: Each chunk stores title, permalink, and source text — enabling full attribution in every generated answer.
  • Scale: 1,019 posts processed into 1,174 vectors at a total embedding cost of under $1.
2. Vector database & retrieval layer

Configured a Pinecone vector index using cosine similarity and OpenAI’s text-embedding-3-small embedding model (1,536 dimensions). At query time, the user’s question is embedded and matched against the corpus, retrieving the top 3 semantically relevant chunks regardless of keyword overlap.

3. System prompt & generation rules

Designed the system prompt to enforce three behaviours: answer only from retrieved context, maintain the site’s friendly teacher-to-teacher voice, and always attribute answers to source URLs. This grounds every response in published content and drives traffic back to the site.

4. Documentation package

Produced a full four-document knowledge system specification:

  • Technical Specification: system architecture, design decisions, component overview
  • Editorial & Style Guide: voice, tone, terminology, response format standards
  • Knowledge Base Document: content architecture, metadata schema, maintenance procedures
  • Content Governance Guide: quality framework, audit schedule, content lifecycle
5. Deployment

Deployed as a Flask web application on Render. Embedded on the live WordPress site as a floating chat widget via a JavaScript snippet in the theme footer.

System performance

  • 1,019 posts indexed, 1,174 vectors stored
  • Semantic retrieval across full 12-year archive
  • Answers grounded in site content with source attribution on every response
  • Beta deployment live at eslwriting-assistant.onrender.com

Interpretation

This project demonstrates that a content-rich site can be transformed into a queryable AI knowledge system without modifying existing content or workflow. The retrieval layer surfaces relevant articles that users would not find through standard site search — particularly for conceptual questions like “how do I build writing fluency” that don’t match article titles directly.

The documentation package mirrors the deliverables a professional AI Content Architect would produce for a client engagement.

Resources