Project: eslwriting.org RAG System
Role: AI Content Architect & Technical Writer
Stack: Python, OpenAI API (text-embedding-3-small, GPT-4o-mini), Pinecone, Flask, WordPress REST API, Render, GitHub
About the project
eslwriting.org is a 12-year-old ESL teaching resource site with 1,019 published posts. This portfolio project demonstrates the full design and deployment of a RAG-based AI knowledge assistant built on the site’s existing content archive, from content ingestion and vector indexing to live deployment and documentation.
Challenge
A large content archive creates a discoverability problem. Visitors searching for specific teaching techniques must navigate 1,000+ posts manually. The challenge was to transform a static content library into a dynamic, queryable knowledge system without modifying the existing site or content workflow.
Three constraints shaped the design:
- Scale: 1,019 posts spanning 12 years of content required a systematic ingestion and indexing approach, not manual curation.
- Accuracy: Answers had to be grounded exclusively in published site content — no hallucination from general model training data.
- Maintainability: The system needed to be updatable as new content is published, without rebuilding from scratch.
Strategic execution
1. Content ingestion & knowledge architecture
Designed and implemented a content pipeline that fetches all published posts via the WordPress REST API, strips HTML using BeautifulSoup, and chunks content into 500-word segments optimised for semantic retrieval.
- Chunking strategy: Fixed 500-word chunks balance retrieval precision with contextual completeness. Chunks under 50 words are discarded to eliminate navigational and promotional text.
- Metadata design: Each chunk stores title, permalink, and source text — enabling full attribution in every generated answer.
- Scale: 1,019 posts processed into 1,174 vectors at a total embedding cost of under $1.
2. Vector database & retrieval layer
Configured a Pinecone vector index using cosine similarity and OpenAI’s text-embedding-3-small embedding model (1,536 dimensions). At query time, the user’s question is embedded and matched against the corpus, retrieving the top 3 semantically relevant chunks regardless of keyword overlap.
3. System prompt & generation rules
Designed the system prompt to enforce three behaviours: answer only from retrieved context, maintain the site’s friendly teacher-to-teacher voice, and always attribute answers to source URLs. This grounds every response in published content and drives traffic back to the site.
4. Documentation package
Produced a full four-document knowledge system specification:
- Technical Specification: system architecture, design decisions, component overview
- Editorial & Style Guide: voice, tone, terminology, response format standards
- Knowledge Base Document: content architecture, metadata schema, maintenance procedures
- Content Governance Guide: quality framework, audit schedule, content lifecycle
5. Deployment
Deployed as a Flask web application on Render. Embedded on the live WordPress site as a floating chat widget via a JavaScript snippet in the theme footer.
System performance
- 1,019 posts indexed, 1,174 vectors stored
- Semantic retrieval across full 12-year archive
- Answers grounded in site content with source attribution on every response
- Beta deployment live at eslwriting-assistant.onrender.com
Interpretation
This project demonstrates that a content-rich site can be transformed into a queryable AI knowledge system without modifying existing content or workflow. The retrieval layer surfaces relevant articles that users would not find through standard site search — particularly for conceptual questions like “how do I build writing fluency” that don’t match article titles directly.
The documentation package mirrors the deliverables a professional AI Content Architect would produce for a client engagement.
Resources
- Live demo: eslwriting-assistant.onrender.com
- GitHub repository: github.com/writingteacher/eslwriting-assistant
- Technical Specification
- Editorial & Style Guide