Building RAG Knowledge Search for My Portfolio
This is how I built the RAG system that powers the "Ask about my work" feature on this site. It uses OpenAI embeddings, a simple vector search layer, and a citation system that links every answer back to its source material.
The document-centric thinking behind this RAG system draws from my CMS work, where understanding content hierarchies and metadata was essential.
I wanted my portfolio to show real AI engineering work, not just marketing bullets. RAG Knowledge Search is the first project inside my AI Lab: a retrieval-augmented generation system that can answer questions about a small knowledge base and show exactly which sources it used.
This post walks through why I built it, the stack I used, and some of the tradeoffs I made for a portfolio-friendly version one.
Why start with RAG
RAG keeps popping up in real client conversations. People want chat based interfaces that can answer questions about their docs, their product, or their internal knowledge without hallucinating all over the place.
As a web engineer, I wanted a project that sits right at the intersection of what I already do well and where the industry is heading:
- Next.js and TypeScript for the frontend and API routes
- OpenAI for embeddings and generation
- A simple vector search layer that I can later swap out for a more serious database
- An interface that feels like a real product, not a toy demo
RAG Knowledge Search lives at /ai/rag on my site, and it is wired like a real feature, not a separate playground.
High level architecture
Here is the shape of the system in version one:
- Next.js 16 App Router page at /ai/rag with a chat-style UI
- API route at /api/ai/rag/query that handles embedding, search, and generation
- A small knowledge base stored as structured chunks in memory on the server
- Optional user added documents that are embedded and stored in a shared store during the life of the process
- OpenAI embeddings (text-embedding-3-small) for both the base knowledge and user docs
- GPT-4o mini for answering questions using the retrieved context
The important part is that the whole flow is explicit and inspectable. The UI shows which chunks were used, how relevant they were, and what the final answer was.
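To make that flow concrete, here is a minimal sketch of what the query route could look like, assuming a plain in-memory array of pre-embedded chunks and cosine similarity ranking. Names like KNOWLEDGE_BASE, the top-k cutoff, and the exact prompt are illustrative, not the production code.

```ts
// app/api/ai/rag/query/route.ts — illustrative sketch, not the exact implementation
import OpenAI from "openai";

const openai = new OpenAI();

type Chunk = { id: string; source: string; text: string; embedding: number[] };

// In the real app this is filled with pre-embedded knowledge blocks at startup
const KNOWLEDGE_BASE: Chunk[] = [];

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export async function POST(req: Request) {
  const { question } = await req.json();

  // 1. Embed the question with the same model used for the knowledge base
  const embedded = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: question,
  });
  const queryEmbedding = embedded.data[0].embedding;

  // 2. Rank chunks by cosine similarity and keep the top few
  const topChunks = KNOWLEDGE_BASE
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 4);

  // 3. Ask the model to answer from the retrieved context only
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: "Answer using only the provided context. If the context is not enough, say so.",
      },
      {
        role: "user",
        content: `Context:\n${topChunks.map((t) => t.chunk.text).join("\n---\n")}\n\nQuestion: ${question}`,
      },
    ],
  });

  // 4. Return the answer plus the chunks that were used, so the UI can render citations
  return Response.json({
    answer: completion.choices[0].message.content,
    sources: topChunks.map((t) => ({ id: t.chunk.id, source: t.chunk.source, score: t.score })),
  });
}
```

The design choice that matters here is returning the sources alongside the answer; that is what lets the UI show which chunks were used and how relevant they were.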
Chunking and embeddings
RAG falls apart if your context is messy, so I started with a very simple but explicit chunking approach.
When I add static knowledge to the system, I store it as small text blocks. For user documents, the ingestion route does the following:
- splits the uploaded text into similarly small blocks
- embeds each block with text-embedding-3-small
- adds the resulting vectors to the shared in-memory store, where they live for the lifetime of the process
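Here is a rough sketch of that ingestion step, assuming a naive paragraph-based chunker. The names addDocument, chunkText, and documentStore are hypothetical, and the real route runs as a Next.js API handler rather than a bare function.

```ts
// Illustrative sketch of the ingestion step for user documents
import OpenAI from "openai";

const openai = new OpenAI();

type StoredChunk = { docId: string; text: string; embedding: number[] };

// Shared in-memory store; it only lives for the lifetime of the server process
const documentStore: StoredChunk[] = [];

// Naive chunker: split on blank lines, then pack paragraphs into blocks of roughly maxChars
function chunkText(text: string, maxChars = 800): string[] {
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    if (current && (current.length + paragraph.length) > maxChars) {
      chunks.push(current);
      current = paragraph;
    } else {
      current = current ? `${current}\n\n${paragraph}` : paragraph;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

export async function addDocument(docId: string, text: string) {
  const blocks = chunkText(text);

  // Embed every block with the same model used for the static knowledge base
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: blocks,
  });

  // Store each block alongside its embedding so the query route can search it
  for (const item of response.data) {
    documentStore.push({ docId, text: blocks[item.index], embedding: item.embedding });
  }
}
```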
Thanks for reading! If you found this useful, check out my other posts or explore the live demos in my AI Lab.