Anindya Obi

Stop Tuning Embeddings: Package Your Knowledge for Retrieval

The most common RAG/agent use case teams ship first is still this:

A customer support copilot that answers questions using help docs, policies, runbooks, and past tickets.

It sounds simple (“just plug docs into RAG”), but this is where most teams learn the hard lesson:

Retrieval quality isn’t mainly an embedding problem.
It’s a packaging problem.

If your knowledge isn’t chunked by meaning and labeled with the right metadata, your agent will feel “random” no matter how many knobs you tune.

This post is a practical checklist for packaging knowledge for retrieval: chunk for meaning + metadata, with examples, plus what’s worth automating.

The foundational process

Baseline RAG pipeline:
ingest → clean → chunk → embed → index → retrieve → generate

This post focuses on the step that quietly decides your ceiling:
chunking + metadata.

Use case we’ll design for: Support Copilot for a SaaS product

A support agent asks: “Can this customer get a refund? They’re on the Pro plan.”

The copilot should:

  • pull the right policy section
  • apply the right exceptions (region/plan/version)
  • respond with a short, auditable answer + source

The checklist: chunk for meaning + metadata (with examples)

1) Define your “unit of meaning” per doc type

AI Engineer task: RAG Pipeline Setup

Founder lesson: Chunk size is not a strategy. Meaning is.

Examples:

  • Policy docs: chunk by sections like Eligibility, Exceptions, How to apply (not every 800 tokens)
  • Runbooks: chunk by step groups (“Check logs → Verify account state → Mitigation”)
  • API docs: chunk by endpoint (request/response/errors/examples kept together)

✅ Output you want: each chunk answers one real question.
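
To make this concrete, here’s a minimal sketch (plain Python, standard library only) of chunking a policy doc by its section headings instead of fixed token windows. The sample markdown and heading names are illustrative, not a real policy.

```python
import re

# Illustrative policy doc; real docs would come from your ingestion step.
POLICY_MD = """
## Eligibility
Refunds are available within 30 days of purchase for Pro plans.

## Exceptions
Annual Pro plans purchased in the EU have a 14-day statutory window.

## How to apply
Open a ticket with the invoice ID and the reason for the refund.
"""

def chunk_by_headings(markdown: str) -> list[dict]:
    """Split on level-2 headings so each chunk is one unit of meaning."""
    chunks = []
    for match in re.finditer(r"^## (.+?)\n(.*?)(?=^## |\Z)", markdown, re.S | re.M):
        heading, body = match.group(1).strip(), match.group(2).strip()
        chunks.append({"section": heading, "text": f"{heading}\n{body}"})
    return chunks

for chunk in chunk_by_headings(POLICY_MD):
    print(chunk["section"], "->", len(chunk["text"]), "chars")
```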

2) Keep related rules together (don’t split what humans decide together)

AI Engineer task: RAG Pipeline Setup

Example (refund policy):

  • Bad: Eligibility in one chunk, Exceptions in another
  • Good: Eligibility + Exceptions grouped, because that’s how decisions are made

If you split them, retrieval might grab the eligibility rule but miss the exception, and you ship confident, wrong answers.
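
Here’s a rough sketch of what “keep related rules together” can look like in code, reusing the chunk shape from the previous snippet. The DECIDE_TOGETHER grouping is an assumption for this refund example; in practice it should mirror how your team actually makes the decision.

```python
# Sections that form one human decision get merged into one chunk.
# This grouping is an assumption for the refund-policy example.
DECIDE_TOGETHER = [{"Eligibility", "Exceptions"}]

def merge_related(chunks: list[dict]) -> list[dict]:
    merged, used = [], set()
    for group in DECIDE_TOGETHER:
        members = [c for c in chunks if c["section"] in group]
        if len(members) > 1:
            merged.append({
                "section": " + ".join(c["section"] for c in members),
                "text": "\n\n".join(c["text"] for c in members),
            })
            used |= {c["section"] for c in members}
    # Anything not part of a decision group passes through unchanged.
    merged.extend(c for c in chunks if c["section"] not in used)
    return merged

# Usage with the earlier chunker: merge_related(chunk_by_headings(POLICY_MD))
```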

3) Preserve structure (headings, tables, and hierarchy)

AI Engineer task: Knowledge Source Integration + RAG Pipeline Setup

Examples:

  • A table like “Plan → Refund window → Conditions” should remain table-shaped (or at least grouped as one unit)
  • Keep the heading path with the chunk, e.g. Refunds > Eligibility > Pro Plan. This prevents “floating paragraphs” that lose their rule context.
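
A small sketch of carrying the heading path with each chunk. It walks markdown headings of any level and, for brevity, treats each non-empty body line as a chunk; a real pipeline would accumulate paragraphs. The sample doc is made up.

```python
import re

DOC = """
# Refunds
## Eligibility
### Pro Plan
Refunds within 30 days, unless an exception below applies.
### Free Plan
No refunds; the plan is free.
"""

def chunks_with_heading_path(markdown: str) -> list[dict]:
    chunks, path = [], []
    for line in markdown.splitlines():
        m = re.match(r"^(#+) (.+)$", line)
        if m:
            level = len(m.group(1))
            path = path[: level - 1] + [m.group(2).strip()]  # keep ancestor headings
        elif line.strip():
            chunks.append({"heading_path": " > ".join(path), "text": line.strip()})
    return chunks

for c in chunks_with_heading_path(DOC):
    print(c["heading_path"], "|", c["text"])
# e.g. Refunds > Eligibility > Pro Plan | Refunds within 30 days, ...
```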

4) Attach metadata that matches how people filter truth

AI Engineer task: Knowledge Source Integration

Most important principle: metadata should reflect how the business differentiates rules.

Example metadata schema for support:

  • doc_type: policy | runbook | faq
  • product: mobile_app | web_app
  • plan: free | pro | enterprise
  • region: US | EU | IN
  • effective_date or version: 2025-10 / v3
  • audience: customer_safe | internal_only
  • risk_level: low | high

Now when the question is “refund + Pro + EU,” retrieval can filter before ranking.
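
A minimal sketch of that filter-before-rank step, with field names mirroring the schema above. The similarity ranking is left as a comment because it depends on your vector store; nothing here is a specific vector-DB API.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

# Illustrative chunks; metadata fields mirror the schema above.
CHUNKS = [
    Chunk("EU Pro refunds: 14-day statutory window applies.",
          {"doc_type": "policy", "plan": "pro", "region": "EU", "audience": "customer_safe"}),
    Chunk("US Pro refunds: 30-day window.",
          {"doc_type": "policy", "plan": "pro", "region": "US", "audience": "customer_safe"}),
    Chunk("Refund override allowed for VIP customers.",
          {"doc_type": "policy", "plan": "pro", "region": "EU", "audience": "internal_only"}),
]

def filter_then_rank(chunks: list[Chunk], filters: dict) -> list[Chunk]:
    candidates = [c for c in chunks
                  if all(c.metadata.get(k) == v for k, v in filters.items())]
    # Similarity ranking against the query would happen here, on the survivors only.
    return candidates

print(filter_then_rank(CHUNKS, {"plan": "pro", "region": "EU", "audience": "customer_safe"}))
```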

5) Make “latest policy wins” automatic (versioning + recency)

AI Engineer task: Knowledge Source Integration

Example:

  • Two refund policies exist: v2 (14 days) and v3 (30 days)
  • If chunks aren’t labeled with version/effective date, your retriever will happily pull the wrong one.

Rule of thumb:

Prefer the latest effective policy unless the user explicitly asks for historical behavior.
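
One way to encode that rule of thumb, assuming each chunk carries the effective_date field from the schema above; the dates and version labels are made up.

```python
from datetime import date

# Two versions of the same policy; only v3 should reach the prompt by default.
candidates = [
    {"text": "Refund window: 14 days.", "version": "v2", "effective_date": date(2024, 6, 1)},
    {"text": "Refund window: 30 days.", "version": "v3", "effective_date": date(2025, 10, 1)},
]

def prefer_latest(chunks: list[dict], wants_history: bool = False) -> list[dict]:
    if wants_history or not chunks:
        return chunks  # caller explicitly asked about past behavior (or nothing to filter)
    latest = max(c["effective_date"] for c in chunks)
    return [c for c in chunks if c["effective_date"] == latest]

print(prefer_latest(candidates))  # -> only the v3 / 30-day chunk
```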

6) Audience tags prevent accidental leakage

AI Engineer task: Guardrails & Safety

Example:

  • Internal chunk: “Refund override allowed for VIP customers” (internal-only)
  • Customer chunk: public policy wording (customer-safe)

Without audience=internal_only, your model can retrieve internal notes and leak them into a customer-facing reply.
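
A sketch of the guardrail this implies: before the customer-facing prompt is assembled, anything not tagged customer_safe gets dropped (or flagged). The chunk dicts and channel name are illustrative.

```python
def enforce_audience(retrieved: list[dict], channel: str) -> list[dict]:
    """Drop internal-only chunks from customer-facing contexts."""
    if channel != "customer_facing":
        return retrieved
    safe = [c for c in retrieved if c.get("audience") == "customer_safe"]
    dropped = len(retrieved) - len(safe)
    if dropped:
        # In production you'd log this; printing keeps the sketch self-contained.
        print(f"guardrail: dropped {dropped} internal-only chunk(s)")
    return safe

retrieved = [
    {"text": "Public refund policy wording.", "audience": "customer_safe"},
    {"text": "Refund override allowed for VIP customers.", "audience": "internal_only"},
]
print(enforce_audience(retrieved, "customer_facing"))
```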

7) Add “retrieval helpers” for precision (without bloating prompts)

AI Engineer task: RAG Pipeline Setup

Examples:

  • Add a short chunk summary at ingestion: “Refund eligibility for Pro plan in EU, includes exceptions.”
  • Add 2–3 canonical questions per chunk: “Can Pro users in EU get a refund after 20 days?”

This improves recall without turning your prompt into a novel.
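
Here’s a sketch of that enrichment step at ingestion. summarize() and canonical_questions() are hypothetical placeholders standing in for an LLM call; the point is that the summary and questions get indexed for recall, while only the chunk body goes into the final prompt.

```python
def summarize(text: str) -> str:
    # Placeholder: a real pipeline would call an LLM here.
    return text.split(".")[0] + "."

def canonical_questions(text: str) -> list[str]:
    # Placeholder: a real pipeline would generate these per chunk.
    return ["Can Pro users in the EU get a refund after 20 days?"]

def enrich(chunk: dict) -> dict:
    chunk["summary"] = summarize(chunk["text"])
    chunk["canonical_questions"] = canonical_questions(chunk["text"])
    # Index summary + questions + body for retrieval; send only chunk["text"]
    # (plus its source label) to the model at answer time.
    chunk["index_text"] = " ".join([chunk["summary"], *chunk["canonical_questions"], chunk["text"]])
    return chunk

print(enrich({"text": "Refund eligibility for Pro plan in EU. Exceptions apply for annual billing."}))
```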

8) Validate packaging with 10 golden queries (before you blame embeddings)

AI Engineer task: LLM Evaluation

Example golden queries:

  • “Refund after 20 days (EU, Pro)” → should retrieve Eligibility + Exceptions + EU policy
  • “Policy conflict (14 vs 30 days)” → should retrieve latest policy + show version label

If retrieval fails here, it’s usually packaging (units + metadata), not model choice.
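
A tiny golden-query harness is enough to catch packaging failures before anyone debates embedding models. retrieve() below is a stand-in for your actual retrieval call; the expected labels mirror the two examples above.

```python
GOLDEN = [
    {"query": "Refund after 20 days (EU, Pro)",
     "must_retrieve": {"Eligibility", "Exceptions", "EU policy"}},
    {"query": "Policy conflict (14 vs 30 days)",
     "must_retrieve": {"Refund policy v3"}},
]

def retrieve(query: str) -> set[str]:
    # Stand-in: wire this to your pipeline and return the labels of retrieved chunks.
    return set()

def run_golden_queries() -> None:
    for case in GOLDEN:
        got = retrieve(case["query"])
        missing = case["must_retrieve"] - got
        status = "PASS" if not missing else f"FAIL (missing: {sorted(missing)})"
        print(f"{case['query']}: {status}")

run_golden_queries()
```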

The repetitive / boring part

If you build multiple RAG systems, you end up repeating the same grind:

  • defining chunk templates per doc type (policy/runbook/api)
  • enforcing “keep related sections together”
  • generating metadata consistently across sources
  • version tagging + deprecating old docs
  • generating summaries + canonical questions
  • regenerating eval queries

It’s critical work, but it shouldn’t be hand-crafted every time.

Where HuTouch fits

HuTouch automates the scaffolding behind RAG pipeline setup: the chunk templates, metadata generation, version tagging, and eval queries listed above.

Live teardown invite

I’m hosting a live teardown where we’ll walk through best practices in RAG/agent building and how automating this structure helps. Here’s what we’ll cover:

  • How prompt design is done today
  • The upgrade for better prompt design

📅 When: Dec 30th, 2025 @ 8:30am EST/7pm IST
🔗 Sign-up for the free event
