Claude Prompt for RAG Pipelines
Production RAG recipe: token-based sliding window chunking, jina-embeddings-v3 embeddings, Vectorize (Cloudflare) storage, mxbai-rerank-large reranking. Includes retrieval evals.
You are a senior AI engineer designing a production RAG pipeline. Produce a complete, implementation-ready spec that a mid-level engineer can ship in one sprint.
## System Context
- **Document corpus:** code repositories (~5M documents)
- **Query volume:** 200 queries/second peak, 200k daily
- **Latency SLO:** end-to-end p95 under 5s
- **Language coverage:** 40+ languages
- **Freshness requirement:** weekly
## Pipeline Architecture
### 1. Ingestion & Parsing
- Parser choice for code repositories (Unstructured, LlamaParse, pdfplumber, custom): justify the tradeoffs
- How to preserve tables, code blocks, and hierarchical structure
- OCR fallback strategy for scanned pages (Tesseract vs GPT-4o vision vs Textract)
- Metadata to extract per document: source_url, created_at, author, section_path, doc_type, access_level
- Deduplication: content hash + near-duplicate detection via MinHash/SimHash
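A minimal deduplication sketch in Python, assuming the `datasketch` package for MinHash LSH; the 0.9 Jaccard threshold and whitespace shingling are placeholder choices to tune per corpus:

```python
# Dedup sketch: exact duplicates via content hash, near-duplicates via MinHash LSH.
# Assumes the `datasketch` package; threshold and shingling are tunable placeholders.
import hashlib
from datasketch import MinHash, MinHashLSH

NUM_PERM = 128
lsh = MinHashLSH(threshold=0.9, num_perm=NUM_PERM)  # Jaccard threshold for "near duplicate"
seen_hashes: set[str] = set()

def minhash(text: str) -> MinHash:
    m = MinHash(num_perm=NUM_PERM)
    for token in text.split():            # whitespace shingles; use n-grams for code
        m.update(token.encode("utf-8"))
    return m

def is_duplicate(doc_id: str, text: str) -> bool:
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if digest in seen_hashes:             # exact duplicate
        return True
    mh = minhash(text)
    if lsh.query(mh):                     # near duplicate of something already indexed
        return True
    seen_hashes.add(digest)
    lsh.insert(doc_id, mh)
    return False
```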
### 2. Chunking: token-based sliding window
- Target chunk size: 512 tokens with 0 token overlap
- Boundary rules (where you're allowed and NOT allowed to split)
- How to handle chunks that are too small (merge) and too large (recursive split)
- Special handling for tables, code blocks, lists (do NOT split these)
- Contextual header: prepend document title + section breadcrumb to each chunk
- If chunking strategy is 'contextual retrieval', include the Claude Haiku prompt to generate the 50-100 token contextual prefix per chunk
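A minimal sliding-window chunker sketch with the contextual header prepended to each chunk. It uses `tiktoken` as a stand-in tokenizer (an assumption; swap in the embedding model's own tokenizer for exact counts) and omits the boundary rules and table/code-block handling called out above:

```python
# Token-based sliding-window chunking sketch with contextual header.
# tiktoken is a stand-in tokenizer; boundary rules are not implemented here.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk(text: str, title: str, breadcrumb: str,
          chunk_size: int = 512, overlap: int = 0) -> list[str]:
    header = f"{title} > {breadcrumb}\n\n"        # contextual header per chunk
    body_budget = chunk_size - len(enc.encode(header))
    tokens = enc.encode(text)
    step = max(body_budget - overlap, 1)          # slide by budget minus overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + body_budget]
        if not window:
            break
        chunks.append(header + enc.decode(window))
    return chunks
```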
### 3. Embedding: jina-embeddings-v3
- Batch size, rate-limit handling, retry policy (exponential backoff + jitter)
- Cost estimate at corpus size (model price × tokens)
- Embedding dimensionality and storage implications
- When to re-embed (model upgrade, chunking change) and migration path
- Normalization: L2-normalize if using dot product; skip if cosine
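A sketch of the batching and retry policy above, assuming the hosted Jina embeddings endpoint at `https://api.jina.ai/v1/embeddings`; verify the current request schema and `task` values before relying on it:

```python
# Batched embedding sketch with exponential backoff + full jitter.
# Endpoint, payload fields, and response shape are assumptions to verify.
import os, random, time, requests

JINA_URL = "https://api.jina.ai/v1/embeddings"
HEADERS = {"Authorization": f"Bearer {os.environ['JINA_API_KEY']}"}

def embed_batch(texts: list[str], max_retries: int = 5) -> list[list[float]]:
    payload = {"model": "jina-embeddings-v3", "task": "retrieval.passage", "input": texts}
    for attempt in range(max_retries):
        resp = requests.post(JINA_URL, json=payload, headers=HEADERS, timeout=30)
        if resp.status_code == 200:
            return [item["embedding"] for item in resp.json()["data"]]
        if resp.status_code in (429, 500, 502, 503):
            # retryable: back off exponentially with jitter, capped at 30s
            time.sleep(min(2 ** attempt, 30) * random.random() + 0.1)
            continue
        resp.raise_for_status()               # non-retryable client error
    raise RuntimeError("embedding batch failed after retries")
```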
### 4. Storage: Vectorize (Cloudflare)
- Index configuration (HNSW M, efConstruction, efSearch OR IVF nlist, nprobe)
- Payload schema (which metadata is filterable, which is projected)
- Sharding / namespace strategy for multi-tenancy
- Backup and point-in-time recovery
- Estimated storage cost per 1M chunks
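An upsert sketch against the Cloudflare Vectorize REST API, assuming the v2 index endpoints and an NDJSON request body; confirm paths, payload limits, and which metadata fields must be declared filterable against the current docs:

```python
# Vectorize upsert sketch via the Cloudflare REST API (assumed v2 endpoints).
# Metadata keys here are the filterable fields named in the payload schema above.
import json, os, requests

ACCOUNT = os.environ["CF_ACCOUNT_ID"]
TOKEN = os.environ["CF_API_TOKEN"]
INDEX = "rag-chunks"
BASE = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/vectorize/v2/indexes"

def upsert(chunks: list[dict]) -> None:
    # One NDJSON line per vector: id, values, filterable metadata.
    lines = "\n".join(json.dumps({
        "id": c["chunk_id"],
        "values": c["embedding"],   # 1024-dim for the jina-embeddings-v3 default
        "metadata": {k: c[k] for k in ("tenant_id", "access_level", "doc_type", "created_at")},
    }) for c in chunks)
    resp = requests.post(
        f"{BASE}/{INDEX}/upsert",
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/x-ndjson"},
        data=lines.encode("utf-8"),
        timeout=60,
    )
    resp.raise_for_status()
```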
### 5. Retrieval: BM25
- Top-k at retrieval: 30
- Metadata filters (tenant_id, access_level, date range)
- Hybrid search weights and Reciprocal Rank Fusion k=60
- Query preprocessing: lowercase, stopword handling, acronym expansion
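A minimal Reciprocal Rank Fusion sketch (k=60) merging BM25 and dense result lists; it assumes metadata and ACL filters were already applied inside both retrievers:

```python
# Reciprocal Rank Fusion: score each chunk by 1/(k + rank) per ranking, then merge.
def rrf(bm25_ids: list[str], dense_ids: list[str], k: int = 60, top_k: int = 30) -> list[str]:
    scores: dict[str, float] = {}
    for ranked in (bm25_ids, dense_ids):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```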
### 6. Reranking: mxbai-rerank-large
- Rerank top-30 down to top-5
- Latency budget for rerank step
- When to skip rerank (very high-confidence top result, latency pressure)
- Fallback if reranker API fails
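A reranking sketch with the fallback behavior above, assuming a self-hosted cross-encoder loaded through `sentence-transformers` (`mixedbread-ai/mxbai-rerank-large-v1`); a hosted rerank API would slot in the same way:

```python
# Cross-encoder reranking with fallback to the original retrieval order on failure.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("mixedbread-ai/mxbai-rerank-large-v1")

def rerank(query: str, chunks: list[dict], top_n: int = 5) -> list[dict]:
    try:
        scores = reranker.predict([(query, c["text"]) for c in chunks])
        ranked = sorted(zip(scores, chunks), key=lambda p: p[0], reverse=True)
        return [c for _, c in ranked[:top_n]]
    except Exception:
        return chunks[:top_n]   # fallback: keep the vector-search order
```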
### 7. Answer Synthesis
- Model: Claude Sonnet 4.5 or GPT-4.1 (justify)
- System prompt template with citation requirements
- Output format: answer + citations array [{chunk_id, quote, doc_url}]
- Refusal rules: when retrieval returns low-relevance chunks, refuse rather than hallucinate
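A synthesis sketch using the Anthropic SDK with a simple refusal gate; the model alias, the 0.2 relevance floor, and the JSON response contract are assumptions to adjust for your stack:

```python
# Answer synthesis sketch: refuse on low-relevance context, otherwise call Claude
# and expect {"answer": ..., "citations": [{chunk_id, quote, doc_url}]} back.
import json
import anthropic

client = anthropic.Anthropic()
RELEVANCE_FLOOR = 0.2   # placeholder rerank-score threshold for refusal

def synthesize(question: str, chunks: list[dict], system_prompt: str) -> dict:
    if not chunks or max(c.get("rerank_score", 0.0) for c in chunks) < RELEVANCE_FLOOR:
        return {"answer": "I don't have enough information.", "citations": []}
    context = "\n\n".join(f"[{c['chunk_id']}] {c['text']}" for c in chunks)
    msg = client.messages.create(
        model="claude-sonnet-4-5",          # verify the current model alias
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": f"CONTEXT:\n{context}\n\nQUESTION: {question}"}],
    )
    # Assumes the model honors a JSON-only output contract defined in the system prompt.
    return json.loads(msg.content[0].text)
```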
## Prompt Template
```
You are a helpful assistant answering questions about code repositories.
RULES:
1. Use ONLY the context below. If the answer is not present, say "I don't have enough information."
2. Cite every factual claim with [doc_id] markers.
3. Quote short spans verbatim for critical facts.
4. If context conflicts, note the conflict.
CONTEXT:
{retrieved_chunks_with_ids}
QUESTION: {user_question}
ANSWER:
```
## Evaluation Plan
Build a golden set of 500 question+expected-sources pairs from real user queries. Score:
- **Retrieval hit@k** (30, 5, 1): did the correct chunk appear?
- **Context precision** (Ragas): are retrieved chunks relevant?
- **Context recall**: did we retrieve everything needed?
- **Faithfulness**: does the answer stay grounded in context?
- **Answer relevance**: does the answer address the question?
- **Citation accuracy**: do cited chunks actually support the claim?
Target thresholds: hit@5 ≥ 0.90, faithfulness ≥ 0.95, citation accuracy ≥ 0.98.
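A golden-set hit@k harness sketch in plain Python; faithfulness, context precision/recall, and citation accuracy would come from Ragas or an LLM judge rather than this loop:

```python
# hit@k over (question, expected chunk ids) pairs from the golden set.
# `retrieve` is your pipeline's retrieval function returning ranked chunk ids.
def hit_at_k(golden: list[dict], retrieve, ks=(1, 5, 30)) -> dict[int, float]:
    hits = {k: 0 for k in ks}
    for example in golden:                  # {"question": ..., "expected_ids": [...]}
        retrieved = retrieve(example["question"], top_k=max(ks))
        for k in ks:
            if set(retrieved[:k]) & set(example["expected_ids"]):
                hits[k] += 1
    return {k: hits[k] / len(golden) for k in ks}
```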
## Edge Cases to Handle
- Very long documents (> 1M tokens): hierarchical summarization or map-reduce
- Tables with numeric data: extract to markdown tables, preserve headers
- Code chunks: chunk by function/class, never mid-function
- Non-English docs: language detection, multilingual embedding, answer in query language
- PII redaction before storage (emails, SSNs, phone numbers)
- Access-controlled docs: enforce ACL filters at query time, NEVER post-filter after retrieval
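A PII redaction sketch run before storage; the regexes are illustrative (US-centric SSN and phone patterns) and not a complete PII solution:

```python
# Redact emails, SSNs, and phone numbers before chunks are embedded or stored.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text
```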
## Cost & Latency Estimate
| Component | Latency (p95) | Cost per query |
|-----------|---------------|----------------|
| Query embed | — | — |
| Vector search | — | — |
| Rerank | — | — |
| LLM synthesis | — | — |
| **Total** | — | — |
Fill these in with specific numbers for jina-embeddings-v3 + Vectorize (Cloudflare) + mxbai-rerank-large at 200 QPS.
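A per-query cost skeleton to back the table; the price variables are placeholders rather than quoted rates, and the token counts are assumptions to replace with measured values:

```python
# Back-of-envelope per-query cost model; plug in current list prices before use.
EMBED_PRICE_PER_MTOK = 0.0    # jina-embeddings-v3 $/1M tokens (placeholder)
RERANK_PRICE_PER_1K = 0.0     # reranker $/1K requests; 0 if self-hosted (placeholder)
LLM_IN_PER_MTOK = 0.0         # synthesis model $/1M input tokens (placeholder)
LLM_OUT_PER_MTOK = 0.0        # synthesis model $/1M output tokens (placeholder)

def cost_per_query(query_tokens=30, context_tokens=2500, answer_tokens=400) -> float:
    embed = query_tokens / 1e6 * EMBED_PRICE_PER_MTOK
    rerank = RERANK_PRICE_PER_1K / 1000
    llm = context_tokens / 1e6 * LLM_IN_PER_MTOK + answer_tokens / 1e6 * LLM_OUT_PER_MTOK
    return embed + rerank + llm
```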
## Deliverables
1. Code scaffold (Python or TypeScript): ingest.py, chunk.py, embed.py, retrieve.py, synthesize.py, evaluate.py
2. Chunking parameter table (chunk_size, overlap, separators) tuned for code repositories
3. Prompt template (above) with variables called out
4. Eval harness wired to the golden set with CI gate on the three target thresholds
5. Runbook: what to do when retrieval quality drops, what to do when latency spikes
6. Cost model spreadsheet with ingestion cost, per-query cost, storage cost
## General Requirements
- Use precise technical terminology appropriate for the audience
- Include code examples, configurations, or specifications where relevant
- Document assumptions, prerequisites, and dependencies
- Provide error handling and edge case considerations
Present your output in a clear, organized structure with headers (##), subheaders (###), and bullet points. Use bold for key terms.

Replace the bracketed placeholders with your own context before running the prompt:
- `[{chunk_id, quote, doc_url}]`: fill in your specific citation fields (chunk_id, quote, doc_url).
- `[doc_id]`: fill in your specific doc_id.