# ChatGPT Prompt for RAG Pipelines
Production RAG recipe: token-based sliding window chunking, mxbai-embed-large embeddings, OpenSearch storage, mxbai-rerank-large reranking. Includes retrieval evals.
You are a senior AI engineer designing a production RAG pipeline. Produce a complete, implementation-ready spec that a mid-level engineer can ship in one sprint.
## System Context
- **Document corpus:** PDFs with tables (~100k documents)
- **Query volume:** 50 queries/second peak, 200k daily
- **Latency SLO:** end-to-end p95 under 3s
- **Language coverage:** English only
- **Freshness requirement:** weekly
## Pipeline Architecture
### 1. Ingestion & Parsing
- Parser choice for PDFs with tables (Unstructured, LlamaParse, pdfplumber, custom): justify tradeoff
- How to preserve tables, code blocks, and hierarchical structure
- OCR fallback strategy for scanned pages (Tesseract vs GPT-4o vision vs Textract)
- Metadata to extract per document: source_url, created_at, author, section_path, doc_type, access_level
- Deduplication: content hash + near-duplicate detection via MinHash/SimHash
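A minimal dedup sketch for the last bullet above, assuming the `datasketch` library for MinHash LSH (SimHash would work just as well); the Jaccard threshold and shingle size are illustrative starting points, not tuned values:

```python
import hashlib
from datasketch import MinHash, MinHashLSH  # assumed near-duplicate library

def content_hash(text: str) -> str:
    """Exact-duplicate key: SHA-256 over whitespace-normalized, lowercased text."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def minhash_of(text: str, num_perm: int = 128) -> MinHash:
    """MinHash over word 5-gram shingles for near-duplicate detection."""
    m = MinHash(num_perm=num_perm)
    words = text.lower().split()
    for i in range(max(len(words) - 4, 1)):
        m.update(" ".join(words[i:i + 5]).encode("utf-8"))
    return m

# LSH index: anything above ~0.9 estimated Jaccard similarity is flagged at ingest time.
lsh = MinHashLSH(threshold=0.9, num_perm=128)

def is_near_duplicate(doc_id: str, text: str) -> bool:
    m = minhash_of(text)
    if lsh.query(m):          # any previously ingested doc above the threshold?
        return True
    lsh.insert(doc_id, m)     # otherwise register this doc and keep it
    return False
```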
### 2. Chunking: token-based sliding window
- Target chunk size: 1024 tokens with 128-token overlap (see the sliding-window sketch after this list)
- Boundary rules (where you're allowed and NOT allowed to split)
- How to handle chunks that are too small (merge) and too large (recursive split)
- Special handling for tables, code blocks, lists (do NOT split these)
- Contextual header: prepend document title + section breadcrumb to each chunk
- If chunking strategy is 'contextual retrieval', include the Claude Haiku prompt to generate the 50-100 token contextual prefix per chunk
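A minimal sketch of the sliding-window chunker described above, assuming `tiktoken`'s `cl100k_base` encoding as a stand-in tokenizer (the embedding model's own tokenizer may count tokens slightly differently, so treat the budget as approximate):

```python
import tiktoken  # assumed tokenizer; swap in the embedding model's tokenizer if it differs

def sliding_window_chunks(text: str, header: str,
                          chunk_size: int = 1024, overlap: int = 128) -> list[str]:
    """Token-based sliding window: fixed-size chunks with fixed overlap,
    each prefixed with the document title + section breadcrumb."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    stride = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(tokens), stride):
        window = tokens[start:start + chunk_size]
        if not window:
            break
        # Note: the prepended header adds tokens on top of chunk_size; budget for it.
        chunks.append(f"{header}\n\n{enc.decode(window)}")
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Example: sliding_window_chunks(body_text, "Annual Report 2024 > 3.2 Revenue")
```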
### 3. Embedding: mxbai-embed-large
- Batch size, rate-limit handling, retry policy (exponential backoff + jitter; see the sketch after this list)
- Cost estimate at corpus size (model price × tokens)
- Embedding dimensionality and storage implications
- When to re-embed (model upgrade, chunking change) and migration path
- Normalization: L2-normalize if using dot product; skip if cosine
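A sketch of the batching and retry loop; `embed_fn` is a placeholder for whatever client actually serves mxbai-embed-large (self-hosted sentence-transformers, a TEI endpoint, or a hosted API), so only the backoff logic is shown:

```python
import random
import time

def embed_with_retries(texts: list[str], embed_fn, batch_size: int = 64,
                       max_retries: int = 5) -> list[list[float]]:
    """Batch embedding with exponential backoff + jitter.
    `embed_fn(batch) -> list[vector]` is a placeholder for the real client call."""
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        for attempt in range(max_retries):
            try:
                vectors.extend(embed_fn(batch))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise
                # 1s, 2s, 4s, ... capped at 30s, plus up to 1s of jitter
                time.sleep(min(2 ** attempt, 30) + random.random())
    return vectors
```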
### 4. Storage: OpenSearch
- Index configuration (HNSW m, ef_construction, ef_search OR IVF nlist, nprobe); an example mapping follows this list
- Payload schema (which metadata is filterable, which is projected)
- Sharding / namespace strategy for multi-tenancy
- Backup and point-in-time recovery
- Estimated storage cost per 1M chunks
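An illustrative index mapping via the `opensearch-py` client; the HNSW parameters, shard counts, and the 1024-dim vector size (taken from the mxbai-embed-large model card) are starting points to tune, not production-validated values:

```python
from opensearchpy import OpenSearch

index_body = {
    "settings": {
        "index.knn": True,
        "index.knn.algo_param.ef_search": 128,   # query-time beam width
        "number_of_shards": 3,
        "number_of_replicas": 1,
    },
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,               # mxbai-embed-large output size (per model card)
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "space_type": "innerproduct",  # vectors L2-normalized upstream
                    "parameters": {"m": 16, "ef_construction": 256},
                },
            },
            "text": {"type": "text"},
            "tenant_id": {"type": "keyword"},     # filterable metadata
            "access_level": {"type": "keyword"},
            "created_at": {"type": "date"},
            "section_path": {"type": "keyword"},
            "doc_url": {"type": "keyword"},
        }
    },
}

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
client.indices.create(index="rag-chunks", body=index_body)
```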
### 5. Retrieval: hybrid (weighted linear combination)
- Top-k at retrieval: 30
- Metadata filters (tenant_id, access_level, date range)
- Score fusion: weights for the linear combination and/or Reciprocal Rank Fusion with k=60 (RRF sketch after this list)
- Query preprocessing: lowercase, stopword handling, acronym expansion
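A minimal Reciprocal Rank Fusion sketch over the two ranked ID lists (BM25 and dense); a weighted linear combination of normalized scores can be swapped in if that variant is preferred:

```python
def reciprocal_rank_fusion(bm25_ids: list[str], dense_ids: list[str],
                           k: int = 60, top_k: int = 30) -> list[str]:
    """RRF: score(d) = sum over lists of 1 / (k + rank_of_d_in_that_list)."""
    scores: dict[str, float] = {}
    for ranked in (bm25_ids, dense_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; keep the retrieval top-k for the reranker.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```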
### 6. Reranking: mxbai-rerank-large
- Rerank top-30 down to top-8
- Latency budget for rerank step
- When to skip rerank (very high-confidence top result, latency pressure)
- Fallback if reranker API fails
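A sketch of the rerank step with the fallback behavior above; the model name and the `sentence-transformers` CrossEncoder usage follow the Mixedbread model card but should be verified against current docs before shipping:

```python
from sentence_transformers import CrossEncoder  # assumes a locally served cross-encoder

reranker = CrossEncoder("mixedbread-ai/mxbai-rerank-large-v1")  # model name: assumption to verify

def rerank(query: str, chunks: list[dict], top_n: int = 8) -> list[dict]:
    """Score (query, chunk) pairs with the cross-encoder and keep the top_n.
    On any failure, fall back to the fused retrieval order so the request still completes."""
    try:
        scores = reranker.predict([(query, c["text"]) for c in chunks])
        ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
        return [c for c, _ in ranked[:top_n]]
    except Exception:
        return chunks[:top_n]  # fallback: trust the hybrid-retrieval ranking
```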
### 7. Answer Synthesis
- Model: Claude Sonnet 4.5 or GPT-4.1 (justify)
- System prompt template with citation requirements
- Output format: answer + citations array [{chunk_id, quote, doc_url}]
- Refusal rules: when retrieval returns low-relevance chunks, refuse rather than hallucinate
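One possible response schema matching the citations array above (Pydantic assumed; the `refused` flag is an added field to carry the refusal rule explicitly, not part of the original spec):

```python
from pydantic import BaseModel

class Citation(BaseModel):
    chunk_id: str
    quote: str       # short verbatim span supporting the claim
    doc_url: str

class RagAnswer(BaseModel):
    answer: str
    citations: list[Citation]
    refused: bool = False   # True when retrieval was too weak to answer safely
```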
## Prompt Template
```
You are a helpful assistant answering questions about PDFs with tables.
RULES:
1. Use ONLY the context below. If the answer is not present, say "I don't have enough information."
2. Cite every factual claim with [doc_id] markers.
3. Quote short spans verbatim for critical facts.
4. If context conflicts, note the conflict.
CONTEXT:
{retrieved_chunks_with_ids}
QUESTION: {user_question}
ANSWER:
```
## Evaluation Plan
Build a golden set of 1000 question+expected-sources pairs from real user queries. Score:
- **Retrieval hit@k** (k = 1, 5, 10, 30): did a correct chunk appear in the top k? (scored by the harness sketched after this list)
- **Context precision** (Ragas): are retrieved chunks relevant?
- **Context recall**: did we retrieve everything needed?
- **Faithfulness**: does the answer stay grounded in context?
- **Answer relevance**: does the answer address the question?
- **Citation accuracy**: do cited chunks actually support the claim?
Target thresholds: hit@10 ≥ 0.90, faithfulness ≥ 0.95, citation accuracy ≥ 0.98.
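A minimal hit@k harness sketch; `retrieve_fn` and the golden-set row shape are assumptions about how the pipeline and dataset are exposed, and the Ragas-based metrics would sit alongside this:

```python
def hit_at_k(retrieved_ids: list[str], expected_ids: set[str], k: int) -> float:
    """1.0 if any expected (golden) chunk appears in the top-k retrieved, else 0.0."""
    return 1.0 if expected_ids & set(retrieved_ids[:k]) else 0.0

def eval_retrieval(golden_set: list[dict], retrieve_fn, ks=(1, 5, 10, 30)) -> dict[int, float]:
    """golden_set rows: {"question": str, "expected_chunk_ids": set[str]}.
    `retrieve_fn(question) -> ranked list of chunk_ids` is whatever the pipeline exposes."""
    totals = {k: 0.0 for k in ks}
    for row in golden_set:
        retrieved = retrieve_fn(row["question"])
        for k in ks:
            totals[k] += hit_at_k(retrieved, row["expected_chunk_ids"], k)
    return {k: totals[k] / len(golden_set) for k in ks}

# Example CI gate: assert eval_retrieval(golden, retrieve)[10] >= 0.90
```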
## Edge Cases to Handle
- Very long documents (> 1M tokens): hierarchical summarization or map-reduce
- Tables with numeric data: extract to markdown tables, preserve headers
- Code chunks: chunk by function/class, never mid-function
- Non-English docs: language detection, multilingual embedding, answer in query language
- PII redaction before storage (emails, SSNs, phone numbers)
- Access-controlled docs: enforce ACL filters at query time, NEVER post-filter after retrieval
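Illustrative regex-based redaction for the three PII types named above, applied before chunks are embedded or stored; a production system would likely use a vetted PII/NER library rather than these minimal patterns:

```python
import re

# Minimal illustrative patterns only; not exhaustive PII coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace detected PII with typed placeholders before storage/embedding."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text
```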
## Cost & Latency Estimate
| Component | Latency (p95) | Cost per query |
|-----------|---------------|----------------|
| Query embed | — | — |
| Vector search | — | — |
| Rerank | — | — |
| LLM synthesis | — | — |
| **Total** | — | — |
Fill these in with specific numbers for mxbai-embed-large + OpenSearch + mxbai-rerank-large at 50 QPS.
## Deliverables
1. Code scaffold (Python or TypeScript): ingest.py, chunk.py, embed.py, retrieve.py, synthesize.py, evaluate.py
2. Chunking parameter table (chunk_size, overlap, separators) tuned for PDFs with tables
3. Prompt template (above) with variables called out
4. Eval harness wired to the golden set with CI gate on the three target thresholds
5. Runbook: what to do when retrieval quality drops, what to do when latency spikes
6. Cost model spreadsheet with ingestion cost, per-query cost, storage cost
- Use precise technical terminology appropriate for the audience
- Include code examples, configurations, or specifications where relevant
- Document assumptions, prerequisites, and dependencies
- Provide error handling and edge case considerations
Present your output in a clear, organized structure with headers (##), subheaders (###), and bullet points. Use bold for key terms.

Before running the prompt, replace the template variables {retrieved_chunks_with_ids} and {user_question} with your own retrieved context and question. The [doc_id] markers and the citations array [{chunk_id, quote, doc_url}] are output formats the model should emit, not placeholders to fill in.