Claude Prompt for RAG Pipelines
Hybrid BM25 + dense retrieval architecture with Cohere Rerank 3.5 cross-encoder reranking, tuned for product manuals.
You are a search and retrieval engineer. Design a hybrid retrieval stack over OpenSearch with Cohere Rerank 3.5 reranking for a corpus of product manuals.
## Why Hybrid
Dense retrieval nails semantic meaning but misses rare/exact terms (product codes, error strings, legal citations). BM25 nails lexical matches but misses paraphrase. Hybrid + rerank gets the best of both at the cost of one extra rerank call.
## Index Design in OpenSearch
- **Dense field:** nomic-embed-text-v1.5 embeddings (768-dim, L2)
- **Sparse field:** BM25 (or SPLADE if OpenSearch supports learned sparse) with analyzer suited to product manuals
- **Metadata fields:** doc_id, source, section_path, created_at, author, lang, tenant_id, access_level, doc_type
- **Text field:** raw chunk text for rerank input and citation quotes
### Tokenizer / Analyzer Config
For product manuals, configure the BM25 analyzer with:
- Lowercase filter
- Stopword list: standard English ("the", "a", ...) plus manual-specific noise ("page", "figure")
- Stemmer: Porter for English, language-specific otherwise
- Synonym filter: maintain a curated synonyms.txt with domain terms
- Preserve alphanumerics (product codes, versions like "v2.3.1")
- n-gram on exact-match fields (error codes, SKUs)
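A minimal index-body sketch of these settings, assuming opensearch-py; the index, field, and analyzer names are illustrative, not prescribed:

```python
# Sketch: hybrid index with a custom BM25 analyzer + HNSW vector field.
# Assumes opensearch-py; all names here are illustrative.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

index_body = {
    "settings": {
        "index": {"knn": True},  # enable k-NN search on this index
        "analysis": {
            "filter": {
                "manual_stopwords": {
                    "type": "stop",
                    "stopwords": ["the", "a", "page", "figure"],  # extend with the full English list
                },
                "manual_synonyms": {"type": "synonym", "synonyms_path": "synonyms.txt"},
                "english_stemmer": {"type": "stemmer", "language": "porter2"},
            },
            "analyzer": {
                "product_manuals_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",  # verify it preserves tokens like "v2.3.1"
                    "filter": ["lowercase", "manual_stopwords", "manual_synonyms", "english_stemmer"],
                }
            },
        },
    },
    "mappings": {
        "properties": {
            "chunk_text": {"type": "text", "analyzer": "product_manuals_analyzer"},
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,  # nomic-embed-text-v1.5 output size
                "method": {
                    "name": "hnsw",
                    "space_type": "l2",
                    "engine": "faiss",
                    "parameters": {"m": 16, "ef_construction": 200, "ef_search": 256},
                },
            },
            "error_code": {"type": "keyword"},  # exact-match field for codes / SKUs
            "tenant_id": {"type": "keyword"},
            "access_level": {"type": "keyword"},
            "doc_type": {"type": "keyword"},
            "created_at": {"type": "date"},
        }
    },
}
client.indices.create(index="manual-chunks", body=index_body)
```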
## Retrieval Stage 1: Parallel First-Stage
Run in parallel (not sequentially):
- **Dense:** top-200 by L2 via HNSW (efSearch=256)
- **Sparse:** top-200 by BM25
- Apply metadata filters in BOTH (tenant_id, access_level, date range)
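A sketch of the concurrent first-stage calls, assuming opensearch-py and the index layout above; the filter list is deliberately identical in both branches:

```python
# Sketch: dense (k-NN) and sparse (BM25) queries issued concurrently.
from concurrent.futures import ThreadPoolExecutor

def first_stage(client, index, query_text, query_vec, tenant_id, k=200):
    # Metadata filters applied identically to both branches.
    filters = [{"term": {"tenant_id": tenant_id}}]  # add access_level, date range, ...

    dense_body = {
        "size": k,
        "query": {"bool": {
            "filter": filters,
            "must": [{"knn": {"embedding": {"vector": query_vec, "k": k}}}],
        }},
    }
    sparse_body = {
        "size": k,
        "query": {"bool": {
            "filter": filters,
            "must": [{"match": {"chunk_text": query_text}}],
        }},
    }
    with ThreadPoolExecutor(max_workers=2) as pool:
        dense = pool.submit(client.search, index=index, body=dense_body)
        sparse = pool.submit(client.search, index=index, body=sparse_body)
        return dense.result()["hits"]["hits"], sparse.result()["hits"]["hits"]
```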
## Retrieval Stage 2: Fusion
Combine with **Reciprocal Rank Fusion** (RRF):
```
score(doc) = Σ 1 / (k + rank_i(doc)) where k = 60
```
- Deduplicate by chunk_id (keep highest RRF score)
- Truncate to top-50 before rerank
Why RRF over weighted linear combination: scores from BM25 and cosine are NOT comparable without careful calibration. RRF uses only ranks, so it is robust to score drift.
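A minimal RRF implementation over the two hit lists, including the dedup and truncation steps above:

```python
# Sketch: Reciprocal Rank Fusion with dedup by chunk id.
def rrf_fuse(result_lists, k=60, top_n=50):
    """result_lists: iterable of ranked lists of OpenSearch hits."""
    scores, docs = {}, {}
    for hits in result_lists:
        for rank, hit in enumerate(hits, start=1):
            chunk_id = hit["_id"]
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
            docs.setdefault(chunk_id, hit)  # dedupe: keep one payload per chunk
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return [(docs[cid], scores[cid]) for cid in ranked]
```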
## Retrieval Stage 3: Rerank with Cohere Rerank 3.5
- Send top-50 (chunk_text, query) pairs to Cohere Rerank 3.5
- Use batching (typically 32-64 pairs per API call) to stay within provider rate limits
- Rerank returns relevance scores in [0, 1]
- Truncate to top-3 for the LLM context window
- **Confidence gate:** if top-1 rerank score < 0.30, return "no confident answer" refusal rather than synthesizing
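A sketch of the rerank stage, assuming the cohere Python SDK (`rerank-v3.5` is the API model id corresponding to Rerank 3.5):

```python
# Sketch: batched Cohere rerank over the RRF candidates, plus the confidence gate.
import cohere

co = cohere.ClientV2()  # API key read from the CO_API_KEY environment variable

def rerank(query, candidates, batch_size=32, top_n_out=3, min_score=0.30):
    """candidates: list of (chunk_id, chunk_text) tuples from RRF fusion."""
    scored = []
    for start in range(0, len(candidates), batch_size):
        batch = candidates[start:start + batch_size]
        resp = co.rerank(
            model="rerank-v3.5",
            query=query,
            documents=[text for _, text in batch],
        )
        for r in resp.results:  # each result carries .index (into batch) and .relevance_score
            scored.append((batch[r.index][0], r.relevance_score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if not scored or scored[0][1] < min_score:
        return None  # caller returns the "no confident answer" refusal
    return scored[:top_n_out]
```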
## Context Packing
- Sort final chunks by relevance descending
- Inject with clear boundaries:
```
<chunk id="c-{i}" source="{url}" score="{score}">
{chunk_text}
</chunk>
```
- Token budget: leave 800 tokens headroom for the answer
- If chunks would overflow, DROP from the bottom (least relevant), never truncate mid-chunk
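A packing sketch under these rules; `count_tokens` is a stand-in for your model's actual tokenizer (the chars/4 default is only a rough heuristic):

```python
# Sketch: pack reranked chunks into the prompt under a token budget.
def pack_context(chunks, budget_tokens, answer_headroom=800,
                 count_tokens=lambda s: len(s) // 4):
    """chunks: dicts with source, score, text, already sorted by relevance desc."""
    remaining = budget_tokens - answer_headroom
    packed = []
    for i, c in enumerate(chunks):
        block = (
            f'<chunk id="c-{i}" source="{c["source"]}" score="{c["score"]:.2f}">\n'
            f'{c["text"]}\n</chunk>'
        )
        cost = count_tokens(block)
        if cost > remaining:
            break  # drop from the bottom; never truncate mid-chunk
        packed.append(block)
        remaining -= cost
    return "\n".join(packed)
```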
## Latency Budget (p95)
- Query embed: ≤ 80ms
- Dense search: ≤ 150ms
- Sparse search: ≤ 100ms (parallel with dense)
- Fusion: ≤ 5ms
- Rerank 50 pairs: ≤ 300ms
- LLM synthesis: ≤ 1500ms
- **Total p95:** ≤ 2.0s
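The budget is only enforceable if each stage is measured. A small timing sketch; `record` is a hypothetical hook into your metrics backend (StatsD, Prometheus, ...):

```python
# Sketch: per-stage latency instrumentation for the p95 budget above.
import time
from contextlib import contextmanager

@contextmanager
def stage_timer(name, record):
    start = time.perf_counter()
    try:
        yield
    finally:
        record(f"retrieval.{name}.latency_ms", (time.perf_counter() - start) * 1000)

# usage:
# with stage_timer("rerank", record=metrics.timing):
#     results = rerank(query, candidates)
```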
## Evaluation: Ablation Matrix
Build a golden set from real user queries tagged with expected doc_ids. Run these variants:
1. Dense only (top-3)
2. Sparse only (top-3)
3. Hybrid no rerank (top-3 from RRF)
4. Hybrid + rerank (top-3)
Metrics: hit@1, hit@3, hit@10, MRR, NDCG@10. Report deltas vs variant 1.
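A sketch of the hit@k and MRR computation over the golden set; NDCG@10 is easiest via an off-the-shelf library such as ir_measures rather than hand-rolling:

```python
# Sketch: hit@k and MRR for one retrieval variant against the golden set.
def evaluate(run_variant, golden_set, ks=(1, 3, 10)):
    """run_variant(query) -> ranked doc_ids; golden_set: list of (query, set of expected doc_ids)."""
    hits = {k: 0 for k in ks}
    rr_sum = 0.0
    for query, expected in golden_set:
        ranked = run_variant(query)
        for k in ks:
            if expected & set(ranked[:k]):
                hits[k] += 1
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in expected:
                rr_sum += 1.0 / rank
                break
    n = len(golden_set)
    return {**{f"hit@{k}": hits[k] / n for k in ks}, "mrr": rr_sum / n}
```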
**Expected findings on product manuals:**
- Hybrid beats dense by 3-8 pts on queries with product codes, citations, exact phrases
- Rerank adds 5-15 pts hit@3 over hybrid alone
- Latency cost of rerank: ~200-400ms
## Configuration File
Produce a `retrieval.yaml` that captures ALL knobs:
```yaml
dense:
  model: nomic-embed-text-v1.5
  dim: 768
  similarity: L2
  hnsw: { M: 16, efConstruction: 200, efSearch: 256 }
  top_k: 200
sparse:
  analyzer: product_manuals_analyzer
  top_k: 200
fusion:
  method: rrf
  k: 60
rerank:
  model: rerank-v3.5        # API id for Cohere Rerank 3.5
  top_n_in: 50
  top_n_out: 3
  min_score: 0.30
  batch_size: 32
```
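A minimal loader sketch, assuming PyYAML; key names mirror the file above:

```python
# Sketch: load retrieval.yaml so every knob lives in one reviewed file.
import yaml  # PyYAML

with open("retrieval.yaml") as f:
    cfg = yaml.safe_load(f)

assert cfg["fusion"]["method"] == "rrf"
dense_top_k = cfg["dense"]["top_k"]            # 200
min_rerank_score = cfg["rerank"]["min_score"]  # 0.30
```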
## Edge Cases
- Empty BM25 result (query has only stopwords) → dense-only
- Empty dense result (index cold start) → BM25-only, log alert
- Rerank API 5xx → use RRF ranking directly, log degradation
- Query contains a product code matching `[A-Z]{2,4}-\d{3,6}` → boost exact match in sparse via should-clause (see the fallback sketch below)
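A sketch wiring these degradation paths together; it reuses `rrf_fuse` from the fusion sketch, and the search/rerank callables (including the hypothetical `boost_codes` parameter) are illustrative:

```python
# Sketch: fallback logic for the edge cases above.
import logging
import re

log = logging.getLogger("retrieval")
PRODUCT_CODE = re.compile(r"[A-Z]{2,4}-\d{3,6}")

def retrieve(query, dense_search, sparse_search, rerank_fn, rrf_fuse):
    # Detected product codes become an exact-match should-clause in the sparse query.
    codes = PRODUCT_CODE.findall(query)
    sparse_hits = sparse_search(query, boost_codes=codes)
    dense_hits = dense_search(query)

    if not sparse_hits:            # stopword-only query
        fused = [(h, 0.0) for h in dense_hits[:50]]
    elif not dense_hits:           # index cold start
        log.warning("dense index empty; serving BM25 only")
        fused = [(h, 0.0) for h in sparse_hits[:50]]
    else:
        fused = rrf_fuse([dense_hits, sparse_hits])

    try:
        return rerank_fn(query, fused)
    except Exception:              # rerank API 5xx, timeout, ...
        log.warning("rerank unavailable; falling back to RRF order", exc_info=True)
        return fused[:3]
```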
Produce code in Python wiring this end to end, with metrics, tracing, and tests.
Present as numbered steps. Each step should have: a clear action title, detailed instructions, expected outcome, and common pitfalls to avoid.