Cross-Encoder Reranking¶
Cross-encoder models are neural ranking models that improve search relevance by reranking an initial set of search results. Unlike bi-encoder models, which encode queries and documents separately, cross-encoders process each query-document pair jointly, so the model can account for interactions between query terms and document content when scoring relevance.
Cross-encoders complement other ranking methods like RRF (Reciprocal Rank Fusion) and work as part of Nixiesearch's comprehensive search query system. See sentence-transformers Cross-Encoder docs for more details on the underlying technology.
How Cross-Encoders Work¶
Cross-encoders take a query and document as input, concatenate them, and output a relevance score. This joint processing allows the model to understand complex relationships between query terms and document content, leading to more accurate relevance judgments.
The typical workflow is:
- Initial Retrieval: Use a fast retrieval method (semantic, lexical, or hybrid) to get candidate documents
- Reranking: Apply the cross-encoder to score and rerank the top candidates
- Final Results: Return the reranked documents with improved relevance ordering
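The three steps above can be sketched in a few lines of Python. This is an illustration only, not Nixiesearch's implementation: `retrieve_then_rerank` and `toy_overlap_score` are hypothetical names, the term-overlap scorer is a stand-in for a real cross-encoder model, and keeping the unscored tail in its original order is an assumption of this sketch.

```python
from typing import Callable


def retrieve_then_rerank(
    query: str,
    candidates: list[str],
    score_pair: Callable[[str, str], float],
    window: int = 3,
) -> list[str]:
    """Rerank the first `window` candidates with a pairwise scorer.

    Candidates beyond the window keep their retrieval order (an assumption
    made for this sketch).
    """
    head, tail = candidates[:window], candidates[window:]
    # Each (query, document) pair is scored jointly. sorted() is stable,
    # so documents with equal scores keep their retrieval order.
    reranked = sorted(head, key=lambda doc: score_pair(query, doc), reverse=True)
    return reranked + tail


def toy_overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query terms in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)


# Step 1: candidates as returned by a fast initial retrieval (hard-coded here).
candidates = [
    "bluetooth speaker with deep bass",
    "wireless bluetooth headphones with noise cancelling",
    "wired headphones for studio use",
    "usb charging cable",
]

# Steps 2 and 3: score the top candidates and return them in the new order.
ranked = retrieve_then_rerank("wireless bluetooth headphones", candidates, toy_overlap_score)
print(ranked[0])
```

In a real deployment the scorer would be the configured cross-encoder model, and the window would correspond to the number of candidates taken from the initial retrieval.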
Configuration¶
Model Setup¶
Configure cross-encoder models in your configuration file under the inference.ranker section:
inference:
  ranker:
    ce_model:
      provider: onnx
      model: cross-encoder/ms-marco-MiniLM-L6-v2
      max_tokens: 512
      batch_size: 32
      device: cpu
Configuration Options:
- provider: Model provider (currently only onnx is supported)
- model: HuggingFace model identifier or local path
- max_tokens: Maximum sequence length (default: 512)
- batch_size: Inference batch size (default: 32)
- device: Processing device (cpu or gpu); see inference overview for hardware requirements
- file: Optional path to custom ONNX model file
Popular cross-encoder models to consider:
- cross-encoder/ms-marco-MiniLM-L6-v2: Fast, general-purpose ranking model, English only.
- jinaai/jina-reranker-v2-base-multilingual: Slower, but much more precise multilingual ranker.
Nixiesearch supports any sentence-transformers cross-encoder model in ONNX format. See the Speeding up Inference > ONNX section of the SBERT docs for details on how to convert your own model.
Query Syntax¶
Basic Cross-Encoder query:
{
  "query": {
    "cross_encoder": {
      "model": "ce_model",
      "query": "artificial intelligence applications",
      "doc_template": "{{ title }} {{ description }}",
      "retrieve": {
        "semantic": {
          "title": "AI machine learning"
        }
      }
    }
  }
}
Parameters:
- model: Reference to configured cross-encoder model
- query: Query text to compare against documents
- doc_template: Jinja template for rendering document content
- retrieve: Initial retrieval query (can be any query type - semantic, match, bool, etc.)
- rank_window_size: Number of documents to retrieve before reranking (optional)
Note
Important: The query parameter requires explicit text that represents the user's search intent. While the retrieve query can be any query type (including wildcards, filters, or complex boolean queries), the cross-encoder needs a clear text representation of what the user is looking for. This text query is usually a copy of the main search terms from your retrieval query, but cannot always be extracted automatically - especially when using non-textual queries like category filters or wildcards.
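One way to keep the reranking text and the retrieval terms in sync is to build the request body from a single user-supplied string. The sketch below assumes a client written in Python; `build_cross_encoder_query` is a hypothetical helper, not part of any Nixiesearch client library, and it only uses the request fields listed above.

```python
import json
from typing import Any, Optional


def build_cross_encoder_query(
    user_text: str,
    model: str,
    doc_template: str,
    retrieve: dict[str, Any],
    rank_window_size: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a cross_encoder request body (hypothetical helper)."""
    body: dict[str, Any] = {
        "model": model,
        # The cross-encoder always receives the explicit user text,
        # whatever the retrieval query looks like.
        "query": user_text,
        "doc_template": doc_template,
        "retrieve": retrieve,
    }
    if rank_window_size is not None:
        # Optional parameter: only sent when the caller sets it.
        body["rank_window_size"] = rank_window_size
    return {"query": {"cross_encoder": body}}


user_text = "wireless bluetooth headphones"
request = build_cross_encoder_query(
    user_text=user_text,
    model="ce_model",
    doc_template="{{ title }} {{ description }}",
    retrieve={"match": {"title": user_text}},
)
print(json.dumps(request, indent=2))
```

When the retrieval query is non-textual (for example a wildcard or a pure filter), the caller still has to supply `user_text` from somewhere else, such as the raw search box input.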
Document Templates¶
Document templates use Jinja syntax to combine multiple document fields into the text passed to the cross-encoder:
Simple template:
{
  "doc_template": "{{ title }}"
}
Multi-field template:
{
  "doc_template": "Title: {{ title }}\nDescription: {{ description }}\nCategories: {{ categories }}"
}
Conditional template:
{
  "doc_template": "{{ title }}{% if description %} - {{ description }}{% endif %}"
}
The system automatically extracts required fields from your template based on your index mapping, so you only need to specify the template without listing fields separately.
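To make the idea of field extraction concrete, here is a deliberately simplified Python sketch. It is not a Jinja engine and not how Nixiesearch is implemented: the regular expressions only recognise plain `{{ field }}` placeholders and `{% if field %}` conditions, and `referenced_fields` and `render_simple` are hypothetical names.

```python
import re

# Simplified patterns: plain `{{ field }}` placeholders and `{% if field %}` conditions.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
CONDITION = re.compile(r"\{%\s*if\s+(\w+)\s*%\}")


def referenced_fields(template: str) -> list[str]:
    """Return the field names a template refers to, in first-seen order."""
    seen: list[str] = []
    for name in PLACEHOLDER.findall(template) + CONDITION.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def render_simple(template: str, doc: dict[str, object]) -> str:
    """Substitute `{{ field }}` placeholders; missing fields become empty strings.

    Conditionals and other Jinja constructs are NOT evaluated here.
    """
    return PLACEHOLDER.sub(lambda m: str(doc.get(m.group(1), "")), template)


fields = referenced_fields("{{ title }}{% if description %} - {{ description }}{% endif %}")
text = render_simple(
    "Title: {{ title }}\nDescription: {{ description }}",
    {"title": "Laptop", "description": "Fast"},
)
print(fields)
print(text)
```

Knowing the referenced fields up front means only those fields need to be loaded for each candidate document before the template is rendered.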
Examples¶
E-commerce Product Search¶
This example uses multi-match queries for initial retrieval:
{
  "query": {
    "cross_encoder": {
      "model": "ce_model",
      "query": "wireless bluetooth headphones",
      "doc_template": "{{ title }} {{ brand }} {{ description }}",
      "rank_window_size": 50,
      "retrieve": {
        "multi_match": {
          "query": "wireless bluetooth headphones",
          "fields": ["title^2", "description", "brand"]
        }
      }
    }
  }
}
Knowledge Base Search¶
This example uses semantic search for initial retrieval:
{
  "query": {
    "cross_encoder": {
      "model": "ce_model",
      "query": "how to configure SSL certificates",
      "doc_template": "{{ title }}\n{{ content }}",
      "retrieve": {
        "semantic": {
          "content": "SSL certificate configuration setup"
        }
      }
    }
  }
}
Hybrid Retrieval with Cross-Encoder Reranking¶
This example combines semantic and lexical search using RRF (Reciprocal Rank Fusion):
{
  "query": {
    "cross_encoder": {
      "model": "ce_model",
      "query": "machine learning model deployment",
      "doc_template": "{{ title }} {{ abstract }}",
      "retrieve": {
        "rrf": {
          "queries": [
            {"semantic": {"abstract": "ML model deployment"}},
            {"match": {"title": "machine learning deployment"}}
          ]
        }
      }
    }
  }
}
Note
Cross-encoders expect a single, unified set of documents for reranking. For hybrid search scenarios where you want to combine results from multiple retrieval methods (lexical and semantic), you must first merge them using techniques like RRF or disjunction max before applying cross-encoder reranking.
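For readers unfamiliar with RRF, the merge step can be sketched as follows. This is a generic illustration of Reciprocal Rank Fusion, where each document scores the sum of 1 / (k + rank) over the lists it appears in. The function name `rrf_merge` is hypothetical, and k = 60 is the constant commonly used in the RRF literature, assumed here rather than taken from Nixiesearch's configuration.

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one list using RRF."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        # Ranks are 1-based: the top document of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


semantic_hits = ["d1", "d2", "d3"]
lexical_hits = ["d3", "d1", "d4"]

# The fused list is the single candidate set a cross-encoder would then rerank.
merged = rrf_merge([semantic_hits, lexical_hits])
print(merged)
```

Documents found by both retrievers (`d1` and `d3` here) rise above documents found by only one, which gives the cross-encoder a single, deduplicated candidate list.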
Performance Considerations¶
Window Size Optimization¶
Use rank_window_size to balance relevance and performance:
{
  "query": {
    "cross_encoder": {
      "model": "ce_model",
      "query": "search query",
      "doc_template": "{{ title }}",
      "rank_window_size": 100,
      "retrieve": {
        "semantic": {"title": "initial query"}
      }
    }
  }
}
- Small window (20-50): Faster inference, may miss relevant documents
- Medium window (50-100): Good balance for most use cases
- Large window (100+): Better recall, slower performance
Batch Size Tuning¶
Configure batch size based on your hardware:
- CPU: 8-32 documents per batch
- GPU: 32-128 documents per batch
- Memory-constrained: Reduce batch size if getting OOM errors
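The interaction between window size and batch size can be illustrated with a short Python sketch. It assumes the candidates in the window are split into consecutive batches of at most batch_size documents, which is how batched inference usually works but is not confirmed here for Nixiesearch internals. The helper name `batches` is hypothetical.

```python
import math
from typing import Iterator, Sequence


def batches(items: Sequence[int], batch_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# A window of 100 candidate documents, represented by their positions.
window = list(range(100))
sizes = [len(batch) for batch in batches(window, 32)]

# The number of model invocations grows as ceil(window / batch_size).
expected_batches = math.ceil(len(window) / 32)
print(sizes, expected_batches)
```

Under this assumption, halving the batch size roughly doubles the number of model invocations for the same window, while lowering peak memory per invocation.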
Best Practices¶
- Template Design: Include the most relevant fields that help distinguish document relevance
- Initial Retrieval: Use efficient retrieval methods (semantic/lexical) to get good candidates
- Window Sizing: Start with 50-100 documents and adjust based on performance needs
- Model Selection: Choose models appropriate for your domain (general vs. specialized)
- Performance: Cross-encoder inference is expensive; use appropriate rank_window_size and batch_size for your hardware
Integration with Other Features¶
With Filters¶
Cross-encoder reranking works seamlessly with search filters to first narrow down results before reranking:
{
  "query": {
    "cross_encoder": {
      "model": "ce_model",
      "query": "laptop gaming",
      "doc_template": "{{ title }} {{ specs }}",
      "retrieve": {
        "match": {"title": "laptop"}
      }
    }
  },
  "filter": {
    "term": {"category": "electronics"}
  }
}
With Aggregations¶
Cross-encoder reranking works seamlessly with facets and aggregations, as aggregations are computed on the initial retrieval results before reranking. This allows you to get both relevant reranked results and accurate facet counts.
With RAG (Retrieval-Augmented Generation)¶
Cross-encoders are particularly useful in RAG pipelines where high-quality document ranking directly impacts the quality of generated responses.