Cross-Encoder Reranking

Cross-encoder models are powerful neural ranking models that can significantly improve search relevance by reranking initial search results. Unlike bi-encoder models that encode queries and documents separately, cross-encoders process query-document pairs jointly, enabling more sophisticated relevance scoring.

Cross-encoders complement other ranking methods like RRF (Reciprocal Rank Fusion) and work as part of Nixiesearch's comprehensive search query system. See sentence-transformers Cross-Encoder docs for more details on the underlying technology.

How Cross-Encoders Work

Cross-encoders take a query-document pair as input, feed both texts through the model together (typically joined by a separator token), and output a single relevance score. Because attention spans both texts, the model can capture fine-grained interactions between query terms and document content, leading to more accurate relevance judgments than encoding each side separately allows.

The typical workflow is:

  1. Initial Retrieval: Use a fast retrieval method (semantic, lexical, or hybrid) to get candidate documents
  2. Reranking: Apply the cross-encoder to score and rerank the top candidates
  3. Final Results: Return the reranked documents with improved relevance ordering
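The retrieve-then-rerank flow above can be sketched in a few lines of Python. This is a toy illustration, not Nixiesearch internals: the `cross_encoder_score` function below is a hypothetical stand-in for a real model such as `cross-encoder/ms-marco-MiniLM-L6-v2`.

```python
# Toy retrieve-then-rerank pipeline; the scorer stands in for a real cross-encoder.

def cross_encoder_score(query: str, doc: str) -> float:
    # Hypothetical scorer: fraction of query terms that appear in the document.
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

def rerank(query: str, candidates: list[str], rank_window_size: int = 50) -> list[str]:
    # 1. Initial retrieval has already produced `candidates` (fast, approximate).
    window = candidates[:rank_window_size]
    # 2. Score each query-document pair jointly with the cross-encoder.
    scored = [(cross_encoder_score(query, d), d) for d in window]
    # 3. Return the window ordered by the cross-encoder score, best first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored]

docs = [
    "gaming laptop with RGB keyboard",
    "wireless bluetooth headphones with noise cancelling",
    "bluetooth speaker",
]
print(rerank("wireless bluetooth headphones", docs)[0])
```

In a real deployment the scorer is a transformer forward pass, which is why steps 1 and 2 are split: the cheap retrieval stage bounds how many expensive pair scorings the reranker must do.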

Configuration

Model Setup

Configure cross-encoder models in your configuration file under the inference.ranker section:

inference:
  ranker:
    ce_model:
      provider: onnx
      model: cross-encoder/ms-marco-MiniLM-L6-v2
      max_tokens: 512
      batch_size: 32
      device: cpu

Configuration Options:

  • provider: Model provider (currently only onnx supported)
  • model: HuggingFace model identifier or local path
  • max_tokens: Maximum sequence length (default: 512)
  • batch_size: Inference batch size (default: 32)
  • device: Processing device (cpu or gpu) - see inference overview for hardware requirements
  • file: Optional path to custom ONNX model file

Nixiesearch supports any sentence-transformers cross-encoder model in ONNX format. See the Speeding up Inference > ONNX section of the SBERT docs for details on how to convert your own model.

Query Syntax

Basic cross-encoder query:

{
  "query": {
    "cross_encoder": {
      "model": "ce_model",
      "query": "artificial intelligence applications",
      "doc_template": "{{ title }} {{ description }}",
      "retrieve": {
        "semantic": {
          "title": "AI machine learning"
        }
      }
    }
  }
}

Parameters:

  • model: Reference to configured cross-encoder model
  • query: Query text to compare against documents
  • doc_template: Jinja template for rendering document content
  • retrieve: Initial retrieval query (can be any query type - semantic, match, bool, etc.)
  • rank_window_size: Number of documents to retrieve before reranking (optional)

Note

Important: The query parameter requires explicit text that represents the user's search intent. While the retrieve query can be any query type (including wildcards, filters, or complex boolean queries), the cross-encoder needs a clear text representation of what the user is looking for. This text query is usually a copy of the main search terms from your retrieval query, but cannot always be extracted automatically - especially when using non-textual queries like category filters or wildcards.

Document Templates

Document templates use Jinja syntax to combine multiple document fields into the text passed to the cross-encoder:

Simple template:

{
  "doc_template": "{{ title }}"
}

Multi-field template:

{
  "doc_template": "Title: {{ title }}\nDescription: {{ description }}\nCategories: {{ categories }}"
}

Conditional template:

{
  "doc_template": "{{ title }}{% if description %} - {{ description }}{% endif %}"
}

The system automatically extracts required fields from your template based on your index mapping, so you only need to specify the template without listing fields separately.
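To illustrate both halves of that behavior — rendering a template against a document and discovering which fields it references — here is a minimal sketch. It handles only the simple `{{ field }}` case with a regex; Nixiesearch's actual Jinja engine supports the full syntax, including the conditional form shown above.

```python
import re

TEMPLATE = "Title: {{ title }}\nDescription: {{ description }}"

def required_fields(template: str) -> list[str]:
    # Collect the field names referenced as {{ field }} in the template.
    return re.findall(r"\{\{\s*(\w+)\s*\}\}", template)

def render(template: str, doc: dict) -> str:
    # Substitute each {{ field }} with the document's value for that field.
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(doc.get(m.group(1), "")),
        template,
    )

doc = {"title": "SSL guide", "description": "Configuring certificates"}
print(required_fields(TEMPLATE))
print(render(TEMPLATE, doc))
```

The field list extracted this way is what lets the engine fetch only the stored fields a template actually needs.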

Examples

This example uses multi-match queries for initial retrieval:

{
  "query": {
    "cross_encoder": {
      "model": "ce_model",
      "query": "wireless bluetooth headphones",
      "doc_template": "{{ title }} {{ brand }} {{ description }}",
      "rank_window_size": 50,
      "retrieve": {
        "multi_match": {
          "query": "wireless bluetooth headphones",
          "fields": ["title^2", "description", "brand"]
        }
      }
    }
  }
}

This example uses semantic search for initial retrieval:

{
  "query": {
    "cross_encoder": {
      "model": "ce_model",
      "query": "how to configure SSL certificates",
      "doc_template": "{{ title }}\n{{ content }}",
      "retrieve": {
        "semantic": {
          "content": "SSL certificate configuration setup"
        }
      }
    }
  }
}

Hybrid Retrieval with Cross-Encoder Reranking

This example combines semantic and lexical search using RRF (Reciprocal Rank Fusion):

{
  "query": {
    "cross_encoder": {
      "model": "ce_model",
      "query": "machine learning model deployment",
      "doc_template": "{{ title }} {{ abstract }}",
      "retrieve": {
        "rrf": {
          "queries": [
            {"semantic": {"abstract": "ML model deployment"}},
            {"match": {"title": "machine learning deployment"}}
          ]
        }
      }
    }
  }
}

Note

Cross-encoders expect a single, unified set of documents for reranking. For hybrid search scenarios where you want to combine results from multiple retrieval methods (lexical and semantic), you must first merge them using techniques like RRF or disjunction max before applying cross-encoder reranking.
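For reference, RRF scores each document as the sum of 1/(k + rank) over the input rankings (k is commonly 60), so documents ranked highly by several retrievers bubble up. A minimal sketch of the merge step that produces the unified candidate list:

```python
from collections import defaultdict

def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is an ordered list of document ids, best first.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest combined score first; this merged list is what the
    # cross-encoder then reranks.
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d1", "d2", "d3"]
lexical = ["d2", "d4", "d1"]
print(rrf_merge([semantic, lexical]))
```

Note that "d2" wins here despite never being ranked first, because it appears near the top of both lists.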

Performance Considerations

Window Size Optimization

Use rank_window_size to balance relevance and performance:

{
  "query": {
    "cross_encoder": {
      "model": "ce_model",
      "query": "search query",
      "doc_template": "{{ title }}",
      "rank_window_size": 100,
      "retrieve": {
        "semantic": {"title": "initial query"}
      }
    }
  }
}

  • Small window (20-50): Faster inference, may miss relevant documents
  • Medium window (50-100): Good balance for most use cases
  • Large window (100+): Better recall, slower performance

Batch Size Tuning

Configure batch size based on your hardware:

  • CPU: 8-32 documents per batch
  • GPU: 32-128 documents per batch
  • Memory-constrained: Reduce batch size if getting OOM errors
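Since each batch is one forward pass through the model, the per-query reranking cost is roughly ceil(rank_window_size / batch_size) passes — a quick back-of-the-envelope check when tuning these two settings together:

```python
import math

def inference_batches(rank_window_size: int, batch_size: int) -> int:
    # Each batch corresponds to one forward pass through the cross-encoder.
    return math.ceil(rank_window_size / batch_size)

# A 100-document window with the default batch_size of 32 needs 4 passes.
print(inference_batches(100, 32))  # 4
```

Doubling the window size roughly doubles reranking latency at a fixed batch size, which is why window and batch size should be tuned as a pair.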

Best Practices

  1. Template Design: Include the most relevant fields that help distinguish document relevance
  2. Initial Retrieval: Use efficient retrieval methods (semantic/lexical) to get good candidates
  3. Window Sizing: Start with 50-100 documents and adjust based on performance needs
  4. Model Selection: Choose models appropriate for your domain (general vs. specialized)
  5. Performance: Cross-encoder inference is expensive; use appropriate rank_window_size and batch_size for your hardware

Integration with Other Features

With Filters

Cross-encoder reranking works seamlessly with search filters to first narrow down results before reranking:

{
  "query": {
    "cross_encoder": {
      "model": "ce_model",
      "query": "laptop gaming",
      "doc_template": "{{ title }} {{ specs }}",
      "retrieve": {
        "match": {"title": "laptop"}
      }
    }
  },
  "filter": {
    "term": {"category": "electronics"}
  }
}

With Aggregations

Cross-encoder reranking works seamlessly with facets and aggregations, as aggregations are computed on the initial retrieval results before reranking. This allows you to get both relevant reranked results and accurate facet counts.

With RAG (Retrieval-Augmented Generation)

Cross-encoders are particularly useful in RAG pipelines where high-quality document ranking directly impacts the quality of generated responses.