Config file¶
Core config¶
Main server-related settings are stored here:
core:
  host: 0.0.0.0    # optional, default=0.0.0.0
  port: 8080       # optional, default=8080
  loglevel: info   # optional, default=info
Environment variable overrides¶
Core config settings can be overridden with env variables:
NIXIESEARCH_CORE_HOST
: overrides core.host
NIXIESEARCH_CORE_PORT
: overrides core.port
NIXIESEARCH_CORE_LOGLEVEL
: overrides core.loglevel
Loglevel can also be set via command-line flags. Env overrides always take priority over values from the config file.
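For example, to run on a different port with debug logging, you can export the overrides before starting the server (a minimal sketch - the values and the way you launch Nixiesearch depend on your deployment):

export NIXIESEARCH_CORE_PORT=9200      # takes priority over core.port in the config file
export NIXIESEARCH_CORE_LOGLEVEL=debug # takes priority over core.loglevel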
Index mapping¶
You can define each index in the schema block of the configuration:
schema:
  <your-index-name>:
    config:
      <index configuration>
    store:
      <store configuration>
    fields:
      <field definitions>
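For example, a single index named movies might look like this (the field definition below is illustrative only - see the field definitions section for the exact syntax):

schema:
  movies:
    config:
      flush:
        duration: 5s
    fields:
      title:
        type: text   # illustrative field definition, see the fields reference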
Index configuration¶
An example of an index configuration:
schema:
  index-name:
    config:
      flush:
        duration: 5s   # how frequently new segments are created
      hnsw:
        m: 16          # max number of node-node links in the HNSW graph
        efc: 100       # beam width used while building the index
        workers: 8     # how many concurrent workers are used for HNSW merge ops
Fields:

flush.duration
: optional, duration, default 5s. The index writer periodically flushes new index segments (if there are new documents) at this interval.

hnsw.m
: optional, int, default 16. How many links each node in the HNSW graph should have. A larger value means better recall, but higher memory usage and a bigger index. Common values are within the 16-128 range (see the tuning sketch after this list).

hnsw.efc
: optional, int, default 100. How many neighbors in the HNSW graph are explored during indexing. The bigger the value, the better the recall, but the slower the indexing.

hnsw.workers
: optional, int, default = number of CPU cores. How many concurrent workers to use for index merges.
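As a sketch of that tradeoff, an index tuned for recall at the cost of memory and indexing speed might raise both HNSW parameters above their defaults (the values below are illustrative, not a recommendation):

schema:
  index-name:
    config:
      hnsw:
        m: 32      # more links per node: better recall, larger index
        efc: 200   # wider beam during construction: better recall, slower indexing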
Store configuration¶
TODO
Field definitions¶
TODO
ML Inference¶
See the ML Inference overview and RAG Search pages for typical use cases of inference models.
Embedding models¶
Example of a full configuration:
inference:
  embedding:
    your-model-name:
      provider: onnx
      model: nixiesearch/e5-small-v2-onnx
      file: model.onnx
      max_tokens: 512
      batch_size: 32
      prompt:
        query: "query: "
        doc: "passage: "
Fields:

provider
: required, string. As of v0.3.0, only the onnx provider is supported.

model
: required, string. A Huggingface handle, or an HTTP/Local/S3 URL for the model. See the model URL reference for more details on how to load your model.

prompt
: optional. Document and query prefixes for asymmetric models.

file
: optional, string, default is to pick the lexicographically first file. The file name of the model - useful when the HF repo contains multiple versions of the same model.

max_tokens
: optional, int, default 512. How many tokens from the input document to process. All tokens beyond the threshold are truncated.

batch_size
: optional, int, default 32. Computing embeddings is a highly parallel task, and doing it in large batches is much more effective than one by one. On CPUs there are usually no gains from batch sizes beyond 32, but on GPUs you can go up to 1024.
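A minimal configuration only needs the required fields; everything else falls back to the defaults described above:

inference:
  embedding:
    e5-small:
      provider: onnx
      model: nixiesearch/e5-small-v2-onnx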
LLM completion models¶
Example of a full configuration:
inference:
  completion:
    your-model-name:
      provider: llamacpp
      model: Qwen/Qwen2-0.5B-Instruct-GGUF
      file: qwen2-0_5b-instruct-q4_0.gguf
      system: "You are a helpful assistant, answer only in haiku."
      options:
        threads: 8
        gpu_layers: 100
        cont_batching: true
        flash_attn: true
        seed: 42
Fields:

provider
: required, string. As of v0.3.0, only llamacpp is supported. Other SaaS providers like OpenAI, Cohere, mxb and Google are on the roadmap.

model
: required, string. A Huggingface handle, or an HTTP/Local/S3 URL for the model. See the model URL reference for more details on how to load your model.

file
: optional, string. The file name of the model, if the target repo contains multiple - a typical case for quantized models.

system
: optional, string, default empty. An optional system prompt prepended to all user prompts.

options
: optional, obj. A set of llama-cpp-specific options. See the llamacpp reference on options for more details.
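A minimal variant sets only the provider, model and file, leaving the system prompt empty and the llama-cpp options at their defaults:

inference:
  completion:
    qwen2:
      provider: llamacpp
      model: Qwen/Qwen2-0.5B-Instruct-GGUF
      file: qwen2-0_5b-instruct-q4_0.gguf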