Text fields
Unlike other Lucene-based search engines, Nixiesearch distinguishes between singular and repeated fields at the schema level, so choose your field type wisely:
- for singular fields, use the text type.
- for repeated fields, choose the text[] type.
Example schema for the text fields title and genre:
schema:
  movies:
    fields:
      title:
        type: text # only a single title is allowed
        search:
          semantic:
            model: e5-small
      genre:
        type: text[] # there can be multiple genres per document
        search:
          lexical:
            analyze: english
        filter: true # field is filterable
        facet: true # field is facetable
        store: true # can retrieve the field back from index
        suggest: true # build autocomplete suggestions based on that field
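The singular/repeated distinction can be illustrated with a small client-side check. This is only a sketch: FIELD_TYPES and check_document are hypothetical names made up for this example, and it assumes that a text[] field is sent as a JSON array of strings and that a text field is sent as a plain string (the pre-embedded object form is ignored here). Nixiesearch performs its own validation on the server.

```python
# Hypothetical client-side check mirroring the schema above.
# Not part of Nixiesearch; the server does its own validation.
FIELD_TYPES = {"title": "text", "genre": "text[]"}


def check_document(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape matches."""
    problems = []
    for name, value in doc.items():
        kind = FIELD_TYPES.get(name)
        if kind is None:
            problems.append(f"{name}: not in schema")
        elif kind == "text" and not isinstance(value, str):
            # Assumption: plain strings only, pre-embedded objects are ignored.
            problems.append(f"{name}: expected a single string")
        elif kind == "text[]" and not (
            isinstance(value, list) and all(isinstance(v, str) for v in value)
        ):
            # Assumption: repeated fields arrive as a JSON array of strings.
            problems.append(f"{name}: expected a list of strings")
    return problems


good = {"title": "The Dark Knight", "genre": ["action", "crime"]}
bad = {"title": ["The Dark Knight", "Batman 2"], "genre": "action"}

print(check_document(good))
print(check_document(bad))
```

The first document matches the schema, while the second one swaps the singular and repeated shapes and is reported with one problem per field.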
Semantic index parameters
When a text field has semantic search enabled, there are several parameters you can configure further:
schema:
  movies:
    fields:
      title:
        type: text # only a single title is allowed
        search:
          semantic:
            model: e5-small
            ef: 32
            m: 16
            quantize: float32
            workers: 4
Fields:
- ef and m: HNSW index parameters. The higher these values, the better the search recall at the cost of performance.
- quantize (optional, float32/int8/int4/int1, default float32): index quantization level. int8 saves 4x RAM and disk but at the cost of worse recall.
- workers (optional, int, default is same as number of CPUs in the system): how many background workers to use for HNSW indexing operations.
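To make the quantization trade-off concrete, here is a back-of-the-envelope estimate of raw vector storage per quantization level. It is a sketch under stated assumptions: 384-dimensional embeddings, one million documents, and ideal bit-packing for int4 and int1. HNSW graph links and other index overhead are ignored, so real index sizes will differ.

```python
# Rough estimate of raw vector storage per quantization level.
# Assumptions (illustrative only): 384 dimensions, 1M documents,
# ideal bit-packing, no HNSW graph or metadata overhead.
BITS_PER_DIM = {"float32": 32, "int8": 8, "int4": 4, "int1": 1}


def vector_bytes(num_docs: int, dims: int, quantize: str) -> int:
    """Bytes needed to store num_docs vectors with dims dimensions each."""
    return num_docs * dims * BITS_PER_DIM[quantize] // 8


for level in BITS_PER_DIM:
    size_mb = vector_bytes(1_000_000, 384, level) / 1_000_000
    print(f"{level:>8}: {size_mb:,.0f} MB")
```

Under these assumptions float32 vectors take about 1.5 GB and int8 vectors about 384 MB, which matches the 4x saving mentioned above.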
Operations on text fields
Document ingestion format
When a document with a text field is ingested, Nixiesearch expects the JSON payload for the field to be in one of two formats:
- JSON string: like {"title":"cookies"}, when the text embedding is computed by the server.
- JSON obj: like {"title": {"text":"cookies", "embedding": [1,2,3]}}, for pre-embedded documents.
See pre-embedded text fields in the Document format section for more details.
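The two payload shapes above can be produced with a small helper. This is a minimal sketch: text_payload is a hypothetical function written for this example, and the three-element embedding is copied from the sample above purely for illustration.

```python
import json


def text_payload(field: str, text: str, embedding=None) -> str:
    """Serialize one text field as a plain string or a pre-embedded object."""
    if embedding is None:
        value = text  # plain string: the server computes the embedding
    else:
        value = {"text": text, "embedding": list(embedding)}  # pre-embedded
    return json.dumps({field: value})


print(text_payload("title", "cookies"))
print(text_payload("title", "cookies", embedding=[1, 2, 3]))
```

Keeping both shapes behind one helper lets a client switch between server-side and client-side embedding without changing the rest of the ingestion code.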
Search
The main reason text fields exist is to be used in search. Nixiesearch has two types of indexes that can be used for search, lexical and semantic:
- lexical: the traditional BM25 keyword search, as in Elastic/SOLR before 2022. Nowadays also known as sparse retrieval.
- semantic: an a-kNN vector-based search over embeddings of documents, a.k.a. dense retrieval.
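To give an intuition for dense retrieval, the sketch below runs an exact (brute-force) nearest-neighbour search by cosine similarity. An approximate index such as HNSW aims to return similar results without scanning every vector. The three-dimensional vectors are made up for illustration; real embeddings come from a model such as e5-small and have far more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def knn(query, docs, k=2):
    """Exact top-k document ids by cosine similarity to the query vector."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in docs.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]


# Made-up toy "embeddings", not produced by any real model.
docs = {
    "batman begins": [0.9, 0.1, 0.0],
    "the dark knight": [0.8, 0.3, 0.1],
    "notting hill": [0.0, 0.2, 0.9],
}
print(knn([1.0, 0.2, 0.0], docs))
```

The two vectors pointing in roughly the same direction as the query are returned first, regardless of whether the documents share any keywords with it.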
By default, text fields are not searchable: you need to explicitly enable lexical retrieval, semantic retrieval, or both at the same time:
schema:
  movies:
    fields:
      title:
        type: text
        search:
          semantic: # build an embedding HNSW index
            model: e5-small
          lexical: # build a lexical BM25 index
            analyze: english
After that you can search over text fields with all Query DSL operators Nixiesearch supports, for example match, semantic and rrf:
curl -XPOST http://localhost:8080/v1/index/movies/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": {
      "rrf": {
        "queries": [
          {"match": {"title": "batman"}},
          {"semantic": {"title": "batman nolan"}}
        ],
        "rank_window_size": 20
      }
    },
    "fields": ["title"],
    "size": 5
  }'
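The rrf operator in the request above combines the ranked results of the match and semantic queries. As a minimal sketch of the general Reciprocal Rank Fusion technique (not Nixiesearch's actual implementation), each document receives the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is the value commonly used for this technique and is an assumption here, as is the reading of rank_window_size as a cap on how many hits per query take part in the fusion. The document ids are made up.

```python
from collections import defaultdict


def rrf(rankings, k=60, window=20):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Assumption: only the top `window` hits of each list are fused.
        for rank, doc_id in enumerate(ranking[:window], start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Made-up result lists standing in for the lexical and semantic queries.
lexical = ["batman begins", "batman returns", "the dark knight"]
semantic = ["the dark knight", "batman begins", "inception"]
print(rrf([lexical, semantic]))
```

Documents that rank well in both lists end up ahead of documents that appear in only one, which is why RRF works without having to compare BM25 scores and vector similarities directly.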
Facets, filters and sorting
See the facets, filters and sorting sections for more details.
Suggestions
Text fields can also be used for creating autocomplete suggestions:
curl -XPOST -d '{"query": "h", "fields":["title"]}' http://localhost:8080/v1/index/<index-name>/suggest
The request above emits the following response:
{
  "suggestions": [
    {"text": "hugo", "score": 2.0},
    {"text": "hugo boss", "score": 1.0},
    {"text": "hugo boss red", "score": 1.0}
  ],
  "took": 11
}
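A client would typically extract the suggestion texts from this response. The sketch below parses the sample response shown above; top_suggestions is a hypothetical helper written for this example, and it assumes the response always contains a suggestions array with text and score keys, as in the sample.

```python
import json

# The sample response from above, embedded as a string for the example.
response_body = """
{
  "suggestions": [
    {"text": "hugo", "score": 2.0},
    {"text": "hugo boss", "score": 1.0},
    {"text": "hugo boss red", "score": 1.0}
  ],
  "took": 11
}
"""


def top_suggestions(body: str, limit: int = 2) -> list[str]:
    """Return the highest-scoring suggestion texts from a suggest response."""
    suggestions = json.loads(body)["suggestions"]
    # sorted() is stable, so equal scores keep the server's original order.
    ranked = sorted(suggestions, key=lambda s: s["score"], reverse=True)
    return [s["text"] for s in ranked[:limit]]


print(top_suggestions(response_body))
```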
See the Autocomplete suggestions section for more details.
For further reading, check out how to define numeric fields in the index mapping.