Skip to content

S3 Persistence

S3 persistence stores search index segments in S3-compatible object storage, enabling distributed deployments where multiple searcher and indexer nodes can share the same index data. This is the recommended approach for production distributed deployments.

For an overview of all storage options, see the persistence overview. For development environments, consider in-memory storage or local disk storage.

When to Use S3 Storage

S3 storage is ideal for production distributed deployments where you need to scale searcher and indexer components independently. It enables stateless compute nodes that can be added or removed without data loss, making it perfect for auto-scaling scenarios.

Use S3 storage when you need to share indexes between multiple environments (development, staging, production) or when you want to decouple storage from compute for better reliability and operational flexibility. It's also essential for multi-region deployments where you need consistent access to search indexes across different geographic locations.

Configuration Example

Distributed configuration with in-memory components and S3 persistence:

schema:
  movies:
    store:
      distributed:
        searcher:
          memory:          # Fast in-memory reads
        indexer:
          memory:          # Fast in-memory writes
        remote:
          s3:              # Persistent index segments
            bucket: my-search-indexes
            prefix: movies
            region: us-east-1
    fields:
      title:
        type: text
        search:
          lexical:
            analyze: en
      overview:
        type: text
        search:
          lexical:
            analyze: en

For more field configuration options, see the schema mapping guide. For complete distributed setup, see the distributed deployment overview.

S3 Authentication

The most secure approach is to use IAM roles for service accounts:

# No credentials in config - uses IAM role
schema:
  movies:
    store:
      distributed:
        remote:
          s3:
            bucket: my-search-indexes
            prefix: movies
            region: us-east-1

Environment Variables

Set AWS credentials via environment variables:

export AWS_ACCESS_KEY_ID=your-access-key
export AWS_SECRET_ACCESS_KEY=your-secret-key
export AWS_REGION=us-east-1

Kubernetes Secrets

For Kubernetes deployments, use secrets for credentials:

kubectl create secret generic s3-credentials \
  --from-literal=access-key-id=YOUR_ACCESS_KEY \
  --from-literal=secret-access-key=YOUR_SECRET_KEY

Then reference in your deployment:

env:
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      name: s3-credentials
      key: access-key-id
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      name: s3-credentials
      key: secret-access-key

Required IAM Permissions

Your IAM user or role needs these S3 permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-search-indexes",
        "arn:aws:s3:::my-search-indexes/*"
      ]
    }
  ]
}

Alternative S3 Providers

Nixiesearch supports any S3-compatible storage service by specifying a custom endpoint:

MinIO

schema:
  movies:
    store:
      distributed:
        remote:
          s3:
            bucket: nixiesearch-indexes
            prefix: movies
            region: us-east-1
            endpoint: http://minio:9000

Google Cloud Storage

schema:
  movies:
    store:
      distributed:
        remote:
          s3:
            bucket: my-gcs-bucket
            prefix: movies
            region: us-central1
            endpoint: https://storage.googleapis.com

DigitalOcean Spaces

schema:
  movies:
    store:
      distributed:
        remote:
          s3:
            bucket: my-spaces-bucket
            prefix: movies
            region: nyc3
            endpoint: https://nyc3.digitaloceanspaces.com

Cloudflare R2

schema:
  movies:
    store:
      distributed:
        remote:
          s3:
            bucket: my-r2-bucket
            prefix: movies
            region: auto
            endpoint: https://your-account-id.r2.cloudflarestorage.com

For authentication with alternative providers, use their respective access keys with the same environment variables or Kubernetes secrets approach.

Further Reading