Reindex Documents API

Introduced 1.0

The reindex document API operation copies all documents, or a subset of documents, from one or more source indexes, data streams, or aliases into a destination index, data stream, or alias. The source and destination must be different.

The reindex operation takes a snapshot of the source index and copies documents to the destination index. For each document, copying is performed by extracting the document source (_source field) and indexing it into the destination.

OpenSearch natively supports cross-cluster reindexing, allowing you to copy data between different OpenSearch clusters. For more information, see Cross-cluster reindexing.

Before using the Reindex API, note the following requirements and limitations:

  • The reindex operation requires the _source field to be enabled for all documents in the source index. If _source is disabled, the operation will fail.
  • You must create and configure the destination index before running the reindex operation. OpenSearch does not automatically copy settings, mappings, or shard configurations from the source index.
  • Configure the appropriate number of shards, replicas, and field mappings for the destination index based on your requirements.
  • For large reindex operations, consider temporarily disabling replicas on the destination index by setting number_of_replicas to 0 and then re-enabling them after completion.
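The replica toggle described above can be sketched as a pair of request bodies for the Python client. The index name, shard count, and mappings below are illustrative placeholders, not values from this page:

```python
# Create the destination index up front; settings and mappings are NOT
# copied from the source. All values below are placeholders.
create_body = {
    "settings": {
        "index": {
            "number_of_shards": 3,    # size for your data volume
            "number_of_replicas": 0,  # disable replicas during the copy
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text"}  # define field mappings explicitly
        }
    },
}

# After the reindex completes, restore replicas with a settings update.
restore_body = {"index": {"number_of_replicas": 1}}
```

With opensearch-py, these bodies would be sent as `client.indices.create(index="my-destination-index", body=create_body)` before reindexing and `client.indices.put_settings(index="my-destination-index", body=restore_body)` afterward.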

Reindexing large datasets can be resource intensive and may impact cluster performance. Monitor cluster health during reindex operations and consider using throttling parameters for production environments. For more information, see Performance optimization.

Unlike update operations that modify documents within the same index, reindex operations work on different sources and destinations. Thus, version conflicts are unlikely. The version_type parameter controls how OpenSearch handles document versions during reindexing. By default, version conflicts stop the reindex process. To continue reindexing when conflicts occur, set the conflicts parameter to proceed. The response will include a count of version conflicts encountered. Other error types are unaffected by the conflicts parameter.

By default, documents with the same ID are overwritten. The op_type parameter controls this behavior: index (the default) replaces existing documents, while create allows only new documents, so attempting to index a document with an existing ID results in an error. For more information, see Request body fields.
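For example, the following request body (index names are placeholders) combines the two parameters: it creates only documents missing from the destination and continues past the version conflicts raised for IDs that already exist:

```python
# Reindex body that never overwrites existing documents and counts
# version conflicts instead of aborting on the first one.
reindex_body = {
    "conflicts": "proceed",  # continue past version conflicts
    "source": {"index": "my-source-index"},
    "dest": {
        "index": "my-destination-index",
        "op_type": "create",  # existing IDs raise version conflicts
    },
}
```

Sent as `client.reindex(body=reindex_body)`, the response's version_conflicts field then reports how many documents were skipped.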

Endpoints

POST /_reindex

Query parameters

The following table lists the available query parameters. All parameters are optional.

Parameter Data type Description
refresh Boolean If true, OpenSearch refreshes the affected shards after the operation completes so that the reindexed documents are immediately visible to search. Valid values are true, false, and wait_for, which waits for the next scheduled refresh before returning. Default is false.
timeout Time unit How long to wait for a response from the cluster. Default is 30s.
wait_for_active_shards String The number of active shards that must be available before OpenSearch processes the reindex request. Default is 1 (only the primary shard). Set to all or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed.
wait_for_completion Boolean If false, OpenSearch runs the reindex operation asynchronously without waiting for it to complete. The request returns immediately, and the task continues in the background. You can monitor its progress using the Tasks API. Default is true, which means the operation runs synchronously. See Asynchronous operations.
requests_per_second Integer The throttle for this request, in sub-requests per second. Default is -1, which means no throttling. See Throttling and rate control.
require_alias Boolean Whether the destination index must be an alias. Default is false.
scroll Time unit How long to keep the search context open. Default is 5m.
slices Integer The number of slices for automatic slicing. OpenSearch automatically divides the reindex operation into this number of parallel subtasks. Default is 1 (no slicing). Set this parameter to auto for OpenSearch to automatically determine the optimal number of slices. See Using slicing for parallel processing.
max_docs Integer The maximum number of documents that the reindex operation should process. Default is all documents. See Extracting sample data.

Request body fields

The following table lists all request body fields.

Field Data type Required/Optional Description
source Object Required The source to copy data from. See The source object.
dest Object Required The destination to copy data to. See The dest object.
conflicts String Optional Indicates to OpenSearch what should happen if the reindex operation encounters a version conflict. Valid values are abort and proceed. Default is abort.
script Object Optional A script that OpenSearch uses to apply transformations to the data during the reindex operation. See The script object.

The source object

The source object supports the following fields.

Field Data type Required/Optional Description
index String Required The name of the index, data stream, or alias to copy from. You can specify multiple source indexes as a comma-separated list.
query Object Optional The search query to use for the reindex operation. See Filtering documents by query.
remote Object Optional Information about a remote OpenSearch cluster to copy data from. See Cross-cluster reindexing.
remote.host String Required when remote is specified The URL for the remote OpenSearch cluster that you want to index from.
remote.username String Optional The username to use for authentication with the remote host.
remote.password String Optional The password to use for authentication with the remote host.
remote.socket_timeout String Optional The remote socket read timeout. Default is 30s.
remote.connect_timeout String Optional The remote connection timeout. Default is 30s.
size Integer Optional The number of documents to index per batch. Use this when indexing from a remote source to ensure that each batch fits within the on-heap buffer, which has a default maximum size of 100 MB.
slice Object Optional The configuration for manual slicing. Must be an object with id (slice ID) and max (total number of slices) properties to manually specify which slice of the data to process. This enables parallel processing by running multiple reindex operations, each handling a different slice. See Using slicing for parallel processing.
_source Boolean or Array Optional Whether to reindex source fields. Specify a list of fields to reindex or true to reindex all fields. Default is true. See Selecting specific fields.
sort Array Optional Deprecated. A comma-separated list of <field>:<direction> pairs used to sort documents before reindexing. If used with max_docs to control which documents are reindexed, consider filtering documents by query to find the desired subset of data.

The dest object

The dest object supports the following fields.

Field Data type Required/Optional Description
index String Required The name of the index, data stream, or alias to copy to.
version_type String Optional Controls how OpenSearch handles document versions during reindexing:
internal (default): Ignores versions and overwrites any documents in the destination that have the same ID as documents from the source.
external: Preserves the version from the source, creates any missing documents, and updates documents in the destination only if they have an older version than the source.
external_gt: Similar to external but only updates documents if the source version is greater than the destination version.
external_gte: Similar to external but updates documents if the source version is greater than or equal to the destination version.
op_type String Optional Determines how documents are processed during reindexing:
index (default): Creates new documents and updates existing ones.
create: Only creates documents that don’t exist in the destination. Documents with existing IDs cause version conflicts. Required when reindexing to data streams (which are append-only).
pipeline String Optional The ingest pipeline to use during reindexing. See Transforming documents using ingest pipelines.
routing String Optional Controls how document routing is handled during reindexing. Valid values are keep (preserves existing routing, default), discard (removes routing), or =<value> (sets routing to a specific value). See Routing.

The script object

The script object supports the following fields.

Field Data type Required/Optional Description
source String Required The script source code as a string.
lang String Optional The scripting language. Valid values are painless, expression, mustache, and java. Default is painless.

Example request

POST /_reindex
{
  "source": {
    "index": "my-source-index"
  },
  "dest": {
    "index": "my-destination-index"
  }
}
response = client.reindex(
  body =   {
    "source": {
      "index": "my-source-index"
    },
    "dest": {
      "index": "my-destination-index"
    }
  }
)

Example response

{
    "took": 28829,
    "timed_out": false,
    "total": 111396,
    "updated": 0,
    "created": 111396,
    "deleted": 0,
    "batches": 112,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1.0,
    "throttled_until_millis": 0,
    "failures": []
}

Response body fields

The following table lists all response body fields and provides a detailed description for each.

Field Data type Description
took Integer The total time in milliseconds required to complete the entire reindex operation, including all batch processing and network overhead.
timed_out Boolean Indicates whether any part of the reindex operation exceeded the configured timeout. If true, the operation may have been partially completed.
total Integer The total number of documents successfully processed during the reindex operation. This includes documents that were created, updated, or resulted in no-op operations.
updated Integer The number of documents that were updated in the destination index because a document with the same ID already existed.
created Integer The number of new documents created in the destination index. These are documents that didn’t previously exist in the destination.
deleted Integer The number of documents deleted from the destination index. This occurs when scripts set ctx.op = "delete".
batches Integer The number of scroll batches processed during the reindex operation. Each batch contains multiple documents as configured by the size parameter.
version_conflicts Integer The number of version conflicts encountered. Version conflicts occur when the destination document has a higher version than the source document (when using external versioning).
noops Integer The number of documents that were skipped during processing. This happens when scripts set ctx.op = "noop" or when no changes are needed.
retries Object The retry statistics object containing retry counts for different operation types. Retries occur automatically when temporary failures are encountered.
retries.bulk Integer The number of bulk operation retries attempted during the reindex operation.
retries.search Integer The number of search operation retries attempted during the reindex operation.
throttled_millis Integer The total time in milliseconds that the operation was throttled to comply with the requests_per_second setting. Higher values indicate more throttling was applied.
requests_per_second Float The actual rate of requests executed per second during the operation. This may differ from the requested rate due to throttling adjustments and system performance.
throttled_until_millis Integer For asynchronous operations, this indicates the next time (in milliseconds since epoch) that throttled requests will be executed. Always 0 for completed operations.
failures Array An array of failure objects describing any unrecoverable errors encountered during the operation. Each failure includes details about the error type, cause, and affected document.
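
As an illustration, a small helper (hypothetical, not part of any client library) can distill these fields into a few headline numbers. The counters below come from the example response above:

```python
def summarize_reindex(resp):
    """Condense a reindex response body into a few headline numbers."""
    processed = resp["created"] + resp["updated"] + resp["deleted"]
    seconds = resp["took"] / 1000.0  # took is reported in milliseconds
    return {
        "processed": processed,
        "docs_per_second": round(processed / seconds, 1),
        "conflicts": resp["version_conflicts"],
        "failed": bool(resp["failures"]),
    }

# Counters taken from the example response above.
example = {
    "took": 28829, "created": 111396, "updated": 0, "deleted": 0,
    "version_conflicts": 0, "failures": [],
}
print(summarize_reindex(example))
# → {'processed': 111396, 'docs_per_second': 3864.0, 'conflicts': 0, 'failed': False}
```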

Selective reindexing

The following examples demonstrate different ways to selectively copy data during reindexing, including filtering documents, selecting specific fields, and extracting sample datasets.

Filtering documents by query

Copy only documents that match specific criteria:

POST /_reindex
{
  "source": {
    "index": "orders",
    "query": {
      "range": {
        "order_date": {
          "gte": "2024-01-01",
          "lte": "2024-12-31"
        }
      }
    }
  },
  "dest": {
    "index": "orders-2024"
  }
}
response = client.reindex(
  body =   {
    "source": {
      "index": "orders",
      "query": {
        "range": {
          "order_date": {
            "gte": "2024-01-01",
            "lte": "2024-12-31"
          }
        }
      }
    },
    "dest": {
      "index": "orders-2024"
    }
  }
)

Selecting specific fields

Copy only specific fields from source documents:

POST /_reindex
{
  "source": {
    "index": "customer-data",
    "_source": [
      "customer_id",
      "name",
      "email",
      "created_date"
    ]
  },
  "dest": {
    "index": "customers-minimal"
  }
}
response = client.reindex(
  body =   {
    "source": {
      "index": "customer-data",
      "_source": [
        "customer_id",
        "name",
        "email",
        "created_date"
      ]
    },
    "dest": {
      "index": "customers-minimal"
    }
  }
)

Extracting sample data

Create a smaller dataset for testing:

POST /_reindex
{
  "max_docs": 1000,
  "source": {
    "index": "production-logs",
    "query": {
      "function_score": {
        "random_score": {
          "seed": 42
        },
        "min_score": 0.8
      }
    }
  },
  "dest": {
    "index": "test-sample"
  }
}
response = client.reindex(
  body =   {
    "max_docs": 1000,
    "source": {
      "index": "production-logs",
      "query": {
        "function_score": {
          "random_score": {
            "seed": 42
          },
          "min_score": 0.8
        }
      }
    },
    "dest": {
      "index": "test-sample"
    }
  }
)

Routing

By default, if the reindex operation encounters a document with routing, the routing is preserved unless changed by a script. You can control routing behavior using the routing parameter in the dest section:

  • keep: Preserves the routing from the source document (default)
  • discard: Removes routing from reindexed documents
  • =<text>: Sets routing to the specified value for all reindexed documents

The following request sets a custom routing value for all reindexed documents:

POST /_reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "dest",
    "routing": "=company_a"
  }
}
response = client.reindex(
  body =   {
    "source": {
      "index": "source"
    },
    "dest": {
      "index": "dest",
      "routing": "=company_a"
    }
  }
)

Transforming documents using ingest pipelines

To transform data, process documents through an ingest pipeline during reindexing. First create the pipeline, then reference it in the reindex operation:

POST /_reindex
{
  "source": {
    "index": "raw-data"
  },
  "dest": {
    "index": "processed-data",
    "pipeline": "data-enrichment"
  }
}
response = client.reindex(
  body =   {
    "source": {
      "index": "raw-data"
    },
    "dest": {
      "index": "processed-data",
      "pipeline": "data-enrichment"
    }
  }
)

Before running the reindex operation, create the ingest pipeline. This example creates a pipeline that adds a processed_at timestamp and converts the status field to uppercase:

PUT /_ingest/pipeline/data-enrichment
{
  "description": "Enriches documents during reindexing",
  "processors": [
    {
      "set": {
        "field": "processed_at",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "uppercase": {
        "field": "status"
      }
    }
  ]
}

Script operations

You can transform documents during the reindex process using scripts. You can modify document content, metadata, and control which documents are processed.

Scripts can modify the following document metadata fields:

  • ctx._id: Change the document ID.
  • ctx._index: Route documents to different destination indexes.
  • ctx._version: Control document versioning.
  • ctx._routing: Set custom routing values.

Set the ctx.op field to control what happens to each document:

  • ctx.op = "index": Index the document normally (default behavior).
  • ctx.op = "create": Only create the document if it doesn’t exist.
  • ctx.op = "noop": Skip the document (useful for conditional processing).
  • ctx.op = "delete": Delete the document from the destination index.

Transforming field values

You can add or modify fields in documents during reindexing. For example, this script adds a timestamp and migration status to each document:

POST /_reindex
{
  "source": {
    "index": "source-data"
  },
  "dest": {
    "index": "migrated-data"
  },
  "script": {
    "source": "ctx._source.timestamp = System.currentTimeMillis(); ctx._source.status = 'migrated'"
  }
}
response = client.reindex(
  body =   {
    "source": {
      "index": "source-data"
    },
    "dest": {
      "index": "migrated-data"
    },
    "script": {
      "source": "ctx._source.timestamp = System.currentTimeMillis(); ctx._source.status = 'migrated'"
    }
  }
)

Renaming fields

You can rename fields during reindexing using scripts. This script renames client_name to customer_name and total_amount to order_total during the reindex operation:

POST /_reindex
{
  "source": {
    "index": "legacy-data"
  },
  "dest": {
    "index": "updated-data"
  },
  "script": {
    "source": "ctx._source.customer_name = ctx._source.remove('client_name'); ctx._source.order_total = ctx._source.remove('total_amount');"
  }
}
response = client.reindex(
  body =   {
    "source": {
      "index": "legacy-data"
    },
    "dest": {
      "index": "updated-data"
    },
    "script": {
      "source": "ctx._source.customer_name = ctx._source.remove('client_name'); ctx._source.order_total = ctx._source.remove('total_amount');"
    }
  }
)

Processing documents conditionally

You can skip documents based on conditions or apply different transformations. For example, this script skips archived documents and adds a migration timestamp to all others:

POST /_reindex
{
  "source": {
    "index": "mixed-data"
  },
  "dest": {
    "index": "processed-data"
  },
  "script": {
    "source": "if (ctx._source.category == 'archived') { ctx.op = 'noop' } else { ctx._source.migrated_at = new Date() }"
  }
}
response = client.reindex(
  body =   {
    "source": {
      "index": "mixed-data"
    },
    "dest": {
      "index": "processed-data"
    },
    "script": {
      "source": "if (ctx._source.category == 'archived') { ctx.op = 'noop' } else { ctx._source.migrated_at = new Date() }"
    }
  }
)

Routing documents to different indexes

You can dynamically route documents to different destination indexes based on document content. For example, this script routes products to category-specific indexes:

POST /_reindex
{
  "source": {
    "index": "product-catalog"
  },
  "dest": {
    "index": "placeholder-will-be-overridden"
  },
  "script": {
    "source": "ctx._index = 'products-' + ctx._source.category.toLowerCase()"
  }
}
response = client.reindex(
  body =   {
    "source": {
      "index": "product-catalog"
    },
    "dest": {
      "index": "placeholder-will-be-overridden"
    },
    "script": {
      "source": "ctx._index = 'products-' + ctx._source.category.toLowerCase()"
    }
  }
)

Consolidating time-based indexes

Use the following script to consolidate multiple time-based indexes into a single index:

POST /_reindex
{
  "source": {
    "index": [
      "logs-2024-01-*",
      "logs-2024-02-*",
      "logs-2024-03-*"
    ]
  },
  "dest": {
    "index": "logs-2024-q1"
  },
  "script": {
    "source": "ctx._source.quarter = 'Q1-2024'; ctx._source.consolidated_date = System.currentTimeMillis();"
  }
}
response = client.reindex(
  body =   {
    "source": {
      "index": [
        "logs-2024-01-*",
        "logs-2024-02-*",
        "logs-2024-03-*"
      ]
    },
    "dest": {
      "index": "logs-2024-q1"
    },
    "script": {
      "source": "ctx._source.quarter = 'Q1-2024'; ctx._source.consolidated_date = System.currentTimeMillis();"
    }
  }
)

This example consolidates 3 months of daily log indexes into a quarterly index while adding metadata about the consolidation.

Asynchronous operations

For large datasets, you can run reindex operations asynchronously to avoid blocking your application. When you set wait_for_completion=false, OpenSearch immediately returns a task ID that you can use to monitor the operation’s progress:

POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "large-source-index"
  },
  "dest": {
    "index": "destination-index"
  }
}
response = client.reindex(
  params = { "wait_for_completion": "false" },
  body =   {
    "source": {
      "index": "large-source-index"
    },
    "dest": {
      "index": "destination-index"
    }
  }
)

The response includes a task ID:

{
  "task": "oTUltX4IQMOUUVeiohTt8A:12345"
}

Use the Tasks API to check the status of your reindex operation:

GET /_tasks/oTUltX4IQMOUUVeiohTt8A:12345

You can manage long-running reindex tasks using these operations:

  • Cancel a running reindex: POST /_tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel
  • List all reindex tasks: GET /_tasks?actions=*reindex*
  • Task cleanup: OpenSearch automatically removes completed task documents, but you can manually delete them if needed for immediate cleanup.
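
A reindex task's status object mirrors the response counters (total, created, updated, deleted), so progress can be estimated from a Tasks API response. The helper below is an illustrative sketch applied to a trimmed example response:

```python
def reindex_progress(task_info):
    """Estimate percent complete from a Tasks API response for a reindex task."""
    status = task_info["task"]["status"]
    done = status["created"] + status["updated"] + status["deleted"]
    return 100.0 * done / status["total"] if status["total"] else 100.0

# Trimmed Tasks API response for a reindex that is halfway done.
task_info = {
    "completed": False,
    "task": {
        "status": {"total": 111396, "created": 55698, "updated": 0, "deleted": 0}
    },
}
print(round(reindex_progress(task_info), 1))  # → 50.0
```

In practice, `task_info` would come from `client.tasks.get(task_id=...)`; poll until the completed field is true.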

Cross-cluster reindexing

Copy data from a remote OpenSearch cluster:

POST /_reindex
{
  "source": {
    "remote": {
      "host": "https://remote-cluster.example.com:9200",
      "username": "reindex-user",
      "password": "secure-password"
    },
    "index": "remote-index",
    "size": 1000
  },
  "dest": {
    "index": "local-copy"
  }
}
response = client.reindex(
  body =   {
    "source": {
      "remote": {
        "host": "https://remote-cluster.example.com:9200",
        "username": "reindex-user",
        "password": "secure-password"
      },
      "index": "remote-index",
      "size": 1000
    },
    "dest": {
      "index": "local-copy"
    }
  }
)

SSL configuration for remote reindexing

When reindexing from remote clusters over HTTPS, configure SSL settings in opensearch.yml.

Certificate-based authentication

Configure SSL using individual certificate files:

reindex.ssl.certificate_authorities: ["/path/to/ca-cert.pem"]
reindex.ssl.certificate: "/path/to/client-cert.pem"
reindex.ssl.key: "/path/to/client-key.pem"
reindex.ssl.verification_mode: full

Keystore-based authentication

Configure SSL using keystore and truststore files:

reindex.ssl.keystore.path: "/path/to/keystore.p12"
reindex.ssl.keystore.type: "PKCS12"
reindex.ssl.truststore.path: "/path/to/truststore.p12"
reindex.ssl.truststore.type: "PKCS12"

SSL configuration options

The following table lists the available SSL configuration parameters.

Parameter Description Default
reindex.ssl.verification_mode The certificate verification level: full, certificate, or none full
reindex.ssl.certificate_authorities A list of CA certificate file paths None
reindex.ssl.truststore.path The path to the truststore file (JKS or PKCS12) None
reindex.ssl.keystore.path The path to the keystore file for client authentication None
reindex.ssl.supported_protocols The supported TLS protocol versions TLSv1.3,TLSv1.2

SSL settings must be configured in opensearch.yml and require a cluster restart. They cannot be set in the reindex request body.

Remote cluster allow list

Configure allowed remote hosts in opensearch.yml:

reindex.remote.allowlist: [
  "remote-cluster.example.com:9200",
  "backup-cluster.example.com:9200",
  "10.0.1.*:9200"
]

The allow list supports:

  • Explicit host:port combinations.
  • Wildcard patterns for IP ranges.
  • Multiple cluster endpoints.
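
As a rough illustration of how these patterns behave, Python's fnmatch can serve as a stand-in (this mirrors, but is not, OpenSearch's actual matching code):

```python
from fnmatch import fnmatch

# Two of the allow-list entries from the example configuration above.
allowlist = ["remote-cluster.example.com:9200", "10.0.1.*:9200"]

def host_allowed(host, patterns):
    """Illustrative check: does a host:port match any allow-list pattern?"""
    return any(fnmatch(host, pattern) for pattern in patterns)

print(host_allowed("10.0.1.42:9200", allowlist))  # → True
print(host_allowed("10.0.2.42:9200", allowlist))  # → False
```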

Performance optimization

Use the following techniques to optimize reindexing performance.

Throttling and rate control

Control the reindex operation’s impact on cluster performance using throttling:

POST /_reindex?requests_per_second=500
{
  "source": {
    "index": "production-data"
  },
  "dest": {
    "index": "production-backup"
  }
}
response = client.reindex(
  params = { "requests_per_second": "500" },
  body =   {
    "source": {
      "index": "production-data"
    },
    "dest": {
      "index": "production-backup"
    }
  }
)

You can dynamically adjust throttling for running reindex operations:

POST /_reindex/task_id/_rethrottle?requests_per_second=200

Using slicing for parallel processing

Slicing divides a reindex operation into multiple parallel tasks to improve performance on large datasets.

Automatic slicing

To let OpenSearch determine the optimal number of slices, set the slices query parameter to auto:

POST /_reindex?slices=auto
{
  "source": {
    "index": "large-index"
  },
  "dest": {
    "index": "large-index-copy"
  }
}
response = client.reindex(
  params = { "slices": "auto" },
  body =   {
    "source": {
      "index": "large-index"
    },
    "dest": {
      "index": "large-index-copy"
    }
  }
)

Manual slicing

For more control over parallelization, you can manually configure slices by specifying the slice ID and total number of slices in the request body.

OpenSearch partitions the dataset consistently across all slice requests by applying a hash function to each document, combined with the max value, to determine which slice the document belongs to. This ensures that:

  • Documents are distributed evenly across all slices.
  • Each document goes to exactly one slice (no duplicates or gaps).
  • All parallel requests must use the same max value for consistency.
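
Conceptually, this works like hashing each document ID modulo max. The toy sketch below, which uses CRC32 as a stand-in for OpenSearch's actual hash, shows that a fixed max value assigns each document to exactly one slice:

```python
from zlib import crc32

def slice_for(doc_id, max_slices):
    """Toy stand-in for the slice hash: CRC32 of the ID modulo max."""
    return crc32(doc_id.encode()) % max_slices

doc_ids = [f"doc-{n}" for n in range(1000)]
assignments = [slice_for(doc_id, 4) for doc_id in doc_ids]

# Each document maps to exactly one slice, and all four slices receive work.
print(sorted(set(assignments)))  # → [0, 1, 2, 3]
```

Changing max between requests would reshuffle every assignment, which is why all parallel requests must agree on it.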

For example, with max: 4, you can run four separate requests in parallel:

  • Request 1: {"id": 0, "max": 4} (processes slice 0)
  • Request 2: {"id": 1, "max": 4} (processes slice 1)
  • Request 3: {"id": 2, "max": 4} (processes slice 2)
  • Request 4: {"id": 3, "max": 4} (processes slice 3)

The following request processes slice 0 out of 4 total slices:

POST /_reindex
{
  "source": {
    "index": "large-index",
    "slice": {
      "id": 0,
      "max": 4
    }
  },
  "dest": {
    "index": "large-index-copy"
  }
}
response = client.reindex(
  body =   {
    "source": {
      "index": "large-index",
      "slice": {
        "id": 0,
        "max": 4
      }
    },
    "dest": {
      "index": "large-index-copy"
    }
  }
)

Run multiple requests with different slice IDs (0–3) for parallel processing.

Monitoring reindex operations

Use the following methods to monitor the progress and performance of your reindex operations.

Monitor all active reindex operations in your cluster:

GET /_tasks?actions=*reindex*&detailed=true

Check the progress of a specific reindex task using its task ID:

GET /_tasks/oTUltX4IQMOUUVeiohTt8A:12345

Monitor cluster performance and disk usage during reindex operations:

GET /_cluster/health

GET /_nodes/stats/indices/store