Reindex Documents API
Introduced 1.0
The reindex document API operation copies all documents or a subset of documents from one or more source indexes, data streams, or aliases into a destination index, data stream, or alias. The source and destination must be different.
The reindex operation takes a snapshot of the source index and copies documents to the destination index. For each document, copying is performed by extracting the document source (_source field) and indexing it into the destination.
OpenSearch natively supports cross-cluster reindexing, allowing you to copy data between different OpenSearch clusters. For more information, see Cross-cluster reindexing.
Before using the Reindex API, note the following requirements and limitations:
- The reindex operation requires the `_source` field to be enabled for all documents in the source index. If `_source` is disabled, the operation will fail.
- You must create and configure the destination index before running the reindex operation. OpenSearch does not automatically copy settings, mappings, or shard configurations from the source index.
- Configure the appropriate number of shards, replicas, and field mappings for the destination index based on your requirements.
- For large reindex operations, consider temporarily disabling replicas on the destination index by setting `number_of_replicas` to `0` and then re-enabling them after completion.
Reindexing large datasets can be resource intensive and may impact cluster performance. Monitor cluster health during reindex operations and consider using throttling parameters for production environments. For more information, see Performance optimization.
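To illustrate the setup steps above, the following Python sketch builds the request bodies for creating the destination index with replicas disabled and for restoring replicas afterward. The index settings, mappings, and replica counts shown are illustrative assumptions, not values from this document; with the Python client they would be passed to calls such as `client.indices.create()` and `client.indices.put_settings()`.

```python
# Settings and mappings for the destination index, created before reindexing.
# OpenSearch does not copy these from the source, so they must be chosen
# explicitly; the field names and shard counts below are illustrative.
create_body = {
    "settings": {
        "index": {
            "number_of_shards": 3,
            "number_of_replicas": 0,  # disabled for the duration of the copy
        }
    },
    "mappings": {
        "properties": {
            "customer_id": {"type": "keyword"},
            "created_date": {"type": "date"},
        }
    },
}

# Settings update that re-enables replicas after the reindex completes.
restore_body = {"index": {"number_of_replicas": 1}}
```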
Unlike update operations that modify documents within the same index, reindex operations work on different sources and destinations. Thus, version conflicts are unlikely. The version_type parameter controls how OpenSearch handles document versions during reindexing. By default, version conflicts stop the reindex process. To continue reindexing when conflicts occur, set the conflicts parameter to proceed. The response will include a count of version conflicts encountered. Other error types are unaffected by the conflicts parameter.
By default, documents with the same ID are overwritten. The op_type parameter determines whether existing documents can be replaced or if only new documents are allowed, in which case attempting to index a document with an existing ID results in an error. For more information, see Request body fields.
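Putting these two settings together, the following sketch shows a request body that copies only documents missing from the destination (`op_type` set to `create`) while recording version conflicts instead of aborting (`conflicts` set to `proceed`). The index names are placeholders.

```python
# Reindex body that only creates documents absent from the destination and
# proceeds past ID collisions, counting them as version conflicts.
# Index names are illustrative placeholders.
body = {
    "conflicts": "proceed",
    "source": {"index": "my-source-index"},
    "dest": {
        "index": "my-destination-index",
        "op_type": "create",
    },
}
```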
Endpoints
```json
POST /_reindex
```
Query parameters
The following table lists the available query parameters. All parameters are optional.
| Parameter | Data type | Description |
|---|---|---|
refresh | Boolean | If true, OpenSearch refreshes the affected shards after the operation so that the reindexed documents are immediately visible to search. Valid values are true, false, and wait_for, which waits for a refresh to occur before returning a response. Default is false. |
timeout | Time unit | How long to wait for a response from the cluster. Default is 30s. |
wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the reindex request. Default is 1 (only the primary shard). Set to all or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. |
wait_for_completion | Boolean | If false, OpenSearch runs the reindex operation asynchronously without waiting for it to complete. The request returns immediately, and the task continues in the background. You can monitor its progress using the Tasks API. Default is true, which means the operation runs synchronously. See Asynchronous operations. |
requests_per_second | Integer | The throttle for this request, expressed in sub-requests per second. Default is -1 (no throttling). See Controlling the reindex rate and Throttling and rate control. |
require_alias | Boolean | Whether the destination index must be an alias. Default is false. |
scroll | Time unit | How long to keep the search context open. Default is 5m. |
slices | Integer | The number of slices for automatic slicing. OpenSearch automatically divides the reindex operation into this number of parallel subtasks. Default is 1 (no slicing). Set this parameter to auto for OpenSearch to automatically determine the optimal number of slices. See Using slicing for parallel processing. |
max_docs | Integer | The maximum number of documents that the reindex operation should process. Default is all documents. See Extracting sample data. |
Request body fields
The following table lists all request body fields.
| Field | Data type | Required/Optional | Description |
|---|---|---|---|
source | Object | Required | The source to copy data from. See The source object. |
dest | Object | Required | The destination to copy data to. See The dest object. |
conflicts | String | Optional | Indicates to OpenSearch what should happen if the reindex operation encounters a version conflict. Valid values are abort and proceed. Default is abort. |
script | Object | Optional | A script that OpenSearch uses to apply transformations to the data during the reindex operation. See The script object. |
The source object
The source object supports the following fields.
| Field | Data type | Required/Optional | Description |
|---|---|---|---|
index | String | Required | The name of the index, data stream, or alias to copy from. You can specify multiple source indexes as a comma-separated list. |
query | Object | Optional | The search query to use for the reindex operation. See Filtering documents by query. |
remote | Object | Optional | Information about a remote OpenSearch cluster to copy data from. See Cross-cluster reindexing. |
remote.host | String | Required when remote is specified | The URL for the remote OpenSearch cluster that you want to index from. |
remote.username | String | Optional | The username to use for authentication with the remote host. |
remote.password | String | Optional | The password to use for authentication with the remote host. |
remote.socket_timeout | String | Optional | The remote socket read timeout. Default is 30s. |
remote.connect_timeout | String | Optional | The remote connection timeout. Default is 30s. |
size | Integer | Optional | The number of documents to index per batch. Use this when indexing from a remote source to ensure that each batch fits within the on-heap buffer, which has a default maximum size of 100 MB. |
slice | Object | Optional | The configuration for manual slicing. Must be an object with id (slice ID) and max (total number of slices) properties to manually specify which slice of the data to process. This enables parallel processing by running multiple reindex operations, each handling a different slice. See Using slicing for parallel processing. |
_source | Boolean or Array | Optional | Whether to reindex source fields. Specify a list of fields to reindex or true to reindex all fields. Default is true. See Selecting specific fields. |
sort | Array | Optional | Deprecated. A comma-separated list of `<field>:<direction>` pairs used to sort documents before reindexing. If you use sort with max_docs to control which documents are reindexed, consider filtering documents by query instead to select the desired subset of data. |
The dest object
The dest object supports the following fields.
| Field | Data type | Required/Optional | Description |
|---|---|---|---|
index | String | Required | The name of the index, data stream, or alias to copy to. |
version_type | String | Optional | Controls how OpenSearch handles document versions during reindexing: • `internal` (default): Ignores versions and overwrites any documents in the destination that have the same ID as documents from the source. • `external`: Preserves the version from the source, creates any missing documents, and updates documents in the destination only if they have an older version than the source. • `external_gt`: Similar to `external` but only updates documents if the source version is greater than the destination version. • `external_gte`: Similar to `external` but updates documents if the source version is greater than or equal to the destination version. |
op_type | String | Optional | Determines how documents are processed during reindexing: • `index` (default): Creates new documents and updates existing ones. • `create`: Only creates documents that don't exist in the destination. Documents with existing IDs cause version conflicts. Required when reindexing to data streams (which are append-only). |
pipeline | String | Optional | The ingest pipeline to use during reindexing. See Transforming documents using ingest pipelines. |
routing | String | Optional | Controls how document routing is handled during reindexing. Valid values are keep (preserves existing routing, default), discard (removes routing), or =<value> (sets routing to a specific value). See Routing. |
The script object
The script object supports the following fields.
| Field | Data type | Required/Optional | Description |
|---|---|---|---|
source | String | Required | The script source code as a string. |
lang | String | Optional | The scripting language. Valid values are painless, expression, mustache, and java. Default is painless. |
Example request
```json
POST /_reindex
{
  "source": {
    "index": "my-source-index"
  },
  "dest": {
    "index": "my-destination-index"
  }
}
```

```python
response = client.reindex(
    body = {
        "source": {
            "index": "my-source-index"
        },
        "dest": {
            "index": "my-destination-index"
        }
    }
)
```

Example response
```json
{
  "took": 28829,
  "timed_out": false,
  "total": 111396,
  "updated": 0,
  "created": 111396,
  "deleted": 0,
  "batches": 112,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures": []
}
```
Response body fields
The following table lists all response body fields and provides a detailed description for each.
| Field | Data type | Description |
|---|---|---|
took | Integer | The total time in milliseconds required to complete the entire reindex operation, including all batch processing and network overhead. |
timed_out | Boolean | Indicates whether any part of the reindex operation exceeded the configured timeout. If true, the operation may have been partially completed. |
total | Integer | The total number of documents successfully processed during the reindex operation. This includes documents that were created, updated, or resulted in no-op operations. |
updated | Integer | The number of documents that were updated in the destination index because a document with the same ID already existed. |
created | Integer | The number of new documents created in the destination index. These are documents that didn’t previously exist in the destination. |
deleted | Integer | The number of documents deleted from the destination index. This occurs when scripts set ctx.op = "delete". |
batches | Integer | The number of scroll batches processed during the reindex operation. Each batch contains multiple documents as configured by the size parameter. |
version_conflicts | Integer | The number of version conflicts encountered. Version conflicts occur when the destination document has a higher version than the source document (when using external versioning). |
noops | Integer | The number of documents that were skipped during processing. This happens when scripts set ctx.op = "noop" or when no changes are needed. |
retries | Object | The retry statistics object containing retry counts for different operation types. Retries occur automatically when temporary failures are encountered. |
retries.bulk | Integer | The number of bulk operation retries attempted during the reindex operation. |
retries.search | Integer | The number of search operation retries attempted during the reindex operation. |
throttled_millis | Integer | The total time in milliseconds that the operation was throttled to comply with the requests_per_second setting. Higher values indicate more throttling was applied. |
requests_per_second | Float | The actual rate of requests executed per second during the operation. This may differ from the requested rate due to throttling adjustments and system performance. |
throttled_until_millis | Integer | For asynchronous operations, this indicates the next time (in milliseconds since epoch) that throttled requests will be executed. Always 0 for completed operations. |
failures | Array | An array of failure objects describing any unrecoverable errors encountered during the operation. Each failure includes details about the error type, cause, and affected document. |
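As a quick illustration of how these counters relate, the following sketch totals the per-outcome counts from a response and checks them against `total`. The bookkeeping rule (created, updated, deleted, and noops accounting for every processed document) is an assumption that holds for a conflict-free run such as the example response above.

```python
def summarize(resp):
    """Summarize a reindex response: count of documents with a recorded
    outcome, a rough consistency check against the reported total, and
    the elapsed time in seconds."""
    processed = resp["created"] + resp["updated"] + resp["deleted"] + resp["noops"]
    return {
        "processed": processed,
        "consistent": processed + resp["version_conflicts"] >= resp["total"],
        "seconds": resp["took"] / 1000.0,
    }

# Applied to the example response above: every document was newly created,
# so the per-outcome counters match the total exactly.
resp = {"took": 28829, "total": 111396, "created": 111396, "updated": 0,
        "deleted": 0, "noops": 0, "version_conflicts": 0}
print(summarize(resp))
```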
Selective reindexing
The following examples demonstrate different ways to selectively copy data during reindexing, including filtering documents, selecting specific fields, and extracting sample datasets.
Filtering documents by query
Copy only documents that match specific criteria:
```json
POST /_reindex
{
  "source": {
    "index": "orders",
    "query": {
      "range": {
        "order_date": {
          "gte": "2024-01-01",
          "lte": "2024-12-31"
        }
      }
    }
  },
  "dest": {
    "index": "orders-2024"
  }
}
```

```python
response = client.reindex(
    body = {
        "source": {
            "index": "orders",
            "query": {
                "range": {
                    "order_date": {
                        "gte": "2024-01-01",
                        "lte": "2024-12-31"
                    }
                }
            }
        },
        "dest": {
            "index": "orders-2024"
        }
    }
)
```

Selecting specific fields
Copy only specific fields from source documents:
```json
POST /_reindex
{
  "source": {
    "index": "customer-data",
    "_source": [
      "customer_id",
      "name",
      "email",
      "created_date"
    ]
  },
  "dest": {
    "index": "customers-minimal"
  }
}
```

```python
response = client.reindex(
    body = {
        "source": {
            "index": "customer-data",
            "_source": [
                "customer_id",
                "name",
                "email",
                "created_date"
            ]
        },
        "dest": {
            "index": "customers-minimal"
        }
    }
)
```

Extracting sample data
Create a smaller dataset for testing:
```json
POST /_reindex
{
  "max_docs": 1000,
  "source": {
    "index": "production-logs",
    "query": {
      "function_score": {
        "random_score": {
          "seed": 42
        },
        "min_score": 0.8
      }
    }
  },
  "dest": {
    "index": "test-sample"
  }
}
```

```python
response = client.reindex(
    body = {
        "max_docs": 1000,
        "source": {
            "index": "production-logs",
            "query": {
                "function_score": {
                    "random_score": {
                        "seed": 42
                    },
                    "min_score": 0.8
                }
            }
        },
        "dest": {
            "index": "test-sample"
        }
    }
)
```

Routing
By default, if the reindex operation encounters a document with routing, the routing is preserved unless changed by a script. You can control routing behavior using the routing parameter in the dest section:
- `keep`: Preserves the routing from the source document (default).
- `discard`: Removes routing from reindexed documents.
- `=<text>`: Sets routing to the specified value for all reindexed documents.
The following request sets a custom routing value for all reindexed documents:
```json
POST /_reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "dest",
    "routing": "=company_a"
  }
}
```

```python
response = client.reindex(
    body = {
        "source": {
            "index": "source"
        },
        "dest": {
            "index": "dest",
            "routing": "=company_a"
        }
    }
)
```

Transforming documents using ingest pipelines
To transform data, process documents through an ingest pipeline during reindexing. First create the pipeline, then reference it in the reindex operation:
```json
POST /_reindex
{
  "source": {
    "index": "raw-data"
  },
  "dest": {
    "index": "processed-data",
    "pipeline": "data-enrichment"
  }
}
```

```python
response = client.reindex(
    body = {
        "source": {
            "index": "raw-data"
        },
        "dest": {
            "index": "processed-data",
            "pipeline": "data-enrichment"
        }
    }
)
```

Before running the reindex operation, create the ingest pipeline. This example creates a pipeline that adds a processed_at timestamp and converts the status field to uppercase:
```json
PUT /_ingest/pipeline/data-enrichment
{
  "description": "Enriches documents during reindexing",
  "processors": [
    {
      "set": {
        "field": "processed_at",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "uppercase": {
        "field": "status"
      }
    }
  ]
}
```
Controlling the reindex rate
Control the reindex rate to minimize cluster impact:
```json
POST /_reindex?requests_per_second=500
{
  "source": {
    "index": "production-data"
  },
  "dest": {
    "index": "production-backup"
  }
}
```

```python
response = client.reindex(
    params = { "requests_per_second": "500" },
    body = {
        "source": {
            "index": "production-data"
        },
        "dest": {
            "index": "production-backup"
        }
    }
)
```

Script operations
You can transform documents during the reindex process using scripts. You can modify document content, metadata, and control which documents are processed.
Scripts can modify the following document metadata fields:
- `ctx._id`: Change the document ID.
- `ctx._index`: Route documents to different destination indexes.
- `ctx._version`: Control document versioning.
- `ctx._routing`: Set custom routing values.
Set the ctx.op field to control what happens to each document:
- `ctx.op = "index"`: Index the document normally (default behavior).
- `ctx.op = "create"`: Only create the document if it doesn't exist.
- `ctx.op = "noop"`: Skip the document (useful for conditional processing).
- `ctx.op = "delete"`: Delete the document from the destination index.
Transforming field values
You can add or modify fields in documents during reindexing. For example, this script adds a timestamp and migration status to each document:
```json
POST /_reindex
{
  "source": {
    "index": "source-data"
  },
  "dest": {
    "index": "migrated-data"
  },
  "script": {
    "source": "ctx._source.timestamp = System.currentTimeMillis(); ctx._source.status = 'migrated'"
  }
}
```

```python
response = client.reindex(
    body = {
        "source": {
            "index": "source-data"
        },
        "dest": {
            "index": "migrated-data"
        },
        "script": {
            "source": "ctx._source.timestamp = System.currentTimeMillis(); ctx._source.status = 'migrated'"
        }
    }
)
```

Renaming fields
You can rename fields during reindexing using scripts. This script renames client_name to customer_name and total_amount to order_total during the reindex operation:
```json
POST /_reindex
{
  "source": {
    "index": "legacy-data"
  },
  "dest": {
    "index": "updated-data"
  },
  "script": {
    "source": "ctx._source.customer_name = ctx._source.remove('client_name'); ctx._source.order_total = ctx._source.remove('total_amount');"
  }
}
```

```python
response = client.reindex(
    body = {
        "source": {
            "index": "legacy-data"
        },
        "dest": {
            "index": "updated-data"
        },
        "script": {
            "source": "ctx._source.customer_name = ctx._source.remove('client_name'); ctx._source.order_total = ctx._source.remove('total_amount');"
        }
    }
)
```

Processing documents conditionally
You can skip documents based on conditions or apply different transformations. For example, this script skips archived documents and adds a migration timestamp to all others:
```json
POST /_reindex
{
  "source": {
    "index": "mixed-data"
  },
  "dest": {
    "index": "processed-data"
  },
  "script": {
    "source": "if (ctx._source.category == 'archived') { ctx.op = 'noop' } else { ctx._source.migrated_at = new Date() }"
  }
}
```

```python
response = client.reindex(
    body = {
        "source": {
            "index": "mixed-data"
        },
        "dest": {
            "index": "processed-data"
        },
        "script": {
            "source": "if (ctx._source.category == 'archived') { ctx.op = 'noop' } else { ctx._source.migrated_at = new Date() }"
        }
    }
)
```

Routing documents to different indexes
You can dynamically route documents to different destination indexes based on document content. For example, this script routes products to category-specific indexes:
```json
POST /_reindex
{
  "source": {
    "index": "product-catalog"
  },
  "dest": {
    "index": "placeholder-will-be-overridden"
  },
  "script": {
    "source": "ctx._index = 'products-' + ctx._source.category.toLowerCase()"
  }
}
```

```python
response = client.reindex(
    body = {
        "source": {
            "index": "product-catalog"
        },
        "dest": {
            "index": "placeholder-will-be-overridden"
        },
        "script": {
            "source": "ctx._index = 'products-' + ctx._source.category.toLowerCase()"
        }
    }
)
```

Consolidating time-based indexes
Use the following script to consolidate multiple time-based indexes into a single index:
```json
POST /_reindex
{
  "source": {
    "index": [
      "logs-2024-01-*",
      "logs-2024-02-*",
      "logs-2024-03-*"
    ]
  },
  "dest": {
    "index": "logs-2024-q1"
  },
  "script": {
    "source": "ctx._source.quarter = 'Q1-2024'; ctx._source.consolidated_date = System.currentTimeMillis();"
  }
}
```

```python
response = client.reindex(
    body = {
        "source": {
            "index": [
                "logs-2024-01-*",
                "logs-2024-02-*",
                "logs-2024-03-*"
            ]
        },
        "dest": {
            "index": "logs-2024-q1"
        },
        "script": {
            "source": "ctx._source.quarter = 'Q1-2024'; ctx._source.consolidated_date = System.currentTimeMillis();"
        }
    }
)
```

This example consolidates three months of daily log indexes into a quarterly index while adding metadata about the consolidation.
Asynchronous operations
For large datasets, you can run reindex operations asynchronously to avoid blocking your application. When you set wait_for_completion=false, OpenSearch immediately returns a task ID that you can use to monitor the operation’s progress:
```json
POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "large-source-index"
  },
  "dest": {
    "index": "destination-index"
  }
}
```

```python
response = client.reindex(
    params = { "wait_for_completion": "false" },
    body = {
        "source": {
            "index": "large-source-index"
        },
        "dest": {
            "index": "destination-index"
        }
    }
)
```

The response includes a task ID:
```json
{
  "task": "oTUltX4IQMOUUVeiohTt8A:12345"
}
```
Use the Tasks API to check the status of your reindex operation:
```json
GET /_tasks/oTUltX4IQMOUUVeiohTt8A:12345
```
You can manage long-running reindex tasks using these operations:
- Cancel a running reindex: `POST /_tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel`
- List all reindex tasks: `GET /_tasks?actions=*reindex*`
- Task cleanup: OpenSearch automatically removes completed task documents, but you can manually delete them if needed for immediate cleanup.
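The status object returned by the Tasks API for a reindex task carries the same counters as a synchronous reindex response, so progress can be estimated from it. The following is a minimal sketch; the snapshot values below are illustrative, not taken from a real task.

```python
def reindex_progress(status):
    """Return the fractional progress of a reindex task from its status
    counters: documents with a final outcome (created + updated + deleted)
    out of the total the task plans to process."""
    done = status["created"] + status["updated"] + status["deleted"]
    return done / status["total"] if status["total"] else 1.0

# Illustrative snapshot of the "status" object nested in a task response:
status = {"total": 111396, "created": 55698, "updated": 0, "deleted": 0}
print(f"{reindex_progress(status):.0%}")  # → 50%
```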
Cross-cluster reindexing
Copy data from a remote OpenSearch cluster:
```json
POST /_reindex
{
  "source": {
    "remote": {
      "host": "https://remote-cluster.example.com:9200",
      "username": "reindex-user",
      "password": "secure-password"
    },
    "index": "remote-index",
    "size": 1000
  },
  "dest": {
    "index": "local-copy"
  }
}
```

```python
response = client.reindex(
    body = {
        "source": {
            "remote": {
                "host": "https://remote-cluster.example.com:9200",
                "username": "reindex-user",
                "password": "secure-password"
            },
            "index": "remote-index",
            "size": 1000
        },
        "dest": {
            "index": "local-copy"
        }
    }
)
```

SSL configuration for remote reindexing
When reindexing from remote clusters over HTTPS, configure SSL settings in opensearch.yml.
Certificate-based authentication
Configure SSL using individual certificate files:
```yml
reindex.ssl.certificate_authorities: ["/path/to/ca-cert.pem"]
reindex.ssl.certificate: "/path/to/client-cert.pem"
reindex.ssl.key: "/path/to/client-key.pem"
reindex.ssl.verification_mode: full
```
Keystore-based authentication
Configure SSL using keystore and truststore files:
```yml
reindex.ssl.keystore.path: "/path/to/keystore.p12"
reindex.ssl.keystore.type: "PKCS12"
reindex.ssl.truststore.path: "/path/to/truststore.p12"
reindex.ssl.truststore.type: "PKCS12"
```
SSL configuration options
The following table lists the available SSL configuration parameters.
| Parameter | Description | Default |
|---|---|---|
reindex.ssl.verification_mode | The certificate verification level: full, certificate, or none | full |
reindex.ssl.certificate_authorities | A list of CA certificate file paths | None |
reindex.ssl.truststore.path | The path to the truststore file (JKS or PKCS12) | None |
reindex.ssl.keystore.path | The path to the keystore file for client authentication | None |
reindex.ssl.supported_protocols | The supported TLS protocol versions | TLSv1.3,TLSv1.2 |
SSL settings must be configured in opensearch.yml and require a cluster restart. They cannot be set in the reindex request body.
Remote cluster allow list
Configure allowed remote hosts in opensearch.yml:
```yml
reindex.remote.allowlist: [
  "remote-cluster.example.com:9200",
  "backup-cluster.example.com:9200",
  "10.0.1.*:9200"
]
```
The allow list supports:
- Explicit host:port combinations.
- Wildcard patterns for IP ranges.
- Multiple cluster endpoints.
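The entries behave like simple glob patterns matched against the host:port portion of the remote URL. The following sketch illustrates that matching style using Python's `fnmatch`; it is an illustration of the pattern semantics, not the code OpenSearch itself runs.

```python
from fnmatch import fnmatch

# The allow list from the example configuration above.
allowlist = [
    "remote-cluster.example.com:9200",
    "backup-cluster.example.com:9200",
    "10.0.1.*:9200",
]

def is_allowed(host_port):
    """Check a remote host:port string against the configured allow list."""
    return any(fnmatch(host_port, pattern) for pattern in allowlist)

print(is_allowed("10.0.1.17:9200"))          # True: matches the wildcard IP range
print(is_allowed("other.example.com:9200"))  # False: not in the allow list
```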
Performance optimization
Use the following techniques to optimize reindexing performance.
Throttling and rate control
Control the reindex operation’s impact on cluster performance using throttling:
```json
POST /_reindex?requests_per_second=500
{
  "source": {
    "index": "production-data"
  },
  "dest": {
    "index": "production-backup"
  }
}
```

```python
response = client.reindex(
    params = { "requests_per_second": "500" },
    body = {
        "source": {
            "index": "production-data"
        },
        "dest": {
            "index": "production-backup"
        }
    }
)
```

You can dynamically adjust throttling for running reindex operations:

```json
POST /_reindex/task_id/_rethrottle?requests_per_second=200
```
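Throttling works by padding the gap between scroll batches: a batch of n documents is given a time budget of n / requests_per_second seconds, and the operation sleeps for whatever part of that budget the batch write did not consume. The following sketch shows that arithmetic; the batch sizes and timings are illustrative.

```python
def throttle_sleep(batch_size, requests_per_second, batch_write_seconds):
    """Seconds to sleep after a batch so that the sub-request rate stays
    at or below requests_per_second. A non-positive rate (such as the
    default of -1) means no throttling, so no sleep."""
    if requests_per_second <= 0:
        return 0.0
    budget = batch_size / requests_per_second
    return max(0.0, budget - batch_write_seconds)

# A 1,000-document batch at requests_per_second=500 has a 2 s budget;
# if writing took 0.5 s, the operation sleeps for the remaining 1.5 s.
print(throttle_sleep(1000, 500, 0.5))  # → 1.5
```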
Using slicing for parallel processing
Slicing divides a reindex operation into multiple parallel tasks to improve performance on large datasets.
Automatic slicing
To let OpenSearch determine the optimal number of slices, set the slices query parameter to auto:
```json
POST /_reindex?slices=auto
{
  "source": {
    "index": "large-index"
  },
  "dest": {
    "index": "large-index-copy"
  }
}
```

```python
response = client.reindex(
    params = { "slices": "auto" },
    body = {
        "source": {
            "index": "large-index"
        },
        "dest": {
            "index": "large-index-copy"
        }
    }
)
```

Manual slicing
For more control over parallelization, you can manually configure slices by specifying the slice ID and total number of slices in the request body.
OpenSearch uses the max parameter to partition the dataset consistently across all slice requests. OpenSearch applies a hash function to each document using the max value to determine which slice the document belongs to. This ensures that:
- Documents are distributed evenly across all slices.
- Each document goes to exactly one slice (no duplicates or gaps).
- All parallel requests must use the same `max` value for consistency.
For example, with max: 4, you can run four separate requests in parallel:
- Request 1: `{"id": 0, "max": 4}` (processes slice 0)
- Request 2: `{"id": 1, "max": 4}` (processes slice 1)
- Request 3: `{"id": 2, "max": 4}` (processes slice 2)
- Request 4: `{"id": 3, "max": 4}` (processes slice 3)
The following request processes slice 0 out of 4 total slices:
```json
POST /_reindex
{
  "source": {
    "index": "large-index",
    "slice": {
      "id": 0,
      "max": 4
    }
  },
  "dest": {
    "index": "large-index-copy"
  }
}
```

```python
response = client.reindex(
    body = {
        "source": {
            "index": "large-index",
            "slice": {
                "id": 0,
                "max": 4
            }
        },
        "dest": {
            "index": "large-index-copy"
        }
    }
)
```

Run multiple requests with different slice IDs (0–3) for parallel processing.
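The partitioning described above can be pictured as hashing each document ID modulo `max`. The sketch below uses SHA-1 as a stand-in for OpenSearch's internal slice hash (an assumption made for illustration only) to build the four request bodies and confirm that every document lands in exactly one slice.

```python
from hashlib import sha1

MAX_SLICES = 4

def slice_of(doc_id, max_slices=MAX_SLICES):
    """Illustrative stand-in for the slice hash: map a document ID to
    one slice in the range [0, max_slices)."""
    return int(sha1(doc_id.encode()).hexdigest(), 16) % max_slices

# One reindex body per slice; all four share the same "max" value.
bodies = [
    {"source": {"index": "large-index", "slice": {"id": i, "max": MAX_SLICES}},
     "dest": {"index": "large-index-copy"}}
    for i in range(MAX_SLICES)
]

doc_ids = [f"doc-{n}" for n in range(1000)]
assignments = [slice_of(d) for d in doc_ids]
# Each document maps to exactly one slice, and all slices receive work.
print(len(assignments), sorted(set(assignments)))  # 1000 [0, 1, 2, 3]
```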
Monitoring reindex operations
Use the following methods to monitor the progress and performance of your reindex operations.
Monitor all active reindex operations in your cluster:
```json
GET /_tasks?actions=*reindex*&detailed=true
```
Check the progress of a specific reindex task using its task ID:
```json
GET /_tasks/oTUltX4IQMOUUVeiohTt8A:12345
```
Monitor cluster performance and disk usage during reindex operations:
```json
GET /_cluster/health
GET /_nodes/stats/indices/store
```