[Feature Request]: Enhance orchestrating data transform nodes for data pipeline. #10427

@yingfeng

Description

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Is your feature request related to a problem?

Describe the feature you'd like

Currently, the data transform node in the data pipeline is very rigid: its downstream operator can only be indexing/embedding. We need it to be orchestratable within the pipeline, for example to interleave summaries with the original chunks.

For that purpose, we need to add components that manipulate variables, such as data operations, a variable aggregator/assigner, and conversation variable creation.
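
To make the request more concrete, here is a rough Python sketch of what a variable aggregator and a variable assigner could look like. The function names `aggregate_first` and `assign`, the plain-dict variable store, and the "first non-None value wins" rule are all assumptions for illustration; nothing here reflects an existing pipeline API.

```python
from typing import Any, Dict, Iterable, Optional


def aggregate_first(candidates: Iterable[Optional[Any]]) -> Optional[Any]:
    """Hypothetical variable aggregator: return the first non-None value
    produced by the upstream branches, or None if every branch was empty."""
    for value in candidates:
        if value is not None:
            return value
    return None


def assign(variables: Dict[str, Any], name: str, value: Any) -> Dict[str, Any]:
    """Hypothetical variable assigner: return a copy of the variable store
    with `name` set to `value`, leaving the original store untouched."""
    updated = dict(variables)
    updated[name] = value
    return updated


# Two upstream branches; only the second one produced a summary.
store = assign({}, "summary", aggregate_first([None, "Short summary"]))
```

Returning a new store instead of mutating in place is just one possible design choice; it keeps each node's output independent of what later nodes do.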

Example Scenario

According to https://arxiv.org/abs/2510.06999, a document-level summary is attached to each chunk (Summary-Augmented Chunking) to resolve the Document-Level Retrieval Mismatch issue in legal retrieval.
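
As a minimal sketch of this scenario in Python, assuming an upstream transform node has already produced the document summary: the function name `augment_chunks` and the blank-line separator are hypothetical choices, not taken from the paper or from any existing pipeline API.

```python
from typing import List


def augment_chunks(summary: str, chunks: List[str], sep: str = "\n\n") -> List[str]:
    """Prepend the document-level summary to every chunk.

    If the summary is empty or whitespace-only, the chunks are returned unchanged.
    """
    summary = summary.strip()
    if not summary:
        return list(chunks)
    return [f"{summary}{sep}{chunk}" for chunk in chunks]


doc_summary = "Lease agreement between a landlord and a tenant."
doc_chunks = [
    "Clause 1: The tenant pays rent monthly.",
    "Clause 2: The landlord maintains the property.",
]
augmented = augment_chunks(doc_summary, doc_chunks)
```

In the requested orchestration, the augmented chunks rather than the raw chunks would be passed on to the indexing/embedding step.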

Describe implementation you've considered

No response

Documentation, adoption, use case

Additional information

No response

Metadata

Assignees

Labels

💞 feature: Feature request, pull request that fulfills a new feature.
