Expose document_id and file_path in retriever output #286

Open
rachedkko wants to merge 2 commits into swiss-ai:master from rachedkko:feat/document-citations

@rachedkko

Contributor

Summary

The Milvus collection already stores document_id and file_path, but the retriever's default output_fields did not include them, so they were missing from retrieval results by default.

Changes

  • Retriever.retrieve / batch_retrieve default output_fields now include document_id and file_path
  • POST /v1/retrieve exposes them as documentId and filePath
  • GET /v1/chunks/{fileId}/{chunkId} does the same
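
The renaming described in the list above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `DEFAULT_OUTPUT_FIELDS`, `API_FIELD_NAMES`, and `to_api_chunk` are hypothetical names, and the `id` and `text` fields stand in for whatever the retriever already returned. Only `document_id`, `file_path`, `documentId`, and `filePath` come from the description.

```python
from typing import Any

# Assumed defaults: "id" and "text" are placeholders for the existing fields;
# document_id and file_path are the two fields this PR adds.
DEFAULT_OUTPUT_FIELDS = ["id", "text", "document_id", "file_path"]

# Stored (snake_case) field name -> API (camelCase) field name.
API_FIELD_NAMES = {"document_id": "documentId", "file_path": "filePath"}


def to_api_chunk(hit: dict[str, Any]) -> dict[str, Any]:
    """Rename stored fields to their API names; other keys pass through unchanged."""
    return {API_FIELD_NAMES.get(key, key): value for key, value in hit.items()}


# Example hit shaped like a retriever result (values are made up).
hit = {"id": "c1", "text": "...", "document_id": "doc-42", "file_path": "reports/a.pdf"}
chunk = to_api_chunk(hit)
```

Keeping the mapping in one dictionary would let both POST /v1/retrieve and GET /v1/chunks/{fileId}/{chunkId} share the same renaming, assuming both endpoints build their responses from the same kind of result dictionary.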

Motivation

Agentic pipelines (AMMORE) need to cite the source document a chunk came from.

@fabnemEPFL

Collaborator

Looks good to me, @rachedkko. The tests still need to be adapted, as does the API documentation at docs/source/developer_documentation/retriever_api_specs.yaml
