Self Checks
RAGFlow workspace code commit ID
0d836af
RAGFlow image version
0d836af
Other environment information
Actual behavior
Legacy `parse` and `stop_parsing` build the docstore index name from the requester's `tenant_id`, which is injected by `@add_tenant_id_to_kwargs`, not from the dataset owner's tenant (`kb.tenant_id`).
# chunk_api.py — parse & stop_parsing (current main)
index_name = search.index_name(tenant_id) # requester user id
if settings.docStoreConn.index_exist(index_name, dataset_id):
    settings.docStoreConn.delete({"doc_id": ...}, index_name, dataset_id)
Chunks for a dataset are indexed under the owner's tenant index (`ragflow_{kb.tenant_id}`), which every other handler in the same file already resolves via `_get_dataset_tenant_id()`:
# chunk_api.py — list_chunks, add_chunk, rm_chunk, etc.
dataset_tenant_id = _get_dataset_tenant_id(dataset_id)
search.index_name(dataset_tenant_id)
When team member B re-parses or stops parsing on a dataset owned by A:
1. `KnowledgebaseService.accessible` passes (team permission).
2. `index_exist(ragflow_B, dataset_id)` is false (real data lives in `ragflow_A`).
   - Delete is skipped (logged as "index does not exist").
   - Stale chunks remain in the docstore; re-parse can duplicate or serve outdated chunks.
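The mismatch can be reproduced without a running deployment. The sketch below is not RAGFlow code: `FakeDocStore`, `index_name`, and `delete_doc_chunks` are hypothetical stand-ins that only mirror the `ragflow_{tenant_id}` naming and the `index_exist` guard described above.

```python
# Toy in-memory stand-in for the docstore; names here are illustrative only.
class FakeDocStore:
    def __init__(self):
        # {(index_name, kb_id): {chunk_id: doc_id}}
        self.indexes = {}

    def insert(self, index_name, kb_id, chunk_id, doc_id):
        self.indexes.setdefault((index_name, kb_id), {})[chunk_id] = doc_id

    def index_exist(self, index_name, kb_id):
        return (index_name, kb_id) in self.indexes

    def delete(self, condition, index_name, kb_id):
        chunks = self.indexes[(index_name, kb_id)]
        stale = [cid for cid, did in chunks.items() if did == condition["doc_id"]]
        for cid in stale:
            del chunks[cid]
        return len(stale)


def index_name(tenant_id):
    # Assumed to follow the ragflow_{tenant_id} naming described in this issue.
    return f"ragflow_{tenant_id}"


def delete_doc_chunks(store, tenant_id, kb_id, doc_id):
    """Guarded delete; returns the number of chunks removed (0 if skipped)."""
    name = index_name(tenant_id)
    if not store.index_exist(name, kb_id):
        return 0  # the "index does not exist" branch
    return store.delete({"doc_id": doc_id}, name, kb_id)


store = FakeDocStore()
# Chunks were indexed under the owner's tenant (A) during the first parse.
store.insert(index_name("A"), "DS", "chunk-1", "DOC")
store.insert(index_name("A"), "DS", "chunk-2", "DOC")

# Team member B triggers the delete with their own tenant id: nothing is removed.
removed_as_requester = delete_doc_chunks(store, "B", "DS", "DOC")
remaining_after_requester = len(store.indexes[(index_name("A"), "DS")])

# Resolving the index from the owner removes both stale chunks.
removed_as_owner = delete_doc_chunks(store, "A", "DS", "DOC")
```

With the requester's tenant the guard short-circuits and both chunks survive; with the owner's tenant the same call clears them.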
Expected behavior
Resolve `index_name` from the dataset owner (via `_get_dataset_tenant_id(dataset_id)` or the owner tenant of `doc[0].kb_id`) and pass `doc[0].kb_id` as the kb shard id. This matches `list_chunks`, the guarded deletes in `document_api.py`, and the newer `POST /documents/parse` flow.
Steps to reproduce
1. Deploy RAGFlow `main` @ `0d836afd3`.
2. **User A** creates dataset `DS` with `permission = team` and uploads/parses document `DOC`.
3. Invite **User B** to the same tenant (team member, not owner of `DS`).
4. As **User B**, start parsing `DOC` again via legacy route:
curl -sS -X POST "http://<HOST>/api/v1/datasets/DS/chunks" \
-H "Authorization: Bearer KEY_B" \
-H "Content-Type: application/json" \
-d '{"document_ids": ["DOC"]}'
5. Observe logs: `Skipping chunk delete during parse for doc DOC: index ragflow_<B>/DS does not exist`
6. Query chunks (as either user) — old chunks from the prior parse are still present alongside new ones.
**Control:** Same operation as dataset owner **User A** deletes from `ragflow_<A>` and does not leave stale chunks.
Additional information
Suggested fix:
dataset_tenant_id = _get_dataset_tenant_id(dataset_id)
index_name = search.index_name(dataset_tenant_id)
if settings.docStoreConn.index_exist(index_name, doc[0].kb_id):
    settings.docStoreConn.delete({"doc_id": id}, index_name, doc[0].kb_id)
Add unit tests where `accessible` passes for a non-owner team member but `get_by_id` returns `kb.tenant_id = owner-tenant`, asserting that the delete uses `search.index_name(owner-tenant)`.
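Such a test could take roughly the following shape. This is a sketch only: it does not import RAGFlow, and `delete_stale_chunks` is a hypothetical extraction of the fixed handler logic, with `index_name` standing in for `search.index_name` and the owner lookup passed in as a callable. A real test would patch the actual handler and services instead.

```python
from unittest.mock import MagicMock


def index_name(tenant_id):
    # Stand-in for search.index_name; assumes the ragflow_{tenant_id} naming from this issue.
    return f"ragflow_{tenant_id}"


def delete_stale_chunks(doc_store, get_dataset_tenant_id, dataset_id, doc_id):
    # Hypothetical fixed logic: the index is resolved from the dataset owner,
    # never from the requester's tenant id.
    name = index_name(get_dataset_tenant_id(dataset_id))
    if doc_store.index_exist(name, dataset_id):
        doc_store.delete({"doc_id": doc_id}, name, dataset_id)


def test_delete_uses_owner_index():
    doc_store = MagicMock()
    doc_store.index_exist.return_value = True
    # Owner lookup returns the owner tenant even though a team member is calling.
    get_owner = MagicMock(return_value="owner-tenant")

    delete_stale_chunks(doc_store, get_owner, "DS", "DOC")

    get_owner.assert_called_once_with("DS")
    doc_store.index_exist.assert_called_once_with("ragflow_owner-tenant", "DS")
    doc_store.delete.assert_called_once_with(
        {"doc_id": "DOC"}, "ragflow_owner-tenant", "DS"
    )


test_delete_uses_owner_index()
```

Passing the owner lookup in as a parameter keeps the sketch free of RAGFlow imports; in the real code base the same assertions would be made against a patched `settings.docStoreConn`.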