stevenlx96
Contributor

Description

Related to issue #2851

The database environment is Milvus + TuGraph + MySQL

The main problem is that when we delete a document in a KnowledgeGraph space, only the TuGraph data is deleted. The corresponding data remains in the Milvus _CHUNK_HISTORY collection and drags down database performance. This update only covers deleting the matching data in the Milvus environment when the user presses delete in the WebUI; the equivalent change for the Chroma environment is expected in a future update.

This update contains two parts:

  1. Pass the unique doc id from the asynchronous sync process into the metadata of the Milvus _CHUNK_HISTORY collection.
  2. Add a delete_by_file_id function that is called when the user deletes a document in the knowledge space.

For part 1, the doc id is added as a new optional parameter and passed all the way down to the graph extractor, which then writes it into the chunk metadata.
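To illustrate part 1, here is a minimal sketch of threading an optional id down to the metadata. It is not the code in this PR: `Chunk`, `build_chunk_history_metadata`, `extract`, and the `file_id` metadata key are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for a document chunk."""

    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def build_chunk_history_metadata(
    chunk: Chunk, file_id: Optional[str] = None
) -> Dict[str, Any]:
    """Copy the chunk metadata and attach the id only when one was given."""
    metadata = dict(chunk.metadata)
    if file_id is not None:
        # "file_id" is an assumed key name, used here for illustration.
        metadata["file_id"] = file_id
    return metadata


def extract(
    chunks: List[Chunk], file_id: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Hypothetical extractor entry point that forwards the optional id."""
    return [build_chunk_history_metadata(chunk, file_id) for chunk in chunks]
```

Keeping the parameter optional means existing callers that do not pass an id keep working and simply produce metadata without the extra key.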

For part 2, when we detect that the user is using Milvus as the vector store, the delete_document method also calls delete_by_file_id, which deletes the Milvus data with the matching metadata.
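The control flow of part 2 can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not the PR's implementation: `InMemoryChunkHistory` stands in for the Milvus _CHUNK_HISTORY collection, and the signature of `delete_document`, the `vector_store_type` argument, and the `file_id` key are all hypothetical.

```python
from typing import Any, Dict, List, Set


class InMemoryChunkHistory:
    """Hypothetical in-memory stand-in for the _CHUNK_HISTORY collection."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []

    def add(self, text: str, metadata: Dict[str, Any]) -> None:
        self.rows.append({"text": text, "metadata": metadata})

    def delete_by_file_id(self, file_id: str) -> int:
        """Drop every row whose metadata carries the given id; return the count."""
        before = len(self.rows)
        self.rows = [
            row for row in self.rows if row["metadata"].get("file_id") != file_id
        ]
        return before - len(self.rows)


def delete_document(
    doc_id: str,
    graph_docs: Set[str],
    vector_store_type: str,
    chunk_history: InMemoryChunkHistory,
) -> int:
    """Delete the graph data, then the chunk history only when on Milvus."""
    graph_docs.discard(doc_id)  # stands in for the TuGraph deletion
    if vector_store_type.lower() == "milvus":
        return chunk_history.delete_by_file_id(doc_id)
    return 0  # other vector stores are left untouched in this sketch
```

Gating on the store type mirrors the scope stated above: only the Milvus path is changed, and other environments behave as before.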

How Has This Been Tested?

Several documents have been uploaded and deleted with no error found.

Snapshots:

This is how the matching doc_id is stored in the Milvus collection:
[Screenshot: FileID]

When user deletes a document in knowledge space...
[Screenshot: dete]

the corresponding data is also deleted in the collection!
[Screenshot: after delete]

Checklist:

  • My code follows the style guidelines of this project
  • I have already rebased the commits and made the commit messages conform to the project standard.
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • Any dependent changes have been merged and published in downstream modules

Pass the unique doc id alias file id into the Milvus _CHUNK_HISTORY collection metadata.
Based on the file id in Milvus metadata, if the user deletes the document in KnowledgeGraph space, delete_by_file_id will be called to delete the corresponding _CHUNK_HISTORY data.
Reformat to pass the code style
reformat the graph_extractor
Update with correct code style
Update to match the code style
@stevenlx96 changed the title from "Bugfix/milvus chunk history doc delete" to "fix(core): Delete corresponding Milvus data when using with TuGraph" on Jul 23, 2025
@github-actions (bot) added the labels "core" (Module: core) and "fix" (Bug fixes) on Jul 23, 2025