fix(qdrant): improve document ingestion using add_documents over from_documents#13616

Open
lucifertrj wants to merge 5 commits into
langflow-ai:mainfrom
lucifertrj:fix-qdrant

Conversation

@lucifertrj lucifertrj commented Jun 10, 2026

Overview

  • Switches from from_documents to add_documents so ingestion appends to an initialized Qdrant vector store instead of recreating the store during document insertion. Source: Langchain_qdrant
  • Updates the Qdrant component to initialize a client-backed vector store before ingestion. Previously, distance_func was defined but never used during initialization.
  • Creates missing Qdrant collections using the embedding dimension and the selected distance function.

[tests] Also updates the Qdrant unit tests to cover deterministic IDs, add_documents, and collection creation.
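
A minimal sketch of how deterministic IDs might be derived, so that re-ingesting the same document targets the same point rather than creating a duplicate. The review notes in this thread mention UUID5 and JSON serialization; the namespace, the `deterministic_id` helper name, and the choice of hashed fields are assumptions for illustration, not the component's actual code.

```python
import json
import uuid

# Hypothetical namespace; the component may use a different one.
NAMESPACE = uuid.NAMESPACE_URL


def deterministic_id(page_content: str, metadata: dict) -> str:
    """Return a stable UUID5 string for a document's content and metadata."""
    # sort_keys makes the serialized form independent of dict insertion order;
    # default=str is a stand-in for a custom serializer for non-JSON types.
    payload = json.dumps(
        {"page_content": page_content, "metadata": metadata},
        sort_keys=True,
        default=str,
    )
    return str(uuid.uuid5(NAMESPACE, payload))


# The same input always yields the same ID.
ids = [deterministic_id("hello", {"source": "a.txt"}) for _ in range(2)]
```

IDs built this way could be passed as the `ids` argument of `add_documents`, which is what makes repeated ingestion idempotent in this sketch.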

Tests passed:

uv run pytest src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py
Screenshot 2026-06-10 at 21 05 50

Summary by CodeRabbit

  • New Features

    • Qdrant vector store automatically creates missing collections using embedding vector dimensions and configured distance metrics.
  • Bug Fixes

    • Improved connection settings handling with fallback defaults for ports.
    • Search queries now return empty results consistently when no matches are found.
    • Distance metric configuration properly maps to Qdrant's supported distance functions.
  • Tests

    • Added test coverage for automatic collection creation during initialization.

coderabbitai Bot commented Jun 10, 2026

Contributor

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8e69f831-4c77-4c08-b331-309a4d1d17b2

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Qdrant component now explicitly creates missing collections using embedding vector dimensions and distance mapping, and refactors document ingestion to use add_documents with explicit IDs. Test helpers updated to capture IDs from mocked add_documents, and new test verifies collection creation logic.

Changes

Qdrant Collection and Document Handling

Layer / File(s) | Summary
Distance mapping and Qdrant type imports (src/lfx/src/lfx/components/qdrant/qdrant.py) | DISTANCE_MAP constant maps component distance strings (Cosine/Euclidean/Dot Product) to Qdrant Distance enum values. Imports updated to include QdrantClient, Distance, and VectorParams types needed for collection configuration.
Qdrant client initialization and collection management (src/lfx/src/lfx/components/qdrant/qdrant.py) | build_vector_store constructs server_kwargs with conditional/default port handling (casting to integers when set, using 6333/6334 defaults when unset). Explicitly instantiates QdrantClient, checks collection existence, and creates missing collections with VectorParams using embedding vector length and DISTANCE_MAP-selected distance metric.
Document ingestion and search result handling (src/lfx/src/lfx/components/qdrant/qdrant.py) | Document insertion changed from QdrantVectorStore.from_documents(...) to instantiating QdrantVectorStore with the client and calling qdrant.add_documents(documents=..., ids=...). search_documents ensures it returns [] when search yields no results.
Test helper refactoring and collection creation verification (src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py) | Test imports updated; _captured_ids helper refactored to mock QdrantVectorStore.add_documents for ID capture instead of mocking from_documents. New test test_qdrant_creates_missing_collection_from_embedding_dimension() verifies that missing collections are created with embedding-derived vector size and Distance.DOT for "Dot Product" distance selection.
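
The distance mapping described in the table above can be sketched as follows. The key strings and the cosine fallback come from this PR; the `resolve_distance` helper name is hypothetical, and the stand-in enum exists only so the sketch runs without qdrant-client installed.

```python
from enum import Enum

try:
    # Real enum when qdrant-client is installed.
    from qdrant_client.models import Distance
except ImportError:
    # Stand-in with the same member names, so the sketch runs without the SDK.
    class Distance(str, Enum):
        COSINE = "Cosine"
        EUCLID = "Euclid"
        DOT = "Dot"


# Keys are the component's display strings as listed in this PR.
DISTANCE_MAP = {
    "Cosine": Distance.COSINE,
    "Euclidean": Distance.EUCLID,
    "Dot Product": Distance.DOT,
}


def resolve_distance(distance_func: str) -> Distance:
    """Map a UI string to a Distance member, defaulting to cosine."""
    return DISTANCE_MAP.get(distance_func, Distance.COSINE)
```

Falling back to cosine for unknown strings mirrors the `DISTANCE_MAP.get(self.distance_func, Distance.COSINE)` call quoted later in this review.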

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • langflow-ai/langflow#13329: Modifies Qdrant build_vector_store and QdrantVectorStore construction in src/lfx/src/lfx/components/qdrant/qdrant.py, directly related to this PR's refactoring of collection creation and client handling.

Suggested labels

bug

Suggested reviewers

  • erichare
  • ogabrielluiz
🚥 Pre-merge checks | ✅ 7 | ❌ 2

❌ Failed checks (2 warnings)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.57%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Quality And Coverage | ⚠️ Warning | Tests cover ID generation and collection creation but miss search_documents(), cover only 1 of 3 distance functions, and lack error-case tests. | Add tests for search_documents(), all distance functions, embedding validation, and edge cases.
✅ Passed checks (7 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Test Coverage For New Implementations | ✅ Passed | PR includes 5 substantive unit tests covering add_documents usage, UUID5 generation, distance mapping, and collection creation with proper naming conventions and meaningful assertions.
Test File Naming And Structure | ✅ Passed | Test follows pytest patterns with correct test_*.py naming, 5 descriptive test functions using proper mocking, setup helpers, and comprehensive coverage of positive/negative scenarios and edge cases.
Excessive Mock Usage Warning | ✅ Passed | Mocking is appropriate and focused on external dependencies (QdrantClient, QdrantVectorStore, Embeddings); core logic (UUID5, JSON serialization, custom_serializer) executes as real code, not mocked.
Title check | ✅ Passed | The title directly and accurately summarizes the main change: switching from from_documents to add_documents for Qdrant document ingestion.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@lucifertrj lucifertrj changed the title from "[fix-qdrant] add_documents is more preferred approach for Langchain_Qdrant" to "fix(qdrant): improve document ingestion using add_documents over from_documents" Jun 10, 2026
@github-actions github-actions Bot added the bug Something isn't working label Jun 10, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py (1)

1-119: ⚠️ Potential issue | 🔴 Critical

Qdrant component unit tests should use the ComponentTestBase harness

This test file uses standalone test_* functions and does not define the required ComponentTestBaseWithClient/ComponentTestBaseWithoutClient structure or the component_class, default_kwargs, and file_names_mapping fixtures. Other vector store component tests in src/backend/tests/unit/components/vectorstores/ follow the ComponentTestBase* base-class + fixture pattern, so this one is inconsistent with the established testing contract for component unit tests.

Source: Coding guidelines

🧹 Nitpick comments (1)
src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py (1)

32-52: 🏗️ Heavy lift

Consider reducing mocking in favor of real Qdrant integration.

The helper now mocks both QdrantClient and QdrantVectorStore, which obscures whether the component correctly interacts with the real Qdrant SDK. As per coding guidelines, backend unit tests should avoid mocking when possible and prefer real integrations for more reliable tests.

Qdrant supports in-memory mode (:memory: path), which would allow testing the actual SDK integration without external infrastructure.

Source: Coding guidelines
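
As a rough illustration of the in-memory suggestion above, the sketch below creates a collection against a local `:memory:` client and reads back its vector size. The `create_and_inspect` helper name and the `demo` collection are hypothetical; the import guard is only there so the snippet does nothing, rather than failing, when qdrant-client is not installed.

```python
try:
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, VectorParams
except ImportError:  # SDK not installed; the sketch then does nothing
    QdrantClient = None


def create_and_inspect(collection_name: str, vector_size: int):
    """Create a collection in an in-memory Qdrant and return its vector size."""
    if QdrantClient is None:
        return None
    client = QdrantClient(":memory:")  # local mode, no server needed
    if not client.collection_exists(collection_name):
        client.create_collection(
            collection_name=collection_name,
            vectors_config=VectorParams(size=vector_size, distance=Distance.DOT),
        )
    info = client.get_collection(collection_name)
    return info.config.params.vectors.size


size = create_and_inspect("demo", 4)
```

A test built this way exercises the real SDK calls instead of asserting on mock call arguments, at the cost of depending on the installed qdrant-client version.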

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py`:
- Around line 16-17: The import block in the test that contains "from
qdrant_client.models import Distance" is unsorted (Ruff I001); reorder the
imports to follow the project's import sorting rules (alphabetical/grouping) and
then run the formatter to apply fixes — e.g., run the provided make
format_backend or run the ruff autofix command (uv run --only-dev ruff check
--fix) on the test file to correct the import ordering.

In `@src/lfx/src/lfx/components/qdrant/qdrant.py`:
- Around line 71-81: server_kwargs is built with both URL/path and
host/port/grpc_port fields, causing conflicting connection options when
instantiating QdrantClient; change the logic in build_vector_store (where
server_kwargs is created) to produce mutually exclusive branches: if self.url is
set build server_kwargs with only {"url": self.url, "api_key": ..., "prefix":
..., "timeout": ...}, else if self.path is set build server_kwargs with only
{"path": self.path, ...}, else build server_kwargs with {"host": self.host,
"port": int(self.port) if self.port else 6333, "grpc_port": int(self.grpc_port)
if self.grpc_port else 6334, ...}; keep the filter that removes None values and
then pass that server_kwargs to QdrantClient so only one connection mode (url OR
path OR host/port(+grpc_port)) is provided.
- Around line 99-105: The check-then-create for the Qdrant collection (using
client.collection_exists and client.create_collection) is racy under concurrent
runs; wrap the create_collection call (or use client.create_collection(...,
if_not_exists=True) if your qdrant-client supports it) to handle the “already
exists” (HTTP 409) case gracefully: compute vector_size via
self.embedding.embed_query("test") and distance via
DISTANCE_MAP.get(self.distance_func, Distance.COSINE) as before, but catch the
client error/exception thrown by create_collection and ignore or log only the
409 conflict while re-raising other errors so concurrent ingests don’t fail.

---

Outside diff comments:
In
`@src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py`:
- Around line 1-119: The tests must be refactored to use the shared test harness
instead of standalone functions: convert this module to a test class inheriting
ComponentTestBaseWithClient or ComponentTestBaseWithoutClient (depending on
whether you need a mocked QdrantClient), set the module-level fixtures
component_class to QdrantVectorStoreComponent, default_kwargs to the equivalent
of the current _make_component args (collection_name, embedding, ingest_data)
and file_names_mapping as used by other vectorstore tests, then move the
existing test logic into methods that use the base-class helpers (e.g., use the
base's client mocking rather than manual patching of
QdrantClient/QdrantVectorStore and use the base's build_vector_store call);
ensure helper functions like _captured_ids are reworked to use the base fixtures
and the component_class/default_kwargs symbols so tests conform to the
ComponentTestBase contract.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f9d1ba39-c721-433b-8d5f-b911b5aa2451

📥 Commits

Reviewing files that changed from the base of the PR and between 3a8f48b and 5c9ded2.

📒 Files selected for processing (2)
  • src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py
  • src/lfx/src/lfx/components/qdrant/qdrant.py

Comment on lines +16 to +17
from qdrant_client.models import Distance

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix the import sorting to pass CI.

The pipeline reports an I001 violation: the import block is unsorted. This will fail the Ruff style check.

Run the following command to fix the import ordering:

make format_backend

Alternatively, run:

uv run --only-dev ruff check --fix src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py
Source: Pipeline failures

Comment on lines 71 to 81
         server_kwargs = {
             "host": self.host or None,
-            "port": int(self.port),  # Ensure port is an integer
-            "grpc_port": int(self.grpc_port),  # Ensure grpc_port is an integer
-            "api_key": self.api_key,
-            "prefix": self.prefix,
-            # Ensure timeout is an integer
+            "port": int(self.port) if self.port else 6333,
+            "grpc_port": int(self.grpc_port) if self.grpc_port else 6334,
+            "api_key": self.api_key or None,
+            "prefix": self.prefix or None,
             "timeout": int(self.timeout) if self.timeout else None,
             "path": self.path or None,
             "url": self.url or None,
         }

         server_kwargs = {k: v for k, v in server_kwargs.items() if v is not None}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/lfx/src/lfx/components/qdrant/qdrant.py"

# Show the relevant section with line numbers
nl -ba "$FILE" | sed -n '1,220p'

# Search for QdrantClient init and for url/path/host usage in this repo
rg -n "QdrantClient\(" -S .
rg -n "\b(self\.)?(url|path|host|port|grpc_port|api_key|prefix|timeout)\b" "$FILE"
rg -n "url\s*[:=]" -S src/lfx/src/lfx/components/qdrant
rg -n "path\s*[:=]" -S src/lfx/src/lfx/components/qdrant

Repository: langflow-ai/langflow

Length of output: 107


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/lfx/src/lfx/components/qdrant/qdrant.py"
[ -f "$FILE" ] || { echo "Missing file: $FILE"; exit 1; }

# Print first 220 lines with line numbers (no nl dependency)
awk '{printf "%6d\t%s\n", NR, $0}' "$FILE" | sed -n '1,220p'

# Search for QdrantClient init and key kwargs usage
rg -n "QdrantClient\(" -S .
rg -n "\b(self\.)?(url|path|host|port|grpc_port|api_key|prefix|timeout)\b" "$FILE" || true

# Narrow searches for literal assignments
rg -n "url\s*[:=]" -S src/lfx/src/lfx/components/qdrant || true
rg -n "path\s*[:=]" -S src/lfx/src/lfx/components/qdrant || true

Repository: langflow-ai/langflow

Length of output: 12682


🌐 Web query:

QdrantClient constructor parameters url vs host port precedence

💡 Result:

In the Qdrant Python client, the url and host parameters are mutually exclusive alternatives rather than competing for precedence [1][2][3]. You should choose one or the other based on your preferred configuration style:

  1. URL parameter: Use this for a complete connection string, such as "http://localhost:6333" or "https://your-cloud-instance.com" [1][3]. If you include the port in the URL string, the client will parse it directly [4].
  2. Host/Port parameters: Use these for separate definition of the address and port, such as host="localhost" and port=6333 [1][3].

The client is designed to handle these as separate configuration methods [2]. Providing both simultaneously is generally discouraged or may lead to configuration errors depending on the specific implementation version [4]. For clarity and to avoid ambiguity, it is recommended to use the url parameter for full connection strings or host/port for explicit address configuration [1][2].

Citations:


🌐 Web query:

QdrantClient constructor parameters path vs host port grpc_port behavior

💡 Result:

In the Python client, QdrantClient uses separate endpoints for REST (HTTP) and gRPC:

1) path vs host/port

  • “path” is used for local persistence (SQLite) when you instantiate the client with a filesystem path or the special value “:memory:”. For example, QdrantClient(":memory:") runs in local mode and QdrantClient(path="path/to/db") persists changes to disk. [1]
  • When connecting to a remote Qdrant server, the constructor expects host/port (or url). Example: QdrantClient(host="localhost", port=6333) and QdrantClient(url="http://localhost:6333"). [2][1]
  • The tests confirm URL takes precedence over the port parameter (and the port used for REST defaults for backward compatibility). [3]

So for remote connections, “path” should not be confused with a URL path prefix for API routing; use the url/host/port/prefix options instead. (A “prefix” option exists and affects the REST URI path.) [3]

2) grpc_port and its behavior relative to port

  • The client documents that HTTP (REST) defaults to port 6333, while gRPC defaults to port 6334, and that gRPC can be enabled via prefer_grpc=True. [4]
  • The Python client docs show an example gRPC initialization: QdrantClient(host="localhost", grpc_port=6334, prefer_grpc=True). [4][2]
  • The gRPC port is a separate configuration parameter from the REST port; the README and docs show both can be specified independently (REST via port, gRPC via grpc_port). [4][2]

3) Which requests use which port/interface?

  • When prefer_grpc=True and gRPC is available, the client prefers the gRPC interface for operations that have gRPC support; otherwise it falls back to REST for operations that are not implemented in gRPC. This is explicitly discussed in the client PR that mentions you “can’t assign the same port for two interfaces” because both interfaces are valid and used separately in some cases. [5]

Practical guidance

  • Remote Qdrant (HTTP REST): set host and port (default 6333). [4][2][1]
  • Remote Qdrant (gRPC): set grpc_port (default 6334) and enable prefer_grpc=True. [4][2]
  • Local mode: use ":memory:" or path="..." instead of host/port/grpc_port. [1]
  • If you use url="http://host:port", it takes precedence over the port argument. [3]

Citations:


🌐 Web query:

qdrant_client create_collection behavior if collection already exists exist_ok

💡 Result:

In the Qdrant Python client, the behavior when creating a collection that already exists is controlled by the if_not_exists parameter in the create_collection() method [1][2]. If you call create_collection() without the if_not_exists parameter (or with it set to False, which is the default), and the collection already exists, the client will raise a ValueError [1][2]. If you wish to avoid this error and have the method return successfully if the collection already exists, you should set if_not_exists=True [1][2]. When this flag is enabled, the client performs a check for the collection's existence; if it is found, the method returns True immediately without attempting to recreate it [1][2]. This parameter was added to both QdrantClient and AsyncQdrantClient to provide a native, idempotent way to handle collection initialization [1][2].

Citations:


Split server_kwargs by connection mode (avoid mixing url/path with host/port/grpc_port).

server_kwargs always includes host/port/grpc_port because host has a default and port values are synthesized, even when self.url or self.path are set—so QdrantClient(**server_kwargs) receives competing connection options (url + host/port, and/or path + host/port). Build mutually exclusive kwargs branches: url only, else path only, else host+port(+grpc_port).

Suggested direction
-        server_kwargs = {
-            "host": self.host or None,
-            "port": int(self.port) if self.port else 6333,
-            "grpc_port": int(self.grpc_port) if self.grpc_port else 6334,
-            "api_key": self.api_key or None,
-            "prefix": self.prefix or None,
-            "timeout": int(self.timeout) if self.timeout else None,
-            "path": self.path or None,
-            "url": self.url or None,
-        }
+        common_kwargs = {
+            "api_key": self.api_key or None,
+            "prefix": self.prefix or None,
+            "timeout": int(self.timeout) if self.timeout else None,
+        }
+        if self.url:
+            server_kwargs = {
+                **common_kwargs,
+                "url": self.url,
+            }
+        elif self.path:
+            server_kwargs = {
+                **common_kwargs,
+                "path": self.path,
+            }
+        else:
+            server_kwargs = {
+                **common_kwargs,
+                "host": self.host or "localhost",
+                "port": int(self.port) if self.port else 6333,
+                "grpc_port": int(self.grpc_port) if self.grpc_port else 6334,
+            }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        server_kwargs = {
-            "host": self.host or None,
-            "port": int(self.port) if self.port else 6333,
-            "grpc_port": int(self.grpc_port) if self.grpc_port else 6334,
-            "api_key": self.api_key or None,
-            "prefix": self.prefix or None,
-            "timeout": int(self.timeout) if self.timeout else None,
-            "path": self.path or None,
-            "url": self.url or None,
-        }
+        common_kwargs = {
+            "api_key": self.api_key or None,
+            "prefix": self.prefix or None,
+            "timeout": int(self.timeout) if self.timeout else None,
+        }
+        if self.url:
+            server_kwargs = {
+                **common_kwargs,
+                "url": self.url,
+            }
+        elif self.path:
+            server_kwargs = {
+                **common_kwargs,
+                "path": self.path,
+            }
+        else:
+            server_kwargs = {
+                **common_kwargs,
+                "host": self.host or "localhost",
+                "port": int(self.port) if self.port else 6333,
+                "grpc_port": int(self.grpc_port) if self.grpc_port else 6334,
+            }
         server_kwargs = {k: v for k, v in server_kwargs.items() if v is not None}
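
The branching in this suggestion can also be written as a standalone function, which makes the "exactly one connection mode" rule easy to unit test without a client. This is a sketch under that assumption: `build_server_kwargs` is a hypothetical helper, not part of the component, and it mirrors the suggested precedence (url, then path, then host/port).

```python
from typing import Any, Optional


def build_server_kwargs(
    url: Optional[str] = None,
    path: Optional[str] = None,
    host: Optional[str] = None,
    port: Any = None,
    grpc_port: Any = None,
    api_key: Optional[str] = None,
    prefix: Optional[str] = None,
    timeout: Any = None,
) -> dict:
    """Return kwargs for exactly one connection mode: url, path, or host/port."""
    # Options shared by every mode; empty values become None and are dropped.
    common = {
        "api_key": api_key or None,
        "prefix": prefix or None,
        "timeout": int(timeout) if timeout else None,
    }
    if url:
        kwargs = {**common, "url": url}
    elif path:
        kwargs = {**common, "path": path}
    else:
        kwargs = {
            **common,
            "host": host or "localhost",
            "port": int(port) if port else 6333,
            "grpc_port": int(grpc_port) if grpc_port else 6334,
        }
    return {k: v for k, v in kwargs.items() if v is not None}
```

For example, `build_server_kwargs(url="http://localhost:6333", host="localhost")` keeps only the `url` key, so host and port never reach the client alongside it.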
Comment on lines +99 to +105
if not client.collection_exists(self.collection_name):
    vector_size = len(self.embedding.embed_query("test"))
    distance = DISTANCE_MAP.get(self.distance_func, Distance.COSINE)
    client.create_collection(
        collection_name=self.collection_name,
        vectors_config=VectorParams(size=vector_size, distance=distance),
    )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the file and the surrounding code for collection creation logic
FILE="src/lfx/src/lfx/components/qdrant/qdrant.py"
if [ ! -f "$FILE" ]; then
  echo "Missing file: $FILE"
  exit 1
fi

echo "---- $FILE (lines 60-140) ----"
nl -ba "$FILE" | sed -n '60,140p'

echo "---- Search for create_collection usage in the repo ----"
rg -n --hidden --no-ignore -S "create_collection\(" . || true

echo "---- Search for collection_exists usage in the repo ----"
rg -n --hidden --no-ignore -S "collection_exists\(" . || true

Repository: langflow-ai/langflow

Length of output: 195


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/lfx/src/lfx/components/qdrant/qdrant.py"
if [ ! -f "$FILE" ]; then
  echo "Missing file: $FILE"
  exit 1
fi

echo "---- $FILE (lines 60-140) ----"
cat -n "$FILE" | sed -n '60,140p'

echo "---- Search for create_collection usage in the repo ----"
rg -n --hidden --no-ignore -S "create_collection\(" . || true

echo "---- Search for collection_exists usage in the repo ----"
rg -n --hidden --no-ignore -S "collection_exists\(" . || true

echo "---- Search for Qdrant client collection creation/update error handling patterns ----"
rg -n --hidden --no-ignore -S "(CollectionAlreadyExists|already exists|status.*409|create_collection|collection_exists)" src/lfx/src/lfx/components/qdrant/qdrant.py . || true

Repository: langflow-ai/langflow

Length of output: 50377


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and inspect check_cached_vector_store implementation
rg -n "def check_cached_vector_store|check_cached_vector_store" src/lfx/src/lfx/base/vectorstores -S || true
rg -n "check_cached_vector_store" src/lfx/src/lfx -S || true

echo "---- Read check_cached_vector_store definition ----"
FILE_MATCH=$(rg -n "def check_cached_vector_store" -S src/lfx/src/lfx | head -n 1 | cut -d: -f1)
if [ -n "${FILE_MATCH:-}" ] && [ -f "$FILE_MATCH" ]; then
  echo "Found: $FILE_MATCH"
  wc -l "$FILE_MATCH"
  sed -n '1,200p' "$FILE_MATCH"
else
  echo "Could not find check_cached_vector_store definition"
fi

echo "---- Inspect qdrant.py imports and surrounding class definition ----"
FILE="src/lfx/src/lfx/components/qdrant/qdrant.py"
wc -l "$FILE"
sed -n '1,140p' "$FILE"

Repository: langflow-ai/langflow

Length of output: 50377


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/lfx/src/lfx/base/vectorstores/model.py"
if [ ! -f "$FILE" ]; then
  echo "Missing $FILE"
  exit 1
fi

wc -l "$FILE"
rg -n "def check_cached_vector_store|check_cached_vector_store" "$FILE"
# Show around the decorator definition
LINE=$(rg -n "def check_cached_vector_store" "$FILE" | head -n 1 | cut -d: -f1)
START=$((LINE-30))
END=$((LINE+120))
sed -n "${START},${END}p" "$FILE"

Repository: langflow-ai/langflow

Length of output: 2099


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/lfx/src/lfx/base/vectorstores/model.py"
LINE=$(rg -n "def check_cached_vector_store" "$FILE" | head -n 1 | cut -d: -f1)
START=$((LINE-20))
if [ "$START" -lt 1 ]; then START=1; fi
END=$((LINE+160))

echo "check_cached_vector_store at line $LINE; printing $START-$END from $FILE"
wc -l "$FILE"
sed -n "${START},${END}p" "$FILE"

Repository: langflow-ai/langflow

Length of output: 6895


🌐 Web query:

QdrantClient.create_collection when collection already exists throws which exception HTTP status 409 conflict Collection already exists

💡 Result:

When you call QdrantClient.create_collection for a collection name that already exists, it results in an HTTP 409 Conflict (“Collection ... already exists!”) error; e.g., the client surfaces “Unexpected Response: 409 (Conflict)” with the server message “Wrong input: Collection <name> already exists!” [1]. Note: newer qdrant-client versions introduced an optional if_not_exists parameter to avoid this by returning immediately when the collection already exists [2].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/lfx/src/lfx/components/qdrant/qdrant.py"
wc -l "$FILE"
sed -n '1,80p' "$FILE"

Repository: langflow-ai/langflow

Length of output: 3188


Make collection creation idempotent under concurrent runs.

@check_cached_vector_store only caches within a single component invocation, so it doesn’t prevent a cross-process race. With the current check-then-create, parallel runs can both observe collection_exists(...) == False; the loser then calls create_collection(...) for an existing collection and hits the Qdrant “already exists” (HTTP 409 Conflict) error.

if not client.collection_exists(self.collection_name):
    vector_size = len(self.embedding.embed_query("test"))
    distance = DISTANCE_MAP.get(self.distance_func, Distance.COSINE)
    client.create_collection(
        collection_name=self.collection_name,
        vectors_config=VectorParams(size=vector_size, distance=distance),
    )

Handle the “already exists”/409 case around create_collection (or use if_not_exists if your qdrant-client version supports it) so concurrent ingests remain stable.
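
One way to tolerate the race, sketched without relying on a particular exception class or status code: attempt the create and, if it fails, re-check whether the collection now exists. `ensure_collection` and `FakeClient` are hypothetical names for illustration; the fake only simulates a competing writer and is not the real SDK.

```python
from typing import Callable


def ensure_collection(client, name: str, create: Callable[[], None]) -> bool:
    """Create the collection if missing; tolerate losing a creation race.

    Returns True if this call created the collection, False otherwise.
    """
    if client.collection_exists(name):
        return False
    try:
        create()
    except Exception:
        # Another writer may have created it between the check and the call.
        # Re-check instead of matching a specific exception type or status code.
        if client.collection_exists(name):
            return False
        raise
    return True


class FakeClient:
    """In-memory stand-in that simulates a concurrent creator (not the real SDK)."""

    def __init__(self):
        self.collections = set()

    def collection_exists(self, name):
        return name in self.collections

    def create_racy(self, name):
        # Simulate the competing process winning, then the conflict error.
        self.collections.add(name)
        raise RuntimeError("409 Conflict: collection already exists")


client = FakeClient()
created = ensure_collection(client, "docs", lambda: client.create_racy("docs"))
```

With the real client, `create` would wrap the `client.create_collection(...)` call shown above; errors unrelated to the race still propagate because the collection is absent on the re-check.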

@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Jun 10, 2026
Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant