fix(qdrant): improve document ingestion using add_documents over from_documents #13616
lucifertrj wants to merge 5 commits into
Conversation
…yntax to use unsused distance_func
Walkthrough
Qdrant component now explicitly creates missing collections using embedding vector dimensions and distance mapping, and refactors document ingestion to use add_documents.
Changes
Qdrant Collection and Document Handling
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 7 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (7 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py (1)
1-119: ⚠️ Potential issue | 🔴 Critical
Qdrant component unit tests should use the ComponentTestBase harness
This test file uses standalone test_* functions and does not define the required ComponentTestBaseWithClient/ComponentTestBaseWithoutClient structure or the component_class, default_kwargs, and file_names_mapping fixtures. Other vector store component tests in src/backend/tests/unit/components/vectorstores/ follow the ComponentTestBase* base-class + fixture pattern, so this one is inconsistent with the established testing contract for component unit tests.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py` around lines 1 - 119, The tests must be refactored to use the shared test harness instead of standalone functions: convert this module to a test class inheriting ComponentTestBaseWithClient or ComponentTestBaseWithoutClient (depending on whether you need a mocked QdrantClient), set the module-level fixtures component_class to QdrantVectorStoreComponent, default_kwargs to the equivalent of the current _make_component args (collection_name, embedding, ingest_data) and file_names_mapping as used by other vectorstore tests, then move the existing test logic into methods that use the base-class helpers (e.g., use the base's client mocking rather than manual patching of QdrantClient/QdrantVectorStore and use the base's build_vector_store call); ensure helper functions like _captured_ids are reworked to use the base fixtures and the component_class/default_kwargs symbols so tests conform to the ComponentTestBase contract.
Source: Coding guidelines
🧹 Nitpick comments (1)
src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py (1)
32-52: 🏗️ Heavy lift
Consider reducing mocking in favor of real Qdrant integration.
The helper now mocks both QdrantClient and QdrantVectorStore, which obscures whether the component correctly interacts with the real Qdrant SDK. As per coding guidelines, backend unit tests should avoid mocking when possible and prefer real integrations for more reliable tests.
Qdrant supports in-memory mode (:memory: path), which would allow testing the actual SDK integration without external infrastructure.
Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f9d1ba39-c721-433b-8d5f-b911b5aa2451
📒 Files selected for processing (2)
src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py
src/lfx/src/lfx/components/qdrant/qdrant.py
from qdrant_client.models import Distance
Fix the import sorting to pass CI.
The pipeline reports an I001 violation: the import block is unsorted. This will fail the Ruff style check.
Run the following command to fix the import ordering:
make format_backend
Alternatively, run:
uv run --only-dev ruff check --fix src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/backend/tests/unit/components/vectorstores/test_qdrant_vector_store_component.py`
around lines 16 - 17, The import block in the test that contains "from
qdrant_client.models import Distance" is unsorted (Ruff I001); reorder the
imports to follow the project's import sorting rules (alphabetical/grouping) and
then run the formatter to apply fixes — e.g., run the provided make
format_backend or run the ruff autofix command (uv run --only-dev ruff check
--fix) on the test file to correct the import ordering.
Source: Pipeline failures
         server_kwargs = {
             "host": self.host or None,
-            "port": int(self.port),  # Ensure port is an integer
-            "grpc_port": int(self.grpc_port),  # Ensure grpc_port is an integer
-            "api_key": self.api_key,
-            "prefix": self.prefix,
-            # Ensure timeout is an integer
+            "port": int(self.port) if self.port else 6333,
+            "grpc_port": int(self.grpc_port) if self.grpc_port else 6334,
+            "api_key": self.api_key or None,
+            "prefix": self.prefix or None,
             "timeout": int(self.timeout) if self.timeout else None,
             "path": self.path or None,
             "url": self.url or None,
         }

         server_kwargs = {k: v for k, v in server_kwargs.items() if v is not None}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/lfx/src/lfx/components/qdrant/qdrant.py"
# Show the relevant section with line numbers
nl -ba "$FILE" | sed -n '1,220p'
# Search for QdrantClient init and for url/path/host usage in this repo
rg -n "QdrantClient\(" -S .
rg -n "\b(self\.)?(url|path|host|port|grpc_port|api_key|prefix|timeout)\b" "$FILE"
rg -n "url\s*[:=]" -S src/lfx/src/lfx/components/qdrant
rg -n "path\s*[:=]" -S src/lfx/src/lfx/components/qdrant
Repository: langflow-ai/langflow
Length of output: 107
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/lfx/src/lfx/components/qdrant/qdrant.py"
[ -f "$FILE" ] || { echo "Missing file: $FILE"; exit 1; }
# Print first 220 lines with line numbers (no nl dependency)
awk '{printf "%6d\t%s\n", NR, $0}' "$FILE" | sed -n '1,220p'
# Search for QdrantClient init and key kwargs usage
rg -n "QdrantClient\(" -S .
rg -n "\b(self\.)?(url|path|host|port|grpc_port|api_key|prefix|timeout)\b" "$FILE" || true
# Narrow searches for literal assignments
rg -n "url\s*[:=]" -S src/lfx/src/lfx/components/qdrant || true
rg -n "path\s*[:=]" -S src/lfx/src/lfx/components/qdrant || true
Repository: langflow-ai/langflow
Length of output: 12682
🌐 Web query:
QdrantClient constructor parameters url vs host port precedence
💡 Result:
In the Qdrant Python client, the url and host parameters are mutually exclusive alternatives rather than competing for precedence [1][2][3]. You should choose one or the other based on your preferred configuration style:
1. URL parameter: Use this for a complete connection string, such as "http://localhost:6333" or "https://your-cloud-instance.com" [1][3]. If you include the port in the URL string, the client will parse it directly [4].
2. Host/Port parameters: Use these for separate definition of the address and port, such as host="localhost" and port=6333 [1][3].
The client is designed to handle these as separate configuration methods [2]. Providing both simultaneously is generally discouraged or may lead to configuration errors depending on the specific implementation version [4]. For clarity and to avoid ambiguity, it is recommended to use the url parameter for full connection strings or host/port for explicit address configuration [1][2].
Citations:
- 1: https://github.com/qdrant/qdrant-client/blob/master/README.md
- 2: https://qdrant-qdrant-18.mintlify.app/api/clients/python
- 3: https://pypi.org/project/qdrant-client/
- 4: https://github.com/qdrant/qdrant-js/blob/master/packages/js-client-rest/src/qdrant-client.ts
🌐 Web query:
QdrantClient constructor parameters path vs host port grpc_port behavior
💡 Result:
In the Python client, QdrantClient uses separate endpoints for REST (HTTP) and gRPC:
1) path vs host/port
- “path” is used for local persistence (SQLite) when you instantiate the client with a filesystem path or the special value “:memory:”. For example, QdrantClient(":memory:") runs in local mode and QdrantClient(path="path/to/db") persists changes to disk. [1]
- When connecting to a remote Qdrant server, the constructor expects host/port (or url). Example: QdrantClient(host="localhost", port=6333) and QdrantClient(url="http://localhost:6333"). [2][1]
- The tests confirm URL takes precedence over the port parameter (and the port used for REST defaults for backward compatibility). [3]
So for remote connections, “path” should not be confused with a URL path prefix for API routing; use the url/host/port/prefix options instead. (A “prefix” option exists and affects the REST URI path.) [3]
2) grpc_port and its behavior relative to port
- The client documents that HTTP (REST) defaults to port 6333, while gRPC defaults to port 6334, and that gRPC can be enabled via prefer_grpc=True. [4]
- The Python client docs show an example gRPC initialization: QdrantClient(host="localhost", grpc_port=6334, prefer_grpc=True). [4][2]
- The gRPC port is a separate configuration parameter from the REST port; the README and docs show both can be specified independently (REST via port, gRPC via grpc_port). [4][2]
3) Which requests use which port/interface?
- When prefer_grpc=True and gRPC is available, the client prefers the gRPC interface for operations that have gRPC support; otherwise it falls back to REST for operations that are not implemented in gRPC. This is explicitly discussed in the client PR that mentions you “can’t assign the same port for two interfaces” because both interfaces are valid and used separately in some cases. [5]
Practical guidance
- Remote Qdrant (HTTP REST): set host and port (default 6333). [4][2][1]
- Remote Qdrant (gRPC): set grpc_port (default 6334) and enable prefer_grpc=True. [4][2]
- Local mode: use ":memory:" or path="..." instead of host/port/grpc_port. [1]
- If you use url="http://host:port", it takes precedence over the port argument. [3]
Citations:
- 1: https://pypi.org/project/qdrant-client/
- 2: https://github.com/qdrant/qdrant-client/blob/master/README.md
- 3: https://github.com/qdrant/qdrant-client/blob/master/tests/test_qdrant_client.py
- 4: https://qdrant-qdrant-18.mintlify.app/api/clients/python
- 5: Fix grpc port by configure via url qdrant/qdrant-client#645
🌐 Web query:
qdrant_client create_collection behavior if collection already exists exist_ok
💡 Result:
In the Qdrant Python client, the behavior when creating a collection that already exists is controlled by the if_not_exists parameter in the create_collection() method [1][2]. If you call create_collection() without the if_not_exists parameter (or with it set to False, which is the default), and the collection already exists, the client will raise a ValueError [1][2]. If you wish to avoid this error and have the method return successfully if the collection already exists, you should set if_not_exists=True [1][2]. When this flag is enabled, the client performs a check for the collection's existence; if it is found, the method returns True immediately without attempting to recreate it [1][2]. This parameter was added to both QdrantClient and AsyncQdrantClient to provide a native, idempotent way to handle collection initialization [1][2].
Citations:
- 1: Add if_not_exists parameter to create_collection qdrant/qdrant-client#1149
- 2: Add if_not_exists parameter to create_collection qdrant/qdrant-client#1149
Split server_kwargs by connection mode (avoid mixing url/path with host/port/grpc_port).
server_kwargs always includes host/port/grpc_port because host has a default and port values are synthesized, even when self.url or self.path are set—so QdrantClient(**server_kwargs) receives competing connection options (url + host/port, and/or path + host/port). Build mutually exclusive kwargs branches: url only, else path only, else host+port(+grpc_port).
Suggested direction
-        server_kwargs = {
-            "host": self.host or None,
-            "port": int(self.port) if self.port else 6333,
-            "grpc_port": int(self.grpc_port) if self.grpc_port else 6334,
-            "api_key": self.api_key or None,
-            "prefix": self.prefix or None,
-            "timeout": int(self.timeout) if self.timeout else None,
-            "path": self.path or None,
-            "url": self.url or None,
-        }
+        common_kwargs = {
+            "api_key": self.api_key or None,
+            "prefix": self.prefix or None,
+            "timeout": int(self.timeout) if self.timeout else None,
+        }
+        if self.url:
+            server_kwargs = {
+                **common_kwargs,
+                "url": self.url,
+            }
+        elif self.path:
+            server_kwargs = {
+                **common_kwargs,
+                "path": self.path,
+            }
+        else:
+            server_kwargs = {
+                **common_kwargs,
+                "host": self.host or "localhost",
+                "port": int(self.port) if self.port else 6333,
+                "grpc_port": int(self.grpc_port) if self.grpc_port else 6334,
+            }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lfx/src/lfx/components/qdrant/qdrant.py` around lines 71 - 81,
server_kwargs is built with both URL/path and host/port/grpc_port fields,
causing conflicting connection options when instantiating QdrantClient; change
the logic in the constructor (where server_kwargs is created) to produce
mutually exclusive branches: if self.url is set build server_kwargs with only
{"url": self.url, "api_key": ..., "prefix": ..., "timeout": ...}, else if
self.path is set build server_kwargs with only {"path": self.path, ...}, else
build server_kwargs with {"host": self.host, "port": int(self.port) or 6333,
"grpc_port": int(self.grpc_port) or 6334, ...}; keep the filter that removes
None values and then pass that server_kwargs to QdrantClient so only one
connection mode (url OR path OR host/port(+grpc_port)) is provided.
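The mutually exclusive branching suggested in this comment can be tried out in isolation. The following is a minimal sketch, not the component's actual code: `build_server_kwargs` is a hypothetical standalone helper, and its keyword names simply mirror the component attributes quoted in the diff above. It only builds the dictionary and does not import or call qdrant-client.

```python
from typing import Any, Optional


def build_server_kwargs(
    *,
    url: Optional[str] = None,
    path: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    grpc_port: Optional[int] = None,
    api_key: Optional[str] = None,
    prefix: Optional[str] = None,
    timeout: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: return kwargs for exactly one connection mode."""
    # Options shared by every mode; empty values become None and are dropped below.
    common = {
        "api_key": api_key or None,
        "prefix": prefix or None,
        "timeout": int(timeout) if timeout else None,
    }
    if url:
        # URL mode: host/port/grpc_port are deliberately left out.
        mode: dict[str, Any] = {"url": url}
    elif path:
        # Local path mode.
        mode = {"path": path}
    else:
        # Host/port mode with the defaults used in the suggestion above.
        mode = {
            "host": host or "localhost",
            "port": int(port) if port else 6333,
            "grpc_port": int(grpc_port) if grpc_port else 6334,
        }
    merged = {**common, **mode}
    return {k: v for k, v in merged.items() if v is not None}
```

With this shape, passing both a URL and a host yields only the `url` key, so a client constructed from the result never receives two competing connection modes.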
        if not client.collection_exists(self.collection_name):
            vector_size = len(self.embedding.embed_query("test"))
            distance = DISTANCE_MAP.get(self.distance_func, Distance.COSINE)
            client.create_collection(
                collection_name=self.collection_name,
                vectors_config=VectorParams(size=vector_size, distance=distance),
            )
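The hunk above derives the vector size by embedding a probe string and maps the configured distance name to a Distance value, falling back to cosine. A small sketch of that derivation follows, under stated assumptions: the `Distance` enum here is a stand-in for `qdrant_client.models.Distance` so the example runs without the SDK, the `DISTANCE_MAP` keys are hypothetical labels (the component's real keys may differ), and `collection_params` is an illustrative helper name.

```python
from enum import Enum
from typing import Callable, Sequence, Tuple


class Distance(str, Enum):
    """Stand-in for qdrant_client.models.Distance (assumption, SDK not imported)."""

    COSINE = "Cosine"
    EUCLID = "Euclid"
    DOT = "Dot"


# Hypothetical labels; the component's actual DISTANCE_MAP keys may differ.
DISTANCE_MAP = {
    "Cosine Similarity": Distance.COSINE,
    "Euclidean Distance": Distance.EUCLID,
    "Dot Product": Distance.DOT,
}


def collection_params(
    embed_query: Callable[[str], Sequence[float]],
    distance_func: str,
) -> Tuple[int, Distance]:
    """Return (vector_size, distance) the way the hunk above computes them."""
    # One probe embedding is enough to learn the model's output dimension.
    vector_size = len(embed_query("test"))
    # Unknown or empty names fall back to cosine, as in the hunk.
    distance = DISTANCE_MAP.get(distance_func, Distance.COSINE)
    return vector_size, distance
```

Note that the probe costs one embedding call each time the collection is found missing, which is usually acceptable because it only runs on first creation.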
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the file and the surrounding code for collection creation logic
FILE="src/lfx/src/lfx/components/qdrant/qdrant.py"
if [ ! -f "$FILE" ]; then
echo "Missing file: $FILE"
exit 1
fi
echo "---- $FILE (lines 60-140) ----"
nl -ba "$FILE" | sed -n '60,140p'
echo "---- Search for create_collection usage in the repo ----"
rg -n --hidden --no-ignore -S "create_collection\(" . || true
echo "---- Search for collection_exists usage in the repo ----"
rg -n --hidden --no-ignore -S "collection_exists\(" . || true
Repository: langflow-ai/langflow
Length of output: 195
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/lfx/src/lfx/components/qdrant/qdrant.py"
if [ ! -f "$FILE" ]; then
echo "Missing file: $FILE"
exit 1
fi
echo "---- $FILE (lines 60-140) ----"
cat -n "$FILE" | sed -n '60,140p'
echo "---- Search for create_collection usage in the repo ----"
rg -n --hidden --no-ignore -S "create_collection\(" . || true
echo "---- Search for collection_exists usage in the repo ----"
rg -n --hidden --no-ignore -S "collection_exists\(" . || true
echo "---- Search for Qdrant client collection creation/update error handling patterns ----"
rg -n --hidden --no-ignore -S "(CollectionAlreadyExists|already exists|status.*409|create_collection|collection_exists)" src/lfx/src/lfx/components/qdrant/qdrant.py . || true
Repository: langflow-ai/langflow
Length of output: 50377
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and inspect check_cached_vector_store implementation
rg -n "def check_cached_vector_store|check_cached_vector_store" src/lfx/src/lfx/base/vectorstores -S || true
rg -n "check_cached_vector_store" src/lfx/src/lfx -S || true
echo "---- Read check_cached_vector_store definition ----"
FILE_MATCH=$(rg -n "def check_cached_vector_store" -S src/lfx/src/lfx | head -n 1 | cut -d: -f1)
if [ -n "${FILE_MATCH:-}" ] && [ -f "$FILE_MATCH" ]; then
echo "Found: $FILE_MATCH"
wc -l "$FILE_MATCH"
sed -n '1,200p' "$FILE_MATCH"
else
echo "Could not find check_cached_vector_store definition"
fi
echo "---- Inspect qdrant.py imports and surrounding class definition ----"
FILE="src/lfx/src/lfx/components/qdrant/qdrant.py"
wc -l "$FILE"
sed -n '1,140p' "$FILE"
Repository: langflow-ai/langflow
Length of output: 50377
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/lfx/src/lfx/base/vectorstores/model.py"
if [ ! -f "$FILE" ]; then
echo "Missing $FILE"
exit 1
fi
wc -l "$FILE"
rg -n "def check_cached_vector_store|check_cached_vector_store" "$FILE"
# Show around the decorator definition
LINE=$(rg -n "def check_cached_vector_store" "$FILE" | head -n 1 | cut -d: -f1)
START=$((LINE-30))
END=$((LINE+120))
sed -n "${START},${END}p" "$FILE"
Repository: langflow-ai/langflow
Length of output: 2099
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/lfx/src/lfx/base/vectorstores/model.py"
LINE=$(rg -n "def check_cached_vector_store" "$FILE" | head -n 1 | cut -d: -f1)
START=$((LINE-20))
if [ "$START" -lt 1 ]; then START=1; fi
END=$((LINE+160))
echo "check_cached_vector_store at line $LINE; printing $START-$END from $FILE"
wc -l "$FILE"
sed -n "${START},${END}p" "$FILE"
Repository: langflow-ai/langflow
Length of output: 6895
🌐 Web query:
QdrantClient.create_collection when collection already exists throws which exception HTTP status 409 conflict Collection already exists
💡 Result:
When you call QdrantClient.create_collection for a collection name that already exists, it results in an HTTP 409 Conflict (“Collection ... already exists!”) error; e.g., the client surfaces “Unexpected Response: 409 (Conflict)” with the server message “Wrong input: Collection <name> already exists!” [1]. Note: newer qdrant-client versions introduced an optional if_not_exists parameter to avoid this by returning immediately when the collection already exists [2].
Citations:
- 1: client.get_collection(...) throws Pydantic validation error qdrant/qdrant-client#547
- 2: Add if_not_exists parameter to create_collection qdrant/qdrant-client#1149
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/lfx/src/lfx/components/qdrant/qdrant.py"
wc -l "$FILE"
sed -n '1,80p' "$FILE"
Repository: langflow-ai/langflow
Length of output: 3188
Make collection creation idempotent under concurrent runs.
@check_cached_vector_store only caches within a single component invocation, so it doesn’t prevent a cross-process race. With the current check-then-create, parallel runs can both observe collection_exists(...) == False; the loser then calls create_collection(...) for an existing collection and hits the Qdrant “already exists” (HTTP 409 Conflict) error.
        if not client.collection_exists(self.collection_name):
            vector_size = len(self.embedding.embed_query("test"))
            distance = DISTANCE_MAP.get(self.distance_func, Distance.COSINE)
            client.create_collection(
                collection_name=self.collection_name,
                vectors_config=VectorParams(size=vector_size, distance=distance),
            )
Handle the “already exists”/409 case around create_collection (or use if_not_exists if your qdrant-client version supports it) so concurrent ingests remain stable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lfx/src/lfx/components/qdrant/qdrant.py` around lines 99 - 105, The
check-then-create for the Qdrant collection (using client.collection_exists and
client.create_collection) is racy under concurrent runs; wrap the
create_collection call (or use client.create_collection(..., if_not_exists=True)
if your qdrant-client supports it) to handle the “already exists” (HTTP 409)
case gracefully: compute vector_size via self.embedding.embed_query("test") and
distance via DISTANCE_MAP.get(self.distance_func, Distance.COSINE) as before,
but catch the client error/exception thrown by create_collection and ignore or
log only the 409 conflict while re-raising other errors so concurrent ingests
don’t fail.
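One way to make the check-then-create step tolerant of a lost race, without depending on a specific exception class, is to re-check existence after a failed create. This is a sketch under assumptions rather than the fix adopted in the PR: `ensure_collection` is a hypothetical helper, `client` is anything exposing `collection_exists(name)`, and `create` is a caller-supplied callable that performs the actual `create_collection` call.

```python
from typing import Any, Callable


def ensure_collection(client: Any, name: str, create: Callable[[], Any]) -> bool:
    """Create `name` if missing; tolerate another worker creating it first.

    Returns True if this call created the collection, False if it already existed.
    """
    if client.collection_exists(name):
        return False
    try:
        create()
    except Exception:
        # A concurrent run may have created the collection between the
        # existence check and the create call. If it exists now, treat the
        # failure as a benign conflict; otherwise surface the original error.
        if client.collection_exists(name):
            return False
        raise
    return True
```

In the component this might be called with a lambda wrapping `client.create_collection(collection_name=..., vectors_config=VectorParams(...))`. The re-check costs one extra request on the failure path only, and unrelated errors (bad credentials, network failures) still propagate because the collection will not exist afterwards.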
Overview
Switch from from_documents to add_documents so ingestion appends to an initialized Qdrant vector store instead of recreating the store during document insertion.
Source: Langchain_qdrant
Tests passed:
Summary by CodeRabbit
New Features
Bug Fixes
Tests