-
Notifications
You must be signed in to change notification settings - Fork 0
Migrate from Qdrant to Postgres #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate from Qdrant to Postgres #4
Conversation
|
||
temp_path = temp_file.name | ||
|
||
logger.info(f"Starting JSON indexing: site={site}, objects={len(data) if isinstance(data, list) else 1}") |
Check failure
Code scanning / CodeQL
Log Injection High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
To fix this log injection vulnerability, sanitize the site
user-supplied value before logging it. The generally recommended approach for plain-text logs is to strip newlines and carriage returns (\r
, \n
, \r\n
) from the value to prevent attackers from injecting extra log lines. This can be achieved by calling .replace()
on the variable before interpolation in the log message.
Best practice is to sanitize as close as possible to usage, but for readability and to prevent accidental omission, it makes sense to sanitize just above where the variable is first used in a log statement. In this file, sanitize site
just before it's logged:
- Replace newlines and carriage returns from
site
usingsite.replace('\r', '').replace('\n', '')
and use the sanitized variable in the log message. - Optionally, explicitly mark the sanitized version, e.g.
site_for_log
. - No external dependencies are required.
Make this change where site
is logged on line 60 (and optionally line 72), by introducing a sanitized version just before.
-
Copy modified lines R60-R61 -
Copy modified line R73
@@ -57,7 +57,8 @@ | ||
|
||
temp_path = temp_file.name | ||
|
||
logger.info(f"Starting JSON indexing: site={site}, objects={len(data) if isinstance(data, list) else 1}") | ||
site_for_log = site.replace('\r', '').replace('\n', '') if isinstance(site, str) else str(site) | ||
logger.info(f"Starting JSON indexing: site={site_for_log}, objects={len(data) if isinstance(data, list) else 1}") | ||
|
||
# Use loadJsonToDB with the temporary file | ||
total_documents = await loadJsonToDB( | ||
@@ -69,7 +70,7 @@ | ||
database=database | ||
) | ||
|
||
logger.info(f"JSON indexing completed: {total_documents} documents indexed for site {site}") | ||
logger.info(f"JSON indexing completed: {total_documents} documents indexed for site {site_for_log}") | ||
|
||
# Return success=False if no documents were indexed | ||
success = total_documents > 0 |
database=database | ||
) | ||
|
||
logger.info(f"JSON indexing completed: {total_documents} documents indexed for site {site}") |
Check failure
Code scanning / CodeQL
Log Injection High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
To fix the problem, we should sanitize the user-provided site
value before it is included in the log entry. Specifically, we should remove any carriage return (\r
) and line feed (\n
) characters, which are most often used in log injection attacks. We can do this using the replace()
method as described in the background, applying both .replace('\n', '')
and .replace('\r', '')
(or combining them). This sanitization should be done before logging operations that interpolate or concatenate user input.
The relevant change is needed in the region of code before line 72, where the log statement is made. We should sanitize site
after extracting it from the body and before it is used in logging statements (lines 60, 72, and possibly any other log sink in the handler). For completeness and future safety, it is best to sanitize site
at the top after extraction and use the sanitized version throughout the function.
No additional dependencies are required. Only a code edit is needed.
-
Copy modified lines R35-R36 -
Copy modified line R62 -
Copy modified line R74
@@ -32,6 +32,8 @@ | ||
|
||
data = body.get('data') | ||
site = body.get('site') | ||
# Sanitize site input to prevent log injection | ||
safe_site = site.replace('\r', '').replace('\n', '') if site else '' | ||
|
||
if not data or not site: | ||
return web.json_response({ | ||
@@ -57,7 +59,7 @@ | ||
|
||
temp_path = temp_file.name | ||
|
||
logger.info(f"Starting JSON indexing: site={site}, objects={len(data) if isinstance(data, list) else 1}") | ||
logger.info(f"Starting JSON indexing: site={safe_site}, objects={len(data) if isinstance(data, list) else 1}") | ||
|
||
# Use loadJsonToDB with the temporary file | ||
total_documents = await loadJsonToDB( | ||
@@ -69,7 +71,7 @@ | ||
database=database | ||
) | ||
|
||
logger.info(f"JSON indexing completed: {total_documents} documents indexed for site {site}") | ||
logger.info(f"JSON indexing completed: {total_documents} documents indexed for site {safe_site}") | ||
|
||
# Return success=False if no documents were indexed | ||
success = total_documents > 0 |
Merging this, tested search and indexing locally. |
In this PR