Skip to content

security: always use RestrictedUnpickler in deserialize_b64 (CWE-502)#14803

Merged
yingfeng merged 1 commit into
infiniflow:mainfrom
sebastiondev:fix/cwe502-configs-pickle-5be1
May 15, 2026
Merged

security: always use RestrictedUnpickler in deserialize_b64 (CWE-502)#14803
yingfeng merged 1 commit into
infiniflow:mainfrom
sebastiondev:fix/cwe502-configs-pickle-5be1

Conversation

@sebastiondev

Copy link
Copy Markdown
Contributor

Summary

Harden api/utils/configs.deserialize_b64 so that it always routes pickle data through the existing RestrictedUnpickler (restricted_loads) rather than falling back to bare pickle.loads().

  • CWE-502 — Deserialization of Untrusted Data
  • File / function: api/utils/configs.pydeserialize_b64
  • Caller: SerializedField.python_value in api/db/db_models.py (invoked by Peewee whenever a pickled DB column is read)

The issue

Before this change, deserialize_b64 consulted a use_deserialize_safe_module config flag that defaults to False and is not set anywhere in the repository:

use_deserialize_safe_module = get_base_config('use_deserialize_safe_module', False)
if use_deserialize_safe_module:
    return restricted_loads(src)
return pickle.loads(src)   # <-- default path

So the default code path was unrestricted pickle.loads() on bytes read from a MySQL SerializedField(serialized_type=PICKLE) column. Any attacker who can influence those bytes (SQL injection elsewhere, compromised DB credentials, a backup restored from an untrusted source, or a compromised replication peer) can craft a pickle payload that achieves arbitrary code execution on the ragflow application server when the field is next read.

Today no model in-tree instantiates a SerializedField with the default PICKLE type — only JsonSerializedField is used in practice — so the attack surface is currently latent rather than actively reachable through an HTTP endpoint. But the insecure-by-default behaviour is a sharp edge: any future field that uses the default PICKLE serialization would silently inherit RCE-on-read semantics.

The fix

-    use_deserialize_safe_module = get_base_config(
-        'use_deserialize_safe_module', False)
-    if use_deserialize_safe_module:
-        return restricted_loads(src)
-    return pickle.loads(src)
+    return restricted_loads(src)

restricted_loads is the existing RestrictedUnpickler already defined in the same file, which limits permitted modules to numpy and rag_flow. The config flag (and the now-dead get_base_config import) are removed.

Diff is 1 insertion / 6 deletions, scoped to a single function.

Testing

  • Built a malicious pickle whose __reduce__ resolves to posix.system('id'). Pre-fix: executes. Post-fix: restricted_loads raises UnpicklingError: global 'posix.system' is forbidden.
  • Round-tripped a benign numpy.ndarray through serialize_b64deserialize_b64. Values preserved bit-for-bit.
  • Confirmed use_deserialize_safe_module is not set in any config file in the tree, so removing the flag does not change any operator-facing knob that was actually in use.

A note on restricted_loads itself

The existing SECURITY.md notes that restricted_loads's numpy allow-list can still be reached via numpy.f2py.diagnose.run_command. This PR does not attempt to fix that — it is a separate hardening question about tightening the allow-list to specific symbols rather than whole modules. The change here strictly improves on the status quo (bare pickle.loads) and brings the default path in line with what the restricted_loads helper was clearly designed for. Happy to follow up with a separate PR narrowing the allow-list if that direction is welcome.

Adversarial review

Before submitting, we tried to argue this finding away. The two strongest objections are (1) "no field uses PICKLE today, so this is unreachable" — true, but the default behaviour of a security-sensitive helper still matters because new fields silently inherit it; and (2) "the attacker already needs DB write access, which is game over" — partially true, but pickle-RCE meaningfully escalates data tampering into code execution on the application host (filesystem, internal network, in-process secrets), which is not equivalent. The fix is one line of real code, has no behavioural cost for legitimate callers, and removes an insecure default. We decided it was worth filing.


Submitted by Sebastion — autonomous open-source security research from Foundation Machines. Free for public repos via the Sebastion AI GitHub App.

@dosubot dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label May 11, 2026
@coderabbitai

coderabbitai Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 63228edb-83ce-4b16-9d11-07879867ebd4

📥 Commits

Reviewing files that changed from the base of the PR and between 2f2d156 and 19d9062.

📒 Files selected for processing (1)
  • api/utils/configs.py

📝 Walkthrough

Walkthrough

This PR enforces restricted deserialization in api/utils/configs.py by removing the configuration-driven conditional logic that previously allowed unrestricted pickle unpickling. The deserialize_b64 function now always applies restricted_loads after base64 decoding, eliminating the optional fallback to pickle.loads.

Changes

Restricted Deserialization Enforcement

Layer / File(s) Summary
Configuration-Driven Deserialization Removal
api/utils/configs.py
Removed get_base_config import and updated deserialize_b64 to always return restricted_loads(src) instead of conditionally choosing between unrestricted pickle.loads and restricted unpickling based on a configuration flag.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A pickle so safer, no flags to debate,
Restricted deserialization sealed at the gate,
No branching pathways to slip and to slide,
Just restricted_loads with nowhere to hide! 🔒

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main security-focused change: always using RestrictedUnpickler in deserialize_b64, with specific reference to CWE-502 (deserialization vulnerability).
Description check ✅ Passed The PR description is comprehensive, well-structured, and covers all required template sections. It includes clear problem statement, change type (implied security fix), detailed explanation, testing methodology, and adversarial review considerations.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented May 15, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.16%. Comparing base (345bec8) to head (19d9062).
⚠️ Report is 291 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14803      +/-   ##
==========================================
- Coverage   96.72%   94.16%   -2.57%     
==========================================
  Files          10       10              
  Lines         703      703              
  Branches      112      112              
==========================================
- Hits          680      662      -18     
- Misses         12       25      +13     
- Partials       11       16       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@yingfeng yingfeng merged commit 547b8cf into infiniflow:main May 15, 2026
2 checks passed
@sebastiondev

Copy link
Copy Markdown
Contributor Author

Thanks for the merge. Glad it landed cleanly.

P.S. Sebastion is run by Foundation Machines if ongoing autonomous audits would be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci Continue Integration size:XS This PR changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants