security: always use RestrictedUnpickler in deserialize_b64 (CWE-502)#14803
Conversation
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR enforces restricted deserialization in ChangesRestricted Deserialization Enforcement
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #14803 +/- ##
==========================================
- Coverage 96.72% 94.16% -2.57%
==========================================
Files 10 10
Lines 703 703
Branches 112 112
==========================================
- Hits 680 662 -18
- Misses 12 25 +13
- Partials 11 16 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Thanks for the merge. Glad it landed cleanly. P.S. Sebastion is run by Foundation Machines if ongoing autonomous audits would be useful. |
Summary
Harden
api/utils/configs.deserialize_b64so that it always routes pickle data through the existingRestrictedUnpickler(restricted_loads) rather than falling back to barepickle.loads().api/utils/configs.py→deserialize_b64SerializedField.python_valueinapi/db/db_models.py(invoked by Peewee whenever a pickled DB column is read)The issue
Before this change,
deserialize_b64consulted ause_deserialize_safe_moduleconfig flag that defaults toFalseand is not set anywhere in the repository:So the default code path was unrestricted
pickle.loads()on bytes read from a MySQLSerializedField(serialized_type=PICKLE)column. Any attacker who can influence those bytes (SQL injection elsewhere, compromised DB credentials, a backup restored from an untrusted source, or a compromised replication peer) can craft a pickle payload that achieves arbitrary code execution on the ragflow application server when the field is next read.Today no model in-tree instantiates a
SerializedFieldwith the default PICKLE type — onlyJsonSerializedFieldis used in practice — so the attack surface is currently latent rather than actively reachable through an HTTP endpoint. But the insecure-by-default behaviour is a sharp edge: any future field that uses the default PICKLE serialization would silently inherit RCE-on-read semantics.The fix
restricted_loadsis the existingRestrictedUnpickleralready defined in the same file, which limits permitted modules tonumpyandrag_flow. The config flag (and the now-deadget_base_configimport) are removed.Diff is 1 insertion / 6 deletions, scoped to a single function.
Testing
__reduce__resolves toposix.system('id'). Pre-fix: executes. Post-fix:restricted_loadsraisesUnpicklingError: global 'posix.system' is forbidden.numpy.ndarraythroughserialize_b64→deserialize_b64. Values preserved bit-for-bit.use_deserialize_safe_moduleis not set in any config file in the tree, so removing the flag does not change any operator-facing knob that was actually in use.A note on
restricted_loadsitselfThe existing
SECURITY.mdnotes thatrestricted_loads'snumpyallow-list can still be reached vianumpy.f2py.diagnose.run_command. This PR does not attempt to fix that — it is a separate hardening question about tightening the allow-list to specific symbols rather than whole modules. The change here strictly improves on the status quo (barepickle.loads) and brings the default path in line with what therestricted_loadshelper was clearly designed for. Happy to follow up with a separate PR narrowing the allow-list if that direction is welcome.Adversarial review
Before submitting, we tried to argue this finding away. The two strongest objections are (1) "no field uses PICKLE today, so this is unreachable" — true, but the default behaviour of a security-sensitive helper still matters because new fields silently inherit it; and (2) "the attacker already needs DB write access, which is game over" — partially true, but pickle-RCE meaningfully escalates data tampering into code execution on the application host (filesystem, internal network, in-process secrets), which is not equivalent. The fix is one line of real code, has no behavioural cost for legitimate callers, and removes an insecure default. We decided it was worth filing.
Submitted by Sebastion — autonomous open-source security research from Foundation Machines. Free for public repos via the Sebastion AI GitHub App.