fix: return None instead of empty string for unset string fields #236

Open
nedeadinside wants to merge 1 commit into nfstream:master from nedeadinside:fix/empty-string-fields-return-none

nedeadinside commented Feb 17, 2026

Description

String fields decoded from C structures via ffi.string().decode() returned "" when the underlying C char array was empty (e.g. client_fingerprint, requested_server_name, etc.).

This was inconsistent with non-sync mode, where the exact same fields are explicitly set to None:

# sync=True  -> ffi.string(...).decode(...) returns ""
# sync=False -> None

The inconsistency caused a subtle data quality issue with pandas. A DataFrame built directly from NFlow objects showed no null values, but after a to_csv() / read_csv() round-trip, NaN values appeared in the affected columns. The reason is that read_csv() interprets empty CSV cells as NaN, whereas an empty Python string "" is not treated as null.
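
The round-trip effect can be reproduced without NFStream. The following is a minimal sketch that assumes only pandas; the helper name null_counts_around_csv, the extra src_port column, and the sample values are hypothetical and not taken from this PR.

```python
import io

import pandas as pd


def null_counts_around_csv(values):
    """Return (nulls before, nulls after) a to_csv()/read_csv() round-trip."""
    df = pd.DataFrame(
        {
            "client_fingerprint": values,
            # Second column (illustrative) so that no CSV row is entirely empty.
            "src_port": [443] * len(values),
        }
    )
    before = int(df["client_fingerprint"].isna().sum())

    # Round-trip through an in-memory CSV instead of a file on disk.
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    buffer.seek(0)
    after = int(pd.read_csv(buffer)["client_fingerprint"].isna().sum())
    return before, after


# Empty strings are not null in memory but come back as NaN from the CSV.
empty_strings = null_counts_around_csv(["abc", "", ""])
# Explicit None values are null both before and after the round-trip.
explicit_none = null_counts_around_csv(["abc", None, None])
print(empty_strings, explicit_none)
```

With empty strings the two counts differ, while with None they agree, which is the consistency this PR aims for.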

Fix: added an "or None" fallback after every .decode() call for optional string fields, so that an empty decoded string is normalised to None. The change is applied consistently in __init__ (sync branch), the sync() method, and the system_visibility_mode initialisation block.
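
As a rough illustration of this normalisation, the sketch below uses plain bytes objects as a stand-in for the value returned by ffi.string(). The helper decode_optional is a hypothetical name, and the "utf-8" / errors="ignore" arguments are assumptions; the PR describes adding or None directly after each .decode() call rather than going through a helper.

```python
from typing import Optional


def decode_optional(raw: bytes) -> Optional[str]:
    """Decode raw bytes and map an empty result to None.

    Hypothetical helper: the encoding and error handling are assumptions.
    """
    # An empty str is falsy, so "or None" replaces it with None.
    # Any non-empty string is truthy and passes through unchanged.
    return raw.decode("utf-8", errors="ignore") or None


unset_field = decode_optional(b"")
set_field = decode_optional(b"example.com")
print(unset_field, set_field)
```

Because only the empty string is falsy among str values, the fallback leaves every non-empty value untouched, including strings such as "0".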

Fixes # (no existing issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Reproduced manually with a script like this:

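import pandas as pd
from nfstream import NFStreamer

PARAMS = [
    "src_ip",
    "dst_ip",
    "src_mac",
    "dst_mac",
    "src_oui",
    "dst_oui",
    "protocol",
    "src_port",
    "dst_port",
    "application_name",
    ...  # remaining flow attributes elided (must include client_fingerprint)
]

# Collect the selected attributes of every flow into plain dicts.
flows = []
for flow in NFStreamer(source="..."):
    flows.append({p: getattr(flow, p) for p in PARAMS})

df = pd.DataFrame(flows)
# Columns where the problem was observed:
# ["requested_server_name", "client_fingerprint", "server_fingerprint"]
# Null count before the round-trip (0 without the fix).
print(df["client_fingerprint"].isna().sum())

df.to_csv("flows.csv", index=False)
df2 = pd.read_csv("flows.csv")

# Null counts must match before and after the CSV round-trip.
assert df["client_fingerprint"].isna().sum() == df2["client_fingerprint"].isna().sum()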
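
To find which columns are affected in a given DataFrame, a check along the following lines may help. It is only a sketch: columns_with_empty_strings is a hypothetical helper, not part of NFStream or of this PR, and the sample data is made up for illustration.

```python
import pandas as pd


def columns_with_empty_strings(df):
    """Return the names of columns that contain at least one empty string."""
    return [col for col in df.columns if (df[col] == "").any()]


# Made-up sample rows; only the column names mirror the fields named above.
sample = pd.DataFrame(
    {
        "src_port": [51000, 51001],
        "client_fingerprint": ["abc", ""],
        "requested_server_name": ["", ""],
    }
)
affected = columns_with_empty_strings(sample)
print(affected)
```

Any column reported here would gain NaN values after a CSV round-trip, so an empty result is one quick sign that the fields are already normalised to None.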

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

String fields decoded from C via ffi.string().decode() returned ""
when the underlying C char array was empty (e.g. application_name
when protocol is not detected). This was inconsistent with non-sync
mode where the same fields are explicitly set to None, and caused a
subtle pandas bug: DataFrame built from NFlow had no nulls, but after
to_csv()/read_csv() round-trip NaN values appeared.