Fix CSV import for files containing Byte Order Mark (BOM) #23332
Scope
Seems like this turns out to be a UTF-8 byte order mark (BOM) issue. Although it was resolved back in #12970 for `csv-parser`, `csv-parser` was then replaced by `papaparse` in #19739, hence the BOM is no longer being handled.

Related issues in PapaParse's repo:
With the `sample.csv` reproduction file provided in #21727, which contains a BOM (if you open it with VS Code or Notepad, the bottom right corner should say `UTF-8 with BOM`), and using the `transformHeader` config option as a quick test:

That allows us to identify the problem: the first column, `always_ignored`, actually gets processed with the BOM character in front of its name (it shows up as a visible leading space):

This causes the import process to attempt to import into a technically non-existent column, hence the data in the first column going missing every time for users importing a CSV with a BOM.
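To illustrate the mismatch, here is a minimal TypeScript sketch that is independent of PapaParse itself. The `splitHeaderRow` helper and the sample row are hypothetical; they only mimic a parser that does not strip the BOM from a file saved as `UTF-8 with BOM`.

```typescript
// Hypothetical sample: the first row of a CSV file saved as "UTF-8 with BOM".
// "\uFEFF" is the byte order mark, decoded here as an ordinary character.
const firstRow = '\uFEFFalways_ignored,title,status';

// Hypothetical helper standing in for a parser that does not strip the BOM.
function splitHeaderRow(row: string): string[] {
	return row.split(',');
}

const headers = splitHeaderRow(firstRow);
const firstHeader = headers[0] ?? '';

// The first header carries the BOM, so it no longer matches the real column name.
console.log(firstHeader === 'always_ignored'); // false
console.log(firstHeader.length); // 15, one more than 'always_ignored'.length
console.log(firstHeader.charCodeAt(0).toString(16)); // feff
```

Only the first header is affected, because the BOM sits at the very start of the file.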
This PR opted to use a simplistic `.trim()`, as brought up in PapaParse's issues, but other potential alternatives could be:

- Only trimming when `index === 0`, to make it clearer we're cleaning up just the first column
- Only cleaning up the BOM specifically, with slightly more elaborate code such as:
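The three options can be compared side by side. This is only a sketch and not necessarily the code the author had in mind: the function names are hypothetical, and each one has the `(header, index)` shape that could be passed as PapaParse's `transformHeader` option.

```typescript
const BOM = '\uFEFF';

// Approach taken in this PR: trim every header.
// String.prototype.trim() treats U+FEFF as whitespace, so the BOM is removed too.
function trimAllHeaders(header: string): string {
	return header.trim();
}

// Alternative 1: trim only the first header, where a BOM can appear.
function trimFirstHeader(header: string, index: number): string {
	return index === 0 ? header.trim() : header;
}

// Alternative 2: remove only a leading BOM and keep any other whitespace.
function stripBom(header: string): string {
	return header.startsWith(BOM) ? header.slice(BOM.length) : header;
}

console.log(trimAllHeaders(BOM + 'always_ignored')); // always_ignored
console.log(trimFirstHeader(BOM + 'always_ignored', 0)); // always_ignored
console.log(stripBom(BOM + 'always_ignored')); // always_ignored
console.log(JSON.stringify(stripBom(' title'))); // " title"
```

The trade-off is that `trim()` also removes ordinary leading and trailing whitespace from headers, while the BOM-specific variant leaves a header such as ` title` untouched.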
What's changed:

- Added the `transformHeader` papaparse config for CSV imports to trim the headers

Potential Risks / Drawbacks

Review Notes / Questions

- Whether `trim()` is acceptable, or whether the other alternatives mentioned above are preferred, since they are more specific in terms of code readability/maintainability

Fixes #21727