sources: replace filetype with mimetype and add explicit human-readable filter #1951

mikelolasagasti · 2025-10-02T21:31:44Z

Description:

The old logic skipped all application/* files as binary, which risked ignoring human-readable formats if detection improved. In practice, JSON/YAML/etc. were scanned only because filetype returned "unknown".
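To illustrate the risk described above, here is a minimal sketch of the old implicit policy. It is not the project's actual code: `legacyShouldSkip` is a hypothetical name, and the detector's result is modeled as a plain string with `"unknown"` as the sentinel mentioned in the description.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyShouldSkip is a hypothetical reconstruction of the old policy:
// every detected application/* type is treated as binary and skipped.
// A result of "unknown" falls through and the file gets scanned.
func legacyShouldSkip(detected string) bool {
	if detected == "unknown" {
		return false
	}
	return strings.HasPrefix(detected, "application/")
}

func main() {
	// A JSON file the detector fails to recognize is scanned, but only by accident.
	fmt.Println("unknown skipped:", legacyShouldSkip("unknown"))

	// If detection improved and reported the real type, the same file
	// would silently stop being scanned.
	fmt.Println("application/json skipped:", legacyShouldSkip("application/json"))
}
```

Under this model, scanning JSON depends entirely on the detector failing, which is the fragility the change is meant to remove.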

This change:

- uses mimetype for broader and more accurate detection
- adds isHumanReadable() to explicitly whitelist text/* and common textual application/* types (json, xml, yaml, toml, js, xhtml)
- makes the skip policy explicit instead of relying on "unknown" fallback
- adds regression tests to verify classification
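The explicit filter in the list above could look roughly like the following. This is a sketch under stated assumptions, not the code from this PR: the signature (a MIME string rather than the detection library's own type), the exact allowlist entries, and the handling of unparseable input are all guesses made for illustration.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// textualApplicationTypes is a hypothetical allowlist of application/*
// types that are text in practice. The PR's real list may differ.
var textualApplicationTypes = map[string]bool{
	"application/json":       true,
	"application/xml":        true,
	"application/yaml":       true,
	"application/x-yaml":     true,
	"application/toml":       true,
	"application/javascript": true,
	"application/xhtml+xml":  true,
}

// isHumanReadable reports whether a MIME type string names a textual format.
// Parameters such as "; charset=utf-8" are stripped before matching.
// Assumption: input that cannot be parsed is treated as not readable.
func isHumanReadable(mimeType string) bool {
	mediaType, _, err := mime.ParseMediaType(mimeType)
	if err != nil {
		return false
	}
	if strings.HasPrefix(mediaType, "text/") {
		return true
	}
	return textualApplicationTypes[mediaType]
}

func main() {
	// A small table in the spirit of the regression tests mentioned above.
	for _, m := range []string{
		"text/plain; charset=utf-8",
		"application/json",
		"application/zip",
		"image/png",
	} {
		fmt.Printf("%-28s readable=%v\n", m, isHumanReadable(m))
	}
}
```

In real use the string would come from the detection library's result rather than being typed by hand. Anything not on the allowlist is skipped, so the policy no longer depends on a detector failing to recognize a file.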

Additionally, this patch migrates the detection library from h2non/filetype to the more accurate and actively maintained gabriel-vasile/mimetype.

Checklist:

Does your PR pass tests?
Have you written new tests for your changes?
Have you linted your code locally prior to submission?

Signed-off-by: Mikel Olasagasti Uranga <mikel@olasagasti.info>
