UniversalDetector vs chardet.detect difference #296

Hi, thanks for creating chardet! Today I used `chardet.detect` to find the encoding of about 1600 files. A bit later I got a `UnicodeDecodeError` for 3 of those files when I tried opening them in `mode='r'` with the detected encoding. All three had been misidentified as Windows-1252. I did a quick check in the terminal, as shown below, and was surprised to see a different result: MacRoman. Is there an explanation for this difference? I thought `chardet.detect` was the recommended way. I now use `UniversalDetector()` to get the correct result.

I'm using Python 3.11 with a recent chardet version.

```
# Interpreter
>>> import chardet
>>> contents = open(FILENAME, "rb").read()
>>> chardet.detect(contents)
{'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}

# Terminal
$ python -m chardet FILENAME
FILENAME: MacRoman with confidence 0.7167379080370483
```
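For anyone who wants to reproduce the comparison, here is a minimal sketch of the two code paths side by side. It assumes the command-line tool feeds the file to `UniversalDetector` line by line, which is my understanding rather than something confirmed in this report. The function names `detect_whole` and `detect_incrementally` are made up for illustration.

```python
import chardet
from chardet.universaldetector import UniversalDetector


def detect_whole(path):
    """Hand the entire file to chardet.detect in a single call."""
    with open(path, "rb") as fh:
        return chardet.detect(fh.read())


def detect_incrementally(path):
    """Feed the file to UniversalDetector one line at a time.

    Assumption: this roughly mirrors what `python -m chardet` does.
    """
    detector = UniversalDetector()
    with open(path, "rb") as fh:
        for line in fh:
            detector.feed(line)
            if detector.done:  # the detector is already certain; stop early
                break
    detector.close()  # finalizes the guess and populates detector.result
    return detector.result
```

Calling both functions on one of the three problem files and printing the two result dictionaries should show whether the mismatch comes from how the bytes are fed rather than from the file itself.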
