Fix REST sensor charset handling to respect Content-Type header #148223

bdraco · 2025-07-05T15:01:48Z

Proposed change

This PR fixes the REST sensor and binary_sensor platforms to properly respect the charset parameter from the Content-Type header when decoding response text.

Background

When the REST integration was migrated from httpx to aiohttp in #146306, the behavior for character encoding changed:

Before (httpx): The encoding configuration parameter existed but was ignored due to a bug, causing httpx to always auto-detect charset from Content-Type headers
After (aiohttp): The migration "fixed" the bug by actually using the encoding parameter, but this broke charset auto-detection

The Issue

Currently, the REST sensor/binary_sensor always uses the configured encoding (default UTF-8) and ignores what the server specifies in the Content-Type header. This causes decoding errors when servers return content in other encodings (ISO-8859-1, Windows-1252, GB2312, etc.) with the proper Content-Type header.

The Solution

This PR makes the REST sensor/binary_sensor respect the Content-Type header charset while maintaining backward compatibility:

If the server provides a charset in the Content-Type header, use it
If no charset is provided, fall back to the configured encoding (default UTF-8)
Users can still configure a specific encoding when needed (e.g., for servers that don't set proper headers)

This brings the behavior in line with the REST switch component (which uses httpx and respects Content-Type headers) and restores the original (accidentally correct) behavior.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Deprecation (breaking change to happen in the future)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes Scrape Sensor: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdc in position 401: invalid continuation byte #148081
This PR is related to issue:
Link to documentation pull request:
Link to developer documentation pull request:
Link to frontend pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
I have followed the perfect PR recommendations
The code has been formatted using Ruff (ruff format homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

Copilot

Pull Request Overview

This PR updates the REST sensor (and binary_sensor) to honor the charset parameter in the Content-Type header when decoding HTTP responses, falling back to a configured encoding only if no charset is provided.

Enhance the aiohttp mock in tests to expose a charset property and default to auto-detection when encoding=None.
Update rest/data.py to call response.text() without forcing an encoding if the server provided a charset.
Add test cases covering auto-detection, fallback, and override scenarios.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
tests/test_util/aiohttp.py	Add `charset` property and adjust `text()`/`json()` to match aiohttp’s auto-detect behavior.
tests/components/rest/test_sensor.py	New tests for charset auto-detection, fallback to configured encoding, and override behavior.
homeassistant/components/rest/data.py	Conditional call to `response.text()` to respect header charset or use configured encoding.

Copilot · 2025-07-05T15:02:36Z

tests/components/rest/test_sensor.py

    assert hass.states.get("sensor.mysensor").state == "tack själv"


+async def test_setup_auto_encoding_from_content_type(


[nitpick] These tests share a common setup and follow similar patterns; consider parameterizing them with pytest.mark.parametrize to reduce duplication and improve maintainability.

Doesn't make sense here

Why not put the content, headers, and the sensor "enconding" config in test params? the rest of the test is the same for the 3 tests.

tests/test_util/aiohttp.py

bdraco · 2025-07-05T15:36:29Z

thanks

frenck · 2025-07-05T15:38:25Z

Any time 🍻

bdraco · 2025-07-05T15:54:22Z

Looks like I need to fix the mocking for cast

lanthaler · 2025-07-05T19:15:07Z

Thanks a lot @bdraco for looking into this right away.

I wonder whether it wouldn't be better to make the encoding configuration optional. In almost all cases users wouldn't have to configure it as it can be auto-detected from the content type header.

There are a few buggy servers out there though for which it would be valuable to be able to override the header. With the solution in this PR, users have no way to work around such issues.

bdraco · 2025-07-05T19:21:14Z

The goal right now is to avoid the breaking change and restore it to how it worked before. We would need to add a new field to override the content-type. But since nobody asked for it before it seems like a case of YAGNI

lanthaler · 2025-07-06T17:31:08Z

We would need to add a new field to override the content-typ

Why is that? The encoding field is marked as optional. The problem is that in practice it actually isn't optional as it defaults to UTF-8 and that value of now enforced (which this PR is fixing).

IMHO the semantics of an optional field suggest that an unset value mean's that the user doesn't care ("figure it out for me" --> use the content type header). If the value is set, however, it would go against the explicitly expressed intent of the user to use the specified encoding.

bdraco · 2025-07-06T17:38:51Z

IMHO the semantics of an optional field suggest that an unset value mean's that the user doesn't care ("figure it out for me" --> use the content type header). If the value is set, however, it would go against the explicitly expressed intent of the user to use the specified encoding.

I understand your perspective, but changing the behavior of the optional default would be a breaking change. My goal with this PR is just to address the conversion issue, not to introduce new features or broader changes to the integration. If you'd like to explore that direction further, I’d suggest opening a separate PR for discussion.

abmantis · 2025-07-07T13:32:25Z

tests/components/rest/test_sensor.py

    assert hass.states.get("sensor.mysensor").state == "tack själv"


+async def test_setup_auto_encoding_from_content_type(


Why not put the content, headers, and the sensor "enconding" config in test params? the rest of the test is the same for the 3 tests.

Fix REST sensor charset handling to respect Content-Type header

30d4bf3

Copilot AI review requested due to automatic review settings July 5, 2025 15:01

home-assistant bot added bugfix cla-signed core has-tests integration: rest small-pr PRs with less than 30 lines. Quality Scale: No score labels Jul 5, 2025

Copilot AI reviewed Jul 5, 2025

View reviewed changes

bdraco added this to the 2025.7.2 milestone Jul 5, 2025

preen

b63b639

frenck approved these changes Jul 5, 2025

View reviewed changes

bdraco added 2 commits July 5, 2025 11:04

fix mock

b8ba7b7

Merge branch 'dev' into rest_charset

76ce40e

bdraco added 2 commits July 6, 2025 12:40

Merge branch 'dev' into rest_charset

c0dec83

Merge branch 'dev' into rest_charset

9e13eaa

abmantis approved these changes Jul 7, 2025

View reviewed changes

abmantis merged commit 8007bf1 into dev Jul 7, 2025
48 checks passed

abmantis deleted the rest_charset branch July 7, 2025 13:33

github-actions bot locked and limited conversation to collaborators Jul 8, 2025

frenck added the cherry-picked label Jul 14, 2025

		assert hass.states.get("sensor.mysensor").state == "tack själv"


		async def test_setup_auto_encoding_from_content_type(

Uh oh!

Fix REST sensor charset handling to respect Content-Type header #148223

Fix REST sensor charset handling to respect Content-Type header #148223

Uh oh!

Conversation

bdraco commented Jul 5, 2025

Proposed change

Background

The Issue

The Solution

Type of change

Additional information

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Jul 5, 2025

Choose a reason for hiding this comment

Uh oh!

bdraco Jul 5, 2025

Choose a reason for hiding this comment

Uh oh!

abmantis Jul 7, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bdraco commented Jul 5, 2025

Uh oh!

frenck commented Jul 5, 2025

Uh oh!

bdraco commented Jul 5, 2025

Uh oh!

lanthaler commented Jul 5, 2025

Uh oh!

bdraco commented Jul 5, 2025

Uh oh!

lanthaler commented Jul 6, 2025

Uh oh!

bdraco commented Jul 6, 2025

Uh oh!

abmantis Jul 7, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants