Skip to content

Conversation

@bdraco
Copy link
Member

@bdraco bdraco commented Jul 5, 2025

Proposed change

This PR fixes the REST sensor and binary_sensor platforms to properly respect the charset parameter from the Content-Type header when decoding response text.

Background

When the REST integration was migrated from httpx to aiohttp in #146306, the behavior for character encoding changed:

  • Before (httpx): The encoding configuration parameter existed but was ignored due to a bug, causing httpx to always auto-detect charset from Content-Type headers
  • After (aiohttp): The migration "fixed" the bug by actually using the encoding parameter, but this broke charset auto-detection

The Issue

Currently, the REST sensor/binary_sensor always uses the configured encoding (default UTF-8) and ignores what the server specifies in the Content-Type header. This causes decoding errors when servers return content in other encodings (ISO-8859-1, Windows-1252, GB2312, etc.) with the proper Content-Type header.

The Solution

This PR makes the REST sensor/binary_sensor respect the Content-Type header charset while maintaining backward compatibility:

  • If the server provides a charset in the Content-Type header, use it
  • If no charset is provided, fall back to the configured encoding (default UTF-8)
  • Users can still configure a specific encoding when needed (e.g., for servers that don't set proper headers)

This brings the behavior in line with the REST switch component (which uses httpx and respects Content-Type headers) and restores the original (accidentally correct) behavior.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

Copilot AI review requested due to automatic review settings July 5, 2025 15:01
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the REST sensor (and binary_sensor) to honor the charset parameter in the Content-Type header when decoding HTTP responses, falling back to a configured encoding only if no charset is provided.

  • Enhance the aiohttp mock in tests to expose a charset property and default to auto-detection when encoding=None.
  • Update rest/data.py to call response.text() without forcing an encoding if the server provided a charset.
  • Add test cases covering auto-detection, fallback, and override scenarios.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tests/test_util/aiohttp.py Add charset property and adjust text()/json() to match aiohttp’s auto-detect behavior.
tests/components/rest/test_sensor.py New tests for charset auto-detection, fallback to configured encoding, and override behavior.
homeassistant/components/rest/data.py Conditional call to response.text() to respect header charset or use configured encoding.

assert hass.states.get("sensor.mysensor").state == "tack själv"


async def test_setup_auto_encoding_from_content_type(
Copy link

Copilot AI Jul 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] These tests share a common setup and follow similar patterns; consider parameterizing them with pytest.mark.parametrize to reduce duplication and improve maintainability.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't make sense here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put the content, headers, and the sensor "enconding" config in test params? the rest of the test is the same for the 3 tests.

@bdraco bdraco added this to the 2025.7.2 milestone Jul 5, 2025
@bdraco
Copy link
Member Author

bdraco commented Jul 5, 2025

thanks

@frenck
Copy link
Member

frenck commented Jul 5, 2025

Any time 🍻

@bdraco
Copy link
Member Author

bdraco commented Jul 5, 2025

Looks like I need to fix the mocking for cast

@lanthaler
Copy link
Contributor

Thanks a lot @bdraco for looking into this right away.

I wonder whether it wouldn't be better to make the encoding configuration optional. In almost all cases users wouldn't have to configure it as it can be auto-detected from the content type header.

There are a few buggy servers out there though for which it would be valuable to be able to override the header. With the solution in this PR, users have no way to work around such issues.

@bdraco
Copy link
Member Author

bdraco commented Jul 5, 2025

The goal right now is to avoid the breaking change and restore it to how it worked before. We would need to add a new field to override the content-type. But since nobody asked for it before it seems like a case of YAGNI

@lanthaler
Copy link
Contributor

We would need to add a new field to override the content-typ

Why is that? The encoding field is marked as optional. The problem is that in practice it actually isn't optional as it defaults to UTF-8 and that value of now enforced (which this PR is fixing).

IMHO the semantics of an optional field suggest that an unset value mean's that the user doesn't care ("figure it out for me" --> use the content type header). If the value is set, however, it would go against the explicitly expressed intent of the user to use the specified encoding.

@bdraco
Copy link
Member Author

bdraco commented Jul 6, 2025

IMHO the semantics of an optional field suggest that an unset value mean's that the user doesn't care ("figure it out for me" --> use the content type header). If the value is set, however, it would go against the explicitly expressed intent of the user to use the specified encoding.

I understand your perspective, but changing the behavior of the optional default would be a breaking change. My goal with this PR is just to address the conversion issue, not to introduce new features or broader changes to the integration. If you'd like to explore that direction further, I’d suggest opening a separate PR for discussion.

assert hass.states.get("sensor.mysensor").state == "tack själv"


async def test_setup_auto_encoding_from_content_type(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put the content, headers, and the sensor "enconding" config in test params? the rest of the test is the same for the 3 tests.

@abmantis abmantis merged commit 8007bf1 into dev Jul 7, 2025
48 checks passed
@abmantis abmantis deleted the rest_charset branch July 7, 2025 13:33
@github-actions github-actions bot locked and limited conversation to collaborators Jul 8, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Scrape Sensor: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdc in position 401: invalid continuation byte

5 participants