-
-
Notifications
You must be signed in to change notification settings - Fork 840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change LineDecoder to match stdlib splitlines, resulting in significant speed up #2423
Conversation
08ad3ce
to
50c6811
Compare
A solution using the built-in |
Actually requests is using splitlines, too. So should I just switch to that and update the failing tests? |
Switching to a |
Haha, yes, the first implementation I did with |
9cbd511
to
27b880e
Compare
27b880e
to
ee687ae
Compare
@florimondmanca @tomchristie is this acceptable? I suppose the benefit of this API is that if you do something like |
ee687ae
to
042ab35
Compare
No. I don't believe resolving this issue should require the tests to change. |
Hrm. I'd not appreciated that we have different behaviour from (Our output includes trailing |
Right, I feel like there are 3 options to what the behaviour could be:
1 and 3 can be done efficiently and simply with Preservation of the existing semantics could be done with a regex perhaps, but it's tricky to get right. Or maybe a straight python implementation that just doesn't start from the beginning every time, it wouldn't be as fast as the ones which call to C code, but it could at least be linear-time? Lemme know which direction you want to take it and I can give it a go.. |
I'd suggest that we've got the current behaviour wrong, and that we need a breaking API change here. |
Yes imo |
After reading my comment pointing at a behavior change, it’s true that I would rather be expecting iter_lines() to return lines without newline separators at the end. This is what splitlines() does, but also other languages like Rust’s str.lines() (AdventOfCode season…), or JS when doing What are the ways this change could impact existing code? I don’t see many. If code is currently using iter_lines() and calling trim() on the result, they shouldn’t be impacted, only those trim calls would become unnecessary. People can’t be relying on the current behavior to reconstruct the original content either, so there shouldn’t be breakage on this use case either. |
Handle text ending in `\r` more gracefully. Return as much content as possible.
Noticed that this implementation wouldn't return until Implementation update in d3c6a8e. Test change in 46cad89... decoder = LineDecoder()
assert decoder.decode("") == []
# This will now return `["a", "", "b"]` and *only* buffer the last portion.
assert decoder.decode("a\r\rb\rc\r") == ["a", "", "b"]
assert decoder.flush() == ["c"] |
LGTM (with a small style suggestion which can be easily ignored) |
Can we also update this pull request title to note the change in behavior since it's no longer just a change in algo complexity? |
Co-authored-by: cdeler <serj.krotov@gmail.com>
Okay, I think we can merge this in now. |
This is very exciting, thanks! |
Closes #2422
Leading to enormous speedups when doing things such as Response(...).iter_lines()