-
Notifications
You must be signed in to change notification settings - Fork 63
Avoid full copy of buffer in reading json #85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
`map_string` copies the full buffer (`b.Bi_outbuf.o_s`) with calling `Bytes.to_string`. * The size of buffer seems to increase consistently during the reading, which makes the copies expensive. * But the function `f` is using only a substring of the buffer. This PR tries to copy less in `map_string` with `Bytes.sub_string`.
Please let me know, if this thought is wrong. A similar issue is also in Lines 152 to 154 in 6f8ed3f
I believe the last line can be replaced with f (Bytes.sub_string lexbuf.lex_buffer lexbuf.lex_start_pos len) 0 lenbut I am not sure what is the assumption about the mapping function If you agree with this, I will add this change in this PR too. |
|
@NathanReb Could you check this PR out please? |
|
I was off for a few weeks, I'll take a look ASAP! |
|
@Leonidas-from-XIV mind taking a look at this? |
|
I looked a little bit at this and of course the PR doesn't apply cleanly anymore because we removed But in general this code does not seem to be necessary since That said, |
|
Hello @Leonidas-from-XIV . Thank you for looking into this.
We were using atdgen to transfer soma data from C++ to OCaml. In the atdgen-generated code, there were
OK, |
As mentioned in #85 by @skcho, the code in `map_ident`/`map_string` was changed to not copy the whole buffer and give a start/end point (though this happened somewhat more on the accidental than deliberate side) thus subtly changing the semantics of the data `f` was called on. This saves copying but prevents `f` from peeking beyond the borders it was assigned. Whether this breaks consumers depends on what they do with it. Nevertheless, what this PR does is it applies the same logic to `map_lexeme`, so while the semantics might have changed at least they change in a consistent way. Closes #85
Ahh, okay, that's what it is for. Interesting. I assume this stays the same but you uncovered a potential incompatibility in the 2.0 release, where this is changed to always return
I've opened #108 to at least align the behaviour of the functions so they're at least consistent and of course slightly more efficient. |
As far as I can remember, in
I can't wait it! Thank you. |
As mentioned in ocaml-community#85 by @skcho, the code in `map_ident`/`map_string` was changed to not copy the whole buffer and give a start/end point (though this happened somewhat more on the accidental than deliberate side) thus subtly changing the semantics of the data `f` was called on. This saves copying but prevents `f` from peeking beyond the borders it was assigned. Whether this breaks consumers depends on what they do with it. Nevertheless, what this PR does is it applies the same logic to `map_lexeme`, so while the semantics might have changed at least they change in a consistent way. Closes ocaml-community#85
CHANGES: ### Removed - Removed dependency on easy-format and removed `pretty_format` from `Yojson`, `Yojson.Basic`, `Yojson.Safe` and `Yojson.Raw`. (@c-cube, ocaml-community/yojson#90) - Removed dependency on `biniou`, simplifying the chain of dependencies. This changes some APIs: * `Bi_outbuf.t` in signatures is replaced with `Buffer.t` * `to_outbuf` becomes `to_buffer` and `stream_to_outbuf` becomes `stream_to_buffer` (@Leonidas-from-XIV, ocaml-community/yojson#74, and @gasche, ocaml-community/yojson#132) - Removed `yojson-biniou` library - Removed deprecated `json` type aliasing type `t` which has been available since 1.6.0 (@Leonidas-from-XIV, ocaml-community/yojson#100). - Removed `json_max` type (@Leonidas-from-XIV, ocaml-community/yojson#103) - Removed constraint that the "root" value being rendered (via either `pretty_print` or `to_string`) must be an object or array. (@cemerick, ocaml-community/yojson#121) - Removed `validate_json` as it only made sense if the type was called `json`. (@Leonidas-from-XIV, ocaml-community/yojson#137) ### Add - Add an opam package `yojson-bench` to deal with benchmarks dependency (@tmcgilchrist, ocaml-community/yojson#117) - Add a benchmark to judge the respective performance of providing a buffer vs letting Yojson create an internal (ocaml-community/yojson#134, @Leonidas-from-XIV) - Add an optional `suf` keyword argument was added to functions that write serialized JSON, thus allowing NDJSON output. Most functions default to not adding any suffix except for `to_file` (ocaml-community/yojson#124, @panglesd) and functions writing sequences of values where the default is `\n` (ocaml-community/yojson#135, @Leonidas-from-XIV) ### Change - The `stream_from_*` and `stream_to_*` functions now use a `Seq.t` instead of a `Stream.t`, and they are renamed into `seq_from_*` and `seq_to_*` (@gasche, ocaml-community/yojson#131). ### Fix - Avoid copying unnecessarily large amounts of strings when parsing (ocaml-community/yojson#85, ocaml-community/yojson#108, @Leonidas-from-XIV) - Fix `stream_to_file` (ocaml-community/yojson#133, @tcoopman and @gasche)
CHANGES: ### Removed - Removed dependency on easy-format and removed `pretty_format` from `Yojson`, `Yojson.Basic`, `Yojson.Safe` and `Yojson.Raw`. (@c-cube, ocaml-community/yojson#90) - Removed dependency on `biniou`, simplifying the chain of dependencies. This changes some APIs: * `Bi_outbuf.t` in signatures is replaced with `Buffer.t` * `to_outbuf` becomes `to_buffer` and `stream_to_outbuf` becomes `stream_to_buffer` (@Leonidas-from-XIV, ocaml-community/yojson#74, and @gasche, ocaml-community/yojson#132) - Removed `yojson-biniou` library - Removed deprecated `json` type aliasing type `t` which has been available since 1.6.0 (@Leonidas-from-XIV, ocaml-community/yojson#100). - Removed `json_max` type (@Leonidas-from-XIV, ocaml-community/yojson#103) - Removed constraint that the "root" value being rendered (via either `pretty_print` or `to_string`) must be an object or array. (@cemerick, ocaml-community/yojson#121) - Removed `validate_json` as it only made sense if the type was called `json`. (@Leonidas-from-XIV, ocaml-community/yojson#137) ### Add - Add an opam package `yojson-bench` to deal with benchmarks dependency (@tmcgilchrist, ocaml-community/yojson#117) - Add a benchmark to judge the respective performance of providing a buffer vs letting Yojson create an internal (ocaml-community/yojson#134, @Leonidas-from-XIV) - Add an optional `suf` keyword argument was added to functions that write serialized JSON, thus allowing NDJSON output. Most functions default to not adding any suffix except for `to_file` (ocaml-community/yojson#124, @panglesd) and functions writing sequences of values where the default is `\n` (ocaml-community/yojson#135, @Leonidas-from-XIV) ### Change - The `stream_from_*` and `stream_to_*` functions now use a `Seq.t` instead of a `Stream.t`, and they are renamed into `seq_from_*` and `seq_to_*` (@gasche, ocaml-community/yojson#131). ### Fix - Avoid copying unnecessarily large amounts of strings when parsing (ocaml-community/yojson#85, ocaml-community/yojson#108, @Leonidas-from-XIV) - Fix `stream_to_file` (ocaml-community/yojson#133, @tcoopman and @gasche)
CHANGES: ### Removed - Removed dependency on easy-format and removed `pretty_format` from `Yojson`, `Yojson.Basic`, `Yojson.Safe` and `Yojson.Raw`. (@c-cube, ocaml-community/yojson#90) - Removed dependency on `biniou`, simplifying the chain of dependencies. This changes some APIs: * `Bi_outbuf.t` in signatures is replaced with `Buffer.t` * `to_outbuf` becomes `to_buffer` and `stream_to_outbuf` becomes `stream_to_buffer` (@Leonidas-from-XIV, ocaml-community/yojson#74, and @gasche, ocaml-community/yojson#132) - Removed `yojson-biniou` library - Removed deprecated `json` type aliasing type `t` which has been available since 1.6.0 (@Leonidas-from-XIV, ocaml-community/yojson#100). - Removed `json_max` type (@Leonidas-from-XIV, ocaml-community/yojson#103) - Removed constraint that the "root" value being rendered (via either `pretty_print` or `to_string`) must be an object or array. (@cemerick, ocaml-community/yojson#121) - Removed `validate_json` as it only made sense if the type was called `json`. (@Leonidas-from-XIV, ocaml-community/yojson#137) ### Add - Add an opam package `yojson-bench` to deal with benchmarks dependency (@tmcgilchrist, ocaml-community/yojson#117) - Add a benchmark to judge the respective performance of providing a buffer vs letting Yojson create an internal (ocaml-community/yojson#134, @Leonidas-from-XIV) - Add an optional `suf` keyword argument was added to functions that write serialized JSON, thus allowing NDJSON output. Most functions default to not adding any suffix except for `to_file` (ocaml-community/yojson#124, @panglesd) and functions writing sequences of values where the default is `\n` (ocaml-community/yojson#135, @Leonidas-from-XIV) ### Change - The `stream_from_*` and `stream_to_*` functions now use a `Seq.t` instead of a `Stream.t`, and they are renamed into `seq_from_*` and `seq_to_*` (@gasche, ocaml-community/yojson#131). ### Fix - Avoid copying unnecessarily large amounts of strings when parsing (ocaml-community/yojson#85, ocaml-community/yojson#108, @Leonidas-from-XIV) - Fix `stream_to_file` (ocaml-community/yojson#133, @tcoopman and @gasche)
map_stringcopies the full buffer (b.Bi_outbuf.o_s) with callingBytes.to_string.fis using only a substring of the buffer.This PR tries to copy less in
map_stringwithBytes.sub_string.