Decoding only part of a stream? #128
Comments
I think MessagePack is designed to handle fixed-sized data structures and it's not easy to handle your case. You can define your own protocol on top of MessagePack, like this:
Thus, the data structure does not depend on a particular MessagePack library.
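One possible shape for such a protocol, as a minimal sketch rather than anything the library provides: prefix the MessagePack header with its byte length, so a reader knows where the header ends and the raw blob begins. The `frame` and `unframe` names and the 4-byte big-endian prefix are assumptions made for illustration; the header bytes can come from any MessagePack encoder.

```typescript
// Hypothetical framing: [u32 big-endian header length][MessagePack header][raw blob].
// frame/unframe are illustrative names, not part of any MessagePack library.

function frame(header: Uint8Array, blob: Uint8Array): Uint8Array {
  const out = new Uint8Array(4 + header.length + blob.length);
  // Store the header length as a big-endian 32-bit prefix.
  new DataView(out.buffer).setUint32(0, header.length, false);
  out.set(header, 4);
  out.set(blob, 4 + header.length);
  return out;
}

function unframe(buf: Uint8Array): { header: Uint8Array; blob: Uint8Array } {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const headerLength = view.getUint32(0, false);
  if (4 + headerLength > buf.length) {
    throw new RangeError("truncated frame");
  }
  return {
    // Pass this slice to a MessagePack decoder; it contains exactly one item.
    header: buf.subarray(4, 4 + headerLength),
    // Everything after the header is the untouched binary payload.
    blob: buf.subarray(4 + headerLength),
  };
}
```

Because the decoder only ever sees the `header` slice, it never encounters trailing bytes, at the cost of four extra bytes per item.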
This needs to be fixed. Looks like every implementation of msgpack makes it very difficult to handle such a simple and presumably common use case. I managed to work around this in Python without hacking the module, but haven't yet figured out any sane way to get the current reading position in JavaScript (a solution that does involve parsing the text of that exception for the object length). Adding an extra field for msgpack object length is not a solution, nor is encoding all other data as msgpack raw binary objects. The MessagePack format itself knows very well where the object ends, and applications should have access to this information too.
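For reference, the exception-parsing workaround mentioned above could look roughly like this. It is a brittle sketch: it assumes the error text keeps the exact format quoted in the original report ("Extra 512 of 529 byte(s) found at buffer[17]"), which may change between library versions, and `itemLengthFromError` is a hypothetical helper name.

```typescript
// Brittle workaround: recover the end offset of the first MessagePack item
// from the "extra bytes" error text. The message format is an assumption
// taken from the error quoted in this issue, not a documented contract.
function itemLengthFromError(message: string): number | undefined {
  const match = /^Extra (\d+) of (\d+) byte\(s\) found at buffer\[(\d+)\]/.exec(message);
  if (match === null) {
    return undefined;
  }
  const extra = Number(match[1]);
  const total = Number(match[2]);
  const offset = Number(match[3]);
  // Sanity check: the item's end offset plus the extra bytes should equal the total.
  return offset + extra === total ? offset : undefined;
}
```

This only helps when the whole buffer is already in memory, so it does not address the streaming case.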
Essentially would need public access to
I suppose that the function could be called |
I'm not exactly sure about your use cases, but I'm willing to make some methods public (with some changes, if needed). Feel free to make pull requests with tests that simulate your use cases.
@gfx The use case is the same as @mcclure's. An ArrayBuffer (or another stream) which contains
Think of the msgpack object as a header for the other data that follows. Unfortunately the other data cannot easily be a msgpack bin field, mainly because it is often too large to fit in memory. The other data itself knows when it ends, so we know when to ask the msgpack parser to step in again. I'll see if I can make a sensible PR without complicating the already extensive API too much.
If I may chime in with another use case: I'm not using MessagePack for persistent storage but for streaming data to a browser over WebTransport. The data contains some metadata and then a relatively large binary blob. Now imagine if "relatively large" meant something like a hundred megabytes. I'd like to avoid having to decode the entire binary blob into memory before I can do anything else with it. It would be great if you could decode the metadata, get the size of the binary from it, and then pass the underlying stream along to something that can accept the binary blob as a
While experimenting with Protobuf at first, I've worked around this by using
Scenario: I have a small app where I store a binary blob. The binary blob has metadata associated with it, so, I think: I will store first some sort of header, and then the binary blob. I decide that the header should be a msgpack item. So I encode() my header, write() the result to a file, and then write() my binary blob. (There are reasons why I do not simply include the binary blob inside the msgpack item.)
When it comes time to read my file back in, I get a ReadableStream for the file, I call decodeAsync() on the ReadableStream, and… I get an error, `Extra 512 of 529 byte(s) found at buffer[17]`. Which, yes, that is expected, I put it there.

My only options for decoding msgpack seem to be decode/decodeAsync, which error if there is extra data at the end of the stream; and `decodeStream`, which understands there can be many consecutive data items but assumes they are all msgpack.

I can decode a single msgpack item at the start of a stream by doing `for await (const idk of decodeStream(fileStream)) {` and then doing a `break` inside the loop, but if I do this I find fileStream is exhausted (0 remaining bytes), so I cannot resume from the stream following that first msgpack item. (And I don't have a way of knowing how many bytes the first msgpack item was, so I can't even start over from the start and skip past it.)

Alternately, I can do the `for await` trick and attempt to read from the stream inside the loop after msgpack has decoded one item, but this won't work either because ReadableStreams are only allowed to have one Reader at a time.

How should I proceed? It seems like "read one item from this stream, but allow me to still use the stream when you are done with it" is not a particularly outlandish use case, but it does not seem to be supported.
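Until the decoder can report how many bytes it consumed, one way around the single-reader restriction is to hold one reader yourself and slice the header bytes out before handing them to the decoder. The sketch below assumes the header length is known up front (for example from a length prefix written before the header); `readExactly` and `ByteReader` are hypothetical names, and `ByteReader` is just the subset of a stream reader's interface that the helper needs.

```typescript
// Minimal structural type: a reader obtained from ReadableStream.getReader()
// on a byte stream satisfies this shape.
interface ByteReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}

// Hypothetical helper: pull exactly `n` bytes from one reader and return any
// surplus from the last chunk, so the caller keeps using the same reader.
async function readExactly(
  reader: ByteReader,
  n: number,
  carry: Uint8Array = new Uint8Array(0),
): Promise<{ bytes: Uint8Array; rest: Uint8Array }> {
  const chunks: Uint8Array[] = [carry];
  let total = carry.length;
  while (total < n) {
    const result = await reader.read();
    if (result.done || result.value === undefined) {
      throw new RangeError(`stream ended after ${total} of ${n} bytes`);
    }
    chunks.push(result.value);
    total += result.value.length;
  }
  // Concatenate what was read, then split at the requested boundary.
  const joined = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    joined.set(chunk, offset);
    offset += chunk.length;
  }
  return { bytes: joined.subarray(0, n), rest: joined.subarray(n) };
}
```

The `bytes` slice can be passed to decode(), while `rest` plus further reads from the same reader yield the blob, so the stream never needs a second reader.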