GzDecoder stops decoding file toward the start. #339

@nschuessler

Description

While trying to decode the Common Crawl index files, GzDecoder stops after about 1.8 MB of input from a 690 MB file.
The file is too large to use .read_to_end (i.e. to read it into memory).

If you download the file and use gzip -d cdx-00010.gz the whole file is expanded.
How do you use GzDecoder to get the same behavior as gzip -d?

The code exits early because decoder.read returns 0 bytes, whereas reading directly from the stream (input_stream.read) continues to return data. So I assume there is some format feature in the file that GzDecoder does not handle but gzip does. It prints 'Read 0 x' before exiting, so I assume there are no errors.

Thanks

Example input:
https://data.commoncrawl.org/cc-index/collections/CC-MAIN-2023-06/indexes/cdx-00010.gz

Example code:

 let mut file = File::open("cdx-00010.gz").expect("Could not open index file.");
 decode_to_stream(&mut file);

use std::io::prelude::*;
use std::io;
use std::io::BufReader;
use std::fs::File;
use flate2::read::GzDecoder;


pub fn decode_to_stream(input_stream: &mut dyn Read)
{
    let mut output_file = File::create("decoded").expect("Could not create output file.");
    let mut decoder = GzDecoder::new(input_stream);
    let mut buffer = [0; 65536];
    let mut total_read = 0;
    while let Ok(read_size) = decoder.read(&mut buffer[..])
    {
        println!("Read {} ({}).", read_size, total_read);
        // read() returns a usize, so 0 is the only possible "end of stream" value.
        if read_size == 0 {
            break;
        }

        // write_all retries short writes and surfaces errors, unlike an ignored write().
        output_file
            .write_all(&buffer[..read_size])
            .expect("Could not write to output file.");
        total_read += read_size;
    }
}
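For reference, a likely explanation (not confirmed against this particular file): large Common Crawl .gz files are typically concatenated multi-member gzip streams. flate2's GzDecoder decodes only the first member and then reports end-of-stream, which matches the early 0-byte read described above, while gzip -d keeps going through every member. flate2 provides MultiGzDecoder for exactly this case. A minimal sketch of the same function using it (decode_all_to_stream is a hypothetical name, and the hard-coded "decoded" output path is kept from the original):

```rust
use std::fs::File;
use std::io::{self, Read};
use flate2::read::MultiGzDecoder;

// Like decode_to_stream above, but MultiGzDecoder continues past the first
// gzip member instead of stopping, matching the behavior of `gzip -d`.
pub fn decode_all_to_stream(input_stream: &mut dyn Read) -> io::Result<u64> {
    let mut output_file = File::create("decoded")?;
    let mut decoder = MultiGzDecoder::new(input_stream);
    // Stream decompressed bytes straight to the output file;
    // returns the total number of bytes written.
    io::copy(&mut decoder, &mut output_file)
}
```

If you want to keep the explicit 64 KiB read loop for the progress printout, the only change needed should be swapping GzDecoder for MultiGzDecoder.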
