tar-vfs-index

Generates a file index with the raw byte offsets of each file in a tarball, using the same format as Emscripten's file packager. This metadata can be used to mount the tar blob in Emscripten's WORKERFS virtual filesystem without extracting it.

For a longer intro see this blog post: Mounting tar archives as a filesystem in WebAssembly

Installation

npm install tar-vfs-index

Command line

npx tar-vfs-index archive.tar.gz
npx tar-vfs-index archive.tar.zst [output.json]
npx tar-vfs-index --append archive.tar.gz

If no input file is given, stdin is used:

curl -sSL https://cran.r-project.org/src/contrib/Archive/jose/jose_1.0.tar.gz | npx tar-vfs-index

Output is written to stdout, or to a file if two arguments are given:

{
  "files": [
    { "filename": "mypackage/DESCRIPTION", "start": 512, "end": 548 },
    { "filename": "mypackage/R/code.R", "start": 1536, "end": 1563 }
  ],
  "remote_package_size": 3072
}
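The offsets in this sample follow from tar's 512-byte block layout: every entry starts with a header block, and file data always begins on a block boundary. Below is a minimal sketch of a sanity check built on that fact. It assumes that remote_package_size is the length of the decompressed tar stream (which matches the sample, but is an assumption here), and checkIndex is a hypothetical helper, not part of this package.

```javascript
const BLOCK = 512; // tar archives are a sequence of 512-byte blocks

// Hypothetical helper: returns a list of problems found in an index object.
function checkIndex(index) {
  const problems = [];
  for (const { filename, start, end } of index.files) {
    // File data always starts on a block boundary, right after a header block.
    if (start % BLOCK !== 0) problems.push(`${filename}: start ${start} is not block-aligned`);
    if (end < start) problems.push(`${filename}: end ${end} precedes start ${start}`);
    // Assumption: remote_package_size is the total decompressed size.
    if (end > index.remote_package_size) problems.push(`${filename}: end ${end} exceeds archive size`);
  }
  return problems;
}

const sample = {
  files: [
    { filename: 'mypackage/DESCRIPTION', start: 512, end: 548 },
    { filename: 'mypackage/R/code.R', start: 1536, end: 1563 },
  ],
  remote_package_size: 3072,
};
console.log(checkIndex(sample)); // an empty array means no problems were found
```

In the sample, the first file ends at byte 548, its data is padded to the next block boundary at 1024, the second header occupies bytes 1024 to 1535, and the second file's data therefore starts at 1536.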

JavaScript API

import tarindex from 'tar-vfs-index';
import { createReadStream } from 'node:fs';

const result = await tarindex(createReadStream('archive.tar.gz'));
console.log(result.files);
// [
//   { filename: 'mypackage/DESCRIPTION', start: 512, end: 548 },
//   { filename: 'mypackage/R/code.R', start: 1536, end: 1563 },
// ]
console.log(result.remote_package_size); // total bytes consumed

The start and end values are byte offsets within the decompressed tar stream.
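To illustrate what these offsets mean, the sketch below slices one entry out of an in-memory buffer. The buffer is synthetic (zero-filled, with a text payload written at offset 512) and only stands in for a decompressed tar; readEntry is a hypothetical helper, not an export of this package.

```javascript
// Hypothetical helper: return the bytes of one indexed file as a view
// into the decompressed tar buffer (subarray does not copy).
function readEntry(tarBytes, entry) {
  if (entry.start > entry.end || entry.end > tarBytes.length) {
    throw new RangeError(`offsets out of range for ${entry.filename}`);
  }
  return tarBytes.subarray(entry.start, entry.end);
}

// Synthetic stand-in for a decompressed tar: one header block, then data.
const payload = new TextEncoder().encode('Package: mypackage\n');
const tarBytes = new Uint8Array(1024);
tarBytes.set(payload, 512);

const entry = {
  filename: 'mypackage/DESCRIPTION',
  start: 512,
  end: 512 + payload.length,
};
console.log(new TextDecoder().decode(readEntry(tarBytes, entry)));
```

The end offset is exclusive, so end - start is the file size in bytes.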

Use with Emscripten WORKERFS

Emscripten's WORKERFS filesystem lets you mount a vfs image inside a web worker, giving compiled C/C++ code read-only access to its files without copying. Mounting an image requires a metadata JSON object (normally produced by file_packager --separate-metadata) alongside a Blob of the raw archive data.

tar-vfs-index generates this metadata object for a tar archive. If your tar file is gzipped (.tar.gz), decompress it first, for example with the browser-native DecompressionStream, to obtain a blob of the uncompressed tarball.

const [metaRes, dataRes] = await Promise.all([
  fetch('archive.tar.gz.json'),  // output of tar-vfs-index
  fetch('archive.tar.gz'),
]);
const metadata = await metaRes.json();

// WORKERFS slices the blob using the offsets in metadata, which refer to
// positions in the decompressed tar stream, so decompress before mounting.
const blob = await new Response(
  dataRes.body.pipeThrough(new DecompressionStream('gzip'))
).blob();

FS.mkdir('/pkg');
FS.mount(WORKERFS, { packages: [{ metadata, blob }] }, '/pkg');
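If the same code has to handle both gzipped and plain tarballs, one option is to check the first two bytes for the gzip magic number (0x1f 0x8b) and decompress only when it is present. The following is a sketch under that assumption, for runtimes that provide Blob, Response and DecompressionStream as globals; toTarBlob is a hypothetical helper name.

```javascript
// Hypothetical helper: return a Blob of the uncompressed tar, decompressing
// only when the input starts with the gzip magic bytes.
async function toTarBlob(blob) {
  const head = new Uint8Array(await blob.slice(0, 2).arrayBuffer());
  const isGzip = head.length === 2 && head[0] === 0x1f && head[1] === 0x8b;
  if (!isGzip) return blob; // assumption: anything else is already a plain tar
  return new Response(
    blob.stream().pipeThrough(new DecompressionStream('gzip'))
  ).blob();
}

// Possible usage with the fetch response from the example above:
//   const blob = await toTarBlob(await dataRes.blob());
```

This buffers the compressed download as a Blob before decompressing, which trades some memory for not having to know the archive type in advance.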

Embedding the index in the archive itself

The --append flag embeds the index directly into the archive as a .vfs-index.json entry, followed by a 16-byte lookup hint. This produces a self-contained .tar.gz that can be mounted by webR without a separate metadata file (as described in tar-metadata):

npx tar-vfs-index --append archive.tar.gz          # modifies the file in-place
