tar-vfs-index

Generates a file index with the raw offsets of each file in a tarball, using the same format as the Emscripten file packager. This metadata can be used to mount the tar blob in Emscripten's WORKERFS virtual filesystem without extracting it.

For a longer intro see this blog post: Mounting tar archives as a filesystem in WebAssembly

Installation

npm install tar-vfs-index

Command line

npx tar-vfs-index archive.tar.gz
npx tar-vfs-index archive.tar.zst [output.json]
npx tar-vfs-index --append archive.tar.gz

If no input file is given, stdin is used:

curl -sSL https://cran.r-project.org/src/contrib/Archive/jose/jose_1.0.tar.gz | npx tar-vfs-index

Output is written to stdout, or to a file if a second argument (the output path) is given. Example output:

{
  "files": [
    { "filename": "mypackage/DESCRIPTION", "start": 512, "end": 548 },
    { "filename": "mypackage/R/code.R", "start": 1536, "end": 1563 }
  ],
  "remote_package_size": 3072
}
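The numbers in this example follow from the tar block layout: each entry has a 512-byte header, file data is padded to a multiple of 512 bytes, and the archive ends with two 512-byte zero blocks. The sketch below reproduces the example offsets from the file sizes alone. It is only an illustration of the arithmetic, not how tar-vfs-index works internally: `sketchTarLayout` is a hypothetical helper, and it assumes one plain header per file (no directory entries, no pax or long-name headers) and no record padding beyond the end-of-archive blocks.

```javascript
const BLOCK = 512; // tar block size in bytes

// Hypothetical helper: predict data offsets for a list of { filename, size }.
// Assumes exactly one 512-byte header per file and nothing else in the archive.
function sketchTarLayout(entries) {
  let offset = 0; // position of the next header
  const files = entries.map(({ filename, size }) => {
    const start = offset + BLOCK; // data begins right after the header
    const end = start + size;     // exclusive end of the file data
    offset = start + Math.ceil(size / BLOCK) * BLOCK; // skip padding to the next block
    return { filename, start, end };
  });
  // Two zero-filled blocks mark the end of the archive.
  return { files, remote_package_size: offset + 2 * BLOCK };
}

const layout = sketchTarLayout([
  { filename: 'mypackage/DESCRIPTION', size: 36 },
  { filename: 'mypackage/R/code.R', size: 27 },
]);
console.log(JSON.stringify(layout, null, 2));
```

Real archives usually contain extra entries (directories, extended headers), so the offsets of an actual tarball should always come from the tool rather than from this arithmetic.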

JavaScript API

import tarindex from 'tar-vfs-index';
import { createReadStream } from 'node:fs';

const result = await tarindex(createReadStream('archive.tar.gz'));
console.log(result.files);
// [
//   { filename: 'mypackage/DESCRIPTION', start: 512, end: 548 },
//   { filename: 'mypackage/R/code.R', start: 1536, end: 1563 },
// ]
console.log(result.remote_package_size); // total bytes consumed

The start and end values are byte offsets within the decompressed tar stream.
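As a rough illustration of what these offsets mean, the sketch below pulls one file out of an in-memory, already decompressed tar buffer by slicing between `start` and `end`. The `readEntry` helper and the synthetic buffer are hypothetical and exist only for this example. It assumes `end` is exclusive, which matches the sample output above (548 - 512 = 36 bytes) and the way a `Blob.slice(start, end)` call would treat it.

```javascript
// Hypothetical helper: return the bytes of one indexed file.
// `tarBytes` must be the decompressed tar stream as a Uint8Array.
function readEntry(tarBytes, index, filename) {
  const entry = index.files.find((f) => f.filename === filename);
  if (!entry) throw new Error(`not in index: ${filename}`);
  // subarray() returns a view, so no bytes are copied.
  return tarBytes.subarray(entry.start, entry.end);
}

// Synthetic stand-in for a real tarball: 5 bytes of content placed
// after a (blank) 512-byte header block.
const tarBytes = new Uint8Array(1024);
tarBytes.set(new TextEncoder().encode('hello'), 512);

const index = {
  files: [{ filename: 'pkg/hello.txt', start: 512, end: 517 }],
  remote_package_size: 1024,
};

const text = new TextDecoder().decode(readEntry(tarBytes, index, 'pkg/hello.txt'));
console.log(text); // hello
```

With a real archive, `index` would be the object returned by `tarindex()` and `tarBytes` the gunzipped contents of the same file.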

Use with Emscripten WORKERFS

Emscripten's WORKERFS filesystem lets you mount a vfs image inside a web worker, giving compiled C/C++ code read-only access to its files without copying. Mounting an image requires a metadata JSON object (normally produced by file_packager --separate-metadata) alongside a Blob of the raw archive data.

tar-vfs-index generates this metadata object for a tar archive. Note that if your tar file is gzipped (tar.gz) you should use the browser-native DecompressionStream to get the blob of the uncompressed tarball.

const [metaRes, dataRes] = await Promise.all([
  fetch('archive.tar.gz.json'),  // output of tar-vfs-index
  fetch('archive.tar.gz'),
]);
const metadata = await metaRes.json();

// WORKERFS slices the blob using the offsets in metadata, which refer to
// positions in the decompressed tar stream, so decompress before mounting.
const blob = await new Response(
  dataRes.body.pipeThrough(new DecompressionStream('gzip'))
).blob();

FS.mkdir('/pkg');
FS.mount(WORKERFS, { packages: [{ metadata, blob }] }, '/pkg');

Embedding the index in the archive itself

The --append flag embeds the index directly into the archive as a .vfs-index.json entry, followed by a 16-byte lookup hint. This produces a self-contained .tar.gz that can be mounted by webR without a separate metadata file (as described in tar-metadata):

npx tar-vfs-index --append archive.tar.gz          # modifies the file in-place
