A parser for the Apache Parquet file format, optimized for minimal dependencies and great performance. Available as a Java library and a command-line tool.
Goals of the project are:
- Be light-weight: Implement the Parquet file format avoiding any 3rd party dependencies other than for compression algorithms (e.g. Snappy)
- Be correct: Support all Parquet files which are supported by the canonical parquet-java library
- Be fast: Be as fast as or faster than parquet-java
- Be complete: Add a Parquet file writer (after 1.0)
Latest version: 1.0.0.Beta2, 2026-04-29
Full documentation is available at hardwood.dev.

Add the dependency to your Maven build:

    <dependency>
        <groupId>dev.hardwood</groupId>
        <artifactId>hardwood-core</artifactId>
        <version>1.0.0.Beta2</version>
    </dependency>

Then read a file row by row:

    import java.time.Instant;
    import java.time.LocalDate;

    import dev.hardwood.InputFile;
    import dev.hardwood.reader.ParquetFileReader;
    import dev.hardwood.reader.RowReader;

    // `path` is assumed to point to the Parquet file to read
    try (ParquetFileReader fileReader = ParquetFileReader.open(InputFile.of(path));
            RowReader rowReader = fileReader.rowReader()) {
        while (rowReader.hasNext()) {
            // Advance to the next row, then read its columns by name
            rowReader.next();
            long id = rowReader.getLong("id");
            String name = rowReader.getString("name");
            LocalDate birthDate = rowReader.getDate("birth_date");
            Instant createdAt = rowReader.getTimestamp("created_at");
        }
    }

See the Getting Started guide for detailed setup instructions.

| Document | Purpose |
|---|---|
| ARCHITECTURE.md | High-level architecture and module layout. |
| CONTRIBUTING.md | How to contribute: workflow, commit format, PR expectations. |
| ROADMAP.md | Implementation status, roadmap, and milestones. |
| NATIVE_BUILD.md | How the GraalVM native CLI build works. |
| PERFORMANCE.md | Benchmark results and how to run performance tests. |
| TESTING.md | Manual testing recipes (e.g. S3 via s3proxy). |
| RELEASING.md | Release process. |
- Hardwood: A New Parser for Apache Parquet — project announcement
- Open Source Friday with Gunnar Morling with Hardwood — GitHub Open Source Friday
Contributions are welcome! See CONTRIBUTING.md for the full guide — how to find work, the issue-first workflow, commit message format, and PR expectations.
- File bugs and feature requests on the issue tracker.
- Ask questions or discuss ideas on GitHub Discussions.
- Looking for something to work on? Browse issues labeled good first issue and help wanted.
LLM-assisted contributions are welcome, but vibe coding — accepting AI-generated changes without understanding them — is not. The aspiration is a high-quality, maintainable, performant, safe codebase.
See ROADMAP.md for the detailed implementation status, roadmap, and milestones.
This project requires Java 25 or newer for building (to create the multi-release JAR with Java 22+ FFM support). The resulting JAR runs on Java 21+ (libdeflate support requires Java 22+).
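
The version requirements above can be checked programmatically. The following is a minimal sketch, not part of Hardwood: the class and method names are hypothetical, and the thresholds are copied from the paragraph above. It relies only on the standard `Runtime.version().feature()` call.

```java
// Hypothetical helper, not part of the Hardwood API.
class JavaVersionCheck {
    // Thresholds taken from the requirements stated in this README.
    static final int MIN_BUILD = 25;       // building the multi-release JAR
    static final int MIN_RUNTIME = 21;     // running the resulting JAR
    static final int MIN_LIBDEFLATE = 22;  // libdeflate support (FFM)

    static boolean canBuild(int feature) {
        return feature >= MIN_BUILD;
    }

    static boolean canRun(int feature) {
        return feature >= MIN_RUNTIME;
    }

    static boolean meetsLibdeflateMinimum(int feature) {
        return feature >= MIN_LIBDEFLATE;
    }

    public static void main(String[] args) {
        // Feature release of the running JVM, e.g. 21 or 25
        int feature = Runtime.version().feature();
        System.out.println("Java feature release: " + feature);
        System.out.println("Can build the project: " + canBuild(feature));
        System.out.println("Can run the JAR: " + canRun(feature));
        System.out.println("Meets libdeflate minimum: " + meetsLibdeflateMinimum(feature));
    }
}
```

Meeting the version minimum for libdeflate is a necessary condition only; whether the native library is actually present on the system is a separate question this sketch does not answer.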
Docker must be running for the build to succeed, as the test suite uses Testcontainers to spin up services (e.g. S3 integration tests). If Docker is not available, the build will fail during the test phase for these tests.
The project comes with the Apache Maven wrapper, so a Maven distribution is downloaded automatically if needed.
Run the following command to build this project:

    ./mvnw clean verify

On Windows, run the following command:

    mvnw.cmd clean verify

Pass the -Dquick option to skip all non-essential plug-ins and create the output artifact as quickly as possible:

    ./mvnw clean verify -Dquick

Run the following command to format the source code and organize the imports as per the project's conventions:

    ./mvnw process-sources

The hardwood CLI can be compiled to a GraalVM native binary using the -Dnative flag.

The local build requires GraalVM (Java 25+) installed locally. Install it via SDKMAN:

    sdk install java 25.0.2-graalce

Then build:

    ./mvnw -Dnative package -pl cli -am

The resulting distribution is at cli/target/hardwood-<version>/bin/hardwood.

The container build requires Docker. It runs inside a Linux container and produces a Linux x86_64 ELF binary:

    ./mvnw -Dnative -Dquarkus.native.container-build=true package -pl cli -am

Note: The container build always produces a Linux binary. Running it on macOS will fail with exec format error. Use the local GraalVM build for macOS binaries.

See NATIVE_BUILD.md for details on how the native build works (compression codec handling, build arguments).
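
The choice between the two native build variants can be expressed as a small decision helper. This is an illustrative sketch only: the class and method names are hypothetical, and it assumes the JVM reports a Linux x86_64 host as `os.name` containing "Linux" and `os.arch` equal to `amd64` or `x86_64`. The Maven arguments are the ones given above.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the Hardwood build.
class NativeBuildChoice {
    // The container build yields a Linux x86_64 ELF binary, so the result
    // is only runnable on a host with that OS and architecture.
    static boolean containerBinaryRunsOn(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        return os.contains("linux") && (arch.equals("amd64") || arch.equals("x86_64"));
    }

    // Maven arguments for the chosen variant, as listed in this README.
    static List<String> mavenArgs(boolean containerBuild) {
        return containerBuild
                ? List.of("-Dnative", "-Dquarkus.native.container-build=true",
                        "package", "-pl", "cli", "-am")
                : List.of("-Dnative", "package", "-pl", "cli", "-am");
    }

    public static void main(String[] args) {
        boolean runnable = containerBinaryRunsOn(
                System.getProperty("os.name"), System.getProperty("os.arch"));
        System.out.println("Container-built binary runnable on this host: " + runnable);
        System.out.println("Suggested arguments: ./mvnw " + String.join(" ", mavenArgs(runnable)));
    }
}
```

On a Linux x86_64 host either variant produces a usable binary; the sketch suggests the container build there only because it needs Docker rather than a local GraalVM installation.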
The documentation site can be previewed locally using Docker:

    # Build the image (once, or after changing requirements.txt)
    docker build -t hardwood-docs docs/

    # Serve locally with hot reload; preview at http://127.0.0.1:8000
    docker run --rm -p 8000:8000 -v "$(pwd):/repo" hardwood-docs

    # Build static site (output in docs/site/)
    docker run --rm -v "$(pwd):/repo" hardwood-docs build -f docs/mkdocs.yml

A Docker Compose set-up is provided for running Claude Code with all build dependencies (Java 25, Maven, gh) pre-installed.

    GH_TOKEN=<your-token> docker compose run --rm claude

Set GH_TOKEN to a GitHub personal access token so that Claude Code can interact with issues and pull requests. The project directory is mounted into the container at /workspace, and Claude Code configuration is persisted in a named volume across sessions.

See RELEASING.md.
To generate an API change report comparing the current build against a previous release:

    ./mvnw package japicmp:cmp -pl :hardwood-core -DskipTests -Djapicmp.oldVersion=<PREVIOUS_VERSION>

The package phase is needed to build the current jar before comparing. The report is written to core/target/japicmp/. Internal packages (dev.hardwood.internal) are excluded. This is run automatically during releases.

See PERFORMANCE.md for benchmark results and instructions on running performance tests.
This code base is available under the Apache License, Version 2.0.