Drinking coffee

Highlights

  • Pro

Organizations

@commoncrawl @bigscience-workshop @oscar-project

8 stars written in Java

A configuration as code language with rich validation and tooling.

Java · 11,108 stars · 368 forks · Updated Apr 4, 2026

Machine learning software for extracting information from scholarly documents

Java · 4,765 stars · 538 forks · Updated Apr 6, 2026

A machine learning tool for fishing entities

Java · 269 stars · 24 forks · Updated Feb 27, 2026

The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)

Java · 225 stars · 63 forks · Updated Dec 22, 2022

Common Crawl fork of Apache Nutch

Java · 41 stars · 3 forks · Updated Apr 3, 2026

Analytic platform for the HAL research archive (in development)

Java · 13 stars · 1 fork · Updated Oct 2, 2020

Simple Java client for GROBID REST services

Java · 9 stars · 5 forks · Updated Apr 26, 2021