View jg-bernard's full-sized avatar
  • University of Canterbury

26 results for source starred repositories written in Java
Free universal database tool and SQL client

Java 47,849 3,974 Updated Dec 17, 2025

A browser automation framework and ecosystem.

Java 33,786 8,631 Updated Dec 17, 2025

Graphs for Everyone

Java 15,558 2,545 Updated Dec 4, 2025

OpenRefine is a free, open source power tool for working with messy data and improving it

Java 11,650 2,107 Updated Dec 16, 2025

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.

Java 10,019 2,717 Updated Nov 27, 2025

AI + Data, online. https://vespa.ai

Java 6,673 685 Updated Dec 17, 2025

Gephi - The Open Graph Viz Platform

Java 6,310 1,592 Updated Dec 10, 2025

Machine learning software for extracting information from scholarly documents

Java 4,504 525 Updated Dec 17, 2025

Cyberduck is a libre FTP, SFTP, WebDAV, Amazon S3, Backblaze B2, Microsoft Azure & OneDrive and OpenStack Swift file transfer client for Mac and Windows.

Java 4,132 326 Updated Dec 17, 2025

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Java 3,110 780 Updated Dec 11, 2025

TrackerControl Android: monitor and control trackers and ads.

Java 2,289 97 Updated Dec 10, 2025

Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class…

Java 1,596 214 Updated Dec 17, 2023

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to t…

Java 1,019 352 Updated Dec 9, 2025

A scalable, mature and versatile web crawler based on Apache Storm

Java 953 268 Updated Dec 15, 2025

Content management platform to build modern business applications

Java 686 390 Updated Dec 17, 2025

CMU ARK Twitter Part-of-Speech Tagger

Java 575 196 Updated Dec 17, 2023

ACHE is a web crawler for domain-specific search.

Java 475 135 Updated Aug 31, 2025

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Java 419 139 Updated Mar 30, 2023

News crawling with StormCrawler - stores content as WARC

Java 360 40 Updated Feb 19, 2025

Latent Dirichlet Allocation (LDA) model for microblogs (Twitter, Weibo, etc.)

Java 319 108 Updated May 4, 2018

Tools to work with the big reddit JSON data dump.

Java 255 31 Updated Jul 6, 2024

Code samples to help you get started with the Amazon Mechanical Turk Requester API

Java 170 58 Updated Aug 2, 2024

Discourse Network Analyzer (DNA)

Java 146 44 Updated Jun 4, 2025

Index Common Crawl archives in tabular format

Java 124 14 Updated Dec 4, 2025

We introduce TACIT: An Open-Source Text Analysis, Crawling and Interpretation Tool. TACIT's plugin architecture has three main components: 1. Crawling plugins 2. Corpus management 3. Analysis plugi…

Java 109 16 Updated Mar 27, 2019

Unsupervised method for extracting quotation-speaker pairs from large news corpora.

Java 29 3 Updated Jul 4, 2018