jg-bernard
  • University of Canterbury

69 results for source starred repositories written in HTML

newspaper3k is a library for news, full-text, and article metadata extraction in Python 3. Advanced docs:

HTML 14,903 2,134 Updated Dec 6, 2025

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …

HTML 13,425 1,105 Updated Dec 17, 2025

⚡ HugoBlox: Markdown sites in minutes. Academic/resume/lab/portfolio for AI researchers & startups. Premium templates. Deploy to GitHub Pages now in 1-click 👇

HTML 9,251 2,960 Updated Dec 16, 2025

extract text from any document. no muss. no fuss.

HTML 4,396 659 Updated Dec 2, 2024

Twitter Text Libraries. This code is used at Twitter to tokenize and parse text to meet the expectations for what can be used on the platform.

HTML 3,117 528 Updated Apr 26, 2024

A frictionless, pipeable approach to dealing with summary statistics

HTML 1,138 81 Updated Jul 27, 2025

Beautiful and customizable model summaries in R.

HTML 943 83 Updated Dec 13, 2025

A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html

HTML 896 108 Updated Dec 8, 2025

Open source project for data preparation for GenAI applications

HTML 867 231 Updated Dec 12, 2025

Statistical Inference via Data Science: A ModernDive into R and the Tidyverse

HTML 798 533 Updated Dec 9, 2025

Colors for data scientists.

HTML 673 58 Updated Jan 10, 2024

Find dates inside text using Python and get back datetime objects

HTML 665 170 Updated May 13, 2024

Mixed-effects models in R using S4 classes and methods with RcppEigen

HTML 664 161 Updated Dec 17, 2025

Automatic evals for LLMs

HTML 568 71 Updated Jun 27, 2025

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?

HTML 528 87 Updated Oct 25, 2024

The coronavirus dataset

HTML 496 203 Updated Mar 20, 2023

A repository to monitor attack vectors from state-backed information operations

HTML 400 38 Updated Jul 28, 2024

Summer Institutes in Computational Social Science

HTML 363 288 Updated Dec 9, 2025

The File System State Monitor keeps track of the state of any number of paths and will fire events when said state changes (create/update/delete). FSSM supports using FSEvents on macOS, Inotify on …

HTML 354 27 Updated Oct 27, 2023

An R package for the extraction of sentiment and sentiment-based plot arcs from text

HTML 338 72 Updated Aug 11, 2023

A package management tool for R

HTML 325 40 Updated Feb 22, 2024

Static and dynamic network visualization with R - code and tutorial from Sunbelt 2019 workshop.

HTML 307 150 Updated Jun 27, 2021

Everyday things people use in PyTorch. No need to spend hours reading PyTorch forums trying to find them!

HTML 283 53 Updated Oct 12, 2021

ggplot2 shortcuts (transformations made easy)

HTML 277 20 Updated Jun 18, 2025

An open source online platform for collaborative image labeling

HTML 276 54 Updated Sep 19, 2024

The Internet Monitor is a research project to evaluate, describe, and summarize the means, mechanisms, and extent of Internet content controls and Internet activity around the world.

HTML 266 23 Updated Sep 28, 2023

A collection of R packages spanning natural language processing, statistical analysis, data visualization, and text analysis

HTML 218 36 Updated Jun 8, 2025

R client for the Google Translation API, Google Cloud Natural Language API and Google Cloud Speech API

HTML 202 43 Updated Oct 22, 2025

Intro to Machine Learning with the Tidyverse

HTML 181 90 Updated Feb 14, 2020

Code and Resources for "Applied Machine Learning"

HTML 163 91 Updated Jun 18, 2020