Organizations

@istex @termith-anr @anHALytics @science-miner @howisonlab


Analyses of software mentions and dependencies

Go 11 3 Updated Dec 16, 2025

Tympi News web app

JavaScript 2 Updated Nov 6, 2024

Source of the article "Mining experimental data from Materials Science literature with Large Language Models: an evaluation study"

TeX 7 Updated Aug 15, 2024

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022

Jupyter Notebook 9,521 1,005 Updated Feb 5, 2025

Indexes metadata from Crossref into Elasticsearch. Primarily to be used with Biblio-Glutton

Go 5 Updated May 30, 2023

PhD Dissertation "Automated Extraction and Curation of Materials Information from Scientific Literature"

TeX 11 Updated Feb 20, 2024

Open-source IDE for exploring and testing APIs (lightweight alternative to Postman/Insomnia)

JavaScript 39,332 2,014 Updated Dec 18, 2025

Convert PDF to markdown + JSON quickly with high accuracy

Python 30,433 2,065 Updated Nov 19, 2025

Viewer for the structure extracted by Grobid on PDF documents

Python 57 11 Updated Nov 7, 2025

Streamlit PDF viewer

Python 191 19 Updated Dec 17, 2025

Library supporting NLP and CV research on scientific papers

Python 784 63 Updated Nov 8, 2024

Scientific Document Insight Q/A

Python 32 5 Updated Sep 1, 2025

A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.

Python 64 6 Updated Jul 29, 2024

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Python 6,714 550 Updated Jul 11, 2024

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 9,755 622 Updated Feb 21, 2025

Python tools for processing the stackexchange data dumps into a text dataset for Language Models

Python 85 18 Updated Dec 6, 2023

Active learning for systematic reviews

Python 814 153 Updated Dec 15, 2025

One downloader for many scientific data and code repositories! DOI 👐 Data

Python 83 13 Updated Dec 15, 2025

The official codes for "PMC-LLaMA: Towards Building Open-source Language Models for Medicine"

Python 673 61 Updated Jul 8, 2024

A fast DVI, EPS, and PDF to SVG converter

C++ 347 36 Updated Dec 8, 2025

Slides and resources from my CSV Conf 2023 keynote

16 Updated May 1, 2023

Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)

Python 74 1 Updated Nov 11, 2022

This repository collects 100 papers related to negative sampling methods.

197 20 Updated Jun 25, 2023

**deprecated**

JavaScript 408 50 Updated Sep 30, 2025

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports comp…

C++ 8,719 1,252 Updated Dec 18, 2025

Get answers to research questions from 200M+ papers. Link to demo -

Jupyter Notebook 207 21 Updated Nov 5, 2025

Easily compute clip embeddings and build a clip retrieval system with them

Jupyter Notebook 2,708 239 Updated Aug 15, 2025

Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!

Python 1,108 114 Updated Dec 17, 2025