SAE
- Tallinn, Estonia
- https://www.andybeger.com
- https://orcid.org/0000-0003-1883-3169
- @andybeger.bsky.social
Starred repositories
How a (small) language model walks through its training text: a teaching demo of a bigram Markov chain as a random walk. Live: shannon-language-model.pages.dev
Graph Neural Network Library for PyTorch
PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models (CIKM 2021)
An official OpenAI toolkit for social scientists and data scientists to measure quantitative attributes in text, images, or audio using the GPT API.
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
Code for reconstructing full-text news articles from the GDELT Web News NGrams 3.0 dataset
A compute graph for loading and transforming OWID's data
R package to fit statistical models to repeated categorical rating data using Stan
"RAG-Anything: All-in-One RAG Framework"
Community maintained hardware plugin for vLLM on Apple Silicon
🧠 Train a 64M-parameter LLM from scratch in just 2h!
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
📰 Newspaper4k, a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
Demo of using Spark GraphFrames, presented at PyData Tallinn Nov/25
Natural Gradient Boosting for Probabilistic Prediction
On the Theoretical Limitations of Embedding-Based Retrieval
An R interface to the DW-NOMINATE roll call scaling program
🍃 Organic Maps is a free Android & iOS offline maps app for more than 6M travelers, tourists, hikers, and cyclists. It uses crowd-sourced OpenStreetMap data and is developed with love by the commun…
A platform for community discussion. Free, open, simple.
⚡ TabPFN: Foundation Model for Tabular Data ⚡
Calculate threshold, CV, and VO2max paces from race performances from 800m-10k
This hands-on walks you through fine-tuning an open source LLM on Azure and serving the fine-tuned model on Azure. It is intended for Data Scientists and ML engineers who have experience with fine-…
ConfliBERT: A Pre-trained Language Model for Political Conflict and Violence (NAACL 2022)
Efficient few-shot learning with Sentence Transformers
VIINA: Violent Incident Information from News Articles on the 2022 Russian Invasion of Ukraine