Stars
A PyTorch-native platform for training generative AI models
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
Deprecated Catapult GitHub. Please instead use the http://crbug.com "Speed>Benchmarks" component for bugs and https://chromium.googlesource.com/catapult for downloading and editing source code.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
Dataset of top third-party web domains with rich metadata about them
Fast computation of Krippendorff's alpha agreement measure in Python.
FirmSec Dataset
⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
XMap is a fast network scanner designed for performing Internet-wide IPv6 & IPv4 network research scanning.
Open source annotation tool for machine learning practitioners.
Toolchain to retrieve and parse privacy policies from websites as described in our paper "Unifying Privacy Policy Detection" by Henry Hosseini, Martin Degeling, Christine Utz, and Thomas Hupperich.…
Python client for Baidu Yun / Baidu Netdisk (Personal Cloud Storage)
A tokenizer and sentence splitter for German and English web and social media texts.
Privacy browser extension using machine learning to summarize privacy policies
Artifacts of the paper "Arcanum: Detecting and Evaluating the Privacy Risks of Browser Extensions on Web Pages and Web Content" in USENIX Security Symposium 2024
A simple and efficient llama3 local service deployment solution that supports real-time streaming responses and is optimized to avoid common garbled-character issues with Chinese text.
A script that re-cuts the original llama3_70B_instruct model into 2 or 4 shards, so that the model can run efficiently in `2x80GB` or `4x40GB` GPU environments.
Source code of PurPliance analysis tool.
Repository for the CookieBlock browser extension, which automatically enforces user privacy policy on browser cookies.
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs
Classifier and Feature Extraction scripts used for the CookieBlock extension.