Stars
The API to search, scrape, and interact with the web at scale. 🔥
Ain sport: a library for online action spotting in soccer game videos.
Metadata extraction and validation in scientific papers
Synthetic data curation for post-training and structured data extraction
A Python package for generating sequences (greedy and beam search) from PyTorch models (not necessarily HF Transformers models).
JAX - A curated list of resources https://github.com/google/jax
Minimal library to train LLMs on TPU in JAX with pjit().
LLM training code for Databricks foundation models
Implementations of many Arabic NLP and CV projects, offering a real-time experience through interfaces such as the web, the command line, and notebooks.
Arabic tokenization library providing many tokenization algorithms.
The dataset and code for paper: TheoremQA: A Theorem-driven Question Answering dataset
Easily fine-tune, evaluate and deploy Gemma 4, Qwen3.5, Qwen3.6, gpt-oss, DeepSeek-R1, or any open source LLM / VLM!
Instruction dataset for Arabic with 10,000 instruction and output pairs. CIDAR can be used to fine-tune LLMs to follow instructions.
A data preprocessor for the Quranic Treebank using neural networks. Divides longer verses into smaller chunks.
Hey 👋, Glad to see you here! Check out this repository to learn more about me 🤓. You can also use it to make your awesome GitHub README ✨ (Don't Just Fork, Star Too 😅)
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
Transcribe audio to text and generate SRT and VTT subtitle files using Whisper models and wit.ai.
Maha is a text processing library specially developed to deal with Arabic text.
Several deep learning models for restoring Arabic diacritics using PyTorch.
Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization).
This directory gathers the tools developed by the Data Sourcing Working Group
Toolkit for creating, sharing and using natural language prompts.
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
A set of functionalities that enables Arabic website developers to search, present, and process Arabic content professionally in PHP.
End-to-end Arabic TTS system based on Tacotron.