    Repositories list

    • InterPARES-Audio is an automated, end-to-end pipeline that transforms complex, lengthy audio files into structured, insightful text
      Jupyter Notebook
      0100 Updated Dec 8, 2025
    • An AI-powered conversational assistant designed to help users explore and understand InterPARES (International Research on Permanent Authentic Records in Electronic Systems) documents through natural language queries.
      Python
      0000 Updated Dec 4, 2025
    • InterPARES-Vision is an advanced OCR (Optical Character Recognition) and layout analysis tool designed specifically for archival documents. It combines state-of-the-art AI vision models to extract text, preserve document structure, and generate machine-readable outputs from scanned documents and images.
      Python
      0000 Updated Dec 4, 2025
    • nilechat

      Public
      2700 Updated Nov 11, 2025
    • This repository contains the evaluation code and data for the PalmX 2025 Shared Task on Benchmarking LLMs for Arabic and Islamic Culture.
      Python
      1100 Updated Sep 3, 2025
    • Toucan

      Public
      1600 Updated Sep 2, 2025
    • sahara

      Public
      Benchmarking African NLP
      Python
      0100 Updated Aug 18, 2025
    • palm

      Public
      Python
      63100 Updated Aug 8, 2025
    • pearl

      Public
      An official repository for the paper “Pearl: A Multimodal, Culturally-Aware Arabic Instruction Dataset.”
      Python
      0500 Updated May 29, 2025
    • afrolid

      Public
      AfroLID is a neural toolkit for African language identification that covers 517 African languages.
      Python
      103411 Updated Mar 10, 2025
    • 0000 Updated Mar 4, 2025
    • uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes (NAACL 2025)
      Python
      0210 Updated Feb 11, 2025
    • SPARROW

      Public
      EMNLP 2023
      0300 Updated Feb 7, 2025
    • peacock

      Public
      This is the official repository for Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks.
      22610 Updated Dec 9, 2024
    • Jupyter Notebook
      1910 Updated Oct 10, 2024
    • VioletV2

      Public
      Python
      0000 Updated Sep 11, 2024
    • Cheetah

      Public
      2400 Updated Aug 12, 2024
    • MoS

      Public
      Python
      0000 Updated Aug 7, 2024
    • llmas

      Public
      Python
      0300 Updated Aug 7, 2024
    • copticmt

      Public
      Python
      0000 Updated Jul 7, 2024
    • Repository for the "Fumbling in Babel" paper at NAACL 2024: https://aclanthology.org/2024.findings-naacl.274/
      0000 Updated Jul 5, 2024
    • AraNet

      Public
      Python
      82104 Updated Jun 15, 2024
    • HTML
      0000 Updated Jun 11, 2024
    • fintral

      Public
      0810 Updated Jun 5, 2024
    • araT5

      Public
      AraT5: Text-to-Text Transformers for Arabic Language Understanding
      2493111 Updated May 16, 2024
    • octopus

      Public
      Octopus is a neural machine generation toolkit for Arabic Natural Language Generation (NLG).
      Python
      1910 Updated Apr 29, 2024
    • nadi

      Public
      Nuanced Arabic Dialect Identification Shared Tasks (NADI) 2020 and 2021
      Python
      2500 Updated Mar 4, 2024
    • Repository for Project RP06 of the I Trust AI project
      0000 Updated Feb 7, 2024
    • orca

      Public
      ORCA is a large-scale Arabic Language Understanding Evaluation Benchmark
      Python
      3820 Updated Oct 30, 2023
    • serengeti

      Public
      SERENGETI: Massively Multilingual Language Models for Africa
      Jupyter Notebook
      21710 Updated Oct 26, 2023