- Paris
- UTC +01:00
- https://portizs.eu
- https://orcid.org/0000-0003-0343-8852
- @pjox13
- @pjox@mastodon.social
- @pjox.bsky.social
cc-downloader Public
Forked from commoncrawl/cc-downloader
A polite and user-friendly downloader for Common Crawl data
Rust · Apache License 2.0 · Updated Jan 13, 2025
oscar2parquet Public
Converts OSCAR's jsonl files into parquet
scdx Public
Forked from thunderpoot/scdx
A simple tool for querying the Common Crawl CDX
Rust · MIT License · Updated Feb 27, 2024
isogloss Public
Forked from thunderpoot/isogloss
ISO 639 and IETF Language Code Lookup Tool
Python · MIT License · Updated Feb 1, 2024
ctclib Public
Forked from Uinelj/ctclib
A collection of utilities related to CTC
Rust · MIT License · Updated Dec 18, 2023
oscar-utils Public
A new set of utilities to work with the OSCAR Corpus
rust-html2text Public
Forked from jugglerchris/rust-html2text
Rust library to render HTML as text.
Rust · MIT License · Updated Apr 2, 2023
wowchemy-hugo-themes Public
Forked from HugoBlox/hugo-blox-builder
🔥 Hugo website builder, Hugo themes & Hugo CMS. No code, build with widgets! Create online courses, academic résumés, or startup websites.
SCSS · MIT License · Updated Sep 16, 2022
latex-mimosis Public template
Forked from Pseudomanifold/latex-mimosis
A minimal & modern LaTeX template for your (bachelor's | master's | doctoral) thesis
TeX · MIT License · Updated Sep 15, 2021
datasets Public
Forked from huggingface/datasets
🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Python · Apache License 2.0 · Updated Feb 12, 2021
LEM17 Public
Forked from e-ditiones/LEM17
Data and models for lemmatising and POS-tagging modern French (16-18th c.)
Shell · Updated Dec 13, 2020
cc_net Public
Forked from facebookresearch/cc_net
Tools to download and cleanup Common Crawl data
Python · Other · Updated Sep 3, 2020