Around Data
- Mumbai
- https://medium.com/@achary.roja
Stars
Python package for consolidated and extensive univariate and bivariate data analysis and visualization, catering to both categorical and continuous datasets.
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.
Tesseract Open Source OCR Engine (main repository)
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
✨Fast Coreference Resolution in spaCy with Neural Networks
☁️ Build multimodal AI applications with cloud-native stack
⚒️ Data preprocessing is the process of transforming raw data into an understandable format. It is also an important step in data mining as we cannot work with raw data. The quality of the data sho…
A full spaCy pipeline and models for scientific/biomedical documents.
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
Multiplatform plotting library based on the Grammar of Graphics
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
Datasets, Transforms and Models specific to Computer Vision
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
ImageMagick is a free, open-source software suite for creating, editing, converting, and displaying images. It supports 200+ formats and offers powerful command-line tools and APIs for automation, …
Synthetic data generators for tabular and time-series data
🪐 End-to-end NLP workflows from prototype to production
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data…