karanjeets

Organizations

@USCDataScience


Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Java 419 138 Updated Mar 30, 2023

Conceptual, temporal, and spatial analysis of the TREC Polar dataset

JavaScript 11 8 Updated Jan 4, 2023

DataComp: In search of the next generation of multimodal datasets

Python 770 65 Updated Apr 28, 2025

Materials for the Learn PyTorch for Deep Learning: Zero to Mastery course.

Jupyter Notebook 17,135 4,729 Updated Feb 11, 2026

This repository contains my full work and notes on Coursera's NLP Specialization (Natural Language Processing), taught by instructors Younes Bensouda Mourri and Łukasz Kaiser and offered by deeplearn…

Jupyter Notebook 750 504 Updated Jun 28, 2021

This repository contains implementations and illustrative code to accompany DeepMind publications

Jupyter Notebook 14,689 2,846 Updated Feb 10, 2026

Repository applying NLP techniques such as Transformers, frequency analysis, and document similarity to Warren Buffett's texts.

Jupyter Notebook 39 16 Updated May 31, 2021

A crash course in six episodes for software developers who want to become machine learning practitioners.

Jupyter Notebook 2,834 912 Updated May 3, 2024

All Algorithms implemented in Python

Python 217,616 50,044 Updated Feb 2, 2026

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 156,359 32,035 Updated Feb 11, 2026

Java version of the Playwright testing and automation library

Java 1,442 272 Updated Feb 1, 2026

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

TypeScript 82,455 5,114 Updated Feb 11, 2026

A community-driven list of useful Scala libraries, frameworks and software.

Python 9,206 1,265 Updated Sep 20, 2024

Extraction code used to create the Dresden Web Table Corpus

Java 14 8 Updated Feb 25, 2015

A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

20,123 2,513 Updated Feb 9, 2026

Postman for protobuf APIs

TypeScript 458 51 Updated Jun 4, 2024

The Distributed Release Audit Tool (DRAT) for code analysis and verification.

JavaScript 8 1 Updated Jul 20, 2023

Various Dockerfiles I use on the desktop and on servers.

Dockerfile 13,958 2,532 Updated Jul 6, 2024

A set of sbt-native-packager examples

Scala 235 55 Updated Apr 8, 2020

Short tutorial for TensorFlow, designed to be presented in-person

Jupyter Notebook 299 223 Updated Sep 24, 2016

The repository contains Google's robots.txt parser and matcher as a C++ library (compliant with C++11).

C++ 3,450 246 Updated Aug 2, 2024

A topic-centric list of HQ open datasets.

72,725 11,127 Updated Jan 30, 2026

Notes talking about the design and implementation of Apache Spark

5,357 1,837 Updated Apr 2, 2024

High-performance Arrow and Task in Scala

Scala 238 17 Updated Sep 17, 2018

Wonderful reusable code from Twitter

Scala 2,724 578 Updated Dec 8, 2025

A curated list of awesome Machine Learning frameworks, libraries and software.

Python 71,628 15,305 Updated Jan 29, 2026

✨Fast Coreference Resolution in spaCy with Neural Networks

C 2,892 472 Updated Apr 13, 2023

WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and resu…

Java 113 32 Updated May 20, 2022

DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate removal, language detection, and near-duplicate removal.

Java 52 7 Updated Jun 12, 2020