- Apple Inc.
- San Francisco
- https://www.linkedin.com/in/singhkaranjeet
Stars
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running mat…
Statistical Machine Intelligence & Learning Engine
Example code from Learning Spark book
Java version of the Playwright testing and automation library
A scalable, mature and versatile web crawler based on Apache Storm
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and resu…
ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (images, but could be extended to other files) in place, and to ext…
DKPro C4CorpusTools is a collection of tools for processing the CommonCrawl corpus, including Creative Commons license detection, boilerplate removal, language detection, and near-duplicate removal.
Extraction code used to create the Dresden Web Table Corpus