- Los Angeles, CA
- https://pkmital.com
- @pkmital
Stars
Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model
From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
Python's missing "algorave" module. Live code music with Python using MIDI, OSC and/or SuperCollider.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Silero Models: pre-trained text-to-speech models made embarrassingly simple
Muzic: Music Understanding and Generation with Artificial Intelligence
Instant voice cloning by MIT and MyShell. Audio foundation model.
A multi-voice TTS system trained with an emphasis on quality
[WIP] VoiceSmith makes training text-to-speech models easy.
A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8…
Official implementation of "Separate Anything You Describe"
A new timeline addon for openFrameworks.
Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Share, discover, and collect prompts from the community. Free and open source — self-host for your organization with complete privacy.
Collection of audio-focused loss functions in PyTorch
Tracking the state of the art and recent results (bibliography) on sound tasks.
The “Quite OK Audio Format” for fast, lossy audio compression
This toolbox aims to unify audio generation model evaluation for easier comparison.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
A novel diffusion-based model for synthesizing long-context, high-fidelity music efficiently.
Audio generation using diffusion models, in PyTorch.
A collection of pre-trained audio models, in PyTorch.
A collection of resources and papers on Diffusion Models
"Automatic Language-Agnostic Subtitle Synchronization"
Wavelet scattering transforms in Python with GPU acceleration
Trainer for audio-diffusion-pytorch
A generative network for animal vocalizations. For dimensionality reduction, sequencing, clustering, corpus-building, and generating novel 'stimulus spaces'. All with notebook examples using freely…