Stars
Official repository of "FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring"
Implementation of Robust Template Matching Using Scale-Adaptive Deep Convolutional Features
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI
Master programming by recreating your favorite technologies from scratch.
Generative Models by Stability AI
🎥 Create YouTube videos from a text prompt in seconds
YASGU: YouTube Automatised Shorts Generator And Uploader. YASGU is a tool that generates and uploads YouTube Shorts videos automatically.
🔊 Text-Prompted Generative Audio Model
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transform…
Image Composition via Stable Diffusion
Inpaint anything using Segment Anything and inpainting models.
Nightly release of ControlNet 1.1
🧙🏻‍♂️ A list of papers curated for you to dive into Awesome Radiance Field-based 3D Editing.
Real-time face detection and recognition system that supports streaming from multiple cameras
A face recognition model written in Python. It takes a face image as input and identifies that face in a video.
A surveillance system for CCTV cameras that recognizes multiple selected target individuals and tracks them in real time across multiple cameras, with detection, recognition, and kernel-based tracking …
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Drag-and-drop page builder library written in vanilla JavaScript without dependencies or build tools.
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
Productive, portable, and performant GPU programming in Python.
A curated list of awesome Taichi applications, courses, demos and features.
🚀🎬 ShortGPT - Experimental AI framework for YouTube Shorts / TikTok channel automation
Refine high-quality datasets and visual AI models
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding