👍 SOTA Performance: Wan2.1 consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
🚀 Supports Consumer-grade GPUs: The T2V-1.3B model requires only 8.19 GB VRAM, making it compatible with almost all consumer-grade GPUs. It can generate a 5-second 480P video on an RTX 4090 in about 4 minutes (without optimization techniques like quantization). Its performance is even comparable to some closed-source models.
🎉 Multiple tasks: Wan2.1 excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation.
🔮 Visual Text Generation: Wan2.1 is the first video model capable of generating both Chinese and English text, featuring robust text generation that enhances its practical applications.
💪 Powerful Video VAE: Wan-VAE delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation.
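The 8.19 GB figure quoted for T2V-1.3B does not say whether it means decimal gigabytes or binary gibibytes, which matters for cards marketed as "8 GB". The sketch below only illustrates that arithmetic. `fits_in_vram`, the 8 GiB card size, and the `reserved_bytes` parameter are hypothetical and not part of Wan2.1.

```python
GB = 10**9    # decimal gigabyte
GIB = 2**30   # binary gibibyte

def fits_in_vram(required_bytes, card_bytes, reserved_bytes=0):
    """Return True if the model's requirement fits in what the card has left.

    reserved_bytes stands in for memory already used by the driver or desktop.
    """
    return required_bytes <= card_bytes - reserved_bytes

card = 8 * GIB  # hypothetical "8 GB" consumer card

# Reading the quoted figure as decimal GB, the model fits.
print(fits_in_vram(8.19 * GB, card))    # True
# Reading the same figure as GiB, it does not.
print(fits_in_vram(8.19 * GIB, card))   # False
```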
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/example%20workflows_Wan2.1
https://huggingface.co/Wan-AI/Wan2.1-T2V-14B
https://huggingface.co/Kijai/WanVideo_comfy/tree/main
Moondream's gaze detection is convenient for captioning videos, understanding social dynamics, and specific cases such as sports analytics or detecting when drivers or operators are distracted.
https://huggingface.co/spaces/moondream/gaze-demo
https://moondream.ai/blog/announcing-gaze-detection
https://github.com/SkyworkAI/SkyReels-V1
All-in-one AI platform for video creation, including voiceover, lip sync, SFX, and editing. It turns text or images into video in one click.
SkyReels-V1 is purpose-built for AI short video production and is based on HunyuanVideo. It achieves cinematic-grade micro-expression performances with 33 nuanced facial expressions and 400+ natural body movements that can be freely combined. The model integrates film-quality lighting aesthetics, generating visually stunning compositions and textures through text-to-video or image-to-video conversion, and according to its authors outperforms all existing open-source models across key metrics.
https://huggingface.co/stepfun-ai/stepvideo-t2v
The model generates videos up to 204 frames, using a high-compression Video-VAE (16×16 spatial, 8× temporal). It processes English and Chinese prompts via bilingual text encoders. A 3D full-attention DiT, trained with Flow Matching, denoises latent frames conditioned on text and timesteps. A video-based DPO further reduces artifacts, enhancing realism and smoothness.
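To make the compression figures concrete, the sketch below estimates the latent grid for one clip. It assumes plain ceiling division on each axis and a hypothetical 544×992 frame size. The released Video-VAE may pad or treat the first frame differently, and `latent_shape` is an illustrative helper, not part of the model's code.

```python
import math

def latent_shape(frames, height, width, spatial=16, temporal=8):
    """Return (latent_frames, latent_height, latent_width) under ceiling division."""
    return (
        math.ceil(frames / temporal),
        math.ceil(height / spatial),
        math.ceil(width / spatial),
    )

def compression_ratio(frames, height, width, spatial=16, temporal=8):
    """Ratio of pixel positions to latent positions (channel counts ignored)."""
    t, h, w = latent_shape(frames, height, width, spatial, temporal)
    return (frames * height * width) / (t * h * w)

# 204 frames at an assumed 544x992 resolution
print(latent_shape(204, 544, 992))            # (26, 34, 62)
print(round(compression_ratio(204, 544, 992)))
```

The ratio lands slightly below the nominal 16 × 16 × 8 = 2048 because 204 is not a multiple of 8, so the last temporal block is only partly filled.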
MiniMax is thrilled to announce the release of the MiniMax-01 series, featuring two groundbreaking models:
MiniMax-Text-01: A foundational language model.
MiniMax-VL-01: A visual multi-modal model.
Both models are now open-source, paving the way for innovation and accessibility in AI development!
🔑 Key Innovations
1. Lightning Attention Architecture: Combines 7/8 Lightning Attention with 1/8 Softmax Attention, delivering unparalleled performance.
2. Massive Scale with MoE (Mixture of Experts): 456B parameters with 32 experts and 45.9B activated parameters.
3. 4M-Token Context Window: Processes up to 4 million tokens, 20–32x the capacity of leading models, redefining what’s possible in long-context AI applications.
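The ratios in the list above can be sketched in a few lines. This is a minimal illustration: the 16-layer depth is hypothetical, and placing the softmax layer last in each group of eight is an assumption, since the announcement only gives the 7/8 to 1/8 split.

```python
def attention_schedule(num_layers, block=8):
    """Label each layer: one softmax layer closes every group of `block` layers."""
    return [
        "softmax" if (i + 1) % block == 0 else "lightning"
        for i in range(num_layers)
    ]

TOTAL_PARAMS_B = 456.0   # total parameters, in billions (from the announcement)
ACTIVE_PARAMS_B = 45.9   # parameters activated per token, in billions

schedule = attention_schedule(16)  # hypothetical depth, for illustration only
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(schedule.count("lightning"), schedule.count("softmax"))  # 14 2
print(f"{active_fraction:.1%} of parameters active per token")
```

With Mixture of Experts, only about a tenth of the 456B parameters take part in any single token's forward pass, which is where the compute savings come from.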
💡 Why MiniMax-01 Matters
1. Innovative Architecture for Top-Tier Performance
The MiniMax-01 series introduces the Lightning Attention mechanism, a bold alternative to traditional Transformer architectures, delivering unmatched efficiency and scalability.
2. 4M Ultra-Long Context: Ushering in the AI Agent Era
With the ability to handle 4 million tokens, MiniMax-01 is designed to lead the next wave of agent-based applications, where extended context handling and sustained memory are critical.
3. Unbeatable Cost-Effectiveness
Through proprietary architectural innovations and infrastructure optimization, we’re offering the most competitive pricing in the industry:
$0.2 per million input tokens
$1.1 per million output tokens
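As a quick sanity check on these rates, the sketch below estimates the cost of a request. The prices are the ones quoted above and may have changed since; `estimate_cost_usd` is an illustrative helper, not part of any MiniMax SDK.

```python
INPUT_USD_PER_MILLION = 0.2    # quoted price per million input tokens
OUTPUT_USD_PER_MILLION = 1.1   # quoted price per million output tokens

def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost in USD of one request at the quoted per-token rates."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000

# Filling the full 4M-token context and generating a 2,000-token answer
print(f"${estimate_cost_usd(4_000_000, 2_000):.4f}")
```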
🌟 Experience the Future of AI Today
We believe MiniMax-01 is poised to transform AI applications across industries. Whether you’re building next-gen AI agents, tackling ultra-long context tasks, or exploring new frontiers in AI, MiniMax-01 is here to empower your vision.
✅ Try it now for free: hailuo.ai
📄 Read the technical paper: filecdn.minimax.chat/_Arxiv_MiniMax_01_Report.pdf
🌐 Learn more: minimaxi.com/en/news/minimax-01-series-2
💡API Platform: intl.minimaxi.com/
Mochi 1 AI operates on a pay-as-you-go model, meaning you only pay for the services you utilize without any hidden fees.
https://scaniverse.com/news/spz-gaussian-splat-open-source-file-format
https://github.com/nianticlabs/spz
• Slashes file sizes by 90% (250MB → 25MB) with virtually zero quality loss
• Lightning-fast uploads/downloads, especially on mobile
• Dramatically reduced memory footprint
• Enables real-time processing right on your phone
Tech breakthrough:
• Smart compression of position, rotation, color & scale data
• Column-based organization for maximum efficiency
• Innovative fixed-point quantization & log encoding
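The two encoding ideas named above can be sketched generically as follows. This is not the SPZ implementation: the 12 fractional bits, the log range of [-10, 6], and all function names are assumptions chosen for illustration.

```python
import math

def quantize_fixed(value, fractional_bits=12):
    """Store a float as an integer count of 1 / 2**fractional_bits steps."""
    return round(value * (1 << fractional_bits))

def dequantize_fixed(q, fractional_bits=12):
    """Recover the approximate float from its fixed-point integer."""
    return q / (1 << fractional_bits)

def encode_log_scale(scale, lo=-10.0, hi=6.0):
    """Map ln(scale) from the range [lo, hi] onto a single byte (0..255)."""
    t = (math.log(scale) - lo) / (hi - lo)
    return max(0, min(255, round(t * 255)))

def decode_log_scale(code, lo=-10.0, hi=6.0):
    """Invert encode_log_scale, up to quantization error."""
    return math.exp(lo + (code / 255) * (hi - lo))

position = 1.2345
scale = 0.05
print(dequantize_fixed(quantize_fixed(position)))
print(decode_log_scale(encode_log_scale(scale)))
```

Fixed-point storage gives positions a uniform absolute error, while log encoding gives scales a roughly uniform relative error, which suits values that span several orders of magnitude.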
https://www.8thwall.com/products/niantic-studio
https://www.producthunt.com/products/motionity
Motionity is a free and open-source animation editor for the web. It's a mix of After Effects and Canva, with powerful features like keyframing, masking, and filters, plus integrations for browsing assets that you can drag and drop into your video.
Depth Pro is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU.
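The quoted speed translates into a throughput figure with simple arithmetic. The sketch below assumes "megapixel" means 2^20 pixels, under which 2.25 megapixels corresponds exactly to a square 1536×1536 map; that reading is an assumption, not something stated in the text above.

```python
import math

MEGAPIXEL = 2**20  # assumption: binary megapixels

def throughput_mp_per_s(megapixels, seconds):
    """Megapixels of depth map produced per second."""
    return megapixels / seconds

pixels = int(2.25 * MEGAPIXEL)
side = math.isqrt(pixels)  # side length if the map is square

print(side, side * side == pixels)  # 1536 True
print(f"{throughput_mp_per_s(2.25, 0.3):.1f} MP/s")
```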
https://github.com/apple/ml-depth-pro
https://arxiv.org/pdf/2410.02073
FLUX (or FLUX.1) is a suite of text-to-image models from Black Forest Labs, a new company set up by some of the AI researchers behind innovations and models like VQGAN, Stable Diffusion, Latent Diffusion, and Adversarial Diffusion Distillation.
https://fal.ai/models/fal-ai/flux
https://github.com/black-forest-labs/flux
It comes in three models: FLUX.1 [pro], FLUX.1 [dev], and FLUX.1 [schnell].
ComfyUI implementation:
https://github.com/tin2tin/Pallaidium/
Text to video | Text to audio |
Text to speech | Text to image |
Image to image | Image to video |
Video to video | Image to text |
ControlNet | OpenPose |
ADetailer | IP Adapter Face/Style |
Canny | Illusion |
Multiple LoRAs | Segmind distilled SDXL |
Seed | Quality steps |
Frames | Word power |
Style selector | Strip power |
Batch conversion | Batch refinement of images |
Batch upscale & refinement of movies | Model card selector |
Render-to-path selector | Render finished notification |
Model Cards | One-click install and uninstall dependencies |
User-defined file path for generated files | Seed and prompt added to strip name |