AI is changing faster than developers can keep up. This repo is a monthly collection of AI news and resources for developers.
- Ollama Now Supports Codex App: Select an open Ollama model to use with the Codex App.
- Grok Build Beta: An agentic CLI for coding.
- Codex in ChatGPT Mobile: Codex is coming to your phone.
- GitHub Copilot App: Now available in technical preview.
- Thinking Machines Interaction Models: A Scalable Approach to Human-AI Collaboration.
- Introducing Perceptron Mk1: Frontier video and embodied reasoning model.
- Introducing Googlebook: Gemini-powered laptop.
- GPT-Realtime-2, Translate & Transcribe: Advancing voice intelligence with new models in the API.
- Grok Voice: Custom Voices and Voice Library. Docs, blog.
- Code with Claude: Anthropic’s developer conference.
- VS Code: Agentic browser and OpenTelemetry tracing.
- Bitrig: Now works with Xcode projects from GitHub.
- Grok Imagine: Quality Mode API.
- Deepgram Flux Multilingual: Conversational Speech Model for Global Voice Agents.
- MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models.
- Vision Agents v0.5.5: Open-Source framework for building voice/video/vision AI apps.
- Introducing GPT‑5.5: A new class of intelligence for real work.
- Grok Voice Think Fast 1.0: Realtime voice agent API, like Gemini Live.
- Introducing DeepSeek v4 family of models.
- Introducing Soniox Text-to-Speech: The voice platform for every language.
- HyperFrames: Plugin for Codex.
- Voice-Controlled Apps: Using gpt-realtime-1.5.
- Meta Sapiens 2: High-resolution transformers for human-centric vision, focused on generalization.
- Bitrig: You can now build iOS, macOS, and watchOS apps with AI.
- Introducing ChatGPT Images 2.0: A new era of image generation.
- Vision Banana: Image Generators are Generalist Vision Learners
- Qwen3.6-27B: Coding in a 27B Dense Model. GitHub, Hugging Face
- Kimi K2.6: Advancing Open-Source Coding.
- Open Agents: Spawn coding agents that run infinitely in the cloud.
- Introducing Claude Design by Anthropic Labs: Make prototypes, slides, and one-pagers by talking to Claude.
- Introducing Claude Opus 4.7.
- Introducing GPT‑Rosalind: For life sciences research.
- Gemini 3.1 Flash TTS: Google's next generation of expressive AI speech
- Meet Qwen3.6-35B-A3B: Now Open-Source. X post 🚀🚀.
- Introducing Gemini Robotics ER 1.6: New SOTA robotics model 🤖 which excels at visual and spatial reasoning.
- Moss-TTS: TTS foundation model for real-world voice apps.
- Gradium Phonon: On-device text-to-speech.
- Meta Muse Spark: Meta's personal superintelligence model.
- Gemma 4
- Cursor 3
- Bitrig: You can now build Mac apps with AI
- MAI-Transcribe-1: State of the Art Speech Recognition.
- Qwen3.6-Plus
- Gemini 3.1 Flash Live: New realtime model to build voice and vision agents.
- Voxtral TTS: New text-to-speech model from Mistral.
- OpenPencil: Open-source AI-Native design tool. Design as code.
- Blitz: A free, open-source macOS app that gives AI agents full control over iOS and macOS development. blitz.dev.
- The new Stitch by Google: Transform ideas into UI designs for mobile and web apps.
- Composer 2: Now available in Cursor.
- Firebase: In Google AI Studio.
- Introducing GPT‑5.4 mini and nano
- Mistral Moderation 2
- Grok Voice: API release.
- Stitch SDK by Google
- Ollama: Now an official provider for OpenClaw.
- Unsloth Studio: Open-source web UI to train and run LLMs.
- Codex App: Personalize with themes.
- OpenAI Video API: Powered by Sora 2.
- TADA: Hume AI's first open source TTS model.
- NVIDIA Nemotron 3 Super
- Fish Audio S2: Expressive TTS with controllable emotion.
- ChatGPT: For interactive learning of math and science.
- Introducing Expo Agent
- Gemini powered Docs, Sheets, Slides
- Claude Code Marketplace
- Introducing GPT‑5.4
- Gemini 3.1 Flash Lite
- AssemblyAI Universal-3 Pro Streaming: Real-time transcription model for voice agents.
- Qwen 3.5: Small model series
- Codex Windows App: The OpenAI Codex app is now on Windows.
- AssemblyAI Universal-3-Pro: Real-time transcription for agents.
- SwiftUI Agent Skills
- Tiny Aya by Cohere Labs: Making Multilingual AI Accessible.
- Foundation Models SDK for Python
- Nano Banana 2
- gpt-realtime-1.5: OpenAI's new speech-to-speech model for building voice workflows.
- Vercel Chat SDK: One codebase, every chat platform.
- Gemini 3.1 Pro
- Lyria 3: Custom music generation with Gemini
- Sonnet 4.6
- OpenClaw Joins OpenAI
- Read: Something big is happening.
- Qwen 3.5: Open-weight model in the Qwen3.5 series.
- Cursor Composer 1.5
- Agent Skills Launch Party: 17th Feb, 2026 by Anthropic.
- Introducing GPT-5.3-Codex
- Introducing Claude Opus 4.6
- OpenAI Frontier: A platform that helps enterprises build, deploy, and manage AI coworkers that can do real work.
- OpenAI Codex macOS app: Quick start
- Codex in Xcode 26.3
- Claude Agent SDK in Xcode 26.3: Get the full power of Claude Code directly in Xcode.
- Agentic Coding in Xcode
- Voxtral Transcribe 2: Mistral Speech-to-Text models with state-of-the-art transcription quality & diarization.
- Anthropic Cowork: For Windows.
- Cursor Plugins: Extend the agent with plugins.
- Project Genie: Create, edit, and explore virtual worlds.
- Qwen3-ASR: Now open-source.
- Grok Imagine API: State-of-the-art video generation.
- Kimi Code: Kimi CLI Coding.
- Kimi K2.5: Aesthetic Coding x Agent Swarm.
- NVIDIA PersonaPlex: The most Natural speech-to-speech conversational AI.
- Introducing Agentic Vision in Gemini 3 Flash
- Ollama Launch: A new Ollama command that sets up and runs your favorite coding tools like Claude Code, OpenCode, and Codex.
- Clawdbot: The AI that does everything.
- Ollama Image Generation: Generate images on macOS & Windows.
- Ollama + Claude Code: Run Claude Code locally with open-source models.
- Ollama + OpenAI Codex: Run Codex locally with open-source models.
- RalphTUI: An AI Agent Loop Orchestrator.
- Chroma 1.0: Open-source real-time speech-to-speech model. Quick start.
- GLM-4.7-Flash: Local coding and agentic assistant.
- Heartmula: Local and open-source AI music generator. Tutorial. Read the research paper.
- Blackbox CLI: Run Claude Code, Codex, Gemini CLI, + others in a single CLI.
- ChatGPT Go: Low-cost ChatGPT subscription.
- TranslateGemma: A new suite of open translation models.
- openresponses.org: An open-source spec for building multi-provider LLM interfaces. X post.
- GPT-5.2 Codex in OpenAI API: In GitHub Copilot, Cursor, Warp.
- The FreeMoCap project: X post. A free motion capture for everyone.
- Anthropic Cowork: Claude Code for non-technical tasks.
- Kyutai Pocket TTS: A high quality TTS that gives your CPU a voice.
- Advancing Claude: Healthcare and the life sciences.
- UCP: Universal Commerce Protocol on Google.
- NVIDIA Alpamayo: Reason-driven AI model for autonomous vehicles.
- LTX-2 Model: Best open-source multimodal AI video generation.
- ChatGPT Health
- NVIDIA New Open Models
- G0 Plus VLA: Pick up anything AI model.
- FlowDeck: Build and ship SwiftUI apps without leaving Cursor.
- skillseekersweb.com/: Automatically convert documentation websites, GitHub repositories, and PDF files into production-ready skills for any LLM platform—Claude, Gemini, OpenAI
- Manus joins Meta
- Minimax 2.1: Real-world agentic coding
- GLM-4.7: Advanced AI coding
- Grok Voice Agent API: Realtime Speech-to-Speech
- Gemini 3 Flash
- Introducing GPT-5.2-Codex
- Qwen Image Layered
- Molmo 2: Video understanding AI model
- gpt-image-1.5
- Zoom Federated AI
- SAM Audio: Multimodal Model for Audio Separation
- Qwen Code v0.5.0
- Google Code Wiki: Auto-generate Architectural diagrams for code
- Nemotron 3 Family of Open Models
- Gemini Interactions API
- Build iPhone apps on iPhone
- GPT-5.2, X post
- Gemini Text-to-Speech models
- Visual editor for Cursor Browser, X post
- Agentic AI Foundation: Anthropic + OpenAI
- Devstral 2 and Mistral Vibe CLI
- DeepSeek-V3.2
- Mistral 3
- Qwen3-TTS
- VoxCPM Text-to-Speech
- Claude 4.5 Opus: World's best coding model
- Gemini 3
- Google Antigravity: AI-Assisted IDE
- Grok 4.1
- Meta SAM 3: Segment Anything AI Model
- GPT-5.1: A smarter, more conversational ChatGPT
- Kimi K2 Thinking
- Gemini Built-In RAG: File Search in the API
- ElevenLabs Scribe v2 Realtime: Speech-to-Text
- Vision Agents: Build vision/voice/video AI apps in Python
- Claude Developer Platform (API): Structured outputs
- Anthropic is building its own AI infrastructure
- Cursor 2.0: Redesigned Agentic UI
- MiniMax M2: For Efficient Agentic Coding
- Neo: Humanoid Robot
- ChatGPT Atlas AI Browser
- Gemini Vibe Coding in Google AI Studio
- Claude Haiku 4.5
- Qwen3-VL in Ollama Cloud
- Gemini 2.5 Computer Use model
- OpenAI Agent Builder
- Apps in ChatGPT
- Apps SDK in ChatGPT
- Moondream 3 Preview on fal
- Grok Imagine
- GPT-5 Pro in the API
- OpenAI DevDay 2025
- Claude Agent SDK
- ElevenLabs Agent Workflow
- OpenAI Sora 2
- Claude Agents SDK
- Claude Sonnet 4.5
- OpenAI + Stripe Agentic Commerce Protocol
- ChatGPT Parental Control
- Gemini Robotics-ER 1.5, Blog, Research Paper
- GitHub Copilot CLI
- Grok 4 Vision
- OpenAI function calling update
- Subagents in Claude Code
- ChatGPT Pulse
- Kimi Ok Computer: Agent mode for Kimi Chat
- Moondream 3 Preview
- Meta Code World Model (CWM)
- Qwen3-VL
- Qwen3-TTS API: X post
- Qwen3-Omni: Text, image, audio & video model. X post
- Ollama Cloud Models: Run larger models locally with fast, datacenter-grade hardware
- DeepSeek-V3.1-Terminus
- OpenAI & NVIDIA partnership
- ElevenLabs Studio 3.0
- Google Genkit Go 1.0
- Stitch by Google: New features
- Meta Ray-Ban
- Gemini in Chrome
- GPT-5-Codex, Blog post: A version of GPT-5 further optimized for agentic coding in Codex
- GPT-5 now built-in in Xcode 26
- AgentScope: Agent-oriented programming for building LLM apps
- sosumi.ai: Making Apple docs AI-readable
- Kimi K2-0905 update
- Google on-device AI: EmbeddingGemma
- Qwen3-Max-Preview
- Claude Sonnet 4 in Xcode 26 Beta 7
- GPT Realtime
- Google vids.new
- Gemini 2.5 Flash Image Generation: Blog, X
- Claude for Chrome
- VibeVoice
- Agents.md
- GPT-5
- Cursor CLI
- Codex CLI in Cursor & VS Code
- Grok Code Fast 1 in Cursor & kilocode.ai
- Open models by OpenAI
- GPT-OSS Playground
- Claude Opus 4.1
- v0.app, X post
- ElevenLabs Music
- Genie 3, X post
- Gemini Storybook
- Qwen Image, Blog
- Qwen Image Edit
- Gemma 3 270M
- DINOv3
- Swift Agent: Swift SDK for building AI agents
- ElevenLabs Next.js Starter Kit, Next.js Playground
- Eleven v3
- ElevenLabs Video-to-Music, X post
- DeepSeek V3.1
- Qoder: Agentic coding platform
- Fireplexity
- Cartesia Line SDK
- Google Opal: Build mini AI apps
- ChatGPT study mode
- GLM-4.5 model: Reasoning, Coding, and Agentic model
- Ollama for Mac
- Introducing ChatGPT Agent
- Qwen3 Coder
- Kiro: Amazon's Agentic IDE, Kiro.dev
- Gemini Embedding in the API
- Kimi K2
- Grok 4
- Perplexity Comet
- Grok 4 in Zed IDE
- Mistral Voxtral: Speech Recognition models
- Gemma 3n
- Imagen 4
- Andrej Karpathy: Software Is Changing
- Warp 2.0 Agentic Dev
- Gemini CLI, Repo
- WWDC25: Use ChatGPT in Xcode 26
- WWDC25: Apple Foundation Models Framework
- OpenAI o3 Pro
- WWDC25: MLX for Apple Silicon
- Stitch by Google: Web/mobile UI vibe coding
- Anthropic Code with Claude live stream
- Jony/Sam AI-Powered Computers
- Jules Agentic Coding
- GitHub Copilot is now open-source
- OpenAI Codex
- Anthropic API: Web Search Tool
- Gemini-powered coding agent
- Windsurf SWE-1 model
- ElevenLabs Soundboard
- Introducing Qwen3
- Llama API
- OpenAI o3 and o4-mini
- OpenAI Codex CLI
- Introducing GPT-4.1 in the API
- GPT-4.1 Prompting Guide
- AI in Enterprise: OpenAI
- OpenAI: A Practical Guide to Building Agents
- Firebase Studio
- Identifying and Scaling AI Use Cases
- Google Agent Development Kit
- Gemini Cookbook
- Vercel AI Chat SDK, Get started
- The Llama 4 herd
- OpenAI Image Generation in ChatGPT
- OpenAI Responses API and Agents SDK
- OpenAI.FM
- Gemini 2.5
- DeepSeek-V3-0324
- Manus
- Vapi: Voice AI Agents for Developers
- Gemma 3
- QwQ-32B Reasoning Model
- Introducing LMStudio SDK
- FastHTML and MonsterUI
- Mistral Small 3.1
- Claude Web Search
- Krea AI Video Training
- NotebookLM Mind Maps
- Hunyuan 3D Generation AI
- Stability AI New Virtual Camera
- Gemini Canvas & Audio Overview
- OpenAI GPT 4.5
- Claude 3.7 Sonnet and Claude Code
- Google Gemini Code Assist
- Grok 3 Beta
- Hugging Face FastRTC
- Microsoft Phi-4 Multimodal
- ElevenLabs Scribe
- Qwen Chat: Thinking, Web Search, Artifacts, Video
- Alibaba Wan 2.1 AI video
- Perplexity Voice Mode with Grok 3
- Amazon Alexa+
- Mistral Le Chat app
- Anthropic Jailbreaks
- Pika adds Pikaddition
- Google Gemini 2.0 Pro
- Replit free text-to-app
- ByteDance’s AI avatars
- OpenAI’s Deep Research
- HuggingFace AI App Store
- 12 Days of OpenAI
  - Day 1: o1 and ChatGPT Pro
  - Day 2: Reinforcement Fine-Tuning
  - Day 3: Sora
  - Day 4: ChatGPT Canvas
  - Day 5: Apple Intelligence
  - Day 6: Advanced voice with video & Santa mode
  - Day 7: Projects in ChatGPT
  - Day 8: ChatGPT Search
  - Day 9: OpenAI o1 and new tools for developers
  - Day 10: 1-800-CHATGPT
  - Day 11: Work with apps
  - Day 12: o3 Preview
- Gemini 2.0
- Grok Image Generation Release
- Ollama Structured Outputs
- Llama 3.3
- ElevenLabs: Build AI Agents That Speak
- PydanticAI
- Introducing Amazon Nova Models
- DeepSeek V3