Ananth Nagaraj
In 5 days, we’re going to showcase how we're solving one of voice AI's biggest challenges: bridging the gap between what customers say and how they say it. Gnani.ai teams will be attending NVIDIA's GTC Conference in San Jose, California on March 17 to exhibit our speech-to-speech AI models. As one of the first few companies building these models globally, our partnership with NVIDIA to use their platform to solve critical voice AI challenges is crucial and groundbreaking.

Why we’ve been invited to speak:
The most challenging piece about voice AI agents is that once you start using them, you realize the software pipeline only looks at the text behind the conversations. As we were implementing voice AI agents for multiple customers, we realized we were missing essential elements: the pitch, the tone, and the way customers react to certain words the bot was saying. So, in addition to solving the classic LLM problems of latency and hallucinations, which continue to be big challenges, we also needed to make the voice AI agents’ conversations more authentic. To do this, we decided to build speech-to-speech AI models for real-time performance, from voice recognition to response generation. Our software stack makes customer interactions fast, natural, and context-aware. This idea resonated with NVIDIA. That’s why we’re building these models on the broader NVIDIA architecture.

What we’ll be presenting:
At GTC, we’ll be sharing the specific innovations we've invented and what they really mean, not just for voice AI agents but for the voice AI industry as a whole. We’ll also reveal how our solution:
- Is trained on over 14 million hours of multilingual conversational data
- Has out-of-the-box support for 14 languages
- Is equipped with accurate, context-rich responses that adapt to real-time interactions
- Transforms raw data into strategic customer intelligence
- Provides insights that elevate enterprise decision-making
- Incorporates industry-leading software like NVIDIA TensorRT-LLM, NVIDIA Triton, and NVIDIA Riva

The key features we're presenting are about how we merge multiple elements in the pipeline of a voice AI agent to reduce latency to 250 ms, making interactions human-like. We’ll share the algorithms and the pipeline we used to make this possible, as well as the broader NVIDIA architecture that made it possible to design these voice agent flows for customers.

It’s going to be great. See you there!

Ganesh Gopalan Bharath Shankar Thoshith S Avinash Benki John Philip NVIDIA NVIDIA GTC
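The post does not describe the pipeline internals, but one common way "merging multiple elements in the pipeline" cuts latency is to overlap stages so each one starts on partial output from the previous stage instead of waiting for it to finish. Below is a minimal asyncio sketch of that idea for a streaming ASR → LLM → TTS flow; all function names, timings, and queue structure are hypothetical illustrations, not Gnani.ai's or NVIDIA's actual stack.

```python
# Hypothetical illustration only: overlapping ASR -> LLM -> TTS stages with
# asyncio queues so downstream work starts on partial results instead of
# waiting for each stage to finish. Not Gnani.ai's or NVIDIA's actual code.
import asyncio

async def asr_stream(audio_chunks, out_q):
    # Pretend ASR: emit a partial transcript per audio chunk.
    for chunk in audio_chunks:
        await asyncio.sleep(0.05)          # simulated recognition time
        await out_q.put(f"text[{chunk}]")  # partial transcript
    await out_q.put(None)                  # end-of-stream marker

async def llm_stream(in_q, out_q):
    # Pretend LLM: start generating response fragments as transcripts arrive.
    while (partial := await in_q.get()) is not None:
        await asyncio.sleep(0.08)          # simulated generation time
        await out_q.put(f"reply-to({partial})")
    await out_q.put(None)

async def tts_stream(in_q):
    # Pretend TTS: synthesize audio per response fragment as it arrives.
    while (fragment := await in_q.get()) is not None:
        await asyncio.sleep(0.04)          # simulated synthesis time
        print("play:", fragment)

async def main():
    text_q, reply_q = asyncio.Queue(), asyncio.Queue()
    audio = ["chunk0", "chunk1", "chunk2"]
    # All three stages run concurrently; time to first audio is roughly one
    # chunk through each stage, not the sum of full-stage durations.
    await asyncio.gather(
        asr_stream(audio, text_q),
        llm_stream(text_q, reply_q),
        tts_stream(reply_q),
    )

asyncio.run(main())
```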
Clem Delangue 🤗
As Jensen mentioned with Brad Gerstner & Bill Gurley, something that few people know is that NVIDIA is becoming the American open-source leader in AI, with over 400 contributions of models, datasets, and apps on Hugging Face in the last 12 months. And I have a feeling they're just getting started!
Ameer Haj Ali, PhD
Unpopular opinion: the RL hype is a red flag for technical depth. When I hear founders or VCs casually dropping "reinforcement learning" in pitches, it's usually code for "we don't actually understand our problem space."

Here's the reality check. RL only works when you have:
- Perfect simulation environments
- Extremely fast iteration cycles
- Clear reward functions
- Tolerance for massive compute costs

That's why it succeeds in games (perfect simulation) and search optimization (massive compute capacity at DeepMind AlphaGo / Anthropic / OpenAI) but fails spectacularly in the messy, slow-feedback real world. The graveyard of "RL-powered" startups tells the story: they burned through $$$$ trying to apply algorithms to problems that needed simple heuristics or supervised learning.

Most of the time, a well-tuned evolutionary algorithm will outperform your fancy RL setup while using <1/10th the compute and delivering results you can actually explain to stakeholders. The next time someone pitches you an RL solution, ask them: "Have you tried solving this with a basic optimization approach first?" Watch how quickly the conversation changes.

I am surprised when people are surprised that Direct Preference Optimization (DPO) can exceed PPO-based RLHF [1], or that reflective prompt evolution can outperform reinforcement learning [2].

[1] https://lnkd.in/gCY3bcTQ
[2] https://lnkd.in/gQRwyQyM
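For concreteness, here is a minimal sketch of the kind of "basic optimization approach" baseline the post recommends trying first: a (1+1) evolution strategy hill-climbing a black-box objective. The objective below is a toy placeholder and every name is an illustration, not code from the post; the point is that a few dozen lines with no simulator or reward shaping often make a strong baseline.

```python
# Minimal (1+1) evolution strategy on a toy black-box objective.
# Illustration of a simple optimization baseline; swap in your own metric.
import random

def objective(x):
    # Toy score to maximize (replace with your real black-box metric).
    return -sum((xi - 3.0) ** 2 for xi in x)

def one_plus_one_es(dim=4, steps=2000, sigma=0.5, seed=0):
    rng = random.Random(seed)
    best = [rng.uniform(-10, 10) for _ in range(dim)]
    best_score = objective(best)
    for _ in range(steps):
        # Mutate the current best with Gaussian noise; keep it only if better.
        cand = [xi + rng.gauss(0.0, sigma) for xi in best]
        score = objective(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

if __name__ == "__main__":
    solution, score = one_plus_one_es()
    print("best point:", [round(v, 3) for v in solution], "score:", round(score, 4))
```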
Anish Acharya
The last few weeks feel like a mini breakthrough for AI apps hiding in plain sight: partial autonomy, high-agency models, and context engineering vs. prompt engineering are all incredibly important concepts. A lot of these principles were apparent but were only recently articulated (mostly by Karpathy - legend!!).

One is the concept of partial autonomy apps: autonomy sliders, app-specific UI, and “keeping the AI on a leash.”

Two is the idea of small, high-agency models that know how to solve problems vs. knowing answers, via tool use / reasoning / multimodality (“maximally sacrifices encyclopedic knowledge for capability”).

Three is context engineering vs. prompt engineering: putting the right stuff into working memory, building around the “jagged intelligence” of models, etc.

These should probably be the starting point for founders thinking through AI-native apps.

Refs:
Partial autonomy apps: [Karpathy @ YC startup school]
High-agency mini models: https://lnkd.in/gthCXtMe
Context Eng: https://lnkd.in/gha2i7BX
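As a rough sketch of what "context engineering" can mean in practice, here is a small, hypothetical example of assembling a model's working memory (instructions, tool results, retrieved docs, history) under a token budget instead of hand-crafting a single prompt string. The class names, priority scheme, and 4-characters-per-token heuristic are assumptions for illustration, not from the post or any specific framework.

```python
# Hypothetical "context engineering" sketch: pick what goes into the model's
# working memory under a token budget, keeping the most important items.
from dataclasses import dataclass

@dataclass
class ContextItem:
    text: str
    priority: int  # lower number = more important, dropped last

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return max(1, len(text) // 4)

def build_context(items: list[ContextItem], budget_tokens: int) -> str:
    # Greedily keep the most important items that still fit the budget.
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget_tokens:
            chosen.append(item)
            used += cost
    # chosen is already in priority order, so instructions come first.
    return "\n\n".join(i.text for i in chosen)

context = build_context(
    [
        ContextItem("System: you are a support agent for ACME.", priority=0),
        ContextItem("Tool result: order #123 shipped yesterday.", priority=1),
        ContextItem("Retrieved doc: returns are accepted within 30 days.", priority=2),
        ContextItem("Earlier chat history (long)... " * 50, priority=3),
    ],
    budget_tokens=200,
)
print(context)
```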
Alessandro Duico
Nano Banana Hackathon hosted by Google DeepMind and Cerebral Valley in SF, with fal and ElevenLabs. The bar was incredibly high. Believe it or not, the 1st prize winners (Banana Peel) built a social network for editing photos through threaded comments in just a few hours, with Nano Banana generating images live.

• What’s Nano Banana? Google’s advanced image generation and editing model (officially Gemini 2.5 Flash Image), delivering top-tier quality at a low cost: just $0.039 per image.
• High-speed outputs: Not only affordable, it’s also blazing fast, generating 1024x1024 images in ~2.3 seconds on cloud, using 2.1GB of GPU memory with 15% better energy efficiency than competitors. Expect it to run on TPU-powered Android phones soon.
• Multimodal: Built on the Multimodal Diffusion Transformer (MMDiT), it processes text prompts and multiple images together in a single workflow, similar to gpt-image-1 (ChatGPT’s image generation). This enables seamless image merges (e.g., blending a person, clothes, and scene), style transfers, and consistent character details with accurate lighting and perspective.

Try it out for free on Google AI Studio, or as a third-party model on fal. Ivan Porollo Paige Bailey Burkay Gur
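For anyone who wants to try it programmatically rather than in AI Studio, a minimal sketch with the google-genai Python SDK might look like the following. The exact model identifier ("gemini-2.5-flash-image-preview") and the response-part handling are assumptions that may have changed; check the current Gemini API docs before relying on them.

```python
# Minimal sketch (assumptions noted above): generate an image with the
# google-genai SDK and save any inline image bytes returned.
# Expects an API key in the GEMINI_API_KEY environment variable.
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",  # assumed identifier; verify in docs
    contents=["A banana wearing sunglasses, photorealistic, studio lighting"],
)

# The response may mix text parts and inline image data.
for part in response.candidates[0].content.parts:
    if getattr(part, "inline_data", None) is not None:
        with open("nano_banana_output.png", "wb") as f:
            f.write(part.inline_data.data)
        print("saved nano_banana_output.png")
    elif getattr(part, "text", None):
        print(part.text)
```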