Very ML
State-of-the-art Machine Learning News Feed
/r/MachineLearning
last post 1 hour ago
[D] Best papers of 2025
1 hour ago @ reddit.com
[D] Best survey papers of 2025?
1 hour ago @ reddit.com
[R] 92.86% CIFAR-100 in <10 minutes using Analytical Manifold Dilation (AMD)
12 hours ago @ reddit.com
[R] Octonion Bitnet with fused Triton kernels
13 hours ago @ reddit.com
[D] The Intelligence-Energy Bound: Thermodynamic framework for AI scaling limits (feedback requested)
14 hours ago @ reddit.com
[D] Feedback or Collaboration on Machine Learning Simulations?
1 day, 1 hour ago @ reddit.com
[D] Is Implementing Machine Learning Algorithms from Scratch Still Worth It for Newers?
1 day, 4 hours ago @ reddit.com
[P] The Story Of Topcat (So Far)
1 day, 5 hours ago @ reddit.com
[P] SIID: A scale invariant pixel-space diffusion model; trained on 64x64 MNIST, generates readable 1024x1024 digits for arbitrary ratios with minimal deformities (25M parameters)
1 day, 7 hours ago @ reddit.com
[D] Any success with literature review tools?
1 day, 8 hours ago @ reddit.com
[D] 2025 Year in Review: The old methods quietly solving problems the new ones can't
1 day, 9 hours ago @ reddit.com
[P] How I built the edit model behind Tab completion for a coding agent
1 day, 11 hours ago @ reddit.com
[D] Paper Accepted Then Rejected: Can We Use Sky Sports Commentary Videos for Research? Need Advice
1 day, 14 hours ago @ reddit.com
[D] ML coding interview experience review
1 day, 15 hours ago @ reddit.com
[P] PixelBank - Leetcode for ML
1 day, 21 hours ago @ reddit.com
Towards Data Science
last post 7 hours ago
Keeping Probabilities Honest: The Jacobian Adjustment

After transformation, this maps to an interval around \( y = x^2 \) with width \( \Delta y \approx \left| g'(x) \right| \Delta x = |2x| \Delta x \).

After transformation \( y = g(x) \), this interval maps to an interval around \( y \) with width \( \Delta y \approx \left| g'(x) \right| \Delta x \).

Original X histogram, zoomed on small values (X < 1), with equal small intervals of width 0.1 — to show the source of compression.

Original X histogram for larger values (X > 1), with equal intervals of width 1 — to show the source of stretching.

It is a direct application of the Probability Integral Transform (PIT): if \( Y = F_X(X) \) and \( X \) is continuous, then \( Y \sim \text{Uniform}[0,1] \).
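
A minimal sketch of the width argument above, assuming X ~ Uniform(0, 1) and g(x) = x^2 purely for illustration (the function names are hypothetical and are not taken from the article). It checks that the probability mass of a small interval is unchanged once the density is divided by |g'(x)|:

```python
import math


def uniform_pdf(x: float) -> float:
    """Density of X ~ Uniform(0, 1), an assumed example distribution."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def density_of_square(y: float) -> float:
    """Density of Y = X**2 via the Jacobian adjustment p_Y(y) = p_X(x) / |g'(x)|."""
    x = math.sqrt(y)         # inverse of g(x) = x**2 on x >= 0
    jacobian = abs(2.0 * x)  # |g'(x)| = |2x|
    return uniform_pdf(x) / jacobian


# A small interval around x and its image around y = g(x).
x, dx = 0.5, 1e-4
y = x ** 2
dy = abs(2.0 * x) * dx  # delta_y ~ |g'(x)| * delta_x

mass_x = uniform_pdf(x) * dx        # probability mass before the transformation
mass_y = density_of_square(y) * dy  # probability mass after the transformation
```

Under these assumptions the two masses agree, which is the sense in which the Jacobian factor keeps the transformed probabilities consistent.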

7 hours ago @ towardsdatascience.com
Why MAP and MRR Fail for Search Ranking (and What to Use Instead)

often use Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) to assess the quality of their rankings.

In this post, we will discuss why \(MAP\) and \(MRR\) are poorly aligned with modern user behavior in search ranking.

Mean Reciprocal Rank (MRR): Mean Reciprocal Rank (\(MRR\)) is the average of the reciprocal rank at which the first relevant item occurs.

If the model ranks relevant items later, Precision@k reduces due to a larger k. Averaging the Precisions: We average the precisions over the total number of relevant items.

Why MAP and MRR are Bad for Search Ranking: Now that we have covered the definitions, let’s understand why \(MAP\) and \(MRR\) are not used for search results ranking.
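
The two definitions above can be sketched as follows. This is a minimal illustration that assumes binary relevance labels listed in ranked order and that every relevant item appears in the list; the function names and the toy queries are hypothetical, not taken from the article.

```python
def reciprocal_rank(relevance):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def average_precision(relevance):
    """Mean of Precision@k taken at each rank k that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # Precision@k at this relevant position
    return precision_sum / hits if hits else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Two toy queries; 1 marks a relevant result, 0 an irrelevant one.
queries = [[0, 1, 0, 1], [1, 0, 0, 0]]
mrr = mean(reciprocal_rank(q) for q in queries)
map_score = mean(average_precision(q) for q in queries)
```

Both metrics depend only on where the relevant items sit in the ranking and use nothing but binary labels.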

9 hours ago @ towardsdatascience.com
The Machine Learning “Advent Calendar” Day 24: Transformers for Text in Excel

In this article, we will focus on one core idea only: how the attention matrix transforms input embeddings into something more meaningful.

Transformers in Excel – all images by author. At the input level, we deliberately use the same embedding for the word “mouse” in both cases.

2.4 Interpreting the attention matrix: The attention matrix is the central object of self-attention.

2.5 From attention weights to output embeddings: The attention matrix itself is not the final result.

They learn: how to compare words, how to route information, how to project meaning into different spaces. The attention matrix controls where information flows.
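
A minimal sketch of how an attention matrix turns input embeddings into output embeddings. It assumes a simplified setting with no learned projections, so queries, keys, and values are all the raw embeddings; the function names and the toy 2-dimensional vectors are illustrative and not taken from the article's spreadsheet.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Return (attention_matrix, outputs) using the embeddings as Q, K and V."""
    dim = len(embeddings[0])
    # Scaled dot-product score between every pair of tokens.
    scores = [
        [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in embeddings]
        for query in embeddings
    ]
    weights = [softmax(row) for row in scores]
    # Each output is a weighted average of all input embeddings.
    outputs = [
        [sum(w * value[j] for w, value in zip(row, embeddings)) for j in range(dim)]
        for row in weights
    ]
    return weights, outputs


# Three toy tokens; the first and third share the same input embedding.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
attention, contextual = self_attention(tokens)
```

Each row of the attention matrix sums to one, so every output embedding is a convex combination of the input embeddings.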

1 day, 2 hours ago @ towardsdatascience.com
Is Your Model Time-Blind? The Case for Cyclical Feature Encoding

Cyclical encoding prevents fake “long distances” at boundaries.

Cyclical encoding removes sharp discontinuities at midnight.

Cyclical encoding gives them a head start.

If yes, consider cyclical encoding.

Cyclical encoding with sine and cosine fixes this elegantly, preserving proximity, reducing artifacts, and helping models learn faster.
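
A minimal sketch of sine and cosine encoding for an hour-of-day feature, assuming a 24-hour period. The helper names are hypothetical and only illustrate the idea.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a, b, period=24):
    """Euclidean distance between two values after cyclical encoding."""
    sin_a, cos_a = encode_cyclical(a, period)
    sin_b, cos_b = encode_cyclical(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


raw_gap = abs(23 - 0)                      # naive encoding: 23:00 and 00:00 look far apart
wrapped_gap = encoded_distance(23, 0)      # cyclical encoding: one hour apart
neighbour_gap = encoded_distance(0, 1)     # also one hour apart
opposite_gap = encoded_distance(0, 12)     # half a day apart
```

After encoding, 23:00 and 00:00 are as close as 00:00 and 01:00, while the raw values differ by 23.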

1 day, 7 hours ago @ towardsdatascience.com
4 Techniques to Optimize AI Coding Efficiency

In a previous article, I covered some of the most important techniques I utilize to code effectively with AI agents.

If you’re not coding using AI agents, you’re falling behind.

Why you should code with AI agents: I’ve previously described how coding with AI agents makes me a lot more effective as a programmer.

Another important point here is that you give your coding agents enough permissions to run for an extended period of time.

Conclusion: In this article, I’ve discussed four specific techniques I use every single day when I’m coding.

1 day, 9 hours ago @ towardsdatascience.com
Bonferroni vs. Benjamini-Hochberg: Choosing Your P-Value Correction

What if we observe a rare background decay rate, a decay rate that simply resembles that of an undiscovered decaying particle?

We can use the Family Wise Error Rate (FWER) and the Bonferroni correction, or the False Discovery Rate (FDR) and the Benjamini-Hochberg procedure. These are not interchangeable!

\[\text{Pr}(A_1 \cup A_2 \cup \cdots \cup A_k) \le \sum_{i=1}^{k} \text{Pr}(A_i)\]

False Discovery Rate (Benjamini-Hochberg): The Benjamini-Hochberg procedure also isn’t too complicated.

Accept the first k where pₖ > α/(m−k+1).

In this approach, the goal is to control the false discovery rate (FDR).
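
A minimal sketch contrasting the two corrections, assuming their textbook forms: Bonferroni rejects a hypothesis when p ≤ α/m, and Benjamini-Hochberg rejects the k smallest p-values, where k is the largest rank with p₍ₖ₎ ≤ (k/m)α. The α/(m−k+1) threshold quoted in the excerpt is a different, step-down style rule and is not implemented here. The function names and the p-values are illustrative assumptions, not the article's code or data.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up rule: reject the k smallest p-values, k = max rank with p_(k) <= k/m * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank  # remember the largest rank that passes
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject


# Hypothetical p-values from eight tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
bonferroni_rejections = sum(bonferroni(p_values))
bh_rejections = sum(benjamini_hochberg(p_values))
```

With these illustrative p-values, Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, which shows the FDR-controlling procedure being less conservative.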

1 day, 10 hours ago @ towardsdatascience.com
The Machine Learning “Advent Calendar” Day 23: CNN in Excel

From the model’s point of view, “not good” and “good not” looked exactly the same.

Building a 1D CNN for text in Excel: In this article, we build a 1D CNN architecture in Excel with the following components. Embedding dictionary: We use a 2-dimensional embedding.

1D CNN in Excel – all images by author. This pipeline corresponds to a standard CNN text classifier.

This makes every step visible and easy to understand in Excel, while keeping the logic identical to deeper CNN architectures.

We will only use a dictionary of three words: good, bad, and not.

1 day, 20 hours ago @ towardsdatascience.com
How Agents Plan Tasks with To-Do Lists

To-do lists help agents plan and track complex tasks more effectively, making them especially useful for multi-tool coordination and long-running operations where progress needs to be visible.

(2) Key Components of To-Do Capabilities: A planning agent’s to-do list management capabilities boil down to these four key components: a to-do task item, a list of to-do items, a tool that writes and updates the to-do list, and a to-do system prompt update. The TodoListMiddleware brings these elements together to enable an agent’s to-do list capabilities.

Let’s take a closer look at each component and how it is implemented in the to-do middleware code.

(2.1) To-do task item: A to-do item is the smallest unit in a to-do …

2 days, 7 hours ago @ towardsdatascience.com
Stop Retraining Blindly: Use PSI to Build a Smarter Monitoring Pipeline

PSI: The Data Smoke Detector. The Population Stability Index (PSI) is a classic tool.

PSI checks if your model’s current data has changed too much compared to the data used to build it.

Train Regression Model: model = LinearRegression().fit(X_ref, y_ref). Now, let’s generate some drifted data.

Calculate PSI for the drifted feature:

    for v in df.columns[:-1]:
        psi_value = psi(X_ref[v], X_new[v])
        print(f"PSI Score for Feature {v}: {psi_value:.4f}")

Output:

    PSI Score for Feature var1: 2.3016
    PSI Score for Feature var2: 0.0546
    PSI Score for Feature var3: 0.1078

And, finally, let us check the impact it has on the estimated y.

Before You Go: We saw how simple it is to calculate PSI, and how it can show us where the …
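
The excerpt calls a psi() helper without showing its body. A minimal sketch of one common way to compute the index is given here, assuming equal-width bins derived from the reference sample and a small floor on empty bins; the signature, bin count, and floor value are assumptions and may differ from the article's implementation.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index: sum((a - e) * ln(a / e)) over shared bins."""
    low, high = min(expected), max(expected)
    width = (high - low) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - low) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the reference range
            counts[idx] += 1
        # Floor empty bins so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


reference = [float(v) for v in range(100)]
shifted = [v + 50.0 for v in reference]  # hypothetical drifted sample

psi_same = psi(reference, reference)
psi_shifted = psi(reference, shifted)
```

Identical samples give a PSI of zero, and the shifted sample gives a large positive value.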

2 days, 9 hours ago @ towardsdatascience.com
Synergy in Clicks: Harsanyi Dividends for E-Commerce

To do so, he decides to calculate Harsanyi Dividends to see which group collaborated the most effectively.

For convenience, here are their individual aggregated scores mentioned earlier: v(a) = 10, v(b) = 12, v(c) = 18. Can you guess what their Harsanyi Dividends are?
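
One way to check a guess is the recursive definition of the Harsanyi dividend: a coalition's dividend is its value minus the dividends of all its proper non-empty sub-coalitions. In the sketch below, only the three single-player scores come from the text; every multi-player coalition value is a hypothetical number invented for illustration, and the function name is an assumption.

```python
from itertools import combinations


def harsanyi_dividends(values):
    """Compute d(S) = v(S) - sum of d(T) over proper non-empty subsets T of S."""
    players = sorted(set().union(*values.keys()))
    dividends = {}
    # Process coalitions by increasing size so sub-coalition dividends already exist.
    for size in range(1, len(players) + 1):
        for members in combinations(players, size):
            coalition = frozenset(members)
            inherited = sum(d for sub, d in dividends.items() if sub < coalition)
            dividends[coalition] = values.get(coalition, 0) - inherited
    return dividends


values = {
    frozenset("a"): 10,    # from the text
    frozenset("b"): 12,    # from the text
    frozenset("c"): 18,    # from the text
    frozenset("ab"): 25,   # hypothetical
    frozenset("ac"): 28,   # hypothetical
    frozenset("bc"): 30,   # hypothetical
    frozenset("abc"): 50,  # hypothetical
}
dividends = harsanyi_dividends(values)
```

For a single player the dividend equals the player's own score, so the three individual dividends are 10, 12, and 18. Across all coalitions the dividends add up to the value of the grand coalition.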

Parallel programming can significantly reduce the time required to compute Harsanyi dividends when working with large datasets.

Given the context in which one would likely use Harsanyi Dividends, individual players aren’t invaluable, but they are practical in that context.

Image provided by the author. These are the multiple-player coalitions with the largest Harsanyi Dividends; in other words, the players who generate the most synergy…

2 days, 10 hours ago @ towardsdatascience.com
The Machine Learning “Advent Calendar” Day 22: Embeddings in Excel

In this article, we focus on text embeddings, explain their role in the machine learning landscape, and show how they can be understood and explored in Excel.

All the previous articles deal with classic machine learning, which can be described in two complementary ways.

From an optimization point of view, deep learning does not introduce a new learning rule.

CNN in Excel – all images by author. For images, the challenge is not to make the data numerical, but to extract meaningful representations from already numerical data.

2.2 Supervised embeddings: In supervised learning, embeddings are learned as part of a prediction task.

2 days, 23 hours ago @ towardsdatascience.com
The Machine Learning “Advent Calendar” Day 21: Gradient Boosted Decision Tree Regressor in Excel

In the previous article, we introduced the core mechanism of Gradient Boosting through Gradient Boosted Linear Regression.

Gradient Boosting Algorithm: The Gradient Boosting algorithm follows a simple and repetitive structure.

Datasets for Gradient Boosted Decision Tree Regressor – all images by author. 2.3 Initialization: The Gradient Boosting process starts with a constant model.

3.3 General comparison with other models: Compared to a single decision tree, Gradient Boosted Trees produce smoother predictions, reduce overfitting, and improve generalization.

These combined properties explain why Gradient Boosted Decision Tree Regressors perform so well across a wide range of real-world applications.

2 days, 23 hours ago @ towardsdatascience.com
The Machine Learning “Advent Calendar” Day 20: Gradient Boosted Linear Regression in Excel

This is probably the first time you have heard about applying Gradient Boosted Linear Regression.

Fitting Linear Regression to Residuals: We fit a new base model (here, a linear regression) to these residuals.

Gradient Boosted Linear Regression: simple dataset with linear regression (image by author). 3.2 Gradient Boosting algorithm: The implementation of these formulas is straightforward in Google Sheets or Excel.

Secondly, when adding a linear regression to another linear regression, it is still a linear regression.

Gradient Boosted Linear Regression, on the other hand, collapses back to a single linear model and provides no additional information about uncertainty.
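
A minimal sketch of why the boosted ensemble collapses to one line. It assumes a single feature, squared-error residuals, a zero initial prediction, and a fixed learning rate; the function names and the toy data are hypothetical. Because every stage is itself a line, the stages can be accumulated into a single slope and intercept, which approach the ordinary least-squares fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def boost_lines(xs, ys, rounds=30, learning_rate=0.5):
    """Gradient boosting with linear base learners, accumulated into one line."""
    slope, intercept = 0.0, 0.0
    for _ in range(rounds):
        # Residuals of the current ensemble under squared error.
        residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        stage_slope, stage_intercept = fit_line(xs, residuals)
        # Adding a line to a line gives another line.
        slope += learning_rate * stage_slope
        intercept += learning_rate * stage_intercept
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]  # hypothetical data
ols_slope, ols_intercept = fit_line(xs, ys)
boosted_slope, boosted_intercept = boost_lines(xs, ys)
```

In this setup the boosted coefficients match the direct least-squares coefficients up to a small numerical tolerance, so the extra stages add no modelling capacity.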

3 days ago @ towardsdatascience.com
ChatLLM Presents a Streamlined Solution to Addressing the Real Bottleneck in AI

The hidden tax of cognitive overhead: One of the least-discussed costs of today’s AI workflows isn’t money or performance.

Every additional tool, model choice, pricing tier, and interface introduces a small decision.

Cost is back in the conversation: As AI workflows become more multimodal, the economics start to matter again.

Without context, AI is clever but shallow.

With context, AI can feel genuinely useful.

3 days, 3 hours ago @ towardsdatascience.com
The Geometry of Laziness: What Angles Reveal About AI Hallucinations

Figure 1: Semantic geometry of grounding.

This clustering is how the embedding models are trained.

I ran the same analysis with five completely different embedding models.

Correlation between the different models and architectures used in the experiment.

It’s consistent across embedding models.

3 days, 7 hours ago @ towardsdatascience.com
Distill.pub
last post: none
TheSequence
last post 10 hours ago
The Sequence Opinion #778: After Scaling: The Era of Research and New Recipes for Frontier AI

Created Using GPT-5. For the last few years, AI progress has felt almost… procedural.

That mood is what Ilya Sutskever calls the “age of scaling”—a period where one word (“scaling”) basically told an entire industry what to do next.

What are the techniques that convert compute into genuine generalization—models that learn faster, adapt better, and make fewer weird mistakes?

Below is a map of the most promising technique-clusters that could plausibly unlock the next wave of frontier innovation.

It’s not a list of “one weird trick.” It’s more like a toolbox for the post-pretraining world.

10 hours ago @ thesequence.substack.com
The Sequence AI of the Week #777: Thinking Fast, Thinking Cheap: The Nemotron 3 Blueprint

Last week, NVIDIA quietly redefined the baseline for open-weight intelligence with the release of the Nemotron 3 family.

The headline isn’t just the benchmarks (though the Nemotron 3 Nano creates a new state-of-the-art for the 30B parameter class); it is the architecture.

We are seeing the first major industrial-scale deployment of a Hybrid Mamba-Transformer Mixture-of-Experts (MoE).

Image Credit: NVIDIA. For the enterprise, this is a signal that the “brute force” era of dense Transformers is ending.

The Architecture: A Hybrid “Frankenstein” (in the best way)

1 day, 10 hours ago @ thesequence.substack.com
The Sequence Knowledge #776: Fake It 'Til You Make It: How RL is Perfecting Synthetic Data.

Created Using Gemini 3. Today we will discuss: The idea of using reinforcement learning (RL) environments to generate synthetic data.

The famous Reflexion paper about improving AI agents using RL data generation.

💡 AI Concept of the Day: Synthetic Data Generation with RL Environments. When real-world data is scarce or privacy-restricted, reinforcement learning (RL) environments become a force multiplier for synthetic data.

This is especially potent for domains where outcomes are verifiable but logs are limited (coding sandboxes, web automation, spreadsheets/SQL, robotics-in-sim).

By executing tasks rather than describing them, RL pipelines mint trajectories that teach models how to act under cons…

2 days, 10 hours ago @ thesequence.substack.com
The Sequence Radar #775: Last Week in AI: Tokens, Throughput, and Trillions

In the AI of the week edition, we discuss NVIDIA’s amazing Nemotron 3 release.

📝 Editorial: Last Week in AI: Tokens, Throughput, and Trillions. This week’s AI story didn’t arrive as one dramatic demo; it arrived as a synchronized upgrade across the stack, with capital, platforms, and product surfaces all moving in lockstep.

AI Lab: Tongyi Lab (Alibaba Group). Summary: QwenLong-L1.5 proposes an end-to-end post-training recipe for long-context reasoning, combining a scalable synthesis pipeline for multi-hop, globally-grounded tasks with stabilized long-context RL (including task-balanced sampling, task-specific advantage estimation, and AEPO).

AI Lab: NVIDIA. Summary: This wh…

4 days, 10 hours ago @ thesequence.substack.com
The Sequence Opinion #774: Everything You Need to Know About Audio AI Frontier Models

From speech recognition and voice synthesis to music generation and environmental sound analysis, frontier AI models for audio are tackling challenges unique to sound.

The goal is to provide a comprehensive, engaging overview of cutting-edge audio AI for a technically savvy audience.

Audio’s Unique Challenges and Opportunities: Audio data is fundamentally different from text or images, presenting unique challenges for AI models.

This means even a few seconds of audio involve very long sequences of data points.

Unlike text (which has discrete tokens like words) or images (2D grids of pixels), raw audio is high-frequency and high-dimensional.

1 week ago @ thesequence.substack.com
The Sequence AI of the Week #773: The Week Google Turned Gemini Into an Agent Runtime

Created Using GPT 5.2. Google’s agentic releases last week weren’t “yet another model drop.” They were Google quietly shipping an agent runtime, and then immediately proving it works by dropping a managed research agent that basically demands that runtime to function.

On December 11, 2025, the Gemini API changelog logged two entries that, together, mark a clean architectural pivot: the Interactions API (Beta) and the Gemini Deep Research Agent (Preview).

If you build agents for a living, you already know the pattern: most of your complexity isn’t “getting the model to answer.” It’s dealing with the messy reality that real agent workflows are stateful, tool-heavy, and often long-running.

You bu…

1 week, 1 day ago @ thesequence.substack.com
The Sequence Knowledge #772: Generate Data Using Multiturn Data Synthesis

Created Using GPT-5.2. Today we will discuss: An introduction to multiturn data synthesis for data generation.

A review of the famous Reflexion paper that uses synthetic data to improve AI agents.

💡 AI Concept of the Day: What is Multiturn Data Synthesis?

Multi-turn synthesis and self-play are other important categories in synthetic data generation.

These methods treat data generation as an interactive process rather than a single shot.

1 week, 2 days ago @ thesequence.substack.com
The Sequence Radar #771: Last Week in AI: GPT-5.2, Mistral, and Google’s Agent Stack

Created using GPT-5. Next Week in The Sequence: Learn more about synthetic data generation with a deep dive into multi-turn data synthesis.

Our AI of the Week section dives into Google’s new agentic releases.

AI Lab: Carnegie Mellon University. Summary: The authors build a controlled synthetic reasoning framework to disentangle how pre-training, mid-training, and RL each contribute to reasoning generalization in language models.

Gemini Deep Research Agent: Google released a new Deep Research agent with advanced tool capabilities.

FACTS Benchmark: Google DeepMind released the FACTS Benchmark Suite, three benchmarks to evaluate factuality in AI models.

1 week, 4 days ago @ thesequence.substack.com
The Sequence Opinion #770: The Post-GPU Era: Why AI Needs a New Kind of Computer

What got me thinking about this idea was the announcement of Unconventional AI, which raised a considerable amount of money to work precisely on this problem.

Recent events underscore this concern: a new startup called Unconventional AI made headlines by raising an unprecedented $475 million seed round to develop radically new computing hardware for AI.

The human brain performs extraordinary feats on only ~20 watts of power, whereas training a single large AI model can devour megawatt-hours.

The sheer gap suggests that AI might require a new form of computing to continue its trajectory.

The Reign of Matrix Multiplications and GPUs

2 weeks ago @ thesequence.substack.com
The Sequence AI of the Week #769: Inside Gemini Deep Think

Gemini Deep Think is one of the most innovative architectures of recent times, and yet we know so little about it.

Today, I would like to summarize some of the things I learned about Deep Think.

Gemini Deep Think made news when it scored a gold medal at the 2025 International Math Olympiad using a parallel technique on top of the standard Gemini model.

It embodies the current frontier idea that how a model uses its compute at inference time matters as much as raw parameter count.

From chain-of-thought hacks to “thinking models”

2 weeks, 1 day ago @ thesequence.substack.com
The Sequence Knowledge #768: Using Rephrasing for Synthetic Data Generation

Today we will discuss: Understanding the different types of rephrasing methods for synthetic data generation.

Diving inside Microsoft’s Evol-Instruct method to create highly sophisticated synthetic instruction datasets.

💡 AI Concept of the Day: Understanding the Types of Rephrasing Methods for Synthetic Data Generation

Rephrasing is the most reliable way to expand a labeled dataset without changing its ground truth.

In language tasks this means paraphrasing instructions, questions, or rationales; in code it means altering comments, identifiers, or scaffolding while keeping unit tests green; in multimodal alignment it means rewriting captions or prompts without altering the …

2 weeks, 2 days ago @ thesequence.substack.com
The Sequence Radar #767: Last Week in AI: Google Logic, Amazon Utility, and Mistral Efficiency

📝 Editorial: Last Week in AI: Google Logic, Amazon Utility, and Mistral Efficiency

The focus of model development shifted noticeably this week.

The most technically significant release is Gemini 3 Deep Think.

Instead of the standard immediate next-token prediction, Deep Think utilizes a “parallel thinking” process.

🤖 AI Tech Releases

Gemini 3 Deep Think: Google released Gemini 3 Deep Think, its innovative reasoning model, which scored a gold medal at the recent International Math Olympiad.

Mistral 3: Mistral released Mistral 3, which includes three small models (14B, 8B, and 3B) and Mistral Large 3.

2 weeks, 4 days ago @ thesequence.substack.com
The Sequence Opinion #766: Why Agents Need a “Headless” Internet

Today’s installment discusses a topic that I have spent a lot of time thinking about recently.

Do we need to reimagine the web as it is for AI agents?

This idea is not as crazy as it might sound and there are already solid efforts in the space.

However, there are also plenty of challenges.

The Bifurcation of the Web

3 weeks ago @ thesequence.substack.com
The Sequence AI of the Week #765: Diving into Claude Opus 4.5

Today, we are going to dive into the hottest AI release of last week.

Claude Opus 4.5 is Anthropic’s new flagship model in the Claude 4.5 family, and it’s very clearly optimized around a single thesis: large language models are no longer just chatbots, they’re operating systems for agents.

At the core, Opus 4.5 is a large decoder-only transformer trained with next-token prediction on a broad mixture of internet text, code, documents, and synthetic data, continuing the Claude lineage.

Anthropic doesn’t publish layer counts or parameter numbers, but from its behavior and public documentation it’s clear that the model combines high capacity with careful optimization for long…

3 weeks, 1 day ago @ thesequence.substack.com
The Sequence Knowledge #764: Wanna do Synthetic Data? Learn About Rephrasing

Today we will discuss: An introduction to rephrasing methods for synthetic data generation.

A review of HuggingFace’s Cosmopedia synthetically generated dataset.

💡 AI Concept of the Day: An Introduction to Rephrasing

Rephrasing is the workhorse of synthetic data generation: you start with a correctly labeled seed example and produce semantically equivalent variants that preserve the label while expanding coverage.

For text, that means rewriting an instruction, query, or rationale without changing its truth conditions; for code, it’s altering comments, variable names, or scaffolding without affecting behavior; for multimodal tasks, it can be caption restyling or prompt re…

3 weeks, 2 days ago @ thesequence.substack.com
📓 Cool Blogs
ODS.ai Habr
last post: 3 months, 1 week ago
SWE-MERA: a new dynamic benchmark for agentic code generation models

However, all tasks in MERA CODE, as in SWE-bench and other benchmarks of this kind, follow the classic paradigm in which there is a fixed training set and, more importantly, a fixed evaluation set.

But the large language models for coding that we are trying to evaluate with our benchmark also learn from GitHub, and have done so since the very first LLaMa model.

700 tasks may not seem like many, but it is already a very decent number and, most importantly, these are new tasks.

Current behavior:
from sympy import ask, Q, Symbol
x = Symbol('x')
print(ask(Q.finite(x**-1), Q.real(x)))  # Output: True
Expected behavior: The function should return None to indicate uncertainty, as x**-…

3 months, 1 week ago @ habr.com
DRAGON: a dynamic benchmark for evaluating RAG systems in Russian

Answer: Keisuke Chiba
SPARQL query (Simple):
SELECT DISTINCT ?s ?r ?o WHERE { { SELECT ?s ?r ?o WHERE { ?s ?r ?o . }

GROUP BY ?s ?r HAVING(count(?o) = 1) } { SELECT ?s ?r ?o WHERE { ?s ?r ?o . }

Answer: National Payment Card System (NSPK); Center for Biometric Technologies (CBT); EBS
SELECT ?s ?r ?o ?len WHERE { { SELECT ?s ?r (COUNT(?o1) as ?len) (GROUP_CONCAT(DISTINCT(STR(?o1));separator="|") AS ?o) WHERE { ?s ?r ?o1 . }

FILTER(?o != ?o1) } GROUP BY ?o ?o1 ?r ?r1 HAVING(COUNT(?s) = 1) } UNION { SELECT ?s ?r ?o ?r1 ?s1 WHERE { ?s ?r ?o .

5 months ago @ habr.com
RKNN Toolkit2: model conversion and Rockchip NPU simulation

In this article, I want to share my experience converting a neural network to the rknn format using the rknn-toolkit2 library.

Here is what the PyTorch model weights look like in Netron. Important!

Converting the ONNX model to rknn: Next, an RKNN object is created, which manages model conversion and inference on the Rockchip platform.

At this stage, the model is prepared for conversion to the RKNN format and for subsequent execution on the Rockchip NPU.

Creating and exporting the rknn model: At this stage, the ONNX model is converted to the internal RKNN format, the graph is optimized, and the model is prepared to run on the Rockchip NPU.

5 months, 1 week ago @ habr.com
MERA Code: a comprehensive evaluation of code generation in applied scenarios

🔗 MERA Code 🔗 GitHub with code and data 🔗 Collection on Hugging Face 🔗 Paper on arXiv 🔗 Project repository on GitVerse

What is MERA Code?

Modern code language models and general-purpose models (ChatGPT, Claude, Qwen, YandexGPT, GigaChat, and others)

List of current MERA Code tasks and their characteristics: the catalog of MERA Code tasks and their detailed descriptions is available on the website.

In MERA Code, prompts are strictly tailored to the task and to selecting the correct answer.

In conclusion: MERA Code is an attempt to close an important gap in LLM testing: how useful the models really are in real-world, localized development.

5 months, 1 week ago @ habr.com
Bayesian Dog: Analyzing the Canine Compass

", I thought. And, luckily, I happened to have the perfect test subject at hand.

The standard arithmetic mean of 360° and 0° gives 180°, even though both 360° and 0° point in the same direction.
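The wrap-around problem described here is commonly handled by averaging unit vectors instead of raw angles. The following is a minimal sketch of that idea, not code from the article; the helper name `circular_mean_deg` is hypothetical and angles are assumed to be in degrees.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of a list of angles given in degrees."""
    # Average the unit vectors (cos, sin) rather than the raw angle values,
    # so that 350 and 10 average to a direction near 0, not 180.
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    # atan2 recovers the direction of the summed vector; % 360 maps it
    # to a compass-style value (rounding may land on 0 or just below 360).
    return math.degrees(math.atan2(s, c)) % 360.0

naive_mean = (350 + 10) / 2          # arithmetic mean points the wrong way
circ_mean = circular_mean_deg([350, 10])
print(naive_mean, circ_mean)
```

The arithmetic mean lands on 180°, while the vector-based mean stays at the shared direction (0°, up to floating-point rounding).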

The null hypothesis states that the data are distributed uniformly around the circle; the alternative states that they are not.

import numpy as np
from pingouin import circ_vtest
v, pval = circ_vtest(data['radians'], dir=np.pi)
print(f"V-statistics: {v:.3f}; p-value: {pval:.6f}")
>> V-statistics: 24.127; p-value: 0.002904

Now we have arrived at something interesting!

Prior distribution and likelihood function. Suppose we have: a prior distribution with parameters; a likelihood function for a new observation with p…

9 months ago @ habr.com
Machine Learning Mastery
last post: 3 days, 6 hours ago
3 Smart Ways to Encode Categorical Features for Machine Learning

In this article, you will learn three reliable techniques for turning categorical features into model-ready numbers while preserving their meaning: ordinal encoding, one-hot encoding, and target (mean) encoding.

Applying target (mean) encoding for high-cardinality features without leaking the target.

fit_transform ( data ) print ( "Original Data:" , data .

The answer lies in Target Encoding, also frequently called Mean Encoding.

Target (Mean) Encoding: This is the powerful answer for high-cardinality features that would overwhelm OHE.
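One common way to apply target encoding "without leaking the target" is to compute each row's encoding only from rows in other folds. The sketch below assumes that approach; the function name `target_encode_oof`, the fold assignment by row index, and the `smoothing` parameter are illustrative choices, not the article's code.

```python
from collections import defaultdict

def target_encode_oof(categories, targets, n_folds=3, smoothing=1.0):
    """Out-of-fold smoothed mean encoding for one categorical column.

    Assumes smoothing > 0 so unseen categories fall back to the prior.
    """
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        # Rows with index % n_folds == fold are held out in this round,
        # so their own targets never contribute to their encoding.
        sums, counts = defaultdict(float), defaultdict(int)
        total, seen = 0.0, 0
        for i in range(n):
            if i % n_folds != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
                total += targets[i]
                seen += 1
        prior = total / seen if seen else 0.0
        for i in range(fold, n, n_folds):
            c = categories[i]
            # Shrink the category mean toward the global prior.
            encoded[i] = (sums[c] + smoothing * prior) / (counts[c] + smoothing)
    return encoded

cats = ["a", "a", "b", "b", "a", "b"]
y = [1, 0, 1, 1, 0, 1]
enc = target_encode_oof(cats, y)
```

Because a row's own label is excluded, flipping `y[0]` leaves `enc[0]` unchanged, which is the property that prevents leakage.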

3 days, 6 hours ago @ machinelearningmastery.com
Pretraining a Llama Model on Your Local GPU

Once trained, you can load it back with the following code:
from tokenizers import Tokenizer
tokenizer = Tokenizer.from_file("bpe_50k.json")

tokenizer = tokenizer self .

device = device self .

save ( { "model" : model .

3 days, 18 hours ago @ machinelearningmastery.com
Rotary Position Embeddings for Long Context Length

dim = dim self .

cat ( ( inv_freq , inv_freq ) , dim = - 1 ) position = torch .

dim = dim self .

pi / inv_freq
max_wavelen = base_length / low_factor
min_wavelen = base_length / high_factor
smooth_factor = (base_length / wavelen - low_factor) / (high_factor - low_factor)
smoothed = (1 - smooth_factor) * inv_freq / scale_factor + smooth_factor * inv_freq
inv_freq = torch.

cat ( ( inv_freq , inv_freq ) , dim = - 1 ) position = torch .
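The fragments above compute a smoothed inverse frequency but the excerpt cuts off before showing how it is selected. A minimal pure-Python sketch of one plausible completion follows. The three-way selection (keep short wavelengths, divide long ones by `scale_factor`, interpolate in between) and the default values (8192, 8.0, 1.0, 4.0) are assumptions modeled on Llama 3.1-style scaling, and `scaled_inv_freq` is a hypothetical name.

```python
import math

def scaled_inv_freq(dim, base=10000.0, base_length=8192,
                    scale_factor=8.0, low_factor=1.0, high_factor=4.0):
    """Return (original, scaled) RoPE inverse frequencies for dim // 2 pairs."""
    inv_freq = [base ** (-(2 * i) / dim) for i in range(dim // 2)]
    max_wavelen = base_length / low_factor
    min_wavelen = base_length / high_factor
    scaled = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < min_wavelen:
            # High-frequency components are left untouched (assumption).
            scaled.append(f)
        elif wavelen > max_wavelen:
            # Low-frequency components are stretched by scale_factor (assumption).
            scaled.append(f / scale_factor)
        else:
            # In between, blend the two using the formulas from the excerpt.
            smooth_factor = (base_length / wavelen - low_factor) / (high_factor - low_factor)
            scaled.append((1 - smooth_factor) * f / scale_factor + smooth_factor * f)
    return inv_freq, scaled

inv, out = scaled_inv_freq(64)
```

With these defaults the fastest-rotating pair keeps its frequency and the slowest is divided by 8, so only the long-range components are stretched to cover the longer context.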

5 days, 6 hours ago @ machinelearningmastery.com
How to Fine-Tune a Local Mistral or Llama 3 Model on Your Own Dataset

In this tutorial, we’ll learn how to fine-tune two powerful open-source models, Mistral 7B and Llama 3 8B, using a customer support question-and-answer dataset.

print("Creating customer support Q&A dataset...")
# Create realistic customer support data
customer_support_data = [ { "instruction": "You are a helpful customer support agent.

decode(outputs[0], skip_special_tokens=True)
# Extract just the response text
if "[/INST]" in response: response = response.

strip() elif "assistant" in response: response = response.

strip() elif "### Response:" in response: response = response.

6 days, 13 hours ago @ machinelearningmastery.com
5 Agentic Coding Tips & Tricks

A practical trick is a diff budget, an explicit limit on lines changed per iteration.

startswith(("+++", "---")))
changed = count_changed_lines(diff)
if changed > MAX_CHANGED_LINES:
    raise ValueError(f"Diff too large: {changed} changed lines")

For manual workflows, bake the constraint into your prompt:

Output only a unified diff
Hard limit: 120 changed lines total
No unrelated formatting or refactors
If you need more, stop and ask for a second patch

Agents respond well to constraints that are measurable.
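The excerpt shows only the tail of the diff-budget check. A self-contained sketch of how such a check might look is given below. The body of `count_changed_lines` is a guess reconstructed from the visible fragment, `enforce_diff_budget` is a hypothetical wrapper, and the limit of 120 is borrowed from the prompt example.

```python
MAX_CHANGED_LINES = 120  # budget borrowed from the prompt example

def count_changed_lines(diff: str) -> int:
    """Count added and removed lines in a unified diff, skipping file headers."""
    return sum(
        1
        for line in diff.splitlines()
        # "+"/"-" mark changes; "+++"/"---" are the per-file header lines.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )

def enforce_diff_budget(diff: str, limit: int = MAX_CHANGED_LINES) -> int:
    """Return the number of changed lines, or raise if it exceeds the budget."""
    changed = count_changed_lines(diff)
    if changed > limit:
        raise ValueError(f"Diff too large: {changed} changed lines")
    return changed

sample_diff = """--- a/app.py
+++ b/app.py
@@ -1,3 +1,3 @@
 import os
-x = 1
+x = 2
"""
print(enforce_diff_budget(sample_diff))
```

One limitation of this simple prefix test: a removed line whose own content starts with `--` shows up as `---...` and is skipped, so a stricter version would parse hunk headers instead.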

ratelimit import SlidingWindowLimiter

def test_allows_n_requests_per_window():
    lim = SlidingWindowLimiter(limit=3, window_seconds=1)
    assert lim.allow("u1")
    assert li…
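The test above references a `SlidingWindowLimiter` whose implementation is not part of the excerpt. The following is one minimal way such a class could be written, assuming `allow(key)` returns a boolean; the injectable `clock` argument is an addition made here so that tests do not need to sleep.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` calls per key within the last `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # hypothetical hook: swap in a fake clock for tests
        self._hits = defaultdict(deque)

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True

fake_now = [0.0]
lim = SlidingWindowLimiter(limit=3, window_seconds=1, clock=lambda: fake_now[0])
results = [lim.allow("u1") for _ in range(4)]
```

A deque per key keeps the check proportional to the number of expired entries, at the cost of storing one timestamp per allowed request.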

1 week ago @ machinelearningmastery.com
The Real Cost of Inaction: How Silos Hurt Productivity for Data Scientists (Sponsored)
1 week, 1 day ago @ bit.ly
Top 5 Vector Databases for High-Performance LLM Applications

Vector databases solve this by storing embeddings and facilitating super-fast similarity searches across billions of vectors.

This article covers the top five vector databases for production LLM applications.

Weaviate: Weaviate is an open-source vector database that works well for combining vector search with traditional database capabilities.

If you already use PostgreSQL and would like to explore a vector search extension, you can also consider pgvector.

To learn more about how vector databases work, read The Complete Guide to Vector Databases for Machine Learning.

1 week, 1 day ago @ machinelearningmastery.com
The Machine Learning Engineer’s Checklist: Best Practices for Reliable Models

Introduction: Building newly trained machine learning models that work is a relatively straightforward endeavor, thanks to mature frameworks and accessible computing power.

Without further ado, here is the list of 10 machine learning engineer best practices I curated for you and your upcoming models to shine at their best in terms of long-term reliability.

Therefore, everything surrounding a machine learning model should be properly versioned.

Continuous Monitoring and Observability: This is probably already on your checklist of best practices, but as an essential of machine learning engineering, it is worth pointing out.

This article provided a checklist of 10 essential best…

1 week, 2 days ago @ machinelearningmastery.com
Transformer vs LSTM for Time Series: Which Works Better?

Introduction: From daily weather measurements and traffic sensor readings to stock prices, time series data are present nearly everywhere.

lstm = nn .

transformer = nn .

Linear ( d_model , 1 ) def forward ( self , x ) : x = self .

embed ( x ) x = self .

1 week, 3 days ago @ machinelearningmastery.com
How LLMs Choose Their Words: A Practical Walk-Through of Logits, Softmax and Sampling

This randomness is not a bug but a core feature of how the model samples its next token from a probability distribution.

Softmax transforms these raw scores into a probability distribution.

LLMs don’t always select the token with the highest probability; instead, they sample from the probability distribution to produce a different output each time.

Top-$p$ sampling (also known as nucleus sampling) addresses this issue by sampling tokens according to their cumulative probability rather than a fixed count.

You learned to select different values for the temperature, top-$k$, and top-$p$ sampling parameters for different LLM use cases.
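The pipeline summarized in this excerpt (logits, temperature, softmax, then top-$k$ and top-$p$ filtering before sampling) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the article's code; the function names and the order of applying top-$k$ before top-$p$ are assumptions.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Return {token_index: prob}, renormalized after top-k then top-p."""
    ranked = sorted(enumerate(probs), key=lambda t: t[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep a fixed number of candidates
    kept, cumulative = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break  # smallest prefix whose cumulative mass reaches top_p
    total = sum(p for _, p in kept)
    return {idx: p / total for idx, p in kept}

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token index from the filtered distribution."""
    dist = top_k_top_p_filter(softmax(logits, temperature), top_k, top_p)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
token = sample_token(logits, temperature=0.7, top_k=2, top_p=0.9)
```

Setting `top_k=1` reduces this to greedy decoding, while `top_p` adapts the candidate count to how peaked the distribution is.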

1 week, 5 days ago @ machinelearningmastery.com
Fine-Tuning a BERT Model

This article is divided into two parts; they are:
• Fine-tuning a BERT Model for GLUE Tasks
• Fine-tuning a BERT Model for SQuAD Tasks

GLUE is a benchmark for evaluating natural language understanding (NLU) tasks.

3 weeks, 6 days ago @ machinelearningmastery.com
The Journey of a Token: What Really Happens Inside a Transformer

Large language models (LLMs) are based on the transformer architecture, a complex deep neural network whose input is a sequence of token embeddings.

4 weeks, 1 day ago @ machinelearningmastery.com
Pretrain a BERT Model from Scratch

This article is divided into three parts; they are:
• Creating a BERT Model the Easy Way
• Creating a BERT Model from Scratch with PyTorch
• Pre-training the BERT Model

If your goal is to create a BERT model so that you can train it on your own data, using the Hugging Face `transformers` library is the easiest way to get started.

4 weeks, 1 day ago @ machinelearningmastery.com
K-Means Cluster Evaluation with Silhouette Analysis

Clustering models in machine learning must be assessed by how well they separate data into meaningful groups with distinctive characteristics.

1 month ago @ machinelearningmastery.com
The Complete Guide to Docker for Machine Learning Engineers

Machine learning models often behave differently across environments.

1 month ago @ machinelearningmastery.com
ML in Production
last post: None
Sorta Insightful
last post: 1 month, 1 week ago
Authentic Imperfection

I’ve been thinking about the anger surrounding generative AI.

To keep things fair, he took the best human images and best AI images, meaning human art from famous artists, and AI art from prompters skilled at removing obvious tells of image generation.

When people complain about AI slop, I see it as a complaint against the deluge of default style AI images.

We’ve seen this happen in all forms: AI text, AI music, older forms of computer generated content like CGI.

As much as we celebrate imperfection, digital imperfection is a step too far.

1 month, 1 week ago @ alexirpan.com
Ten Years Later

Every now and then, someone asks me why I blog, and I don’t really know what to tell them.

That’s another reason I’m not celebrating 10 years with more gusto: I know I’ve been writing less.

Indiana Jones and the Great Circle: I don’t know how they did it, but Indiana Jones and the Great Circle was just fun all the way through.

My one complaint is that the hand-to-hand combat feels like the worst part of the game, so of course they put a bunch of upgrades behind learning parry timings you’ll never use later.

I have not tried Peak, but Another Crab’s Treasure was really good and is worth playing if you’re interested in a Souls-like.

4 months, 1 week ago @ alexirpan.com
Brony Musicians Seize The Means of Production: My Eyewitness Account to BABSCon 2025

A music concert in the evenings, typically set up as a rave with EDM or rock music made by brony musicians.

She has been involved in organizing pony music concerts for over a decade, for both BABSCon and other pony conventions.

Thank you, BABSCon Chairs

The brony musicians immediately jump into an emergency Discord call with Pinkaboo, to get her side of the story.

Other conventions start tweeting in support of the brony musicians, with no one taking BABSCon’s side.

It’s hard for me to explain why I like MLP fan music, because brony music really isn’t accessible.

5 months, 1 week ago @ alexirpan.com
Who is AI For?

I think the easy answer to this question is that right now, AI is for the AI developers.

Code is useful, it makes money, it is a testbed for AI speeding up the development of AI, and it is easy.

I’m working in AI because it pays well and is potentially really good for the world.

The artists did not know what AI was, but when they learned, they quickly decided they did not want it.

It feels like the most likely outcome is that people go all-in on pushing raw intelligence, in the way that AI developers can measure it, leaving behind those that are not like AI developers.

8 months, 4 weeks ago @ alexirpan.com
Lil'Log
last post: None
The Spectator
last post: None
Off the Convex Path
last post: None
Piekniewski's blog
last post: None
fast.ai NLP
last post: None
Sebastian Ruder
last post: None
Andrej Karpathy blog
last post: None
大トロ
last post: None
🔬 Science
Papers With Code
last post: 5 months ago
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos.

Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation into a high-performing and feedforward 3D point tracker.

It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, with a fully differentiable and end-to-end architecture, allowing scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage.

By learning geometry and motion jointly from …

5 months ago @ paperswithcode.com
/antof27/ Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation

Calisthenics skill classification is the computer vision task of inferring the skill performed by an athlete from images, enabling automatic performance assessment and personalized analytics.

Traditional methods for calisthenics skill recognition are based on pose estimation methods to determine the position of skeletal data from images, which is later fed to a classification algorithm to infer the performed skill.

This work proposes a direct approach to calisthenics skill recognition, which leverages depth estimation and athlete patch retrieval to avoid the computationally expensive human pose estimation module.

Using Depth Anything V2 for depth estimation and YOLOv10 for athlete localizat…

5 months ago @ paperswithcode.com
/snowflakedb/ Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI

Inference is now the dominant AI workload, yet existing systems force trade-offs between latency, throughput, and cost.

Arctic Inference, an open-source vLLM plugin from Snowflake AI Research, introduces Shift Parallelism, a dynamic parallelism strategy that adapts to real-world traffic while integrating speculative decoding, SwiftKV compute reduction, and optimized embedding inference.

It achieves up to 3.4 times faster request completion, 1.75 times faster generation, and 1.6M tokens/sec per GPU for embeddings, outperforming both latency- and throughput-optimized deployments.

Already powering Snowflake Cortex AI, Arctic Inference delivers state-of-the-art, cost-effective inference for ent…

5 months ago @ paperswithcode.com
/NVIDIA/ FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale

FourCastNet 3 advances global weather modeling by implementing a scalable, geometric machine learning (ML) approach to probabilistic ensemble forecasting.

The approach is designed to respect spherical geometry and to accurately model the spatially correlated probabilistic nature of the problem, resulting in stable spectra and realistic dynamics across multiple scales.

FourCastNet 3 delivers forecasting accuracy that surpasses leading conventional ensemble models and rivals the best diffusion-based methods, while producing forecasts 8 to 60 times faster than these approaches.

In contrast to other ML approaches, FourCastNet 3 demonstrates excellent probabilistic calibration and retains realis…

5 months ago @ paperswithcode.com
/jingyanw/ Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?

We study A/B experiments that are designed to compare the performance of two recommendation algorithms.

The bias arising from this type of data sharing is known as "symbiosis bias".

In this paper, we highlight that, for decision-making purposes, the sign of the GTE often matters more than its precise magnitude when selecting the better algorithm.

We formalize this insight under a multi-armed bandit framework and theoretically characterize when the sign of the expected GTE estimate under data sharing aligns with or contradicts the sign of the true GTE.

Our analysis identifies the level of exploration versus exploitation as a key determinant of how symbiosis bias impacts algorithm selection.

5 months ago @ paperswithcode.com
/qqq-yi/ DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression

Task-agnostic prompt compression leverages the redundancy in natural language to reduce computational overhead and enhance information density within prompts, especially in long-context scenarios.

Existing methods predominantly rely on information entropy as the metric to compress lexical units, aiming to achieve minimal information loss.

However, these approaches overlook two critical aspects: (i) the importance of attention-critical tokens at the algorithmic level, and (ii) shifts in information entropy during the compression process.

Motivated by these challenges, we propose a dynamic attention-aware approach for task-agnostic prompt compression (DAC).

This approach effectively integrate…

5 months ago @ paperswithcode.com
/lukasellinger/ Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions

Large Language Models (LLMs) can provide accurate word definitions and explanations for any context.

However, the scope of the definition changes for different target groups, like children or language learners.

We investigate how simplification impacts homonym definition quality across three target groups: Normal, Simple, and ELI5.

Our results show that simplification drastically degrades definition completeness by neglecting polysemy, increasing the risk of misunderstanding.

Fine-tuning Llama 3.1 8B with Direct Preference Optimization substantially improves homonym response quality across all prompt types.

5 months ago @ paperswithcode.com
/pspdada/ Mitigating Object Hallucinations via Sentence-Level Early Intervention

Multimodal large language models (MLLMs) have revolutionized cross-modal understanding but continue to struggle with hallucinations - fabricated content contradicting visual inputs.

Existing hallucination mitigation methods either incur prohibitive computational costs or introduce distribution mismatches between training data and model outputs.

We identify a critical insight: hallucinations predominantly emerge at the early stages of text generation and propagate through subsequent outputs.

To address this, we propose SENTINEL (Sentence-level Early iNtervention Through IN-domain prEference Learning), a framework that eliminates dependency on human annotations…

5 months ago @ paperswithcode.com
/owos/ FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Language models (LMs) are challenging to adapt to new data distributions by simple finetuning.

This is due to the rigidity of their subword tokenizers, which typically remain unchanged during adaptation.

This inflexibility often leads to inefficient tokenization, causing overfragmentation of out-of-distribution domains, unseen languages, or scripts.

In this work, we develop byte-level LMs with learnable tokenizers to make tokenization adaptive.

Our models include a submodule that learns to predict boundaries between the input byte sequence, encoding it into variable-length segments.

5 months ago @ paperswithcode.com
/wojiufukele/ Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion

To address the challenges posed by cascading reactions caused by component failures in autonomous cargo ships (ACS) and the uncertainties in emergency decision-making, this paper proposes a novel hybrid feature fusion framework for constructing a graph-structured dataset of failure modes.

A hierarchical feature fusion framework is constructed, using Word2Vec encoding to encode subsystem/component features, BERT-KPCA to process failure modes/reasons, and Sentence-BERT to quantify the semantic association between failure impact and emergency decision-making.

The dataset covers 12 systems, 1,262 failure modes, and 6,150 propagation paths.

In the label prediction results, the Shore-based Meteor…

5 months ago @ paperswithcode.com
/YF-W/ Tri-Learn Graph Fusion Network for Attributed Graph Clustering

In recent years, models based on Graph Convolutional Networks (GCN) have made significant strides in the field of graph data analysis.

Although the Graph Transformer architecture has mitigated some of these issues, its performance is still limited when processing heterogeneous graph data.

To address these challenges, this study proposes a novel deep clustering framework comprising GCN, Autoencoder (AE), and Graph Transformer, termed the Tri-Learn Graph Fusion Network (Tri-GFN).

The tri-learning mechanism allows mutual learning among these modules, while the feature fusion strategy enables the model to capture complex relationships, yielding highly discriminative representations for gra…

5 months ago @ paperswithcode.com
/mr-ravin/ APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation

We propose the APTx Neuron, a novel, unified neural computation unit that integrates non-linear activation and linear transformation into a single trainable expression.

The APTx Neuron is derived from the APTx activation function, thereby eliminating the need for separate activation layers and making the architecture both computationally efficient and elegant.

The proposed neuron follows the functional form $y = \sum_{i=1}^{n} ((\alpha_i + \tanh(\beta_i x_i)) \cdot \gamma_i x_i) + \delta$, where all parameters $\alpha_i$, $\beta_i$, $\gamma_i$, and $\delta$ are trainable.
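
The functional form above can be evaluated directly. The following is a minimal sketch in plain Python, assuming the parameters are passed as lists of equal length; the function name `aptx_neuron` and the example values are illustrative and are not taken from the authors' code.

```python
import math


def aptx_neuron(x, alpha, beta, gamma, delta):
    """Evaluate y = sum_i (alpha_i + tanh(beta_i * x_i)) * gamma_i * x_i + delta."""
    if not (len(x) == len(alpha) == len(beta) == len(gamma)):
        raise ValueError("x, alpha, beta and gamma must have the same length")
    total = 0.0
    for x_i, a_i, b_i, g_i in zip(x, alpha, beta, gamma):
        # The activation term (alpha_i + tanh(beta_i * x_i)) scales the linear term gamma_i * x_i.
        total += (a_i + math.tanh(b_i * x_i)) * g_i * x_i
    return total + delta


# With beta = 0 the tanh term vanishes, so the neuron reduces to a plain
# weighted sum with weights alpha_i * gamma_i plus the bias delta.
print(aptx_neuron([1.0, 2.0], alpha=[1.0, 1.0], beta=[0.0, 0.0], gamma=[1.0, 1.0], delta=0.5))
```

In a trainable setting, `alpha`, `beta`, `gamma`, and `delta` would be learned parameters; here they are fixed numbers purely to show the arithmetic.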

We validate our APTx Neuron-based architecture on the MNIST dataset, achieving up to 96.69\% test accuracy in just 20 ep…

5 months ago @ paperswithcode.com
/Rec4Fun/ A Reproducibility Study of Product-side Fairness in Bundle Recommendation

While this problem has been widely studied in traditional recommendation settings, its implications for bundle recommendation (BR) remain largely unexplored.

Existing fairness frameworks and metrics designed for traditional recommender systems may not directly translate to this multi-layered setting.

In this paper, we conduct a comprehensive reproducibility study of product-side fairness in BR across three real-world datasets using four state-of-the-art BR methods.

We analyze exposure disparities at both the bundle and item levels using multiple fairness metrics, uncovering important patterns.

Overall, our findings offer actionable insights for building fairer bundle recommender systems and…

5 months ago @ paperswithcode.com
/cbobed/ OntView: What you See is What you Meant

However, the lack of tools that provide effective visualization is still a significant challenge.

In this paper, we present OntView, an ontology viewer that is designed to provide users with an intuitive visual representation of ontology concepts and their formal definitions through a user-friendly interface.

Building on the use of a DL reasoner, OntView follows a "What you see is what you meant" paradigm, showing the actual inferred knowledge.

One key aspect for this is its ability to visualize General Concept Inclusions (GCI), a feature absent in existing visualization tools.

OntView has been released with an open-source license for the whole community.

5 months ago @ paperswithcode.com
/Rec4Fun/ RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction

These approaches fail to capture elaborate relations hidden in real-world bundle structures, resulting in suboptimal bundle representations.

To overcome this limitation, we propose RaMen, a novel method that provides a holistic multi-strategy approach for bundle construction.

RaMen utilizes both intrinsic (characteristics) and extrinsic (collaborative signals) information to model bundle structures through Explicit Strategy-aware Learning (ESL) and Implicit Strategy-aware Learning (ISL).

Integrating diverse strategies enables RaMen to learn more comprehensive and robust bundle representations.

Meanwhile, a Multi-strategy Alignment & Discrimination module is employed to facilitate knowledge tr…

5 months ago @ paperswithcode.com
Papers With Code
last post 5 months ago
/PrimisAI/ Adaptive Multi-Agent Reasoning via Automated Workflow Generation

The rise of Large Reasoning Models (LRMs) promises a significant leap forward in language model capabilities, aiming to tackle increasingly sophisticated tasks with unprecedented efficiency and accuracy.

However, despite their impressive performance, recent studies have highlighted how current reasoning models frequently fail to generalize to novel, unseen problems, often resorting to memorized solutions rather than genuine inferential reasoning.

In this paper, we introduce Nexus Architect, an enhanced iteration of our multi-agent system framework, Nexus, equipped with a novel automated workflow synthesis mechanism.

Given a user's prompt and a small set of representative examples, the Archi…

5 months ago @ paperswithcode.com
/sharanya02/ Real Time Captioning of Sign Language Gestures in Video Meetings

One of the most tested ways to establish such communication is through the use of sign-based languages.

However, not many people are aware of the smaller intricacies involved with sign language.

Sign language recognition using computer vision aims at eliminating the communication barrier between deaf-mute and ordinary people so that they can properly communicate with others.

In recent studies, it has been found that people with hearing disabilities prefer to sign over typing during these video calls.

In this paper, we are proposing a browser extension that will automatically translate sign language to subtitles for everyone else in the video call.

5 months ago @ paperswithcode.com
/alessiopittiglio/ Leveraging Context for Multimodal Fallacy Classification in Political Debates

In this paper, we present our submission to the MM-ArgFallacy2025 shared task, which aims to advance research in multimodal argument mining, focusing on logical fallacies in political debates.

Our approach uses pretrained Transformer-based models and proposes several ways to leverage context.

In the fallacy classification subtask, our models achieved macro F1-scores of 0.4444 (text), 0.3559 (audio), and 0.4403 (multimodal).

Our multimodal model showed performance comparable to the text-only model, suggesting potential for improvements.

5 months ago @ paperswithcode.com
/RS2002/ One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms

On-demand ride-sharing platforms face the fundamental challenge of dynamically bundling passengers with diverse origins and destinations and matching them with vehicles in real time, all under significant uncertainty.

However, conventional MARL-based ride-sharing approaches heavily rely on the accurate estimation of Q-values or V-values, which becomes problematic in large-scale, highly uncertain environments.

To address these challenges, we propose two novel alternative methods that bypass value function estimation.

First, we adapt GRPO to ride-sharing, replacing the PPO baseline with the group average reward to eliminate critic estimation errors and reduce training bias.
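
The baseline swap described above can be sketched in a few lines. This is an illustrative example, not the paper's code: the function name `group_relative_advantages` and the reward values are assumptions, and only the mean-subtraction step named in the abstract is shown (GRPO variants often also divide by the group's standard deviation, which is omitted here).

```python
from statistics import mean


def group_relative_advantages(rewards):
    """Return each reward minus the group average reward.

    The group average plays the role of the baseline, so no learned
    critic (value network) is needed to compute advantages.
    """
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Hypothetical rewards for three sampled dispatch decisions in one group.
print(group_relative_advantages([1.0, 2.0, 3.0]))
```

Because the baseline is computed from the sampled group itself, the advantages within a group always sum to zero, up to floating-point error.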

Second, inspired b…

5 months ago @ paperswithcode.com
/LiXinran6/ Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

5 months ago @ paperswithcode.com
/ShimSoonYong/ ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space

We introduce a novel classification framework, ZClassifier, that replaces conventional deterministic logits with diagonal Gaussian-distributed logits. Code: https://github.com/ShimSoonYong/ZClassifier

5 months, 1 week ago @ paperswithcode.com
/briziorusso/ On Gradual Semantics for Assumption-Based Argumentation

In this paper, we fill this gap and propose a family of novel gradual semantics for equipping assumptions, which are the core components in ABA frameworks, with dialectical strengths. Code: https://github.com/briziorusso/GradualABA

5 months, 1 week ago @ paperswithcode.com
/wumingqi/ Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

5 months, 1 week ago @ paperswithcode.com
/IsaacYQH/ WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling

Despite rapid progress in end-to-end AI music generation, AI-driven modeling of professional Digital Signal Processing (DSP) workflows remains challenging. Code: https://github.com/IsaacYQH/WildFX

5 months, 1 week ago @ paperswithcode.com
/summer1278/ Addressing Data Imbalance in Transformer-Based Multi-Label Emotion Detection with Weighted Loss

This paper explores the application of a simple weighted loss function to Transformer-based models for multi-label emotion detection in SemEval-2025 Shared Task 11. Code: https://github.com/summer1278/semeval2025-task11

5 months, 1 week ago @ paperswithcode.com
/gabrielkmbo/ Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs

We present Step-wise Policy for Rare-tool Knowledge (SPaRK), a novel reinforcement learning framework that teaches large language models to explore diverse tool usage patterns beyond conventional high-temperature sampling. Code: https://github.com/gabrielkmbo/explore-rl

5 months, 1 week ago @ paperswithcode.com
/Cavendish518/ Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation

Service robots are increasingly deployed in diverse and dynamic environments, where both physical layouts and social contexts change over time and across locations. Code: https://github.com/Cavendish518/LE-Nav

5 months, 1 week ago @ paperswithcode.com
/MatteoFasulo/ AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles

5 months, 1 week ago @ paperswithcode.com
/VCA-EPFL/ SystolicAttention: Fusing FlashAttention within a Single Systolic Array

The frequent data swaps between the systolic array and external vector units result in low systolic array utilization. Code: https://github.com/VCA-EPFL/FSA

5 months, 1 week ago @ paperswithcode.com
/Buddhi19/ Precision Spatio-Temporal Feature Fusion for Robust Remote Sensing Change Detection

5 months, 1 week ago @ paperswithcode.com
Papers With Code
last post 5 months ago
/fudanvi/ Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

5 months, 1 week ago @ paperswithcode.com
/benedekrozemberczki/ PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training

Spatiotemporal graph neural networks (ST-GNNs) are powerful tools for modeling spatial and temporal data dependencies. Code: https://github.com/benedekrozemberczki/pytorch_geometric_temporal

5 months, 1 week ago @ paperswithcode.com
/chengxuphd/ DCR: Quantifying Data Contamination in LLMs Evaluation

5 months, 1 week ago @ paperswithcode.com
/gitter-lab/ Assay2Mol: large language model-based drug design using BioAssay context

Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. Code: https://github.com/gitter-lab/Assay2Mol

5 months, 1 week ago @ paperswithcode.com
/hayatkhan8660-maker/ DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition

We employ forward Kullback-Leibler (KL) divergence alongside spatio-temporal focal modulation to effectively transfer both local and global context from the Video-FocalNet Base (teacher) to the proposed VFL-Net (student). Code: https://github.com/hayatkhan8660-maker/DVFL-Net
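
As a rough illustration of the forward KL term mentioned above, the sketch below computes KL(teacher || student) between two softmax distributions over class logits. It is a simplified stand-in and not the DVFL-Net training code: the function names and logit values are assumptions, and the spatio-temporal focal modulation part of the method is not modeled.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_kl(teacher_logits, student_logits):
    """Forward KL divergence KL(p_teacher || q_student) over class probabilities.

    Assumes moderate logit values so that no student probability underflows to zero.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)


# Hypothetical logits: the student roughly agrees with the teacher on the top class.
print(forward_kl([2.0, 0.5, -1.0], [1.5, 0.7, -0.8]))
```

Forward KL weights the log-ratio by the teacher's probabilities, so the student is penalized most where the teacher places mass and the student does not.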

5 months, 1 week ago @ paperswithcode.com
/JudyJuezhuLong/ Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows

5 months, 1 week ago @ paperswithcode.com
/joaojcorreia/ A Fuzzy Approach to Project Success: Measuring What Matters

This paper introduces a novel approach to project success evaluation by integrating fuzzy logic into an existing construct. Code: https://github.com/joaojcorreia/FuzzyLogic_ProjectSuccess

5 months, 1 week ago @ paperswithcode.com
/kunkunlin1221/ InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing

Extensive experiments demonstrate the effectiveness of InstructFLIP by outperforming SOTA models in accuracy and substantially reducing training redundancy across diverse domains in FAS. Code: https://github.com/kunkunlin1221/InstructFLIP

5 months, 1 week ago @ paperswithcode.com
/Linvyl/ Describe Anything Model for Visual Question Answering on Text-rich Images

Recent progress has been made in region-aware vision-language modeling, particularly with the emergence of the Describe Anything Model (DAM). Code: https://github.com/Linvyl/DAM-QA

5 months, 1 week ago @ paperswithcode.com
/abhijeet3922/ Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker

We propose a multi-step custom implementation that uses widely adopted hybrid search (metadata & embedding) and a state-of-the-art late interaction re-ranker to retrieve the best-matching pages. Code: https://github.com/abhijeet3922/vision-RAG

5 months, 1 week ago @ paperswithcode.com
/ziangcao0312/ PhysX: Physical-Grounded 3D Asset Generation

3D modeling is moving from virtual to physical. Code: https://github.com/ziangcao0312/PhysX

5 months, 1 week ago @ paperswithcode.com
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Code: https://github.com/henry123-boy/SpaTrackerV2

5 months, 1 week ago @ paperswithcode.com
/cncs-fit/ Emergence of Functionally Differentiated Structures via Mutual Information Optimization in Recurrent Neural Networks

Analysis of network performance, correlation patterns, and weight matrices reveals that mutual information minimization yields high task performance alongside clear functional modularity and moderate structural modularity. Code: https://github.com/cncs-fit/mio_rnn

5 months, 1 week ago @ paperswithcode.com
/coswindywang/ Making Language Model a Hierarchical Classifier and Generator

Language heads of the last layer are copied to different selected intermediate layers, and fine-tuned with different task inputs. Code: https://github.com/coswindywang/HdLM

5 months, 1 week ago @ paperswithcode.com
/ahmedehabb/ From Roots to Rewards: Dynamic Tree Reasoning with RL

Modern language models address complex questions through chain-of-thought (CoT) reasoning (Wei et al., 2023) and retrieval augmentation (Lewis et al., 2021), yet struggle with error propagation and knowledge integration. Code: https://github.com/ahmedehabb/From-Roots-to-Rewards-Dynamic-Tree-Reasoning-with-RL

5 months, 1 week ago @ paperswithcode.com
💼 University and corporation labs
DeepMind
last post 2 days, 5 hours ago
Google's year in review: 8 areas with research breakthroughs in 2025

Google 2025 recap: Research breakthroughs of the year

2 days, 5 hours ago @ deepmind.google
Gemini 3 Flash: frontier intelligence built for speed

Today, we're expanding the Gemini 3 model family with the release of Gemini 3 Flash, which offers frontier intelligence built for speed at a fraction of the cost.

Last month, we kicked off Gemini 3 with Gemini 3 Pro and Gemini 3 Deep Think mode, and the response has been incredible.

With Gemini 3, we introduced frontier performance across complex reasoning, multimodal and vision understanding and agentic and vibe coding tasks.

Gemini 3 Flash retains this foundation, combining Gemini 3's Pro-grade reasoning with Flash-level latency, efficiency and cost.

Starting today, Gemini 3 Flash is rolling out to millions of people globally:

1 week, 1 day ago @ blog.google
Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

Today, we are releasing Gemma Scope 2: a comprehensive, open suite of interpretability tools for all Gemma 3 model sizes, from 270M to 27B parameters.

Producing Gemma Scope 2 involved storing approximately 110 Petabytes of data, as well as training over 1 trillion total parameters.

What’s new in Gemma Scope 2

Interpretability research aims to understand the internal workings and learned algorithms of AI models.

Like its predecessor, Gemma Scope 2 acts as a microscope for the Gemma family of language models.

While the original Gemma Scope enabled research in key areas of safety, such as model hallucination, identifying secrets known by a model, and training safer models, Gemma Scope 2 support…

1 week, 2 days ago @ deepmind.google
Improved Gemini audio models for powerful voice experiences

Earlier this week, we introduced greater control over audio generation with an upgrade to our Gemini 2.5 Pro and Flash Text-to-Speech models.

Today, we’re releasing an updated Gemini 2.5 Flash Native Audio for live voice agents.

Gemini 2.5 Flash Native Audio is now available across Google products, including Google AI Studio and Vertex AI, and has also started rolling out in Gemini Live and Search Live, bringing the naturalness of native audio to Search Live for the first time.

Beyond powering helpful agents, native audio unlocks new possibilities for global communication.

Live Voice Agents

1 week, 6 days ago @ blog.google
Deepening our partnership with the UK AI Security Institute

Today, we're announcing an expanded partnership with the UK AI Security Institute (AISI) through a new Memorandum of Understanding focused on foundational security and safety research, to help ensure artificial intelligence is developed safely and benefits everyone.

The research partnership with AISI is an important part of our broader collaboration with the UK government on accelerating safe and beneficial AI progress.

This is why we have partnered with the UK AISI since its inception in November 2023 to test our most capable models.

We are actively working with AISI to build more robust evaluations for AI models, and our teams have collaborated on safety research to move the field forward…

2 weeks ago @ deepmind.google
Strengthening our partnership with the UK government to support prosperity and security in the AI era

The UK has already laid a strong foundation to seize this moment and is uniquely positioned to translate AI innovation into public benefit.

That’s why we are excited to deepen our collaboration with the UK government to accelerate this work and offer a blueprint for other countries.

Accelerating access to frontier AI in key sectors: Science & Education

Our partnership will center on providing access to frontier AI in two areas foundational to the UK’s long-term success: scientific discovery and education.

The UK has a rich history of applying new technologies to drive scientific progress, from Hooke’s microscope to Faraday’s electrical experiments.

Establishing Google DeepMind’s first automa…

2 weeks, 1 day ago @ deepmind.google
FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

The FACTS Benchmark Suite

Today, we’re teaming up with Kaggle to introduce the FACTS Benchmark Suite.

A Search Benchmark that tests a model’s ability to use Search as a tool to retrieve information and synthesize it correctly.

Similar to our previous release, we are following standard industry practice and keeping an evaluation set held-out as a private set.

The FACTS Benchmark Suite Score (or FACTS Score) is calculated as the average accuracy of both public and private sets across the four benchmarks.
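
One plausible reading of that averaging rule is sketched below. It assumes every (benchmark, split) accuracy is weighted equally, which the announcement does not spell out; the function name `facts_score`, the placeholder benchmark keys, and all accuracy values are hypothetical and are not real results.

```python
def facts_score(accuracies):
    """Average accuracy over the public and private sets of every benchmark.

    `accuracies` maps a benchmark name to {"public": acc, "private": acc}.
    Equal weighting of all entries is an assumption of this sketch.
    """
    values = [acc for splits in accuracies.values() for acc in splits.values()]
    if not values:
        raise ValueError("no accuracies provided")
    return sum(values) / len(values)


# Placeholder names and made-up numbers, purely to show the arithmetic.
example = {
    "benchmark_1": {"public": 0.50, "private": 0.70},
    "benchmark_2": {"public": 0.60, "private": 0.80},
    "benchmark_3": {"public": 0.40, "private": 0.60},
    "benchmark_4": {"public": 0.55, "private": 0.65},
}
print(round(facts_score(example), 4))
```

If the suite instead averaged per-benchmark scores first, the result would be the same whenever each benchmark contributes exactly one public and one private accuracy, as in this example.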

Kaggle will oversee the management of the FACTS Benchmark Suite.

2 weeks, 2 days ago @ deepmind.google
Engineering more resilient crops for a warming climate

Scientists are using AlphaFold in their research to strengthen an enzyme that’s vital to photosynthesis, paving the way for more heat-tolerant crops.

As global warming accompanies more droughts and heatwaves, harvests of some staple crops are shrinking.

But less visible is what is happening inside these plants, where high heat can break down the molecular machinery that keeps them alive.

Plants use photosynthesis to produce the glucose that fuels their growth via an intricate choreography of enzymes inside plant cells.

"Our job is to learn from those examples and build that same resilience into the crops we depend on."

3 weeks ago @ deepmind.google
AlphaFold: Five years of impact

They used AlphaFold alongside comparative genomics to better understand how plants perceive changes in their environment, paving the way for more resilient crops.

AlphaFold has been cited in more than 35,000 papers and more than 200,000 papers incorporated elements of AlphaFold 2 in their methodology.

An independent analysis of AlphaFold 2’s impact, carried out by the Innovation Growth Lab, suggests that researchers using AlphaFold 2 see an increase of over 40% in their submission of novel experimental protein structures.

Those protein structures are more likely to be dissimilar to known structures, encouraging the exploration of uncharted areas of science.

The AlphaFold Server is empowerin…

1 month ago @ deepmind.google
Revealing a key protein behind heart disease

Both have a family history of heart disease – a reminder of what’s at stake in their work to better understand and ultimately help treat this deadly condition.

That protein, apoB100, has defied mapping not only because it’s enormous (for a protein), but also because it connects to fats and other molecules in complicated ways.

ApoB100 forms the molecular scaffold of “bad cholesterol”, which is known to scientists as low-density lipoprotein (LDL).

Discovering the structure of its key protein promised to shed light on how bad cholesterol becomes harmful inside the body, giving scientists a better chance to develop ways to prevent and treat ASCVD.

The images weren’t sharp enough to map the stru…

1 month ago @ deepmind.google
Google DeepMind supports U.S. Department of Energy on Genesis: a national mission to accelerate innovation and scientific discovery

We stand at an inflection point where the convergence of advanced AI and scientific research promises to unlock a new golden age of discovery.

There is perhaps no clearer expression of this than the application of AI within science.

Putting our advanced AI tools into the hands of American scientists

Google DeepMind will provide an accelerated access program for scientists at all 17 DOE National Laboratories to our frontier AI for Science models and agentic tools, starting today with AI co-scientist on Google Cloud.

We’re excited to see what America’s leading researchers will be able to do with our frontier AI models and agentic tools.

By combining human ingenuity with advanced AI capabilitie…

1 month ago @ deepmind.google
How we’re bringing AI image verification to the Gemini app

At Google, we’ve long invested in ways to provide you with helpful context about information you see online.

Now, as generative media becomes increasingly prevalent and high-fidelity, we are deploying tools to help you more easily determine whether the content you're interacting with was created or edited using AI.

Starting today, we’re making it easier for everyone to verify if an image was generated with or edited by Google AI right in the Gemini app, using SynthID, our digital watermarking technology that embeds imperceptible signals into AI-generated content.

Since then, over 20 billion AI-generated pieces of content have been watermarked using SynthID, and we have been testing our Synt…

1 month ago @ blog.google
Build with Nano Banana Pro, our Gemini 3 Pro Image model

Today, we’re releasing Nano Banana Pro (Gemini 3 Pro Image), a higher-fidelity model built on Gemini 3 Pro for developers to access studio-quality image generation.

This follows our release of Nano Banana (Gemini 2.5 Flash Image) just a few months ago.

Since then, we’ve loved seeing the community put its key features to work — from character consistency to photo restoration, and even using its capabilities to make local edits in an infinite canvas.

This state-of-the-art image generation and editing model is starting to roll out in paid preview to build a new wave of intelligent, multimodal applications with the Gemini API in Google AI Studio and Vertex AI for enterprises.

This model unlocks…

1 month ago @ blog.google
Introducing Nano Banana Pro

How Nano Banana Pro helps you bring any idea or design to life: Nano Banana Pro can help you visualize any idea and design anything, from prototypes, to representing data as infographics, to turning handwritten notes into diagrams.

With Nano Banana Pro, now you can: Generate more accurate, context-rich visuals based on enhanced reasoning, world knowledge and real-time information. With Gemini 3’s advanced reasoning, Nano Banana Pro doesn’t just create beautiful images, it also helps you create more helpful content.

You can get accurate educational explainers to learn more about a new subject, like context-rich infographics and diagrams based on the content you provide or facts from the real wor…

1 month ago @ blog.google
Start building with Gemini 3

Google Antigravity: To advance how the model and IDE work together, we’re introducing Google Antigravity to showcase what’s possible with Gemini 3.

It’s a faster way to develop: you act as the architect, collaborating with intelligent agents that operate autonomously across the editor, terminal, and browser.

These agents plan and execute complex software tasks, communicating their work with the user via detailed artifacts.

This elevates all aspects of development, from building features, UI iteration, and fixing bugs to researching and generating reports.

Visit the Google Antigravity website to download the public preview at no charge, now available for macOS, Windows and Linux.

1 month, 1 week ago @ blog.google
Google
last post 6 days, 5 hours ago
Cloud CISO Perspectives: 2025 in review: Cloud security basics and evolving AI

Building the most trusted cloud: We continued to enhance our security capabilities and controls on our cloud platform to help organizations secure their cloud environments and address evolving policy, compliance, and business objectives.

Our forecast for 2026: As security professionals, we know that threat actors will continue to innovate to achieve their mission objectives.

To help defenders proactively prepare for the coming year, we publish our annual forecast report with insights from across Google.

We look forward to sharing more insights to help organizations strengthen their security posture in the new year.

For more leadership guidance from Google Cloud experts, please visit our CISO In…

6 days, 5 hours ago @ cloud.google.com
Getting AI to write good SQL: Optimizing the AlloyDB AI natural language API for your use case

Descriptive and prescriptive context: As mentioned above, the AlloyDB AI natural language API relies on descriptive and prescriptive context to improve the accuracy of the SQL code it generates.

The AlloyDB AI natural language API facilitates the creation of descriptive and prescriptive context.

The value index clarifies what kind of entity “John Smith” is, and can be automatically created by AlloyDB AI for your application.

Natural language search over structured, unstructured and multimodal data: When it comes to applications that provide search over structured data, the AlloyDB AI natural language API enables a clean and powerful search experience.

Bringing the AlloyDB AI natural language AP…

1 week ago @ cloud.google.com
Announcing advanced governance capabilities for Vertex AI Agent Builder

At Google Cloud, we continue to make critical investments to Vertex AI Agent Builder, our comprehensive and open platform, enabling you to build faster, scale efficiently, and govern with enterprise-grade security.

Today, with the integration of the Cloud API Registry, we’re excited to bring enhanced tool governance capabilities to Vertex AI Agent Builder.

With this latest update, administrators can now manage available tools for developers across your organization directly in Vertex AI Agent Builder Console, and developers can leverage tools managed by the registry with a new ApiRegistry.

Govern your tools with confidence: Building a useful agent requires the agent to have access to the nec…

1 week ago @ cloud.google.com
Automate AI and HPC clusters with Cluster Director, now generally available

Today, we are delivering on those requirements with the General Availability (GA) of Cluster Director and the Preview of Cluster Director support for Slurm on Google Kubernetes Engine (GKE).

Cluster Director (GA) is a managed infrastructure service designed to meet the rigorous demands of modern supercomputing.

There's no extra charge to use Cluster Director.

How Cluster Director supports each phase of deployment. Day 0: Preparation. Standing up a cluster typically involves weeks of planning, wrangling Terraform, and debugging the network.

Cluster Director changes the ‘Day 0’ experience entirely, with tools for designing infrastructure topology that’s optimized for your workload requirements.

1 week, 1 day ago @ cloud.google.com
Google named a Leader in The Forrester Wave™: AI Infrastructure Solutions, Q4 2025

Yesterday, Forrester released The Forrester Wave™: AI Infrastructure Solutions, Q4 2025 report, evaluating 13 vendors, and we believe their findings validate our commitment to solving these core challenges.

Access the full report: The Forrester Wave™: AI Infrastructure Solutions, Q4 2025. Accelerating time-to-value with an integrated system: Enterprises don’t run AI in a vacuum.

Delivering continuous AI innovation: We are honored to be recognized as a Leader in The Forrester Wave™ report, which we believe validates decades of R&D and our approach to building ultra-scale AI infrastructure.

Access the full report: The Forrester Wave™: AI Infrastructure Solutions, Q4 2025.

1. IDC Business Value Snapsh…

1 week, 1 day ago @ cloud.google.com
Introducing Gemini 3 Flash: Intelligence and speed for enterprises

Today, we’re expanding the Gemini 3 model family with Gemini 3 Flash, which offers frontier intelligence built for speed at a fraction of the cost.

Gemini 3 Flash builds on the model series that developers and enterprises already love, optimized for high-frequency workflows that demand speed, without sacrificing quality.

Gemini 3 Flash is built to be highly efficient, pushing the boundaries of quality at better price performance and faster speed.

It is available now in Gemini Enterprise, Vertex AI, and Gemini CLI, so businesses and developers can access: Advanced multimodal processing: Gemini 3 Flash enables enterprises to build applications capable of complex video analysis, data extraction…

1 week, 1 day ago @ cloud.google.com
Connect your enterprise data to Google’s new Antigravity IDE

Google Cloud is at the forefront of this shift, empowering you to build robust, data-driven applications quickly and accurately.

With Model Context Protocol (MCP) servers powered by MCP Toolbox for Databases now available within Antigravity, you can securely connect your AI agents to services like AlloyDB for PostgreSQL, BigQuery, Spanner, Cloud SQL, Looker and others within Google’s Data Cloud, all within your development workflow.

We designed Antigravity to keep you in the flow, but the power of an AI agent is limited by what it "knows."

By integrating pre-built MCP servers directly into Antigravity, you don’t need to perform any manual configuration.

Discover and launch: You can find MCP s…

1 week, 3 days ago @ cloud.google.com
Cloud CISO Perspectives: Our 2026 Cybersecurity Forecast report

Marina Kaganovich, executive trust lead: The heightened capability of agentic AI to take actions and execute tasks autonomously elevates the importance of cybersecurity basics.

Vesselin Tzvetkov, senior cybersecurity advisor: As Francis noted, agentic security operations are set to become the standard for modern SOCs, dramatically enhancing the speed and capabilities of security organizations.

Vinod D’Souza, director, manufacturing and industry: In 2026, agentic AI will help the manufacturing and industrial sector cross the critical threshold from static automation to true autonomy.

By rooting security strategies in data-centered Zero Trust, organizations stop treating security as a gatekeeper an…

1 week, 6 days ago @ cloud.google.com
A developer's guide to Gemini Live API in Vertex AI

Today, we announced the general availability of Gemini Live API on Vertex AI, which is powered by the latest Gemini 2.5 Flash Native Audio model.

In this post we'll look at two templates and three reference demos that help you understand how to best use Gemini Live API.

Gemini Live API fundamentally changes the engineering approach with a unified, low-latency, native audio architecture.

Native audio processing: Gemini 2.5 Flash Native Audio model processes raw audio natively through a single, low-latency model.

Next-generation conversation features: Gemini Live API gives you a suite of production-ready features that define a new standard for AI agents:

1 week, 6 days ago @ cloud.google.com
How to connect Looker to Gemini Enterprise in minutes, with MCP Toolbox and ADK

We can all agree that the quality of AI-driven answers relies on the consistency of the underlying data.

Building off the recent introduction of Looker’s Model Context Protocol (MCP) server, in this blog we take you through the process of creating an Agent Development Kit (ADK) agent that is connected to Looker via the MCP Toolbox for Databases and exposing it within Gemini Enterprise.

Instead of managing tool logic and authentication themselves, agents act as MCP clients and request tools from the Toolbox.

The MCP Toolbox handles all the underlying complexities, including secure connections to Looker, authentication and query execution.

The MCP Toolbox for Databases natively supports Looke…

1 week, 6 days ago @ cloud.google.com
Gemini Live API Now GA on Vertex AI

Today, we are excited to announce that Gemini Live API, powered by the latest Gemini 2.5 Flash Native Audio model, is generally available on Vertex AI.

A new standard with real-time multimodal AI agents: Gemini Live API represents a new standard for bringing AI to life.

Deploying on Vertex AI allows you to leverage our expanding global infrastructure across multiple regions, delivering reliability for your users.

Building real-world impact with Gemini Live API: The true power of Gemini Live API is demonstrated by the companies who are using it today to redefine their customer experiences.

Shopify, the leading global commerce platform, developed Sidekick, a multimodal AI assistant powered by Gem…

1 week, 6 days ago @ cloud.google.com
AI agents are here. Is your infrastructure ready?

In a recent IDC global survey of over 1,300 AI decision-makers, inference was already cited as the largest AI workload segment, accounting for 47% of all AI operations.

This surge in demand is exposing a critical vulnerability for many organizations: the AI efficiency gap.

The TCO crisis in an age of agents: The AI efficiency gap is the difference between the theoretical performance of an AI stack and the actual, real-world performance achieved.

That is why we created AI Hypercomputer: an integrated supercomputer system designed to deliver exceptional performance and efficiency for demanding AI workloads.

Get your free copy of the whitepaper to learn more: The AI Efficiency Gap: From TCO Cris…

2 weeks ago @ cloud.google.com
How we built a multi-agent system for superior business forecasting

This innovative solution combines two powerful, specialized AI agents: a prediction agent built by Google Cloud and App Orchid’s Data Agent offering.

Google prediction agent - The forecasting powerhouse: The prediction agent, which is primarily the custom engineering work of Google Cloud, is the system’s window to the future.

App Orchid Data Agent - The enterprise intelligence data expert: Accurate predictions depend on high-quality, AI-ready data, which is where App Orchid’s Data Agent excels.

The combined business forecasting agent: At the heart of the solution is a unified business forecasting agent, which brings together the capabilities of our unique prediction and data agents in a discrete …

2 weeks ago @ cloud.google.com
Announcing MCP support in Apigee: Turn existing APIs into secure and governed agentic tools

When a tools/list or tools/call request is made to the MCP endpoint, Apigee uses the operations documented in the OpenAPI spec as the MCP tools list.
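
As a rough illustration of the idea above, the sketch below derives an MCP-style tool list from the operations in an OpenAPI document. It is not Apigee's implementation: the function name `openapi_to_tools`, the sample spec, and the choice to build each tool's `inputSchema` from the operation's `parameters` are all assumptions made for this example. Only the `name`, `description`, and `inputSchema` fields follow the shape of a tool entry in an MCP `tools/list` result.

```python
from typing import Any

# HTTP methods that may appear as operation keys under an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def openapi_to_tools(spec: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn each OpenAPI operation into an MCP-style tool descriptor (hypothetical helper)."""
    tools = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            properties = {}
            required = []
            for param in op.get("parameters", []):
                # Fall back to a string schema when the parameter declares none.
                properties[param["name"]] = param.get("schema", {"type": "string"})
                if param.get("required"):
                    required.append(param["name"])
            tools.append({
                "name": op.get("operationId") or f"{method}_{path}",
                "description": op.get("summary", ""),
                "inputSchema": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                },
            })
    return tools


# Made-up spec with a single operation, used only to exercise the helper.
spec = {
    "paths": {
        "/orders/{orderId}": {
            "get": {
                "operationId": "getOrder",
                "summary": "Fetch one order",
                "parameters": [
                    {"name": "orderId", "in": "path", "required": True,
                     "schema": {"type": "string"}},
                ],
            }
        }
    }
}

tools = openapi_to_tools(spec)
print({"tools": tools})  # the payload shape a tools/list response would carry
```

A real gateway would also need to handle request bodies, authentication, and the `tools/call` dispatch back to the underlying API, none of which this sketch attempts.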

And, with the recent launch of Apigee API insights, you can also use the new “Insights” tab in Apigee API hub’s catalog to view traffic and performance metrics for your MCP endpoints.

Benefits of Apigee’s approach to MCP support: Our main goal with MCP support in Apigee is to make sure that you can secure, govern, and monitor usage of MCP tools with the same policies and workflows in Apigee that you’re already familiar with.

Centralized tool catalog: After you deploy an MCP proxy, Apigee automatically registers your MCP endpoint in Apigee API hu…

2 weeks, 1 day ago @ cloud.google.com
Announcing Model Context Protocol (MCP) support for Google services

Today we’re announcing the release of fully-managed, remote MCP servers.

Google’s existing API infrastructure is now enhanced to support MCP, providing a unified layer across all Google and Google Cloud services.

Developers can now simply point their AI agents or standard MCP clients like Gemini CLI and AI Studio to a globally-consistent and enterprise-ready endpoint for Google and Google Cloud services.

With the new Cloud API Registry and Apigee API Hub, developers can find trusted MCP tools from Google and their own organizations, respectively.

We pair this ease of discovery with rigorous control: administrators can manage access via Google Cloud IAM, rely on audit logging for observabili…

2 weeks, 1 day ago @ cloud.google.com
OpenAI
last post None
Microsoft
last post 2 weeks ago
Agent Lightning: Adding reinforcement learning to AI agents without code rewrites

To address this, a research team from Microsoft Research Asia – Shanghai has introduced Agent Lightning.

Whether it involves multiple collaborating agents or dynamic tool use, Agent Lightning breaks it down into a sequence of transitions.

Agent Lightning as middleware: Agent Lightning serves as middleware between RL algorithms and agent environments, providing modular components that enable scalable RL through standardized protocols and well-defined interfaces.

In practice, developers can keep their existing agent frameworks and switch model calls to the Agent Lightning API without changing their agent code (Figure 5).

By bridging existing agentic systems with reinforcement learning, Age…

2 weeks ago @ microsoft.com
Promptions helps make AI prompting more precise with dynamic UI controls

To address this, we are excited to introduce Promptions (prompt + options), a UI framework that helps developers build AI interfaces with more precise user control.

We compared the static design from the first study, called the “Static Prompt Refinement Control” (Static PRC), against a “Dynamic Prompt Refinement Control” (Dynamic PRC) with features that responded to participants’ feedback.

Comparison of user preferences for Static PRC versus Dynamic PRC across key evaluation criteria.

(1) The Option Module reads the user’s prompt and conversation history and (2) generates prompt options.

Key usability challenges include clarifying how dynamic options affect AI output and managing the comple…

2 weeks, 1 day ago @ microsoft.com
GigaTIME: Scaling tumor microenvironment modeling using virtual population generated by multimodal AI

C, Scatter plot comparing the subtype-level GigaTIME-translated virtual mIF activations between TCGA and Providence virtual populations.

To our knowledge, this is the first large-scale study exploring multimodal AI for scaling virtual mIF generation.

H, A case study showcasing the activation maps across different virtual mIF channels for a H&E slide in our virtual population, and virtual mIF of sample patches from this slide.

By applying GigaTIME to Providence real-world data, we generated a virtual population of 14,256 patients with virtual mIF and key clinical attributes.

G, Bar plot comparing pan-cancer patient stratification performance in terms of survival log rank p-values among virtu…

2 weeks, 2 days ago @ microsoft.com
Ideas: Community building, machine learning, and the future of AI

This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS.

In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic economics.

How had you experienced a lack of community or network of women in machine learning before the founding of WiML?

So particularly when working on topics related to fairness, I’ve ended up focusing a bunch on stuff to do with marginalized groups as part of my responsible AI work.

3 weeks, 3 days ago @ microsoft.com
Reducing Privacy leaks in AI: Two approaches to contextual integrity

The theory of contextual integrity frames privacy as the appropriateness of information flow within specific social contexts.

Each tackles contextual integrity from a different angle, but both aim to build directly into AI systems a greater sensitivity to information-sharing norms.

Contextual Integrity in LLMs via Reasoning and Reinforcement Learning, accepted at NeurIPS 2025, takes a different approach to applying contextual integrity.

Contextual integrity through reasoning and reinforcement learning: In our second paper, we explore whether contextual integrity can be built into the model itself rather than enforced through external checks at inference time.

To address this trade-off, we int…

1 month ago @ microsoft.com
Fara-7B: An Efficient Agentic Model for Computer Use

Today, we are pleased to announce Fara-7B, our first agentic SLM designed specifically for computer use.

Unlike traditional chat models that generate text-based responses, Computer Use Agent (CUA) models like Fara-7B leverage computer interfaces, such as a mouse and keyboard, to complete tasks on behalf of users.

This results in reduced latency and improved privacy, as user data remains local.

Fara-7B breaks ground on a new Pareto frontier, showing that on-device computer use agents are approaching the capabilities of frontier models.

For guidance on how to use our model safely, and the security considerations to be mindful of when using our model, please refer to our Model card (opens in n…

1 month ago @ microsoft.com
MMCTAgent: Enabling multimodal reasoning over large video and image collections

Real-world reasoning increasingly involves analyzing long-form video content, where context spans minutes or hours, far beyond the context limits of most models.

The Planner agent decomposes a user query, identifies the appropriate reasoning tools, performs multimodal operations, and drafts a preliminary answer.

MMCTAgent’s Planner–Critic architecture enables multimodal reasoning over long-form video through structured ingestion, retrieval, and iterative feedback.

The VideoAgent extends this architecture to long-form video reasoning.

Takeaways and next steps: MMCTAgent demonstrates a scalable agentic approach to multimodal reasoning with a Planner–Critic architecture.

1 month, 1 week ago @ microsoft.com
BlueCodeAgent: A blue teaming agent enabled by automated red teaming for CodeGen AI

Many studies have explored red teaming code LLMs, testing whether the models can reject unsafe requests and whether their generated code exhibits insecure patterns.

Knowledge-enhanced blue teaming: Building on the foundation of red-teaming knowledge, BlueCodeAgent significantly improves blue-teaming performance by leveraging constitutions derived from knowledge and dynamic testing.

Generalization to seen and unseen risks: Empowered by comprehensive red-teaming knowledge, BlueCodeAgent generalizes effectively to unseen risks.

A blue teaming agent enabled by red teaming. Figure 2: Overview of BlueCodeAgent, an end-to-end blue teaming framework powered by automated red teaming for code security.…

1 month, 2 weeks ago @ microsoft.com
When industry knowledge meets PIKE-RAG: The innovation behind Signify’s customer service boost

These differentiated advantages stem from PIKE-RAG’s unique approach to understanding and processing professional knowledge.

“It’s also worth noting that the researchers at Microsoft Research Asia demonstrated strong industry knowledge and rigorous scientific methodology.

Through this collaboration, we validated that PIKE-RAG’s general approach can greatly improve the accuracy of professional knowledge Q&A and accelerate scenario customization.

Our researchers also gained valuable experience in handling domain-specific data,” explained Jiang Bian, partner rese…

1 month, 2 weeks ago @ microsoft.com
Magentic Marketplace: an open-source simulation environment for studying agentic markets

To help navigate this uncertainty, we built Magentic Marketplace, an open-source simulation environment for exploring the numerous possibilities of agentic markets and their societal implications at scale.

To explore these dynamics in depth, the Magentic Marketplace platform enables controlled experimentation across diverse agentic marketplace scenarios.

With Magentic Marketplace, researchers can model how agents representing customers and businesses interact—shedding light on the dynamics that could shape future digital markets.

Magentic Marketplace includes two agent types: Assistant Agents (customers) and Service Agents (businesses).

Unlike traditional markets, which d…

1 month, 2 weeks ago @ microsoft.com
RedCodeAgent: Automatic red-teaming agent against diverse code agents

In the context of code, effective red-teaming requires more than simply checking whether the target code agent rejects unsafe requests.

After the second request was rejected by the code agent, RedCodeAgent invoked both Code Substitution and GCG to optimize the prompt.

Ultimately, RedCodeAgent successfully combined the suggestion from Code Substitution (i.e., using pathlib) with the adversarial suffix generated by GCG, making the target code agent delete the specified file.

In the context of code, it is not enough for the target code agent to simply avoid rejecting the request; the target code agent must also generate and execute code that performs the intended function.

Quantitatively, we f…

1 month, 3 weeks ago @ microsoft.com
Tell me when: Building agents that can wait, monitor, and act

This matters because monitoring tasks are everywhere.

To address this, we are introducing SentinelStep, a mechanism that enables agents to complete long-running monitoring tasks.

Most real-world monitoring tasks share this limitation, making systematic benchmarking very challenging.

In response, we are developing SentinelBench, a suite of synthetic web environments for evaluating monitoring tasks.

By embedding patience into plans, agents can responsibly monitor conditions and act when it matters—staying proactive without wasting resources.

2 months ago @ microsoft.com
Ideas: More AI-resilient biosecurity with the Paraphrase Project

Today, I’m excited to talk about the Paraphrase Project, an effort I co-led exploring how advances in AI tools for protein design might impact biosecurity.

These “patches,” akin to those in cybersecurity, have now been shared with organizations globally to strengthen biosecurity screening.

The project highlights that the same AI tools capable of incredible good can also be misused, requiring us to be vigilant, thoughtful, and creative so we continue to get the most benefit out of AI tools while working to ensure that we avoid costly misuses.

So things like, how similar is this to that template, wild-type protein structure that we used as our conditioning information?

But I feel like broadly…

2 months, 2 weeks ago @ microsoft.com
When AI Meets Biology: Promise, Risk, and Responsibility

In computer-based studies, we found that AI protein design (AIPD) tools could generate modified versions of proteins of concern, such as ricin.

Stratified tiers of information: Data and code are classified into several tiers according to their potential hazard, from low-risk summaries through sensitive technical data to critical software pipelines.

The Age of AI in the Life Sciences: Benefits and Biosecurity Considerations, National Academies of Sciences, Engineering, and Medicine, 2025.

Protecting scientific integrity in an age of genera…

2 months, 2 weeks ago @ microsoft.com
MIT AI
last post 3 days ago
MIT in the media: 2025 in review

“At MIT, innovation ranges from awe-inspiring technology to down-to-Earth creativity,” noted Chronicle, during a campus visit this year for an episode of the program.

In 2025, MIT researchers made headlines across print publications, podcasts, and video platforms for key scientific advances, from breakthroughs in quantum and artificial intelligence to new efforts aimed at improving pediatric health care and cancer diagnosis.

Neha Narula, director of the MIT Digital Currency Initiative, examines the future of cash as the use of digital currencies expands.

Full story via Michigan Farm News. Bug-sized robots could help pollination on future farms: Insect-sized robots crafted by MIT researchers cou…

3 days ago @ news.mit.edu
Guided learning lets “untrainable” neural networks realize their potential

Remarkably, even untrained networks contain architectural biases that can be transferred, while trained guides additionally convey learned patterns.

This result underscores a key insight: Untrained networks already encode valuable architectural biases that can steer other networks toward effective learning.

By aligning with a guide network, it’s possible to separate the contributions of architectural biases from those of learned knowledge.

By revealing the hidden potential of even the most stubborn networks, guidance provides a powerful new tool for understanding — and hopefully shaping — the foundations of machine learning.

Remarkably, the authors show this can be done using small, untrain…

1 week ago @ news.mit.edu
A new way to increase the capabilities of large language models

Kim’s co-authors include lead author Songlin Yang, an EECS graduate student and former MIT-IBM Watson AI Lab Summer Program intern; Kaiyue Wen of Stanford University; Liliang Ren of Microsoft; and Yikang Shen, Shawn Tan, Mayank Mishra, and Rameswar Panda of IBM Research and the MIT-IBM Watson AI Lab.

The cumulative effect lets the system model how the meaning changes along the path between words, not just how far apart they are.

PaTH Attention improved perplexity and outcompeted other methods on reasoning benchmarks it wasn’t trained on.

PaTH Attention consistently proved capable of content-awareness.

In this way, PaTH Attention extends the expressive power of transformer architectures.

1 week ago @ news.mit.edu
A “scientific sandbox” lets researchers explore the evolution of vision systems

This framework could enable scientists to probe “what-if” questions about vision systems that are difficult to study experimentally.

Building a scientific sandbox: The paper began as a conversation among the researchers about discovering new vision systems that could be useful in different fields, like robotics.

Over many generations, agents evolve different elements of vision systems that maximize rewards.

Testing hypotheses: When the researchers set up experiments in this framework, they found that tasks had a major influence on the vision systems the agents evolved.

In the future, the researchers want to use this simulator to explore the best vision systems for specific applications, which c…

1 week, 1 day ago @ news.mit.edu
“Robot, make me a chair”

The researchers tackled these challenges using a vision-language model (VLM), a powerful generative AI model that has been pre-trained to understand images and text.

“By serving as both the eyes and brain of the robot, the VLM enables the robot to do this,” Kyaw says.

A user prompts the system with text, perhaps by typing “make me a chair,” and gives it an AI-generated image of a chair to start.

For instance, the model can determine that the seat and backrest should have panels to have surfaces for someone sitting and leaning on the chair.

Then the VLM chooses the labels that correspond to the geometric parts of the chair that should receive panels on the 3D mesh to complete the design.

1 week, 2 days ago @ news.mit.edu
3 Questions: Using computation to study the world’s best single-celled chemists

Q: What drew you to research microbes in extreme environments, and what are the challenges in studying them?

I wanted to be an astronaut growing up, and the closest thing to astrobiology is examining extreme environments on Earth.

And the only thing that lives in those extreme environments are microbes.

My latest work is genomic language modeling.

A genomic language model is technically a large language model, except the language is DNA as opposed to human language.

1 week, 3 days ago @ news.mit.edu
Working to eliminate barriers to adopting nuclear energy

What if there were a way to solve one of the most significant obstacles to the use of nuclear energy — the disposal of high-level nuclear waste (HLW)?

Such a move would be especially important for the public’s acceptance of nuclear energy.

“We’re reframing the problem of nuclear waste, transforming it from a liability to an energy source,” Sarsenbayev says.

The nuances of nuclear: Sarsenbayev had to do a bit of reframing himself in how he perceived nuclear energy.

Removing the bottleneck for nuclear energy adoption by producing carbon-free power and ensuring the safe disposal of radioactive waste.

1 week, 3 days ago @ news.mit.edu
Deep-learning model predicts how fruit flies form, cell by cell

In a study appearing today in the journal Nature Methods, the team presents a new deep-learning model that learns, then predicts, how certain geometric properties of individual cells will change as a fruit fly develops.

The team applied the model to videos of developing fruit fly embryos, each of which starts as a cluster of about 5,000 cells.

As a proof of principle, the team trained the new model to “learn” how individual cells change over time during fruit fly gastrulation.

“The overall shape of the fruit fly at this stage is roughly an ellipsoid, but there are gigantic dynamics going on at the surface during gastrulation,” Guo says.

What’s more, the videos contain labels of individual c…

1 week, 3 days ago @ news.mit.edu
Enabling small language models to solve complex reasoning tasks

As language models (LMs) improve at tasks like image generation, trivia questions, and simple math, you might think that human-like reasoning is around the corner.

Small LMs can’t do this reliably on their own; large language models (LLMs) sometimes can, particularly if they’re optimized for reasoning tasks, but they take a while to respond, and they use a lot of computing power.

Then, the LLM relays these instructions and guidelines in a clear way to smaller models.

For instance, whereas existing reasoning models like OpenAI’s o1 perform reasoning in text, DisCIPL “reasons” by writing Python code, which is more compact.

DisCIPL’s efficiency gains stem partly from using small Llama models a…

1 week, 6 days ago @ news.mit.edu
New MIT program to train military leaders for the AI age

Artificial intelligence can enhance decision-making and enable action with reduced risk and greater precision, making it a critical tool for national security.

“The potential for artificial intelligence is just starting to be fully realized.

The 2N6 curriculum is application focused, and the content is built to satisfy the U.S. Navy’s sub-specialty code for Applied Artificial Intelligence.

“The admiral made the connection, envisioning an applied AI program similar to 2N.” 2N6 will run as a pilot program for at least two years.

The program’s first cohort will comprise only U.S. Navy officers, with plans to expand more broadly.

1 week, 6 days ago @ news.mit.edu
New method improves the reliability of statistical estimations

Let’s say an environmental scientist is studying whether exposure to air pollution is associated with lower birth weights in a particular county.

In simulations and experiments with real data, their method was the only technique that consistently generated accurate confidence intervals.

Finally, they assume the source data are similar to the target data where one wants to estimate.

A smooth solution: The new method for generating confidence intervals explicitly accounts for this potential bias.

When they compared their method to other common techniques, they found it was the only one that could consistently produce reliable confidence intervals for spatial analyses.

1 week, 6 days ago @ news.mit.edu
New materials could boost the energy efficiency of microelectronics

MIT researchers have developed a new fabrication method that could enable the production of more energy-efficient electronics by stacking multiple functional components on top of one existing circuit.

This new electronics integration platform allows scientists to fabricate transistors and memory devices in one compact stack on a semiconductor chip.

Stacking active components would reduce the distance data must travel and improve a chip’s energy efficiency.

These compact memory transistors demonstrated switching speeds of only 10 nanoseconds, hitting the limit of the team’s measurement instruments.

In the future, they want to build upon these demonstrations by integrating back-end memory tra…

2 weeks ago @ news.mit.edu
MIT affiliates named 2025 Schmidt Sciences AI2050 Fellows

Two current MIT affiliates and seven additional alumni are among those named to the 2025 cohort of AI2050 Fellows.

Zongyi Li, a postdoc in the MIT Computer Science and Artificial Intelligence Lab, and Tess Smidt ’12, an associate professor of electrical engineering and computer science (EECS), were both named as AI2050 Early Career Fellows.

He received his PhD in computing and mathematical sciences from Caltech, where he was advised by Anima Anandkumar and Andrew Stuart.

Li's work has been supported by a Kortschak Scholarship, PIMCO Fellowship, Amazon AI4Science Fellowship, Nvidia Fellowship, and MIT-Novo Nordisk AI Fellowship.

Besides the AI2050 fellowship, she has received an Air Force Yo…

2 weeks, 3 days ago @ news.mit.edu
MIT researchers “speak objects into existence” using AI and robotics

“We’re connecting natural language processing, 3D generative AI, and robotic assembly,” says Alexander Htet Kyaw, an MIT graduate student and Morningside Academy for Design (MAD) fellow.

Generative AI and robotics are moving us ever closer to the day when we can ask for an object and have it created within a few minutes.

In fact, MIT researchers have developed a speech-to-reality system, an AI-driven workflow that allows them to provide input to a robotic arm and “speak objects into existence,” creating things like furniture in as little as five minutes.

This is followed by creation of a feasible assembly sequence and automated path planning for the robotic arm to assemble physical objects …

2 weeks, 6 days ago @ news.mit.edu
Robots that spare warehouse workers the heavy lifting

The company’s unloading robots combine generative AI and machine-learning algorithms with sensors, cameras, and machine-vision software to navigate new environments on day one and improve performance over time.

The Pickle Robot Company wants its machines to do the heavy lifting.

The robots can unload anywhere from 400 to 1,500 cases per hour depending on size and weight.

“Our immediate product roadmap is load and unload,” Meyer says.

What does it mean for the robot unloading a truck to talk to the robot palletizing, or for the forklift to talk to the inventory drone?

2 weeks, 6 days ago @ news.mit.edu
Berkeley AI
last post 1 month, 3 weeks ago
RL without TD learning

In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer.

We can do Reinforcement Learning (RL) based on divide and conquer, instead of temporal difference (TD) learning.

There are two classes of algorithms in RL: on-policy RL and off-policy RL.

We compared TRL with $n$-step TD learning with different values of $n$, from $1$ (pure TD) to $\infty$ (pure MC).
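As a rough illustration of the baseline mentioned above, here is a minimal sketch of the standard $n$-step TD target, not of TRL itself. The function name `n_step_target`, the list-based trajectory layout, and the discount value are assumptions made for this example and do not come from the post. With $n = 1$ it reduces to the one-step TD target, and with $n = \infty$ it becomes the Monte Carlo return.

```python
def n_step_target(rewards, values, t, n, gamma=0.99):
    """Standard n-step TD target from time step t (illustrative helper).

    rewards[k] is the reward received after step k.
    values[k] is an estimate of V(s_k); len(values) == len(rewards) + 1,
    and the last entry is the value of the terminal state (normally 0).
    n may be float("inf") to get the pure Monte Carlo return.
    """
    horizon = len(rewards)
    end = min(t + n, horizon)  # last reward index (exclusive) that we sum
    target = 0.0
    for k in range(t, end):
        target += gamma ** (k - t) * rewards[k]
    if end < horizon:
        # Bootstrap from the value estimate once the n real rewards run out.
        target += gamma ** (end - t) * values[end]
    return target


rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]
td_target = n_step_target(rewards, values, t=0, n=1, gamma=0.5)
mc_return = n_step_target(rewards, values, t=0, n=float("inf"), gamma=0.5)
```

Small $n$ leans on the learned value estimate (more bias, less variance); large $n$ leans on observed rewards (less bias, more variance).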

I still think one of the most important problems in RL (and even in machine learning) is to find a scalable off-policy RL algorithm.

1 month, 3 weeks ago @ bair.berkeley.edu
What exactly does word2vec learn?

What exactly does word2vec learn, and how?

In this framing, it’s clear that word2vec is a minimal neural language model.

As a result, the theory predicts exactly what features are learned in terms of the corpus statistics and the algorithmic hyperparameters.

We find that over the course of learning, word2vec builds these linear representations in a sequence of noisy learning steps, and their geometry is well-described by a spiked random matrix model.

3 months, 3 weeks ago @ bair.berkeley.edu
Whole-Body Conditioned Egocentric Video Prediction

Predicting Ego-centric Video from human Actions (PEVA).

We trained a model to Predict Ego-centric Video from human Actions (PEVA) for Whole-Body-Conditioned Egocentric Video Prediction.

We train an autoregressive conditional diffusion transformer on Nymeria, a large-scale dataset pairing real-world egocentric video with body pose capture.

We include some samples here:
Body Movement Actions: Move Forward, Rotate Left, Rotate Right
Left Hand Actions: Move Left Hand Up, Move Left Hand Down, Move Left Hand Left, Move Left Hand Right
Right Hand Actions: Move Right Hand Up, Move Right Hand Down, Move Right Hand Left, Move Right Hand Right
Long Rollout: Here you…

5 months, 3 weeks ago @ bair.berkeley.edu
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications.

To mitigate the imminent prompt injection threat, we propose two fine-tuning defenses, StruQ and SecAlign.

Prompt Injection Attack: Causes. Below is the threat model of prompt injection attacks.

Prompt injection threat model in LLM-integrated applications. We propose that prompt injection has two causes.

Below are resources to learn more and keep updated on prompt injection attacks and defenses.

8 months, 2 weeks ago @ bair.berkeley.edu
Repurposing Protein Folding Models for Generation with Latent Diffusion

PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models.

In PLAID, we develop a method that learns to sample from the latent space of protein folding models to generate new proteins.

Unlike many previous protein structure generative models, PLAID addresses the multimodal co-generation problem setting: simultaneously generating both discrete sequence and continuous all-atom structural coordinates.

In this way, we can use structural understanding information in the weights of pretrained protein folding models for the p…

8 months, 3 weeks ago @ bair.berkeley.edu
Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone.

The challenges of phantom jams: A stop-and-go wave moving backwards through highway traffic.

Smoothing behavior of RL AVs.

Overall, the steps towards deployment involved: Training in data-driven simulations: We used highway traffic data from I-24 to create a training environment with realistic wave dynamics, then validate the trained agent’s performance and robustness in a variety of new traffic scenarios.…

9 months ago @ bair.berkeley.edu
AWS Machine Learning
last post 1 day, 5 hours ago
Programmatically creating an IDP solution with Amazon Bedrock Data Automation

Today, we explore how to programmatically create an IDP solution that uses Strands SDK, Amazon Bedrock AgentCore, Amazon Bedrock Knowledge Base, and Bedrock Data Automation (BDA).

Amazon Bedrock Data Automation can be used as a standalone feature or as a parser when setting up a knowledge base for Retrieval-Augmented Generation (RAG) workflows.

Amazon Bedrock AgentCore is a fully managed service that allows you to build and configure autonomous agents.

Here’s an overview of how you can set up Bedrock Knowledge Bases with data automation as a parser with Bedrock AgentCore.

With Amazon Bedrock Data Automation, we can enhance the RAG experience for more complex data formats including visual ric…

1 day, 5 hours ago @ aws.amazon.com
AI agent-driven browser automation for enterprise workflow management

This workflow demonstrates the full capabilities of AI-powered browser automation, from initial navigation through complex decision-making to human-in-the-loop intervention.

Amazon Bedrock AgentCore Browser provides a secure, cloud-based browser that enables the AI agent (Amazon Nova Act and Strands agent in this case) to interact with websites.

Conclusion: AI agent-driven browser automation represents a fundamental shift in how enterprises approach workflow management.

Veda Raman is a Sr Solutions Architect for Generative AI for Amazon Nova and Agentic AI at AWS.

She helps customers design and build Agentic AI solutions using Amazon Nova models and Bedrock AgentCore.

1 day, 5 hours ago @ aws.amazon.com
Agentic QA automation using Amazon Bedrock AgentCore Browser and Amazon Nova Act

In this post, we explore how agentic QA automation addresses these challenges and walk through a practical example using Amazon Bedrock AgentCore Browser and Amazon Nova Act to automate testing for a sample retail application.

Benefits of agentic QA testing: Agentic AI shifts QA testing from rule-based automation to intelligent, autonomous testing systems.

AgentCore Browser for large-scale agentic QA testing: To realize the potential of agentic AI testing at enterprise scale, organizations need robust infrastructure that can support intelligent, autonomous testing agents.

Agentic QA with the Amazon Nova Act SDK: The infrastructure capabilities of AgentCore Browser become truly powerful when combi…

1 day, 5 hours ago @ aws.amazon.com
Optimizing LLM inference on Amazon SageMaker AI with BentoML’s LLM-Optimizer

This post illustrates this process by finding an optimal deployment for a Qwen-3-4B model on an Amazon SageMaker AI endpoint.

This makes deployment & inference straightforward, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code.

For a complete end-to-end sample on deploying an LMI container for real time inference on SageMaker AI, refer to this example.

By combining BentoML’s LLM-Optimizer with Amazon SageMaker AI, organizations can now move from hypothesis to deployment through a data-driven, automated optimization loop.

With BentoML’s LLM-Optimizer and Amazon SageMaker AI, that balance can be discovered systematically, reproduced consistently, and deploy…

1 day, 5 hours ago @ aws.amazon.com
Exploring the zero operator access design of Mantle

This has been central to our business from the start, and it was particularly in focus from the earliest days of Amazon Bedrock.

Model providers have no mechanism to access customer data, because inferencing is done only within the Amazon Bedrock-owned account that model providers don’t have access to.

Following the approach of the AWS Nitro System, we have designed Mantle from the ground up to be zero operator access (ZOA), where we have intentionally excluded any technical means for AWS operators to access customer data.

Interactive communication tools like Secure Shell (SSH), AWS Systems Manager Session Manager, and serial consoles aren’t installed anywhere in Mantle.

Throughout the enti…

2 days ago @ aws.amazon.com
AWS AI League: Model customization and agentic showdown

The AWS AI League provides an innovative program to help enterprises overcome the challenges of building advanced AI capabilities through exciting competitions that drive innovation in agentic AI and model customization.

In 2025, the first AWS AI League competition captured the attention of developers, data scientists, and business leaders globally.

The AWS AI League experience begins with a hands-on, 2-hour workshop led by AWS experts, followed by self-paced experimentation.

Conclusion: In this post, we explored the new AWS AI League challenges and how they are transforming how organizations approach AI development.

To learn more about hosting an AWS AI League within your organization visit …

2 days, 4 hours ago @ aws.amazon.com
Accelerate Enterprise AI Development using Weights & Biases and Amazon Bedrock AgentCore

In this post, we demonstrate how to use Foundation Models (FMs) from Amazon Bedrock and the newly launched Amazon Bedrock AgentCore alongside W&B Weave to help build, evaluate, and monitor enterprise AI solutions.

Tracking Amazon Bedrock FMs with W&B Weave SDK: W&B Weave integrates seamlessly with Amazon Bedrock through Python and TypeScript SDKs.

Experimenting with Amazon Bedrock FMs in W&B Weave Playground: The W&B Weave Playground accelerates prompt engineering with an intuitive interface for testing and comparing Bedrock models.

When working with AgentCore and W&B Weave together, teams can use AgentCore’s built-in operational monitoring and security foundations while also using W&B Weave if…

2 days, 5 hours ago @ aws.amazon.com
How dLocal automated compliance reviews using Amazon Quick Automate

dLocal decided to partner with AWS to implement Amazon Quick Automate, a capability of Amazon Quick Suite, as one of the service’s earliest adopters.

Using Quick Automate, dLocal automated its merchant compliance website review process, enabling large-scale, efficient, and consistent policy enforcement.

Through specialized AI agents, Quick Automate helps organizations automate complex processes across applications and departments while reducing operational costs through usage-based pricing.

Automated website analysis using the UI Agent: The UI Agent is a feature in Quick Automate to automate complex browser-based actions based on natural language instructions.

We integrated Amazon Quick Autom…

2 days, 5 hours ago @ aws.amazon.com
Advancing ADHD diagnosis: How Qbtech built a mobile AI assessment Model Using Amazon SageMaker AI

The assessment and diagnosis of attention deficit hyperactivity disorder (ADHD) has traditionally relied on clinical observations and behavioral evaluations.

Qbtech developed and deployed a model that efficiently processes data from smartphone cameras, motion sensors, and test results.

Building the artificial intelligence (AI) model: From raw data to clinical insights. Qbtech’s approach to mobile ADHD assessment utilizes machine learning techniques to process and analyze multiple data streams simultaneously.

The team selected Binary LightGBM as their primary algorithm for the ADHD assessment model.

Clinical impact: Comparative clinical performance. The clinical validation of QbMobile against Qbte…

2 days, 5 hours ago @ aws.amazon.com
Accelerating your marketing ideation with generative AI – Part 1: From idea to generation with the Amazon Nova foundation models

Composed of four models: Amazon Nova Micro, Amazon Nova Lite, Amazon Nova Pro, and Amazon Nova Premier.

Composed of two models: Amazon Nova Canvas (image generation) and Amazon Nova Reel (video generation).

You can learn about prompt engineering for Amazon Nova Canvas and Amazon Nova Reel at Image and video prompt engineering for Amazon Nova Canvas and Amazon Nova Reel in the AWS Artificial Intelligence Blog.

2 days, 5 hours ago @ aws.amazon.com
Introducing Visa Intelligent Commerce on AWS: Enabling agentic commerce with Amazon Bedrock AgentCore

Visa Intelligent Commerce empowers businesses and developers to build the next generation of agentic payment experiences.

How Amazon Bedrock AgentCore powers these solutions: Before diving into the specific use cases, it’s important to understand the role Amazon Bedrock AgentCore plays as the foundational infrastructure enabling these agentic commerce experiences.

The value Amazon Bedrock AgentCore adds: The core of this solution is Amazon Bedrock AgentCore Runtime, a secure, serverless hosting environment purpose-built for AI agents and MCP servers.

Amazon Bedrock AgentCore Memory maintains long-duration context over extended, multistep journeys lik…

2 days, 5 hours ago @ aws.amazon.com
Move Beyond Chain-of-Thought with Chain-of-Draft on Amazon Bedrock

The key innovation of CoD lies in its constraint: each reasoning step is limited to five words or less.

For instance, when solving a mathematical word problem, instead of generating full sentences explaining each step, CoD produces concise numerical operations and key logical markers.
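To make the five-word constraint concrete, here is a minimal sketch of a checker that tests whether a model's draft stays within the limit. The helper name `within_draft_limit` and the convention of one reasoning step per line are assumptions made for this illustration; the post does not describe such a function, and a real evaluation may delimit steps differently.

```python
def within_draft_limit(draft, max_words=5):
    """Return True if every non-empty line (one reasoning step) has at most max_words words.

    Hypothetical helper: it assumes each reasoning step sits on its own line
    and that words are separated by whitespace.
    """
    steps = [line.strip() for line in draft.splitlines() if line.strip()]
    return all(len(step.split()) <= max_words for step in steps)


# A terse, draft-style trace: numerical operations only.
terse = "20 - 12 = 8\n8 / 2 = 4"
# A verbose, chain-of-thought-style step written as a full sentence.
verbose = "First we subtract twelve from twenty to get eight"

terse_ok = within_draft_limit(terse)
verbose_ok = within_draft_limit(verbose)
```

A check like this could be used to flag responses that drift back into full-sentence reasoning, under the stated one-step-per-line assumption.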

Implementation and evaluation on AWS: To evaluate the efficiency of CoD prompting techniques, we run a test in Amazon Bedrock and solve the “Red, Blue, and Green Balls” puzzle using an LLM.

Box 1 is labelled “Red Balls Only.” Box 2 is labelled “Blue Balls Only.” Box 3 is labelled “Red and Blue Balls Only.” The labels on the boxes are all incorrect.

Small language models: CoD underperformed on models with fewer …

3 days, 3 hours ago @ aws.amazon.com
Deploy Mistral AI’s Voxtral on Amazon SageMaker AI

Mistral AI’s Voxtral models combine text and audio processing capabilities in a single framework.

In this post, we demonstrate hosting Voxtral models on Amazon SageMaker AI endpoints using vLLM and the Bring Your Own Container (BYOC) approach.

Text-only processing uses the standard chat completion API for traditional conversational AI where audio processing isn’t required.

This post also demonstrates integrating the Voxtral model deployed on SageMaker with Strands Agents to build agentic applications with minimal code.

The following sections provide a complete implementation guide to get your Voxtral model running on SageMaker endpoints.

3 days, 4 hours ago @ aws.amazon.com
Enhance document analytics with Strands AI Agents for the GenAI IDP Accelerator

To address this need, we are announcing Analytics Agent, a new feature that is seamlessly integrated into the GenAI IDP Accelerator.

GenAI IDP Accelerator: The GenAI IDP Accelerator, an open source solution, helps organizations use generative AI to automatically extract information from various document types.

To learn more about the GenAI IDP Accelerator, see Accelerate intelligent document processing with generative AI on AWS.

The Analytics Agent acts as an intelligent interface between business users and their processed document data.

Analytics Agent: The Analytics Agent is built using Strands Agents, an open source SDK with a model-driven approach for building AI agents.

3 days, 4 hours ago @ aws.amazon.com
Build a multimodal generative AI assistant for root cause diagnosis in predictive maintenance using Amazon Bedrock

Predictive maintenance overview: Predictive maintenance can be broken down into two key phases: sensor alarm generation and root cause diagnosis.

This is accomplished using Claude 3 Haiku and Claude 3 Sonnet, powerful multimodal models available through Amazon Bedrock.

This flexibility makes sure that technicians can communicate using the most suitable format for their needs, whether that be an image, audio, or video.

By combining multiple data types (video, audio, and images), the chatbot provides comprehensive diagnostic support, significantly reducing downtime and imp…

3 days, 4 hours ago @ aws.amazon.com
NVIDIA
last post 8 hours ago
Make Spirits Bright With Holiday Hits on GeForce NOW

Holiday lights are twinkling, hot cocoa’s on the stove and gamers are settling in for a well-earned break.

Whether staying in or heading on a winter getaway, GeForce NOW makes it easy to keep gaming from anywhere.

With NVIDIA Blackwell RTX power everywhere, Ultimate members can stream even the most graphically demanding adventures at GeForce RTX 5080-power without needing the latest hardware.

For those who want to share their holiday adventures, ARC Raiders offers thrilling teamwork and dramatic battles under electric skies.

Seasonal festivals in Stardew Valley and the rotating holiday events in Fortnite help set a cozy mood in just a few clicks.

8 hours ago @ blogs.nvidia.com
Marine Biological Laboratory Explores Human Memory With AI and Virtual Reality

The lab in Massachusetts is using NVIDIA RTX GPUs, HP Z Workstations and virtual-reality technology to study the molecular mechanisms of human memory function.

The works of Plato state that when humans have an experience, some level of change occurs in their brain, which is powered by memory — specifically long-term memory.

The team is studying a small portion of these “leaves” — representing protein markers: an incredibly tedious task due to their length, at about a micrometer each.

A researcher must search through the forest of brain cells to find the correct protein markers, which make up only about 1% of all protein markers in the hippocampus.

Collecting and analyzing enough 3D volumet…

3 days, 6 hours ago @ blogs.nvidia.com
NVIDIA, US Government to Boost AI Infrastructure and R&D Investments Through Landmark Genesis Mission

NVIDIA and the US Department of Energy outline priorities for collaboration in support of accelerating scientific discovery.

NVIDIA will join the U.S. Department of Energy’s (DOE) Genesis Mission as a private industry partner to keep U.S. AI both the leader and the standard in technology around the world.

The Genesis Mission, which is part of an Executive Order recently signed by President Trump, aims to redefine American leadership in AI across three key areas: energy, scientific research and national security.

Together, these priorities focus on using advanced AI, robotics and high‑performance computing to transform energy, manufacturing and scientific discovery across the Department of E…

1 week ago @ blogs.nvidia.com
Now Generally Available, NVIDIA RTX PRO 5000 72GB Blackwell GPU Expands Memory Options for Desktop Agentic AI

The NVIDIA RTX PRO 5000 72GB Blackwell GPU is now generally available, bringing robust agentic and generative AI capabilities powered by the NVIDIA Blackwell architecture to more desktops and professionals across the world.

Fueling the Next Generation of AI Development: As generative AI evolves into complex, multimodal agentic AI, more demand is placed on the hardware required to develop and deploy these technologies.

And for computer-aided engineering and product design, the RTX PRO 5000 72GB offers more than 2x graphics performance.

As industries race to integrate AI into every facet of operation — from generative design to coding copilots — RTX PRO 5000 72GB is equipped to meet the moment.…

1 week ago @ blogs.nvidia.com
Deck the Vaults: ‘Fallout: New Vegas’ Joins the Cloud This Holiday Season

The ‘Fallout’ series leads this week’s five new games, along with ‘Hogwarts Legacy’ and ‘LEGO Harry Potter,’ for a magical lineup this GFN Thursday.

To mark the occasion, GeForce NOW members can claim Fallout 3 and Fallout 4 as special rewards, completing a wasteland-ready trilogy in the cloud.

Have Yourself a Merry Little ‘Fallout’: Bethesda’s Fallout: New Vegas is rolling onto GeForce NOW with a suitcase full of wasteland wit.

To sweeten the deal, GeForce NOW Ultimate members can claim a stack of classics — including Fallout 3 and Fallout 4 — while supplies last.

In Hogwarts Legacy, players can chart their own path as a fifth-year student at Hogwarts in the 1800s.

1 week ago @ blogs.nvidia.com
Migrate Apache Spark Workloads to GPUs at Scale on Amazon EMR with Project Aether

Users can use the services provided to migrate existing EMR CPU Spark workloads to GPUs.

Configure Aether for EMR: Once the Aether package is installed, configure the Aether client for the EMR platform using the following commands:

# Initialize and list config
$ aether config init
$ aether config list
# Select EMR platform and region
$ aether config set core.selected_platform emr
$ aether config set platform.emr.region
# Set required EMR s3 paths
$ aether config set platform.emr.spark_event_log_dir
$ aether config set platform.emr.cluster.artifacts_path
$ aether config set platform.emr.cluster.log_path

Example Aether EMR migration workflow: The Aether CLI tool provides several modular command…

1 week, 1 day ago @ developer.nvidia.com
Solving Large-Scale Linear Sparse Problems with NVIDIA cuDSS

Hybrid memory mode, blurring the line between CPU and GPU: cuDSS hybrid memory mode is designed to overcome the memory limitations of a single GPU when solving extremely large sparse linear problems by using both GPU and CPU memory.

Hybrid memory mode is not on by default, so the first step to enable it is to call the function cudssConfigSet() to set CUDSS_CONFIG_HYBRID_MODE, which tells cuDSS to use hybrid memory mode.

The first cuDSS function called, cudssConfigSet(), enables hybrid memory mode before the first analysis step, symbolic factorization.

This is followed by using cudssDataGet() to find the minimal amount of device memory sufficient for hybrid memory mode.

NVIDIA cuDSS p…

1 week, 1 day ago @ developer.nvidia.com
Into the Omniverse: OpenUSD and NVIDIA Halos Accelerate Safety for Robotaxis, Physical AI Systems

New NVIDIA safety frameworks and technologies are advancing how developers build safe physical AI.

To align these advances with rigorous global standards, the NVIDIA Halos AI Systems Inspection Lab — accredited by ANAB — provides impartial inspection and certification of Halos elements across robotaxi fleets, AV stacks, sensors and manufacturer platforms through the Halos Certification Program.

AV Ecosystem Leaders Putting Physical AI Safety to Work: Bosch, Nuro and Wayve are among the first participants in the NVIDIA Halos AI Systems Inspection Lab, which aims to accelerate the safe, large-scale deployment of robotaxi fleets.

Onsemi, which makes sensor systems for AVs, industrial automation …

1 week, 1 day ago @ blogs.nvidia.com
UC San Diego Lab Advances Generative AI Research With NVIDIA DGX B200 System

The Hao AI Lab research team at the University of California San Diego — at the forefront of pioneering AI model innovation — recently received an NVIDIA DGX B200 system to elevate their critical work in large language model inference.

How Is Hao AI Lab Using the DGX B200?

With the DGX B200 now fully accessible to the Hao AI Lab and broader UC San Diego community at the School of Computing, Information and Data Sciences’ San Diego Supercomputer Center, the research opportunities are boundless.

The research phase of FastVideo taps into NVIDIA H200 GPUs in addition to the DGX B200 system.

Learn more about the NVIDIA DGX B200 system.

1 week, 1 day ago @ blogs.nvidia.com
AI Factories, Physical AI, and Advances in Models, Agents, and Infrastructure That Shaped 2025

2025 was another milestone year for developers and researchers working with NVIDIA technologies.

Progress in data center power and compute design, AI infrastructure, model optimization, open models, AI agents, and physical AI redefined how intelligent systems are trained, deployed, and moved into the real world.

Looking ahead: Stay tuned for more transformative innovations in 2026.

Subscribe to the Developer Newsletter and stay in the loop on content tailored to your interests.

Follow us on Instagram, LinkedIn, Twitter, YouTube, and Discord for the latest developer news.

1 week, 2 days ago @ developer.nvidia.com
Reducing CUDA Binary Size to Distribute cuML on PyPI

One of the biggest challenges has been managing the binary size of our CUDA C++ libraries, which affects user experience as well as the ability to pip install from PyPI.

PyPI limits binary size to keep costs for the Python Software Foundation (PSF) under control and protect users from downloading unexpectedly large binaries.

The complexity of the cuML library has historically required a larger binary than PyPI could host, but we’ve worked closely with PSF to overcome this by reducing binary size.

This post walks you through the new pip install path for cuML and a tutorial on the steps the team used to drop the CUDA C++ library binary size, which enabled the availability of cuML wheels on Py…

1 week, 3 days ago @ developer.nvidia.com
NVIDIA GPU-Accelerated Sirius Achieves Record-Setting ClickBench Record

NVIDIA is partnering with the University of Wisconsin-Madison to bring GPU-accelerated analytics to DuckDB through the open-source Sirius engine.

This blog post outlines the Sirius architecture and demonstrates how it achieved record-breaking performance on ClickBench, a widely used analytics benchmark.

Sirius Query on CPU and GPUs: As illustrated in Figure 2, the process begins when Sirius receives an already optimized query plan from DuckDB’s internal format, ensuring robust logical and physical optimizations are preserved.

In benchmarks like ClickBench, Sirius can cache frequently accessed tables on the GPU, accelerating repeated query execution.

ClickBench cost and relative runtime: Figure …

1 week, 3 days ago @ developer.nvidia.com
NVIDIA Acquires Open-Source Workload Management Provider SchedMD

NVIDIA today announced it has acquired SchedMD — the leading developer of Slurm, an open-source workload management system for high-performance computing (HPC) and AI — to help strengthen the open-source software ecosystem and drive AI innovation for researchers, developers and enterprises.

NVIDIA will continue to develop and distribute Slurm as open-source, vendor-neutral software, making it widely available to and supported by the broader HPC and AI community across diverse hardware and software environments.

HPC and AI workloads involve complex computations running parallel tasks on clusters that require queuing, scheduling and allocating computational resources.

As HPC and AI clusters g…

1 week, 3 days ago @ blogs.nvidia.com
How to Fine-Tune an LLM on NVIDIA GPUs With Unsloth

Another powerful starting point for fine-tuning is the just-announced NVIDIA Nemotron 3 family of open models, data and libraries.

Teaching AI New Tricks: Fine-tuning is like giving an AI model a focused training session.

Check out some of these Unsloth guides: Learn how to install Unsloth on NVIDIA DGX Spark.

#ICYMI: The Latest Advancements in NVIDIA RTX AI PCs. 🚀 FLUX.2 Image-Generation Models Now Released, Optimized for NVIDIA RTX GPUs: The new models from Black Forest Labs are available in FP8 quantizations that reduce VRAM usage and increase performance by 40%.

Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX AI PC newsletter.

1 week, 3 days ago @ blogs.nvidia.com
How to Train Scientific Agents with Reinforcement Learning

One developer using NeMo Gym and NeMo RL is Edison Scientific, which is working on automating scientific discovery.

Through RL, scientific agents can compose skills learned in pre-training and SFT to build new workflows and achieve specific scientific goals.

NeMo Gym in practice: Training scientific reasoning agents at Edison Scientific. Edison Scientific is using NeMo Gym and NeMo RL to scale AI agents that automate scientific discovery.

High-level architecture of NeMo Gym and NeMo RL for training reinforcement learning agents for scientific tasks.

This sketch illustrates how NeMo Gym-managed training environments connect to NeMo RL training infrastructure to train agents that will support d…

1 week, 3 days ago @ developer.nvidia.com
Facebook
last post 6 days, 4 hours ago
DrP: Meta’s Root Cause Analysis Platform at Scale

DrP’s key components include:

Expressive SDK: The DrP SDK allows engineers to codify investigation workflows into analyzers.

Post-processing system: After an investigation, the post-processing system can take automated actions based on the analysis results.

Bootstrap code: The DrP SDK provides bootstrap code to create a template analyzer with pre-populated boilerplate code.

Data access and analysis: The SDK includes libraries for data access and analysis, such as dimension analysis and time series correlation.

This provides immediate analysis results to on-call engineers.

6 days, 4 hours ago @ engineering.fb.com
How AI Is Transforming the Adoption of Secure-by-Default Mobile Frameworks

Generative AI and automation accelerate the adoption of secure frameworks at scale, enabling consistent security enforcement and efficient migration across Meta’s vast codebase.

How We Design Secure-by-Default Frameworks at Meta: Designing secure-by-default frameworks for use by a large number of developers shipping vastly different features across multiple apps is an interesting challenge.

There shouldn’t be one security framework that covers all security issues, and not every security issue is general enough to deserve its own framework.

Now that we’ve looked at the design philosophy behind our frameworks, let’s look at one of our most widely used Android security frameworks, SecureLinkLaun…

1 week, 3 days ago @ engineering.fb.com
Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization

Zoomer has delivered training time reductions and significant QPS improvements, making it the de facto tool for AI performance optimization across Meta’s entire AI infrastructure.

Zoomer is Meta’s automated, one-stop-shop platform for performance profiling, debugging, analysis, and optimization of AI training and inference workloads.

AI Performance Optimization Using Zoomer: Zoomer is an automated debugging and optimization platform that works across all of our AI model types (ads recommendations, GenAI, computer vision, etc.)

Memory Analysis : Comprehensive analysis of GPU memory usage patterns, allocation tracking, and leak detection.

Realtime Memory Profiling : GPU memory allocation track…

1 month ago @ engineering.fb.com
Open Source Is Good for the Environment

But have you heard about open hardware?

And did you know open source can have a positive impact on the environment?

On this episode of the Meta Tech Podcast, Pascal Hartig sits down with Dharmesh and Lisa to talk about all things open hardware, and Meta’s biggest announcements from the 2025 Open Compute Project (OCP) Summit – including a new open methodology for leveraging AI to understand Scope 3 emissions.

You’ll also hear how AI and open hardware are helping Meta push to achieve net zero emissions in 2030, including how AI is being used to develop new concrete mixes for data center construction.

And if you’re interested in learning more about career opportunities at Meta visit the Meta C…

1 month, 1 week ago @ engineering.fb.com
Meta’s Generative Ads Model (GEM): The Central Brain Accelerating Ads Recommendation AI Innovation

We’re sharing details about Meta’s Generative Ads Recommendation Model (GEM), a new foundation model that delivers increased ad performance and advertiser ROI by enhancing other ads recommendation models’ ability to serve relevant ads.

GEM propagates its learnings, leveraging a suite of post-training techniques across the entire ads model fleet, enabling a paradigm shift in Meta’s Ads Recommendation system.

GEM leverages enhanced training scalability that efficiently utilizes thousands of GPUs for building and iterating an LLM-scale ads foundation model.

The Generative Ads Recommendation Model (GEM) is Meta’s most advanced ads foundation model, built on an LLM-inspired paradigm and trained …

1 month, 2 weeks ago @ engineering.fb.com
Scaling LLM Inference: Innovations in Tensor Parallelism, Context Parallelism, and Expert Parallelism

At Meta, we are constantly pushing the boundaries of LLM inference systems to power applications such as the Meta AI App.

These metrics highlight the distinct computational demands of LLM inference: Prefill is compute-intensive, while decoding is memory bandwidth-intensive.
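The prefill/decode contrast can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration under simplifying assumptions that are not from the article: a dense model costing roughly 2 FLOPs per parameter per token, weights read from memory once per forward pass, and KV-cache and attention traffic ignored. The parameter count and prompt length are hypothetical.

```python
def arithmetic_intensity(num_params, tokens_per_pass, bytes_per_param=2):
    """FLOPs performed per byte of weight traffic in one forward pass.

    Assumes ~2 FLOPs per parameter per token and that every weight is
    read from memory exactly once per pass (a simplification).
    """
    flops = 2 * num_params * tokens_per_pass
    bytes_moved = num_params * bytes_per_param
    return flops / bytes_moved


PARAMS = 70_000_000_000  # hypothetical 70B-parameter model, 16-bit weights

# Prefill: the whole prompt (here 4096 tokens) goes through in one pass.
prefill_intensity = arithmetic_intensity(PARAMS, tokens_per_pass=4096)

# Decode: one new token per pass, so the same weights are re-read each step.
decode_intensity = arithmetic_intensity(PARAMS, tokens_per_pass=1)

print(f"prefill: {prefill_intensity:.0f} FLOPs/byte")
print(f"decode:  {decode_intensity:.0f} FLOPs/byte")
```

Under these assumptions prefill does thousands of FLOPs for each byte of weights it loads, while single-token decode does about one, which is why decode tends to be limited by memory bandwidth rather than compute.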

Communication: Communication latency increases when parallelizing across multiple hosts.

In EP-based inference, we utilize a two-shot, all-to-all communication pattern to exchange tokens between data parallelism and expert parallelism ranks based on routing.

We are committed to continuous innovation to ensure efficient and scalable LLM inference for millions of users worldwide.

2 months, 1 week ago @ engineering.fb.com
How Meta Is Leveraging AI To Improve the Quality of Scope 3 Emission Estimates for IT Hardware

We leveraged AI to help us improve this database and understand our Scope 3 emissions associated with IT hardware by: Identifying similar components and applying existing PCFs to similar components that lack these carbon estimates.

Understanding the carbon footprint of IT racks and applying generative AI (GenAI) as a categorization algorithm to create a new and standard taxonomy.

If these similar components are not identified, their carbon footprint estimates will remain at a lower data quality.

These similar components can be mapped to a representative proxy PCF, allowing us to use high-quality PCF data in similar components.

For example, we can scale the carbon footprint calculation for a …

2 months, 1 week ago @ engineering.fb.com
OCP Summit 2025: The Open Future of Networking Hardware for AI

At Open Compute Project Summit (OCP) 2025, we’re sharing details about the direction of next-generation network fabrics for our AI training clusters.

At Meta, we believe that open hardware is a catalyst for innovation — especially as data center infrastructure increasingly supports new and emerging AI technologies.

Open hardware plays a crucial role in enabling disaggregation, allowing us to break down traditional data center technologies into their core components.

Today, through OCP, we continue to advance open network technologies for the next generation of AI applications.

Ethernet for Scale-Up Networking in OCP: Meta’s Industry Leadership. At Meta, we recognize that the future of AI and …

2 months, 1 week ago @ engineering.fb.com
LLMs Are the Key to Mutation Testing and Better Compliance

By leveraging LLMs we’ve been able to overcome the barriers that have prevented mutation testing from being efficiently deployed at scale.

Our presentations shared insights into how we’ve used LLMs to solve the major barriers that have prevented mutation testing at scale and highlighted new areas in automated software testing where LLMs can have a significant impact.

Mutation Testing Isn’t Scalable: Traditional mutation testing generates a very large number of mutants, making it computationally expensive and difficult to scale to large industrial codebases.
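For readers unfamiliar with the term, a mutant is a copy of the code with one small deliberate change, and a test suite "kills" it if at least one test fails on the changed copy. The sketch below is a hand-written toy, not Meta's tooling; the function and suite names are hypothetical. It shows a single mutant that a weak suite misses and a boundary test catches.

```python
def is_adult(age):
    """Original code under test."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: the operator '>=' was replaced with '>'."""
    return age > 18


def weak_suite(fn):
    """Passes (returns True) if fn behaves correctly on two easy inputs."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    """Adds a boundary check at exactly 18."""
    return weak_suite(fn) and fn(18) is True


def killed(suite, mutant):
    """A mutant is killed when the suite fails on it."""
    return not suite(mutant)


print("weak suite kills mutant:  ", killed(weak_suite, is_adult_mutant))
print("strong suite kills mutant:", killed(strong_suite, is_adult_mutant))
```

Every operator, constant, and branch in a codebase can yield mutants like this one, and each needs a test run, which is where the cost described above comes from.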

Mutation Testing Requires a Lot of Computational Resources: Mutation testing is costly in terms of computational resources and developer ef…

2 months, 3 weeks ago @ engineering.fb.com
AssetGen: Generating 3D Worlds With AI

Imagine being able to use AI to create 3D virtual worlds using prompts as easily as you can generate images.

In his keynote, Mark Zuckerberg shared his vision of a future where anyone can create virtual worlds using AI-powered tools like the ones available in the upcoming Meta Horizon Studio.

But AI is already making it easier than ever to create 3D assets.

On this episode of the Meta Tech Podcast, Pascal Hartig is joined by Mahima and Rakesh from Meta’s XR Tech team to discuss AssetGen, a new foundation model for 3D assets.

They talk about how they built and trained AssetGen, the important role LLMs have to play in the future of VR, and how they’re tackling the ambitious goal of generating…

2 months, 3 weeks ago @ engineering.fb.com
Meta’s Infrastructure Evolution and the Advent of AI

As our user base grew globally, we scaled beyond single data center buildings and into data center regions consisting of multiple buildings.

Enter AI Workloads (2020): While we were navigating the challenges of scaling, we were also seeing glimpses of how AI workloads would impact our infrastructure.

To build out our AI infrastructure, we’ve leveraged solutions from partners like AMD and NVIDIA as well as our own custom silicon.

Constructing Prometheus has been a monumental engineering feat, with infrastructure spanning five or more data center buildings in a single data center region.

We are still early in the evolution and adoption of AI workloads.

2 months, 3 weeks ago @ engineering.fb.com
Networking at the Heart of AI — @Scale: Networking 2025 Recap

AI is everywhere and, as network engineers, we are right in the thick of it: building the network infrastructure for AI.

Setting Context: Rapid Changes and Evolution. Given that AI continues to drive so much innovation in networking and general infrastructure, we once again focused @Scale: Networking on AI networking, sharing the new insights and progress in the field.

The Models and the Primary AI Workloads Are Rapidly Evolving.

More from @Scale: Networking 2025. Please visit the @Scale YouTube channel to check out all the talks from this year’s Networking @Scale.

We look forward to what promises to be another rapid year of network and AI innovation that we’ll cover at the next @Scale: Networking in…

3 months ago @ engineering.fb.com
A New Ranking Framework for Better Notification Quality on Instagram

We’ve introduced a diversity-aware notification ranking framework to reduce uniformity and deliver a more varied and engaging mix of notifications.

Instagram leverages machine learning (ML) models to decide who should get a notification, when to send it, and what content to include.

To tackle this, we’ve introduced a diversity-aware notification ranking framework that helps deliver more diverse, better curated, and less repetitive notifications.
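One common way to trade predicted engagement against repetition is a greedy re-rank that discounts a candidate each time its category has already been picked. The sketch below illustrates that general idea only; it is not Instagram's framework, and the candidate fields, the penalty value, and the function name are all hypothetical.

```python
def diversity_rerank(candidates, penalty=0.5, k=3):
    """Greedily pick k notifications, discounting repeated categories.

    candidates: list of (notification_id, category, engagement_score).
    Each prior pick from the same category multiplies the score by
    `penalty` (an assumed, hand-chosen discount).
    """
    remaining = list(candidates)
    picked = []
    seen = {}  # category -> number of times already picked
    while remaining and len(picked) < k:
        best = max(
            remaining,
            key=lambda c: c[2] * penalty ** seen.get(c[1], 0),
        )
        picked.append(best)
        remaining.remove(best)
        seen[best[1]] = seen.get(best[1], 0) + 1
    return [c[0] for c in picked]


candidates = [
    ("a", "like", 0.9),
    ("b", "like", 0.8),
    ("c", "comment", 0.5),
    ("d", "follow", 0.3),
]

by_engagement = [c[0] for c in sorted(candidates, key=lambda c: -c[2])[:3]]
by_diversity = diversity_rerank(candidates)
print(by_engagement)
print(by_diversity)
```

With these made-up scores, ranking purely by engagement leads with two "like" notifications, while the discounted version moves the "comment" notification ahead of the second "like".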

Introducing Instagram’s Diversity-Aware Notification Ranking Framework: Instagram’s diversity-aware notification ranking framework is designed to enhance the notification experience by balancing the predicted potential for user engagement with the nee…

3 months, 3 weeks ago @ engineering.fb.com
Federation Platform and Privacy Waves: How Meta distributes compliance-related tasks at scale

We’re exploring Meta’s Federation Platform, a scalable set of tools for managing compliance-related tasks, along with Privacy Waves, our method for batching these tasks and ensuring accountability.

To facilitate this, we developed the Federation Platform and Privacy Waves program: The Federation Platform breaks down large compliance-related initiatives into smaller, manageable workstreams.

Internal surveys reveal significantly higher positive sentiment for Privacy Waves tasks compared to ad-hoc tasks.

Step 6: Reporting and recognition. The centralized distribution of tasks via Federation Platform and Privacy Waves streamlines operational effectiveness and verification.

Expansions for the Federa…

4 months, 2 weeks ago @ engineering.fb.com
Diff Risk Score: AI-driven risk-aware software development

Built on a fine-tuned Llama LLM, DRS evaluates code changes and metadata to produce a risk score and highlight potentially risky code snippets.

Production risk was one of the areas we tackled first.

The demand to build such features also led us to build the Risk Awareness Platform to provide risk analysis APIs and tool integrations.

We believe code risk can play a significant role in improving this tradeoff, so we will build more risk-aware features while improving their quality.

While code changes cause the plurality of SEVs at Meta, configuration changes are another large category.

4 months, 3 weeks ago @ engineering.fb.com
Uber Engineering
last post: none
neptune.ai
last post 3 weeks, 1 day ago
We are joining OpenAI

Piotr Niedźwiedź, CEO/CTO and founder of neptune.ai: I’m excited to share that we’ve entered into a definitive agreement to be acquired by OpenAI, subject to closing conditions.

We are thrilled to join the OpenAI team and help their AI researchers build better models faster.

Neptune is a metrics dashboard company.” We’ve worked closely with OpenAI to create the metrics dashboard that helps teams building foundation models.

Our future with OpenAI: Neptune will join OpenAI and continue to support AI researchers with tools to monitor, debug, and evaluate frontier models.

We are looking forward to working with top AI researchers and supporting OpenAI’s mission of ensuring that AGI benefits all of hu…

3 weeks, 1 day ago @ neptune.ai
Synthetic Data for LLM Training

For instance, financial data is highly sensitive and protected by very strict regulations, and synthetic data mimics the real data distribution without revealing customer information.

Read more about how leading foundation model teams curate their training data and other topics in the State of Foundation Model Training Report 2025.

Choosing the right synthetic data generation technique depends on the type of data and its complexity.

Synthetic tabular data generation is a promising direction to overcome these challenges by learning the distribution of the tabular data.
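As a deliberately naive illustration of "learning the distribution" of a table and sampling from it, the sketch below fits an independent Gaussian to each numeric column and draws new rows. This is a stand-in chosen for brevity, not a method taken from the article: it ignores correlations between columns and categorical fields, and the toy data and function names are hypothetical.

```python
import random
import statistics


def fit_columns(rows):
    """Estimate (mean, standard deviation) for each numeric column."""
    columns = list(zip(*rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def sample_rows(params, n, seed=0):
    """Draw n synthetic rows, sampling each column independently."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]


# Toy "real" table: (annual income, age). Values are made up.
real_rows = [
    [52000.0, 34.0],
    [61000.0, 45.0],
    [48000.0, 29.0],
    [75000.0, 51.0],
]

column_params = fit_columns(real_rows)
synthetic_rows = sample_rows(column_params, n=100)
print(column_params)
print(synthetic_rows[0])
```

No synthetic row is copied from a real one, but because the columns are sampled independently, any relationship between income and age in the real data is lost; capturing that joint structure is the hard part.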

Post-processing: As the distribution of tabular data is highly complex, it makes the synthetic tabular data generation very ch…

1 month, 1 week ago @ neptune.ai
What are LLM Embeddings: All you Need to Know

TL;DR LLM embeddings are the numerical, vector representations of text that Large Language Models (LLMs) use to process information.

Unlike their predecessor word embeddings, LLM embeddings are context-aware and dynamically change to capture semantic and syntactic relationships based on the surrounding text.

What are the applications of LLM embeddings?

Word Embeddings. Sparse Word Embeddings: One-Hot Vectors 1970s TF-IDF 1980s Co-Occurrence Matrix. Static Word Embeddings: Word2Vec 2013, GloVe 2014. Contextualized word embeddings: ELMo 2018, GPT-1 2018, BERT 2018, LLAMA 2023, DeepSeek-V1 2023, GPT-4 2023.

Static word embeddings: Static word embeddings, such as word2vec in 2013, marked a significant development.…

1 month, 2 weeks ago @ neptune.ai
Detecting and Fixing ‘Dead Neurons’ in Foundation Models

TL;DR Dead neurons silently waste compute and reduce effective model capacity in foundation models.

Dead neurons’ impact

Recent studies into dead neurons in the context of foundation models show interesting, albeit worrying, results.

These large reported fractions of dead neurons in foundation models are a concern from a computational perspective.

Before we move on to discuss how to detect and fix dead neurons, let’s touch upon an important distinction between dead neurons and vanishing gradients.

Visualizing activation distributions

Is your foundation model suffering from dead neurons?
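
One simple way to look for dead neurons is to check, for each unit of a ReLU layer, whether it ever activates over a batch of inputs. The sketch below is illustrative only and is not taken from the article: the function name `dead_neuron_fraction`, the `threshold` parameter, and the toy activation matrix are all assumptions.

```python
from typing import List


def dead_neuron_fraction(activations: List[List[float]], threshold: float = 0.0) -> float:
    """Return the fraction of neurons that never exceed `threshold`.

    `activations` holds one row per input example and one column per neuron
    (hypothetical post-ReLU outputs of a single layer).
    """
    if not activations:
        raise ValueError("need at least one example")
    num_neurons = len(activations[0])
    dead = 0
    for j in range(num_neurons):
        # A neuron counts as dead if it stays at or below the threshold on every example.
        if all(row[j] <= threshold for row in activations):
            dead += 1
    return dead / num_neurons


# Toy batch: neurons 0 and 2 never fire, neurons 1 and 3 do.
batch = [
    [0.0, 1.2, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.9],
    [0.0, 2.1, 0.0, 0.0],
]
print(dead_neuron_fraction(batch))  # 0.5
```

In practice the batch should be large and representative, since a neuron that is silent on a few examples is not necessarily dead.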

1 month, 4 weeks ago @ neptune.ai
Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training

In the first part of this series, we covered the fundamentals of instruction fine-tuning (IFT).

def calculate_irs(instruction, output, reference_model): evaluation_prompt = f""" Instruction: {instruction} Model Output: {output} Rate how well the output follows the instruction on these criteria: 1.

HINT addresses a computational inefficiency in standard instruction fine-tuning: repeatedly reprocessing the same task instruction with every input example.

Read more about foundation model training infrastructure and other topics in Neptune’s 2025 State of Foundation Model Training Report.

First, during initial instruction fine-tuning across multiple diverse tasks, the model learns genera…

2 months ago @ neptune.ai
How to Optimize LLM Inference

Large Language Model (LLM) inference at scale is challenging as it involves transferring massive amounts of model parameters and data and performing computations on large tensors.

In the following, we’ll use the Llama model family architecture as a specific example to understand the LLM workload at inference.

For a far more detailed analysis of the LLM workload at inference, see the chapter All About Transformer Inference in the book How to Scale Your Model, published by Google DeepMind.

A quick primer on hardware for LLM inference

A typical LLM inference cluster consists of several nodes, each with a multi-core CPU and multiple accelerator devices, …

2 months, 1 week ago @ neptune.ai
A Researcher’s Guide to LLM Grounding

In this article, we’ll explore the fundamental concepts of LLM grounding as well as strategies for optimally grounding models.

What is LLM grounding?

LLM grounding is analogous.

If relevant knowledge cannot be inferred from the data, then LLM grounding cannot yield more relevant responses.

When grounding LLMs using RAG, consider retaining only a few of the top hits (i.e., top-k) for your retrieval queries.
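
Keeping only the top-k hits can be sketched as scoring every chunk against the query and truncating the ranked list. This is a minimal illustration, not the article's implementation: the helper names `cosine` and `top_k`, the two-dimensional toy embeddings, and the choice of cosine similarity are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], chunks: List[Tuple[str, List[float]]], k: int = 2) -> List[str]:
    """Return the texts of the k chunks most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical chunk embeddings; a real system would get these from an embedding model.
chunks = [
    ("c", [0.0, 1.0]),
    ("a", [1.0, 0.0]),
    ("d", [-1.0, 0.0]),
    ("b", [0.7, 0.7]),
]
print(top_k([1.0, 0.0], chunks, k=2))  # ['a', 'b']
```

A small k keeps the prompt short and limits the amount of weakly related text the model has to sift through.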

3 months ago @ neptune.ai
Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions

TL;DR Instruction fine-tuning (IFT) refines pre-trained large language models (LLMs) to follow specific task instructions by training on prompt-response pairs.

Instruction fine-tuning in a nutshell

IFT tailors LLMs to follow user instructions by bridging their inherent next-word prediction with human-defined objectives.

Parameter-efficient instruction fine-tuning

While major foundation models like GPT-4 or Llama-2 undergo full parameter instruction fine-tuning during development, parameter-efficient fine-tuning (PEFT) methods have become widely adopted for instruction fine-tuning since the LoRA paper was publi…

3 months, 1 week ago @ neptune.ai
Understanding Prompt Injection: Risks, Methods, and Defense Measures

Prompt injection 101: When prompts go rogue

The term ‘Prompt Injection’ comes from SQL injection attacks.

There is another claim of the independent discovery of prompt injection attacks, which suggests that Riley Goodside publicly exhibited a prompt injection in a tweet back in September 2022.

The indirect prompt injection attacks are classified into active, passive, user-driven and virtual prompt attacks.

Virtual prompt injection attacks

This injection type is closely related to the passive injection attacks previously described.

Prompt injection: current challenges & lessons learned

The arms race between prompt injection attacks and defenses is a challenge for researchers, developers, and users.

4 months, 2 weeks ago @ neptune.ai
SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections]

This simple idea avoids computing loss on input prompt tokens the model already knows.

Prompt tokens are (too) expensive in low-resource settings

During pre-training, LLMs are trained in causal language modeling through a next-token prediction task.

=> Mo fẹ́ràn ìrẹsì,” the model is trained to predict every token, from the prompt to the actual answer:

Step | Prompt | Next token
1 | Translate | English (static prompt)
2 | Translate English | to (static prompt)
3 | Translate English to | Yoruba: (static prompt)
4 | Translate English to Yoruba: | I
5 | Translate English to Yoruba: I | love
6 | Translate English to Yoruba: I love | rice.

This is straightforward to implement in PyTorch by masking out the prompt tokens in the label …
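
The label-masking idea can be sketched without any framework: prompt positions in the label sequence are overwritten with an ignore value so that only answer tokens contribute to the loss. The excerpt does not show the authors' code, so everything below is an assumption made for illustration, including the function name `build_labels` and the made-up token IDs. The value -100 is the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`.

```python
from typing import List, Tuple

# Default ignore_index of torch.nn.CrossEntropyLoss; positions with this label add no loss.
IGNORE_INDEX = -100


def build_labels(prompt_ids: List[int], answer_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and answer tokens and mask the prompt part of the labels."""
    input_ids = list(prompt_ids) + list(answer_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return input_ids, labels


# Hypothetical token IDs for a prompt and its answer.
prompt_ids = [11, 12, 13, 14]
answer_ids = [21, 22, 23]

input_ids, labels = build_labels(prompt_ids, answer_ids)
print(input_ids)  # [11, 12, 13, 14, 21, 22, 23]
print(labels)     # [-100, -100, -100, -100, 21, 22, 23]
```

The model still attends to the prompt tokens as context; they are only excluded from the loss.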

4 months, 3 weeks ago @ neptune.ai
How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models

What gradient issues occur during foundation model training?

During training, gradient descent updates model parameters by computing the gradients of the loss function via forward and backward passes.

The green line corresponds to a learning rate of 10, while the orange line has a learning rate of 0.1.

The gradient norm for the orange line with LR = 0.1 is very high in the first steps, while the gradient norm of the green line with LR = 10 diverges to NaN after a few steps.

Techniques for gradient stabilization

Monitoring gradient norms and training loss provides insights into the learning dynamics of the foundation models.
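
To make the monitored quantity concrete, the sketch below computes a global L2 gradient norm over all parameter gradients and rescales the gradients when that norm exceeds a limit. Clipping by global norm is a common stabilization technique, but the excerpt does not name it, so treat it as an assumption; the function names and the toy gradients are likewise hypothetical.

```python
import math
from typing import List


def global_grad_norm(grads: List[List[float]]) -> float:
    """L2 norm over all gradient entries of all parameter tensors."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> List[List[float]]:
    """Rescale all gradients so that their global norm is at most `max_norm`."""
    norm = global_grad_norm(grads)
    if norm <= max_norm:
        return [list(tensor) for tensor in grads]
    scale = max_norm / norm
    return [[g * scale for g in tensor] for tensor in grads]


# Toy gradients for two parameter tensors.
grads = [[3.0], [4.0]]
print(global_grad_norm(grads))  # 5.0

clipped = clip_by_global_norm(grads, max_norm=1.0)
print(global_grad_norm(clipped))  # approximately 1.0
```

Logging the norm at every step, before clipping, is what makes spikes and divergence toward NaN visible early.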

5 months, 3 weeks ago @ neptune.ai
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning [Paper Reflection]

Unstructured pruning removes individual weights, while structured pruning removes entire model components.
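
The difference between the two pruning styles can be shown on a small weight matrix. This is a toy sketch and not the STUN method: the magnitude threshold, the choice of rows as the "component", and both function names are assumptions made for illustration.

```python
from typing import List, Set


def unstructured_prune(weights: List[List[float]], threshold: float) -> List[List[float]]:
    """Zero out individual weights whose magnitude is below `threshold`; shape is unchanged."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def structured_prune(weights: List[List[float]], keep_rows: Set[int]) -> List[List[float]]:
    """Drop entire rows (standing in for whole neurons or experts); the matrix shrinks."""
    return [list(row) for i, row in enumerate(weights) if i in keep_rows]


weights = [
    [0.9, -0.05, 0.4],
    [0.01, 0.02, -0.03],
    [-0.7, 0.6, 0.08],
]

print(unstructured_prune(weights, threshold=0.1))
# [[0.9, 0.0, 0.4], [0.0, 0.0, 0.0], [-0.7, 0.6, 0.0]]

print(structured_prune(weights, keep_rows={0, 2}))
# [[0.9, -0.05, 0.4], [-0.7, 0.6, 0.08]]
```

Unstructured pruning leaves a sparse matrix of the original shape, while structured pruning yields a smaller dense matrix.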

In the context of MoEs, as expert structures from training MoEs correspond to such patterns, pruning experts is a natural fit for structured pruning.

Thus, structured pruning does not significantly decrease kurtosis, leaving plenty of margin for unstructured pruning.

Since structured pruning primarily reduces architectural redundancy rather than reshaping the underlying weight distribution, our two-phase approach—leveraging unstructured pruning after structured pruning—outperforms unstructured-only pruning.

Since STUN does not make any assumption about base MoE models, it is generaliza…

6 months, 3 weeks ago @ neptune.ai
Evaluating RAG Pipelines

Dimensions of RAG evaluation

Evaluating a RAG pipeline means assessing its behavior across three dimensions: 1.

The evaluation of the RAG pipeline is a multi-step process, starting with creating an evaluation dataset, then evaluating the individual components (retriever, generator, etc.

Curating an evaluation dataset

The first step in the RAG evaluation process is the creation of a ground truth dataset.

MAP considers both the presence and rank of relevant chunks but fails to consider the relative position of relevant chunks.
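
Mean average precision (MAP) can be sketched as follows. This is one common variant, assumed here for illustration: average precision is normalized by the number of relevant chunks that were actually retrieved, and relevance is a binary flag per ranked chunk. The function names and example rankings are hypothetical.

```python
from typing import List


def average_precision(relevance: List[int]) -> float:
    """Average precision for one query; `relevance` lists 0/1 flags in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_relevance: List[List[int]]) -> float:
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in all_relevance) / len(all_relevance)


# Two hypothetical queries with three retrieved chunks each.
rankings = [
    [1, 0, 1],  # AP = (1/1 + 2/3) / 2
    [0, 1, 0],  # AP = (1/2) / 1
]
print(round(mean_average_precision(rankings), 4))  # 0.6667
```

Because relevance is binary here, the score rewards placing relevant chunks early but cannot express that one chunk is more relevant than another.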

However, not all retrieved chunks are equally relevant and sometimes, the most relevant chunks might not b…

7 months, 2 weeks ago @ neptune.ai
How to Build an LLM Agent With AutoGen: Step-by-Step Guide

The efficiency of an LLM agent depends on the selection of the right LLM model.

In this article, we’ll introduce the fundamental building blocks of LLM agents and then walk through the process of building an LLM agent step by step.

Building an LLM agent from scratch

In the following, we’ll build a trip-planning LLM agent from scratch.

Using AutoGen’s OpenAI Assistant Agent, we instantiate a prompt that the LLM agent will follow throughout its interactions.

Enhancing LLM agent performance

While architecting an LLM agent, you have to keep in mind opportunities to improve the performance of the LLM agent.

9 months, 1 week ago @ neptune.ai
Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection]

Moreover, I will make the case for why Bayesian deep learning can satisfy these desiderata and briefly review recent advances in the field.

The case for Bayesian deep learning

Bayesian deep learning uses the foundational statistical principles of Bayesian inference to endow deep learning systems with the ability to make probabilistic predictions.

However, Bayesian deep learning is unfortunately still not as easy to use as standard deep learning, which you can do these days in a few lines of PyTorch code.

If you want to use a Bayesian deep learning model, first, you have to think about specifying the prior.

If this is the case, trying out Bayesian deep learning is likely worth your while.

9 months, 2 weeks ago @ neptune.ai
▶️ YouTube
Yannic Kilcher
last post 1 week, 4 days ago
Titans: Learning to Memorize at Test Time (Paper Analysis)

Paper: https://arxiv.org/abs/2501.00663 Abstract:

Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new neural long-term memory module that learns to memorize historical context and helps attention to attend to the current context while utilizing long past information. We sh…

1 week, 4 days ago @ youtube.com
[Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff)

https://arxiv.org/abs/2510.17558 Abstract:

We propose an extension of the decoder Transformer that conditions its generative process on random latent variables which are learned without supervision thanks to a variational procedure. Experimental evaluations show that allowing such a conditioning translates into substantial improvements on downstream tasks. Author: François Fleuret Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the con…

1 month, 3 weeks ago @ youtube.com
[Video Response] What Cloudflare's code mode misses about MCP and tool calling

Theo's Video: https://www.youtube.com/watch?v=bAYZjVAodoo

Cloudflare article: https://blog.cloudflare.com/code-mode/ Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https://www.subscribestar.com/yannickilcher

Patreon: https://www.patreon.com/yannickilcher

Bitcoin (BTC): bc1q49lsw3q325tr58ygf8…

2 months, 1 week ago @ youtube.com
[Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant)

Paper: https://arxiv.org/abs/2508.21038 Abstract:

Vector embeddings have been tasked with an ever-increasing set of retrieval tasks over the years, with a nascent rise in using them for reasoning, instruction-following, coding, and more. These new benchmarks push embeddings to work for any query and any notion of relevance that could be given. While prior works have pointed out theoretical limitations of vector embeddings, there is a common assumption that these difficulties are exclusively due to unrealistic queries, and those that are not can be overcome with better training data and larger models. In this work, we demonstrate that we may encounter these theoretical limitations in realist…

2 months, 2 weeks ago @ youtube.com
AGI is not coming!

Jack Morris's investigation into GPT-OSS training data: https://x.com/jxmnop/status/1953899426075816164?t=3YRhVQDwQLk2gouTSACoqA&s=09

4 months, 2 weeks ago @ youtube.com
Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

Paper: https://research.trychroma.com/context-rot Abstract:

Large Language Models (LLMs) are typically presumed to process context uniformly—that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks.

In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows. Authors: Kelly Hong, Anton Troynikov, Jeff Huber Links:

5 months ago @ youtube.com
Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)

Paper: https://arxiv.org/abs/2507.02092

Code: https://github.com/alexiglad/EBT

Website: https://energy-based-transformers.github.io/ Abstract:

Inference-time computation techniques, analogous to human System 2 Thinking, have recently become popular for improving model performances. However, most existing approaches suffer from several limitations: they are modality-specific (e.g., working only in text), problem-specific (e.g., verifiable domains like math and coding), or require additional supervision/training on top of unsupervised pretraining (e.g., verifiers or verifiable rewards). In this paper, we ask the question "Is it possible to generalize these System 2 Thinking approaches, and de…

5 months, 1 week ago @ youtube.com
On the Biology of a Large Language Model (Part 2)

An in-depth look at Anthropic's Transformer Circuit Blog Post

Part 1 here: https://youtu.be/mU3g2YPKlsA

Discord here: https://ykilcher.com/discord https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract:

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors:

Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*,

David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton,

Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy …

7 months, 3 weeks ago @ youtube.com
On the Biology of a Large Language Model (Part 1)

An in-depth look at Anthropic's Transformer Circuit Blog Post https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract:

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors:

Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*,

David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton,

Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson,

Sam Zimmerman, Kelley Rivoire, Thom…

8 months, 3 weeks ago @ youtube.com
Henry AI Labs
last post None
3blue1brown
last post 1 month ago
The most absurd product I've made

Because why not make a pi creature neck pillow?

Available at 3b1b.co/store

1 month ago @ youtube.com
How Laplace transforms solve differential equations

Studying the forced harmonic oscillator by taking a Laplace transform and studying its poles.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Chapter on the Laplace Transform:

https://youtu.be/j0wJBEZdwLs Chapter on the S-plane and Simple Harmonic Motion:

https://youtu.be/-j8PzkZ70Lg Timestamps:

0:00 - Opening puzzle

1:06 - Key properties of a Laplace Transform

3:29 - Qualitative analysis with Laplace Transforms

4:29 - The Laplace Transforms of a Derivative

6:06 - The forced oscillator

11:59 - Intuition from the transformed solution

1…

1 month, 2 weeks ago @ youtube.com
The dynamics of e^(πi)

A fuller version of this explanation, also including the reason we care about complex exponents in the first place: https://youtu.be/-j8PzkZ70Lg

2 months, 2 weeks ago @ youtube.com
But what is a Laplace Transform?

Visualizing the most important tool for differential equations.

Previous chapter: https://youtu.be/-j8PzkZ70Lg

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Artwork by Kurt Bruns Engine animation borrowed with permission from this (excellent) blog: https://ciechanow.ski/internal-combustion-engine/ Timestamps:

0:00 - Understanding the engine

1:16 - Key background ideas

5:41 - Definition and intuition

10:43 - Complex integration

20:43 - Analytic continuation

23:52 - The transform of exponentials

26:15 - A deep look at cos(t)

32:59 - W…

2 months, 2 weeks ago @ youtube.com
Why complex exponents matter | Laplace Transform Prelude

How dynamics explain Euler's formula, and vice versa.

Early view of the Laplace Transform video: https://www.patreon.com/posts/laplace-early-140428165

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Timestamps:

0:00 - Intro

1:51 - Euler's formula explained dynamically

9:27 - The harmonic oscillator

21:08 - General linear equations

22:47 - Motivating the Laplace Transform ------------------ These animations are largely made using a custom Python library, manim. See the FAQ comments here:

https://3b1b.co/faq#manim Music by Vincent Rubin…

2 months, 3 weeks ago @ youtube.com
Why ruler and compass? | Guest video by @bensyversen

What role were ruler and compass constructions really serving?

Check out Ben's channel: @bensyversen Interview with the author of this video: https://youtu.be/VohYM99j8e0

Supporters get early views of new videos: https://3b1b.co/support Written, produced, edited, and animated by Ben Syversen

Additional editing: Jack Saxon

3d Blender model: Jan-Hendrik Müller

Additional Blender help: Thibaut Modrzyk (@Deepia)

Illustrations: Alex Zepherin/DonDada Studio

Drums: Jeremy Gustin

Additional music from Epidemic Sound Special thanks to Viktor Blåsjö: https://intellectualmathematics.com/opinionated-history-of-mathematics/ References/Recommended reading: Euclid’s Elements:

Visual edition of Book 1: htt…

3 months, 1 week ago @ youtube.com
Incomplete open cubes

Full video: https://youtu.be/_BrFKp-U8GI

3 months, 2 weeks ago @ youtube.com
Exploration & Epiphany

Sol Lewitt's "Incomplete Open Cubes" and rediscovering Burnside's lemma in group theory

This is a guest video by Paul Dancstep: https://youtu.be/JEeM2ABUMoo

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to share the videos.

Home page: https://www.3blue1brown.com Thanks to the Wadsworth Atheneum for granting permission to use LeWitt's notebooks. Talks by Paul you can find online: What is Category Theory:

https://www.youtube.com/watch?app=desktop&v=eXBwU9ieLL0 How to Predict Eclipses:

https://www.exploratorium.edu/eclipse/video/how-predict-eclipses Theo Jansen's Strandbeests

https://www.youtube.com/w…

3 months, 2 weeks ago @ youtube.com
Simulating Phase Change | Guest video by Vilas Winstein

Deriving the Boltzmann formula, defining temperature, and simulating liquid/vapor.

@SpectralCollective has the second part: https://youtu.be/yEcysu5xZH0

You can play with a simulation of this model here: https://vilas.us/simulations/liquidvapor/

These lessons are funded directly by viewers: https://3b1b.co/support

Home page: https://www.3blue1brown.com Notes from Vilas:

1) This open problem is to prove the ergodicity of the deterministic dynamical systems that are used to model the molecule-level physics. A good example of such a dynamical system is the box with particles evolving according to Newton's laws with elastic collisions, like in the video. 2) This video assumes that all probabili…

3 months, 4 weeks ago @ youtube.com
How AI connects text and images

From this guest video by @WelchLabsVideo on how diffusion models work: https://youtu.be/iv-5mZ_9CPY

4 months ago @ youtube.com
The AI that solved IMO Geometry Problems | Guest video by @Aleph0

How AlphaGeometry combines logic and intuition.

Share stories about AI in math research for an upcoming video: https://forms.gle/gr9aZVdUrW5T3yDg9

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com AlphaGeometry announcement:

https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/ Similar open-source model, Newclid, by Harmonic:

https://harmonic.fun/news#blog-post-geometry Timestamps:

0:00 - What's surprising

1:33 - Solve without AI

7:10 - Where AI comes in

12:48 - Grant's comments ------------------…

4 months, 1 week ago @ youtube.com
But how do AI videos actually work? | Guest video by @WelchLabsVideo

Diffusion models, CLIP, and the math of turning text into images

Welch Labs Book: https://www.welchlabs.com/resources/imaginary-numbers-book Sections

0:00 - Intro

3:37 - CLIP

6:25 - Shared Embedding Space

8:16 - Diffusion Models & DDPM

11:44 - Learning Vector Fields

22:00 - DDIM

25:25 - Dall E 2

26:37 - Conditioning

30:02 - Guidance

33:39 - Negative Prompts

34:27 - Outro

35:32 - About guest videos + Grant’s Reaction Special Thanks to:

Jonathan Ho - Jonathan is the Author of the DDPM paper and the Classifier Free Guidance Paper.

https://arxiv.org/pdf/2006.11239

https://arxiv.org/pdf/2207.12598 Preetum Nakkiran - Preetum has an excellent introductory diffusion tutorial:

https://arxiv.org/pdf/24…

5 months ago @ youtube.com
Summer of Math Exposition #4 | Teachers, I'd love to hear from you

Make a math explainer, get feedback, and receive prizes: https://some.3b1b.co

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos. ------------------ These animations are largely made using a custom Python library, manim. See the FAQ comments here:

https://3b1b.co/faq#manim

https://github.com/3b1b/manim

https://github.com/ManimCommunity/manim/ All code for specific videos is visible here:

https://github.com/3b1b/videos/ The music is by Vincent Rubinetti.

https://www.vincentrubinetti.com

https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown

https://open.spotify.com/…

7 months, 3 weeks ago @ youtube.com
Where my explanation of Grover’s algorithm failed

Addressing viewer questions from the last video.

These lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to share the videos. ------------------ These animations are largely made using a custom Python library, manim. See the FAQ comments here:

https://3b1b.co/faq#manim

https://github.com/3b1b/manim

https://github.com/ManimCommunity/manim/ All code for specific videos is visible here:

https://github.com/3b1b/videos/ The music is by Vincent Rubinetti.

https://www.vincentrubinetti.com

https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown

https://open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u ------------------ 3blue1brown is a ch…

7 months, 3 weeks ago @ youtube.com
Two Minute Papers
last post 4 days, 7 hours ago
NVIDIA’s AI Learns To Walk…Painfully

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Using DeepSeek on Lambda:

https://lambda.ai/inference-models/deepseek-r1 📝 The paper is available here:

https://research.nvidia.com/labs/toronto-ai/trace-pace/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras B…

4 days, 7 hours ago @ youtube.com
This Is The Physics Tech Games Have Been Waiting For

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers 📝 The paper is available here:

https://wanghmin.github.io/publication/wu-2022-gbm/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras Bobrovytsky, Tybie Fitzhugh, Ueli Gallizzi

If you wish to appear here or …

1 week ago @ youtube.com
The AI That Built An Economy… And Went Bankrupt

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Using DeepSeek on Lambda:

https://lambda.ai/inference-models/deepseek-r1 📝 The paper is available here:

https://simworld.org/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras Bobrovytsky, Tybie Fitzhugh, Ueli G…

1 week, 4 days ago @ youtube.com
DeepMind’s Crazy New AI Masters Games That Don’t Exist

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Using DeepSeek on Lambda:

https://lambda.ai/inference-models/deepseek-r1 📝 The SIMA 2 paper is available here:

https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Fred R, Gordon Child, Juan Benet, Michael…

2 weeks назад @ youtube.com
AlphaFold - The Most Important AI Breakthrough Ever Made

Full interview: https://www.youtube.com/watch?v=Vhcwjzeukts

2 weeks ago @ youtube.com
30x Better Physics: Why Everyone Missed This Genius Solution

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Using DeepSeek on Lambda: https://lambda.ai/inference-models/deepseek-r1

My hobby channel with guitars and labcoats 🥼:

https://www.youtube.com/watch?v=GjMMhn4pS38

https://www.youtube.com/watch?v=BxS62W6V48E

📝 The paper is available here: https://arxiv.org/abs/2505.21946

📝 My paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD

Or this is the original Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Chri…

2 weeks, 4 days ago @ youtube.com
He Kinda Solved Biology - Nobel Prize Winner John Jumper Interview

Thank you so much to John for being so kind and insightful, and to the film crew as well - they all did an incredible job. To celebrate the 5th anniversary of #AlphaFold, I was invited by Google DeepMind to interview Nobel Prize Winner and Distinguished Scientist, John Jumper. Note that we have no business ties with them. AlphaFold: https://deepmind.google/science/alphafold/

The full Thinking Game Movie: https://www.youtube.com/watch?v=d95J8yzvjbQ

My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu

3 weeks, 2 days ago @ youtube.com
Unreal Engine 5.7: Billions Of Triangles, In Real Time

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The Unreal Engine 5.7 is available here: https://www.unrealengine.com/en-US/news/unreal-engine-5-7-is-now-available

Sources:

https://www.youtube.com/watch?v=Mj_-2SdsYLw

https://www.youtube.com/watch?v=ngzPTqtZWo4

https://advances.realtimerendering.com/s2023/2023%20Siggraph%20-%20Substrate.pdf

📝 My paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD

Or this is the original Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Be…

1 month ago @ youtube.com
Blender 5.0 Is Here - A Revolution…For Free!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Get Blender 5.0 here: https://www.blender.org/

Example scenes: https://www.blender.org/download/demo-files/

Multiple scattering paper: https://cg.iit.bme.hu/~szirmay/volreuse_link.htm

📝 My paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD

Or this is the original Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall…

1 month ago @ youtube.com
DeepMind’s New AI Mastered Minecraft… Without Ever Playing It

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Guide:

Rent one of their GPUs with over 16GB of VRAM

Open a terminal

Just get Ollama following the command from here - https://ollama.com/download/linux

Then run ollama run gpt-oss:120b - https://ollama.com/library/gpt-oss:120b

📝 The paper is available here: https://danijar.com/project/dreamer4/

Source: https://www.youtube.com/watch?v=6bnM84xGxbg

📝 My paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD

Or this is the original Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patre…

1 month, 1 week ago @ youtube.com
Games Have Never Simulated Clothing Like This Before

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Guide:

Rent one of their GPUs with over 16GB of VRAM

Open a terminal

Just get Ollama with this command - https://ollama.com/download/linux

Then run ollama run gpt-oss:120b - https://ollama.com/library/gpt-oss:120b

📝 The paper "Fast Physics-Based Modeling of Knots and Ties Using Templates" is available here: https://wanghmin.github.io/publication/guo-2025-fpb/

Sources:

https://www.youtube.com/watch?v=2RQcoLV_bVk

https://www.youtube.com/watch?v=7d158rQ1R3k

https://www.youtube.com/watch?v=qirVdKg3qgs

https://www.youtube.com/watch?v=TPokJdN2bkw

https://www.youtube.com/watch?v=DRzT3c1jk14

https://www.youtube.com/w…

1 month, 1 week ago @ youtube.com
The Secret Behind Those Perfect Chocolate Commercials

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers

📝 The paper "A practical octree liquid simulator with adaptive surface resolution" is available here: https://cs.uwaterloo.ca/~c2batty/papers/Ando2020/Ando2020.pdf

Sources:

https://www.youtube.com/watch?v=kdt5Cs1VYJA

https://www.youtube.com/watch?v=YmmSDZ6dBdY

https://www.youtube.com/shorts/FVIDRU9-FW8

https://www.youtube.com/watch?v=gNZtx3ijjpo&pp=ygUHb2N0cmVlcw%3D%3D

https://www.youtube.com/shorts/1Euba1QvhW0

https://www.youtube.com/shorts/k2P9yWSMaXE

https://www.youtube.com/watch?v=Z5qbxQI6dgw

https://www.youtube.com/watch?v=laoGmqNtUMI

📝 My paper on simulations that look almost like reality is availa…

1 month, 1 week ago @ youtube.com
The Physics Glitch Everyone Gave Up On… Finally Fixed

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers

📝 The paper "Multi-Material Mesh-Based Surface Tracking with Implicit Topology Changes" is available here under one of these links hopefully:

https://pub.ista.ac.at/group_wojtan/projects/2024_MultimatMeshing/SuperDuperTopoFixer.pdf

https://dl.acm.org/doi/10.1145/3658223

📝 My paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD

Or this is the original Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

Sources:

https://www.youtube.com/watch?v=dtBqv-qIFLo

https://www.youtube.com/watch?v=EZul6DR-fHc

https://www.youtube…

1 month, 2 weeks ago @ youtube.com
NVIDIA’s New AI Just Made Real Physics Look Slow

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Guide:

Rent one of their GPUs with over 16GB of VRAM

Open a terminal

Just get Ollama with this command - https://ollama.com/download/linux

Then run ollama run gpt-oss:120b - https://ollama.com/library/gpt-oss:120b

📝 The paper "Neural Robot Dynamics" is available here:

https://neural-robot-dynamics.github.io/

https://github.com/NVlabs/neural-robot-dynamics

📝 My paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD

Or this is the original Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our gener…

1 month, 2 weeks ago @ youtube.com
They Said It Was Impossible… Weta FX Just Solved It

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Guide:

Rent one of their GPUs with over 16GB of VRAM

Open a terminal

Just get Ollama with this command - https://ollama.com/download/linux

Then run ollama run gpt-oss:120b - https://ollama.com/library/gpt-oss:120b

📝 The paper "A unified multi-scale method for simulating immersed bubbles" is available here: https://alexey.stomakhin.com/research/unibubbles.html

📝 My paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD

Or this is the original Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our g…

1 month, 3 weeks ago @ youtube.com
DataFest Video
last post: none
JetBrains Research Seminars
last post: none
Yandex. Computer Science
last post: 11 hours ago
Trends in NLP, a review of ICLR and ACL / Alexander Yushkevich

ML Global Recap 2025 is a meetup for the ML community where we covered the year's main international conferences and the most interesting trends in recommender technologies, computer vision, speech recognition, and NLP. The talk was given by Alexander Yushkevich, head of the team developing base-quality models at Search Services and AI. He showed how conferences reflect trends in NLP: top LLMs are becoming more closed, and demand is growing for alignment & safety, inference, interpretability, and optimization. New benchmarks keep appearing too (where would we be without them). More ML content at the link: https://t.me/+Ug9D4CjJrJxmZGRi #ML, #MachineLearning, #AI, #DataScience, #MLGlobalRecap2025, #не…

11 hours ago @ youtube.com
Voice technologies at Interspeech and ICASSP 2025 / Boris Sheludko

ML Global Recap 2025 is a meetup for the ML community where we covered the year's main international conferences and the most interesting trends in recommender technologies, computer vision, speech recognition, and NLP. The talk was given by Boris Sheludko, head of the audio quality team at Yandex R&D. He summarized the two main international conferences on voice technologies: Interspeech in the Netherlands and ICASSP in India. More ML content at the link: https://t.me/+Ug9D4CjJrJxmZGRi #ML, #MachineLearning, #AI, #DataScience, #MLGlobalRecap2025, #нейросети, #искусственныйинтеллект, #рекомендательныесистемы, #NLP, #ComputerVision, #SpeechRecognition, #LLM, #DeepLearning, …

1 day, 11 hours ago @ youtube.com
Key trends in recommender systems / Nikolay Savushkin

ML Global Recap 2025 is a meetup for the ML community where we covered the year's main international conferences and the most interesting trends in recommender technologies, computer vision, speech recognition, and NLP. The talk was given by Nikolay Savushkin, head of the recommender technologies team at Yandex R&D. He shared insights from CIKM and RecSys and covered the key trends in recommender systems: foundation and End2End models, scaling, multimodality, attention-based ranking, and others. More ML content at the link: https://t.me/+Ug9D4CjJrJxmZGRi #ML, #MachineLearning, #AI, #DataScience, #MLGlobalRecap2025, #нейросети, #искусственныйинтеллект, #ре…

2 days, 11 hours ago @ youtube.com
Opening of ML Global Recap 2025 / Alexey Gusakov

ML Global Recap 2025 is a meetup for the ML community where we covered the year's main international conferences and the most interesting trends in recommender technologies, computer vision, speech recognition, and NLP. The event was opened by Alexey Gusakov, CTO of Search Services and AI. The talk includes opening remarks, a brief overview of NeurIPS, and a few words about the benchmark crisis. More ML content at the link: https://t.me/+Ug9D4CjJrJxmZGRi #ML, #MachineLearning, #AI, #DataScience, #MLGlobalRecap2025, #нейросети, #искусственныйинтеллект, #рекомендательныесистемы, #NLP, #ComputerVision, #SpeechRecognition, #LLM, #DeepLearning, #MLтренды, #ITконференция, #Yandex, #MLсообщество

3 days, 11 hours ago @ youtube.com
How to solve the problem of LLM response diversity

This is an excerpt from a talk by Alexey Kolesov, CTO at Yandex R&D. At Practical ML Conf 2025, he explained how the team taught YandexGPT 5.1 to remember facts better and apply knowledge of them. He also showed how online RL started working stably for us. The full recording is already on the channel! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

3 weeks ago @ youtube.com
ML Global Recap'25

A Yandex meetup for the ML community where we will talk about six international conferences and the main trends in recommender technologies, computer vision, speech recognition technologies, and NLP.

3 weeks, 1 day ago @ youtube.com
Interview section testing the basic technical skills of ML engineers

Mock interview: how the basic algorithms section for ML engineers goes. More details: https://yandex.ru/jobs/interview/mldev

Vacancies: https://yandex.ru/jobs/vacancies?professions=ml-developer Subscribe to Yandex's Telegram channel for the ML community: https://t.me/+Ug9D4CjJrJxmZGRi #ml, #machinelearning, #mlengineer, #mlinterview, #datascience, #яндекс, #yandex, #itcareer, #mldeveloper, #techinterview, #algorithms, #машиннообучение, #вакансии

3 weeks, 3 days ago @ youtube.com
What types of LLM augmentation exist

This is an excerpt from a talk by Alexey Kolesov, CTO at Yandex R&D. At Practical ML Conf 2025, he explained how the team taught YandexGPT 5.1 to remember facts better and apply knowledge of them. He also showed how online RL started working stably for us. The full recording is already on the channel! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

3 weeks, 5 days ago @ youtube.com
How an LLM predicts tokens

This is an excerpt from a talk by Alexey Kolesov, CTO at Yandex R&D. At Practical ML Conf 2025, he explained how the team taught YandexGPT 5.1 to remember facts better and apply knowledge of them. He also showed how online RL started working stably for us. The full recording is already on the channel! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

1 month ago @ youtube.com
Real Madrid or Barcelona: which is better according to an LLM

This is an excerpt from a talk by Alexey Kolesov, CTO at Yandex R&D. At Practical ML Conf 2025, he explained how the team taught YandexGPT 5.1 to remember facts better and apply knowledge of them. He also showed how online RL started working stably for us. The full recording is already on the channel! #YandexGPT #LLM #AI #MachineLearning #GenerativeAI #YandexForML #YandexForDevelopers #Яндекс #AIDevDay #NeuralNetworks #ArtificialIntelligence #Football #Soccer #RealMadrid #Barcelona #ElClasico #LaLiga #Messi #Ronaldo #TechAndSports #AIInSports #FootballFans #SportsAnalytics #YandexTech #AIComparison

1 month ago @ youtube.com
What an AI assistant should know

This is an excerpt from a talk by Alexey Kolesov, CTO at Yandex R&D. At Practical ML Conf 2025, he explained how the team taught YandexGPT 5.1 to remember facts better and apply knowledge of them. He also showed how online RL started working stably for us. The full recording is already on the channel! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

1 month ago @ youtube.com
Data Dojo: an ML community meetup in Moscow

Data Dojo is a community of ML experts. Here people discuss trends, work through real problems, share experience, and practice. In Japanese culture, a dojo is a place of the Way, where one perfects not only skill but also spirit. We brought this principle into the world of data.

Program:

Welcome address | Vladislav Ofitserov, meetup moderator and head of the neural technology development team for international search, and Pyotr Ermakov, ML brand director

Lecture: Overview of trends and preliminary results of the year | Sergey Ovcharenko, head of the multimodal analysis and generation department

Lecture: Teaching AI not to hallucinate, to pass physics, and to get a driver's license: how we prepared the tasks for the Yandex Cup ML qualification | Sergey Fironov, lead…

1 month, 1 week ago @ youtube.com
How to train an LLM: a process in two parts

This is an excerpt from a talk by Alexey Kolesov, CTO at Yandex R&D. At Practical ML Conf 2025, he explained how the team taught YandexGPT 5.1 to remember facts better and apply knowledge of them. He also showed how online RL started working stably for us. The full recording is already on the channel! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

1 month, 1 week ago @ youtube.com
Vision-language models (VLM) at Yandex: approaches, data, pitfalls / Sergey Ovcharenko

This is Sergey Ovcharenko, head of the multimodal analysis and generation department at Yandex R&D. In his talk, Sergey covered VLMs at Yandex: which approaches we use and which pitfalls we run into. He also talked about pretraining and why good quality can be hard to achieve even when you seem to be doing everything right. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml #ML #AI #MachineLearning #DeepLearning #LLM #VLM #NeuralNetworks #Transformers #GenerativeAI #NLP #ComputerVision #DataScience #BigData #MLOps #ModelTraining #AIResearch #ArtificialIntellig…

1 month, 1 week ago @ youtube.com
Release: what could go wrong? / Alexey Kolesov

This is Alexey Kolesov, CTO at Yandex R&D. We talked honestly and without embellishment about the flip side of releases: the nuances and unexpected LLM behavior we ran into in our own experience, and how we resolved such cases. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml #ML #AI #MachineLearning #DeepLearning #LLM #VLM #NeuralNetworks #Transformers #GenerativeAI #NLP #ComputerVision #DataScience #BigData #MLOps #ModelTraining #AIResearch #ArtificialIntelligence #AIDevelopment #AIFuture #Tech #Engineering #Yandex #SberAI #AvitoTech #TBank #AIConference #YandexML #DataEngineering #Reco…

1 month, 1 week ago @ youtube.com
ML Trainings
last post: 15 hours ago
Communication problems in a large organization
15 hours ago @ youtube.com
US Marine Corps: special forces of plumbers
1 day, 15 hours ago @ youtube.com
Decision makers and artificial intelligence: barriers and adaptation
2 days, 15 hours ago @ youtube.com
The artificial intelligence of the USA, China, and the USSR has been combined
3 days, 14 hours ago @ youtube.com
Captain's Bridge No. 25: Altman is disillusioned with AI | Microsoft is milking programmers | Drunk robot

0:00:00 Start

0:00:27 Altman is disillusioned with AI

0:15:27 An unclear standard for AI

0:24:55 AI from Musk in 2026

0:32:20 Chinese papers on AI

0:38:57 Microsoft is milking programmers

0:41:27 AI writes bad code

0:47:22 VISA and payments for agents

0:52:35 A skill store for robots

1:01:06 X5 on AI risks

1:12:45 A tiny computer for AI

1:19:28 The drunk robot doesn't fall

AI summary:

This episode discusses key topics related to artificial intelligence, including the concept of superintelligence, problems of interacting with AI, the need to develop standards and regulation, and Elon Musk's optimism about the future of AGI. The participants emphasize the importance of terminology and communicat…

4 days, 15 hours ago @ youtube.com
Changing the evaluation system because of faster writing speed
1 week, 4 days ago @ youtube.com
Teenagers prefer chatbots to parents
1 week, 4 days ago @ youtube.com
Dmitry and Valentin discuss the bombing of a data center
1 week, 4 days ago @ youtube.com
Valentin Malykh on the difficulties of filming in space
1 week, 4 days ago @ youtube.com
Optimus is not as smart as it seems
1 week, 4 days ago @ youtube.com
Generating a message from Earthlings
1 week, 4 days ago @ youtube.com
Captain's Bridge No. 24: Optimus fell | The Chinese took over NeurIPS | An electric car from Yandex

0:00:00 Start

0:00:35 Optimus fell

0:07:38 The USA bets on robots

0:11:25 The Chinese took over NeurIPS

0:31:07 Nvidia is watching your GPU

0:37:26 Llama of discord

0:40:53 An electric car from Yandex

0:46:14 AI models in space

0:50:59 Altman wants to go to space

0:52:42 China built a data center

0:58:16 ChatGPT for teenagers

1:03:32 The Pentagon backs AGI

1:09:14 100 trillion tokens

1:14:29 A gas water heater with AI

AI summary:

This podcast discusses current topics in robotics and science, including the capabilities of Optimus robots, research funding in the USA, the quality of scientific publications, and the need to revise the system for evaluating scientific results. The participants emphasize the importance of connecting science with practice and …

1 week, 4 days ago @ youtube.com
Captain's Bridge No. 23: The Terminator knows kung fu | AI finds a hole | Dominance is running out

0:00:00 Start

0:00:36 Dark factories

0:06:44 The Terminator knows kung fu

0:12:08 They don't believe in robots

0:17:07 OpenAI on Neptune

0:20:48 AI finds a hole

0:26:59 Russia's AI landscape

0:36:54 Trainium from Amazon

0:39:47 TPU is better

0:46:04 Baidu thinks so too

0:48:01 AI devoured memory

0:53:34 LoRA on a smartphone

0:58:42 Dominance is running out

1:04:09 AI against corruption

AI summary:

This podcast discusses the latest news in robotics and automation, including a machine translation competition, the emergence of dark factories, the development of robots such as the T-800, and OpenAI's influence on the AI development ecosystem. It also covers the security of smart contracts and their vulnerabilities. In this…

2 weeks, 4 days ago @ youtube.com
STEM skills and the future of LLMs: Dmitry Kolodezev's opinion
3 weeks, 3 days ago @ youtube.com
Racism and biotechnology: where to draw the line
3 weeks, 3 days ago @ youtube.com
🎧 Podcasts
Lex Fridman AI Podcast
last post: 1 week, 6 days ago
#487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths

Irving Finkel is a scholar of ancient languages and a longtime curator at the British Museum, renowned for his expertise in Mesopotamian history and cuneiform writing.

He specializes in reading and interpreting cuneiform inscriptions, including tablets from Sumerian, Akkadian, Babylonian, and Assyrian contexts.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep487-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lex

Miro: Online collaborative whiteboard platform.

Go to https://miro.com/

Chevron: Reliable energy for data centers.

1 week, 6 days ago @ lexfridman.com
#486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life

Michael Levin is a biologist at Tufts University working on novel ways to understand and control complex pattern formation in biological systems.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep486-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lex

Miro: Online collaborative whiteboard platform.

Go to https://miro.com/

MasterClass: Online classes from world-class experts.

(2:42:41) – Mind uploading

(3:01:22) – Alien intelligence

(3:16:17) – Advice for young people

(3:22:46) – Questions for AGI

3 weeks, 4 days ago @ lexfridman.com
#485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy

David Kirtley is a nuclear fusion engineer and CEO of Helion Energy, a company working on building the world's first commercial fusion power plant by 2028.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep485-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Transcript: https://lexfridman.com/david-kirtley-transcript

CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: https://lexfridman.com/contact

EPISODE LINKS:

David's X: htt…

1 month, 1 week ago @ lexfridman.com
#484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming

Dan Houser is co-founder of Rockstar Games and is a legendary creative mind behind Grand Theft Auto (GTA) and Red Dead Redemption series of video games.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep484-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://box.com/ai

UPLIFT Desk: Standing desks and office ergonomics.

Go to https://drinkLMNT.com/lex

OUTLINE:

(00:00) – Introduction

(01:29) – Sponsors, Comments, and Reflections

(11:32) – Greatest films of all time

(23:45) – Making video games

(26:36) – GTA 3

(29:55) – Open world video games

(32:42) – Character creation

(36:09) – Superintelligent AI in A Bette…

1 month, 3 weeks ago @ lexfridman.com
#483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex

Julia Shaw is a criminal psychologist and author who in her books explores human nature, including psychopathy, violent crime, the psychology of evil, police interrogation, false memory manipulation, deception detection, and human sexuality.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep483-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lex

BetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lex

LMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lex

AG1: All-in-one daily nutrition drink.

2 months, 1 week ago @ lexfridman.com
#482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature
#482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature #482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature

Pavel Durov is the founder and CEO of Telegram.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep482-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Transcript:https://lexfridman.com/pavel-durov-transcriptCONTACT LEX:Feedback – give feedback to Lex: https://lexfridman.com/surveyAMA – submit questions, videos or call-in: https://lexfridman.com/amaHiring – join our team: https://lexfridman.com/hiringOther – other ways to get in touch: https://lexfridman.com/contactEPISODE LINKS:Pavel’s Telegram: https://t.me/durovPavel’s X: https://x.com/durovTelegram: https://telegram.org/Telegram Contests: https://contest.c…

2 months, 3 weeks ago @ lexfridman.com
#481 – Norman Ohler: Hitler, Nazis, Drugs, WW2, Blitzkrieg, LSD, MKUltra & CIA

Norman Ohler is a historian and author of “Blitzed: Drugs in the Third Reich,” a book that investigates the role of psychoactive drugs, particularly stimulants such as methamphetamine, in the military history of World War II.

It is a book that two legendary historians, Ian Kershaw and Antony Beevor, have praised highly for its depth of research.

Norman also wrote “Tripped: Nazi Germany, the CIA, and the Dawn of the Psychedelic Age”, and he is working on a new book “Stoned Sapiens” looking at the history of human civilization through the lens of drugs.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep481-sc
See below for timestamps, transcript, and to give f…

3 months, 1 week ago @ lexfridman.com
#480 – Dave Hone: T-Rex, Dinosaurs, Extinction, Evolution, and Jurassic Park

Dave Hone is a paleontologist, expert on dinosaurs, co-host of the Terrible Lizards podcast, and author of numerous scientific papers and books on the behavior and ecology of dinosaurs.

He lectures at Queen Mary University of London on topics of Ecology, Zoology, Biology, and Evolution.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep480-sc
See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://go.lindy.ai/lex
BetterHelp: Online therapy and counseling.

Go to https://shopify.com/lex
LMNT: Zero-sugar electrolyte drink mix.

3 months, 3 weeks ago @ lexfridman.com
#479 – Dave Plummer: Programming, Autism, and Old-School Microsoft Stories

Dave Plummer is a programmer, former Microsoft software engineer (Windows 95, NT, XP), creator of Task Manager, author of two books on autism, and host of the Dave’s Garage YouTube channel, where he shares stories from his career, insights on software development, and deep dives into technology.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep479-sc
See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lex
ZocDoc: App that helps patients find healthcare providers.

Go to https://zocdoc.com/lex
Fin: AI agent for customer service.

Go to https://fin.ai/lex
Allio Capital: AI-powered investment app that use…

3 months, 3 weeks ago @ lexfridman.com
#478 – Scott Horton: The Case Against War and the Military Industrial Complex

Scott Horton is the director of the Libertarian Institute, editorial director of Antiwar.com, host of The Scott Horton Show, co-host of Provoked, and for the past three decades a staunch critic of U.S. military interventionism.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep478-sc
See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://alliocapital.com/
Hampton: Community for high-growth founders and CEOs.

Go to https://joinhampton.com/lex
BetterHelp: Online therapy and counseling.

Go to https://drinkag1.com/lex
OUTLINE:
(00:00) – Introduction
(00:35) – Sponsors, Comments, and Reflections
(09:14) – From the Cold War to …

4 months ago @ lexfridman.com
#477 – Keyu Jin: China’s Economy, Tariffs, Trade, Trump, Communism & Capitalism

Keyu Jin is an economist specializing in China’s economy, international macroeconomics, global trade imbalances, and financial policy.

She is the author of The New China Playbook: Beyond Socialism and Capitalism.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep477-sc
See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://alliocapital.com/
UPLIFT Desk: Standing desks and office ergonomics.

Go to https://upliftdesk.com/lex
Hampton: Community for high-growth founders and CEOs.

4 months, 2 weeks ago @ lexfridman.com
#476 – Jack Weatherford: Genghis Khan and the Mongol Empire

Jack Weatherford is an anthropologist and historian specializing in Genghis Khan and the Mongol Empire.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep476-sc
See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://alliocapital.com/
ZocDoc: App that helps patients find healthcare providers.

Go to https://zocdoc.com/lex
Fin: AI agent for customer service.

Go to https://shopify.com/lex
MasterClass: Online classes from world-class experts.

4 months, 3 weeks ago @ lexfridman.com
#475 – Demis Hassabis: Future of AI, Simulating Reality, Physics and Video Games

Demis Hassabis is the CEO of Google DeepMind and Nobel Prize winner for his groundbreaking work in protein structure prediction using AI.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep475-sc
See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://joinhampton.com/lex
Fin: AI agent for customer service.

Go to https://shopify.com/lex
LMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lex
AG1: All-in-one daily nutrition drink.

5 months ago @ lexfridman.com
#474 – DHH: Future of Programming, AI, Ruby on Rails, Productivity & Parenting

David Heinemeier Hansson (aka DHH) is a legendary programmer, creator of Ruby on Rails, co-owner & CTO of 37signals that created Basecamp, HEY, & ONCE, and is a NYT-best-selling author (with Jason Fried) of 4 books: REWORK, REMOTE, Getting Real, and It Doesn’t Have To Be Crazy At Work.

He is also a race car driver, including a class-winning performance at the 24 hour Le Mans race.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep474-sc
See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lex
Lindy: No-code AI agent builder.

Go to https://go.lindy.ai/lex
LMNT: Zero-sugar electrolyte drink …

5 months, 2 weeks ago @ lexfridman.com
#473 – Iran War Debate: Nuclear Weapons, Trump, Peace, Power & the Middle East

Debate on Iran war between Scott Horton and Mark Dubowitz.

Scott Horton is the author and director of the Libertarian Institute, editorial director of Antiwar.com, host of The Scott Horton Show, and for the past three decades, a staunch critic of U.S. foreign policy and military interventionism.

Mark Dubowitz is the chief executive of the Foundation for Defense of Democracies, host of the Iran Breakdown podcast, and a leading expert on Iran and its nuclear program for over 20 years.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep473-sc
See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://drinkLMNT.c…

6 months ago @ lexfridman.com
Microsoft Research Podcast
last post 3 weeks, 3 days ago
Ideas: Community building, machine learning, and the future of AI

This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS.

In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic economics.

How had you experienced a lack of community or network of women in machine learning before the founding of WiML?

So particularly when working on topics related to fairness, I’ve ended up focusing a bunch on stuff to do with marginalized groups as part of my responsible AI work.

3 weeks, 3 days ago @ microsoft.com
Ideas: More AI-resilient biosecurity with the Paraphrase Project

Today, I’m excited to talk about the Paraphrase Project, an effort I co-led exploring how advances in AI tools for protein design might impact biosecurity.

These “patches,” akin to those in cybersecurity, have now been shared with organizations globally to strengthen biosecurity screening.

The project highlights that the same AI tools capable of incredible good can also be misused, requiring us to be vigilant, thoughtful, and creative so we continue to get the most benefit out of AI tools while working to ensure that we avoid costly misuses.

So things like, how similar is this to that template, wild-type protein structure that we used as our conditioning information?

But I feel like broadly…

2 months, 2 weeks ago @ microsoft.com
Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education

KOHANE: So I think you’ve “nerd sniped” me because you [LAUGHTER]—which is all too easy—but I think there’s a central issue here.

But I actually think this is dark matter of human organizational technology that is not well understood.

AZEEM AZHAR: We didn’t talk about, you know, AI in its ability to potentially do this, which is to extend the clinician’s presence throughout the week.

And so I think there’s always going to be an opening for either differences of opinion or agreeing with you too much.

And this gets into whether AI is really going to get almost to the ab initio understanding of human biology.

4 months ago @ microsoft.com
Reimagining healthcare delivery and public health with AI

4 months, 2 weeks ago @ microsoft.com
Navigating medical education in the era of generative AI

Prior to med school, Daniel pursued experiences that cultivated his interest in the application of AI in medical practice and education.

Really, really looking forward to this chat.

There’s AI before ChatGPT and before, you know, generative AI really became a big thing, and then afterwards.

And then after we talk about what’s really happening, what do you think should happen in medical education given the reality of generative AI?

And I do agree [that] AI really gives us real hope that we can make it true.

5 months ago @ microsoft.com
AI Testing and Evaluation: Reflections

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

We have examples, like the pharmaceutical or medical device industry experts with whom you spoke, that’s really, you know, testing … there is a pre-deployment requirement.

And the third is just how rigid versus adaptive these testing and evaluation regimes or frameworks are in these different domains.

I really agree that there has been a lot of emphasis to date on, sort of, testing models upstream, the AI model evaluation.

You know, I think there’s been real progress already in the AI evaluation and testing ecosystem in the public-private partnership context.

5 months, 1 week ago @ microsoft.com
AI Testing and Evaluation: Learnings from cybersecurity

Absolutely, I really, really was.

As a principal director on the Microsoft AI Red Team, Tori leads all AI security and safety red team operations, as well as dangerous capability testing, to directly inform C-suite decision-makers.

This year, we’ve pulled a lot of those assets and insights into the Azure [AI] Foundry AI Red Teaming Agent.

So you can get a little taste of what we do day to day in the AI Red Teaming Agent.

WESTERHOFF: I think the most important takeaway from those lessons is that AI security is truly a team sport.

5 months, 2 weeks ago @ microsoft.com
How AI will accelerate biomedical research and discovery

Dr. Eric Topol is the executive vice president of the biomedical research non-profit Scripps Research, where he founded and now directs the Scripps Research Translational Institute.

Let’s continue our deep dive on AI and biomedical research with this conversation with Noubar Afeyan:
LEE: Noubar, thanks so much for joining.

And there’s the origin story of contact with AI, you know, before the emergence of generative AI and afterwards.

What is going on today with respect to AI really being used for something meaningful in the design and development of drugs?

TOPOL: You would read about how, you know, data is the new oil and, you know, gold and whatnot.

5 months, 2 weeks ago @ microsoft.com
AI Testing and Evaluation: Learnings from pharmaceuticals and medical devices

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

During the pre-market phase, medical testing establishes baseline safety and effectiveness metrics through bench testing, performance standards, and clinical studies.

SULLIVAN: So medical devices face a pretty prescriptive multi-level testing path before they hit the market.

We are looking into medical devices, as well, obviously, but also other technologies in advanced medical computing.

So we see Phase 3 trials as something that occurs in the medical devices and pharmaceuticals field.

5 months, 3 weeks ago @ microsoft.com
AI Testing and Evaluation: Learnings from genome editing

As generative AI continues to advance, Microsoft has gathered a range of experts—from genome editing to cybersecurity—to share how their fields approach evaluation and risk assessment.

CHARO: Well, you know, genome editing is both very old and very new.

Now the earliest forms of genome editing were very inefficient, and so we didn’t worry that much.

But the bottom-line thing to remember, the way to really think about it is, we don’t regulate genome editing; we regulate the things that use genome editing.

And she said, you know, we don’t regulate genome editing; we regulate the things that use genome editing.

5 months, 4 weeks ago @ microsoft.com
AI Testing and Evaluation: Learnings from Science and Industry

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

And I think, really, there are two reasons why tech is so, kind of, representative of that kind of challenge that I’ve always found fascinating.

Continues to be a really important topic in the AI policy conversation right now, I think, for really good reason.

Testing is an important component for governance and AI and, of course, in all of these other domains, as well.

I think about almost, like, in the near to mid-term, like three issues that we need to address in the AI, kind of, policy and testing context.

6 months ago @ microsoft.com
The AI Revolution in Medicine, Revisited: How AI is reshaping the future of healthcare and medical research

LEE: Yeah, yeah.

It cannot—as, you know, Bill was saying—it cannot learn from your document.

And I don’t know if the two of you remember, but I ended up doing a lot of tests.

I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3-mini.

Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know.

6 months, 2 weeks ago @ microsoft.com
What AI's impact on individuals means for the health workforce and industry

So I don’t think we should be surprised that business schools matter on this because we care about management.

That’s really going to change the way, like, middle school works, was my thinking at the time.

We’ve gone from AI being highly discriminative to AI that’s able to explore the world in particular ways.

The symptoms that they’re showing are quite different, and also their compliance is really, really different.

LEE: Yeah, really, really interesting.

7 months ago @ microsoft.com
Abstracts: Zero-shot models in single-cell biology with Alex Lu

And single-cell foundation models claim to be capable of unraveling deeper insights than ever before.

Basically, we showed that single-cell foundation models perform worse in settings that are fundamental to biological discovery than much simpler machine learning and statistical methods that were used in the field before single-cell foundation models emerged and are the go-to standard for unpacking meaning from these complicated experiments.

And the way to understand this is because single-cell foundation models are trained in a way that tries to expose these models to millions of single-cells.

But let’s also talk about the impact for methodologists, people who are trying to improve these s…

7 months, 1 week ago @ microsoft.com
Abstracts: Aurora with Megan Stanley and Wessel Bruinsma

This is such exciting work about environmental forecasting, so we’re happy to have the two of you join us today.

Mostly because AI weather forecasting models are computationally much more efficient and can even be more accurate.

What’s unfortunate though, about this big step forward, is that these developments are mostly limited to the setting of weather forecasting.

Weather forecasting is very important, obviously, but there are many other important environmental forecasting problems out there, such as air pollution forecasting or ocean wave forecasting.

STANLEY: Current approaches have really focused training very specifically on weather forecasting models.

7 months, 1 week ago @ microsoft.com
NLP Highlights
last post None
Data Skeptic
last post 1 week ago
Eye Tracking in Recommender Systems

In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelin Institute and Brno University, Santiago explains the mechanics of eye tracking technology—how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like Netflix. Through collaboration between psychologists and AI researchers, Santiago's work demonstrate…

1 week ago @ dataskeptic.com
Cracking the Cold Start Problem

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations. Boya shares insigh…

2 weeks, 3 days ago @ dataskeptic.com
Designing Recommender Systems for Digital Humanities

In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net, Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique research needs and information-seeking behaviors. Florian explains the technical challen…

1 month ago @ dataskeptic.com
DataRec Library for Reproducible in Recommend Systems

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Mario Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon reviews. The conversation covers Alberto's research journey through knowledge graphs, graph-based recommen…

1 month, 1 week ago @ dataskeptic.com
Shilling Attacks on Recommender Systems

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a …

1 month, 2 weeks ago @ dataskeptic.com
Music Playlist Recommendations

In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal framework for playlist continuation that uses contrastive learning to align text and audio representations, learn song relationships, and create playlist-level embeddings to address the cold start prob…

1 month, 3 weeks ago @ dataskeptic.com
Bypassing the Popularity Bias
2 months, 1 week ago @ dataskeptic.com
Sustainable Recommender Systems for Tourism

In this episode, we speak with Ashmi Banerjee, a doctoral candidate at the Technical University of Munich, about her pioneering research on AI-powered recommender systems in tourism. Ashmi illuminates how these systems can address exposure bias while promoting more sustainable tourism practices through innovative approaches to data acquisition and algorithm design. Key highlights include leveraging large language models for synthetic data generation, developing recommendation architectures that balance user satisfaction with environmental concerns, and creating frameworks that distribute tourism more equitably across destinations. Ashmi's insights offer valuable perspectives for both AI res…

2 months, 2 weeks ago @ dataskeptic.com
Interpretable Real Estate Recommendations

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich interviews Dr. Kunal Mukherjee, a postdoctoral research associate at Virginia Tech, about the paper "Z-REx: Human-Interpretable GNN Explanations for Real Estate Recommendations". The discussion explores how the post-COVID real estate landscape has created a need for better recommendation systems that can introduce home buyers to emerging neighborhoods they might not know about. Dr. Mukherjee explains how his team developed a graph neural network approach that not only recommends properties but provides human-interpretable explanations for why certain regions are suggested. The conversation covers the advantages o…

3 months ago @ dataskeptic.com
Why Am I Seeing This?

In this episode of Data Skeptic, we explore the challenges of studying social media recommender systems when exposure data isn't accessible. Our guests Sabrina Guidotti, Gregor Donabauer, and Dimitri Ognibene introduce their innovative "recommender neutral user model" for inferring the influence of opaque algorithms.

3 months, 2 weeks ago @ dataskeptic.com
Eco-aware GNN Recommenders

In this episode of Data Skeptic, we dive into eco-friendly AI with Antonio Purificato, a PhD student from Sapienza University of Rome. Antonio discusses his research on "EcoAware Graph Neural Networks for Sustainable Recommendations" and explores how we can measure and reduce the environmental impact of recommender systems without sacrificing performance.

3 months, 3 weeks ago @ dataskeptic.com
Networks and Recommender Systems

Kyle reveals the next season's topic will be "Recommender Systems". Asaf shares insights on how network science contributes to the recommender system field.

4 months, 1 week ago @ dataskeptic.com
Network of Past Guests Collaborations

Kyle and Asaf discuss a project in which we link former guests of the podcast based on their co-authorship of academic papers.

5 months ago @ dataskeptic.com
The Network Diversion Problem

In this episode, Professor Pål Grønås Drange from the University of Bergen introduces the field of Parameterized Complexity - a powerful framework for tackling hard computational problems by focusing on specific structural aspects of the input. This framework allows researchers to solve NP-complete problems more efficiently when certain parameters, like the structure of the graph, are "well-behaved". At the center of the discussion is the network diversion problem, where the goal isn’t to block all routes between two points in a network, but to force flow - such as traffic, electricity, or data - through a specific path. While this problem appears deceptively similar to the classic "Min.Cu…

5 months, 3 weeks ago @ dataskeptic.com
Complex Dynamic in Networks

In this episode, we learn why simply analyzing the structure of a network is not enough, and how the dynamics - the actual mechanisms of interaction between components - can drastically change how information or influence spreads. Our guest, Professor Baruch Barzel of Bar-Ilan University, is a leading researcher in network dynamics and complex systems ranging from biology to infrastructure and beyond.
BarzelLab
BarzelLab on YouTube
Paper in focus: Universality in network dynamics, 2013

6 months ago @ dataskeptic.com
SuperDataScience
last post 2 days, 10 hours ago
951: Context Engineering, Multiplayer AI and Effective Search, with Dropbox’s Josh Clemm

VP of Engineering at Dropbox Josh Clemm speaks to Jon Krohn about consolidating search tools across apps with the AI-powered workspace, Dropbox Dash, the new collaborative AI systems that enhance interoperability between team members and their projects, and how to avoid “context rot”. Dropbox Dash gives users the best of Dropbox’s cloud storage and search functions, plus a “universal search” ability to locate information across multimedia and apps. “AI really needs to understand you and your team, first and foremost, and all that connected data,” says Josh. This episode is brought to you by Dell, by Intel, by Airia and by MongoDB. Additional materials: www…

2 days, 10 hours ago @ podtrac.com
950: Happy Holidays from All of Us at the SuperDataScience Podcast

In this special holiday episode, the SuperDataScience Podcast team comes together to wish you happy holidays and thank you for listening throughout the year. Team members from around the world share warm greetings in their own voices and languages as we reflect on another year of learning, curiosity, and community. From all of us at SDS, we wish you a joyful holiday season and look forward to bringing you more data science, machine learning, and AI content in the year ahead. Additional materials: www.superdatascience.com/950 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

6 days, 10 hours ago @ podtrac.com
949: Why AI Keeps Failing Society, with Stanford professor Alex “Sandy” Pentland

Alex “Sandy” Pentland, Toshiba Professor of Media Arts & Science at MIT and Fellow at Stanford, speaks to Jon Krohn about his new book, Shared Wisdom, why he links AI to the collapse of the Soviet Union, and why those risks to society could still be relevant today. We can only achieve better system performance, Alex says, when we build tools that keep step with the way that people make decisions. Listen to the episode to hear Alex talk about how he is helping make AI agents work for individuals rather than the companies that develop them, and his work in making sure that systems operate consistently and fairly across the world. This episode is brought to you by Dell, by …

1 week, 2 days ago @ podtrac.com
948: In Case You Missed It in November 2025

In this November episode of the “In Case You Missed It” series, Jon Krohn selects his favorite clips from the month. Hear from Shirish Gupta and Tyler Cox (Episode 939), Vijoy Pandey (Episode 941), Marc Dupuis (Episode 937), and Maya Ackerman (Episode 943) on getting back to human motivation and the importance of evaluating the tools and data we use. Additional materials: www.superdatascience.com/948 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 week, 6 days ago @ podtrac.com
947: How to Get Hired at Top Firms like Netflix and Spotify, with Jeff Li

Jeff Li tells Jon Krohn what it's like to work at scale as a data scientist and a machine learning engineer at Netflix, Spotify and DoorDash, as well as how to get a foot in the door at these companies. Jeff also discusses how to run forecasts and trends, and how to read their results. Listen to hear Jeff Li discuss how Spotify became a podcast powerhouse, his startup move.ai, and the tools he uses every day. This episode is brought to you by Dell, by Intel, by Fabi, and by Airia. Additional materials: www.superdatascience.com/947 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship informat…

2 weeks, 2 days ago @ podtrac.com
946: How Robotaxis Are Transforming Cities

Jon Krohn looks into the benefits of robotaxis, from safety to affordability, in this Five-Minute Friday. Hear about Waymo’s partnership with Jaguar Land Rover, the latest safety studies concerning driverless vehicles, and a case for robotaxis becoming the preferred method of transport in the US, where households spend roughly 15% of their budget on vehicle ownership. Additional materials: www.superdatascience.com/946 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

2 weeks, 6 days ago @ podtrac.com
945: AI is a Joke, with Joel Beasley

Is there humor in data? Joel Beasley, host of Modern CTO, tells Jon Krohn how he used AI to turn his sights to stand-up comedy. He also shares the tech leadership tips he learned from his popular podcast, Modern CTO, and how he is using generative AI as a collaborative partner in his creative work. This episode is brought to you by Dell, by Intel, by Fabi, and by Gurobi. Additional materials: www.superdatascience.com/945 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (02:14) Joel Beasley on his comedy career (19:04) Applying the ‘me…

3 weeks, 2 days ago @ podtrac.com
944: Gemini 3 Pro: Google’s Back on Top

Google is steaming ahead with launching its top-league new Gemini 3 Pro model across its product suite, from Google Search to Vertex AI cloud services. The multinational tech company is also giving access to eager early adopters like Wayfair and GitHub. Get all the detailed data, its performance across hard-to-game industry benchmarks, and what this all means for the way you use generative AI, in this week’s episode. Additional materials: www.superdatascience.com/944 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

3 weeks, 6 days ago @ podtrac.com
943: Creative Machines: AI in Music and Art, with Prof. Maya Ackerman

Creative human-AI partnerships and AI-generated music: WaveAI CEO and co-founder Maya Ackerman speaks with Jon Krohn about learning to see – and accept – AI’s potential as a creative partner in a human-centric, AI-forward future. Listen to the episode to hear Maya Ackerman discuss reframing hallucination as a creative force, her work at WaveAI, and how to push the boundaries of creativity using generative AI. This episode is brought to you by Dell, by Intel, by Gurobi and by Airia. Additional materials: www.superdatascience.com/943 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship informat…

1 month ago @ podtrac.com
942: Odds of AGI by 2040? LEAP Expert Forecasts and Workforce Implications

What’s on the horizon for AI? Jon Krohn wades through opinions from more than experts, curated by the Longitudinal Expert AI Panel (LEAP), about what we can expect from the industry. From estimates on AI-assisted workers through energy consumption to AI performance in highly skilled domains, find out just how much LEAP thinkers believe AI is permeating our daily work and life in this Five-Minute Friday. Additional materials: www.superdatascience.com/942 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month ago @ podtrac.com
941: Multi-Agent Human Societies, with Dr. Vijoy Pandey

Vijoy Pandey imagines a bold new society in which agents and humans make scientific discoveries and complete physical tasks together, and he tells Jon Krohn about his work at AGNTCY, Cisco’s open-source platform for the Internet of Agents. Listen to the episode to hear Vijoy Pandey talk about how a future society in which multi-agents and humans interact may be a real possibility, what TCP/IP is, how to find trustworthy AI agents, and how to get your hands on AGNTCY today! This episode is brought to you by Dell, by Intel, by Fabi and by Gurobi. Additional materials: www.superdatascience.com/941 Interested in sponsoring a SuperDataScience Po…

1 month, 1 week ago @ podtrac.com
SDS 940: In Case You Missed It in October 2025

Jon Krohn curates a selection of clips from the month that was. Hear from the orchestrators of an expanding AI universe in this episode of In Case You Missed It, with news, views and groundbreaking ideas from Sheamus McGovern, Jerry Yurchisin, Stephanie Hare, Larissa Schneider, and Adrian Kosowsky. We cover baby dragons, the Hippocratic Oath, and, of course, all the latest in artificial intelligence! Additional materials: www.superdatascience.com/940 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month, 1 week ago @ podtrac.com
939: Mixture-of-Experts and State-Space Models on Edge Devices, with Tyler Cox and Shirish Gupta

State space models (SSMs), Granite models, and Mamba: Dell’s Tyler Cox and Shirish Gupta discuss with Jon Krohn why state space models can process information so efficiently, and how Dell’s AI factory helps enterprises manage custom AI workloads. Hear the latest on the Dell Pro AI Studio and Dell’s partnerships with IBM and Hugging Face in this episode. This episode is brought to you by Trainium2, the latest AI chip from AWS, and by Gurobi. Additional materials: www.superdatascience.com/939 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (02:58) Dell Pro AI…

1 month, 2 weeks ago @ podtrac.com
938: Frontier AI Agents for Data Science, with Sphinx’s Rohan Kodialam

Jon Krohn speaks to Rohan Kodialam, Cofounder and CEO of Sphinx, the company that redefines how machine intelligence reasons about data with frontier AI. In this Feature Friday, Jon and Rohan discuss the benefits of using Sphinx to assist with data analysis. Get under the hood to learn how Sphinx operates, from running commands to ensuring your data stays secure, and find out how you can get your hands on this great tool for free. Additional materials: www.superdatascience.com/938 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month, 2 weeks ago @ podtrac.com
937: How to Design AI-First Products, with Marc Dupuis

AI tools won’t eliminate but elevate data scientists, says Marc Dupuis. The CEO of fabi.ai talks to Jon Krohn about the new wave of AI-driven platforms that integrate workflows within popular work tools like Slack and email, and how building AI-first products means widening access to all ability levels. This episode is brought to you by Gurobi, by Dell and by Intel. Additional materials: www.superdatascience.com/937 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (09:31) Will fabi.ai outshine data science practitioners (20:40) Resolving workflows w…

1 month, 3 weeks ago @ podtrac.com
Data Science at Home
last post 2 days, 10 hours ago
When Data Stops Being Code and Starts Being Conversation (Ep. 297)

Mark Brocato built Mockaroo—the tool that taught millions of developers how to fake data.

Now, as Head of Engineering at Tonic.ai, he’s building the AI agent that’s making his own creation obsolete.

From the hidden failures of legacy mocks to the security implications of agent-driven synthesis, Mark reveals what happens when data generation becomes a conversation—not a pipeline.

Sponsors: Tonic.ai, synthetic data solutions for software and AI development.

Accelerate engineering velocity and ensure compliance with AI-powered data synthesis. This episode is brought to you by Statistical Horizons. At Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics…

2 days, 10 hours ago @ datascienceathome.com
Your AI Strategy is Burning Money: Here’s How to Fix It (Ep. 295)

Most companies don’t have an AI problem.

In this conversation, he breaks down when AI actually makes sense, where AWS costs spiral out of control, and why your “cool demo” keeps dying before launch.

If you’re tired of AI hype and ready for straight answers, hit play.

Our Discord community is full of ML engineers, researchers, and AI enthusiasts discussing papers, sharing projects, and helping each other level up.

Whether you’re debugging your first neural net or training your tenth transformer, there’s a place for you.

4 weeks, 1 day ago @ datascienceathome.com
From Tokens to Vectors: The Efficiency Hack That Could Save AI (Ep. 294)

LLMs generate text painfully slowly, one low-info token at a time.

Researchers just figured out how to compress 4 tokens into smart vectors & cut costs by 44%—with full code & proofs!

🔥📊 Sponsors: This episode is brought to you by Statistical Horizons. At Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics and AI methods practical and accessible.

Join thousands of researchers and professionals who’ve advanced their careers with Statistical Horizons.

Get $200 off any seminar with code DATA25 at https://statisticalhorizons.com

1 month, 1 week ago @ datascienceathome.com
Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 293)

VortexNet uses actual whirlpools to build neural networks.

By borrowing equations from fluid dynamics, this new architecture might solve deep learning’s toughest problems—from vanishing gradients to long-range dependencies.

Today we explain how vortex shedding, the Strouhal number, and turbulent flows might change everything in AI.

Sponsors: This episode is brought to you by Statistical Horizons. At Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics and AI methods practical and accessible.

Join thousands of researchers and professionals who’ve advanced their careers with Statistical Horizons.

1 month, 3 weeks ago @ datascienceathome.com
The Scientists Growing Living Computers in Swiss Labs (Ep. 292)

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

With a focus on dual-use innovation, Amethix is shaping a future where intelligent machines extend human capability, not replace it.

Discover more at https://amethix.com This episode is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Learn more at intrepid.ai

References: Website: finalspark.com, Discord account: / discord, Newsletter: https://finalspark.com/#newsletter

Topics: Biological computing • Neural engineering • Energy-effic…

2 months ago @ datascienceathome.com
When AI Hears Thunder But Misses the Fear (Ep. 291)

Sanjoy Chowdhury reveals AI’s hidden weakness: while systems can see objects and hear sounds perfectly, they can’t reason across senses like humans do.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at https://amethix.com This episode is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Whether it’s in the sky, on the ground, or in orbit—if it’s intelligent and mobile, Intrepid helps you build it.

2 months, 2 weeks ago @ datascienceathome.com
Why VCs Are Funding $100M Remote Control Toys (Ep. 290)

References

War On The Rocks: https://warontherocks.com/2025/08/ukraine-isnt-the-model-for-winning-the-innovation-war/

LinkedIn: https://www.linkedin.com/in/jonasrsinger/

Spotify: https://tr.ee/Omy_1X8k1U

Apple Podcast: https://podcasts.apple.com/us/podcast/defence-innovation-podcast/id1797131332

YouTube: https://youtube.com/@DefenceInnovationpodcast?si=cu2WlnVgL5XKnM0p

Sponsors: This episode is proudly sponsored by Amethix Technologies.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at https://amethix.com This episode is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers…

3 months, 1 week ago @ datascienceathome.com
How Hacker Culture Died (Ep. 289)

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com

DSH is brought to you by Intrepid AI.

🐦 Twitter: @DataScienceAtHome

📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/

Instagram: https://www.instagram.com/datascienceathome/

Facebook: https://www.facebook.com/datascienceAH

LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast

Discord Channel: https://discord.gg/4UNKGf3

NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Send us mail at: [email protected]

Don’t forget to like, subscribe, and hit the 🔔 for…

3 months, 3 weeks ago @ datascienceathome.com
Robots Suck (But It’s Not Their Fault) (Ep. 288)

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com

DSH is brought to you by Intrepid AI.

🐦 Twitter: @DataScienceAtHome

📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/

Instagram: https://www.instagram.com/datascienceathome/

Facebook: https://www.facebook.com/datascienceAH

LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast

Discord Channel: https://discord.gg/4UNKGf3

NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Send us mail at: [email protected]

Don’t forget to like, subscribe, and hit the 🔔 for…

4 months, 3 weeks ago @ datascienceathome.com
Your Favorite AI Startup is Probably Bullshit (Ep. 287)

The brutal truth about why Silicon Valley is blowing billions on glorified autocomplete while pretending it’s the next iPhone.

We’re diving deep into the AI investment circus where VCs who can’t code are funding companies that barely understand their own technology.

From blockchain déjà vu to the “ChatGPT wrapper” economy—this episode will make you question every AI valuation you’ve ever seen.

Fair warning: We’re naming names and calling out the hype.

Don’t listen if you work at a “revolutionary AI startup” that’s just OpenAI’s API with a pretty interface.

4 months, 3 weeks ago @ datascienceathome.com
Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything (Ep. 286) [RB]

From the viral article “Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything” on my newsletter at https://defragzone.substack.com/p/techs-dumbest-mistake-why-firing here are my thoughts about AI replacing programmers…

🎙️ Sponsors: AGNTCY, the open source collective building the Internet of Agents 🌐 https://www.agntcy.org

✨ Connect with us!

🐦 Twitter: @DataScienceAtHome

📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/

Instagram: https://www.instagram.com/datascienceathome/

Facebook: https://www.facebook.com/datascienceAH

LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast

Discord Channel: https://discord.gg/4UNKGf3

NEW TO DATA SCIENCE AT HOME?

Data Scie…

5 months, 3 weeks ago @ datascienceathome.com
Brains in the Machine: The Rise of Neuromorphic Computing (Ep. 285)

In this episode of Data Science at Home, we explore the fascinating world of neuromorphic computing — a brain-inspired approach to computation that could reshape the future of AI and robotics.

The episode breaks down how neuromorphic systems differ from conventional AI architectures like transformers and LLMs, diving into spiking neural networks (SNNs), their benefits in energy efficiency and real-time processing, and their limitations in training and scalability.

Real-world applications are highlighted, including low-power drones, hearing aids, and event-based cameras.

Francesco closes with a vision of hybrid systems where neuromorphic chips and LLMs coexist, blending biological inspiratio…

6 months, 1 week ago @ datascienceathome.com
DSH/Warcoded – AI in the Invisible Battlespace (Ep. 284)

This episode explores the invisible battlespace of cyber and electronic warfare, where AI takes center stage.

Sponsors: Building multi-agent software is hard: agent-to-agent and agent-to-tool communication is still the wild west.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

6 months, 3 weeks ago @ datascienceathome.com
DSH/Warcoded Swarming the Battlefield (Ep. 283)

Swarming the Battlefield explores how artificial intelligence is revolutionizing combat through coordinated drone swarms.

This episode uncovers how these intelligent agents turn the chaos of the battlefield into a synchronized dance of machine warfare.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

7 months ago @ datascienceathome.com
DSH/Warcoded Kill Chains and Algorithmic Warfare – Autonomy in Targeting and Engagement (Ep. 282)

In this gripping follow-up, we dive into how AI is transforming kinetic operations—from identifying a threat to executing a strike.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Whether it’s in the sky, on the ground, or in orbit—if it’s intelligent and mobile, Intrepid helps you build it.

7 months, 2 weeks ago @ datascienceathome.com