Blog

防衛分野における開発の最前線：Sakana AI、Software Engineerインタビュー

2026-05-11T00:00:00+09:00

Sakana AIは、自然界の集合的知性から着想を得たユニークな生成AI技術の研究開発を行っています。この世界トップレベルの技術を社会に実装するため、2025年初頭にApplied Teamを始動しました。現在注力しているのは、金融や防衛など、社会の基盤となる分野です。その中でも防衛分野はいま急速に動き始めています。

では、その現場でSoftware Engineerは何をしているのでしょうか。システムを設計し、コードを書き、AIをプロダクトに実装する——そのような仕事が、防衛分野でどのように展開されているか、イメージできる人はあまり多くないのではないでしょうか。

本記事ではSakana AIの防衛分野でSoftware Engineerとして働く伊藤大さんへのインタビューを通じて、その働き方とその魅力をご紹介します。

インタビューイー

伊藤大
Masaru Itoh
Software Engineer

日米のバックグラウンドを持ち、九州大学在学時よりエンジニアとしてのキャリアを開始。組み込み機器、モバイルアプリ、Webサービスのフルスタックなど幅広い領域の経験を積み、2016年からはLINEヤフー株式会社にて大規模分散システムの設計、開発、運用に従事。

GYAO!のメタデータシステムのオーナー、Yahoo!映画のバックエンドや広告検索のランキング改善のプロジェクトテックリード、検索エンジン上の機械学習リランカーのチームリードを歴任。その後、Sakana AIに防衛分野のSoftware Engineerとして参画。

なぜSakana AIへ？これまでのキャリア

── これまでのキャリアを教えてください。

Yahoo! JAPAN時代から含めて、LINEヤフーには約10年間在籍しました。キャリアの前半はGYAO!やヤフー映画といったサービス開発の現場、後半は検索やML基盤などの横断的な技術組織に身を置き、一貫して大規模な分散システムの設計・開発・運用に従事してきました。

GYAO!では新規システムのオーナーとして作品メタデータの管理基盤を構築し、ヤフー映画ではサービス全体の技術刷新プロジェクトにて、バックエンド領域のTech Leadを務めました。その後は、広告検索エンジンの開発やSolrを用いたMLリランカー開発のチームリード、そしてリアルタイムな学習・推論を実行するML基盤の開発へと、徐々にプラットフォーム寄りのロールへと専門性を広げていきました。

前職には、自らの意志で希望するチームへの異動を志願できる優れた制度がありました。未経験の領域であっても、自身のキャリア志向に従って挑戦し、多様な技術領域で経験を積めたことには、今でも深く感謝しています。

各ロールを振り返ると、非常に多くのやりがいがありました。GYAO!でPoCから着手したClojure実装の基盤が、最終的に旧システムを完全に置き換えるまでに育ったこと。広告検索で極低レイテンシを実現するために、カリカリにチューニングされた独自実装のC++エンジンを大規模に運用したこと。初めてMLシステムに触れ、その効果の大きさに驚愕しながら必死にキャッチアップしたこと。

これらに共通するのは、日本最大規模の自社インフラとサービスの上での開発に身を置けた点です。大規模な分散システムを自ら作り出し、多くのユーザーに提供するインパクトと責任の重さを日々実感する時間でした。実生活に密着したデジタルインフラを支える仕事には大きな意義を感じながら、次第にプライベート企業のWebサービスの枠を超えた社会貢献できる道があるのではないか、とも意識するようになりました。

── 充実したキャリアの中で、Sakana AIとの出会いはどのようなものでしたか？

リクルーターの方からお声掛けいただいたのですが、それまでは正直「謎のAIリサーチラボ」ぐらいの印象しかありませんでした（笑）

印象が大きく変わったのは岩井さんとのカジュアル面談で、そこで初めて金融・防衛プロダクトの存在を知りました。選考過程の技術課題も印象深くて、まず単純に内容が面白く、課題設計に人材像の意図が明確に感じられて、改めて興味を引かれました。

最大の決め手はDavidさん（弊社CEO）との会談でした。企業としてのあり方や、今後開発していく予定のプロダクトのビジョンを惜しむことなく共有いただき、オープンエンドネスを重視する文化、ビジネスの具体的な進捗、明確な技術的優位性が揃っていて、会社自体の大きな飛躍が期待できると感じました。

防衛分野のSoftware Engineer

── 防衛分野のSoftware Engineerというポジションは、業務内容をイメージしにくいと思います。具体的にはどのようなプロジェクトや課題があるのでしょうか？

インテリジェンス領域ではSNS空間の偽情報対策の独自技術の開発していて、同様の技術が読売新聞社様との共同で行ったSNS上の「認知戦」の可視化にも活用されています。

防衛領域では、部隊行動の迅速な状況把握と意思決定の基盤となる指揮統制システムの開発を行っています。ドローン等も含めた現場で発生する大量のデータを統合・分析し、適切な判断と指揮司令のサポートをすることを目指しています。

── そうした領域で、Software Engineerとして具体的にはどのようなことを求められているのでしょうか？

防衛領域で実現すべきアプリケーションはプロツールで、ドメインに精通した熟練者が最大の効果を発揮するために利用するものです。ユーザーの目的を深く理解し、それを達成するためのワークフローと機能性を磨き込み、阻害要因を取り除くことが最も重要だと考えています。

実際の開発スタイルは、形式的なプロセスや会議体に縛られず必要なことだけやるagileな方法を取っていて、具体のヒアリングから自由に課題を抽出し、優先順位を合意しながら実装を行い、次のフィードバックを得るサイクルを進めています。

── 仕事のやりがいや面白さはどこにありますか？また、日常的にどのような技術スタックや開発スタイルで仕事をしているのかも教えてください。

ソフトウェアを通じて日本の重要課題である国防に関わり、安全保障が関わるミッションクリティカルな状況のサポートに携われることに、責任とともに大きな充実を感じています。こういった経験ができる環境に身をおけることは稀有で、今後も少しでも多くの価値を提供できることを期待しています。

技術スタックは現状では標準的なスタックで、MLとの親和性が高いPythonをバックエンドに、TypeScript/Next.jsのWeb UI、KotlinのAndroidアプリが主な要素です。Infrastructureは一般的なWebアプリケーションでクラウド環境で構成することもありますが、DDIL（通信が阻害、切断され、断続し、帯域が著しく制限される）環境を想定した分散システムは大きく異なる構成を選択します。チームの開発toolingの統一はライトにmiseで行っています。情報ガバナンスを確実に守れる運用を前提に、生成AIは業務全般に渡って不可欠なツールだと感じています。実装のみならず、ゴール設定や課題の抽出にも活用しており、同じ人数でより高品質な成果を短期間で実現するのに役立てています。

防衛領域においては、これまでのキャリアのあらゆる要素をフル動員している感覚があります。指揮統制システムでは、様々なデバイスによるインプット方法にエッジ推論を組み合わせたり、DDIL環境で稼働できる分散システムを設計したりと、多数の領域をまたがった開発を行っています。

自分たちの成果物の一つ一つが自衛官の生命に関わることを考えると、これ以上の緊張感や責任感はありません。このミッションクリティカルな領域に生成AIを活用するのは特に深い注意が必要で、実装コードやシステムの出力の品質には人が確実に介在し、担保することが極めて重要です。

── チームの雰囲気や、Applied Research EngineerとSoftware Engineerの役割分担についても聞かせてください。

一般にApplied Research EngineerはMLモデリングを中心としたデータサイエンティスト像、Software Engineerはそれをプロダクト化する役割を担うイメージですが、防衛はプロダクトフェーズが初期段階なこともあり、決まった分担よりも「できる仕事を全員で取り組む」密な関係の印象です。私は入社して日が浅いですが、このおかげですぐに馴染めたとも感じています。

初期段階ではコミュニケーションが重要ですが、チームメンバーは温和で気さくな方ばかりで、日々楽しみながら議論を進めています。領域への真剣さが求められるだけに、議論自体は和やかにできるメンバーのお人柄にただ感謝しています。

防衛に限らず、社員どうしのランチ会食の費用を会社が負担する制度もあり、ありがたくフル活用しています。ランチにご一緒して初めて出会う社内メンバーも多く、不思議なほど良い方ばかりで驚いています（同席募集のシステムもあるので、一人ぼっちになってしまう心配もありません！）。

── 最後に、防衛分野に興味があるエンジニアへ、メッセージをお願いします。

Software Engineerが活躍する場として、広告やレコメンデーションに代表される市場規模の大きい営利活動は社会貢献の面でも非常に重要ですが、防衛に携わることで得られる充実感は一味違うと実感しています。

防衛領域の事前知識を持つエンジニアは少ないと思いますが、チーム内には防衛省、外務省出身のエキスパートも在籍しています。私自身、ソフトウェアと関連の高いサイバー領域や認知戦を多少聞きかじった程度で、陸・海・空の防衛についてはほぼ知識がない状態で飛び込みましたが、皆さんの力を借りてキャッチアップしながらSoftware Engineerとして貢献しています。

Sakana AIの防衛への取り組みはこれから大きな拡大が期待される領域で、まさに今のタイミングが意思決定に広く深く影響を及ぼせる、最も面白い時期だと思っています。まずは話を聞いてみるカジュアル面談からでも、この領域に興味関心がある方のご応募をお待ちしております！

採用情報

「自分たちの成果物の一つ一つが、自衛官の生命に関わる」——インタビュー中のこの言葉が、非常に印象的でした。それは決して単なる重圧ではなく、エンジニアとしての深い充実感の源として語られていました。

自分の書いたコードが、国の安全や意思決定の速度に直結する。この圧倒的なスケールと社会的インパクトこそが、防衛分野における開発の醍醐味と言えるのかもしれません。

Sakana AIでは、多様なバックグラウンドを持つメンバーが、最先端AI技術の社会実装に挑んでいます。この大きな使命に共感し、共に未来を創っていただける方を心よりお待ちしています。

Software Engineer 採用情報

Sparser, Faster, Lighter Transformer Language Models

2026-05-09T00:00:00+09:00

How do we make LLMs faster and lighter? Don’t force the GPU to adapt to sparsity. Reshape the sparsity to fit the GPU! ⚡️

Excited to share our new #ICML2026 paper in collaboration with NVIDIA: “Sparser, Faster, Lighter Transformer Language Models”. This work introduces new open-source GPU kernels and data formats for faster inference and training of sparse transformer language models:

Paper: https://arxiv.org/abs/2603.23198
Technical Blog: https://pub.sakana.ai/sparser-faster-llms
Code: https://github.com/SakanaAI/sparser-faster-llms

The human brain is incredibly efficient because it only activates the specific neurons needed for a thought. Modern LLMs naturally try to do this too (> 95% of neurons in feedforward layers stay silent for any given word), but our hardware punishes them for it.

One of the most frustrating paradoxes in deep learning: making a model do less math often makes it run slower. Why? Because unstructured sparsity introduces irregular memory access, and GPUs are built for predictable, dense blocks of math.

We teamed up with NVIDIA to try to fix this hardware mismatch. Instead of forcing the GPU to adapt to the sparsity, we built a “Hybrid” format that reshapes the sparsity to fit the GPU. Our sparsity format (TwELL) dynamically routes the 99% of highly sparse tokens through a fast path, and uses a dense backup matrix as a safety valve for the rare, heavy tokens.

Our contribution is twofold:

We introduce TwELL (Tile-wise ELLPACK), a new sparse packing format designed to integrate directly in the same optimized tiled matmul kernels without disrupting execution.
We develop custom CUDA kernels that fuse multiple sparse matmuls to maximize throughput and compress TwELL to a hybrid representation that minimizes activation sizes.

We used our kernels to train and benchmark sparse LLMs at billion-parameter scales, demonstrating >20% speedups and even higher savings in peak memory and energy.

This work will be presented at ICML 2026. Please check out our blog and technical paper for a deep dive!

Sakana AI、SMBCグループと共同で複数AIエージェントを活用する「提案書自動生成アプリケーション」を開発

2026-04-30T00:00:00+09:00

Sakana AIは、株式会社三井住友フィナンシャルグループ（以下「SMBCグループ」）と連携し、ホールセールビジネスの高度化を目的とした「提案書自動生成アプリケーション」を開発しました。

本アプリケーションは、株式会社三井住友銀行（以下「三井住友銀行」）において実務への適用を開始します。

背景と目的

Sakana AIとSMBCグループは、2025年5月のパートナーシップ契約締結以来、最先端のAI技術を用いた業務変革について検討を重ねてきました。その第一号案件として、三井住友銀行のホールセールビジネスにおける提案プロセスを抜本的に進化させるべく、本アプリケーションを導入します。

複雑化する顧客企業の経営課題に対し、銀行員がより迅速かつ高度な専門性を持って応えるため、資料作成業務の自動化にとどまらず、AIによる戦略的な思考支援（仮説構築や多角的な分析）を実現します。

「提案書自動生成アプリケーション」の特徴

本アプリケーションは、Sakana AIが強みとする高度なAI技術を、銀行実務の複雑なワークフローに深く組み込んだものです。

自律的に連携する「複数AIエージェント」の活用

情報収集、分析、仮説構築、ストーリー策定、そして品質評価やファクトチェックに至るまで、役割の異なる複数のAIエージェントが相互に連携します。最適なワークフローをAIが自律的に構築・実行することで、一貫性のある高度な提案内容を安定的に創出します。

専門的な提案コンテンツの高度化

対象企業の財務・非財務情報をAIが深く分析し、単なるドラフト作成を超えて、人間では見落としがちな新たな視点や客観的な論点を提示します。これにより、行員はお客さまの本質的な課題解決に注力することが可能になります。

今後の展望

Sakana AIは、今回の三井住友銀行での利用開始を皮切りに、SMBCグループ内の他の業務領域においてもAIエージェント技術の活用を順次拡大していきます。

今後も、日本独自のニーズに応じた革新的なAIソリューションを提供し、金融をはじめとする基幹産業の高度化に向けて真摯に取り組んでいきます。

本日の日本経済新聞にも、今回の取り組みが掲載されています。

https://www.nikkei.com/article/DGXZQOUB2713R0X20C26A4000000/

これまで１〜２週間かかっていた大企業向け提案書作成業務を、数十分から数時間に短縮する見通しです。当社のAIエージェントが自律的に膨大なデータを調査、分析し、お客様のより高度な戦略構築を支援します。

Sakana AI

日本でのAIの未来を、Sakana AIと一緒に切り拓いてくださる方を募集しています。当社の募集要項をご覧ください。

KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI

2026-04-29T00:00:00+09:00

（＊日本語は英文の後に）

Can a speech AI think deeply without pausing to process?

We’re excited to introduce KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI, accepted at ICASSP2026! 🐢

In real conversation, we don’t wait until we’ve fully worked out what we want to say—we start talking, and our thoughts catch up as the sentence unfolds.

Fast speech-to-speech models achieve this, but their reasoning tends to stay shallow. Cascaded pipelines that route through a knowledgeable LLM are smarter, but the added latency breaks the flow—they fall back to “think, then speak.”

In our new paper, we propose a way to break this trade-off. We call it KAME (Turtle in Japanese).

A speech-to-speech model handles the fast response loop and starts replying immediately. In parallel, a backend LLM runs asynchronously, generating response candidates that are continuously injected as “oracle” signals in real time.

This shifts the AI paradigm from “think, then speak” to “speak while thinking.”

The backend LLM is completely swappable. You can plug in GPT-4.1, Claude Opus, or Gemini 2.5 Flash depending on the task without changing the frontend. In our experiments, Claude tended to score higher on reasoning, while GPT did better on humanities questions.

Blog: https://pub.sakana.ai/kame/
Paper: https://arxiv.org/abs/2510.02327
Model: https://huggingface.co/SakanaAI/kame

Japanese

音声AIの素早さと賢さを両立できるか？

私たち人間は会話の中で、言いたいことを全部まとめてから話し始めるのではなく、話しながら考えを整理していきます。応答の速い Speech-to-Speech モデルは、この「話しながら考える」を実現しましたが、そのぶん思考が浅くなりがちです。かといって知識豊富な LLM を挟むカスケード型では、遅延が生じるため「話しながら」が成立しません。

そこで Sakana AI は、このトレードオフを克服するKAMEモデルを開発しました。Speech-to-Speech モデルが高速な応答ループを担当し、即座に話し始めます。その裏でバックエンドの LLM が非同期に推論を進めて応答候補を生成し、それをオラクル信号としてリアルタイムに注入します。これにより「考えてから話す」ではなく「話しながら考える」ことが可能になります。

バックエンドの LLM は差し替えが可能で、タスクに応じてGPT-4.1、Claude Opus、Gemini 2.5 Flashなどを使い分けられます。フロントエンド側の変更は必要ありません。私たちの実験では、Claudeは推論系のタスクで、GPTは人文系のタスクで、それぞれ高いスコアを出す傾向が見られました。

本研究は ICASSP2026 で発表されます。ぜひ、お試しください。

ブログ: https://pub.sakana.ai/kame/
論文: https://arxiv.org/abs/2510.02327
モデル: https://huggingface.co/SakanaAI/kame

Learning to Orchestrate Agents in Natural Language with the Conductor

2026-04-27T00:00:00+09:00

TL;DR

For the past few years, humans have been doing “prompt engineering” to coax the best performance out of different LLMs. In this work, we explored what happens if we train an AI to do that job instead.

In this work, by training a Conductor model with RL, we found that it naturally learns to write highly effective, custom instructions for a whole pool of other models. It essentially learns to ‘manage’ them in natural language.

What surprised us most was how it dynamically adapts. For simple factual questions, it just queries one model. But for hard coding problems, it autonomously spins up a whole pipeline of planners, coders, and verifiers.

Summary

Introducing our new work: “Learning to Orchestrate Agents in Natural Language with the Conductor” accepted at ICLR2026

Paper: https://arxiv.org/abs/2512.04388
OpenReview: https://openreview.net/forum?id=U23A2BUKYt

What if we trained an AI not to solve problems directly, but to act as a manager that delegates tasks to a diverse team of other AIs?

To solve complex tasks, humans rarely work alone; we form teams, delegate, and communicate. Yet, multi-agent AI systems currently rely heavily on rigid, human-designed workflows or simple routers that just pick a single model. We wanted an AI that could dynamically build its own team.

We trained a 7B Conductor model using Reinforcement Learning to orchestrate a pool of frontier models (including GPT-5, Gemini, Claude, and open-source models available during the period leading up to ICLR2026).

Instead of executing code, the Conductor outputs a collaborative workflow in natural language. For any given question, the Conductor specifies:

Which agent to call
What specific subtask to give them (acting as an expert prompt engineer)
What previous messages they can see in their context window

Through pure end-to-end reward maximization, amazing behaviors emerged. The Conductor learned to adapt to task difficulty: it 1-shots simple factual questions, but autonomously spins up complex planner-executor-verifier pipelines for hard coding problems.

The results are very promising: The 7B Conductor surpasses the performance of every individual worker model in its pool, setting new records on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%) at the time of publication. It also significantly outperforms expensive multi-agent baselines like Mixture-of-Agents at a fraction of the cost.

One of our favorite features: Recursive Test-Time Scaling! By allowing the Conductor to select itself as a worker, it reads its own team’s prior output, realizes if it failed, and spins up a corrective workflow on the fly. This opens a new axis for scaling compute during inference.

This research proves that language models can become elite meta-prompt engineers, dynamically harnessing collective intelligence.

Alongside our TRINITY research which we announced a few days earlier, this foundational research powers our new multi-agent system: Sakana Fugu! 🐡

Trinity: An Evolved LLM Coordinator

2026-04-26T00:00:00+09:00

What if instead of building one giant AI, we evolved a coordinator to orchestrate a diverse team of specialized AIs? 🐟

Excited to share our new paper: “TRINITY: An Evolved LLM Coordinator”, published as a conference paper at ICLR2026!

Paper: https://arxiv.org/abs/2512.04695
OpenReview: https://openreview.net/forum?id=5HaRjXai12

In nature, complex problems are rarely solved by a single monolithic entity, but rather by the coordinated efforts of specialized individuals working together. Yet, modern AI development is heavily focused on endlessly scaling up single, massive monolithic models, yielding diminishing returns. While model merging offers a way to combine different skills, it is often impractical due to mismatched neural architectures and the closed-source nature of top-performing models.

To address this, we took a macro-level approach: test-time model composition. We introduce TRINITY, a system that fuses the complementary strengths of diverse, state-of-the-art models without needing to modify their underlying weights.

TRINITY processes queries over multiple turns. At each step, a lightweight coordinator assigns one of three distinct roles to an LLM from its available pool:

Thinker: Devises high-level strategies and analyzes the current state.
Worker: Executes concrete problem-solving steps.
Verifier: Evaluates if the current solution is complete and correct.

By dynamically assigning these roles, the coordinator effectively offloads complex reasoning and skill execution onto the external models.

What makes TRINITY unique is its extreme efficiency. The coordinator relies on the hidden states of a compact language model and a small routing head. In total, it has fewer than 20K learnable parameters.

Training this system presented a massive challenge. Traditional Reinforcement Learning (REINFORCE) failed because the gradients had a low signal-to-noise ratio due to binary rewards and weak parameter coupling. Imitation learning (Supervised Fine-Tuning) was ruled out because generating multi-turn labels is prohibitively expensive.

Our solution? We turned to nature-inspired algorithms. We optimized the coordinator using a derivative-free evolutionary algorithm. We found that evolution is uniquely suited to optimize this tight, high-dimensional coordination problem where traditional gradient-based methods fail.

The results are very promising. In our experiments, TRINITY consistently outperforms existing multi-agent methods and individual models across various benchmarks. At the time of publication, it set a new state-of-the-art record on LiveCodeBench, achieving an 86.2% pass@1 score.

More importantly, it demonstrated incredible generalization. Without any retraining, TRINITY transferred zero-shot to four unseen tasks (AIME, BigCodeBench, MT-Bench, and GPQA). On average, the evolved coordinator surpassed every individual constituent model in its pool, including GPT-5, Gemini 2.5-Pro, and Claude-4-Sonnet (the top frontier models available at the time of our ICLR2026 submission last year).

This work is central to Sakana AI’s vision. We believe the future of AI isn’t just about scaling monolithic models, but engineering collaborative, diverse AI ecosystems that can adapt and combine their strengths.

We invite the community to read the paper and explore these ideas!

This foundational research is part of the core engine powering our multi-agent product: Sakana Fugu 🐡

Sakana Fugu: A Multi-Agent Orchestration System as a Foundation Model

2026-04-24T00:00:00+09:00

（＊日本語は英文の後に）

We are excited to introduce Sakana Fugu, our flagship international commercial AI product—a multi-agent orchestration system, now opening applications for early beta testers. Sakana Fugu coordinates pools of frontier foundation models to achieve state-of-the-art performance across coding, mathematics, scientific reasoning, etc.

Initially, our Sakana Fugu model will be available as an API, where it has served as a key internal tool for our own researchers and engineers, and we are now ready to invite people outside Sakana AI to try it:

👉 Apply for Beta Test

Sakana Fugu Model, which is a small language model itself, learns to call LLMs (left). In the course of training, it can learn to call itself, enabling Test-time scaling (right). The actual coordination in Sakana Fugu is adaptive and complex.¹

Pushing the Boundaries by Collective Intelligence

A core conviction at Sakana AI is that the most capable AI systems will not be monolithic models scaled in isolation, but collections of specialized agents working together. This thread runs through everything we have built: evolutionary model merging, which showed that diverse open-source models can be combined to produce capabilities none possessed individually; The AI Scientist, which demonstrated that coordinated AI agents can autonomously execute the full cycle of scientific research; ShinkaEvolve, which uses evolutionary search over a pool of LLM-generated programs to discover algorithms that outperform human-written solutions; and AB-MCTS, which showed that multiple frontier models cooperating through tree search can substantially outperform any individual model on hard reasoning tasks.

Sakana Fugu is the product form of this research direction.

Sakana Fugu 🐡

Conventional approaches to utilizing foundation models often require users to manage multiple API keys, as models from different providers tend to specialize in distinct areas. This multi-model management leads to economic inefficiency. Moreover, since model strengths are frequently problem-specific rather than broad area-specific, fine-grained optimization through model switching is difficult for end-users.

Sakana AI’s Fugu models resolve these limitations. Fugu models achieve superior performance by dynamically coordinating and orchestrating a diverse pool of powerful models. Instead of using domain knowledge to prescribe team organization, roles, or workflows, Fugu learns to dynamically assemble agents from a pool and coordinate them through non-obvious but highly efficient collaboration patterns.

Sakana Fugu models are based on our ICLR 2026 papers (Trinity and Conductor), and we have substantially further improved the methods to increase the performance and user experience, to be offered as a commercial product.

Task	Gemini 3.1 (high)	GPT 5.4 (high)	Opus 4.6 (max)	fugu-mini 🐟	fugu-ultra 🐡
GPQAD	94.4	90.9	92.7	92.4	95.1
LCBv6	90.3	92.1	92.4	90.4	93.2
SWEPro	48.4	51.2	53.4 ²	51.3	54.2

This adaptive, dynamic orchestration grants Fugu models superior performance on established benchmarks. The above table is a subset of our current results for our models in beta.

Using Sakana Fugu

Sakana Fugu is accessible via APIs, with compatibility for standard OpenAI-format endpoints. If you are already using GPT, Gemini, or Claude via API, Sakana Fugu can be integrated into existing workflows with minimal changes. Behind that familiar interface, Sakana Fugu handles coordination across the model pool automatically — establishing the collaboration topology, assigning the roles and dispatching the subtasks to complete complex tasks.

Two variants are available: Sakana Fugu Mini 🐟, optimized with latency in mind, and Sakana Fugu Ultra 🐡, the full orchestration system, optimized for performance for demanding tasks.

Join the Beta

We are looking for researchers and engineers from all areas to join as early testers. We want to understand how Sakana Fugu performs across domains we have not yet tested internally, where it falls short, and what researchers and engineers most need from a system like this.

If you are using foundation model APIs in coding assistants like OpenCode and Codex, or in your engineering, business-specific projects where you would like to see if Fugu models bring performance or novelty advantages, we would love to have you involved.

👉 Apply to Join the Beta

Publications

Xu, Sun, Schwendeman, Nielsen, Cetin, Tang. TRINITY: An Evolved LLM Coordinator. ICLR 2026.

https://arxiv.org/abs/2512.04695

Nielsen, Cetin, Schwendeman, Sun, Xu, Tang. Learning to Orchestrate Agents in Natural Language with the Conductor. ICLR 2026.

https://arxiv.org/abs/2512.04388

Japanese

マルチエージェント・オーケストレーションシステム「Sakana Fugu」βテスト開始

Sakana AIは、新たな商用AIプロダクトとして「Sakana Fugu（サカナ・フグ）」を開発しました。Sakana Fuguは、複数のフロンティア基盤モデルを協調させることで、コーディング、数学、科学的推論といった幅広い領域で高い性能を引き出すマルチエージェント・オーケストレーションシステムです。Sakana Fuguは、当初はAPIとして提供されます。これまで社内の研究者やエンジニアの主要なツールとして活用してきましたが、この度、社外の方々にもお使いいただけるよう、βテストを開始します。

👉 βテストに申し込む

Sakana Fuguはそれ自体が小規模なモデルであり、LLMを呼び出すことを学習します（左）。学習の過程で自分自身を呼び出すことも習得でき、これにより推論時スケーリングが実現します（右）。なお、図では説明のためにシングルステップのルーティングとして示していますが、実際のSakana Fuguが実現するオーケストレーションはより適応的かつ複雑です。³

集合知により、AIの限界を押し広げる

Sakana AIでは、AIの可能性を最大限活かすには、一つの大きなモデルではなく、役割の異なる複数のエージェントが協力し合うことが最も有望な方法だと考え、研究開発を進めてきました。

「進化的モデルマージ」では、多様なオープンソースモデルを組み合わせることで、どの単独モデルも持っていなかった能力を引き出せることを示しました。「AIサイエンティスト」では、複数のAIエージェントが協調することで、科学研究のプロセス全体を自律的に進められることを実証しました。「ShinkaEvolve」では、LLMが生成したプログラムに対して進化的な探索を行うことで、人間が書いたものよりも優れたアルゴリズムを発見できることを示しました。そして「AB-MCTS」では、複数のフロンティアモデルが木探索を通じて協力することで、単独のモデルを大きく上回る性能を発揮できることを明らかにしました。

Sakana Fuguは、こうした研究の方向性をひとつのプロダクトとして形にしたものです。

Sakana Fuguとは

これまで、複数の基盤モデルを活用する際には、複数のAPIキーを使い分ける必要がありました。モデルによって得意分野が違うため、タスクごとに最適なモデルを選ぶ必要があるからです。しかしこの運用は、コスト面でも効率面でも負担が大きく、さらにモデルの強みは領域単位ではなく問題ごとに異なることも多いため、ユーザー側で細かく最適化するのは容易ではありません。

こうした課題を解決すべく、Sakana Fuguを開発しました。Sakana Fuguは、どのモデルをどう組み合わせて使うかを固定のルールで決めるのではなく、問題に応じて最適なエージェントの組み合わせと協調の仕方を、モデルのプールの中から動的に選び出します。しかも、人間のドメイン知識では思いつきにくいような効率的な協調方法を、自律的に学習していくのが特徴です。Sakana Fuguのモデルは、私たちのICLR 2026採択論文（Trinity およびConductor）をベースとしており、さらなる性能向上とユーザー体験の向上に向けて手法を改良しています。

こうした適応的なオーケストレーションによって、Sakana Fuguは既存のベンチマーク上でも高い性能を発揮します。以下は結果の一部です。

タスク	Gemini 3.1 (high)	GPT 5.4 (high)	Opus 4.6 (max)	fugu-mini 🐟	fugu-ultra 🐡
GPQAD	94.4	90.9	92.7	92.4	95.1
LCBv6	90.3	92.1	92.4	90.4	93.2
SWEPro	48.4	51.2	53.4 ＊	51.3	54.2

各ベンチマークタスクごとのスコア：＊はAnthropic独自の検証用フレームワークを使用した自己申告スコア。SWEPro の評価には mini-swe-agent のスキャフォールドを使用。Anthropic が公表している Opus の最大思考モードのスコアについては、当社での評価試行中に頻繁にタイムアウトが発生したため、Anthropic 公式の報告値を採用。

Sakana Fuguの使い方

Sakana FuguはAPIで利用できます。OpenAI形式のエンドポイントとの互換性があり、いまGPT、Gemini、ClaudeなどのAPIをお使いの方は、既存のワークフローをほとんど変えずにそのまま導入いただけます。いつものインターフェースの背後で、Sakana Fuguがモデル間の協調の組み立て、役割の割り当て、サブタスクの振り分けまでを自動で行います。

ラインナップは2種類を予定しています。レイテンシを重視した「Sakana Fugu Mini 🐟」と、フルのモデルプールを活用する「Sakana Fugu Ultra 🐡」です。深い推論を求めるタスクにはUltraが適しています。

βテスター募集

今回のβテストでは、さまざまな分野の研究者・エンジニアの方にご参加いただきたいと考えています。社内ではまだ試せていない領域でSakana Fuguがどのような性能を発揮するのか、どこに課題があるのか、そしてこうしたシステムに対して現場でどのようなニーズがあるのかを、皆さまと一緒に見つけていくことが目的です。

OpenCodeやCodexといったコーディングアシスタントで基盤モデルのAPIを活用されている方、あるいはご自身のエンジニアリング業務やビジネス領域のプロジェクトで、Sakana Fuguが性能や可能性の面で新しい選択肢になりうるかを試してみたい方は、ぜひご応募ください。

👉 βテストに申し込む

Footnotes

One result from our papers points to an interesting future direction. When a Fugu model is allowed to call itself recursively, reading its own prior output as context and deciding whether to revise its coordination strategy, a new form of test-time scaling emerges. The model recognizes when its first attempt fell short and spins up a corrective workflow. The depth of recursion becomes a tunable compute axis at inference time, requiring no retraining. A small model, by reading itself, can iterate toward answers that neither it nor any of its workers could reach in a single pass. ↩
Self-reported score with custom Anthropic scaffold. SWEPro were evaluated with the mini-swe-agent scaffold. However, we use the scores reported by Anthropic for Opus with the max thinking efforts due to frequent timeouts during our evaluation trials. ↩
自己呼び出しがもたらす、新しい推論時スケーリング　Sakana Fuguが自分自身の出力を入力として読み込み、協調のしかたを見直しながら再帰的に自分を呼び出せるようにしたところ、新しいタイプの推論時スケーリングが現れることがわかりました。モデル自身が「最初の答えでは不十分だった」と気づき、修正のためのワークフローを自ら立ち上げるのです。再帰の深さは推論時に調整でき、再学習は必要ありません。小さなモデルであっても、自分自身の出力を読み返すことによって、1回の推論では到達できなかった答えへとたどり着けるようになります。 ↩

String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation

2026-04-21T00:00:00+09:00

（＊日本語は英文の後に）

Can LLMs flip coins in their heads?

When prompted to “Flip a fair coin” 100 times, the heads to tails ratio drifts far from 50:50. LLMs can understand what the target probability should be, but generating outputs that faithfully follow a given distribution is a separate problem.

This bias extends beyond coin flips. When LLMs are asked to generate multiple story ideas or brainstorm solutions, the outputs tend to cluster around a narrow range. The same probabilistic skew that distorts coin flips limits diversity in creative generation, recommendations, and other tasks where varied outputs are needed.

We discovered a prompting technique named String Seed of Thought (SSoT). The method is simple: instruct the LLM to generate a random string in its own output, then manipulate that string to derive its answer. It requires only a small addition to the prompt and no external random number generator.

SSoT significantly reduces output bias across a wide range of LLMs, both open and closed. With reasoning models (such as DeepSeek-R1), it reaches accuracy close to that of actual random sampling. The method generalizes from binary choices to n-way selections and arbitrary probability distributions. On the NoveltyBench diversity benchmark, SSoT outperformed other approaches across all six categories while maintaining output quality.

This work will be presented at ICLR2026!

Technical Blog: https://pub.sakana.ai/ssot
Paper: https://arxiv.org/abs/2510.21150
Openreview: https://openreview.net/forum?id=luXtb

Japanese

LLMは頭の中でコイントスができるか？

一見簡単そうで奥深いこの問題を「プロンプトだけ」で解決した論文 “SSoT: Prompting LLMs for Distribution-Faithful and Diverse Generation” が ICLR2026 に採択されました。

LLMに「コイントスをして」と100回プロンプトすると、出力の表と裏の比率は50:50から大きく離れてしまいます。明示的に確率の指示が与えられても、LLMがそれに忠実に従って出力を生成することは難しい問題です。

このことは、コイントスに留まりません。LLMに小説のアイデアを何本か出してもらったら似たような案ばかり出てきた、という経験はないでしょうか。コイントスを歪ませるのと同じ確率的な偏りが、創作やブレインストーミングなど多様な出力が求められるタスク全般で多様性を抑制しています。

私たちはこれらの問題の解決策として、String Seed of Thought (SSoT)というプロンプトを発見しました。SSoTは、LLMに頭の中で一旦ランダムな文字列を考えさせ、その文字列を操作させて結果を出力させるという非常にシンプルな手法です。外部の乱数生成器は一切使いません。

SSoTにより出力のバイアスはオープンモデルでもクローズドなモデルでも幅広いLLMで低減されます。一部のreasoningモデルでは、実際に乱数を使った場合とほぼ変わらない精度を達成しました。これは、2択の選択肢だけでなく一般の離散分布について有効です。

さらに重要なのは、SSoTはモデル出力の多様性を高めるのに使えることです。創作的な文書作成などにおいて、SSoTをプロンプトに加えるだけで、出力される文書などの多様性が高まることがわかりました。

本手法はコンテンツ生成やアイディア出し、推論時スケーリングの新手法の開発など、LLMを実世界のシステムに組み込んでいく上で重要な基盤になると考えています。

SSoTのメカニズム、理論的な解析、インタラクティブなデモについてはブログと論文をご覧ください。

ブログ： https://pub.sakana.ai/ssot
論文： https://arxiv.org/abs/2510.21150
OpenReview： https://openreview.net/forum?id=luXtb

Digital Ecosystems: Interactive Multi-Agent Neural Cellular Automata

2026-04-19T00:00:00+09:00

What happens when you put competing neural networks in a Petri Dish and start changing the rules while they adapt?

Last year we released Petri Dish NCA, where neural nets are the organisms that learn during simulation. Today we’re releasing Digital Ecosystems: a browser-based platform for interactive artificial life research.

The setup: several small CNNs share a 2D grid, each seeing only a 3x3 neighborhood. No global plan. They compete for territory by attacking neighbours and defending against incoming attacks, learning via gradient descent online while the simulation runs.

What we didn’t expect was the role of the learning itself. Gradient descent isn’t just optimising each species’ strategy. Instead, it acts to stabilize the whole system during simulation. Species that overextend get pushed back by the loss. Species that stagnate get nudged to grow. This means you can push parameters toward edge-of-chaos regimes: a zone characterised by emergent complexity. Letting the neural networks learn acts to hold the complex system together while you explore and interact.

The platform lets you steer all of this interactively. You can draw walls to create niches, erase parts of the system online, and tune 40+ system parameters to explore the most interesting configurations. We find it mesmerizing to watch species carve out territories and reorganise when you perturb them.

Everything runs client-side in your browser, no install needed.

Technical Blog: http://pub.sakana.ai/digital-ecosystem
Code: https://github.com/SakanaAI/digital-ecosystem

Sakana AI、総務省事業においてSNS空間の可視化と偽・誤情報対策を行う独自技術を開発

2026-04-07T00:00:00+09:00

Sakana AIは、技術開発主体として採択されている総務省事業「インターネット上の偽・誤情報等への対策技術の開発・実証事業（令和7年度）」において、SNS空間の可視化、総合的な偽情報判定、対策案の立案までを支援するシステム開発を完了しました。

Sakana AIがインテリジェンス領域に取り組む背景

社会においてSNS上を中心とする偽・誤情報の流通への対応が急務となっており、安全保障をめぐる議論においても「情報力」が重要な柱と位置付けられています。こうした状況下では、AIを活用し、偽・誤情報への早期対応・判断を行うことが重要となります。そして、安全保障にも深く関わるインテリジェンス領域の活動において、基盤となる最先端の技術を国内で保有することは不可欠です。

Sakana AIはこうした背景も踏まえ、日本発のスタートアップとして、インテリジェンス領域でのAIの社会実装に向けた取り組みを進めています。本事業ではその一環として、膨大な偽・誤情報の可視化・判定・対策を担う技術開発に取り組みました。

「インターネット上の偽・誤情報等への対策技術の開発・実証事業（令和7年度）」での成果

現代の情報環境の中で偽・誤情報対策を推進する際には、大きく以下3点の課題が存在しています。

SNS上の膨大な情報: 限られたリソースの中で、日々大量に生まれる情報の中から、真偽を確かめるべき対象を効率的に見つけ出すのは容易ではありません。
偽情報の複雑化・巧妙化: AI技術の進展により、偽情報はますます精巧になり、検証作業自体にも大きな時間とコストがかかるようになっています。
対策検討の難しさ: 偽・誤情報を発見した後、カウンター発信などの対策が、正しい情報を広める上でどれだけ有効かを事前に予測することは困難です。

Sakana AIはこれらを解決するため、それぞれの課題に対応した以下3種類の技術開発を行いました。

SNS空間の可視化: ナラティブの深層を捉えるノベルティサーチ技術

膨大な情報が溢れる環境においては、「インプレッション数の多寡」などの単一の指標だけでは、言論空間の全体像を正しく把握することはできません。そこで、社会的に波及力を持つ具体的な「ナラティブ（論調）」を単位とした分析を行う技術として、ノベルティサーチ技術の開発と、それを踏まえた自律的な分析レポートの生成機能を開発・実装しました。これにより、有機的な言論空間を高解像度かつ網羅的に捕捉することを可能としています。

〈開発に取り組んだ技術〉

ノベルティサーチ技術の開発: 単なるタグ付けではなく、社会に波及力を持つ具体的なナラティブ（論調）を抽出するため、AIエージェントが自律的に重要情報を探索するノベルティサーチ技術を開発しました。本技術は、X（旧Twitter）の投稿データを再帰的にサンプリングすることで、新規性の高いナラティブを効率的に特定・抽出することを可能にします。
階層ナラティブツリーによる可視化: 抽出したナラティブを、AIを用いて階層的に整理し可視化することで、SNS空間の直感的な理解を可能にします。

総合的な偽・誤情報判定: 多様なAIモデルの組み合わせによる多角的な検証

続いて、生成AIの進化により巧妙化する偽情報を見極めるための技術開発にも取り組みました。本事業でSakana AIは、複数の検知器を組み合わせた多面的なアプローチにより、偽情報を高い精度で検知するシステムを実装しました。また、その判定根拠を明示する仕組みとすることで、AIの判断プロセスの透明性を確保し、実務者による検証も容易にしています。

〈開発に取り組んだ技術〉

画像・動画の生成・加工検知: AIによって、画像・動画の生成・加工跡を検知するシステムを開発しました。検知器を構成するAIモデルには、大局的な構造に注目するフロンティア基盤モデルと微細な構造に注目する弊社独自モデルを併用することで、モデルごとの検知精度の偏りや「死角」を相互補完し、安定した検知を可能にしました。
画像・動画のすりかえ検知: 画像や動画を逆画像検索し、イベントの時間・場所・背景等を照合することで、本物の画像に嘘のキャプションをつけてミスリードを誘う投稿を検知する機能を実装しました。
画像・動画・投稿文の自動ファクトチェック: AIエージェントが投稿から検証可能な主張（クレーム）を抽出し、WebやSNSを自律的に検索して妥当性を検証するシステムを構築しました。人間の専門家が行うような、仮説検証の繰り返しによる調査プロセスを模倣して多角的に検証を行い、判定に至る論理構成や、裏付けとなる複数の情報ソースを明示的に提示します。
ユーザー分析・反応分析: 検証対象となる投稿の周辺情報として、①投稿者の過去投稿やプロフィールの要約・整理、②反応内容の要約・整理、③反応者の挙動確認、といった機能を実現するAIツールを開発しました。

対策案の立案: ABM（Agent Based Modeling）を活用したSNS空間のシミュレーション

偽・誤情報を検知するだけでなく、その拡散を抑制・鎮静化させる「カウンター発信」の有効性を事前に検証するためのシミュレーション基盤を開発しました。

〈開発に取り組んだ技術〉

SNS空間のシミュレーション: 弊社独自のABM（Agent Based Modeling）標準化フレームワーク「Shachi」を活用し、SNS空間を精緻に再現するシステムを開発しました。これにより、偽・誤情報に対するカウンター発信がどの程度有効に機能するかを検証できます。
高解像度なペルソナの自動生成: 実際の投稿データに基づき、特定の論調を背景に持つ多様な仮想的なSNSアカウント（ペルソナ）を構築しました。ここでは、抽出された各ナラティブを体現させるため、階層的ナラティブツリーの手法を応用しています。これにより、カウンター発信が「どの層に」「どのような心理的変容を与えたか」というミクロな視点での検証を可能としました。

「インターネット上の偽・誤情報等への対策技術の開発・実証事業成果発信イベント」への参加

2026年3月16日には、Sakana AIのチームが総務省「インターネット上の偽・誤情報等への対策技術の開発・実証事業成果発信イベント」に参加しました。

会場では、関連事業者やメディアの皆様、中央省庁の皆様等にお越しいただき、本事業成果について紹介・意見交換を実施させていただきました。

今後もSakana AIは、インテリジェンス領域でのAIの社会実装に貢献していきます。

ライトニングトークにて、Applied Research Engineerの沢田恭兵が事業成果を紹介しました
Applied Research Engineerの沢田、アルチョム、Project Managerの佐和らがイベントに参加し、事業成果についてご紹介しました

Sakana AI

日本でのAIの未来を、Sakana AIと一緒に切り拓いてくださる方を募集しています。当社の募集要項をご覧ください。

Blog

防衛分野における開発の最前線：Sakana AI、Software Engineerインタビュー

インタビューイー

なぜSakana AIへ？ これまでのキャリア

防衛分野のSoftware Engineer

採用情報

Sparser, Faster, Lighter Transformer Language Models

Sakana AI、SMBCグループと共同で複数AIエージェントを活用する「提案書自動生成アプリケーション」を開発

背景と目的

「提案書自動生成アプリケーション」の特徴

今後の展望

Sakana AI

KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI

Can a speech AI think deeply without pausing to process?

音声AIの素早さと賢さを両立できるか？

Learning to Orchestrate Agents in Natural Language with the Conductor

TL;DR

Summary

Trinity: An Evolved LLM Coordinator

Sakana Fugu: A Multi-Agent Orchestration System as a Foundation Model

Pushing the Boundaries by Collective Intelligence

Sakana Fugu 🐡

Using Sakana Fugu

Join the Beta

Publications

マルチエージェント・オーケストレーションシステム「Sakana Fugu」βテスト開始

集合知により、AIの限界を押し広げる

Sakana Fuguとは

Sakana Fuguの使い方

βテスター募集

関連論文

Footnotes

String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation

Can LLMs flip coins in their heads?

LLMは頭の中でコイントスができるか？

Digital Ecosystems: Interactive Multi-Agent Neural Cellular Automata

What happens when you put competing neural networks in a Petri Dish and start changing the rules while they adapt?

Sakana AI、総務省事業においてSNS空間の可視化と偽・誤情報対策を行う独自技術を開発

Sakana AIがインテリジェンス領域に取り組む背景

「インターネット上の偽・誤情報等への対策技術の開発・実証事業（令和7年度）」での成果

SNS空間の可視化: ナラティブの深層を捉えるノベルティサーチ技術

総合的な偽・誤情報判定: 多様なAIモデルの組み合わせによる多角的な検証

対策案の立案: ABM（Agent Based Modeling）を活用したSNS空間のシミュレーション

「インターネット上の偽・誤情報等への対策技術の開発・実証事業成果発信イベント」への参加

Sakana AI

なぜSakana AIへ？これまでのキャリア