What Is Sakana AI's "Fugu"? The Japanese AI That Bundles Other AIs

Top/Articles/What Is Sakana AI's "Fugu"? The Japanese AI That Bundles Other AIs

News Updated today

Makoto Horikawa

Backend Engineer / AWS / Django

2026.06.2314 min2 views

Share X Hatena LINE LinkedIn RSS

Key takeaways

Japan's Sakana AI launched Fugu and Fugu Ultra, an AI that bundles and routes between multiple models. What it is, and how it differs from Claude and ChatGPT, explained for non-experts.

On June 22, 2026, Tokyo-based AI company Sakana AI publicly released a new AI service called "Sakana Fugu" and its top-tier version "Fugu Ultra." Its defining feature is the idea of automatically combining and routing between multiple AIs, rather than entrusting everything to a single clever AI. As a food analogy, it is less like a private chef who cooks everything personally, and more like a concierge who routes each order to the kitchen that does it best.

Sakana AI claims that the higher-end Fugu Ultra rivals the performance of leading overseas models "Fable" and "Mythos" (both top-tier AIs from Anthropic, the maker of Claude). It also pitches the fact that it carries far less risk of suddenly being shut down by a particular country's export controls. That is not just marketing flourish: those very "strongest AIs" vanished worldwide on US government orders only days earlier.

This article explains, to the point where you can explain it to others, what kind of company Sakana AI is, what is new about Fugu, and the question people ask most often: how does it differ from Claude and ChatGPT (Codex)? Read it once and you will see where Sakana AI stands on today's generative-AI map.

What is Sakana AI? A Tokyo lab the world is watching

Sakana AI is an AI company founded in Tokyo in 2023. The name "Sakana" (Japanese for "fish") reflects the idea that small fish moving as a school can produce great power. Rather than building one giant AI, the company's philosophy is to combine small, clever ones to become strong.

The founders' résumés drew attention first. The central figure, David Ha, is known for distinctive research published while at Google's AI division. The other, Llion Jones, is one of the co-authors of the 2017 paper that produced "Transformer," the foundational technology behind today's generative-AI boom. In other words, both ChatGPT and Claude ultimately trace back to inventions by Jones and his colleagues, and that caliber of figure has set up shop in Japan.

It has also attracted expectations and capital from home and abroad. Reports put its valuation at roughly $2.65 billion, among the highest for a private startup in Japan. Investors include the three megabanks Mitsubishi UFJ, Sumitomo Mitsui, and Mizuho, plus Itochu, Nomura, the telecom giant KDDI, and even chip giant Nvidia. Overseas, Google has partnered with Sakana AI as well.

The company has two technical signatures. One is Evolutionary Model Merging: a method that "crossbreeds" several existing AIs, much like biological evolution, to produce an AI with new capabilities. The other is the AI Scientist: an attempt to let the AI itself handle research, from generating ideas to running experiments and writing papers. Both are distinctive approaches that differ from the mainstream overseas path of relentlessly scaling up one giant AI.

A move closer to home for Japanese users: in March 2026 it released a free AI chat, "Sakana Chat." It runs an AI called "Namazu," tuned for Japan to reduce the biases that overseas AIs tend to carry; we covered that in a separate article. Fugu is the company's serious next move along that same line.

→ Sakana AI launches free "Sakana Chat": what is the uncensored "Namazu" model?

What is Fugu? An AI that conducts other AIs

In a phrase, Fugu is a conductor that bundles AIs. Sakana AI itself describes it as "a full multi-agent system accessible via a single API (endpoint)." Multi-agent means "multiple AIs sharing the work," and an agent is "an AI that thinks and works on its own."

Normally, when we use AI we pick "this one AI" and hand it everything: writing, translation, code, all to the same single AI. Fugu's idea is different here. The user only asks one endpoint, Fugu, and from there Fugu looks at the task, decides which AI to assign which part, splits the work, and returns a consolidated result. It resembles an orchestra conductor signaling "violins here, brass there" to shape one piece. The conductor does not play an instrument, but knows who should do what.

And the "players" Fugu bundles are not only its own AIs. It says the candidate mix includes both the world's open models (AIs whose internals are public) and closed models (commercial AIs with hidden internals, like ChatGPT). Founder David Ha says the company already uses Fugu internally for research and coding.

Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API. Our 'Fugu Ultra' model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls.
— hardmaru (@hardmaru) June 2026

It comes in two flavors: the standard Fugu and the high-performance Fugu Ultra. Their intended uses can be organized as follows.

Item	Fugu (standard)	Fugu Ultra (high-end)
Intended use	Everyday work (chat, coding assistance)	Hard, multi-step problems (AI research, security analysis, academic investigation)
Priority	Balance of performance and speed	Maximum performance
Positioning (company's claim)	Frontier class	On par with Fable and Mythos
Availability	Generally available now via Sakana AI's console (dashboard)

The goal is to free users from "the complicated machinery underneath." Normally you would have to contract several kinds of AI for different purposes and choose which to use yourself. Fugu takes over that decision entirely, so the user just asks one endpoint. The selling point: the mechanism may be complex, but the user's experience is no different from using a single AI.

Not just "routing": what Fugu really is

By now you will naturally think: "If it just combines other companies' AIs, can't an individual do that?" Indeed, plenty of people bundle multiple AIs with tools like CrewAI or OpenRouter, or with a homemade router. If it merely bundles, there is nothing appealing about it. Whether Fugu has value hinges on exactly this point. To state the conclusion up front: Fugu is not "just a router."

First, the router (the conductor) is different inside. Fugu's conductor is not a simple "if-then" rule branch but "Conductor," a dedicated model of about 7 billion parameters that Sakana AI trained with reinforcement learning. The point is that it has learned by itself, not from a human-written manual, how to coordinate: whom to assign which work, and when to make a model "think again."

It builds on the company's research. "TRINITY," which bundles multiple AIs over many turns, and "AB-MCTS," Sakana's own search algorithm that makes several leading AIs cooperate and try, fail, and retry toward an answer (it has produced results on the hard ARC-AGI-2 benchmark). It dynamically assigns roles to each model on the spot, such as Thinker, Worker, and Verifier.

This is the trick behind "high scores even with lightweight models." Rather than making a model bigger and smarter, it pours more computation into adjusting, verifying, and retrying at answer time (inference time), the idea known as "test-time scaling." Instead of trying to nail it in one shot, it has the models solve cleverly and repeatedly, cross-check each other, and lift the answer's quality. Fugu can even call itself recursively, spinning up a corrective workflow when the first attempt falls short.

Most homemade setups and existing frameworks call other companies' AIs through a fixed, human-designed procedure. Fugu's difference is that "who gets assigned what" is left to a trained model and a search algorithm, not written by hand. This is the "only here" substance Sakana AI claims, continuous with the company's distinctive lines like Evolutionary Model Merging and the AI Scientist.

There is a point not to misunderstand, though. Sakana AI did not build, in-house, a giant conversational model to go head-to-head with GPT or Claude. The company itself admits it "did not train a single frontier model." What it built is not the genius inside, but the intelligence of the "conductor" that bundles the geniuses. And the very idea of making routing smarter through learning can be followed by others, so how long this advantage (moat) lasts is unknown. Even so, it is worth noting that the substance is clearly a cut above "just bundling."

→ How close can an open-source AI agent get to the original (hands-on comparison)

How is it different from Claude and ChatGPT?

This is the part you will most want to explain to someone. First, a quick primer on the two AIs whose names you hear most.

Claude is an AI made by the US company Anthropic. It is known for being strong at long back-and-forth conversations, writing and fixing code, and long, complex tasks. Anthropic offers several Claudes at different tiers: the current top is "Fable," with a limited-availability "Mythos" above it. The everyday workhorses are a three-tier lineup: the top-flight Opus, the well-balanced Sonnet, and the fast, low-cost Haiku. For engineers, the "Claude Code" coding tool is also popular.

ChatGPT is the AI from the US company OpenAI. It is widely used for general text generation, and OpenAI also offers "Codex," a service specialized in writing code. Roughly speaking, Codex is "OpenAI's code-writing AI," a rival to Claude Code. We covered a hands-on comparison of the two using real vulnerabilities in a separate article.

What these two companies share is the form of "have you use a single excellent AI they built in-house." The core is one model (the seat of intelligence), and the path is to grow it bigger and smarter. Fugu, by contrast, takes the form of bundling multiple AIs, including those from other companies, and routing each task to whichever is best suited. The arena is shifted by one level. The three can be summarized as follows.

Service	Maker	Core approach	Strengths
Claude	Anthropic (US)	Own model lineup (top: Fable; workhorses: Opus / Sonnet / Haiku)	Long conversations, code, long-running tasks
ChatGPT / Codex	OpenAI (US)	Own model lineup (Codex specialized for code)	General text generation, code generation
Sakana Fugu	Sakana AI (Japan)	"Conductor" type that bundles multiple AIs	Auto-routing to the best AI per task

To put it more plainly: Claude and ChatGPT are "private chefs." One skilled chef cooks the Japanese, Chinese, and Italian dishes alike; as the chef improves, the whole menu improves. Fugu is a "skilled concierge." It does not cook itself, but routes "sushi to that place, ramen to this place" and brings it all to one table. Neither is good or bad; their roles simply differ.

The idea of bundling and routing AIs is not new to Fugu. We have tracked similar currents in an article measuring how close an open-source (free, publicly available) AI agent can get to the original, and in a roundup of moves by various dev-tool makers. What is new about Fugu is that it repackages the idea into a single commercial service that requires no specialist knowledge.

Isn't this the same as Claude's "sub-agents"?

A sharp question. Claude, too, has a mechanism called sub-agents that splits work among multiple AIs with different roles (in Claude Code for engineers, and in its agent features for business use). On the single point of "multiple AIs sharing the work," it does resemble Fugu.

The decisive difference is the breadth of who is bundled. With Claude's sub-agents, the members sharing the work are basically all Claude (the same Anthropic AI). Picture one company assigning a light model and a smart model to different roles. Fugu, by contrast, bundles across company lines. According to reports, today's Fugu holds candidates such as OpenAI's GPT-5.5, Anthropic's Claude Opus, Google's Gemini 3.1 Pro, various open models, and Fugu itself (calling itself recursively).

This difference connects directly to the "does not stop" point from the previous section. Because Claude's sub-agents are all Claude, if Anthropic goes down they all go down together. Fugu can re-route to another company's AI even if one company is cut off. Even within the same "multi-agent (multiple AIs sharing work)" idea, the fundamentals differ: splitting within one company versus splitting across many.

There is also a usability touch. Fugu itself is "a single model trained to call other models," and from the outside it behaves like an ordinary single model. It can also be called via an API in the same format as ChatGPT (OpenAI-compatible), so it slots straight into existing tools. The internals are a cross-company mixed team; the front door is a single window.

How does it compare with the world's leading AIs?

In the industry, "leading AIs" are called frontier models: the group of AIs at the performance frontier, with Anthropic's Fable and Mythos and OpenAI's top ChatGPT (the GPT series) as flagships. Sakana AI claims Fugu Ultra rivals this frontier, especially Anthropic's Fable and Mythos.

One thing to keep in mind soberly: for now, "rivals" is Sakana AI's own claim. AI performance is usually compared via benchmarks (scoring on a shared set of test problems), but results vary greatly with who measured them under what conditions. A self-announced "on par" is safest treated as confirmed only once a third party reproduces it under the same conditions.

We have flagged this "how to read the scores" point before. When one top-tier AI made headlines for "solving" an unsolved math problem, for example, one verification found it scored zero on the official test. The press headline and the actually-reproducible ability do not always match. For Fugu too, until independent verification is in, the fair stance is to treat it as a company claim.

In fact, the benchmarks Sakana AI published show Fugu Ultra standing shoulder to shoulder with Fable 5, but not sweeping it. The main figures line up as follows (higher is better in every case).

Test (ability measured)	Fugu Ultra	Fable 5
SWE-Bench Pro (practical code fixing)	73.7	86.0
Humanity's Last Exam (hard exam)	50.0	53.3
Terminal Bench 2.1 (terminal work)	82.1	80.4
LiveCodeBench (code generation)	93.2	89.8
CharXiv Reasoning (reading charts)	86.6	86.1

It wins on some items, but on others, like SWE-Bench Pro (a flagship measure of code fixing), Fable 5 leads by a wide margin. Sakana AI itself does not say "beats," only the careful "stands shoulder to shoulder." The fair reading for now: it likely has the chops to reach the frontier, but with strengths and weaknesses that vary by category.

That said, what makes Fugu interesting is not a simple score contest. The design itself, "not chained to one company's single model," brings a different axis to the frontier world. The next section digs into what that means.

What does it cost? Isn't it pricey if it uses Opus inside?

After performance, the next thing on your mind is money. And here the sharper reader suspects: "If it uses an expensive AI like Opus inside, won't Sakana's cut be added on top, making it pricey in the end?" Let's look at the price table first, then answer that. Like Claude and ChatGPT, Fugu has two tracks: a flat rate (subscription) and pay-as-you-go (by usage).

Plan	Price (approx.)	Notes
Standard	~$20/month	Everyday personal use. Both Fugu and Fugu Ultra
Pro	~$100/month	About 10x the usage
Max	~$200/month	About 20x the usage
Pay-as-you-go (API)	Fugu Ultra: $5 input / $30 output per 1M tokens	For businesses, developers, heavy workloads

The ~$20 entry is in line with ChatGPT Plus and Claude Pro (both around $20/month). Now for the "isn't it pricey?" question. The key is that charges do not stack. Even when multiple AIs run at once internally to deliberate and cross-check, the bill does not double or triple: you are charged only for a single unit of the top-tier model actually used. Running five AIs does not mean "five models' worth."

That said, to be honest: Fugu Ultra's rate ($5 input / $30 output per million tokens) is roughly the same level as the top-tier models used inside (such as Opus). So it is not "dramatically cheaper than the sum of the parts," but more accurately "frontier-level results for roughly the price of a single model." And as the previous section explained, because it lifts quality through lightweight-model deliberation and test-time scaling, it does not always have to call the most expensive model; when it avoids the top tier, it comes out cheaper.

There is a flip-side caveat. Because test-time scaling "solves repeatedly and cross-checks," hard workloads can process more tokens, and cost can grow accordingly. When the ceiling is hard to read, the safe move is to measure the actual bill on a light task first. In short, Fugu's pricing pitch is not "the cheapest," but "frontier-level results, not locked to one company, on one easy-to-read bill."

Can you see, and choose, the AIs inside?

"It sounds convenient, but it is unsettling not to know which AI just answered my question." This too is a natural concern. In short, the design offers limited real-time visibility, but the ability to "shut things out" in advance.

First, the routing itself, which model is called when, is undisclosed by design. Sakana AI treats "how it selects and combines" as the source of its competitive edge, so a breakdown like "this answer used GPT" is not shown to the user each time.

On the other hand, the control most in demand in practice, "do not let a specific AI be used," is available. Fugu says you can exclude specific providers or models from the candidate pool in advance to meet data-protection and compliance requirements. For requests like "this country's company's AI is off-limits under our internal rules" or "we want this vendor excluded," you can handle it by leaving them out of the pool from the start.

In other words, it is accurate to understand this as control to "keep out what you do not want used," rather than transparency that shows "what ran at this very moment." Note that the current pool includes the aforementioned GPT-5.5, Claude Opus, and Gemini 3.1 Pro, plus open models, but Anthropic's Fable 5 and Mythos are not generally available and so are not even candidates. The point that it claims "on par" while not actually using the very models it stands beside is also worth keeping in mind.

Why "made in Japan" and a "bundling design" matter

From here on is my view, grounded in the facts. I believe Fugu's greatest value lies less in performance numbers and more in the single point of "not depending on one company or one country." The reason is shown by something that happened only days ago.

In June 2026, Anthropic's top AIs "Fable 5" and its higher-end "Mythos 5," released as its strongest ever, became unusable worldwide just three days after launch. Not for lack of performance, and not an outage. The US government ordered them suspended on national-security grounds, and Anthropic complied. Paying users and Japanese companies alike were cut off with no exceptions. We reported this as breaking news.

In this context, Fugu's tagline, "delivering frontier capability without the risk of being stopped by export controls," starts to bite. Founder David Ha puts it more bluntly: "Relying on a single company's model for national infrastructure is a massive risk. As recent export controls have shown, access to top models can disappear overnight." Because Fugu is designed so the underlying AIs can be swapped wholesale, the performance can continue with another "player" even if a particular company or country is cut off.

Human intelligence is fundamentally a collective intelligence. We solve complex problems by participating in a vast cultural network that builds upon ideas across generations. I believe the strongest AI systems will become a collective intelligence, too.
— hardmaru (@hardmaru) June 2026

This idea also rhymes with Sakana AI's own history. "Namazu," which corrected overseas AIs' biases for Japan, started from unease about depending wholesale on either a US or Chinese AI. And the lineup of investors including domestic communications-infrastructure firms like KDDI is hardly unrelated to the wish "not to let a foreign company hold our critical infrastructure." Watching the broader saga of Anthropic being shut out by the US government amid friction with OpenAI, it is clear that leading AI is increasingly becoming an instrument of the state.

Let me also pose the mean question: "So if the high-performing frontier models all become unusable at once, won't Fugu hit the same wall in the end?" That is half right. If the commercial top tiers, GPT, Claude, Gemini, all stop together, Fugu's hand thins out and performance will certainly drop. Fugu is no "magic that never stops." But two things differ from simply using a single AI. One: because it is spread across several companies, an "all companies down at once" event is less likely than a single company going down. Two: as a last line of defense, open models (whose internals are public) and its own models (the Namazu family) remain. Open models already distributed worldwide are not the kind of thing that vanishes overnight under export controls. So Fugu's essence is not "never stops" but "unlikely to all stop at once, with a baseline you can secure in-house", a practical insurance policy.

Of course, the bundling design has weaknesses too. Routing through multiple AIs may slow responses or raise costs. How accurately the conductor can route, the quality of its "judgment," directly shapes usability. Even so, there are situations, mainly for businesses and government, where you would choose "a setup that does not stop" over "the single fastest machine." That Japan's Sakana AI has put such an option on the table is itself, I think, meaningful.

So how should you choose?

Finally, a practical wrap-up for people who actually get their hands dirty. Should you switch to Fugu today? The answer is "it depends on your use." For now, thinking about it as follows should keep you from getting lost.

For everyday personal use, the AI you already know is plenty. For drafting text, looking things up, or a bit of code, the Claude or ChatGPT you already use will not let you down. There is little reason, for now, to switch to a new endpoint.

If you or your team already juggle multiple AIs at work, Fugu is worth a try. The manual effort of routing "this process to AI A, that one to AI B" could be consolidated into a single endpoint. Start with a small workload and actually measure response speed, cost, and the smartness of the routing. Do not swallow the self-announced "on par"; judge by whether it reproduces in your own work.

If "not stopping" is critical, for businesses and government, it is worth evaluating the design philosophy itself. If you have requirements like avoiding dependence on one company or country, or avoiding sudden cutoffs from overseas regulation, then Fugu's "swappable internals" property matters more than the performance numbers. That Sakana AI is a Japanese company can also be a plus for data handling and procurement.

In short, Fugu is not a contender in the arena of "the single smartest machine," but a challenger standing in a different arena: "how to bundle smart AIs and keep using them without stopping." Rather than going head-to-head with Claude and ChatGPT, it is closer to laying one more layer on top of them. It does not make private chefs obsolete; think of it as a new profession, the skilled concierge, being born, and it clicks into place. That Japan's Sakana AI has taken the lead here is a move worth remembering as you read the generative-AI map to come.

Frequently asked questions

Where is Sakana AI based?

It is a Japanese AI company headquartered in Tokyo. It was founded in 2023 by David Ha, a former Google AI researcher, and Llion Jones, a co-author of the "Transformer" paper that underpins today's generative AI. Its valuation is reported at roughly $2.65 billion, among the highest for a private startup in Japan. Investors include Japan's three megabanks, Itochu, KDDI, and Nvidia.

What makes Fugu impressive?

Instead of relying on a single clever AI for everything, it works like a "conductor" that automatically picks and combines multiple AIs. You send a request to a single endpoint (API), and Fugu chooses the best AI for each part of the task, splits the work, and assembles the result. There is a standard Fugu and a high-performance Fugu Ultra; Ultra claims performance on par with leading overseas models (though for now that is the company's own claim).

How is it different from Claude or ChatGPT?

Claude and ChatGPT have you use a single excellent AI they built in-house. Fugu instead bundles multiple AIs, including those from other companies, and routes each task to whichever is best suited. As a food analogy, Claude and ChatGPT are a private chef who cooks everything, while Fugu is more like a concierge who directs each dish to the best kitchen. Their roles differ, so it is not a simple better-or-worse comparison.

Why is "made in Japan" drawing attention?

Because access to leading AI can vanish overnight due to one country's regulations. In June 2026, Anthropic's top AIs "Fable 5" and "Mythos 5" were shut down worldwide just three days after launch on the orders of the US government. Fugu is designed so the underlying AIs can be swapped out, avoiding dependence on a single company or country. Being a domestic company can also be a plus for data handling and procurement.

How much does Fugu cost?

Flat-rate plans for individuals start at about $20/month (Standard), with Pro (~$100) and Max (~$200) by usage. All include both Fugu and Fugu Ultra. The ~$20 entry is in line with ChatGPT Plus and Claude Pro. For businesses and developers there is also pay-as-you-go (API), and even when multiple AIs run at once internally, charges do not stack: you pay only for a single unit of the top-tier model actually used.

Can you choose the AIs used? Can you exclude a particular country's or company's AI?

Which model is used when (the routing) is undisclosed by design and not visible in real time. However, to meet data-protection and compliance requirements, you can exclude specific providers or models from the candidate pool in advance. Requests like "we do not want this country's or this company's AI" can be handled by leaving them out of the pool from the start.

How is it different from Claude's sub-agents?

With Claude's sub-agents, the members sharing the work are basically all Claude (the same Anthropic AI): one company assigning a light model and a smart model to different roles. Fugu bundles across company lines, GPT-5.5 (OpenAI), Claude Opus (Anthropic), Gemini 3.1 Pro (Google), open models, and its own models. That is why, if one company stops, it can re-route to another company's AI.

Isn't it pricey if it uses Opus and the like inside?

The key is that charges do not stack. Even when multiple models deliberate and cross-check internally, you are billed only for a single unit of the top-tier model actually used, not "five models' worth." Fugu Ultra's rate ($5 input / $30 output per million tokens) is roughly the same level as top-tier models like Opus, so it is less "dramatically cheaper than the sum of parts" and more "frontier-level results for roughly the price of one model." That said, because it solves and cross-checks repeatedly, hard workloads can consume more tokens and cost more.

Isn't it just combining other companies' AIs?

The router itself is different. Fugu's conductor is "Conductor," a dedicated ~7-billion-parameter model Sakana AI trained with reinforcement learning, which learns by itself, not from a human procedure, whom to assign what. It builds on Sakana's own search algorithm "AB-MCTS" and "TRINITY" for making multiple AIs cooperate, dynamically assigning thinker, worker, and verifier roles. Most CrewAI setups and homemade routers use fixed, human-designed procedures, which is the difference. Note, though, that Sakana AI did not build a frontier conversational model of its own; what it built is the intelligence of the "conductor."

References

Update history

June 23, 2026: First published (created following the June 22 general availability of Sakana Fugu)
June 23, 2026: Revised. Added pricing and plans, the difference from Claude's sub-agents, the reality of dependency risk, the visibility and opt-out of which models are used, and a real benchmark comparison with Fable 5
June 23, 2026: Restructured into the order readers want. Added a new section "Not just routing: what Fugu really is" (the trained Conductor orchestrator, AB-MCTS, test-time scaling, the difference from CrewAI and the like) and moved the pricing section to right after performance, answering "isn't it pricey if it uses Opus inside?" honestly