SGLang CVE-2026-5760 and 3 more RCE flaws hit AI inference server (3 unpatched)


News. Published May 26, 2026. Last updated May 28, 2026.


Key takeaways

Four critical RCE vulnerabilities disclosed in SGLang, the AI inference server used by xAI, AMD, NVIDIA, and major cloud providers. CVSS 9.8, no auth required, three remain unpatched as of May 26, 2026. JPCERT/CC issued an advisory.

Four critical vulnerabilities have been disclosed in SGLang, the AI inference server widely used by major cloud providers and AI companies, allowing unauthenticated server takeover. Japan's JPCERT/CC issued an advisory via JVN on May 26, 2026.

The first, CVE-2026-5760, scores 9.8 out of 10, about as bad as it gets, and lets an attacker execute arbitrary code on the server merely by getting a malicious AI model file loaded. The other three allow direct intrusion by sending crafted data to the server's internal messaging socket or API endpoints, and no patch is available for them as of May 26, 2026.

SGLang is used extensively by companies that run their own LLM inference servers in-house. xAI, AMD, NVIDIA, LinkedIn, Cursor, Oracle, Google Cloud, Microsoft Azure, and AWS are among the listed users. According to the GitHub repository, the project runs on more than 400,000 GPUs in production worldwide — effectively the industry standard.

One Poisoned Model Carries Off the Whole GPU Cluster

Compromising SGLang is not on the level of "an LLM server goes down." It means the company's fine-tuned weights, the thing that cost time and money to build, walk out the door with the attacker. The blast radius is worth mapping before the code-level details.

The actors hunting in-house inference servers are not science-fiction AI criminals; they are the operators who already know this work pays. They include industrial spies passing a competitor's fine-tuned LLM and system prompts to a buyer, information brokers monetizing employee query logs (HR questions, financial outlooks, customer names, unreleased code fragments), credential-theft clusters collecting enterprise AI API keys in bulk, monetization crews looking to repurpose A100 and H100 fleets for crypto mining, and adversarial nation-state units. The loot they actually want is the proprietary model weights that cost millions to fine-tune, the operational know-how baked into the system prompt, and persistent access to the GPU cluster behind SGLang. With these CVEs, an attacker only needs to get one GGUF file loaded, or land one pickle on the ZeroMQ ROUTER socket, to copy your company's proprietary model, prompts, and connected RAG data and walk out with all three. The normal AI workflow of "pull an untrusted model from the internet and evaluate it" is the intrusion path.

What this four-CVE bundle exposes is that "model theft" and "system-prompt extraction" have collapsed into one motion, with the inference server itself serving as the stepping stone to flip the whole GPU cluster into adversary use. SGLang is typically deployed without authentication, sitting bare inside the corporate VPC. A pickle into the ROUTER socket is instant RCE; a payload-laced GGUF becomes instant RCE the moment `/v1/rerank` is called. One poisoned model from HuggingFace turns into the route to take over the company's entire GPU infrastructure. And because three of the four bugs remain unpatched as of May 26, defenders cannot even claim "no signs of compromise": hash and signature checks only confirm that a file matches what its publisher uploaded, not that its contents are safe, so integrity-focused supply-chain scanning will not catch these payloads.

A 9.8 score measures how easily one server falls; what an organization running SGLang actually stands to lose is the fine-tuned model weights it spent time and money to build, the operational know-how written into the system prompt, and the employee query logs typed in without a second thought — the company's own intelligence — copied into a competitor's or a third country's hands. Once weights are out, they do not come back.

What is SGLang?

SGLang is software for serving large language models (LLMs) from your own servers and exposing them as APIs internally or externally. It sits in the same category as vLLM, and the two compete fiercely on performance and features.

The maintainer is LMSYS, the non-profit research organization behind Chatbot Arena, which ranks AI models like ChatGPT and Claude side by side. SGLang itself is open source and installable from PyPI with `pip install sglang`.

A typical deployment looks like this: a company stands up its own "internal ChatGPT" by replacing OpenAI's API with SGLang as the backend, loading open-source models such as Llama or DeepSeek from HuggingFace, and exposing OpenAI-compatible endpoints like `/v1/chat/completions` to internal users.

The four vulnerabilities at a glance

Here is a summary of the four CVEs that JVN bundled together. All four can be triggered without authentication. Three lead directly to arbitrary code execution on the server; the fourth, an arbitrary file write, gets an attacker there in one more step.

CVE	Entry point	Impact	Patch status
CVE-2026-5760	Malicious GGUF model file + `/v1/rerank` call	RCE (CVSS 9.8)	Fixed in v0.5.11
CVE-2026-7301	Crafted pickle sent to the ZeroMQ ROUTER socket	RCE (CVSS 9.8)	Unpatched
CVE-2026-7302	Path separators in filename for `/v1/images/edits` and `/v1/videos`	Arbitrary file write (path traversal)	Unpatched
CVE-2026-7304	Dill payload in `custom_logit_processor` field	RCE (CVSS 9.8)	Unpatched

CVSS is the industry-standard severity score on a 0-10 scale, with anything 9.0 or above classified as "Critical." Three of these four hit 9.8 — meaning remote, unauthenticated, no user interaction, and severe impact on confidentiality, integrity, and availability all at once. About the worst combination possible.
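To make the 9.8 figure concrete, here is a minimal sketch of the base-score arithmetic. It assumes CVSS v3.1 and the vector AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H, which matches the description above (remote, unauthenticated, no user interaction, high impact on all three properties) but is not quoted from the advisory itself.

```python
import math

# CVSS v3.1 metric weights for the assumed vector
# AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H (an assumption, not taken from the advisory).
ATTACK_VECTOR_NETWORK = 0.85
ATTACK_COMPLEXITY_LOW = 0.77
PRIVILEGES_REQUIRED_NONE = 0.85
USER_INTERACTION_NONE = 0.85
IMPACT_HIGH = 0.56  # same weight for confidentiality, integrity, availability


def roundup(value: float) -> float:
    """Round up to one decimal place, following the CVSS v3.1 Roundup rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged() -> float:
    """Base score for the assumed vector, with scope unchanged."""
    impact_sub_score = 1 - (1 - IMPACT_HIGH) ** 3
    impact = 6.42 * impact_sub_score
    exploitability = (
        8.22
        * ATTACK_VECTOR_NETWORK
        * ATTACK_COMPLEXITY_LOW
        * PRIVILEGES_REQUIRED_NONE
        * USER_INTERACTION_NONE
    )
    return roundup(min(impact + exploitability, 10))


print(base_score_scope_unchanged())  # 9.8
```

Under these weights the raw sum is roughly 9.76, which the specification's round-up rule turns into 9.8. Only a changed scope would push the score to 10.0.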

CVE-2026-5760: HuggingFace downloads turn into landmines

The flashiest of the four, CVE-2026-5760, was also the first to be disclosed, back in April. It was discovered by Stuart Beck (GitHub handle Stuub), and a proof-of-concept is already on GitHub.

The attack scenario is simple, which is what makes it nasty:

1. The attacker builds a malicious model file in GGUF format (the common file format for open-source LLMs). Embedded inside is Jinja2 template code stuffed into the `tokenizer.chat_template` field.
2. The attacker uploads the file to a public repository such as HuggingFace, or otherwise gets the target organization to load it.
3. An engineer running SGLang loads the model into their server, intrigued by an interesting-looking open-source release.
4. The moment someone hits the `/v1/rerank` endpoint, the template renders server-side and the embedded code runs with whatever privileges the SGLang process has.

The root cause is that SGLang used `jinja2.Environment()` without sandboxing when processing chat templates. According to The Hacker News, the fix is to switch to `ImmutableSandboxedEnvironment`, a well-known countermeasure that is also described in Jinja2's own documentation.
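The difference between the two environments can be shown with a harmless probe. This is an illustrative sketch, not SGLang's code: `PROBE_TEMPLATE` and `render` are hypothetical names, and the probe only walks the class hierarchy, which is the usual first step of a Jinja2 injection payload.

```python
from typing import Optional

try:
    from jinja2 import Environment
    from jinja2.exceptions import SecurityError
    from jinja2.sandbox import ImmutableSandboxedEnvironment
    HAVE_JINJA2 = True
except ImportError:  # keeps the sketch importable where jinja2 is not installed
    HAVE_JINJA2 = False

# Harmless stand-in for a hostile chat template (hypothetical example).
PROBE_TEMPLATE = "{{ ''.__class__.__mro__ }}"


def render(template_source: str, sandboxed: bool) -> Optional[str]:
    """Render a template; return None if the sandbox rejects it."""
    env = ImmutableSandboxedEnvironment() if sandboxed else Environment()
    try:
        return env.from_string(template_source).render()
    except SecurityError:
        # The sandbox treats underscore-prefixed attributes as unsafe.
        return None


if HAVE_JINJA2:
    print(render(PROBE_TEMPLATE, sandboxed=False))  # exposes Python class objects
    print(render(PROBE_TEMPLATE, sandboxed=True))   # probe is refused
```

A plain `Environment` lets the template reach Python internals, and from there a real payload can reach `os` or `subprocess`. The sandboxed variant stops the probe at the first attribute lookup. Even so, a sandbox limits what a template can do; it does not make it safe to run a model from an unknown source.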

The same class of vulnerability was reported in 2024 in llama-cpp-python as "Llama Drama," and the pattern keeps repeating across the AI ecosystem.

CVE-2026-7301: Direct intrusion via the management socket

CVE-2026-7301 lets attackers reach SGLang's internal inter-process communication channel directly when the server runs in multimodal mode (handling images and video). The reporter is Antiproof.

SGLang uses ZeroMQ, a lightweight messaging library, for inter-process communication. The receiving side decodes incoming messages with `pickle.loads()`, and the recommended startup example in the official documentation uses `--host 0.0.0.0` (accepting connections from anywhere).

Pickle is a convenient Python-specific serialization format, but it has a well-known pitfall: passing untrusted data to `pickle.loads()` runs whatever code is embedded in it. "Never unpickle untrusted data" is the warning printed in bold at the top of Python's official documentation.
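A short, harmless demonstration of the pitfall, using only the standard library. `HarmlessPayload` and `DataOnlyUnpickler` are hypothetical names for this sketch. The restricted unpickler follows the "restricting globals" pattern from the Python documentation and is shown as one possible mitigation, not as the fix SGLang will ship.

```python
import io
import pickle


class HarmlessPayload:
    """Stand-in for a hostile object; a real attacker would return os.system."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" and replays that call on load.
        return (eval, ("6 * 7",))


class DataOnlyUnpickler(pickle.Unpickler):
    """Rejects every global lookup, so only plain containers and scalars load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def data_only_loads(blob: bytes):
    """Unpickle bytes while refusing any reference to a class or function."""
    return DataOnlyUnpickler(io.BytesIO(blob)).load()


malicious = pickle.dumps(HarmlessPayload())

# The receiver never asked to evaluate anything, yet the expression runs.
print(pickle.loads(malicious))  # 42

try:
    data_only_loads(malicious)
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

The point is that `pickle.loads()` runs the sender's choice of callable before the receiver sees any data. Swapping `eval` for a shell command is all it takes to turn this into remote code execution.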

As a result, if the SGLang server is exposed to the internet or even a corporate LAN, an attacker can send a crafted pickle to the ZeroMQ socket without authentication and execute arbitrary code on the server. The same structural problem was disclosed in March as CVE-2026-3059 / 3060, partially fixed in v0.5.10 — but CVE-2026-7301 is a different code path that the earlier patch did not cover.

CVE-2026-7302 and 7304: Path traversal and a binary in an API field

The remaining two are simpler in construction.

CVE-2026-7302 (path traversal) exists because the image editing endpoint `/v1/images/edits` and the video endpoint `/v1/videos` concatenate the uploaded filename directly into a server filesystem path. Drop something like `../../../etc/cron.d/backdoor` in the filename and you can write a file anywhere the server process has write access. Write a cron job and you have de facto RCE — the severity is essentially indistinguishable from arbitrary code execution.
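The usual defence against this class of bug is to resolve the final path and confirm it still sits inside the upload directory. The sketch below is illustrative only; `safe_upload_path` and the `/tmp/uploads` directory are hypothetical and not taken from SGLang.

```python
from pathlib import Path


def safe_upload_path(upload_dir: str, filename: str) -> Path:
    """Return the destination path, or raise if it leaves upload_dir."""
    base = Path(upload_dir).resolve()
    # Joining and resolving collapses any ".." segments; an absolute
    # filename replaces the base entirely, which the check below catches.
    candidate = (base / filename).resolve()
    if base not in candidate.parents:
        raise ValueError(f"filename escapes upload directory: {filename!r}")
    return candidate


print(safe_upload_path("/tmp/uploads", "cat.png").name)  # cat.png

try:
    safe_upload_path("/tmp/uploads", "../../../etc/cron.d/backdoor")
except ValueError as exc:
    print("rejected:", exc)
```

A stricter policy is to discard the client-supplied name altogether and store uploads under a server-generated identifier.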

CVE-2026-7304 (dill deserialization RCE) involves the `custom_logit_processor` field of the generation API. Pass a hex-encoded dill binary in this field and it gets unpacked with `dill.loads()` without validation. Dill is a pickle-compatible superset and shares the same "don't open untrusted data" property. Exploitation requires `--enable-custom-logit-processor` to be on at startup.
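Where the flag cannot be switched off immediately, one stopgap is to drop any request that carries the field before it reaches the server. The sketch below assumes a gateway or middleware that can inspect the JSON body; `should_reject` and `contains_blocked_field` are hypothetical helpers, not part of SGLang or of any particular proxy.

```python
import json

BLOCKED_FIELD = "custom_logit_processor"


def contains_blocked_field(node) -> bool:
    """Walk a decoded JSON value and report whether the field appears anywhere."""
    if isinstance(node, dict):
        if BLOCKED_FIELD in node:
            return True
        return any(contains_blocked_field(value) for value in node.values())
    if isinstance(node, list):
        return any(contains_blocked_field(item) for item in node)
    return False


def should_reject(raw_body: bytes) -> bool:
    """Decide whether a request body must be refused at the gateway."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return True     # fail closed on anything that is not valid JSON
    return contains_blocked_field(payload)


print(should_reject(b'{"text": "hello"}'))                                # False
print(should_reject(b'{"text": "hi", "custom_logit_processor": "00ff"}'))  # True
```

This is defence in depth only. The filter never decodes the payload, so it cannot be tricked into running it, but disabling `--enable-custom-logit-processor` remains the actual mitigation.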

CVE-2026-7304 carries a CVSS of 9.8, and both vulnerabilities are remotely exploitable without authentication. According to Antiproof's blog, all three (7301/7302/7304) were disclosed unpatched after the vendor did not respond during coordination.

Why do AI inference servers keep getting hit?

SGLang's troubles are not a one-off. Oligo Security researcher Avi Lumelsky reported in November 2025 that many major AI inference servers — including frameworks from Meta, NVIDIA, Microsoft, plus vLLM and SGLang — share the same root cause: improper use of ZeroMQ combined with Python's pickle. The investigation was published as "ShadowMQ."

The reason is structural. AI inference servers chase performance by running multiple processes in parallel — workers handling inference, schedulers dispatching requests, engines generating tokens — and they exchange large volumes of Python objects between them (model configs, tensor metadata, templates, dictionaries). Pickle is by far the most convenient option for this and appealing from the implementer's side.

The trouble starts when developers shrug off authentication or encryption with "it's just an internal channel," and then ship the default binding as `0.0.0.0`. Depending on the container setup, that "internal" channel ends up directly exposed to the corporate LAN or even the internet. The same pattern showed up in the LiteLLM supply chain attack we covered in November 2025, where convenient-but-dangerous Python features turned into a path to full server compromise.
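For contrast, here is a minimal sketch of what an authenticated, data-only internal message could look like: JSON instead of pickle, with an HMAC tag checked before anything is parsed. It is an assumption about one possible design, not a description of how SGLang or any other framework fixed the issue, and `pack` and `unpack` are hypothetical names.

```python
import hashlib
import hmac
import json

TAG_LENGTH = hashlib.sha256().digest_size  # 32 bytes


def pack(message: dict, key: bytes) -> bytes:
    """Serialize to JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(message, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def unpack(frame: bytes, key: bytes) -> dict:
    """Verify the tag first; parse the body only if it is authentic."""
    tag, body = frame[:TAG_LENGTH], frame[TAG_LENGTH:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad message authentication tag")
    return json.loads(body)


shared_key = b"example-key-distributed-out-of-band"
frame = pack({"op": "schedule", "request_id": 7}, shared_key)
print(unpack(frame, shared_key))  # {'op': 'schedule', 'request_id': 7}
```

JSON cannot carry arbitrary Python objects, which is exactly why it is safer and also why it is less convenient than pickle. Tensor payloads would need a separate binary encoding.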

SGLang has been working on fixes in response to Lumelsky's findings, but according to CSO Online those fixes are incomplete — and new variants of the same problem (CVE-2026-7301 is exactly that) keep surfacing.

Affected versions and what to do

Pulling together the public information, the action items prioritize as follows.

CVE	Affected versions	Action
CVE-2026-5760	≤ v0.5.9	Upgrade to v0.5.11+
CVE-2026-7301	v0.5.5+ (multimodal enabled)	Unpatched. Restrict `--host` to a trusted internal IP, firewall the ZMQ ports
CVE-2026-7302	v0.5.5+ (multimodal enabled)	Unpatched. Block `/v1/images/edits` and `/v1/videos` at the proxy
CVE-2026-7304	v0.4.1.post7+ (custom_logit_processor on)	Unpatched. Disable `--enable-custom-logit-processor`

Start with `pip show sglang` to check your version, then upgrade to v0.5.11 or later. That closes CVE-2026-5760.
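For fleets where checking by hand is impractical, the comparison can be scripted. The sketch below is a simplified check against the v0.5.11 threshold given above. `has_cve_2026_5760_fix` is a hypothetical helper, and it ignores pre-release suffixes such as `rc1`, which `packaging.version.Version` handles properly.

```python
import re

FIXED_IN = (0, 5, 11)  # first release reported to contain the CVE-2026-5760 fix


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release, ignoring suffixes such as .post7."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_cve_2026_5760_fix(version: str) -> bool:
    """True if the version is at or above the fixed release."""
    release = release_tuple(version)
    # Pad short versions such as "0.6" so tuples compare component by component.
    padded = release + (0,) * (len(FIXED_IN) - len(release))
    return padded >= FIXED_IN


for candidate in ("0.4.1.post7", "0.5.9", "0.5.10", "0.5.11"):
    print(candidate, has_cve_2026_5760_fix(candidate))
```

On a host with SGLang installed, the version string can come from `importlib.metadata.version("sglang")` rather than a hard-coded list.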

For the other three, which remain unpatched, defense for now is at the configuration level: limit `--host` to `127.0.0.1` or a trusted internal IP at startup, keep the ZeroMQ ports (assigned dynamically per inference server) unreachable from outside the trust boundary, block image and video endpoints at the reverse proxy if you do not use them, and disable `--enable-custom-logit-processor` if you do not need it.
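A launch script or CI check can flag risky `--host` values before a server starts. This is a minimal sketch using the standard `ipaddress` module; `bind_exposure` and its labels are hypothetical, and a hostname needs resolving before it can be judged.

```python
import ipaddress


def bind_exposure(host: str) -> str:
    """Classify how widely a listen address exposes a service."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return "unknown"  # not a literal IP address, e.g. a hostname
    if address.is_unspecified:
        return "all-interfaces"  # 0.0.0.0 or ::, reachable on every network
    if address.is_loopback:
        return "loopback-only"
    if address.is_private:
        return "private-network"
    return "public"


for host in ("0.0.0.0", "127.0.0.1", "10.0.0.5", "8.8.8.8"):
    print(host, "->", bind_exposure(host))
```

Anything other than `loopback-only` means the firewall and network policy are the only things between the ZeroMQ ports and whoever can route to the host.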

Treatment of externally sourced GGUF files needs review too. The presence of a model on HuggingFace is not a trust signal, and obscure low-download models or freshly forked variants in particular can host attacks like CVE-2026-5760. Restrict where models can come from, and inspect `tokenizer.chat_template` before loading.
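As a first-pass triage for that inspection, a simple pattern scan can surface templates that deserve a human look. The sketch below assumes the `tokenizer.chat_template` string has already been extracted from the GGUF metadata by some other tool. The pattern list is a heuristic of our own choosing, so it will produce both false positives and false negatives.

```python
import re
from typing import List

# Constructs that ordinary chat templates rarely need (heuristic, not exhaustive).
SUSPICIOUS_PATTERNS = [
    r"__\w+__",                                # dunder access such as __class__
    r"\bimport\b",
    r"\b(?:os|subprocess|popen|eval|exec)\b",
    r"\|\s*attr\s*\(",                         # attr() filter, used to dodge dunder checks
]


def flag_chat_template(template: str) -> List[str]:
    """Return every suspicious fragment found in a chat template string."""
    findings = []
    for pattern in SUSPICIOUS_PATTERNS:
        findings.extend(match.group(0) for match in re.finditer(pattern, template))
    return findings


benign = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
probe = "{{ ''.__class__.__mro__ }}"

print(flag_chat_template(benign))  # []
print(flag_chat_template(probe))   # ['__class__', '__mro__']
```

An empty result is not a clean bill of health. It only means the template avoided the obvious markers, so source restrictions and sandboxed rendering still matter.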

Reactions from the security community

Following JVN's May disclosure, security researchers have raised concerns about SGLang's vendor responsiveness. Antiproof explicitly stated that the vendor did not respond during coordination, explaining why disclosure proceeded with no patch in hand.

Meanwhile LMSYS, the team behind SGLang, has been shipping features at pace — a Day 0 DeepSeek-V4 support post went up on April 25. Heavy feature work paired with security response that hasn't kept up is a familiar shape for fast-growing open-source projects.

Given SGLang's deployment footprint, the impact is hard to dismiss. The SGLang README lists major cloud providers running serious AI infrastructure, alongside AI-native startups like Cursor. Whether each of them is on the right version, has multimodal enabled, and has the ZeroMQ ports locked down isn't visible from outside — but every operator should be checking their own configuration now.

Closing

SGLang now has four critical, unauthenticated remote code execution vulnerabilities publicly disclosed. One is fixed in v0.5.11; the other three remain unpatched as of May 26, 2026, leaving multiple entry points open to attackers.

The remediation sequence is: upgrade first, then review your network boundaries, then tighten the policy around externally sourced model files. AI inference infrastructure deserves the same "don't accidentally expose it" treatment as internal databases or Kubernetes clusters — and that mindset needs to be shared with the infra team.

Looking at the past year (LiteLLM's supply chain incident, ShadowMQ, and now SGLang), the underlying cause is consistently the unsafe use of convenient features of the Python ecosystem. The same structural problem is likely to keep surfacing in vLLM and other frameworks, and today's four are the visible tip of the iceberg.