[Critical] Four RCE Vulnerabilities Hit SGLang AI Inference Server, Three Still Unpatched
Four critical RCE vulnerabilities disclosed in SGLang, the AI inference server used by xAI, AMD, NVIDIA, and major cloud providers. CVSS 9.8, no auth required, three remain unpatched as of May 26, 2026. JPCERT/CC issued an advisory.

Makoto Horikawa
Backend Engineer / AWS / Django
Four critical vulnerabilities have been disclosed in SGLang, the AI inference server widely used by major cloud providers and AI companies, allowing unauthenticated server takeover. Japan's JPCERT/CC issued an advisory via JVN on May 26, 2026.
Three of the four score 9.8 out of 10, about as bad as it gets. One lets an attacker execute arbitrary code on the server merely by getting a malicious AI model file loaded. The other three allow direct intrusion by sending crafted data to the server's exposed ports and API endpoints, and no patch is available for them as of May 26, 2026.
SGLang is used extensively by companies that run their own LLM inference servers in-house. xAI, AMD, NVIDIA, LinkedIn, Cursor, Oracle, Google Cloud, Microsoft Azure, and AWS are among the listed users. According to the GitHub repository, the project runs on more than 400,000 GPUs in production worldwide — effectively the industry standard.
What is SGLang?
SGLang is software for serving large language models (LLMs) from your own servers and exposing them as APIs internally or externally. It sits in the same category as vLLM, and the two compete fiercely on performance and features.
The maintainer is LMSYS, the non-profit research organization behind Chatbot Arena, which ranks AI models like ChatGPT and Claude side by side. SGLang itself is open source and installable from PyPI with `pip install sglang`.
A typical deployment looks like this: a company stands up its own "internal ChatGPT" by replacing OpenAI's API with SGLang as the backend, loading open-source models such as Llama or DeepSeek from HuggingFace, and exposing OpenAI-compatible endpoints like `/v1/chat/completions` to internal users.
The four vulnerabilities at a glance
Here is a summary of the four CVEs that JVN bundled together. All four can be triggered without authentication. Three lead directly to arbitrary code execution on the server; the fourth is an arbitrary file write that can be escalated to it.
| CVE | Entry point | Impact | Patch status |
|---|---|---|---|
| CVE-2026-5760 | Malicious GGUF model file + `/v1/rerank` call | RCE (CVSS 9.8) | Fixed in v0.5.11 |
| CVE-2026-7301 | Crafted pickle sent to the ZeroMQ ROUTER socket | RCE (CVSS 9.8) | Unpatched |
| CVE-2026-7302 | Path separators in filename for `/v1/images/edits` and `/v1/videos` | Arbitrary file write (path traversal) | Unpatched |
| CVE-2026-7304 | Dill payload in `custom_logit_processor` field | RCE (CVSS 9.8) | Unpatched |
CVSS is the industry-standard severity score on a 0-10 scale, with anything 9.0 or above classified as "Critical." Three of these four hit 9.8 — meaning remote, unauthenticated, no user interaction, and severe impact on confidentiality, integrity, and availability all at once. About the worst combination possible.
CVE-2026-5760: HuggingFace downloads turn into landmines
The flashiest of the four was the first to be disclosed, back in April: CVE-2026-5760. It was discovered by Stuart Beck (GitHub handle Stuub), and a proof-of-concept is already on GitHub.
The attack scenario is simple, which is what makes it nasty:
- The attacker builds a malicious model file in GGUF format (the common file format for open-source LLMs). Embedded inside is Jinja2 template code stuffed into the `tokenizer.chat_template` field.
- The attacker uploads the file to a public repository such as HuggingFace, or otherwise gets the target organization to load it.
- An engineer running SGLang loads the model into their server, intrigued by an interesting-looking open-source release.
- The moment someone hits the `/v1/rerank` endpoint, the template renders server-side and the embedded code runs with whatever privileges the SGLang process has.
The root cause is that SGLang used `jinja2.Environment()` without sandboxing when processing chat templates. According to The Hacker News, the fix is to switch to `ImmutableSandboxedEnvironment`, a well-known countermeasure described in Jinja2's own documentation.
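The difference between the two environments can be illustrated with a harmless probe. This is a minimal sketch, not SGLang's code: `PROBE_TEMPLATE`, `render_plain`, and `render_sandboxed` are hypothetical names, and the probe only reads a class name, whereas a real payload would walk the same attribute chain toward something like `os.system`.

```python
from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Harmless stand-in for a hostile chat template (assumption for illustration).
# It reaches into Python internals through a dunder attribute.
PROBE_TEMPLATE = "{{ ''.__class__.__name__ }}"


def render_plain(source: str) -> str:
    """Render with the default environment, which allows dunder access."""
    return Environment().from_string(source).render()


def render_sandboxed(source: str) -> str:
    """Render with the sandbox, which refuses unsafe attribute access."""
    try:
        return ImmutableSandboxedEnvironment().from_string(source).render()
    except SecurityError as exc:
        return f"blocked: {exc}"


if __name__ == "__main__":
    print(render_plain(PROBE_TEMPLATE))      # the template reached str's internals
    print(render_sandboxed(PROBE_TEMPLATE))  # the sandbox raised SecurityError
```

Ordinary templates that only use filters and loop over message dictionaries render the same way in both environments, which is why the switch is usually low-risk for legitimate chat templates.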
The same class of vulnerability was reported in 2024 in llama-cpp-python as "Llama Drama," and the pattern keeps repeating across the AI ecosystem.
CVE-2026-7301: Direct intrusion via the management socket
CVE-2026-7301 lets attackers reach SGLang's internal inter-process communication channel directly when the server runs in multimodal mode (handling images and video). The reporter is Antiproof.
SGLang uses ZeroMQ, a lightweight messaging library, for inter-process communication. The receiving side decodes incoming messages with `pickle.loads()`, and the recommended startup example in the official documentation uses `--host 0.0.0.0` (accepting connections from anywhere).
Pickle is a convenient Python-specific serialization format, but it has a well-known pitfall: passing untrusted data to `pickle.loads()` runs whatever code is embedded in it. The `pickle` page in Python's official documentation opens with a prominent warning to only unpickle data you trust.
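To make the pitfall concrete, here is a minimal sketch using only the standard library. The class and function names are hypothetical, and the payload is deliberately benign: it makes the receiver call `int("42")`, where an attacker would pick something like `os.system` instead. The restricted loader follows the "restricting globals" pattern from the Python documentation; it is shown as an illustration, not as SGLang's fix.

```python
import io
import pickle


class CallableSmuggler:
    """Benign demo object: tells the unpickler to call int("42")."""

    def __reduce__(self):
        # (callable, args) is executed by the *receiver* during loads().
        return (int, ("42",))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so no callable can be named by the sender."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Unpickle plain containers and scalars only."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    payload = pickle.dumps(CallableSmuggler())
    print(pickle.loads(payload))  # the sender-chosen callable ran on load

    print(restricted_loads(pickle.dumps({"model": "llama", "ids": [1, 2]})))
    try:
        restricted_loads(payload)
    except pickle.UnpicklingError as exc:
        print("rejected:", exc)
```

Even a restricted unpickler is a narrow defense. For a channel that can be reached by untrusted peers, a data-only format such as JSON avoids the problem class entirely.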
As a result, if the SGLang server is exposed to the internet or even a corporate LAN, an attacker can send a crafted pickle to the ZeroMQ socket without authentication and execute arbitrary code on the server. The same structural problem was disclosed in March as CVE-2026-3059 / 3060, partially fixed in v0.5.10 — but CVE-2026-7301 is a different code path that the earlier patch did not cover.
CVE-2026-7302 and 7304: Path traversal and a binary in an API field
The remaining two are simpler in construction.
CVE-2026-7302 (path traversal) exists because the image editing endpoint `/v1/images/edits` and the video endpoint `/v1/videos` concatenate the uploaded filename directly into a server filesystem path. Drop something like `../../../etc/cron.d/backdoor` in the filename and you can write a file anywhere the server process has write access. Write a cron job and you have de facto RCE, so the severity is essentially indistinguishable from arbitrary code execution.
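The bug class and one common defense can be sketched as follows. Both functions are hypothetical illustrations, not SGLang's code, and the upload directory is an assumed example: the naive version joins the client filename as-is, while the safer version keeps only the final path component and then verifies that the result still sits directly under the upload directory.

```python
import posixpath
from pathlib import Path


def naive_upload_path(upload_dir: str, client_filename: str) -> str:
    """Vulnerable pattern: the client-supplied name is joined unchanged."""
    return posixpath.normpath(posixpath.join(upload_dir, client_filename))


def safe_upload_path(upload_dir: str, client_filename: str) -> Path:
    """Keep only the basename, then confirm the result stays in upload_dir."""
    base = Path(upload_dir).resolve()
    # Treat backslashes as separators too, then drop every directory part.
    name = Path(client_filename.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError(f"unusable filename: {client_filename!r}")
    candidate = (base / name).resolve()
    if base not in candidate.parents:
        raise ValueError(f"path escapes upload directory: {client_filename!r}")
    return candidate


if __name__ == "__main__":
    evil = "../../../etc/cron.d/backdoor"
    print(naive_upload_path("/srv/uploads", evil))  # lands outside the upload dir
    print(safe_upload_path("/srv/uploads", evil))   # confined to the upload dir
```

Generating the stored filename server-side (for example from a random identifier) and treating the client's name as display metadata only is a stricter variant of the same idea.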
CVE-2026-7304 (dill deserialization RCE) involves the `custom_logit_processor` field of the generation API. Pass a hex-encoded dill binary in this field and it gets unpacked with `dill.loads()` without validation. Dill is a pickle-compatible superset and shares the same "don't open untrusted data" property. Exploitation requires `--enable-custom-logit-processor` to be on at startup.
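A common alternative to accepting serialized code in an API field is to let clients select a processor by name from a server-side registry. The sketch below is a hypothetical design under that assumption, not SGLang's API: `REGISTRY`, `resolve_processor`, and the two example processors are invented for illustration.

```python
from typing import Callable, Dict, List

LogitProcessor = Callable[[List[float]], List[float]]


def scale_by_half(logits: List[float]) -> List[float]:
    """Example processor: flatten the distribution by halving each logit."""
    return [value * 0.5 for value in logits]


def clamp_to_ten(logits: List[float]) -> List[float]:
    """Example processor: cap each logit to the range [-10, 10]."""
    return [max(-10.0, min(10.0, value)) for value in logits]


# Only code that already lives on the server can ever run.
REGISTRY: Dict[str, LogitProcessor] = {
    "scale_by_half": scale_by_half,
    "clamp_to_ten": clamp_to_ten,
}


def resolve_processor(name: str) -> LogitProcessor:
    """Map a client-supplied name to a known processor, or refuse."""
    try:
        return REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown logit processor: {name!r}") from None


if __name__ == "__main__":
    print(resolve_processor("scale_by_half")([2.0, 4.0]))
    try:
        # A hex blob is just an unknown name here; nothing is deserialized.
        resolve_processor("80049520000000")
    except ValueError as exc:
        print("rejected:", exc)
```

The trade-off is flexibility: clients can no longer ship arbitrary logic, which is exactly the property that removes the deserialization risk.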
CVE-2026-7304 carries a CVSS of 9.8, and both vulnerabilities are remotely exploitable without authentication. According to Antiproof's blog, all three (7301/7302/7304) were disclosed unpatched after the vendor did not respond during coordination.
Why do AI inference servers keep getting hit?
SGLang's troubles are not a one-off. Oligo Security researcher Avi Lumelsky reported in November 2025 that many major AI inference servers — including frameworks from Meta, NVIDIA, Microsoft, plus vLLM and SGLang — share the same root cause: improper use of ZeroMQ combined with Python's pickle. The investigation was published as "ShadowMQ."
The reason is structural. AI inference servers chase performance by running multiple processes in parallel (workers handling inference, schedulers dispatching requests, engines generating tokens), and those processes exchange large volumes of Python objects such as model configs, tensor metadata, templates, and dictionaries. Pickle is by far the most convenient way to do this, which is exactly what makes it appealing to implementers.
The trouble starts when developers shrug off authentication or encryption with "it's just an internal channel," and then ship the default binding as `0.0.0.0`. Depending on the container setup, that "internal" channel ends up directly exposed to the corporate LAN or even the internet. The same pattern showed up in the LiteLLM supply chain attack we covered in November 2025, where convenient-but-dangerous Python features turned into a path to full server compromise.
SGLang has been working on fixes in response to Lumelsky's findings, but according to CSO Online those fixes are incomplete — and new variants of the same problem (CVE-2026-7301 is exactly that) keep surfacing.
Affected versions and what to do
Pulling together the public information, the action items prioritize as follows.
| CVE | Affected versions | Action |
|---|---|---|
| CVE-2026-5760 | ≤ v0.5.9 | Upgrade to v0.5.11+ |
| CVE-2026-7301 | v0.5.5+ (multimodal enabled) | Unpatched. Restrict `--host` to a trusted internal IP, firewall the ZMQ ports |
| CVE-2026-7302 | v0.5.5+ (multimodal enabled) | Unpatched. Block `/v1/images/edits` and `/v1/videos` at the proxy |
| CVE-2026-7304 | v0.4.1.post7+ (custom_logit_processor on) | Unpatched. Disable `--enable-custom-logit-processor` |
Start with `pip show sglang` to check your version, then upgrade to v0.5.11 or later. That closes CVE-2026-5760.
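For fleets where checking by hand is impractical, the version comparison can be scripted. The following is a minimal sketch with hypothetical helper names. It compares only the numeric release segment, so suffixes such as `.post1` or pre-release tags are ignored, which is an assumption worth revisiting for anything other than a quick audit.

```python
import re
from typing import Tuple

# First release reported to contain the CVE-2026-5760 fix.
FIXED_IN: Tuple[int, ...] = (0, 5, 11)


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '0.5.10.post1' -> (0, 5, 10)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_chat_template_fix(version: str) -> bool:
    """True if the numeric release is at or beyond the fixed version."""
    return release_tuple(version) >= FIXED_IN


if __name__ == "__main__":
    for candidate in ("0.5.9", "0.5.10.post1", "0.5.11", "v0.6.0"):
        print(candidate, has_chat_template_fix(candidate))
```

On a host where the package is installed, the input string can come from `importlib.metadata.version("sglang")` instead of a hard-coded list.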
For the other three, which remain unpatched, defense for now is at the configuration level: limit `--host` to `127.0.0.1` or a trusted internal IP at startup, keep the ZeroMQ ports (assigned dynamically per inference server) unreachable from outside the trust boundary, block image and video endpoints at the reverse proxy if you do not use them, and disable `--enable-custom-logit-processor` if you do not need it.
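A launch wrapper or audit script can flag risky `--host` values before the server starts. The sketch below uses only the standard `ipaddress` module; `classify_bind_host` and its category labels are hypothetical, and what counts as a "trusted internal IP" still depends on your own network.

```python
import ipaddress


def classify_bind_host(host: str) -> str:
    """Roughly classify a bind address by how widely it exposes the server."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP; a hostname must be resolved and reviewed separately.
        return "hostname"
    if address.is_unspecified:  # 0.0.0.0 or ::
        return "all-interfaces"
    if address.is_loopback:     # 127.0.0.0/8 or ::1
        return "loopback"
    if address.is_private:      # RFC 1918 and similar ranges
        return "private"
    return "public"


if __name__ == "__main__":
    for host in ("0.0.0.0", "127.0.0.1", "10.0.0.5", "8.8.8.8", "localhost"):
        print(f"{host:>10} -> {classify_bind_host(host)}")
```

Anything other than `loopback` deserves a second look: a `private` address is only as safe as the network segment behind it.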
Treatment of externally sourced GGUF files needs review too. The presence of a model on HuggingFace is not a trust signal, and obscure low-download models or freshly forked variants in particular can host attacks like CVE-2026-5760. Restrict where models can come from, and inspect `tokenizer.chat_template` before loading.
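As a first-pass screen, a chat template that has already been extracted from a model file can be scanned for constructs that legitimate templates rarely need. This is a rough heuristic sketch, not a security boundary: the pattern list and the function name are assumptions, a determined attacker can evade simple string matching, and extracting the `tokenizer.chat_template` string from the GGUF file is left to whatever tooling you already use.

```python
import re
from typing import List

# Heuristic patterns (assumed for illustration): dunder attribute access and
# names commonly seen in template-injection payloads.
SUSPICIOUS_PATTERNS = [
    re.compile(r"__\w+?__"),
    re.compile(r"\b(?:subprocess|popen|eval|exec|getattr|import)\b", re.IGNORECASE),
]


def suspicious_template_tokens(chat_template: str) -> List[str]:
    """Return the sorted, de-duplicated suspicious tokens found in a template."""
    hits = set()
    for pattern in SUSPICIOUS_PATTERNS:
        hits.update(match.group(0) for match in pattern.finditer(chat_template))
    return sorted(hits)


if __name__ == "__main__":
    benign = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
    hostile = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
    print(suspicious_template_tokens(benign))
    print(suspicious_template_tokens(hostile))
```

A non-empty result is a reason to review the template by hand; an empty result is not proof of safety, so rendering in a sandboxed environment remains the actual control.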
Reactions from the security community
Following JVN's May disclosure, security researchers have raised concerns about SGLang's vendor responsiveness. Antiproof explicitly stated that the vendor did not respond during coordination, explaining why disclosure proceeded with no patch in hand.
Meanwhile LMSYS, the team behind SGLang, has been shipping features at pace — a Day 0 DeepSeek-V4 support post went up on April 25. Heavy feature work paired with security response that hasn't kept up is a familiar shape for fast-growing open-source projects.
Given SGLang's deployment footprint, the impact is hard to dismiss. The SGLang README lists major cloud providers running serious AI infrastructure, alongside AI-native startups like Cursor. Whether each of them is on the right version, has multimodal enabled, and has the ZeroMQ ports locked down isn't visible from outside — but every operator should be checking their own configuration now.
Closing
SGLang now has four critical, unauthenticated remote code execution vulnerabilities publicly disclosed. One is fixed in v0.5.11; the other three remain unpatched as of May 26, 2026, leaving multiple entry points open to attackers.
The remediation sequence is: upgrade first, then review your network boundaries, then tighten the policy around externally sourced model files. AI inference infrastructure deserves the same "don't accidentally expose it" treatment as internal databases or Kubernetes clusters — and that mindset needs to be shared with the infra team.
Looking at the past year (LiteLLM's supply chain incident, ShadowMQ, and now SGLang), the underlying cause is consistently the same: convenient Python features used in ways that are unsafe once untrusted input can reach them. The same structural problem is likely to keep surfacing in vLLM and other frameworks. Today's four are the visible tip of the iceberg.
References
- JVNVU#96879318 - Multiple vulnerabilities in SGLang (JPCERT/CC)
- VU#915947 - SGLang RCE when rendering chat templates (CERT/CC, CVE-2026-5760)
- VU#777338 - Two RCEs and a path traversal in SGLang (CERT/CC, CVE-2026-7301/7302/7304)
- VU#665416 - Unsafe pickle deserialization in SGLang (CERT/CC, CVE-2026-3059/3060/3989)
- SGLang CVE-2026-5760 (CVSS 9.8) Enables RCE via Malicious GGUF Model Files (The Hacker News)
- Three Remote Code Execution vulnerabilities in SGLang (Antiproof)
- ShadowMQ: How Code Reuse Spread Critical Vulnerabilities Across the AI Ecosystem (Oligo Security)
- Proof of Concept exploitation of CVE-2026-5760 (Stuart Beck, GitHub)
- sgl-project/sglang (GitHub)