vLLM Silently Ignores --trust-remote-code=False: Third RCE, CVE-2026-4944, Fixed in 0.18.0

News. Published May 29, 2026. Last updated May 29, 2026.

Key takeaways

vLLM silently overrides your --trust-remote-code=False via hardcoded True in two model files (nemotron_vl.py, kimi_k25.py). Malicious HuggingFace repos can trigger RCE. Fixed in vLLM 0.18.0. The third bypass in the series.

You set --trust-remote-code=False to opt out, but two model implementation files inside vLLM silently override your decision with a hardcoded trust_remote_code=True. Pointing the server at a malicious HuggingFace repository is enough to make the inference process execute remote Python that you explicitly refused.

In March 2026, the vLLM project published GitHub Security Advisory GHSA-7972-pg2x-xr59 (CVE-2026-4944, also tracked as CVE-2026-27893). The CVSS 3.0 score is 8.8, with vector AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H. Affected versions are vLLM 0.10.1 through 0.17.x, fixed in 0.18.0. PR #36192 was merged by Russell Bryant of Red Hat on March 6, 2026, and shipped in vLLM 0.18.0 on March 20.

This is the third trust_remote_code bypass discovered in vLLM. Following CVE-2025-66448 (the auto_map path) and CVE-2026-22807 (dynamic module loading in registry.py), this new one shows the same root pattern: the trust boundary is scattered across per-model implementation files, and any of them can hardcode-override the operator's security opt-out.

What happened — at a glance

The table below summarizes the dangerous combination: which CVE, which files, which models, which versions, and what triggers it.

CVE	Affected files	Affected models	Affected versions	Fixed in	Trigger
CVE-2026-4944 (aka CVE-2026-27893)	`nemotron_vl.py:430`, `kimi_k25.py:177`	Nemotron-VL family, Kimi-K2.5 family	0.10.1 through 0.17.x	0.18.0	Point vLLM at a malicious HF repo via the model path

The unusual part is that even with --trust-remote-code=False (or VLLM_TRUST_REMOTE_CODE=0) explicitly set, the hardcoded True takes effect the instant the vulnerable code path is reached. No warning is logged. The RAXE Labs advisory describes the override as "silent".
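As a rough aid for inventory work, the affected range from the table can be turned into a small check. This is a minimal sketch, not an official tool: the helper names `parse_release` and `is_affected` are made up for illustration, and treating release candidates and `.postN` builds like their base release is an assumption you may want to tighten.

```python
import re

AFFECTED_MIN = (0, 10, 1)  # first affected release, per the advisory range above
FIXED_IN = (0, 18, 0)      # first release containing the fix


def parse_release(version: str) -> tuple:
    """Extract the leading X.Y[.Z] numbers, ignoring suffixes such as '.post1' or '+cu128'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected(version: str) -> bool:
    """True when the version falls in [0.10.1, 0.18.0)."""
    return AFFECTED_MIN <= parse_release(version) < FIXED_IN


def installed_vllm_is_affected():
    """Check the locally installed vllm package; returns None if vllm is not installed."""
    from importlib.metadata import PackageNotFoundError, version
    try:
        return is_affected(version("vllm"))
    except PackageNotFoundError:
        return None
```

Calling `is_affected("0.17.2")` reports an affected release, while `is_affected("0.18.0")` does not. Note that a clean result only covers this CVE, not the earlier bypasses in the series.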

What `trust_remote_code` actually means

In the HuggingFace Transformers world, a model repository can ship its own modeling_*.py and tokenization_*.py Python files. This lets new architectures distribute themselves without waiting for the upstream library to add explicit support.

The catch: once trust_remote_code=True is in effect, those files are imported and executed the moment AutoModel.from_pretrained() or AutoConfig.from_pretrained() is called. Functionally, this is no different from running exec() on arbitrary Python pulled from the internet.

The trust_remote_code flag exists precisely so operators can opt in to that risk. HuggingFace's security documentation states that enabling trust_remote_code=True "allows execution of arbitrary Python code from remote, which seems similar in risk to running exec() on unvetted code fetched from the internet". An operator running vLLM with --trust-remote-code=False is therefore making an explicit security choice: under no circumstances do I allow remote Python execution on this server.

What makes CVE-2026-4944 ugly is that this choice is silently overruled, with no warning and no exception, and only when one of the affected model code paths is loaded.
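To make "a repository ships its own Python" concrete, the sketch below lists the `auto_map` entries in a repository's config.json, which is where a repo declares the custom classes it wants loaded. Several things here are assumptions: the function name `find_auto_maps` is invented, the example config is fabricated for illustration, the idea that nested sub-configs can carry their own `auto_map` is a guess worth checking per model, and the `other-repo--module.Class` form for cross-repo references reflects my understanding of Transformers and should be verified against your installed version.

```python
import json


def find_auto_maps(node, path="config"):
    """Yield (location, auto_class, reference) for every auto_map entry, including nested dicts."""
    if not isinstance(node, dict):
        return
    for key, value in node.items():
        if key == "auto_map" and isinstance(value, dict):
            for auto_class, ref in value.items():
                # A reference is a string, or a list/tuple of strings (possibly containing None).
                if isinstance(ref, str):
                    refs = [ref]
                elif isinstance(ref, (list, tuple)):
                    refs = [r for r in ref if isinstance(r, str)]
                else:
                    refs = []
                for r in refs:
                    yield (f"{path}.auto_map", auto_class, r)
        elif isinstance(value, dict):
            yield from find_auto_maps(value, f"{path}.{key}")


# Fabricated config.json content, for illustration only.
EXAMPLE_CONFIG = json.loads("""
{
  "model_type": "demo_vl",
  "auto_map": {
    "AutoConfig": "configuration_demo.DemoConfig",
    "AutoModel": "someone-else/other-repo--modeling_demo.DemoModel"
  },
  "vision_config": {
    "auto_map": {"AutoModel": "modeling_vision.DemoVisionModel"}
  }
}
""")

entries = list(find_auto_maps(EXAMPLE_CONFIG))
# References containing "--" point at code hosted in a different repository.
cross_repo = [ref for _, _, ref in entries if "--" in ref]
```

Any non-empty result means the repository expects custom code to run, so it deserves a manual read before it gets near a production server.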

Who wants this bug, and what do they walk off with

Three same-shape trust_remote_code bypasses in vLLM within roughly six months is not random noise; it means a specific class of attacker treats this surface as high-value. With production vLLM deployments in mind, here is who would actually pay to walk through this door, and what they take when they do.

The people who want this bug are concrete: Chinese and Eastern-European offensive AI research collectives, supply-chain operators flooding HuggingFace and PyPI with look-alike packages, rival AI startups hunting for a competitor's inference customer list, and ex-employees still nursing root credentials to their old company's internal LLM platform. What they want to pull out of an inference server is not abstract: the operator's OpenAI and Anthropic API keys, the HuggingFace private token, the AWS Bedrock cross-account role, cached customer contracts and engineering PDFs the in-house RAG pipeline ingested, raw conversation logs queued for fine-tuning, and the persistent SSH key sitting in the GPU cluster's secrets mount. The moment CVE-2026-4944 fires, every item above leaves the inference process's memory and environment for the attacker's terminal.

The reconnaissance step is shockingly cheap. HuggingFace download counts reveal who is shipping Nemotron-VL or Kimi-K2.5 into production, and company engineering blogs and job listings explicitly say "we run vLLM" and "we host Nemotron-VL for internal Q&A". From there it is a short walk through typosquatting (publishing nvidia-ai/nemotron-vl-8b next to the legitimate nvidia/nemotron-vl-8b), abandoned-repo takeover, or a planted Slack message saying "use this fork, it's faster". Any path that nudges the operator into typing the malicious model name on the command line is enough. Once the load succeeds, the attacker owns the Python process: lateral movement onto the internal network, exfiltration of model weights, and quiet persistence via cron or systemd are all available in whatever order they prefer.

CVSS 8.8 captures the technical severity of taking one inference server. For a company operating an AI inference platform, what actually walks out is the customer data still sitting in the prompt cache, the dozens of terabytes of internal knowledge gathered for retraining, and the credibility of every "our AI is run securely" statement made to customers. Worst of all, the teams that did the right thing — explicitly setting --trust-remote-code=False — are exactly the ones who answer "we're safe" right before being owned.

How far the blast reaches — the three-layer table

Per operating pattern, here is what CVE-2026-4944 takes and what it does not take (i.e., where a separate CVE is needed).

Deployment pattern	In reach via CVE-2026-4944	Out of reach (needs another CVE)
Solo dev running vLLM on a home GPU	Home directory; `~/.ssh` private keys; browser cookies	OS root itself; other users' home directories
SaaS inference provider	Every customer prompt in the vLLM process; stored API keys; DB connection strings	Hypervisor; other tenants' containers (depends on K8s)
Enterprise LLM platform (internal RAG)	Indexed internal docs; write access to the vector DB; internal Slack/Confluence tokens	Active Directory; the auth server (may be in reach if egress is open)

One caveat about "OS root is out of reach": many real-world vLLM deployments still run as root, particularly inside containers that ship with USER root. NVIDIA's official inference container images and several tutorials still default to root, which collapses the boundary above.
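One cheap way to keep that boundary from collapsing unnoticed is a startup guard in whatever wrapper script launches the server. The sketch below is a hypothetical helper, not part of vLLM; it assumes a POSIX host, since `os.geteuid()` is not available on Windows.

```python
import os
from typing import Optional


def running_as_root(euid: Optional[int] = None) -> bool:
    """True when the effective UID is 0. Pass euid explicitly to test without being root."""
    if euid is None:
        euid = os.geteuid()  # POSIX only
    return euid == 0


def assert_not_root(euid: Optional[int] = None) -> None:
    """Hypothetical startup guard: refuse to continue if the process would run as root."""
    if running_as_root(euid):
        raise RuntimeError("refusing to start the inference server as root")
```

Calling `assert_not_root()` before launching the server turns a silent root deployment into a loud failure. It does not replace setting a non-root user in the container image itself.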

Inside CVE-2026-4944 — where the hardcoded `True` lived

The vulnerable code lives in two model implementation files. The exact lines, taken from PR #36192:

CVE-2026-4944 (1/2): `nemotron_vl.py:430`

When loading the vision sub-model for NVIDIA's Nemotron-VL family, trust_remote_code=True was written inline.

Before (vllm/model_executor/models/nemotron_vl.py:430)

vision_model = AutoModel.from_config(
    config.vision_config,
    trust_remote_code=True,  # ← hardcoded here
)

After

vision_model = AutoModel.from_config(
    config.vision_config,
    trust_remote_code=self.config.model_config.trust_remote_code,
)

Even when the operator passes --trust-remote-code=False at startup, this file's code path forces True, causing the sub-component's modeling_*.py to be fetched from the remote repository and executed.

CVE-2026-4944 (2/2): `kimi_k25.py:177`

When loading the image_processor for Moonshot AI's Kimi-K2.5 family, the same pattern repeated.

Before (vllm/model_executor/models/kimi_k25.py:177)

image_processor = cached_get_image_processor(
    self.ctx.model_config.model,
    trust_remote_code=True,  # ← hardcoded here too
)

After

image_processor = cached_get_image_processor(
    self.ctx.model_config.model,
    trust_remote_code=self.ctx.model_config.trust_remote_code,
)

The fix is a trivial substitution: replace the literal True with self.config.model_config.trust_remote_code (or self.ctx.model_config.trust_remote_code) so the operator's setting actually propagates. The PR description states: "Replace hardcoded trust_remote_code=True with the user's configured trust_remote_code setting from model_config in both nemotron_vl.py and kimi_k25.py. This prevents bypassing the user's explicit --trust-remote-code=False security opt-out when loading sub-components."

These two lines should have been written this way from day one. Why they weren't connects directly to the "third bypass" structural story below.
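The before/after pattern can be reproduced in miniature without vLLM or Transformers installed. Everything in this sketch is a stand-in: `ModelConfig`, `FakeLoader`, and the two `load_vision_*` functions are hypothetical names that only mimic the shape of the bug. A real loader called with trust_remote_code=False would typically refuse a repo that requires custom code rather than quietly fall back.

```python
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    """Stand-in for the operator-facing configuration."""
    model: str
    trust_remote_code: bool = False


@dataclass
class FakeLoader:
    """Stand-in for a HuggingFace-style loader; records when repo-supplied code would run."""
    executed: list = field(default_factory=list)

    def load(self, name: str, *, trust_remote_code: bool) -> str:
        if trust_remote_code:
            self.executed.append(name)  # a real loader would import remote Python here
            return f"{name}: custom code executed"
        return f"{name}: built-in code only"


def load_vision_buggy(cfg: ModelConfig, loader: FakeLoader) -> str:
    # Mirrors the vulnerable shape: the literal True ignores the operator's setting.
    return loader.load(cfg.model, trust_remote_code=True)


def load_vision_fixed(cfg: ModelConfig, loader: FakeLoader) -> str:
    # Mirrors the fix: the operator's setting is propagated to the sub-component.
    return loader.load(cfg.model, trust_remote_code=cfg.trust_remote_code)


cfg = ModelConfig(model="untrusted/repo")  # operator left trust_remote_code at False
fixed_loader, buggy_loader = FakeLoader(), FakeLoader()
load_vision_fixed(cfg, fixed_loader)
load_vision_buggy(cfg, buggy_loader)
```

After running this, `fixed_loader.executed` stays empty while `buggy_loader.executed` contains the untrusted repo name, even though the configuration is identical in both cases.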

The third RCE in the same family

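This is the third trust_remote_code bypass in vLLM. The RAXE Labs advisory labels it explicitly as "3rd trust_remote_code bypass in vLLM" and frames it as a structural problem. The full lineage:

▸ CVE-2025-66448: the auto_map path
▸ CVE-2026-22807: dynamic module loading in registry.py
▸ CVE-2026-4944: hardcoded trust_remote_code=True in nemotron_vl.py and kimi_k25.py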

RAXE Labs' structural critique is direct: vLLM's model implementations are scattered across individual files, each of which independently manages the propagation of trust_remote_code. With no centralized trust-boundary enforcement, any new model file in which an engineer absent-mindedly types trust_remote_code=True becomes a brand-new hole that overrides production's False setting. That is exactly the pattern repeating itself.
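Because the problem is a literal keyword argument repeated across many files, it lends itself to a mechanical check. The following is a minimal sketch of such an audit using Python's standard `ast` module. The function names are invented for illustration, and the check has limits: it only catches the literal `trust_remote_code=True` at a call site, not values smuggled in through `**kwargs`, variables, or default arguments.

```python
import ast
from pathlib import Path


def find_hardcoded_trust(source: str, filename: str = "<string>") -> list:
    """Return sorted (line, callee) pairs for calls passing the literal trust_remote_code=True."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if (
                kw.arg == "trust_remote_code"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                hits.append((kw.value.lineno, ast.unparse(node.func)))
    return sorted(hits)


def scan_tree(root: str):
    """Yield (path, line, callee) for every hit under root; unparsable files are skipped."""
    for path in sorted(Path(root).rglob("*.py")):
        try:
            hits = find_hardcoded_trust(path.read_text(encoding="utf-8"), str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue
        for line, callee in hits:
            yield path, line, callee


SAMPLE = '''
vision_model = AutoModel.from_config(
    config.vision_config,
    trust_remote_code=True,
)
image_processor = cached_get_image_processor(
    name,
    trust_remote_code=cfg.trust_remote_code,
)
'''

sample_hits = find_hardcoded_trust(SAMPLE)
```

On the sample, only the first call is flagged; the second passes a propagated setting and is left alone. Run against an installed package directory, `scan_tree` gives a quick list of call sites to review by hand.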

Same hole in LMDeploy — CVE-2026-46432

On May 21, 2026, the InternLM-maintained inference server LMDeploy got its own version of the same finding: CVE-2026-46432, a hardcoded trust_remote_code=True in model initialization.

All versions of LMDeploy before 0.13.0 are affected; the fix lives in PR #4511. Same shape as vLLM: if an attacker controls the model path, remote Python from a HuggingFace repository runs on the inference server.

vLLM, LMDeploy, and SGLang all wrap HuggingFace Transformers for model loading. That means each of them ships its own minefield of "propagation mistakes" scattered across per-model implementation files. Now that OSS model distribution is standard practice, continuously auditing inference servers for "is the trust setting correctly propagated everywhere?" is an unavoidable line item for any vector-search or RAG team. For supply-chain coverage we maintain a tracker — see our OSS supply chain scanner writeup.

Five things to do this week

If you operate vLLM in production, work through the following in order. #1 is urgent; #2–#5 within the week.

#	Action	What that actually means
1	Upgrade to vLLM 0.18.0+	`pip install -U vllm` to 0.18.0 or newer. If already on 0.18.0+, this CVE is no longer reachable.
2	Pin model `revision`	Use `--revision <commit-sha>` to lock the HuggingFace repo to a specific commit. Defeats post-hoc tampering.
3	Review `modeling_*.py` up front	Read the `auto_map` and `modeling_*.py` of any HF repo before letting it touch production vLLM.
4	Restrict container egress	Allow only HuggingFace Hub and the S3/GCS buckets you actually need. Block `169.254.169.254` and similar metadata endpoints.
5	Track the vLLM version in SBOM	Add vllm to Dependabot/Renovate watchlists. Operate on the assumption that bypass #4 is coming.
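For action #2, a launcher script can refuse anything that is not a full commit SHA, since branch names and tags can be moved after you reviewed them. This is a sketch under assumptions: `pinned_serve_command` is a hypothetical helper, the model name and SHA in the usage lines are placeholders, and it presumes the server is started with `vllm serve`.

```python
import re
import shlex

FULL_SHA = re.compile(r"[0-9a-f]{40}")  # full git commit hash, lowercase hex


def pinned_serve_command(model: str, revision: str) -> list:
    """Build a vllm serve command line that is locked to one reviewed commit."""
    if not FULL_SHA.fullmatch(revision):
        raise ValueError("pin to a full 40-character commit SHA, not a branch or tag")
    return ["vllm", "serve", model, "--revision", revision]


# Placeholder values for illustration only.
command = pinned_serve_command("example-org/example-model", "0123456789abcdef0123456789abcdef01234567")
printable = shlex.join(command)
```

Returning an argument list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting concerns; `shlex.join` is only used for display.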

Action #4 (egress restriction) is the broadest payoff — well beyond this single CVE. Even if remote code somehow gets exec()'d on your inference server, restrictive egress cuts off C2 callbacks, blocks credential theft from AWS IMDSv1, and stops lateral movement into the internal network.
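The egress rule itself is simple enough to state as code. The sketch below models the decision only; real enforcement belongs in the network layer (security groups, Kubernetes network policies, or a proxy), not in the Python process an attacker may already control. The function name and the allowlisted range are hypothetical, and services behind CDNs usually need hostname-based rules rather than fixed IP ranges.

```python
import ipaddress


def egress_allowed(dest_ip: str, allowed_networks: list) -> bool:
    """Deny link-local destinations outright, then require membership in an allowlisted network."""
    ip = ipaddress.ip_address(dest_ip)
    if ip.is_link_local:
        # Covers 169.254.0.0/16, which includes the 169.254.169.254 metadata endpoint.
        return False
    return any(ip in ipaddress.ip_network(net) for net in allowed_networks)


# 203.0.113.0/24 is a documentation-only range, used here as a placeholder allowlist.
ALLOWLIST = ["203.0.113.0/24"]
metadata_blocked = not egress_allowed("169.254.169.254", ["0.0.0.0/0"])
```

The link-local check runs before the allowlist on purpose: even a permissive `0.0.0.0/0` entry added by mistake does not reopen the metadata endpoint.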

Closing — "I opted out" is the most dangerous assumption

What CVE-2026-4944 shows is that the intuition "I set the security flag to False, so I'm safe" doesn't hold in modern, multi-layered AI infrastructure. OSS libraries are assembled from many sub-components, and there is no guarantee that the user's False propagates correctly through every one of them. The third bypass in the series is the third signal driving that point home.

The best an operator can do is (a) pin inference-server dependencies and keep upgrading, (b) read the modeling_*.py of every HuggingFace model they load, and (c) lock down container egress. Most importantly, stop treating "--trust-remote-code=False is set" as a sufficient guarantee. Dropping that assumption is the strongest defense available today.

If you have not upgraded to vLLM 0.18.0 yet, check the official release notes and apply within the week.

References

▸ NVD - CVE-2026-4944 (published May 29, 2026)
▸ GitHub Security Advisory - GHSA-7972-pg2x-xr59 (March 26, 2026)
▸ vLLM PR #36192 - Respect user trust_remote_code setting in NemotronVL and KimiK25 (merged March 6, 2026)
▸ RAXE Labs - RAXE-2026-044: vLLM Hardcoded trust_remote_code Bypass (March 27, 2026)
▸ SentinelOne Vulnerability Database - CVE-2025-66448
▸ SentinelOne Vulnerability Database - CVE-2026-22807
▸ GitLab Advisory Database - CVE-2026-46432 (LMDeploy) (May 21, 2026)
▸ vLLM Releases - 0.18.0 (released March 20, 2026)
▸ HuggingFace Transformers - Security Overview

Makoto Horikawa

Backend Engineer / AWS / Django

