Researchers Just Scanned 1 Million Exposed AI Services. 31% of Public Ollama Servers Will Answer Anonymous Prompts. Here's the 10-Minute Saturday Fix.
Researchers ran a 293-day scan of the public internet, identified roughly 1 million exposed AI services across 2 million hosts, and found 175,000 publicly exposed Ollama servers across 130 countries. They tested a sample of 5,200 of those Ollama servers. 31% responded to anonymous queries with a model already loaded and ready to answer. Of those, 518 were wrapping paid frontier models from Anthropic, OpenAI, Google, DeepSeek, and Moonshot — meaning a stranger on the internet could burn through someone else's API budget by hitting an unauthenticated endpoint.
If you're a solo operator who self-hosts an LLM for any reason — local agent, internal tool, side project, Raspberry Pi experiment — there's a meaningful chance you're in this scan. The fix is mostly a 10-minute config change. Here's the audit and the patch path I just ran on my own infra.
What the scan actually found
The 1 million number is the headline but the breakdown matters. The 2 million hosts include vector databases, MCP servers, embedding endpoints, and prompt-routing infrastructure — not just LLM serving. Roughly 175,000 of those hosts were running Ollama specifically, which is the part most relevant to indie operators because Ollama is the default "spin up a local LLM in five minutes" tool.
Of the Ollama instances tested, 31% responded to a /api/tags call with a model loaded. That endpoint requires no authentication by default — it's the same endpoint the local Ollama UI uses to populate its model list. If it returns a model, the same server will accept a /api/generate call, also without authentication, and run inference on whatever prompt the attacker sends.
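To make that concrete, here's roughly what an anonymous probe looks like (the hostname is a placeholder, and the model name would be whatever /api/tags returned):

curl http://exposed-host:11434/api/tags

curl http://exposed-host:11434/api/generate \
  -d '{"model": "llama3", "prompt": "say hi", "stream": false}'

The first call requires no credentials and lists every model on the box; the second runs inference on the attacker's prompt, also without credentials.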
The most disturbing finding wasn't the unauthenticated Ollama instances themselves. It was that 518 of them were configured as proxies to paid frontier APIs. Someone, somewhere, set up Ollama as a routing layer in front of their Anthropic or OpenAI key, exposed it to the internet without authentication, and is now funding strangers' model calls. The API keys in question are presumably going to get rotated as soon as the operators notice the bill, but the operational pattern — "I'll just put my key behind a local proxy and forget about it" — is exactly the pattern indie operators reach for, and it's the pattern that produced the most expensive failures in the scan.
Why solo operators are uniquely in the blast radius
Most of the indie crowd self-hosts an LLM for at least one of three reasons. A local agent on a homelab box. A side-project demo deployed to a $5/month VPS. An "I'll move it to private later" experiment that became permanent because it was working fine.
All three patterns produce exactly the configuration the scan caught: Ollama bound to 0.0.0.0 (so it's reachable from outside the host), no authentication (Ollama's API has none built in), port 11434 open to the internet (because most VPS providers don't firewall it by default), and a fresh install left on default settings (because that's what the tutorial said to do).
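A quick way to check whether your own box matches that pattern is to look at which address Ollama is bound to. A sketch using ss from iproute2 (netstat works the same way):

ss -ltnp | grep 11434
# 127.0.0.1:11434          -> localhost only, unreachable from outside
# 0.0.0.0:11434 or *:11434 -> reachable from anywhere unless a firewall blocks it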
Ollama does not ship with authentication, so there is nothing to turn on, and the most popular indie-developer YouTube tutorials walking through Ollama setup don't put anything in front of it either. The result is a 175,000-server attack surface that the indie ecosystem helped build, one "ship the demo and worry about security later" tutorial at a time.
This isn't a criticism of Ollama specifically — most self-hosted LLM tools have the same default. It's a criticism of the operational pattern that's spread across the indie ecosystem in the last 18 months as local LLMs got good enough to deploy.
The 10-minute Saturday fix
In order, with the actual commands:
Step 1: Check whether you have an exposed Ollama instance. From a network that is not your home network — your phone on cellular works fine — run:
curl http://your-server-ip:11434/api/tags
If it responds with a JSON list of models, you're exposed. If the connection is refused or times out, you're either firewalled or bound to localhost. Repeat this for every server you've ever run Ollama on. The forgotten-VPS case is the one that gets people.
Step 2: Bind Ollama to localhost only. Set the environment variable:
OLLAMA_HOST=127.0.0.1
Add it to your shell profile or systemd unit, then restart the Ollama service. This single change fixes the majority of cases: the model is still accessible from the same host, but no longer from the internet.
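If you installed Ollama with the Linux install script, it runs as a systemd service and the cleanest place for the variable is a unit override. A sketch assuming the default service name ollama:

sudo systemctl edit ollama
# In the override that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=127.0.0.1"
sudo systemctl restart ollama
curl -s http://127.0.0.1:11434/api/tags   # should still answer locally
ss -ltnp | grep 11434                     # should now show only 127.0.0.1:11434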
Step 3: If you actually need remote access, put Cloudflare Tunnel or Tailscale in front of it. Cloudflare Tunnel plus an Access policy is free and gives you authenticated access via a Cloudflare-fronted URL. Tailscale gives you a private network where the server is only reachable by your enrolled devices. Pick whichever fits your workflow. Never expose port 11434 directly to the internet, even with "I'll add auth later" intent — the "later" never comes.
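A minimal sketch of the Tailscale route, assuming Tailscale is installed and the server has joined your tailnet with tailscale up (100.x.y.z stands in for whatever address tailscale ip prints):

tailscale ip -4                      # the machine's tailnet address, e.g. 100.x.y.z
OLLAMA_HOST=100.x.y.z ollama serve   # or set it in the same systemd override as above
# Only devices enrolled in your tailnet can now reach port 11434.

On the Cloudflare side, the pattern is a named tunnel pointed at http://localhost:11434 with an Access policy in front of it; the quick unauthenticated variant (cloudflared tunnel --url http://localhost:11434) is fine for a test, not for leaving up.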
Step 4: Audit your reverse-proxy logs. If you've been running an Ollama instance with a reverse proxy (Caddy, Nginx, Traefik), grep the access logs for hits to /api/tags, /api/generate, and /api/chat over the last 30 days. Filter for IP addresses you don't recognize. If you find unfamiliar traffic from outside your usual IPs, you've been hit.
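A sketch for an nginx access log in the default combined format; the path and the parsing change if you're on Caddy (JSON logs) or Traefik:

grep -E '"(GET|POST) /api/(tags|generate|chat)' /var/log/nginx/access.log \
  | awk '{print $1}' | sort | uniq -c | sort -rn
# Prints request counts per client IP for the Ollama endpoints.
# High counts from addresses you don't recognize are the red flag.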
Step 5: If you found unfamiliar traffic, rotate API keys. Specifically, rotate any API keys the proxied service had access to — Anthropic, OpenAI, whatever was on the routing path. Exfiltrating cached system prompts via prompt injection is a real risk on an exposed instance, and "they were just hitting it for cheap inference" is the optimistic interpretation. The pessimistic interpretation is that they pulled your system prompt and any embedded credentials.
The whole audit takes about 10 minutes per server unless you find something, in which case it takes a couple of hours to clean up. Worth doing this Saturday before the scan results spread and the weaponization rate goes up.
The 16-day pattern this is the latest example of
Solo operators leak attack surface through self-hosted infrastructure, and the rate has been accelerating. A short timeline of the last 16 days:
- April 21: LMDeploy SSRF against self-hosted inference servers.
- April 26: Bitwarden npm package compromise via stolen 2FA on a maintainer account.
- April 28: LeRobot pickle RCE in the robotics control plane, plus GitHub CVE-2026-3854, a platform-layer git push RCE.
- April 30: cPanel zero-day under active exploitation, a Linux kernel local privilege escalation (CVE-2026-31431) affecting every kernel since 2017, and the CanisterWorm self-propagating npm worm hitting SAP and Intercom packages.
- May 4: MOVEit Automation auth bypass (CVE-2026-4670).
- Today: the structural Ollama exposure scan.
That's eight distinct security wedges in 16 days before today's scan even landed, all hitting infrastructure that solo operators specifically over-index on. The honest takeaway is that "self-hosted" is no longer a safe default. It's a posture that requires active maintenance the same way "production code" requires it. The "spin it up and forget" model is gone.
This isn't a panic-migrate-everything-to-managed-services argument. It's an honest accounting of the operational overhead self-hosting now requires. If you have the operational chops and the time, self-hosting still makes sense for the right workloads. If you don't, the math has shifted.
The MCP server angle nobody is connecting
The same scan caught roughly 80,000 exposed MCP server instances. MCP is the protocol indie devs are doing the most rapid prototyping in right now, and the tooling has the same "default unauthenticated" problem Ollama has — the mcp dev command spins up a server bound to all interfaces with no auth, because that's what makes the dev loop fast.
If you've shipped an MCP server to a client or to your own production stack in the last 90 days, the audit needs to extend there. From outside your network, point the MCP Inspector at your endpoint and verify whether tool calls require authentication. If they don't, the same "anonymous user can run inference on your dime" problem applies, except now they can also call any tool you've exposed — including ones that touch your file system, your databases, or your APIs.
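Two quick ways to probe it, sketched under the assumption that the server uses the Python SDK's default SSE transport on port 8000; adjust the URL to whatever you actually deployed:

npx @modelcontextprotocol/inspector
# ...then point the Inspector at http://your-server:8000/sse and see whether it
# connects and lists tools without credentials.

curl -i http://your-server:8000/sse
# A 200 with an event stream and no auth challenge means anonymous clients can
# open sessions against the server.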
The fix pattern is the same: bind to localhost by default, use a tunnel or VPN for remote access, and add authentication before you expose anything to the internet. None of this is hard. All of it is easy to skip, which is why it gets skipped.
Should you stop self-hosting?
Honest answer for most indie workloads: yes.
The DeepSeek V3.2 / Qwen 3.6 / GLM-4.7 cohort runs at $0.11 per million tokens on managed providers — Together, Fireworks, OpenRouter. Below 100 million tokens per month, the operational overhead of patching, securing, and monitoring a self-hosted instance now exceeds the cost differential. At that volume, managed inference tops out around $11 a month plus the one-time effort of swapping providers; self-hosting costs $0 in per-token fees plus an unbounded number of hours in security upkeep.
The exception, and it's a real one: legitimate privacy or air-gap requirements. Working with PHI under HIPAA. Working with regulated data under GDPR or local data-sovereignty laws. Customer contracts that prohibit cloud inference. In those cases, self-hosting still justifies itself, but the security posture has to match the use case. That means real authentication, real network segmentation, real audit logging, and real monthly maintenance — not the "fresh Ollama install on a $5 VPS" pattern most of the 175,000 exposed servers represent.
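For the cases where self-hosting does still earn its keep, "real authentication" can be as small as a reverse proxy with credentials in front of the localhost-bound instance. A sketch using Caddy v2 (the domain and username are placeholders; older Caddy releases spell the directive basicauth):

caddy hash-password                      # prompts for a password, prints a bcrypt hash
sudo tee /etc/caddy/Caddyfile > /dev/null <<'EOF'
llm.example.com {
    basic_auth {
        you <paste-the-bcrypt-hash-here>
    }
    reverse_proxy 127.0.0.1:11434
}
EOF
sudo systemctl reload caddy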
The shift from "self-hosting is the default for indie LLM workloads" to "managed inference is the default and self-hosting is the exception" is the right one for most operators in 2026. The scan is the latest data point arguing for it.
What to do right now
Three things, in order. First, run the curl check from a non-home network against every server you've ever run Ollama on. Five minutes per server. Second, if any of them respond, set OLLAMA_HOST=127.0.0.1 and restart. Five minutes per server. Third, if you find unfamiliar traffic in your access logs, rotate every API key that was on the routing path. An hour or two depending on how many keys.
If all of your Ollama instances pass the check on the first try, congratulations — you've already done the security work the indie ecosystem needs to make standard. If any of them fail, you're now in the 31%, and Saturday is when to fix it.
Sources
- The Hacker News — We Scanned 1 Million Exposed AI Services
- The Hacker News — 175,000 Publicly Exposed Ollama AI Servers (January 2026 baseline)
- Security Boulevard — Exposed LLM Infrastructure: How Attackers Find and Exploit Misconfigured AI Deployments
- BankInfoSecurity — Exposed LLM Servers Expose Ollama Risks
- TechRadar — Over 175,000 publicly exposed Ollama AI servers discovered worldwide