Using vLLM with OpenClaw
This guide shows you how to connect a vLLM server to OpenClaw using the OpenAI-compatible `openai-completions` API. You will configure both auto-discovered and explicit vLLM models, including custom base URLs and token limits.
By the end, your OpenClaw agents will call vLLM models via the `vllm` provider ID.
Prerequisites
- A running vLLM server exposing OpenAI-compatible `/v1` endpoints such as `/v1/models` and `/v1/chat/completions`.
- Network access from your OpenClaw runtime to the vLLM base URL, commonly `http://127.0.0.1:8000/v1`.
- The OpenClaw CLI installed so you can run `openclaw models list --provider vllm`.
Steps
1. Start vLLM with an OpenAI-compatible server
Run vLLM in OpenAI-compatible server mode so it exposes `/v1` endpoints that OpenClaw can call. Your base URL should serve `/v1/models` and `/v1/chat/completions`; during local development it commonly runs at `http://127.0.0.1:8000/v1`.
```text
GET http://127.0.0.1:8000/v1/models
```
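Before wiring up OpenClaw, you can confirm the endpoint responds. A minimal connectivity probe, assuming the default local address (the helper names here are illustrative, not part of OpenClaw or vLLM):

```python
import json
import urllib.error
import urllib.request

def models_url(base_url: str) -> str:
    # Normalize the base URL and append the OpenAI-compatible model list path.
    return base_url.rstrip("/") + "/models"

def vllm_reachable(base_url: str, timeout: float = 5.0) -> bool:
    # True when the server answers /v1/models with parseable JSON.
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            json.load(resp)
        return True
    except (urllib.error.URLError, ValueError, OSError):
        return False

if __name__ == "__main__":
    print("vLLM reachable:", vllm_reachable("http://127.0.0.1:8000/v1"))
```

If this prints `False`, fix connectivity before touching any OpenClaw configuration.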
2. Set the VLLM_API_KEY environment variable
Set `VLLM_API_KEY` so OpenClaw knows to enable the vLLM provider and, if you do not configure it explicitly, to auto-discover models. If your vLLM server does not enforce auth, any non-empty value works as the opt-in signal.
```bash
export VLLM_API_KEY="vllm-local"
```
3. Select a vLLM model for your agents
Point your agent defaults at a vLLM model by using the `vllm/` prefix and a model ID that exists on your vLLM server. This is the minimal configuration when you rely on auto-discovery and the default base URL.
```json
{
  agents: {
    defaults: {
      model: { primary: "vllm/your-model-id" },
    },
  },
}
```
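The `vllm/` prefix is a provider-qualified model reference. OpenClaw's own parsing is internal, but a sketch of the convention, assuming the split happens at the first slash (model IDs themselves may contain slashes), looks like:

```python
def parse_model_ref(ref: str) -> tuple[str, str]:
    # Split "provider/model-id" at the first slash only;
    # model IDs such as "org/model-name" keep their own slashes.
    provider, sep, model_id = ref.partition("/")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected 'provider/model-id', got {ref!r}")
    return provider, model_id

print(parse_model_ref("vllm/your-model-id"))      # ('vllm', 'your-model-id')
print(parse_model_ref("vllm/org/your-model-id"))  # ('vllm', 'org/your-model-id')
```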
4. Verify that OpenClaw can list vLLM models
Use the models CLI to confirm that OpenClaw can reach vLLM and list available models. If this command fails or returns no models, you either have a connectivity issue or auto-discovery is disabled by explicit config.
```bash
openclaw models list --provider vllm
```
5. Configure vLLM explicitly with local models and limits
Switch to explicit configuration when you need a non-default host/port, pinned `contextWindow` or `maxTokens`, or custom auth behavior. This block defines the `vllm` provider on the default base URL and declares a local model with cost and token settings.
```json
{
  models: {
    providers: {
      vllm: {
        baseUrl: "http://127.0.0.1:8000/v1",
        apiKey: "${VLLM_API_KEY}",
        api: "openai-completions",
        models: [
          {
            id: "your-model-id",
            name: "Local vLLM Model",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 128000,
            maxTokens: 8192,
          },
        ],
      },
    },
  },
}
```
6. Point OpenClaw at a remote vLLM server with a custom base URL
When vLLM runs on another host or port, set `baseUrl` in the provider config so OpenClaw targets the correct `/v1` endpoint. This example also shows how to define a remote model with its own context window and max token limits.
```json
{
  models: {
    providers: {
      vllm: {
        baseUrl: "http://192.168.1.50:9000/v1",
        apiKey: "${VLLM_API_KEY}",
        api: "openai-completions",
        models: [
          {
            id: "my-custom-model",
            name: "Remote vLLM Model",
            reasoning: false,
            input: ["text"],
            contextWindow: 64000,
            maxTokens: 4096,
          },
        ],
      },
    },
  },
}
```
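With either configuration in place, you can smoke-test the same endpoint OpenClaw will call. A sketch of a direct chat-completion request, assuming the default local address (the model ID and prompt are placeholders):

```python
import json
import os
import urllib.error
import urllib.request

def build_chat_request(base_url: str, api_key: str,
                       model: str, prompt: str) -> urllib.request.Request:
    # Assemble a POST to the OpenAI-compatible chat completions endpoint,
    # the same shape the openai-completions API type uses.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

if __name__ == "__main__":
    req = build_chat_request(
        "http://127.0.0.1:8000/v1",
        os.environ.get("VLLM_API_KEY", "vllm-local"),
        "your-model-id",
        "Say hello.",
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            print(json.load(resp)["choices"][0]["message"]["content"])
    except urllib.error.URLError as err:
        print("request failed:", err)
```

A successful reply here means any remaining problem is in the OpenClaw config, not the server.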
Configuration
| Option | Description | Example |
|---|---|---|
| VLLM_API_KEY | Auth token for the vLLM OpenAI-compatible server and the opt-in signal that enables the vLLM provider and model auto-discovery when no explicit `models.providers.vllm` is defined. | vllm-local |
| models.providers.vllm.baseUrl | The base URL for the vLLM OpenAI-compatible `/v1` API that OpenClaw calls. | http://127.0.0.1:8000/v1 |
| models.providers.vllm.apiKey | The API key value OpenClaw sends to vLLM, typically wired from `VLLM_API_KEY`. | ${VLLM_API_KEY} |
| models.providers.vllm.api | The API type OpenClaw uses for vLLM; vLLM uses the OpenAI-compatible completions API. | openai-completions |
| agents.defaults.model.primary | The default primary model reference for your agents, using the `vllm/` prefix and a vLLM model ID. | vllm/your-model-id |
| models.providers.vllm.models[].contextWindow | The maximum context window size in tokens that OpenClaw assumes for the vLLM model. | 128000 |
| models.providers.vllm.models[].maxTokens | The maximum number of output tokens OpenClaw requests from the vLLM model. | 8192 |
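`contextWindow` bounds the total tokens in a request (prompt plus output), while `maxTokens` caps the output alone, so a prompt only fits if it leaves room for the full output budget. A quick sanity check under that assumption (helper names are illustrative):

```python
def max_prompt_tokens(context_window: int, max_tokens: int) -> int:
    # Room left for the prompt once the full output budget is reserved.
    return context_window - max_tokens

def request_fits(prompt_tokens: int, context_window: int, max_tokens: int) -> bool:
    # A request fits when prompt plus the output budget stays inside the window.
    return prompt_tokens + max_tokens <= context_window

print(max_prompt_tokens(128000, 8192))     # 119808
print(request_fits(120000, 128000, 8192))  # False: 120000 + 8192 > 128000
```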
Troubleshooting
curl to the vLLM models endpoint fails or hangs when checking connectivity
OpenClaw cannot reach vLLM if the server is down, bound to a different host/port, or not running in OpenAI-compatible mode. Hit the models endpoint directly to confirm connectivity and that `/v1` is exposed.
```bash
curl http://127.0.0.1:8000/v1/models
```

Requests to vLLM fail with auth errors even though VLLM_API_KEY is set
Your vLLM server likely expects a specific API key or header configuration. Define the provider explicitly under `models.providers.vllm` so you control the auth behavior.
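If you started vLLM with a key (for example via its `--api-key` flag), requests must carry a matching bearer token. A sketch for checking auth directly, outside OpenClaw (helper names are illustrative):

```python
import json
import os
import urllib.error
import urllib.request

def auth_headers(api_key: str) -> dict:
    # vLLM's OpenAI-compatible server expects a standard bearer token.
    return {"Authorization": f"Bearer {api_key}"}

def list_models(base_url: str, api_key: str) -> list:
    # Fetch model IDs from /v1/models using the same key OpenClaw would send.
    req = urllib.request.Request(base_url.rstrip("/") + "/models",
                                 headers=auth_headers(api_key))
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return [m["id"] for m in json.load(resp)["data"]]
    except urllib.error.HTTPError as err:
        # A 401/403 here means the key does not match what vLLM was started with.
        print(f"auth failed: HTTP {err.code}")
        return []

if __name__ == "__main__":
    try:
        print(list_models("http://127.0.0.1:8000/v1",
                          os.environ.get("VLLM_API_KEY", "")))
    except urllib.error.URLError as err:
        print("could not reach vLLM:", err)
```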
No models appear when you run `openclaw models list --provider vllm`
Auto-discovery runs only when `VLLM_API_KEY` is set and there is no explicit `models.providers.vllm` config entry. If you have defined the provider manually, OpenClaw skips discovery and uses only your declared models, so add your vLLM models to the explicit config.
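That discovery rule can be condensed into a small predicate (a sketch of the documented behavior, not OpenClaw's actual code):

```python
def vllm_models_source(api_key_set: bool, explicit_provider: bool) -> str:
    # Mirrors the documented rule: explicit config wins; otherwise the
    # env var opts in to auto-discovery; otherwise the provider is off.
    if explicit_provider:
        return "declared models only"
    if api_key_set:
        return "auto-discovered from /v1/models"
    return "provider disabled"

print(vllm_models_source(True, False))  # auto-discovered from /v1/models
print(vllm_models_source(True, True))   # declared models only
print(vllm_models_source(False, False)) # provider disabled
```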