Using vLLM with OpenClaw
This guide shows you how to connect a vLLM server to OpenClaw using the OpenAI-compatible `openai-completions` API. You will configure both auto-discovered and explicit vLLM models, including custom base URLs and token limits.
By the end, your OpenClaw agents will call vLLM models via the `vllm` provider ID.
Prerequisites
- A running vLLM server exposing OpenAI-compatible `/v1` endpoints such as `/v1/models` and `/v1/chat/completions`.
- Network access from your OpenClaw runtime to the vLLM base URL, commonly `http://127.0.0.1:8000/v1`.
- The OpenClaw CLI installed so you can run `openclaw models list --provider vllm`.
Steps
1. Start vLLM with an OpenAI-compatible server
Run vLLM in OpenAI-compatible server mode so it exposes `/v1` endpoints that OpenClaw can call. Your base URL should serve `/v1/models` and `/v1/chat/completions`; during local development it commonly runs at `http://127.0.0.1:8000/v1`.
```text
GET http://127.0.0.1:8000/v1/models
```
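Before wiring up OpenClaw, you can confirm the endpoint responds. A minimal connectivity probe, assuming the default local address (the helper names here are illustrative, not part of OpenClaw or vLLM):

```python
import json
import urllib.error
import urllib.request

def models_url(base_url: str) -> str:
    # Normalize the base URL and append the OpenAI-compatible model list path.
    return base_url.rstrip("/") + "/models"

def vllm_reachable(base_url: str, timeout: float = 5.0) -> bool:
    # True when the server answers /v1/models with parseable JSON.
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            json.load(resp)
        return True
    except (urllib.error.URLError, ValueError, OSError):
        return False

if __name__ == "__main__":
    print("vLLM reachable:", vllm_reachable("http://127.0.0.1:8000/v1"))
```

If this prints `False`, fix connectivity before touching any OpenClaw configuration.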
2. Set the VLLM_API_KEY environment variable
Set `VLLM_API_KEY` so OpenClaw knows to enable the vLLM provider and, if you do not configure it explicitly, to auto-discover models. If your vLLM server does not enforce auth, any non-empty value works as the opt-in signal.
```bash
export VLLM_API_KEY="vllm-local"
```
3. Select a vLLM model for your agents
Point your agent defaults at a vLLM model by using the `vllm/` prefix and a model ID that exists on your vLLM server. This is the minimal configuration when you rely on auto-discovery and the default base URL.
```json
{
  agents: {
    defaults: {
      model: { primary: "vllm/your-model-id" },
    },
  },
}
```
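The `vllm/` prefix is a provider-qualified model reference. OpenClaw's own parsing is internal, but a sketch of the convention, assuming the split happens at the first slash (model IDs themselves may contain slashes), looks like:

```python
def parse_model_ref(ref: str) -> tuple[str, str]:
    # Split "provider/model-id" at the first slash only;
    # model IDs such as "org/model-name" keep their own slashes.
    provider, sep, model_id = ref.partition("/")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected 'provider/model-id', got {ref!r}")
    return provider, model_id

print(parse_model_ref("vllm/your-model-id"))      # ('vllm', 'your-model-id')
print(parse_model_ref("vllm/org/your-model-id"))  # ('vllm', 'org/your-model-id')
```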
4. Verify that OpenClaw can list vLLM models
Use the models CLI to confirm that OpenClaw can reach vLLM and list available models. If this command fails or returns no models, you either have a connectivity issue or auto-discovery is disabled by explicit config.
```bash
openclaw models list --provider vllm
```
5. Configure vLLM explicitly with local models and limits
Switch to explicit configuration when you need a non-default host/port, pinned `contextWindow` or `maxTokens`, or custom auth behavior. This block defines the `vllm` provider on the default base URL and declares a local model with cost and token settings.
```json
{
  models: {
    providers: {
      vllm: {
        baseUrl: "http://127.0.0.1:8000/v1",
        apiKey: "${VLLM_API_KEY}",
        api: "openai-completions",
        models: [
          {
            id: "your-model-id",
            name: "Local vLLM Model",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 128000,
            maxTokens: 8192,
          },
        ],
      },
    },
  },
}
```
6. Point OpenClaw at a remote vLLM server with a custom base URL
When vLLM runs on another host or port, set `baseUrl` in the provider config so OpenClaw targets the correct `/v1` endpoint. This example also shows how to define a remote model with its own context window and max token limits.
```json
{
  models: {
    providers: {
      vllm: {
        baseUrl: "http://192.168.1.50:9000/v1",
        apiKey: "${VLLM_API_KEY}",
        api: "openai-completions",
        models: [
          {
            id: "my-custom-model",
            name: "Remote vLLM Model",
            reasoning: false,
            input: ["text"],
            contextWindow: 64000,
            maxTokens: 4096,
          },
        ],
      },
    },
  },
}
```
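With either configuration in place, you can smoke-test the same endpoint OpenClaw will call. A sketch of a direct chat-completion request, assuming the default local address (the model ID and prompt are placeholders):

```python
import json
import os
import urllib.error
import urllib.request

def build_chat_request(base_url: str, api_key: str,
                       model: str, prompt: str) -> urllib.request.Request:
    # Assemble a POST to the OpenAI-compatible chat completions endpoint,
    # the same shape the openai-completions API type uses.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

if __name__ == "__main__":
    req = build_chat_request(
        "http://127.0.0.1:8000/v1",
        os.environ.get("VLLM_API_KEY", "vllm-local"),
        "your-model-id",
        "Say hello.",
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            print(json.load(resp)["choices"][0]["message"]["content"])
    except urllib.error.URLError as err:
        print("request failed:", err)
```

A successful reply here means any remaining problem is in the OpenClaw config, not the server.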
Configuration
| Option | Description | Example |
|---|---|---|
| VLLM_API_KEY | Auth token for the vLLM OpenAI-compatible server and the opt-in signal that enables the vLLM provider and model auto-discovery when no explicit `models.providers.vllm` is defined. | vllm-local |
| models.providers.vllm.baseUrl | The base URL for the vLLM OpenAI-compatible `/v1` API that OpenClaw calls. | http://127.0.0.1:8000/v1 |
| models.providers.vllm.apiKey | The API key value OpenClaw sends to vLLM, typically wired from `VLLM_API_KEY`. | ${VLLM_API_KEY} |
| models.providers.vllm.api | The API type OpenClaw uses for vLLM; vLLM uses the OpenAI-compatible completions API. | openai-completions |
| agents.defaults.model.primary | The default primary model reference for your agents, using the `vllm/` prefix and a vLLM model ID. | vllm/your-model-id |
| models.providers.vllm.models[].contextWindow | The maximum context window size in tokens that OpenClaw assumes for the vLLM model. | 128000 |
| models.providers.vllm.models[].maxTokens | The maximum number of output tokens OpenClaw requests from the vLLM model. | 8192 |
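`contextWindow` bounds the total tokens in a request (prompt plus output), while `maxTokens` caps the output alone, so a prompt only fits if it leaves room for the full output budget. A quick sanity check under that assumption (helper names are illustrative):

```python
def max_prompt_tokens(context_window: int, max_tokens: int) -> int:
    # Room left for the prompt once the full output budget is reserved.
    return context_window - max_tokens

def request_fits(prompt_tokens: int, context_window: int, max_tokens: int) -> bool:
    # A request fits when prompt plus the output budget stays inside the window.
    return prompt_tokens + max_tokens <= context_window

print(max_prompt_tokens(128000, 8192))     # 119808
print(request_fits(120000, 128000, 8192))  # False: 120000 + 8192 > 128000
```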
Troubleshooting
curl to the vLLM models endpoint fails or hangs when checking connectivity
OpenClaw cannot reach vLLM if the server is down, bound to a different host/port, or not running in OpenAI-compatible mode. Hit the models endpoint directly to confirm connectivity and that `/v1` is exposed.
```bash
curl http://127.0.0.1:8000/v1/models
```

Requests to vLLM fail with auth errors even though VLLM_API_KEY is set
Your vLLM server likely expects a specific API key or header configuration. Define the provider explicitly under `models.providers.vllm` so you control the auth behavior.
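If you started vLLM with a key (for example via its `--api-key` flag), requests must carry a matching bearer token. A sketch for checking auth directly, outside OpenClaw (helper names are illustrative):

```python
import json
import os
import urllib.error
import urllib.request

def auth_headers(api_key: str) -> dict:
    # vLLM's OpenAI-compatible server expects a standard bearer token.
    return {"Authorization": f"Bearer {api_key}"}

def list_models(base_url: str, api_key: str) -> list:
    # Fetch model IDs from /v1/models using the same key OpenClaw would send.
    req = urllib.request.Request(base_url.rstrip("/") + "/models",
                                 headers=auth_headers(api_key))
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return [m["id"] for m in json.load(resp)["data"]]
    except urllib.error.HTTPError as err:
        # A 401/403 here means the key does not match what vLLM was started with.
        print(f"auth failed: HTTP {err.code}")
        return []

if __name__ == "__main__":
    try:
        print(list_models("http://127.0.0.1:8000/v1",
                          os.environ.get("VLLM_API_KEY", "")))
    except urllib.error.URLError as err:
        print("could not reach vLLM:", err)
```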
No models appear when you run `openclaw models list --provider vllm`
Auto-discovery runs only when `VLLM_API_KEY` is set and there is no explicit `models.providers.vllm` config entry. If you have defined the provider manually, OpenClaw skips discovery and uses only your declared models, so add your vLLM models to the explicit config.
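That discovery rule can be condensed into a small predicate (a sketch of the documented behavior, not OpenClaw's actual code):

```python
def vllm_models_source(api_key_set: bool, explicit_provider: bool) -> str:
    # Mirrors the documented rule: explicit config wins; otherwise the
    # env var opts in to auto-discovery; otherwise the provider is off.
    if explicit_provider:
        return "declared models only"
    if api_key_set:
        return "auto-discovered from /v1/models"
    return "provider disabled"

print(vllm_models_source(True, False))  # auto-discovered from /v1/models
print(vllm_models_source(True, True))   # declared models only
print(vllm_models_source(False, False)) # provider disabled
```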