Using Inferrs with OpenClaw
This guide shows you how to wire up Inferrs as an OpenAI-compatible backend for OpenClaw using the generic openai-completions path. You will start a local Inferrs server, verify it with curl, and register it as a provider so your agents can run against a Gemma 4 model.
By the end, you will have OpenClaw calling a local Inferrs-served model, with a couple of compatibility flags tuned for the quirks of Inferrs’ chat API.
Prerequisites
- An Inferrs installation with the `inferrs` CLI available on your PATH.
- A local model that Inferrs can serve, such as `google/gemma-4-E2B-it`.
- An OpenClaw setup where you can edit the agents and models configuration and run `openclaw infer model run`.
Steps
1. Start Inferrs with a local model
Start the Inferrs server and bind it to a host and port that OpenClaw can reach. This example binds to `127.0.0.1:8080` using the `metal` device, which the later OpenClaw config expects.
```bash
inferrs serve google/gemma-4-E2B-it \
  --host 127.0.0.1 \
  --port 8080 \
  --device metal
```
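
If you script this setup, you may want to wait until the server is actually ready before moving on. A minimal sketch using only standard shell tools, assuming the `/health` endpoint returns a 2xx status once the model is loaded:

```bash
# Poll the health endpoint until Inferrs responds (assumes /health returns 2xx once the model is loaded).
until curl -sf http://127.0.0.1:8080/health > /dev/null; do
  echo "waiting for inferrs on 127.0.0.1:8080 ..."
  sleep 1
done
echo "inferrs is up"
```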
2. Verify the Inferrs server is reachable
Before touching OpenClaw, confirm that Inferrs is actually listening and exposing the OpenAI-compatible endpoints. These curl checks hit the health probe and list models; if either fails, fix Inferrs networking or model loading first.
```bash
curl http://127.0.0.1:8080/health
curl http://127.0.0.1:8080/v1/models
```
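
If you have `jq` installed, you can go one step further and confirm that the model id you plan to reference actually shows up in the list. This is only a sketch and assumes Inferrs follows the standard OpenAI list shape with a top-level `data` array:

```bash
# List model ids and check the Gemma entry is present (assumes the OpenAI-style "data" array).
curl -s http://127.0.0.1:8080/v1/models | jq -r '.data[].id' | grep -x 'google/gemma-4-E2B-it' \
  && echo "model is registered" \
  || echo "model not found - check the inferrs serve command"
```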
3. Configure Inferrs as an OpenClaw provider
Add a provider entry that points OpenClaw at your Inferrs `/v1` base URL and describes the model capabilities. Set `compat.requiresStringContent` so OpenClaw flattens content into plain strings for Inferrs.
```json
{
  agents: {
    defaults: {
      model: {
        primary: "inferrs/google/gemma-4-E2B-it",
      },
      models: {
        "inferrs/google/gemma-4-E2B-it": {
          alias: "Gemma 4 (inferrs)",
        },
      },
    },
  },
  models: {
    mode: "merge",
    providers: {
      inferrs: {
        baseUrl: "http://127.0.0.1:8080/v1",
        apiKey: "inferrs-local",
        api: "openai-completions",
        models: [
          {
            id: "google/gemma-4-E2B-it",
            name: "Gemma 4 E2B (inferrs)",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 131072,
            maxTokens: 4096,
            compat: {
              requiresStringContent: true,
            },
          },
        ],
      },
    },
  },
}
```
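
Because `apiKey` is only a placeholder for a local server, it is worth confirming that Inferrs accepts (or simply ignores) the bearer token OpenClaw will send. A quick check, assuming Inferrs does not enforce authentication for local serving:

```bash
# Repeat the models call with the same placeholder key OpenClaw will use.
# A local Inferrs server will typically ignore this header; a 401 here would
# mean the apiKey value needs to match whatever Inferrs actually expects.
curl -s -H 'authorization: Bearer inferrs-local' http://127.0.0.1:8080/v1/models
```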
4. Run a manual Inferrs chat completion smoke test
Test Inferrs directly with a minimal `/v1/chat/completions` request to confirm the model responds before involving OpenClaw. This isolates Inferrs issues from OpenClaw configuration problems and gives you a known-good baseline.
```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"google/gemma-4-E2B-it","messages":[{"role":"user","content":"What is 2 + 2?"}],"stream":false}'
```
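
To eyeball just the model's answer rather than the whole response body, you can pipe the same request through `jq`, assuming the standard OpenAI chat completion shape with `choices[0].message.content`:

```bash
# Same smoke test, but print only the assistant's reply (assumes OpenAI-style choices/message fields).
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"google/gemma-4-E2B-it","messages":[{"role":"user","content":"What is 2 + 2?"}],"stream":false}' \
  | jq -r '.choices[0].message.content'
```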
5. Run an OpenClaw model inference against Inferrs
Once the direct curl call works, exercise the full OpenClaw → Inferrs path. This command uses the configured `inferrs/google/gemma-4-E2B-it` model and returns JSON so you can see exactly what OpenClaw got back.
```bash
openclaw infer model run \
  --model inferrs/google/gemma-4-E2B-it \
  --prompt "What is 2 + 2? Reply with one short sentence." \
  --json
```
6. Tune Inferrs compatibility flags for Gemma and tools
If you see schema errors or tool-related crashes, adjust the `compat` block for your Inferrs model. Disabling tools with `supportsTools: false` can help when Gemma accepts small direct calls but fails on full agent turns.
```text
compat: {
  requiresStringContent: true,
  supportsTools: false
}
```
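
Before flipping `supportsTools` off, you can check directly whether the served model chokes on a tools payload at all. The sketch below sends a minimal OpenAI-style `tools` array; the `get_time` function is purely illustrative, and it assumes Inferrs mirrors the OpenAI function-calling schema. An error or crash here is a good hint that `supportsTools: false` is the right call.

```bash
# Minimal tools-bearing request; "get_time" is a made-up example function,
# used only to see whether Inferrs + Gemma accept a tools array at all.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{
    "model": "google/gemma-4-E2B-it",
    "messages": [{"role": "user", "content": "What time is it?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_time",
        "description": "Return the current time",
        "parameters": {"type": "object", "properties": {}}
      }
    }],
    "stream": false
  }'
```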
Configuration
| Option | Description | Example |
|---|---|---|
| agents.defaults.model.primary | Sets the primary default model for agents, here pointing to the Inferrs-backed Gemma 4 model. | inferrs/google/gemma-4-E2B-it |
| agents.defaults.models["inferrs/google/gemma-4-E2B-it"].alias | Human-friendly alias for the Inferrs Gemma 4 model shown in OpenClaw. | Gemma 4 (inferrs) |
| models.mode | Controls how the models configuration is applied; `merge` merges with existing providers. | merge |
| models.providers.inferrs.baseUrl | Base URL for the Inferrs OpenAI-compatible `/v1` API that OpenClaw calls. | http://127.0.0.1:8080/v1 |
| models.providers.inferrs.apiKey | API key value OpenClaw sends to Inferrs; for local setups this can be a placeholder. | inferrs-local |
| models.providers.inferrs.api | Specifies that this provider uses the generic OpenAI completions-compatible path. | openai-completions |
| models.providers.inferrs.models[0].id | Model identifier as exposed by Inferrs, used in requests to the `/v1` API. | google/gemma-4-E2B-it |
| models.providers.inferrs.models[0].name | Display name for the Inferrs model inside OpenClaw. | Gemma 4 E2B (inferrs) |
| models.providers.inferrs.models[0].reasoning | Flags whether the model supports reasoning features; set to false for this Inferrs Gemma model. | false |
| models.providers.inferrs.models[0].input | Lists the input modalities supported by the model; Inferrs Gemma here accepts text. | ["text"] |
| models.providers.inferrs.models[0].cost.input | Per-token input cost for the Inferrs model, set to 0 for local usage. | 0 |
| models.providers.inferrs.models[0].cost.output | Per-token output cost for the Inferrs model, set to 0 for local usage. | 0 |
| models.providers.inferrs.models[0].cost.cacheRead | Cost for cache reads, set to 0 for this Inferrs configuration. | 0 |
| models.providers.inferrs.models[0].cost.cacheWrite | Cost for cache writes, set to 0 for this Inferrs configuration. | 0 |
| models.providers.inferrs.models[0].contextWindow | Maximum context window size in tokens for the Inferrs Gemma model. | 131072 |
| models.providers.inferrs.models[0].maxTokens | Maximum number of tokens the model can generate in a single completion. | 4096 |
| models.providers.inferrs.models[0].compat.requiresStringContent | When true, OpenClaw flattens content parts into plain strings to satisfy Inferrs chat routes that only accept string `messages[].content`. | true |
| models.providers.inferrs.models[0].compat.supportsTools | When set to false, disables OpenClaw’s tool schema surface for this Inferrs model to avoid tool-related crashes. | false |
Troubleshooting
curl /v1/models fails
This usually means Inferrs is not running, not reachable, or not bound to the host/port you configured. Start Inferrs with the expected `--host` and `--port` values and re-run the health and models curl checks to confirm it is listening.
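
To separate "Inferrs is not running" from "Inferrs is bound to a different address", a quick look at what is actually listening on the port can help. This assumes a Unix-like system where `lsof` is available:

```bash
# Show which process (if any) is listening on the expected port; an empty result
# means nothing is bound to TCP 8080 at all.
lsof -nP -iTCP:8080 -sTCP:LISTEN
```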
messages[1].content: invalid type: sequence, expected a string
This error means the Inferrs chat route only accepts a plain string for `messages[].content`. Set `compat.requiresStringContent: true` so OpenClaw flattens pure text content parts into strings before sending the request.
```text
compat: {
  requiresStringContent: true
}
```

Direct /v1/chat/completions calls pass but openclaw infer model run fails
If a small direct curl request works but `openclaw infer model run` fails, the tool schema surface may be too heavy for your Inferrs + Gemma combo. Set `compat.supportsTools: false` in the model entry to disable tools and reduce prompt pressure.
```text
compat: {
  requiresStringContent: true,
  supportsTools: false
}
```

Inferrs still crashes on larger agent turns
When schema errors are gone but Inferrs continues to crash on larger agent turns, you are likely hitting an upstream Inferrs or model limitation. Reduce prompt size or switch to a different local backend or model, since OpenClaw’s transport layer is already sending a compatible payload.