Model providers

Using Inferrs with OpenClaw


This guide shows you how to wire up Inferrs as an OpenAI-compatible backend for OpenClaw using the generic openai-completions path. You will start a local Inferrs server, verify it with curl, and register it as a provider so your agents can run against a Gemma 4 model.

By the end, you will have OpenClaw calling a local Inferrs-served model and a couple of flags tuned for the quirks of Inferrs’ chat API.

Setup flow

Prerequisites

  • An Inferrs installation with the `inferrs` CLI available on your PATH.
  • A local model that Inferrs can serve, such as `google/gemma-4-E2B-it`.
  • An OpenClaw setup where you can edit the agents and models configuration and run `openclaw infer model run`.

Steps

  1. Start Inferrs with a local model

    Start the Inferrs server and bind it to a host and port that OpenClaw can reach. This example binds to `127.0.0.1:8080` using the `metal` device, which the later OpenClaw config expects.

    bash
    inferrs serve google/gemma-4-E2B-it \
      --host 127.0.0.1 \
      --port 8080 \
      --device metal
  2. Verify the Inferrs server is reachable

    Before touching OpenClaw, confirm that Inferrs is actually listening and exposing the OpenAI-compatible endpoints. These curl checks hit the health probe and list models; if either fails, fix Inferrs networking or model loading first.

    bash
    curl http://127.0.0.1:8080/health
    curl http://127.0.0.1:8080/v1/models
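The two checks above can also be scripted as a small readiness probe that retries until the server comes up. This is a sketch in Python using only the standard library; the endpoint paths match the curl calls, but the helper name and retry policy are our own, not part of Inferrs:

```python
import json
import time
import urllib.request

def wait_for_inferrs(base="http://127.0.0.1:8080", retries=10, delay=1.0):
    """Poll /health until the server answers, then return the /v1/models payload."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(f"{base}/health", timeout=2) as health:
                if health.status == 200:
                    with urllib.request.urlopen(f"{base}/v1/models", timeout=2) as models:
                        return json.loads(models.read())
        except OSError:
            # server not up yet (connection refused) or slow to respond
            time.sleep(delay)
    raise RuntimeError(f"Inferrs did not become reachable at {base}")
```

If this raises instead of returning a model list, fix Inferrs networking or model loading before moving on to the OpenClaw configuration.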
  3. Configure Inferrs as an OpenClaw provider

    Add a provider entry that points OpenClaw at your Inferrs `/v1` base URL and describes the model capabilities. Set `compat.requiresStringContent` so OpenClaw flattens content into plain strings for Inferrs.

    json
    {
      "agents": {
        "defaults": {
          "model": { "primary": "inferrs/google/gemma-4-E2B-it" },
          "models": {
            "inferrs/google/gemma-4-E2B-it": {
              "alias": "Gemma 4 (inferrs)"
            }
          }
        }
      },
      "models": {
        "mode": "merge",
        "providers": {
          "inferrs": {
            "baseUrl": "http://127.0.0.1:8080/v1",
            "apiKey": "inferrs-local",
            "api": "openai-completions",
            "models": [
              {
                "id": "google/gemma-4-E2B-it",
                "name": "Gemma 4 E2B (inferrs)",
                "reasoning": false,
                "input": ["text"],
                "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
                "contextWindow": 131072,
                "maxTokens": 4096,
                "compat": {
                  "requiresStringContent": true
                }
              }
            ]
          }
        }
      }
    }
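A few of the invariants the config above relies on can be checked mechanically before you reload OpenClaw. This is a hypothetical sanity checker, not an official OpenClaw validator; the field names simply mirror the provider entry above:

```python
def check_provider(provider: dict) -> list[str]:
    """Return a list of problems found in an Inferrs provider entry (illustrative only)."""
    problems = []
    if not provider.get("baseUrl", "").endswith("/v1"):
        problems.append("baseUrl should end with /v1 for openai-completions")
    if provider.get("api") != "openai-completions":
        problems.append("api must be openai-completions for the generic path")
    for m in provider.get("models", []):
        if m.get("maxTokens", 0) > m.get("contextWindow", 0):
            problems.append(f"{m.get('id')}: maxTokens exceeds contextWindow")
    return problems

provider = {
    "baseUrl": "http://127.0.0.1:8080/v1",
    "api": "openai-completions",
    "models": [{"id": "google/gemma-4-E2B-it", "contextWindow": 131072, "maxTokens": 4096}],
}
print(check_provider(provider))  # → []
```

An empty list means the entry is at least internally consistent; it does not prove the server at `baseUrl` is running.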
  4. Run a manual Inferrs chat completion smoke test

    Test Inferrs directly with a minimal `/v1/chat/completions` request to confirm the model responds before involving OpenClaw. This isolates Inferrs issues from OpenClaw configuration problems and gives you a known-good baseline.

    bash
    curl http://127.0.0.1:8080/v1/chat/completions \
      -H 'content-type: application/json' \
      -d '{"model":"google/gemma-4-E2B-it","messages":[{"role":"user","content":"What is 2 + 2?"}],"stream":false}'
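To pull the model's answer out of that response, index into the standard chat-completions shape. The payload below is illustrative, trimmed to the fields that matter; it is not captured Inferrs output:

```python
import json

# A trimmed example of the response shape the OpenAI-compatible route returns
# (field values here are illustrative, not actual Inferrs output).
raw = '''{
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "2 + 2 = 4."},
     "finish_reason": "stop"}
  ],
  "model": "google/gemma-4-E2B-it"
}'''

def first_reply(payload: str) -> str:
    """Extract the assistant text from a chat completions response body."""
    return json.loads(payload)["choices"][0]["message"]["content"]

print(first_reply(raw))  # → 2 + 2 = 4.
```

If the real response has an `error` key instead of `choices`, the model failed to load or the request body was rejected.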
  5. Run an OpenClaw model inference against Inferrs

    Once the direct curl call works, exercise the full OpenClaw → Inferrs path. This command uses the configured `inferrs/google/gemma-4-E2B-it` model and returns JSON so you can see exactly what OpenClaw got back.

    bash
    openclaw infer model run \
      --model inferrs/google/gemma-4-E2B-it \
      --prompt "What is 2 + 2? Reply with one short sentence." \
      --json
  6. Tune Inferrs compatibility flags for Gemma and tools

    If you see schema errors or tool-related crashes, adjust the `compat` block for your Inferrs model. Disabling tools with `supportsTools: false` can help when Gemma accepts small direct calls but fails on full agent turns.

    text
    compat: {
      requiresStringContent: true,
      supportsTools: false
    }
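What `requiresStringContent` asks OpenClaw to do can be sketched as a flattening pass over the outgoing messages. This is an illustrative reimplementation of the idea, not OpenClaw's actual code:

```python
def flatten_content(messages):
    """Collapse list-style content parts into plain strings, as chat routes
    that only accept string messages[].content require."""
    out = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list):
            # keep only the text parts and join them into one string
            content = "".join(p["text"] for p in content if p.get("type") == "text")
        out.append({**msg, "content": content})
    return out

msgs = [{"role": "user", "content": [{"type": "text", "text": "What is 2 + 2?"}]}]
print(flatten_content(msgs))  # → [{'role': 'user', 'content': 'What is 2 + 2?'}]
```

The trade-off is that non-text parts (such as images) are dropped, which is why the flag is scoped per model rather than global.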

Configuration

Option | Description | Example
agents.defaults.model.primary | Sets the primary default model for agents, here pointing to the Inferrs-backed Gemma 4 model. | inferrs/google/gemma-4-E2B-it
agents.defaults.models["inferrs/google/gemma-4-E2B-it"].alias | Human-friendly alias for the Inferrs Gemma 4 model shown in OpenClaw. | Gemma 4 (inferrs)
models.mode | Controls how the models configuration is applied; `merge` merges with existing providers. | merge
models.providers.inferrs.baseUrl | Base URL for the Inferrs OpenAI-compatible `/v1` API that OpenClaw calls. | http://127.0.0.1:8080/v1
models.providers.inferrs.apiKey | API key value OpenClaw sends to Inferrs; for local setups this can be a placeholder. | inferrs-local
models.providers.inferrs.api | Specifies that this provider uses the generic OpenAI completions-compatible path. | openai-completions
models.providers.inferrs.models[0].id | Model identifier as exposed by Inferrs, used in requests to the `/v1` API. | google/gemma-4-E2B-it
models.providers.inferrs.models[0].name | Display name for the Inferrs model inside OpenClaw. | Gemma 4 E2B (inferrs)
models.providers.inferrs.models[0].reasoning | Flags whether the model supports reasoning features; set to false for this Inferrs Gemma model. | false
models.providers.inferrs.models[0].input | Lists the input modalities supported by the model; Inferrs Gemma here accepts text. | ["text"]
models.providers.inferrs.models[0].cost.input | Per-token input cost for the Inferrs model, set to 0 for local usage. | 0
models.providers.inferrs.models[0].cost.output | Per-token output cost for the Inferrs model, set to 0 for local usage. | 0
models.providers.inferrs.models[0].cost.cacheRead | Cost for cache reads, set to 0 for this Inferrs configuration. | 0
models.providers.inferrs.models[0].cost.cacheWrite | Cost for cache writes, set to 0 for this Inferrs configuration. | 0
models.providers.inferrs.models[0].contextWindow | Maximum context window size in tokens for the Inferrs Gemma model. | 131072
models.providers.inferrs.models[0].maxTokens | Maximum number of tokens the model can generate in a single completion. | 4096
models.providers.inferrs.models[0].compat.requiresStringContent | When true, OpenClaw flattens content parts into plain strings to satisfy Inferrs chat routes that only accept string `messages[].content`. | true
models.providers.inferrs.models[0].compat.supportsTools | When set to false, disables OpenClaw’s tool schema surface for this Inferrs model to avoid tool-related crashes. | false

Troubleshooting

curl /v1/models fails

This usually means Inferrs is not running, not reachable, or not bound to the host/port you configured. Start Inferrs with the expected `--host` and `--port` values and re-run the health and models curl checks to confirm it is listening.
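A quick way to distinguish "not running" from "bound to the wrong host or port" is a raw TCP probe before debugging HTTP. The helper below is a generic sketch, not part of either CLI:

```python
import socket

def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused or timed out: nothing listening there
        return False

# If this prints False, Inferrs is not bound where the provider config expects.
print(is_listening("127.0.0.1", 8080))
```

If the probe succeeds but `/v1/models` still fails, the port is taken by a different process or the model has not finished loading.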

messages[1].content: invalid type: sequence, expected a string

This error means the Inferrs chat route only accepts string `messages[].content`. Set `requiresStringContent: true` so OpenClaw flattens pure text content parts into strings before sending the request.

text
compat: {
  requiresStringContent: true
}

Direct /v1/chat/completions calls pass but openclaw infer model run fails

If a small direct curl request works but `openclaw infer model run` fails, the tool schema surface may be too heavy for your Inferrs + Gemma combo. Set `supportsTools: false` in the model entry to disable tools and reduce prompt pressure.

text
compat: {
  requiresStringContent: true,
  supportsTools: false
}

inferrs still crashes on larger agent turns

When schema errors are gone but Inferrs continues to crash on larger agent turns, you are likely hitting an upstream Inferrs or model limitation. Reduce prompt size or switch to a different local backend or model, since OpenClaw’s transport layer is already sending a compatible payload.
