OpenClaw Skill
ml-model-eval-benchmark
Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.
Install
$ npx clawhub@latest install ml-model-eval-benchmark
View on GitHub · v0.1.0
ML Model Eval Benchmark
Overview
Produce consistent model ranking outputs from metric-weighted evaluation inputs.
Workflow
- Define metric weights and accepted metric ranges.
- Ingest model metrics for each candidate.
- Compute weighted score and ranking (see the sketch after this list).
- Export leaderboard and promotion recommendation.
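A minimal sketch of the weighted scoring and deterministic ranking step is shown below. The metric names, weights, min-max normalization, and alphabetical tie-break are illustrative assumptions, not necessarily what scripts/benchmark_models.py implements.

```python
def weighted_scores(candidates, weights, lower_is_better=frozenset()):
    """Min-max normalize each metric across candidates, then apply weights.

    candidates: dict of model name -> {metric name: raw value}
    weights:    dict of metric name -> weight (expected to sum to 1.0)
    Returns (model, score) pairs, highest score first; ties are broken
    alphabetically by model name so the ranking is deterministic.
    """
    metric_names = list(weights)
    lo = {m: min(c[m] for c in candidates.values()) for m in metric_names}
    hi = {m: max(c[m] for c in candidates.values()) for m in metric_names}

    def norm(metric, value):
        span = hi[metric] - lo[metric]
        scaled = 0.5 if span == 0 else (value - lo[metric]) / span
        # Invert metrics where a lower raw value is better (e.g. latency).
        return 1.0 - scaled if metric in lower_is_better else scaled

    scores = {
        name: sum(weights[m] * norm(m, metrics[m]) for m in metric_names)
        for name, metrics in candidates.items()
    }
    # Deterministic ranking: score descending, then model name ascending.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical candidates and weights, for illustration only.
candidates = {
    "model-a": {"accuracy": 0.91, "f1": 0.88, "latency_ms": 120},
    "model-b": {"accuracy": 0.89, "f1": 0.90, "latency_ms": 95},
}
weights = {"accuracy": 0.5, "f1": 0.3, "latency_ms": 0.2}
ranking = weighted_scores(candidates, weights, lower_is_better={"latency_ms"})
```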
Use Bundled Resources
- Run scripts/benchmark_models.py to generate benchmark outputs.
- Read references/benchmarking-guide.md for weighting and tie-break guidance.
Guardrails
- Keep metric names and scales consistent across candidates.
- Record weighting assumptions in the output (see the sketch below).
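One way to satisfy the second guardrail is to embed the weights alongside the ranking when exporting the leaderboard. The field names and output path below are illustrative assumptions, not the format produced by the bundled script.

```python
import json
from datetime import datetime, timezone

def export_leaderboard(ranking, weights, path="leaderboard.json"):
    """Write the ranking together with the weights that produced it,
    so the weighting assumptions travel with the result."""
    payload = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "weights": weights,  # record the assumptions alongside the result
        "leaderboard": [
            {"rank": i + 1, "model": name, "score": round(score, 4)}
            for i, (name, score) in enumerate(ranking)
        ],
        "promotion_candidate": ranking[0][0] if ranking else None,
    }
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)
```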
Created by
@0x-professor