Skill Eval Harness is a Python CLI for testing whether an Agent Skill changes observable output. It reads evals/shared-benchmark.json, emits answer-key-safe task rows, grades files under eval-runs/, ...
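As a rough illustration of what "answer-key-safe task rows" could mean, the sketch below strips grading fields from each task before it is shown to the agent under test. The schema is an assumption: the "tasks" key and the field names in ANSWER_KEY_FIELDS are hypothetical and are not taken from the harness or from evals/shared-benchmark.json.

```python
import json

# Hypothetical answer-key field names; the real benchmark schema is not shown in the source.
ANSWER_KEY_FIELDS = {"expected", "answer", "rubric"}


def safe_task_rows(benchmark: dict) -> list[dict]:
    """Return one row per task with assumed answer-key fields removed."""
    rows = []
    for task in benchmark.get("tasks", []):  # "tasks" is an assumed top-level key
        rows.append({k: v for k, v in task.items() if k not in ANSWER_KEY_FIELDS})
    return rows


if __name__ == "__main__":
    # In-memory stand-in for the benchmark file, so the sketch runs without any files.
    sample = {
        "tasks": [
            {"id": "t1", "prompt": "Summarize the changelog", "expected": "3 bullet points"},
        ]
    }
    for row in safe_task_rows(sample):
        print(json.dumps(row))
```

Filtering by a deny-list keeps the sketch short; an allow-list of fields known to be safe would be the more conservative choice if the benchmark schema can grow.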
Abstract: Precipitable water vapor (PWV) is critical to global climate dynamics, the terrestrial water cycle, and extreme weather events. However, current Moderate Resolution Imaging Spectroradiometer ...
Abstract: Timely delivery of delay-sensitive information over dynamic, heterogeneous networks is increasingly essential for a range of interactive applications, such as industrial automation, ...
Review Eval Framework. Problem: Agent Validator supports multiple code review adapters (Claude Code, Codex CLI, GitHub Copilot CLI), each configurable with different models, aliases, and thinking/effort ...