iohann.s.titov@gmail.com
Ivan S. Titov

Wire dependencies for the OpenAI, HF Inference, and generic HTTP API model backends.
Enable instruction-following evaluation tasks (leaderboard_ifeval and friends), which check generated text against structural constraints like length limits, language hints, and required keywords. Auto-downloads the NLTK punkt_tab tokenizer at task-load time (not deferred until eval); seed ~/nltk_data ahead of time on offline hosts.
Enable math-grading tasks (minerva_math, leaderboard math, hendrycks_math, etc.), which parse LaTeX answers and verify symbolic equality between predicted and ground-truth solutions.
Pull sci-ml/sentencepiece for tasks that tokenise via SentencePiece.
Pull dev-python/statsmodels for the discrim_eval task family.
Wire dev-python/vllm for the vLLM model backend.

lm-eval

EleutherAI/lm-evaluation-harness