DevQualityEval v1.0 (https://github.com/symflower/eval-dev-quality): Ruby score per model, i.e. the percentage of the total possible score across all tasks for the language Ruby (higher is better), listed from lowest to highest.

| Model | Ruby score |
|---|---:|
| DeepSeek: R1 Distill Qwen 1.5B | 9.00% |
| Meta: Llama 3.2 1B (Instruct) | 9.42% |
| Mistral: Mixtral 8x7B (Base) (v0.1) | 9.70% |
| Mistral: Mistral Tiny (v0.3) | 14.10% |
| XWin-LM: Xwin 70B | 20.17% |
| Liquid: LFM 7B | 21.29% |
| Liquid: LFM 3B | 23.73% |
| Cohere: Command R (08-2024) | 26.25% |
| Meta: Llama 3.2 3B (Instruct) | 26.25% |
| Liquid: LFM MoE 40B | 27.22% |
| Cohere: Command | 27.38% |
| Mistral: Mistral 7B (Instruct) | 27.70% |
| Cognitive Computations: Dolphin 2.6 Mixtral 8x7B | 27.73% |
| Microsoft: WizardLM-2 7B | 27.78% |
| Microsoft: Phi-3 Medium (Instruct) (128K) | 28.00% |
| NousResearch: Hermes 13B | 28.17% |
| DeepSeek: DeepSeek R1 Distill Qwen 14B | 28.56% |
| Qwen: Qwen 2 7B (Instruct) | 28.75% |
| Google: Gemma 2 27B | 29.12% |
| Mistral: Mistral 7B (Instruct) (v0.3) | 29.33% |
| Cohere: Command R (03-2024) | 29.40% |
| NousResearch: Hermes 2 Mixtral 8x7B (DPO) | 30.82% |
| Teknium: OpenHermes 2.5 Mistral 7B | 31.37% |
| Jon Durbin: Airoboros 70B | 31.79% |
| Perplexity: Llama 3.1 Sonar 8B | 34.68% |
| OpenChat: OpenChat 3.5 7B | 35.29% |
| Meta: Llama 3 8B (Instruct) | 37.00% |
| Microsoft: WizardLM-2 8x22B | 39.09% |
| Microsoft: Phi-3 Mini (Instruct) (128K) | 39.44% |
| Google: Gemma 2 9B | 39.78% |
| Google: Gemini Pro 1.5 | 40.65% |
| Microsoft: Phi-3.5 Mini (Instruct) (128K) | 42.88% |
| Cohere: Command R7B (12-2024) | 44.45% |
| NousResearch: Hermes 3 405B (Instruct) | 45.28% |
| Meta: Llama 3.1 8B (Instruct) | 45.66% |
| DeepSeek: DeepSeek R1 Distill Llama 70B | 48.28% |
| Mistral: Mistral NeMo (v24.07) | 48.54% |
| Mistral: Mixtral 8x22B (Instruct) (v0.1) | 49.49% |
| Cohere: Command R+ (08-2024) | 49.69% |
| Mistral: Mistral Small (v24.02) | 50.77% |
| Cohere: Command R+ (04-2024) | 51.05% |
| Cognitive Computations: Dolphin 2.9.2 Mixtral 8x22B | 51.48% |
| Mistral: Pixtral 12B (v2409) | 52.87% |
| Mistral: Codestral Mamba | 54.43% |
| Amazon: Nova Micro 1.0 | 57.54% |
| NousResearch: Hermes 2 Pro - Llama-3 8B | 58.07% |
| AionLabs: Aion-1.0-Mini | 59.12% |
| Qwen: Qwen2.5 7B (Instruct) | 59.25% |
| AI21: Jamba 1.5 Large | 59.29% |
| Mistral: Mixtral 8x7B (Instruct) (v0.1) | 63.19% |
| NousResearch: Hermes 3 70B (Instruct) | 64.10% |
| Mistral: Mistral Medium | 64.17% |
| Mistral: Ministral 3B | 67.56% |
| Databricks: DBRX 132B (Instruct) | 67.93% |
| Qwen: QwQ 32B | 71.53% |
| DeepSeek: DeepSeek R1 Distill Qwen 32B | 71.91% |
| Qwen: Qwen 2 72B (Instruct) | 71.94% |
| AI21: Jamba-Instruct | 73.17% |
| Mistral: Ministral 8B | 73.36% |
| Mistral: Mistral Small 3 | 73.51% |
| NVIDIA: Llama 3.1 Nemotron 70B (Instruct) | 73.57% |
| DeepSeek: DeepSeek R1 | 73.91% |
| AI21: Jamba 1.5 Mini | 74.25% |
| Qwen: Qwen-Turbo (2024-11-01) | 75.48% |
| Mistral: Pixtral Large (2411) | 75.53% |
| Mistral: Mistral Large 2 (2411) | 76.31% |
| Anthropic: Claude 3 Sonnet | 77.12% |
| Meta: Llama 3 70B (Instruct) | 77.35% |
| Amazon: Nova Lite 1.0 | 79.41% |
| Amazon: Nova Pro 1.0 | 80.06% |
| Google: Gemini Flash 1.5 8B | 80.25% |
| Mistral: Mistral Large 2 (2407) | 81.76% |
| AionLabs: Aion-1.0 | 82.59% |
| Meta: Llama 3.1 70B (Instruct) | 82.85% |
| Anthropic: Claude 3 Opus | 82.86% |
| 01.AI: Yi Large | 83.03% |
| Anthropic: Claude 3 Haiku | 83.25% |
| Meta: Llama 3.1 405B (Instruct) | 83.55% |
| Meta: Llama 3.3 70B (Instruct) | 83.78% |
| Qwen: Qwen2.5 72B (Instruct) | 85.15% |
| Perplexity: Llama 3.1 Sonar 70B | 85.21% |
| Perplexity: Llama 3 Sonar 70B (Online) | 85.22% |
| DeepSeek: DeepSeek V3 | 85.56% |
| Qwen: Qwen2.5 32B Instruct | 85.88% |
| DeepSeek: DeepSeek V2.5 | 86.00% |
| Microsoft: Phi 4 | 86.63% |
| Mistral: Codestral (2501) | 86.70% |
| Anthropic: Claude 3.5 Sonnet (2024-10-22) | 88.42% |
| Google: Gemini Flash 2.0 | 88.44% |
| Anthropic: Claude 3.5 Haiku (2024-10-22) | 88.78% |
| Anthropic: Claude 3.7 Sonnet (2025-02-19) | 89.85% |
| xAI: Grok-2 (1212) | 89.93% |
| Google: Gemini Flash 1.5 | 90.35% |
| MiniMax: MiniMax-01 | 90.58% |
| Qwen: Qwen-Plus | 90.59% |
| Anthropic: Claude 3.5 Sonnet (2024-06-20) | 91.10% |
| Anthropic: Claude 3.7 Sonnet (Thinking) | 91.43% |
| Qwen: Qwen-Max | 91.53% |
| Qwen: Qwen2.5 Coder 32B (Instruct) | 92.57% |
| OpenAI: o1-mini (2024-09-12) | 92.68% |
| OpenAI: GPT-4o-mini (2024-07-18) | 92.95% |
| Google: Gemini 2.0 Flash Lite | 93.44% |
| OpenAI: o3-mini (2025-01-31) (reasoning_effort=high) | 94.18% |
| OpenAI: o3-mini (2025-01-31) (reasoning_effort=low) | 94.19% |
| OpenAI: o3-mini (2025-01-31) (reasoning_effort=medium) | 95.11% |
| OpenAI: GPT-4o (2024-11-20) | 95.47% |
| OpenAI: o1-preview (2024-09-12) | 95.55% |
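For context on how to read these numbers: the chart's axis label defines the Ruby score as a model's accumulated points across all Ruby tasks relative to the maximum the benchmark can award. The actual tasks and scoring rules are defined in the eval-dev-quality repository; the sketch below only illustrates that aggregation, and the task names and point values in it are hypothetical placeholders, not real benchmark data.

```python
# Minimal sketch of the "percentage of total possible score" aggregation
# described by the chart's axis label. Task names and point values are
# hypothetical; the real scoring rules live in
# https://github.com/symflower/eval-dev-quality.

# (task, points awarded to the model, maximum points available)
ruby_task_results = [
    ("write-tests", 8, 10),  # hypothetical values
    ("code-repair", 9, 10),
    ("transpile", 7, 10),
]

awarded = sum(points for _, points, _ in ruby_task_results)
possible = sum(maximum for _, _, maximum in ruby_task_results)
ruby_score = 100.0 * awarded / possible

print(f"Ruby score: {ruby_score:.2f}%")  # -> Ruby score: 80.00%
```

Because every model is scored against the same fixed maximum, the percentages above are directly comparable across models.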