DevQualityEval v1.0 (https://github.com/symflower/eval-dev-quality)
Overall chattiness: characters per point scored (lower is better)

Chars/point  Model
     598.10  DeepSeek: R1 Distill Qwen 1.5B
     363.89  Mistral: Mixtral 8x7B (Base) (v0.1)
     126.77  Meta: Llama 3.2 1B (Instruct)
     115.76  Microsoft: WizardLM-2 7B
     106.32  Cohere: Command
      96.56  Microsoft: Phi-3 Medium (Instruct) (128K)
      84.83  Meta: Llama 3.2 3B (Instruct)
      84.40  Liquid: LFM 3B
      79.08  Microsoft: Phi-3 Mini (Instruct) (128K)
      75.10  NousResearch: Hermes 13B
      72.43  XWin-LM: Xwin 70B
      70.67  DeepSeek: DeepSeek R1 Distill Llama 70B
      67.49  Microsoft: Phi-3.5 Mini (Instruct) (128K)
      59.46  Mistral: Ministral 8B
      56.38  Liquid: LFM 7B
      52.96  Perplexity: Llama 3.1 Sonar 8B
      52.47  Mistral: Mistral Tiny (v0.3)
      51.35  NousResearch: Hermes 2 Mixtral 8x7B (DPO)
      51.22  Mistral: Mistral 7B (Instruct)
      50.72  Mistral: Mistral 7B (Instruct) (v0.3)
      48.67  DeepSeek: DeepSeek R1 Distill Qwen 14B
      48.45  Cognitive Computations: Dolphin 2.6 Mixtral 8x7B
      48.23  Qwen: QwQ 32B
      43.17  Meta: Llama 3.1 8B (Instruct)
      41.25  Liquid: LFM MoE 40B
      39.66  Mistral: Ministral 3B
      38.98  Cohere: Command R (03-2024)
      38.58  OpenChat: OpenChat 3.5 7B
      38.46  Jon Durbin: Airoboros 70B
      37.40  Meta: Llama 3 8B (Instruct)
      35.97  Qwen: Qwen 2 7B (Instruct)
      35.04  Cohere: Command R7B (12-2024)
      34.01  Google: Gemini Pro 1.5
      30.50  Microsoft: WizardLM-2 8x22B
      28.87  Teknium: OpenHermes 2.5 Mistral 7B
      28.10  NousResearch: Hermes 2 Pro - Llama-3 8B
      27.57  Cohere: Command R (08-2024)
      27.45  Amazon: Nova Micro 1.0
      26.31  Mistral: Mistral NeMo (v24.07)
      24.75  DeepSeek: DeepSeek R1 Distill Qwen 32B
      24.73  NousResearch: Hermes 3 70B (Instruct)
      24.66  Google: Gemini Flash 1.5 8B
      24.40  Cohere: Command R+ (04-2024)
      24.18  AI21: Jamba 1.5 Large
      24.06  Mistral: Mixtral 8x7B (Instruct) (v0.1)
      24.00  Mistral: Mistral Medium
      23.85  Cognitive Computations: Dolphin 2.9.2 Mixtral 8x22B
      23.17  NVIDIA: Llama 3.1 Nemotron 70B (Instruct)
      23.04  AionLabs: Aion-1.0-Mini
      22.49  Cohere: Command R+ (08-2024)
      22.28  Mistral: Codestral Mamba
      22.22  Mistral: Mixtral 8x22B (Instruct) (v0.1)
      21.99  NousResearch: Hermes 3 405B (Instruct)
      20.70  Google: Gemma 2 27B
      20.43  Microsoft: Phi 4
      20.29  Amazon: Nova Lite 1.0
      19.57  MiniMax: MiniMax-01
      18.99  Meta: Llama 3.3 70B (Instruct)
      18.93  Mistral: Mistral Small 3
      18.05  Qwen: Qwen2.5 7B (Instruct)
      17.83  Mistral: Pixtral 12B (v2409)
      17.81  Google: Gemma 2 9B
      17.36  Perplexity: Llama 3.1 Sonar 70B
      17.27  Mistral: Mistral Small (v24.02)
      17.22  Databricks: DBRX 132B (Instruct)
      17.20  Anthropic: Claude 3 Sonnet
      17.15  Perplexity: Llama 3 Sonar 70B (Online)
      16.83  Google: Gemini Flash 2.0
      16.56  Meta: Llama 3.1 70B (Instruct)
      16.46  Meta: Llama 3.1 405B (Instruct)
      16.22  Meta: Llama 3 70B (Instruct)
      15.79  AionLabs: Aion-1.0
      15.65  Amazon: Nova Pro 1.0
      15.24  Qwen: Qwen-Turbo (2024-11-01)
      15.22  OpenAI: o3-mini (2025-01-31) (reasoning_effort=medium)
      15.20  OpenAI: o3-mini (2025-01-31) (reasoning_effort=low)
      15.13  AI21: Jamba-Instruct
      14.93  OpenAI: o3-mini (2025-01-31) (reasoning_effort=high)
      14.67  Qwen: Qwen 2 72B (Instruct)
      14.49  01.AI: Yi Large
      14.43  Google: Gemini Flash 1.5
      14.36  AI21: Jamba 1.5 Mini
      14.14  OpenAI: o1-preview (2024-09-12)
      14.08  Anthropic: Claude 3 Haiku
      14.04  DeepSeek: DeepSeek V2.5
      13.98  Anthropic: Claude 3.5 Haiku (2024-10-22)
      13.70  OpenAI: GPT-4o-mini (2024-07-18)
      13.60  Mistral: Pixtral Large (2411)
      13.58  Anthropic: Claude 3.7 Sonnet (Thinking)
      13.58  Qwen: Qwen-Plus
      13.52  Mistral: Codestral (2501)
      13.46  Mistral: Mistral Large 2 (2411)
      13.41  Anthropic: Claude 3.7 Sonnet (2025-02-19)
      13.34  Google: Gemini 2.0 Flash Lite
      13.24  Anthropic: Claude 3.5 Sonnet (2024-06-20)
      13.16  Qwen: Qwen-Max
      13.14  Qwen: Qwen2.5 72B (Instruct)
      13.07  Qwen: Qwen2.5 Coder 32B (Instruct)
      12.81  OpenAI: o1-mini (2024-09-12)
      12.67  xAI: Grok-2 (1212)
      12.61  Mistral: Mistral Large 2 (2407)
      12.46  DeepSeek: DeepSeek V3
      12.30  Qwen: Qwen2.5 32B Instruct
      12.06  DeepSeek: DeepSeek R1
      12.03  Anthropic: Claude 3 Opus
      11.51  OpenAI: GPT-4o (2024-11-20)
      10.33  Anthropic: Claude 3.5 Sonnet (2024-10-22)
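The chart ranks models by characters per point scored, where lower is better. Here is a minimal Python sketch of how such a ratio could be computed and ranked. It assumes the metric is a model's total response characters divided by its total score. The `ModelResult` type, its field names, and the sample numbers are hypothetical and are not taken from the eval-dev-quality code.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    # Hypothetical per-model totals; field names are illustrative,
    # not DevQualityEval's actual schema.
    name: str
    response_characters: int  # assumed: total characters across all responses
    score: int                # assumed: total points scored across all tasks


def characters_per_point(result: ModelResult) -> float:
    """Characters emitted per point scored; lower means less chatty."""
    if result.score <= 0:
        # A model that scores nothing has no finite ratio, so rank it last.
        return float("inf")
    return result.response_characters / result.score


def rank_by_chattiness(results: list[ModelResult]) -> list[tuple[str, float]]:
    """Sort models ascending by characters per point (lower is better)."""
    ranked = [(r.name, characters_per_point(r)) for r in results]
    ranked.sort(key=lambda pair: pair[1])
    return ranked


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    demo = [
        ModelResult("model-a", response_characters=120_000, score=1_000),
        ModelResult("model-b", response_characters=45_000, score=900),
        ModelResult("model-c", response_characters=30_000, score=0),
    ]
    for name, ratio in rank_by_chattiness(demo):
        print(f"{name}: {ratio:.2f}")
```

Running the script lists model-b first (50.00 characters per point), then model-a (120.00). Model-c scored nothing, so its ratio is infinite and it sorts last. Treating a zero score as infinitely chatty is one possible design choice, made here only so that such models cannot top the ranking.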