Rigorous evaluation harness comparing the fine-tuned model against the Yi 1.5 34B base model, a closed-source frontier model, and the previous checkpoint.
Rigorous evaluation harness comparing the fine-tuned model against the Llama 3.3 70B base model, a closed-source frontier model, and the previous checkpoint.
Rigorous evaluation harness comparing the fine-tuned model against the Llama 3.1 8B base model, a closed-source frontier model, and the previous checkpoint.
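
As a rough illustration of what such a comparison could look like, here is a minimal sketch. It assumes each model can be wrapped as a plain prompt-to-text callable and that exact-match accuracy is an acceptable metric. The names `compare`, `exact_match_accuracy`, and `stub`, the model labels, and the toy dataset are all hypothetical and not taken from the harness described above.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a "model" is any callable that maps a prompt to an answer string.
Model = Callable[[str], str]
Dataset = List[Tuple[str, str]]  # (prompt, reference answer) pairs


def exact_match_accuracy(model: Model, dataset: Dataset) -> float:
    """Fraction of prompts whose normalized answer equals the reference."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    correct = sum(
        model(prompt).strip().lower() == reference.strip().lower()
        for prompt, reference in dataset
    )
    return correct / len(dataset)


def compare(candidate: str, models: Dict[str, Model],
            dataset: Dataset) -> Dict[str, Dict[str, float]]:
    """Score every model on the same dataset and report the candidate's margin."""
    scores = {name: exact_match_accuracy(m, dataset) for name, m in models.items()}
    return {
        name: {"accuracy": score,
               "candidate_minus_this": scores[candidate] - score}
        for name, score in scores.items()
    }


def stub(answers: Dict[str, str]) -> Model:
    """Hypothetical stand-in for a real model call: looks answers up in a dict."""
    return lambda prompt: answers.get(prompt, "")


# Toy data, invented purely for illustration.
dataset = [("2+2", "4"), ("capital of France", "Paris"),
           ("3*3", "9"), ("opposite of hot", "cold")]
full = {"2+2": "4", "capital of France": "paris",
        "3*3": "9", "opposite of hot": "Cold"}

models = {
    "fine_tuned": stub(full),
    "base": stub({"2+2": "4", "3*3": "9"}),
    "closed_frontier": stub(full),
    "previous_checkpoint": stub({k: v for k, v in full.items()
                                 if k != "opposite of hot"}),
}

report = compare("fine_tuned", models, dataset)
for name, row in report.items():
    print(name, row)
```

Scoring every model on the same fixed dataset and reporting the margin relative to the fine-tuned model keeps the comparison like-for-like. A real harness would likely add multiple metrics, repeated runs, and confidence intervals.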