Full fine-tuning recipe: DPO on Llama 3.1 70B via Unsloth, targeting single RTX 3090 (24GB), with data mix and eval plan.
Full fine-tuning recipe: DPO on Mistral Small 3 via LitGPT, targeting 2x RTX 4090, with data mix and eval plan.
Full fine-tuning recipe: DPO on Qwen 2.5 7B via torchtune, targeting 2x RTX 4090, with data mix and eval plan.
Full fine-tuning recipe: DPO on Qwen 2.5 32B via Unsloth, targeting AWS g5.12xlarge, with data mix and eval plan.
Full fine-tuning recipe: DPO on Gemma 2 9B via OpenRLHF, targeting AWS g5.12xlarge, with data mix and eval plan.
Full fine-tuning recipe: DPO on Gemma 2 27B via DeepSpeed, targeting AWS g5.12xlarge, with data mix and eval plan.
Full fine-tuning recipe: DPO on Phi-4 via Hugging Face TRL, targeting AWS p4d.24xlarge, with data mix and eval plan.
Full fine-tuning recipe: DPO on DeepSeek-V3-Base via Megatron-LM, targeting AWS p4d.24xlarge, with data mix and eval plan.
Full fine-tuning recipe: DPO on Mixtral 8x22B via DeepSpeed, targeting Lambda Labs 8x H100, with data mix and eval plan.
Full fine-tuning recipe: DPO on Yi 1.5 34B via Hugging Face TRL, targeting Lambda Labs 8x H100, with data mix and eval plan.
Full fine-tuning recipe: DPO on Llama 3.1 8B via Megatron-LM, targeting Lambda Labs 8x H100, with data mix and eval plan.
Full fine-tuning recipe: DPO on Llama 3.1 70B via FSDP, targeting single A100 80GB, with data mix and eval plan.