zhouyang.xie 98d070b8c5 Switch to GitHub jwjohns/unsloth-GRPO-qwen2.5 to validate GRPO model training 2 months ago
..
conf_train.yaml 98d070b8c5 Switch to GitHub jwjohns/unsloth-GRPO-qwen2.5 to validate GRPO model training 2 months ago