zhouyang.xie f9dc2bb16f Switch to github jwjohns/unsloth-GRPO-qwen2.5 to validate GRPO model training 2 months ago
conf_train.yaml f9dc2bb16f Switch to github jwjohns/unsloth-GRPO-qwen2.5 to validate GRPO model training 2 months ago