zhouyang.xie bdb109bdba Switch to GitHub jwjohns/unsloth-GRPO-qwen2.5 to validate GRPO model training 4 months ago
..
conf_train.yaml bdb109bdba Switch to GitHub jwjohns/unsloth-GRPO-qwen2.5 to validate GRPO model training 4 months ago