
Qwen-MOE finetune results in mixed language tokens (language switching in generated answers) #5587

@chapter544

Description


Hi,
When I use megatron-swift to LoRA-finetune Qwen3-MoE-30B-A3B and then convert the merged mcore checkpoint back to the Hugging Face format, the model tends to generate mixed-language tokens in its answers. For example, when I expect an English answer, most of the tokens are in English, but some Chinese tokens are also generated. The same thing happens when I expect a Chinese answer: English and other languages' tokens are mixed into the answer. If I run these mixed tokens through Google Translate, their meaning is "correct"; they are just in a different language.
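
For reference, here is a minimal sketch of the kind of generation check where the switching shows up (the model path and prompt below are placeholders, not my exact setup): load the converted Hugging Face checkpoint, generate with greedy decoding so sampling noise is ruled out, and scan the answer for CJK characters.

```python
# Minimal sketch: load the converted checkpoint, generate greedily, and flag CJK
# characters in an answer that should be English-only. Model path and prompt are
# placeholders, not my exact setup.
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./qwen3-moe-merged-hf"  # placeholder for the converted HF directory
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain gradient checkpointing in English."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding rules out sampling noise as the source of the language switching.
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
answer = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(answer)

# Any CJK fragments in an English-only answer indicate the mixing described above.
print("CJK fragments:", re.findall(r"[\u4e00-\u9fff]+", answer))
```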

For the LoRA training, I only finetuned the linear layers; the embedding and output layers were NOT trained. Flags I used in the training script:

  • method: SFT
  • packing = true
  • padding_free = true

I have a few thoughts, but I wonder if you have any insights:

  1. Could packing=true be the issue?
  2. Could the conversion between the mcore and Hugging Face formats be the issue?
  3. Should I enable training of the embedding and output layers to reduce the language-switching issue? (A PEFT-style illustration of what I mean follows this list.)
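
For point 3, the PEFT-level equivalent of what I have in mind would be something like the config below, i.e. fully training the token embedding and LM head alongside the LoRA adapters. Whether megatron-swift exposes an equivalent switch is part of my question, so this is only an illustration; the module names follow the Hugging Face Qwen3-MoE implementation.

```python
# Illustration of thought 3 at the PEFT level: train the embedding and output
# layers in full alongside the LoRA adapters, instead of leaving them frozen as
# in the run described above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    # Fully train (and save) the token embedding and LM head.
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
```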

Thanks.
