
Qwen-MOE finetune results in mixed language tokens (language switching in generated answers) #5587

@chapter544

Description


Hi,
When I use megatron-swift to LoRA-finetune Qwen3-MoE-30B-A3B and then convert the merged mcore checkpoint back to the Hugging Face format, the model tends to generate mixed-language tokens in its answers. For example, when I expect an English answer, most of the tokens are in English, but some Chinese tokens are also generated. The same thing happens when I expect a Chinese answer: English and other languages' tokens are mixed into the answer. If I run these mixed tokens through Google Translate, their meaning is "correct"; they are just in a different language.
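
For reference, here is a minimal sketch of the kind of generation check where the switching shows up (the model path and prompt below are placeholders, not my exact setup): load the converted Hugging Face checkpoint, generate with greedy decoding so sampling noise is ruled out, and scan the answer for CJK characters.

```python
# Minimal sketch: load the converted checkpoint, generate greedily, and flag CJK
# characters in an answer that should be English-only. Model path and prompt are
# placeholders, not my exact setup.
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./qwen3-moe-merged-hf"  # placeholder for the converted HF directory
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain gradient checkpointing in English."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding rules out sampling noise as the source of the language switching.
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
answer = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(answer)

# Any CJK fragments in an English-only answer indicate the mixing described above.
print("CJK fragments:", re.findall(r"[\u4e00-\u9fff]+", answer))
```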

For the LoRA training, I only finetuned the linear layers; the embedding and output layers were NOT trained. Flags I used in the training script:

  • method: SFT
  • packing = true
  • padding_free = true

I have a few thoughts, but I wonder if you have any insights:

  1. Could packing=true be the issue?
  2. Could the conversion between the mcore and Hugging Face formats be the issue?
  3. Should I enable training of the embedding and output layers to reduce the language-switching issue? (A PEFT-style illustration of what I mean follows this list.)
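
For point 3, the PEFT-level equivalent of what I have in mind would be something like the config below, i.e. fully training the token embedding and LM head alongside the LoRA adapters. Whether megatron-swift exposes an equivalent switch is part of my question, so this is only an illustration; the module names follow the Hugging Face Qwen3-MoE implementation.

```python
# Illustration of thought 3 at the PEFT level: train the embedding and output
# layers in full alongside the LoRA adapters, instead of leaving them frozen as
# in the run described above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    # Fully train (and save) the token embedding and LM head.
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
```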

Thanks.
