Hi,
When I use megatron-swift to LoRA-finetune Qwen3-MoE-30B-A3B, after converting the merged mcore checkpoint back to the Hugging Face format, the model tends to generate mixed-language tokens in its answers. For example, when I expect an English answer, most of the tokens are English, but some Chinese tokens are also generated. The same happens when I expect a Chinese answer: English and other languages' tokens get mixed into the answer. If I run these stray tokens through Google Translate, their meaning is "correct" for the context; they are just in a different language.
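Roughly how I check for the mixing, for reference (a minimal sketch; the checkpoint path and prompt are placeholders, not my exact setup):

```python
# Sketch: load the merged/converted HF checkpoint, generate from an English prompt,
# and count CJK characters that leak into the answer. Paths/prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./qwen3-moe-merged-hf"  # placeholder for the merged, converted checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain gradient checkpointing in English."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
answer = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)

# Count CJK characters appearing in an answer that should be pure English.
cjk = sum(1 for ch in answer if "\u4e00" <= ch <= "\u9fff")
print(answer)
print(f"CJK chars: {cjk} / {len(answer)} chars")
```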
For the LoRA training, I only finetuned the linear layers; the embedding and output layers were NOT trained (sketched in plain PEFT terms after the flag list). Flags I used in the training script:
method: SFT
packing = true
padding_free = true
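To be explicit about the adapter scope, this is what my setup corresponds to in plain PEFT terms (illustrative sketch only; the actual run was configured through megatron-swift flags, and the rank/alpha values below are placeholders):

```python
# Illustrative only: LoRA on the linear layers, embedding and lm_head frozen.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                         # placeholder rank
    lora_alpha=32,                # placeholder alpha
    target_modules="all-linear",  # adapters on the linear layers only
    modules_to_save=None,         # embedding and output (lm_head) layers stay frozen
    task_type="CAUSAL_LM",
)
```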
I have a few thoughts in mind, but I wonder if you have any insights:
Could "packing = true" be the issue?
Could the conversion between the mcore and Hugging Face formats be the issue? (See the round-trip check sketched below this list.)
Should I enable training the embedding and the output layers in order to minimize the language-switching issue?
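For point 2, one check I am considering: round-trip the untouched base model (HF -> mcore -> HF) without any training and compare its logits against the original HF weights. If they diverge, the conversion itself would be the culprit rather than the LoRA training. A rough sketch (paths are placeholders):

```python
# Rough sketch (paths are placeholders): compare the original HF base model against a
# base model that was round-tripped HF -> mcore -> HF without any training. A large
# logit difference would point at the conversion step rather than the LoRA training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

orig_path = "Qwen/Qwen3-30B-A3B"           # original HF base weights
roundtrip_path = "./qwen3-base-roundtrip"  # placeholder: base weights after HF -> mcore -> HF

tokenizer = AutoTokenizer.from_pretrained(orig_path)
inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")

logits = []
with torch.no_grad():
    for path in (orig_path, roundtrip_path):
        model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16)
        logits.append(model(**inputs).logits.float())
        del model  # free memory before loading the second checkpoint

max_diff = (logits[0] - logits[1]).abs().max().item()
print(f"max |logit diff| between original and round-tripped base: {max_diff:.6f}")
```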