LLMs may spontaneously switch to Chinese mid-reasoning regardless of prompt language — observed in both OpenAI's o1 and Chinese models (DeepSeek, Qwen, GLM)
The papers listed below propose three possible explanations: competition between internal circuits, strategic advantages of reasoning in certain languages acquired during training, and the influence of the training-data distribution.
Mechanistic interpretability research suggests that multilingual LLMs possess two distinct internal subsystems that govern generation: