The Korea Times

National AI model project faces controversies over plagiarism claims

Debate stirs over defining 'development from scratch'

People visit Naver Cloud's booth during a presentation event for the national artificial intelligence (AI) foundation model project at Coex in southern Seoul, Dec. 30, 2025. Yonhap

The national artificial intelligence (AI) foundation model project, promoted as a major step toward the country’s AI sovereignty, has hit unexpected turbulence amid allegations that some of its leading contenders borrowed key components from Chinese models, triggering a debate over how to define "development from scratch."

Two of the five consortia, those led by Naver Cloud and Upstage, have been embroiled in controversy over whether they meet the project’s essential from-scratch requirement.

Naver Cloud came under fire when claims surfaced in developer communities earlier this week that the vision encoder of its flagship model for the project, HyperCLOVA X SEED 32B Think, shows striking similarities to that of Alibaba’s open-source Qwen2.5 large language model (LLM). A vision encoder is a component that converts images and video into data that an AI model can understand.

Under two widely used comparison methods, cosine similarity and Pearson correlation, the similarity between the two models’ vision encoder values exceeded 99.5 percent and 98.9 percent, respectively, suggesting near-identical patterns.
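For readers unfamiliar with the two measures, here is a minimal Python sketch of both, assuming each model's vision encoder weights have been flattened into equal-length lists of numbers. Cosine similarity compares the direction of two vectors; Pearson correlation is the same calculation after subtracting each vector's mean. The function names and sample values are hypothetical and are not taken from either model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after centering each vector on its mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical flattened weights from two encoders (illustrative values only).
weights_a = [0.12, -0.40, 0.33, 0.05]
weights_b = [0.11, -0.41, 0.35, 0.04]

print(f"cosine:  {cosine_similarity(weights_a, weights_b):.4f}")
print(f"pearson: {pearson_correlation(weights_a, weights_b):.4f}")
```

A real comparison would run the same calculation over far larger tensors, for example layer by layer, but the arithmetic is the same.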

Naver Cloud acknowledged that it used outside open-source modules but denied accusations that it copied Alibaba’s model, describing the choice as a strategic engineering decision. The company said it chose to “adopt a verified external encoder to optimize compatibility with the global technology ecosystem and enhance system efficiency,” while emphasizing that the model’s core reasoning engine was built entirely in-house.

The company further argued that its true innovation lies in integrating text, audio and visual capabilities into a unified multimodal architecture, noting that the decision did not stem from a lack of technological capability. “Naver possesses its own original visual technologies such as Vuclip,” it added.

The controversy intensified after the company published a technical report on its HyperCLOVA X 8B Omni model on arXiv, the global open-access archive, revealing that the model’s vision encoder uses Alibaba’s Qwen2.5-VL architecture, while its audio encoder is based on OpenAI’s Whisper model.

Naver Cloud emphasized that vision encoders merely convert images and video into a form the model can process, arguing that the foundation model itself, which is responsible for reasoning and identity, remains fully proprietary.

Upstage CEO Kim Sung-hoon speaks during the company's press conference in central Seoul, April 16, 2025. Courtesy of Upstage

A similar controversy engulfed Upstage last week over its Solar Open 100B model. Sionic AI CEO Ko Suk-hyun alleged that Upstage’s model reused elements of Chinese firm Zhipu’s GLM-4.5-Air, citing a 96.8 percent similarity in LayerNorm parameters between the two models. LayerNorm (layer normalization) parameters are learned values inside an AI model that rescale data as it flows through each layer, keeping values balanced and stable and helping the model learn and make predictions reliably.
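To show what these parameters are, here is a textbook layer normalization step in plain Python. Each vector of activations is normalized to zero mean and unit variance, then multiplied by a learned scale vector (gamma) and shifted by a learned bias vector (beta); those gamma and beta vectors are the kind of LayerNorm parameters such comparisons examine. This is a generic sketch, not code from either model, and some recent LLMs use a variant called RMSNorm that keeps only the scale vector; the article does not say which variant these models use.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one activation vector, then apply the learned scale (gamma) and shift (beta)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)  # eps guards against division by zero
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


activations = [2.0, 4.0, 6.0, 8.0]
gamma = [1.0] * 4  # learned per-feature scale, commonly initialized to 1
beta = [0.0] * 4   # learned per-feature shift, commonly initialized to 0

out = layer_norm(activations, gamma, beta)
print([round(v, 4) for v in out])
```

During training, gamma and beta drift away from their starting values, and it is these learned numbers that end up stored in a model's published weights.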

Upstage immediately refuted the allegations by publicly disclosing its development process and holding a verification session with experts that was streamed online on Friday.

The company’s CEO Kim Sung-hoon argued that the similarities were statistically insignificant, saying its model was developed from scratch through a fully independent pipeline of data collection, architecture design, training and tuning.
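As a general statistical illustration, not a reconstruction of the analysis either Upstage or Ko presented, the sketch below shows why a high similarity score on LayerNorm-style scale parameters can be weak evidence on its own. If two models' scale values both sit near 1 (an assumption here), cosine similarity comes out high even for vectors generated independently, while Pearson correlation, which subtracts the mean first, stays near zero. The vector length, noise level and seed are arbitrary choices, and the values are random rather than real weights.

```python
import math
import random


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pearson_correlation(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


rng = random.Random(0)  # fixed seed so the run is reproducible
size = 4096             # assumed vector length, chosen arbitrarily

# Two independently generated "scale" vectors that both hover around 1.0,
# mimicking LayerNorm gammas that start at 1 (an assumption, not real weights).
gamma_a = [1.0 + rng.gauss(0.0, 0.05) for _ in range(size)]
gamma_b = [1.0 + rng.gauss(0.0, 0.05) for _ in range(size)]

cos = cosine_similarity(gamma_a, gamma_b)     # high: both share a large common offset
corr = pearson_correlation(gamma_a, gamma_b)  # near zero: the deviations are unrelated
print(f"cosine: {cos:.4f}  pearson: {corr:.4f}")
```

One way to guard against this effect is to center the values, as Pearson correlation does, or to compare the score against pairs of models known to have been trained independently.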

During the verification session, he showed that the overlapping section represented only 0.0004 percent of the entire network.
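For a sense of scale, here is a back-of-the-envelope calculation. It assumes the percentage refers to parameter count and that Solar Open 100B has roughly 100 billion parameters, as its name suggests; the article states neither figure exactly.

```python
# Assumption: about 100 billion parameters, inferred from the "100B" in the model name.
total_params = 100_000_000_000

# 0.0004 percent = 0.0004 / 100 = 4 parts per million.
# Integer arithmetic avoids floating-point rounding.
overlapping_params = total_params * 4 // 1_000_000

print(f"{overlapping_params:,} of {total_params:,} parameters")
```

Under those assumptions, the overlapping section would amount to roughly 400,000 parameters.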

While the Upstage controversy was quickly put to rest after Ko publicly apologized, scrutiny of Naver Cloud remains intense as the Ministry of Science and ICT prepares to complete its first round of evaluations on Jan. 15, which will decide which of the five consortia will be the first eliminated from the project.