Fairness dispute clouds national AI foundation model project

Minister of Science and ICT Bae Kyung-hoon speaks during a presentation for the national artificial intelligence foundation model project at Coex in Seoul, Dec. 30, 2025. Yonhap

By Lee Gyu-lee

Published Jan 14, 2026 3:49 PM KST
Updated Jan 14, 2026 5:01 PM KST

Debates over ‘from-scratch’ criteria and benchmark rules raise fairness concerns ahead of 1st elimination announcement

As the government prepares to announce the results of the first-round evaluation of its national artificial intelligence (AI) foundation model project on Thursday, debate continues to intensify over what constitutes “from-scratch” development and how fairness can be ensured in the evaluation process.

The Ministry of Science and ICT is expected to eliminate one of the five participating consortia — Upstage, SK Telecom, Naver Cloud, LG AI Research and NC AI — based on model performance and development originality.

The project aims to establish a sovereign foundation model free from excessive reliance on foreign AI technology. However, after the five teams unveiled their models in a presentation last month, it is now facing disputes over evaluation standards, particularly regarding the definition of the project’s essential requirements and the benchmarking system used to compare the competing models.

The controversy began when Upstage’s Solar-Open-100B model was accused of reusing inference code from Chinese firm Zhipu’s AI model. Similar allegations later surfaced that SK Telecom’s A.X K1 reused inference code from the Chinese DeepSeek model.

Naver Cloud also came under fire for partially adopting vision and audio encoders, along with pretrained weights, from Alibaba’s open-sourced Qwen model in developing its HyperCLOVA X Seed 32B Think and HyperCLOVA X 8B Omni models.

At the heart of the debate is the interpretation of “from-scratch,” which will determine how much external technology can be incorporated without compromising a model’s independence.

Upstage and SK Telecom both argued that using open-source inference code — software that runs a trained model on new inputs to produce outputs — does not affect the originality of their models, as it is distinct from the training code.

Naver Cloud, on the other hand, acknowledged using external encoders but said it was a “strategic” decision to enhance efficiency and global compatibility, not one that undermines technological sovereignty.

Another fairness issue has emerged over the performance evaluation process itself, which relies on benchmark testing: a standardized evaluation of AI models using consistent tasks, datasets and metrics to objectively assess a model’s capability.

People visit Naver Cloud's booth during a presentation event for the national artificial intelligence foundation model project at Coex in southern Seoul, Dec. 30, 2025. Yonhap

In addition to a common benchmark applied to all teams, the government reportedly allowed each team to select two additional benchmarks tailored to its model. While this was intended to accommodate diversity in model design — particularly Naver Cloud’s multimodal model, which handles text, image and audio — this mixed evaluation structure has raised concerns that it could benefit certain models by allowing them to choose benchmarks that highlight their strengths.

Naver reportedly selected TextVQA and DocVQA benchmarks, which measure an AI’s ability to recognize text in images and documents, both of which rely heavily on the model’s vision encoder, the same module now under scrutiny for its use of Alibaba’s pretrained weights.

Despite its emphasis on originality across the entire development process, from data collection and model architecture to training, the ministry has not clearly defined whether the use of external inference code or pretrained encoders violates that principle, prolonging the controversy.

With the ministry set to announce the first team to be eliminated from the project, it is expected to clarify the criteria for what qualifies as “from-scratch” development and how benchmark weighting was determined.

The National IT Industry Promotion Agency, which oversees the project, said on Wednesday it plans to finalize the selection of two teams by the end of the year. However, without clear standards, debates over fairness and model independence are unlikely to subside even after this week’s announcement.
