
Lee Hyeon-jun, a senior at Kookmin University’s department of AI, big data and management, poses before presenting a co-authored paper at the 29th Annual Conference on Artificial Intelligence and Statistics held in Tangier, Morocco, in early May. Courtesy of Kookmin University
A Kookmin University research team presented a paper at the 29th Annual Conference on Artificial Intelligence and Statistics (AISTATS) held in Morocco, the school said Wednesday.
The team, led by assistant professor Kim Jang-ho of the College of Computer Science, includes Chung Jin-woo, a master’s student, and two undergraduate students, Lee Hyeon-jun and Jo Hyeon-sik, both from the department of AI, big data and management.
The school highlighted the significance of presenting the study at AISTATS 2026, which took place in the Moroccan city of Tangier from May 2 to 5.
Since its inception in 1985, AISTATS has been regarded as a premier international conference covering both theoretical and applied research in the field of artificial intelligence (AI).
The university said the paper, titled “SQuaT: Self-Supervised Knowledge Distillation via Student-Aware Quantized Teacher Features,” proposes a novel method for improving the performance of quantized models without requiring training labels.
A quantized model is a machine learning or deep learning model that uses lower-precision numerical representations for its weights and computations in order to reduce memory usage and improve computational efficiency. A training label is the correct answer or target value that a machine learning model uses during training.
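As a rough illustration of what a lower-precision representation means, the sketch below rounds a handful of weights onto a small grid of signed integer codes and maps them back to floating-point values. The function name `quantize_symmetric` and the sample weights are hypothetical and are not taken from the SQuaT paper; production quantization schemes differ in detail.

```python
def quantize_symmetric(values, num_bits):
    """Uniformly quantize floats to signed integer codes, then map them back.

    Illustrative only: a simple symmetric scheme, not the one used in SQuaT.
    """
    if num_bits < 2:
        raise ValueError("this simple signed scheme needs at least 2 bits")
    qmax = 2 ** (num_bits - 1) - 1       # largest code, e.g. 127 for 8 bits
    qmin = -(2 ** (num_bits - 1))        # smallest code, e.g. -128 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each value to the nearest code and clamp it to the allowed range.
    codes = [min(qmax, max(qmin, round(v / scale))) for v in values]
    # Map the integer codes back to the float values they stand for.
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


weights = [0.5, -1.0, 0.25, 1.0]         # hypothetical weights
codes_2bit, approx_2bit, _ = quantize_symmetric(weights, num_bits=2)
codes_8bit, approx_8bit, _ = quantize_symmetric(weights, num_bits=8)
print(codes_2bit, approx_2bit)
print(codes_8bit, approx_8bit)
```

With only 2 bits, several distinct weights collapse onto the same code, which is the kind of information loss that makes extreme low-bit models hard to train.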
The method, called Student-Aware Quantized Teacher Features (SQuaT), aligns the intermediate features of a high-precision teacher model with the range of values that a quantized student model can actually represent.
The research team pointed out that conventional label-free quantization-aware training (QAT) methods have primarily relied on logit-based knowledge distillation (KD), limiting their ability to fully exploit intermediate feature information.
In contrast, approaches that use feature-level knowledge distillation suffer from a different issue: The discrepancy in value ranges between the high-precision teacher model and the low-bit student model creates learning targets that are practically unattainable.
Knowledge distillation enables a smaller student model to learn from a larger teacher model. “Feature-level” distillation draws on representations extracted from the intermediate layers of a neural network, rather than only the final output.
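The difference between the two kinds of distillation can be sketched in a few lines. The example below is a generic textbook formulation, not the loss used in the SQuaT paper: the logit-based loss compares the softened output distributions of teacher and student, while the feature-level loss compares intermediate values directly. All function names, the temperature value and the sample numbers are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                   # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between teacher and student output distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def feature_kd_loss(teacher_features, student_features):
    """Mean squared error between intermediate-layer features."""
    pairs = list(zip(teacher_features, student_features))
    return sum((t - s) ** 2 for t, s in pairs) / len(pairs)


# Hypothetical final-layer outputs and intermediate features.
print(logit_kd_loss([2.0, 1.0, 0.1], [1.5, 1.2, 0.3]))
print(feature_kd_loss([0.9, -0.4, 2.1], [1.0, -0.5, 1.0]))
```

Neither loss needs a ground-truth label: the teacher's outputs or features serve as the training target.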
To address those problems, the team developed a student-aware projection method that leverages the quantization parameters of the student model to project the intermediate features of the teacher model into a quantized space that the student can actually represent.
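A minimal sketch of the general idea follows, assuming the student's quantization parameters consist of a scale, a zero point and a bit width. The paper's actual projection may differ. The function `project_to_student_grid` and every number below are hypothetical. Teacher features are rounded and clamped onto the grid of values the student can express, so the distillation target becomes one the student can reach.

```python
def project_to_student_grid(teacher_features, scale, zero_point, num_bits):
    """Snap teacher features onto the values a quantized student can represent.

    Assumes an unsigned code range [0, 2**num_bits - 1]; illustrative only.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    projected = []
    for x in teacher_features:
        code = round(x / scale) + zero_point     # nearest student code
        code = min(qmax, max(qmin, code))        # clamp to the student's range
        projected.append((code - zero_point) * scale)
    return projected


def mse(a, b):
    """Mean squared error between two equal-length feature lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical values: a 2-bit student can only express -0.5, 0.0, 0.5 and 1.0.
teacher = [-3.0, 0.2, 0.9, 5.0]
student = [-0.5, 0.0, 0.5, 1.0]
target = project_to_student_grid(teacher, scale=0.5, zero_point=1, num_bits=2)

print(target)                  # teacher features on the student's grid
print(mse(student, teacher))   # mismatch against the raw teacher features
print(mse(student, target))    # mismatch against the projected features
```

In this toy case the raw teacher values of -3.0 and 5.0 lie outside anything the student can output, so no amount of training could match them; the projected target removes that unattainable part of the error.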
Through this approach, the team reduced the feature mismatch between the teacher and student models and demonstrated that more stable quantization-aware training is possible even in label-free settings.
“Through comprehensive experiments across diverse settings, we demonstrate that SQuaT consistently outperforms strong baselines, with particularly pronounced gains in extreme low-bit (e.g., 1- and 2-bit) settings,” the team stated in the paper.
The university said the study is significant in that it improves the performance of lightweight AI models without incurring additional labeling costs.
The team has publicly released the SQuaT code for use in other studies on quantization and knowledge distillation. It said it plans to continue research on developing efficient AI models.