
As global tech giants race to redefine voice as the next major interface for artificial intelligence (AI), local tech and telecom firms are also accelerating efforts to secure an edge in the domain with Korean-language-optimized AI services and multimodal technologies.
The rapid shift comes as global companies, including Google, Apple, Microsoft and Meta, move to embed voice into AI agents and connected devices, enabling systems that can understand context, execute tasks and interact naturally with users across devices and environments.
Google recently introduced Gemini Intelligence, a new AI feature on Android powered by its Gemini AI agent technology, at its Android Show: I/O Edition earlier this month. Unlike earlier voice assistants that were largely limited to short commands such as setting alarms, the new system is designed to automate multistep tasks, including reservations, shopping and food orders across applications.
Apple is also expected to unveil a significantly upgraded version of Siri at its Worldwide Developers Conference next month. The new Siri is anticipated to function as an AI agent that can orchestrate tasks across apps while also tapping external models such as OpenAI’s ChatGPT and Google’s Gemini, effectively turning the iPhone into an AI-native platform.
The company has also made an aggressive bet on voice interfaces. In January, it acquired Israeli voice AI startup Q.ai for nearly $2 billion, gaining access to technology that analyzes facial muscle movements to interpret silent speech. The deal marked Apple's second-largest acquisition ever.
The market outlook is also fueling investment momentum. According to Fortune Business Insights, the global speech recognition market is projected to grow from $23.7 billion this year to about $104 billion by 2034, representing a compound annual growth rate of 20.3 percent.

A model uses SK Telecom's AI agent service A. (Adot). Courtesy of SK Telecom
The shift marks a move away from earlier voice assistants limited to simple commands toward a new generation capable of contextual understanding and multistep task execution, capabilities that are particularly valuable in hands-free environments such as vehicles.
Against this backdrop, domestic firms are also racing to secure a foothold in the voice AI ecosystem, focusing on language capabilities tuned to Korean nuances, slang and conversational context, as well as on real-world deployment across devices.
Telecom operators are leading the charge. SK Telecom rolled out A. auto, an in-car version of its AI agent A. (Adot), earlier this year, deploying it in Renault Korea’s newly launched Filante. Powered by the company’s Korean-language large language model, A.X 4.0, the system can handle not only standard commands such as navigation and music playback but also colloquial expressions, turning the vehicle into an AI-assisted personalized space.
The company has been rapidly expanding A. beyond smartphones into a broader ecosystem, integrating its AI-powered voice guidance into its IPTV platform Btv and its navigation service TMap.
KT is targeting the home AI market through its Genie TV AI agent, which lets users request news, weather, educational content and everyday information through conversational voice commands.

A robot demonstrates LG Uplus' AI agent ixi-O at the company's booth during MWC26 in Barcelona, Spain, March 2. Joint Press Corps
Meanwhile, LG Uplus is advancing ixi-O, an AI call agent that analyzes conversational context, tone and emotional cues in real time during a call. The service offers call transcription and summarization while also detecting threats such as voice phishing scams during conversations, positioning it as both a productivity and a security tool.
The company recently secured the service's first overseas deal through a partnership with Malaysian telecom operator Maxis, with a local launch expected later this year.
IT companies are also moving aggressively into multimodal voice AI, leveraging their existing ecosystems.

Screenshots of Kakao's voice AI-powered public service platform within KakaoTalk messenger / Courtesy of Kakao
Kakao rolled out a beta service for its integrated multimodal AI model, Kanana-O, earlier this year. The model can simultaneously process text, voice and images and is designed specifically to improve Korean-language comprehension compared with global AI models.
According to the company, the model currently achieves the highest benchmark scores among domestic multimodal models of its size.
The company has also expanded voice functionality in the AI-powered public service platform within its flagship KakaoTalk messenger. Users can now perform tasks such as issuing official documents or booking public facilities via voice commands in KakaoTalk, eliminating the need to navigate multiple apps or interfaces.

A screenshot of Naver's AI Tab / Courtesy of Naver
Meanwhile, Naver is expanding its voice and multimodal AI capabilities through both consumer and enterprise services. It recently launched a beta of AI Tab, an AI-powered search feature, for its premium members, allowing users to ask complex conversational questions rather than enter simple keyword searches.
By year-end, the company plans to integrate the service with its Smart Lens image search tool, turning it into a fully multimodal AI feature that understands text, images and voice simultaneously.
On the enterprise front, Naver has upgraded its Clova Note voice-to-text service with AI-powered automatic speaker identification that distinguishes multiple participants in meetings. The company plans to improve real-time voice recognition and summarization quality in the second half of the year, aiming to position Clova Note as an essential enterprise service.