2012-05-20 16:56

Making sense of Big Data



Data science and analysis are the key skills in demand

By Kang Ye-won

Big Data has been one of the top buzzwords this year. Yet, except for a handful of global platform providers such as Google, Facebook, Apple and Amazon, most companies haven’t quite figured out how to use the massive amount of data streaming in every day.

Almost 73 percent of companies have increased the amount of data they collect, but 53 percent of them said they use only half of it, according to SAS Institute, a business-to-business (B2B) software provider.

So what is at the core of the Big Data fanfare?

What matters is the technology to deal with Big Data. In the smart era, anything in our daily lives can be recorded as data, from what we chat about and text on our cellphones to where we eat lunch, what transportation we take on our commutes and what we do while on the move.

If smart technologies such as smartphones play the role of our senses, perceiving information from the slew of data, then Big Data technology is like our brain, making sense of that information.

Insiders describe Big Data as having three characteristics, the three V’s: volume, velocity and variety.

The skills in demand are the abilities to access and extract key information from the data deluge, to analyze and update data as it streams in real time, and to visualize the outcome for better understanding and application.

There is a global shortage of so-called “data scientists,” and therein lie future opportunities, experts say.



Defining Big Data

Although there’s no specific threshold that defines Big Data, the size of data sets has grown from hundreds of gigabytes, roughly a laptop’s storage capacity, to terabytes (thousands of gigabytes), to petabytes and even zettabytes.

To put the numbers in perspective, the log data of Facebook’s 845 million users amounts to more than 25 terabytes every day. Twitter generates more than 12 terabytes of tweet data each day, according to Lee Ji-eun, a manager at IBM Korea.

Also, today’s data is not structured as in Excel files but unstructured, in forms ranging from 140-character tweets to mobile photos to YouTube videos.

Despite the buzz, Big Data is not a new concept, at least in finance, said Jim Nelms, the chief information security officer (CISO) at the World Bank, during a Finance IT World conference held by IDG in Seoul.

Financial institutions such as banks, along with large retailers, have long churned out volumes of data on transactions and customer profiles.

“The biggest paradigm shift is that it’s fast and accessible to many more people,” Nelms said.

“The question is how we capture the capabilities we have and harness them for profits,” he said.

Although no one has quite figured out how to fully monetize Big Data, the possibilities are infinite.

For instance, the potential value of data in the U.S. healthcare sector is estimated at $300 billion a year, two-thirds of which would come from reducing national healthcare expenditure by about 8 percent, according to a May 2011 report by the McKinsey Global Institute.

In retail, a jump of up to 60 percent in net margins is expected, and manufacturers could cut product development and assembly costs by nearly half.

In technology, the potential value is higher still. Personal location data alone is projected to generate more than $100 billion in revenue for service providers and up to $700 billion in value for end users.



Global race

The biggest users of Big Data have been B2B software providers such as IBM, Oracle and EMC.

IBM’s Watson, a supercomputer that won Jeopardy! last year, is a good example of Big Data technology in use. It stored 200 million pages of text in memory and ran advanced search and analysis programs, which enabled it to go from digging through data to answering within three seconds and ultimately beat two former champions.

These large solution providers have been aggressively buying smaller firms specializing in Big Data. This April, IBM acquired Vivisimo, a Pittsburgh-based firm that helps organizations access and analyze Big Data.

Another B2B solution provider is MicroStrategy, a Washington, D.C.-area company that focuses on mobile intelligence.

“As our competitors have been acquired by bigger players, (MicroStrategy) is one of a few left in the field,” said Chung Kyung-whu, a senior sales engineer at MicroStrategy in Korea.

Following the social media trend, MicroStrategy recently invested in analyzing Facebook data collected from mobile and tablet users. For clients such as clothing brand Guess and professional sports teams FC Barcelona and the Washington Capitals, the firm runs apps that collect Facebook fans’ personal information, including gender, age, marital status and geographic location, as well as income and education level, political stance, and likes and dislikes.

But the company hasn’t yet been able to connect the dots on how to monetize the data and analysis, a challenge shared by other data analytics firms.

“It’s been about a year since we started to dig into social media data, and in eight out of 10 cases, no insights were found. ... most companies are at a stage of market sensing and trawling tweets at most,” Chung said.

Experts say the real business model lies in an “intelligent platform,” such as the ecosystem built by Google. Most recently, the search giant, which has focused on free consumer services, released BigQuery, a cloud-based data analytics tool for corporate customers that can scan terabytes of data in seconds.

Google’s platform has gradually extended the reach of its data into industries ranging from media and healthcare to games and local traffic, according to a memo by Moon Byoung-soon, a researcher at the LG Economic Research Institute (LGERI). To succeed, Moon said, Google had to master both software and hardware.

“Hardware is a key component in terms of distributing services, and the user information collected from them is more precious than anything else,” he said.

Similar to Google, other platform giants Facebook, Amazon and Apple have all accumulated stacks of user information locked into their own services, which have a mounting potential for Big Data business.

Just last Friday, Facebook’s much-anticipated IPO was priced at $38 a share, giving the company a valuation of $104 billion, the largest for a newly listed American company. Despite some investors’ worries over the firm’s still unclear monetization strategy, the raging interest in its shares indicates the power of its 845 million users’ personal data.

Local status quo

According to a recent study by the Korea Communications Commission, more than half of Korea’s 50 million cellphone owners use smartphones. Despite the high penetration of broadband across the nation, Korea accounts for only about 9 percent of global data traffic, Moon said in a report. Local portal site providers such as NHN and telecom companies including SK, KT and LG own torrents of data, measured in petabytes, but it is mostly limited to domestic use.

The closest local example of Big Data technology is SK Telecom’s T Map, a real-time GPS app for mobile and tablet devices, said Lim Byung-hwa, a senior researcher at the Korea Economic Research Institute.

One of the reasons for Korea’s lag in Big Data is the government’s strict regulations on data collection.

Korea passed the Personal Information Protection Act, which took effect last September and requires any “data handler” to provide technical security at each step where personal information is handled. The law also requires data handlers to obtain individuals’ consent when the information is collected and to disclose how the data will be used. Employers or businesses that don’t comply with the rules face up to 10 years in prison or a fine of up to 100 million won.

Google recently stirred a privacy controversy by secretly installing cookies on iPhone users’ Web browsers and tracking their search histories. Although this is a matter of ethical debate in the U.S., it is illegal in Korea, and local companies have very limited ability to collect personal information for targeted marketing, Moon said.

“The government’s regulation needs to loosen up, too, but at the same time, companies need to invest more in protecting online privacy,” he said.

Koreans are also highly resistant to sharing their information online because of previous privacy breaches.

“If people don’t trust the service providers and refuse to share their information, the business won’t develop further,” he said.

Another challenge in Big Data is a shortage of workers with the necessary skill set.

So-called data science is projected to become the next wave of opportunity. The talent in demand is not limited to people with skills in computer science, engineering and math; managers and analysts who can draw insight from large data sets will also be needed.

The U.S. alone faces a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts who can handle Big Data and make decisions based on their findings, according to the McKinsey report.

Data security is another area in need of talent.

“There’s a high deficit in security jobs,” said Nelms. As technologies evolve, executives such as chief information officers (CIOs) and CISOs are challenged to secure sensitive data in a more finely controlled fashion.

Technologists in the field are expected to have not only technical training in areas such as security science or encryption, but also expertise in the type of data they deal with, whether it’s in finance, health care or manufacturing.




Related Korean-language article


Understanding Big Data properly

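Big Data is considered one of the top keywords of 2012. Yet, apart from the four giant platform companies, Google, Facebook, Apple and Amazon, most companies do not really know how to use the enormous amounts of data pouring in every day.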

So what exactly is Big Data?

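When discussing Big Data, the core issue is the technology to make use of it. In the smart era, our daily lives are being recorded as data: the calls we make and the texts we send on our cellphones, where we go for lunch, what transportation we use to get around and what we do online in the meantime.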

If smart technologies help us perceive the information we need from within masses of data, Big Data technology can be said to play the role of our brain in making sense of that information.

So far, the companies that have made good use of Big Data technology have mainly been B2B software firms such as IBM, Oracle and EMC.

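Experts say, however, that the Big Data business model requires an “intelligent platform” that combines both software and hardware technology and can thereby hold the personalized data they yield. The potential of the personal information accumulated through the services of Google, the most prominent example, as well as Facebook, Amazon and Apple, is enormous.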

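In Korea, device penetration is high, with smartphone users now accounting for more than half of cellphone owners, but according to a report by the LG Economic Research Institute, Korea’s data makes up only about 9 percent of the world’s data.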

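One reason Korea has fallen behind in the Big Data era is the government’s regulation of data collection. Another problem is the public’s strong distrust of companies that collect personal information, following the many recent personal data leaks in Korea, says Dr. Lim Byung-hwa of the Korea Economic Research Institute.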

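Another task in the Big Data era is to train workers with the necessary skills. This is a global phenomenon: according to the McKinsey report, the United States alone is currently short of about 140,000 to 190,000 people with data science and analytical skills.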

