A study by analyst firm IDC shows how companies are using the open source Hadoop big data analytics system alongside other systems to get value out of their data.
IDC’s “Trends in Enterprise Hadoop Deployments” report, commissioned by Red Hat, found that 32 percent of companies questioned had deployed Hadoop. An additional 31 percent said they had plans to deploy Hadoop within 12 months, and 36 percent said their Hadoop deployment schedule could go beyond 12 months.
The study found that enterprises are combining Hadoop with other databases for big data analysis. Nearly 39 percent of respondents said they use NoSQL databases like HBase, Cassandra and MongoDB, while nearly 36 percent said they use MPP (massively parallel processing) databases like Greenplum and Vertica.
“This situation underscores the importance of causality and correlation, in which traditional structured data sets are analysed in conjunction with unstructured data from newer sources,” the report says.
The report confirms the point made by Facebook analytics chief Ken Rudin earlier this week when he told a New York conference that Hadoop was not enough for organisations looking to exploit big data.
The IDC study shows the various ways companies are using Hadoop. These include the analysis of raw data, whether operations data, data from machines or devices, data from point of sale systems, or customer behavioral data gathered from ecommerce or retail systems.
Some 39 percent of respondents said they use Hadoop for “service innovation”, which includes the analysis of secondary data sets for modeling of “if-then” scenarios for products and services.
Less popular use cases for Hadoop include its deployment as a platform for non-analytic workloads, for example in conjunction with a SQL overlay for OLTP (online transaction processing).
As a result, said IDC, enterprises are looking to alternative persistent storage systems. According to the report, “File systems like IBM’s Global File system (GPFS), Red Hat Storage (GlusterFS), EMC Isilon OneFS and others that have earned a reputation for their robust scale-out capabilities, are clearly preferred as alternatives to HDFS (Hadoop Distributed File System).”
The survey also found that most enterprises process big data both before and after Hadoop processing. “This highlights another attractive feature of other storage alternatives, including the ability to keep the data in native POSIX format and use traditional analysis tools,” said IDC.