At a big data roundtable hosted by Autonomy this week, Feldman explained that the complexity of big data technology requires an advanced skillset that is quite rare amongst IT professionals.
“There aren’t a lot of people who are very skilled in these new technologies. How are enterprises supposed to hire people if they aren’t there?” asked Feldman.
The most common technology used by companies to analyse hundreds of terabytes, or even petabytes, of unstructured data is an open-source tool called Hadoop.
Hadoop uses a process called parallel programming, which allows analytics to be run on hundreds of servers, each with many disk drives, all at the same time. It stores this data in a file system called HDFS (Hadoop Distributed File System), in effect a flat file system that can spread data across multiple disk drives and servers.
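To give a rough sense of what this parallel style looks like in practice, the following is a minimal sketch in Python, not Hadoop's own API. It assumes a simple word-count job: the input is cut into blocks (loosely standing in for data spread across drives and servers), each block is counted independently, and the partial results are merged at the end. The function names, the block size and the use of local threads in place of servers are all illustrative assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, block_size):
    """Cut the input into fixed-size blocks (a stand-in for data spread across servers)."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map step: count the words in one block, independently of every other block."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, block_size=2, workers=4):
    """Run the map step on all blocks concurrently, then reduce.

    Threads here are a hypothetical stand-in for the separate servers
    a real cluster would use.
    """
    blocks = split_into_blocks(lines, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce_counts(partials)


if __name__ == "__main__":
    sample = [
        "big data needs big skills",
        "skills are rare",
        "data keeps growing",
    ]
    print(word_count(sample).most_common(3))
```

Because each block is processed without reference to the others, adding more workers (or, on a real cluster, more servers) lets the same job cover more data in the same time. The difficulty Feldman describes lies in everything this sketch leaves out: distributing the data, handling failures and tuning the cluster.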
However, it is widely agreed in the industry that Hadoop is an extremely complex system to master and requires intensive developer skills. There is also a lack of an effective ecosystem and standards around the open-source offering.
“There are very few Hadoop experts around, and there are only very poor tools available for using it. You don’t only need experts that know how to master a Hadoop file system, but experts that know how to master a Hadoop file system using bad tools,” said Feldman.
Feldman urged the likes of Autonomy, EMC, Teradata and IBM to improve the tools on offer to reduce the impact of the skills crisis.
“If vendors could supply the expertise, if they could keep the software updated, then this would lessen the burden on IT departments in the enterprise,” she said.
“Integrating all of the big data pieces into a well-formed architecture, so that everything can interact with everything else, that’s very difficult. Most people doing this could probably happily hire ten extra people, but they just aren’t around.”
IDC predicts a compound annual growth rate of 39.4 percent in big data hardware, software and service sales between 2010 and 2015. However, this doesn’t take into account open-source offerings, which account for a significant amount of enterprise use but are difficult to measure.