Big data with Philip Roy, Director of Greenplum MEA, EMC


1) One of the current IT buzzwords is big data, which presents an opportunity for companies to invest in order to gain the competitive advantage that results from data analytics. How big is this opportunity, both regionally and globally?

Big data is certainly the topic du jour in the business and technology media, dividing critics and inspiring contradictory opinions: it is dismissed by some as old hat, and hailed by others as a revolutionary new business tool for competitive advantage.

The opportunity is massive, both globally and in the Middle East. Gartner forecasts that companies that successfully implement big data solutions will be, on average, 20% more profitable than those that don’t. This gives an idea of the scale of the opportunity.

This is driven by new technologies and new levels of IT system performance that allow analytics to be delivered in near real time at an affordable cost. It will change the way we live and work as much as other major shifts such as IP, the Internet or 3G.

To put it simply: companies that get more intelligence out of their data will better differentiate themselves, provide more tailored services, acquire more customers and simply out-compete companies that don’t analyse their data.

Social media websites, smartphones and other consumer devices, including PCs and laptops, have allowed billions of people around the world to process huge amounts of information. The big social media platforms, which EMC supports through its backhaul IT infrastructure, are generating vast quantities of unstructured data as online conversations grow. Downloadable software applications such as iPhone apps allow people to do virtually everything from their mobile phones, while applications like Google Maps generate vast quantities of transient data every day. According to IT company Cisco, there will be 10 billion mobile internet devices on earth by 2016 for only 7.3 billion people. All this big data has consequences for how it is captured, stored, managed and analysed.

Digital data should concern leaders across the public and private sectors and every citizen who interfaces with technology every day. We should treat it as an opportunity to learn more about our business and public policy environments. This is where the real value lies for businesses and where forward-thinking IT professionals can prove their worth. IT managers know they face a monumental task in simply taming the deluge of data. A recent International Data Corporation (IDC) study, sponsored by EMC, projects that, to manage big data growth over the next decade, enterprises will need 10 times the number of servers they have now. But there is potentially huge payback in the form of big data analytics. Big data requires different treatment and substantially more powerful technology to manage and manipulate than ever before. Businesses and public sector organisations should invest in the appropriate technology to manage, process and store exploding volumes of data.

2) Where does the opportunity lie? In the analytics of structured and unstructured data?

The opportunity is about analytics: understanding what the data says and building new business and commercial strategies based on that. It doesn’t mean that analytics will replace human decision making. It means that human decision making can be backed up by real information, whether drawn from the past or from predictions.

So it is about mining the data, looking at it differently, applying statistical methods to simplify the view, and applying mathematical methods to build predictive or risk models.

In our vision, the type of data is irrelevant. Our Unified Analytics Platform is fully integrated and can transparently manage structured, unstructured and semi-structured data.

Right now, the emphasis is on structured data. Since 80% of new data created is unstructured, a tipping point is coming.

EMC is driving the big data analytics market with its Greenplum and Isilon solutions. EMC is a leader in storage, security, capture, search, discovery and analysis tools for organisations, enabling them to derive real value and create revenue streams from unstructured data. The data itself provides valuable insights into customers and their relationships with companies: ones that, if extracted properly, can generate new sources of revenue. EMC believes that full analysis of data needs to be agile, self-service, increasingly real-time, and ultimately collaborative. This is exactly what EMC is out to build with its Unified Analytics Platform.

3) Are vendors like EMC, who provide end-to-end storage solutions including backup and recovery, in a good position to provide big data tools? Is big data more easily deployed as part of complete storage solutions?

Big data goes beyond storage.

In our view, the explosion of data volume, the mix of structured and unstructured data, near-real-time data loading, and a mix of reporting (understanding the past) and analytics (predicting the future) are the underlying components of big data, whether they happen one by one or all together.

Storage is the underlying layer, along with the database, the mathematical algorithms, and the networks of cores and CPUs that render the intelligence. To that extent, it is IT infrastructure as usual. Vendors who understand data and can provide end-to-end solutions, for instance appliances, backup and DR, have a clear competitive advantage.

At EMC, we’re creating scale-out storage platforms that, unlike their traditional counterparts, are designed to handle big data.

In 2006, the Broad Institute was struggling with migration and load-balancing issues as well as the reconfiguration of their NetApp environment. They deployed Isilon and gained 9 petabytes of incremental storage. Broad discontinued their investment in NetApp and increased investment in Isilon to realise greater storage growth. Today Broad has over 1 trillion files in the 9 petabyte environment they have in place with Isilon.

While 18% of respondents report they are using scale-out storage currently, 40% plan to deploy within the next 24 months and another 26% are interested in the technology. Current use is largely a function of capacity. Organisations managing at least 1 PB of storage capacity are almost twice as likely as those with less than 1 PB (28% vs. 16%) to be current users of scale-out storage technology. Usage of and interest in scale-out technology is expanding beyond traditional early adopter verticals—such as media and life sciences—to a broader range of enterprise IT users (ESG, Scale-out Storage Market Trends, December 2010).

More than half (54%) of current users said that an increase in their organisation’s data growth rate would result in more pervasive usage of scale-out storage, while one-third of planned adopters would roll the technology out ahead of schedule to offset the effects of a reduction in IT staff (ESG, Scale-out Storage Market Trends, December 2010).

According to Gartner, data will grow by 650% over the next 5 years, with 80% of that being unstructured.

Gartner’s David Cappuccio said at the 2010 Data Center Conference that the 650 percent growth in enterprise data over the next five years poses a major challenge, in part because 80 percent of the new data will be unstructured. “IT executives have to make sure data can be audited and meet regulatory and compliance objectives, while attempting to ensure that growing storage needs don’t break the bank. Technologies such as thin provisioning, deduplication and automated storage tiering can help reduce costs,” he said.

4) What are some of EMC’s big data specific solutions?

– Unified Analytics Platform (UAP): designed to combine Greenplum and Hadoop analytics technology with new collaboration software. EMC has combined its Greenplum products with new social networking software, Chorus, to ensure big data is easily accessible and clearly visible to broader numbers of staff within an organisation. UAP is meant to enable enterprises to consume big data analytics solutions in a seamless way and get productivity out of data.

Chorus was launched by Greenplum, and it will let analysts search for interesting datasets across all databases they have permission to access, whether it is Greenplum or databases from Oracle, Microsoft SQL Server or Teradata, as well as across disparate data centres.

Chorus will then allow analysts to select slices of data and put these in their own database sandbox for analysis, said EMC, adding that Chorus will update their copies of data whenever sources are updated.

Analysts can share analysis of the datasets with their colleagues using the Chorus social network tools.

Greenplum HD provides a 100 percent compliant Hadoop interface that eliminates the need to change applications developed against the Apache distribution. Examples include:

i. MapReduce

MapReduce is a Hadoop framework for easily writing applications that process large amounts of unstructured and structured data in parallel in a reliable and fault-tolerant manner. The framework is resilient to hardware failures, handling them transparently from user applications.

ii. Hive

Hive is a data warehouse system for Hadoop that facilitates easy data summarisation, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. This SQL-like interface gives users a columnar storage capability, which, along with compression, results in an improved compression ratio for storing data.

iii. HBase

HBase is a distributed, versioned, column-oriented storage platform that delivers random real-time read/write access to big data for user applications.
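The MapReduce model listed above can be illustrated with a minimal sketch. This is plain, single-process Python rather than the Hadoop or Greenplum HD API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, chosen only for illustration. It shows the three steps a word-count job goes through: the map step emits key/value pairs, the shuffle step groups them by key, and the reduce step combines each group. In a real cluster, the framework would distribute these steps across many machines and handle failures.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over in-memory records."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data analytics"]
    print(run_job(sample))
```

Because each record is mapped independently and each key is reduced independently, both steps can in principle run in parallel, which is the property the framework relies on to scale.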

Greenplum HD fault-tolerance capabilities protect users from the single point of failure inherent in Apache Hadoop. Automatic recovery from failures improves on the platform’s availability and lowers your total cost of ownership.


– Greenplum Database is built to support big data analytics, and to manage, store, and analyse terabytes to petabytes of data. Users experience 10 to 100 times better performance over traditional RDBMS products – a result of Greenplum’s shared-nothing MPP architecture, high-performance parallel dataflow engine, and advanced gNet software interconnect technology.

Greenplum Database was conceived, designed, and engineered to allow customers to take advantage of large clusters of increasingly powerful, increasingly inexpensive commodity servers, storage, and Ethernet switches. Greenplum customers can gain immediate benefit from deploying the latest commodity hardware innovations.

With Greenplum and Isilon as the core technologies, EMC provides a rich set of capabilities to integrate and bring data to and from a broad set of data sources – such as Facebook, Twitter and LinkedIn – and build a system that allows the data science team to work together.

Companies that embrace the ‘big data’ challenge by deploying technologies to absorb, store and analyse unstructured information will be in prime position to move ahead of the pack and develop better marketplace intelligence which, ultimately, will make a difference to their bottom line.

Data analytics requires a new breed of IT professional. The data scientist, the global IT industry’s newest recruit, can uncover new marketplace trends and insights that inform the evolving business model. If companies and public sector organisations buy into the analytics space now, they will quickly turn IT into a tool to help them better understand the marketplace, developing new business opportunities and more tailored public policy responses.

5) What role do vendors like EMC see for their distributors and systems integrators in adding value to big data offerings in the channel?

Their role is critical and the opportunity massive. In terms of strategic and business consulting, delivery and integration, architecture, data science and so on, this is a new business, and the partner ecosystem is already taking shape.

6) Are systems integrator partners keen to take advantage of the big data opportunity in their strategies?

Again, this has little to do with storage per se. The big data appliances provide an integrated environment with network, storage, processing power and data management tools such as RDBMS, Hadoop and Chorus. Sizing, setup and delivery are fairly straightforward. The value is in the data management: integrating sources, metadata management, loading strategies, data models, analytic models and integration with legacy systems. For instance, an upsell opportunity identified through real-time analytics of voice conversations in a call centre can be fed back into the CRM so that the operator gets real-time support based on how the discussion is going.

7) Which verticals do you foresee benefiting from big data analytics? Are they exclusively enterprise customers?

All verticals will benefit, with a phased approach in terms of adoption. Telecom and banks are already engaged. The public sector represents a massive opportunity. Some areas are obvious, such as national security or health care; others, such as services to citizens and forecasting, carry huge strategic importance.

Large companies are already pioneering big data and will build their own systems, since they have the investment capabilities and the skills to do it. We believe that, as a second wave, SMEs will engage as well, but mostly via a service provider model. We are already working on technology and business models for analytics in the cloud, for instance.
