EMC Corporation today announced a comprehensive strategy for distributing, integrating and supporting the Apache Hadoop open-source software used for data-intensive distributed applications. The company introduced a high-performance, data co-processing Hadoop appliance, the Greenplum HD Data Computing Appliance. The appliance marries Hadoop with the EMC Greenplum Database, allowing the co-processing of both structured and unstructured data within a single, seamless solution.
In addition, EMC announced the availability of the Hadoop-based EMC Greenplum HD Community Edition and EMC Greenplum HD Enterprise Edition software. Combined with product certification by a dozen leading partners, these will enable technology innovations such as real-time data interaction, offer greater reliability, and make Hadoop much easier to deploy and use.
Both announcements were made at EMC World 2011, which is taking place all this week in Las Vegas.
Apache Hadoop has rapidly emerged as the preferred solution for Big Data analytics across unstructured data. Organisations looking for opportunity in an ever-changing business environment are finding that Big Data analysis is the competitive advantage. Hadoop-based batch processing of unstructured and structured data at massive scale using commodity hardware has led to a profound change in analytics. By extracting the knowledge wrapped within unstructured machine-generated data, organisations can make better decisions that drive revenue, improve service and reduce costs.
According to EMC, the Greenplum HD product family enables an organisation to take advantage of Big Data analytics without the overhead and complexity that come with the cumbersome tools and solutions on the market today. Available in two editions, Community and Enterprise, Greenplum HD software provides a complete platform including installation, training, global support and added value beyond simple packaging of the Apache distribution.
EMC claims that Apache Hadoop is seamlessly integrated with the Greenplum database in the Greenplum HD Data Computing Appliance. The solution supports Hadoop external tables, thereby enabling users to access data residing on the Hadoop Distributed File System (HDFS) without materialising the data in the database. Administrators can read and write files in parallel between Greenplum and HDFS, enabling rapid and simple data sharing. Cross-platform analysis can be performed using Greenplum SQL and advanced analytic functions that access data on HDFS. According to EMC, the combined solution delivers the industry’s only complete Big Data analytics platform.
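To make the external-table idea concrete, here is a minimal sketch of how such a table definition might be generated. It is an illustration only, not EMC's published syntax: the article does not name the protocol, so the `gphdfs://` location scheme is an assumption, and the helper function, table name, columns, host, port and path are all hypothetical.

```python
def external_table_ddl(table, columns, hdfs_host, hdfs_port, hdfs_path, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement that points at files on HDFS.

    The table holds no data itself; queries read the HDFS files in place.
    The 'gphdfs' scheme is an assumption, not confirmed by the article.
    """
    column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    location = f"gphdfs://{hdfs_host}:{hdfs_port}{hdfs_path}"
    return (
        f"CREATE EXTERNAL TABLE {table} ({column_sql})\n"
        f"LOCATION ('{location}')\n"
        f"FORMAT 'TEXT' (DELIMITER '{delimiter}');"
    )


# Hypothetical example: click logs stored as comma-delimited text on HDFS.
ddl = external_table_ddl(
    table="clicks_ext",
    columns=[("user_id", "bigint"), ("url", "text"), ("ts", "timestamp")],
    hdfs_host="namenode.example.com",
    hdfs_port=8020,
    hdfs_path="/logs/clicks",
)
print(ddl)
```

Once a table like this exists, ordinary SQL can join it against regular database tables, which is the kind of cross-platform analysis the announcement describes.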
The Enterprise Edition is a 100% interface-compatible implementation of the Apache Hadoop stack. By maintaining Hadoop interface compatibility, the Enterprise Edition provides seamless application portability while delivering advanced features required by larger organisations.
The Community Edition is a 100% open-source, certified and supported version of the Apache Hadoop stack comprising HDFS, MapReduce, ZooKeeper, Hive and HBase. EMC Greenplum provides fault tolerance for the NameNode and JobTracker, both single points of failure in standard Hadoop implementations.
In addition to its Hadoop offerings, EMC has created an ecosystem with twelve companies offering business intelligence, data transfer and other technology capabilities. These companies are Concurrent, CSC, Datameer, Informatica, Jaspersoft, Karmasphere, MicroStrategy, Pentaho, SAS, SnapLogic, Talend, and VMware. This breadth of support is testament to the value EMC brings to Hadoop. Technology companies and enterprises can now extend the trust they have in EMC to the open-source data analytics tool.
EMC Global Services has developed a series of professional services, support and training offerings for data warehousing and business analytics, including a new Enterprise Business Analytics Assessment Service to review and understand data and its role across an organisation, its processes and technology. EMC professionals will help customers deploy and optimise the new Greenplum Data Computing Appliance and design an environment for complex correlation across massive data sets. In addition, EMC will assist customers with migrating and consolidating data from their Oracle, Teradata and other existing database systems onto the Greenplum Data Computing Appliance.