VMware announced a new open source project, Serengeti, to enable
enterprises to quickly deploy, manage and scale Apache Hadoop in
virtual and cloud environments. In addition, VMware is working with
the Apache Hadoop community to contribute extensions that will make
key components “virtualization-aware” to support
elastic scaling and further improve Hadoop performance in virtual
environments.
“Apache Hadoop has the potential to transform business by
allowing enterprises to harness very large amounts of data for
competitive advantage,” said Jerry Chen, Vice President,
Cloud and Application Services, VMware. “It represents one
dimension of a sweeping change that is taking place in
applications, and enterprises are looking for ways to incorporate
these new technologies into their portfolios. VMware is working
with the Apache Hadoop community to allow enterprise IT to deploy
and manage Hadoop easily in their virtual and cloud
environments.”
Apache Hadoop is emerging as the de facto standard for Big Data
processing. However, deployment and operational complexity, the
need for dedicated hardware, and concerns about security and
service level assurance prevent many enterprises from leveraging
the power of Hadoop. By decoupling Apache Hadoop nodes from the
underlying physical infrastructure, VMware can bring the benefits
of cloud infrastructure (rapid deployment, high availability,
optimal resource utilization, elasticity, and secure
multi-tenancy) to Hadoop.
Available for free download under the Apache 2.0 license,
Serengeti is a “one-click” deployment toolkit that
allows enterprises to leverage the VMware vSphere platform to
deploy a highly available Apache Hadoop cluster in minutes,
including common Hadoop components like Apache Pig and Apache Hive.
According to the company, enterprises that use Serengeti to run
Hadoop on VMware vSphere can leverage the high availability, fault
tolerance and live migration capabilities of what it describes as
the world’s most trusted, widely deployed virtualization platform
to improve the availability and manageability of Hadoop
clusters.
“Hadoop must become friendly with the technologies and
practices of enterprise IT if it is to become a first-class citizen
within enterprise IT infrastructure. The resource-intensive nature
of large Big Data clusters make virtualization an important piece
that Hadoop must accommodate,” said Tony Baer, Principal
Analyst at Ovum. “VMware’s involvement with the Apache
Hadoop project and its new Serengeti Apache project are critical
moves that could provide enterprises the flexibility that they will
need when it comes to prototyping and deploying Hadoop.”
To further simplify and speed enterprise use of Apache Hadoop,
VMware is working with the Apache Hadoop community to contribute
changes to the Hadoop Distributed File System (HDFS) and Hadoop
MapReduce projects to make them “virtualization-aware,”
so that data and compute jobs can be optimally distributed across a
virtual infrastructure. These changes will enable enterprises to
achieve a more elastic, secure and highly available Hadoop
cluster.
VMware is also announcing updates to Spring for Apache Hadoop,
an open source project first launched in February 2012 to make
it easy for enterprise developers to build distributed processing
solutions with Apache Hadoop. These updates allow Spring developers
to easily build enterprise applications that integrate with the
HBase database, the Cascading library, and Hadoop security.
Together, these projects and contributions are designed to help
accelerate Hadoop adoption and enable enterprises to leverage Big
Data analytics applications, such as Cetas, to obtain real-time,
intelligent insight into large quantities of data.