EMC Transforms Hadoop Infrastructures

EMC is transforming Hadoop-based Big Data Analytics infrastructures from the one-off, build-it-yourself science projects of the early adopters into a fully supported, proven, scalable, incredibly reliable solution for the majority of Enterprise IT shops. EMC has married its proven Greenplum HD distribution of Apache Hadoop with EMC Isilon, the highest performing single filesystem scale-out NAS on the planet. The Greenplum HD appliance removes the complexity of setting up a big data analytics infrastructure and allows businesses to focus on generating value from their unstructured data.

 

Why Hadoop?

Not all data resides in a database. It used to be the case that computers only analyzed data about well-structured back office processes. Business Intelligence was about sorting through transactions, demographics, and other data with a very well defined structure. Big Data Analytics is the next “big thing” for enterprise-scale business. Not only can we now do BI on a much more rapid, iterative, dare I say “real-time” basis, but we can run these Analytics on data that describes and tracks people’s behavior, not just their demographics. People’s behaviors are fundamentally unstructured. Tracking behavior (apparently) creates an unstructured mess of XML schemas, text log files, web traffic data, etc. Hadoop (really a combination of the MapReduce framework with the Hadoop Distributed File System) provides the ability to perform analytics tasks on any data, whether relationally structured or unstructured. Imagine being able to iteratively process all of the data you have about your products, customers, market trends, Twitter streams, security logs, purchase history, etc. and come up with a predictive view of the actions your constituency might take. Your constituency may be your marketing team given customers’ likely buying decisions, your product developers given product quality improvement data, your risk managers given data about potential clients, or your security team given real-time data about attacks in progress.
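To make the MapReduce idea a little more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and calls no Hadoop API. The log format, the sample lines, and the function names (map_phase, shuffle, reduce_phase, run_job) are all invented for illustration. It only shows the shape of the computation: a map step pulls key/value pairs out of messy text, the framework groups them by key, and a reduce step aggregates each group.

```python
from collections import defaultdict

# Hypothetical log lines; the format is invented for this illustration.
LOG_LINES = [
    "2012-02-01 10:00:01 user=alice action=view item=42",
    "2012-02-01 10:00:05 user=bob action=purchase item=42",
    "2012-02-01 10:00:09 user=alice action=purchase item=7",
    "malformed line with no fields",
    "2012-02-01 10:01:00 user=carol action=view item=42",
]


def map_phase(line):
    """Emit (action, 1) for every line that carries an action= field."""
    for token in line.split():
        if token.startswith("action="):
            yield token.split("=", 1)[1], 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate all values seen for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(LOG_LINES))
```

Note that the malformed line is simply skipped by the map step. No schema has to be declared up front, which is the property that makes this style of processing a fit for unstructured behavioral data. In a real cluster the map and reduce steps would run in parallel across many nodes against data stored in HDFS.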

Do you like spending money on science projects?

The few who are willing to bet on new tech are called Early Adopters. The Majority wait for a more guaranteed return on investment. Early Adopters are willing to dedicate infrastructure for one-off projects, accept single points of failure and limited disaster recoverability, sacrifice solution efficiency for quicker time to market, and maintain a specialized support workforce when normal support channels don’t exist.

Why run a Hadoop appliance with EMC Isilon and EMC Greenplum HD?

According to the Enterprise Strategy Group’s white paper, “EMC’s Enterprise Hadoop Solution: Isilon Scale-out NAS and Greenplum HD” (email address required), the EMC Hadoop Solution overcomes the innate issues with homegrown Hadoop projects.

  • Isilon’s OneFS operating system eliminates the single point of failure of a single NameNode within Hadoop.  The NameNode contains all of the metadata for the HDFS storage layer.  By distributing all of the metadata across every node within the Isilon cluster, every node acts as a NameNode and provides a highly available solution for mission critical workloads.
  • Isilon’s HDFS implementation streamlines data access and loading by allowing NFS, CIFS, HTTP, or FTP access to data resident on the HDFS filesystem. Because Hadoop applications can access the data directly, without the expense of copy or move operations, this saves time, reduces the cost of storage, and greatly simplifies the Analytics workflow.
  • Implementing a dedicated storage layer allows for more efficient utilization of the compute and storage resources by allowing them to expand independently. Most Hadoop infrastructures are based on DAS inside the compute nodes, which prevents independent scaling.
  • Implementing the EMC Greenplum Hadoop Distribution on EMC Isilon hardware provides a configuration backed by EMC’s premier customer support capabilities. Customers can leverage their existing knowledge and experience with EMC and Isilon, and don’t need specialists on staff to manage the Big Data Analytics infrastructure.
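The independent-scaling point above can be illustrated with a little arithmetic. The sketch below is a back-of-the-envelope model, not a sizing tool. The function names and every number in it (capacity per node, cores per node, the workload) are hypothetical, and it ignores replication, data protection overhead, and everything else a real sizing exercise would include.

```python
import math


def nodes_coupled(storage_tb, cores, tb_per_node, cores_per_node):
    """DAS model: each node adds storage and compute together, so the
    larger of the two requirements sets the node count."""
    return max(
        math.ceil(storage_tb / tb_per_node),
        math.ceil(cores / cores_per_node),
    )


def nodes_independent(storage_tb, cores, tb_per_storage_node, cores_per_compute_node):
    """Separate tiers: storage nodes and compute nodes are sized on their own.
    Returns (storage_nodes, compute_nodes)."""
    return (
        math.ceil(storage_tb / tb_per_storage_node),
        math.ceil(cores / cores_per_compute_node),
    )


if __name__ == "__main__":
    # Hypothetical storage-heavy workload: 500 TB of data, 96 cores of compute.
    coupled = nodes_coupled(500, 96, tb_per_node=12, cores_per_node=16)
    storage_nodes, compute_nodes = nodes_independent(
        500, 96, tb_per_storage_node=12, cores_per_compute_node=16
    )
    print("coupled nodes:", coupled, "cores purchased:", coupled * 16)
    print("storage nodes:", storage_nodes, "compute nodes:", compute_nodes)
```

Under these made-up numbers, the coupled design needs 42 nodes to hold the data and so buys 672 cores for a job that needs 96, while the separated design buys 42 storage nodes and only 6 compute nodes. The ratio will differ for every workload. The point is only that a DAS design forces both resources to grow in lockstep.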

Ultimately, any Hadoop implementation is just a portion of the overall Big Data Analytics requirement, but it is one that has held some mystery for traditional infrastructure customers. Take a cue from what we’re learning from the Cloud value proposition and ask yourself whether your enterprise wants to get into the Hadoop business or wants to extract value from Big Data Analytics. In the end, Hadoop is a tool. Now you can pick up the phone and “order one.”

The Cloud tech shift to be faster than any in IT history

Cloud is the next transformational technology in the IT world, and is arriving faster on the heels of the previous tech shift than at any other time in IT history. Even though Cloud is the most overhyped term out there, this rapid advance will take many IT organizations and IT vendors by surprise.

PricewaterhouseCoopers authored a whitepaper with a good summary of the ratio of IT spending to GDP. There is approximately a 15-year separation between the efficiencies delivered by each of the UNIX, Distributed, and Virtualization technology transformations. Between each shift and the next, IT spending rose above the average growth trend. This may be due to the proliferation of the existing technology within the datacenter and the cost of maintaining the personnel to manage that proliferation of systems.

Will VM sprawl lead to massive increases in IT spending in the next few years to bring spending back to the trend?  Not if the next major technology transformation happens quickly enough to drive additional efficiencies of doing IT.
Cloud technologies (scalable & elastic infrastructure + on and off site data and app mobility + orchestration / automation + end user portals + financial transparency and aaS pricing) have the potential to keep the industry on a new trajectory of lower costs relative to increased productivity. It looks like the pace of transformative innovation has increased, since widespread adoption of Cloud infrastructures is already beginning to displace “mere virtualization.”

My advice? Become a transformation agent within your organization and champion the new normal of Cloud technologies. Cloud will transform IT. Now is the time to get ahead of the shift, develop new skills, and lead others who can’t see what’s happening.