EMC VMAX3 is a Data Services Platform

VMAX3 is now Generally Available and represents the next generation of market-leading platforms from EMC, which have established themselves as the most reliable and highest performing data storage platforms on the planet. Let’s examine what makes VMAX3 the first Data Services Platform in the industry and why VMAX3 is an even more revolutionary step from VMAX than VMAX was from DMX. Along the way, let’s dispel some myths about the product.

At launch, VMAX3 is ready for the most mission critical workloads.  At its introduction, the original VMAX was a revolutionary platform, virtualizing the matrix interconnects between internal components and allowing for both scale-up and scale-out expansion based entirely on Virtual Provisioning.  These were enormous leaps forward in the simplicity and scalability of Symmetrix.  EMC has developed rigorous testing, manufacturing processes, and incredible architecture designs that earned VMAX and then VMAX2 the title of “Most Reliable Symmetrix” ever produced.

VMAX3 stands on the shoulders of VMAX2 and takes Reliability, Availability, and Serviceability to the next level.  Additional redundancy has been added on the back-end DA connections to the DAEs.  More upgrades and service events leave the director online.  Many serviceability design changes for CS have been added, such as rear-facing light bars (ever try to find one of these in a datacenter?) and a work tray in the cabinets, that set VMAX3 as the RAS benchmark in the industry.

A Data Services Platform must be available, but it must also take functionality to the next level.  EMC’s goal for the VMAX3 HyperMax OS is to make it the foundational component in our customers’ data centers, providing access to all of our best capabilities (FAST, SLO, FTS, Cloud, SRDF, TimeFinder, ProtectPoint, etc.).

HyperMax is our most comprehensive OS rewrite to date.  It takes the massively parallel preemptive multitasking kernel of VMAX3 and opens the door to running other data services apps inside the array.  The VMAX3 Data Services Platform can be extended online with additional software functionality as it becomes available.  Most critically to our customers, code upgrades are done without taking a single component offline.  There are no failover/failback cycles, no one-at-a-time reboots, and no ports taken offline. No other array can do this!

VMAX3 is designed for Always On operations.  Combined with SRDF, VPLEX + RecoverPoint, or ProtectPoint, customer operations are always protected from component failure, site failure, or any other outage situation.

We talk to our customers about “always on” Platform 3 infrastructure, and now we’ve built VMAX3 into an “always on” Data Services Platform.

VMAX3 Always On

Our phased release schedule enables us to get products to market faster.  We tell our customers how Agile development has revolutionized the way apps are brought to market and how the new normal is “fast.”  Get products out fast and iterate fast, ramping up to provide the most important features first, prioritized by customer demand.  It’s one thing to do this in the Platform 3 web space, because the service is designed to be always on and upgradable without degrading the end user experience.  To do this in the hardware market, the platform must provide the same Always On availability to allow upgrades that enable the additional functionality.

In exactly this way, VMAX3 provides us a platform that is always available, and through HyperMax, additional data services can be added later online. Getting the base platform right is critical.  As previously discussed, the upgradeability of VMAX3 allows us to introduce this revolutionary product now, which is crucial to maintaining our market leadership position and building momentum while simultaneously creating a revolutionary new data services platform.

VMAX3 will carry customers toward Storage as a Service.  Fundamental to the value proposition of VMAX3, like VMAX Cloud Edition before it, is the idea that purchase decisions and provisioning are based on Service Level Objectives as opposed to rotational speed of the drives.  This outcome-based thinking is a wave carrying our customers toward the beachhead of Storage aaS.

Three years ago, EMC’s message for introducing ITaaS Transformation was 1) Have C-level sponsorship, 2) Pick a project and grow it, 3) Let the technology get you there.  Well, VMAX3 is getting our customers there.  Whether or not they have financial models that equate to the service levels inside VMAX3, they gain the simplicity and ability to automate processes that Service Level based provisioning provides.

The VMAX3 Data Services Platform can handle any performance Service Level Objective required by the customer.  Designed as an all-flash capable array, any of the VMAX3 family can support all-flash configurations.  Should the customer choose a higher capacity or lower cost design, spinning disk can be used in combination to provide various SLOs of performance and capacity within the array.  Front-end and back-end CPU cores are now pooled, giving any port full line rate capability and doubling the IOPS that VMAX2 was capable of.  PCI Gen 3 and 6Gb SAS connections to the DAEs deliver incredible bandwidth for DSS workloads, tripling the throughput of VMAX2.  VMAX3 provides both the rich data services functionality and the performance required to process the avalanche of new data in the datacenter.

In summary, VMAX3 is an “always-on” scale-up-and-out Data Services Platform that can be extended online via HyperMax OS to provide additional advanced software functionality within the array over its lifespan, with incredibly easy to use SLO based provisioning that meets any performance requirement of SAN or NAS attached hosts. This redefinition of the storage array makes VMAX3 a larger step forward from VMAX than VMAX was from DMX.

 

My first python and JSON code

I’m not a developer. There I said it.

I’m a presales technologist, architect, and consultant.

But I do have a BS in Computer Science. Way back there in my hind brain, there exists the ability to lay down some LOC.

I was inspired today by a peer of mine, Sean Cummins (@scummins), another Principal within EMC’s global presales organization. He posted an internal writeup on connecting to the VMAX storage array REST API to pull statistics and configuration information. Did you know there is a REST API for a Symmetrix?! Sad to say most people don’t.

He got me hankering to try something, so I plunged into Python for the first time, and as my first example project, I attached to Google’s public geocoding API to map street addresses to Lat/Lng coordinates. (Since I don’t have a VMAX in the basement)

So here it is. I think it’s a pretty good first project to learn a few new concepts. I’ll figure out the best way to parse a JSON package eventually. Anyone have any advice?

###################
# Example usage
###################
$ ./geocode.py
Enter your address:  2850 Premiere Parkway, Duluth, GA                 
2850 Premiere Parkway, Duluth, GA 30097, USA is at
lat: 34.002958
lng: -84.092877

###################
# First attempt at parsing Google's REST API
###################
#!/usr/bin/env python3

import requests          # module to make HTTP calls
import json              # module to parse JSON data

addr_str = input("Enter your address:  ")

maps_url = "https://maps.googleapis.com/maps/api/geocode/json"
is_sensor = "false"      # do you have a GPS sensor?

payload = {'address': addr_str, 'sensor': is_sensor}

r = requests.get(maps_url,params=payload)

# store the json object output
maps_output = r.json()

# create a string in a human readable format of the JSON output for debugging
#maps_output_str = json.dumps(maps_output, sort_keys=True, indent=2)
#print(maps_output_str)

# once you know the format of the JSON dump, you can create some custom
# list + dictionary parsing logic to get at the data you need to process

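# pull the top-level fields out of the response dictionary
results_list = maps_output['results']
result_status = maps_output['status']

# stop early if the API did not return a usable match
if result_status != "OK" or not results_list:
    raise SystemExit("Geocoding failed with status: %s" % result_status)

# the first entry in the results list is used as the match
formatted_address = results_list[0]['formatted_address']
result_geo_lat = results_list[0]['geometry']['location']['lat']
result_geo_lng = results_list[0]['geometry']['location']['lng']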

print("%s is at\nlat: %f\nlng: %f" % (formatted_address, result_geo_lat, result_geo_lng))
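
On the question of a better way to parse the JSON: one possible approach, sketched here under assumptions, is a small helper that walks a path of keys and list indexes and returns a default instead of raising when something is missing. The helper name dig and the sample payload are hypothetical. The payload only mimics the shape of the geocoding response used above, and nothing here calls the real API.

```python
import json


def dig(data, *path, default=None):
    """Follow dict keys / list indexes in order; return default on any miss."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hypothetical sample shaped like the geocoding response parsed above
sample = json.loads("""
{
  "status": "OK",
  "results": [
    {
      "formatted_address": "2850 Premiere Parkway, Duluth, GA 30097, USA",
      "geometry": {"location": {"lat": 34.002958, "lng": -84.092877}}
    }
  ]
}
""")

lat = dig(sample, "results", 0, "geometry", "location", "lat")
lng = dig(sample, "results", 0, "geometry", "location", "lng")
# There is no second result, so the default comes back instead of an IndexError
missing = dig(sample, "results", 1, "formatted_address", default="(no second result)")

print(lat, lng, missing)
```

The trade-off is that a typo in a key name silently yields the default, so this suits optional fields better than fields the script cannot live without.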

EMC Transforms Hadoop Infrastructures

EMC is transforming Hadoop based Big Data Analytics infrastructures from the one-off, build-it-yourself science projects of the early adopters to a fully supported, proven scalable, incredibly reliable solution for the majority of Enterprise IT shops.  EMC has married its proven Greenplum HD distribution of Apache Hadoop with EMC Isilon, the highest performing single filesystem scale-out NAS on the planet.  The Greenplum HD appliance removes the complexity of setting up a big data analytics infrastructure, and allows businesses to focus on generating value from their unstructured data.

 

Why Hadoop?

Not all data resides in a database.  It used to be the case that computers only analyzed data about well structured back office processes.  Business Intelligence was about sorting through transactions, demographics, and other data with very well defined structure.  Big Data Analytics is the next “big thing” for enterprise scale business, because not only are we now able to do BI on a much more rapid, iterative, dare I say “real-time” basis, but we are able to conduct these Analytics not just on data describing people’s demographics, but on data describing and tracking people’s behavior.  People’s behaviors are fundamentally unstructured.  Tracking behavior (apparently) creates an unstructured mess of XML schemas, text log files, web traffic data, etc.  Hadoop (really a combination of the MapReduce framework with the Hadoop Distributed File System) provides the ability to perform analytics tasks on any relationally structured or unstructured data.  Imagine being able to iteratively process all of the data you have about your products, customers, market trends, Twitter streams, security logs, purchase history, etc. and come up with a predictive view of potential actions your constituency might take.  Your constituency may be your marketing team given customers’ likely buying decisions, your product developers given product quality improvement data, your risk managers given data about potential clients, or your security team provided real-time data about attacks in progress.
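
To make the MapReduce half of that a little more concrete, here is a minimal single-machine sketch in plain Python. It is not the Hadoop API. The log lines, the function names (map_phase, shuffle, reduce_phase), and the per-user counting task are all made up for illustration. It only shows the shape of the idea: map each raw record to key/value pairs, group the pairs by key, then reduce each group to one answer.

```python
from collections import defaultdict

# Hypothetical unstructured input: raw web log lines
log_lines = [
    "alice GET /products/42",
    "bob GET /cart",
    "alice POST /cart",
    "carol GET /products/42",
    "alice GET /checkout",
]


def map_phase(line):
    """Emit (key, 1) pairs; the key here is the user at the start of the line."""
    user = line.split()[0]
    yield (user, 1)


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


mapped = [pair for line in log_lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

In a real cluster the map and reduce steps would run in parallel across many nodes against data stored in HDFS, which is the part this sketch leaves out.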

Do you like spending money on science projects?

The few who are willing to bet on new tech are called Early Adopters.  The Majority wait for a more guaranteed return on investment.  Early Adopters are willing to dedicate infrastructure for one-off projects, accept single points of failure and limited disaster recoverability, sacrifice solution efficiency for quicker time to market, and maintain a specialized support workforce when normal support channels don’t exist.

Why run a Hadoop appliance with EMC Isilon and EMC Greenplum HD?

According to the Enterprise Strategy Group’s White Paper: EMC’s Enterprise Hadoop Solution: Isilon Scale-out NAS and Greenplum HD (email address required), the EMC Hadoop Solution overcomes the innate issues with home grown Hadoop projects.

  • Isilon’s OneFS operating system eliminates the single point of failure of a single NameNode within Hadoop.  The NameNode contains all of the metadata for the HDFS storage layer.  By distributing all of the metadata across every node within the Isilon cluster, every node acts as a NameNode and provides a highly available solution for mission critical workloads.
  • Isilon’s HDFS implementation streamlines data access and loading by allowing NFS, CIFS, HTTP, or FTP access to data resident on the HDFS filesystem.  Since Hadoop applications can access the data directly without the expense of copy or move operations, this saves time, cost of storage, and greatly simplifies the Analytics workflow.
  • Implementing a dedicated storage layer allows for more efficient utilization of the compute and storage resources by allowing them to expand independently.  Most Hadoop infrastructures are based on DAS inside the compute nodes preventing independent scale.
  • Implementing the EMC Greenplum Hadoop Distribution on EMC Isilon hardware provides a configuration backed by EMC’s premier customer support capabilities.  Customers can leverage their existing knowledge and experience with EMC and Isilon, and don’t have to have specialists on staff to manage the Big Data Analytics infrastructure.

Ultimately any Hadoop implementation is just a portion of the overall Big Data Analytics requirement, but it is one that has held some mystery for traditional infrastructure customers.  Take a cue from what we’re learning from the Cloud value proposition and ask yourself whether your enterprise wants to get into the Hadoop business or wants to extract value from Big Data Analytics.  In the end, Hadoop is a tool, and now you can pick up the phone and “order one.”