EMC VMAX3 is a Data Services Platform

VMAX3 is now Generally Available and represents the next generation of market leading platforms from EMC that have established themselves as the most reliable and highest performing data storage platforms on the planet. Let’s examine what makes VMAX3 the first Data Services Platform in the industry and why VMAX3 is an even more revolutionary step from VMAX, as VMAX was from DMX. Along the way let’s dispel some myths about the product.

At launch VMAX3 is ready for the most mission critical workloads.  At its introduction, the original VMAX was a revolutionary platform, virtualizing the matrix interconnects between internal components and allowing for both scale-up and a scale-out expansion based entirely on Virtual Provisioning.  These were enormous leaps forward in the simplicity and scalability of Symmetrix.  EMC has developed rigorous testing, manufacturing processes, and incredible architecture designs that gave VMAX and then VMAX2 the title of “Most Reliable Symmetrix” ever produced.

VMAX3 is standing on the shoulders of the VMAX2 and is taking Reliability, Availability, and Serviceability to the next level.  Additional redundancy has been added on the backend DA connections to the DAEs.  More upgrades and service events leave the director online.  Tons of serviceability design changes for CS
have been added such as rear facing light bars (ever try to find one of these in a datacenter?), a work tray in the cabinets, etc. that set the VMAX3 as the RAS benchmark in the industry.

A Data Services Platform must be available, but it must also take functionality to the next level.  EMC’s goal for VMAX3 HyperMax OS is to make it the foundational component in our customer’s data center providing access to all of our best capabilities, FAST, SLO, FTS, Cloud, SRDF, TimeFinder, ProtectPoint, etc…).

HyperMax is our most comprehensive OS rewrite to date that takes the massively parallel preemptive multitasking kernel of VMAX3, and opens the door to running other data services apps inside the array.  The VMAX3 Data Services Platform can be extended with additional software functionality online as it comes available.  Most critically to our customers, code upgrades are done without taking a single component offline.  There is no failover/failback, one-at-a-time reboots, ports offline, etc. No other array can do this!

VMAX3 is designed for Always On operations.  Combined with SRDF, VPLEX + RecoverPoint, or ProtectPoint customer operations are always protected from outage, site failure, or any other outage situation.

We talk to our customers about “always on” Platform 3 infrastructure, and now we’ve built VMAX3 into an “always on” Data Services Platform.

VMAX3 Always On

Our phased release schedule enables us to get products to market faster.  We tell our customers how Agile development has revolutionized the way apps are brought to market and how the new normal is “fast.”   Get products out fast and iterate fast to
ramp up to provide the most important features first, prioritized by customer demand.  It’s one thing to do this in the Platform 3 web space, because the service is designed to be always on, and upgradable without degradation to the end user experience.  To do this in the hardware market, the platform must provide the same Always On availability to allow upgrades that enable the additional functionality.

In exactly this way, VMAX3 provides us a platform that is always available, and through HyperMax, additional data services can be added later online. Getting the base platform right is critical.   As previously discussed the upgradeability of VMAX3 allows us to introduce this revolutionary product now, which is crucial to maintaining market leadership position and building momentum while simultaneously creating a revolutionary new data services platform.

VMAX3 will carry customers toward Storage as a Service.  Fundamental to the value proposition of VMAX3, like VMAX Cloud Edition before it, is the idea that purchase decisions and provisioning are based on Service Level Objectives as opposed to rotational speed of the drives.  This outcome-based thinking is a wave carrying our customers toward the beachhead of Storage aaS.

Three years ago, EMC’s message for introducing ITaaS Transformation was 1) Have C-level sponsorship, 2) Pick a project and grow it, 3) Let the technology get you there.  Well, VMAX3 is getting our customers there.  Whether or not they have financial models that equate to the service levels inside VMAX3, they gain the simplicity and ability to automate processes that Service Level based provisioning provides.

The VMAX3 Data Services Platform can handle any performance Service Level Objective required by the customer.  Designed as an all-flash capable array, any of the VMAX3 family can support all-flash configurations.  Should the customer chose a higher capacity or lower cost design, spinning disk can be used in combination to provide various SLO’s of performance and capacity within the array.  Front-end and Back-end CPU cores are now pooled giving any port full line rate capability, doubling the IOPs capable on VMAX2.  PCI Gen 3 and 6Gb SAS connections to the DAE’s deliver incredible bandwidth for DSS workloads, tripling the throughput of VMAX2.  VMAX3 provides both the rich data services functionality and the performance required to process the avalanche of new data in the datacenter.

In summary, VMAX3 is an “always-on” scale-up-and-out Data Services Platform
that can be online extended via HyperMax OS to provide additional advanced software functionality within the array over its lifespan with incredibly easy to use SLO based provisioning meeting any performance requirements of SAN or NAS attached hosts. This redefinition of the storage array makes VMAX3 a larger step forward from VMAX as VMAX was from DMX.

 

Daily Narrative rather than Daily Todo List

I’ve come to realize that I am much more motivated to perform routine tasks if I can identify two or more beneficial reasons for performing the task. If the task only has one use or benefit of its outcome, then I’m likely to think I can live without that outcome, and not perform the task.  I need these multiple input  motivations to stack together and push me to perform.

A recent convergence on multiple benefits for doing a task as emerged recently, in planning for my day. One thing I would like to do, is to be more organized and create a daily plan or task list to help me prioritize my goals for the day. Another thing I’d like to be able to do is practice writing on a daily basis, so that I can more frequently create blog posts and more comfortably articulate my thoughts in writing in the business world. Finally I’ve always wanted to create a journal that perhaps I can pass on to my kids or to use in long-term planning or personal review, and to look back and see the trends of my life.

Combining all of these things has led me to a practice of not creating a task list, but a narrative story of how I see my day unfolding. By avoiding the dreaded task list, visualizing the flow of my day, and practicing a storytelling narrative style all at the same time gives me many reasons for engaging in the habit of daily planning. I think the narrative visualization is a very important aspect of daily planning, and forming what you expect to happen in your day into a story really engages that side of your brain to rehearse your day and make it flow that much more easily.

The day never works out as I had envisioned. And that’s okay. At the end of the day I can read my narrative and make notes and updates in the same document that described what really happened, see differences between how I thought the day would unfold and how they actually unfolded, and then in the nightly review naturally see how my expectations are thwarted by the events of the day. This helps me plan the next day that much more effectively. 

Go update your iPhone Syncplicity client now

syncplicityWow. Don’t read this. Just go get the new EMC Syncplicity iPhone client now. It’s amazing. It isn’t so much as a sync and share app as it is a sexy remote control for enterprise document management policies.

It’s better than Finder.

Sure, it does enterprise grade sync and share, allowing users to securely collaborate with each other ad hoc, and corporate to push down their own folders of content.

Sure, without having to rely on “the magic folder” into which you put all of the stuff to be synced, it acts very much like the easiest backup and recovery client around.

Sure, it’s at the top right of Gartner’s Magic Quadrant (TM Gartner). (Dropbox was not top-right)

Sure, it allows your teams to find each other’s content by gravatar (if you can’t quite remember their name.

Sure, it adds native PDF and M$oft document annotation capabilities, and lets me add new slides to my powerpoint while traveling.

Sure, it adds contextual menu options to keep clutter out of the interface, while exposing an extremely powerful set of document management tools.

Sure, it provides intelligent insights to recognize (for instance) by looking at your calendar, that you were just in a meeting, and have been creating documents. Would you like to share this document with the meeting participants? (and it pre-populates the email with their addresses).

Sure, it tracks who’s looking at the shared links you provide, and lets you track who is downloading your content and from where in the world (plotted on a map).

Yes, it has all that, but, man is it a sexy app to use! I just loaded it onto my phone, and I swear I think I can navigate my folder tree on my phone better than I can in Finder on the Mac.

What it does is provide a continual context for where you are in the folder tree by using the “cards” interface more commonly in use these days. Riffle the DeckAll parent folders are cards that can be viewed by “riffling the deck from above” or by “fanning the cards” left to right.

I love that when you “fan the cards” the parent card UI is active, in IMG_1387that you can scroll down and jump straight into another folder without first having to “select that card, and then start scrolling”.

A few screenshots don’t really do it justice, so check out this UI video.

Great job Syncplicity.  You got this one right!

How I installed OpenStack at a Service Provider in 20 mins

I’ve been playing around with my skillz in python, puppet, OpenStack, deployment methods, etc for a few weeks.  Here’s a small example of deploying OpenStack in a non-production environment (no hardening or customization of any kind) in 20 mins.  Maybe it’ll help you in your self education.

For testing purposes I use a service provider called Digital Ocean. I’m sure what they do is competitive to all things EMC, so this is not an endorsement. They do, however have VERY CHEAP costs if you’re just doing testing (on the order of < 1 penny per hour) with linux based services (again YMMV, I have no idea how good they are).

Now when I say 20 mins, that’s how long it takes for the procedures to run once you figure out what to do. I’ve spent a couple of days playing with the site, VM’s, python, devstack, etc. to get the procedure in place. That said, it took 20 mins of wall clock time to get OpenStack running from the devstack.org package on a single machine deployment (ie: not production ready).

At the end of this post is the python code to talk to the Digital Ocean API and deploy the instance. Since it’s not very sophisticated code, I just edit it directly to do what I want to do as opposed to passing command line arguments (list, deploy, destroy).

So, to get a VM I called ‘devstack’ I executed my little script with settings hard-coded:

./go.py

The script returned the following output, parsed from the JSON response:

107.170.89.157 devstack 1790811

Then I did the following to install OpenStack:

$ sudo echo "107.170.89.157 devstack" >> /etc/hosts
$ ssh root@devstack
# adduser stack
# echo "stack ALL=(ALL:ALL) ALL" >> /etc/sudoers
# apt-get install git
# su - stack
$ git clone https://github.com/openstack-dev/devstack.git
$ cd devstack && ./stack.sh
Output:lots of output from stack.sh... 
Horizon is now available at http://107.170.89.157/
Keystone is serving at http://107.170.89.157:5000/v2.0/
Examples on using novaclient command line is in exercise.sh
The default users are: admin and demo
The password: NOTFORYOUREYESTOSEE
This is your host ip: 107.170.89.157
stack.sh completed in 794 seconds.

Here is the python script I mentioned above, It’s not great, but maybe it’ll give you some idea how to talk JSON, and return results to an online API.

#!/usr/bin/python

import requests, json, pprint, time, socket, sys

################
## make the json call to the public api
################
def jsonRequest(targetUrl):
#set the headers for how we want the response
headers = {'content-type': 'application/json','accept':'application/json'}

# make the actual request
r = requests.get(targetUrl, headers=headers, verify=False)

#take the raw response text and deserialize it into a python object.
try:
responseObj = json.loads(r.text)
except:
print "Exception"
print r.text
#print json.dumps(responseObj, sort_keys=False, indent=2)
return responseObj

################
## return list of instances
################
def getDroplets(apiKey, clientId):
#set the target url for the query
targetUrl = "https://api.digitalocean.com/v1/droplets/?client_id=%s&api_key=%s" % (clientId, apiKey)

# make the actual request
resultList = jsonRequest(targetUrl)
return resultList['droplets']

################
## Deploy a new instance
################
def deployDroplet(apiKey, clientId, hostName):
sizeId = '66' #smallest; 62 = 2GB Ram
imageId = '3240036' #ubuntu 64bit
regionId = '4' #NY region
sshKey1 = '153816' #brighs ssh key
sshKey2 = ''
privateNetworking = 'true' # create private network

#set the target url for the query
targetUrl = "https://api.digitalocean.com/v1/droplets/new?client_id=%s&api_key=%s&name=%s&size_id=%s&image_id=%s®ion_id=%s&ssh_key_ids=%s,%s&private_networking=%s" % (clientId, apiKey, hostName, sizeId, imageId, regionId, sshKey1, sshKey2, privateNetworking)

# make the actual request
responseObj = jsonRequest(targetUrl)
return responseObj['droplet']

################
## destroty an instance
################
def destroyDroplet(apiKey, clientId, dropletId):
#set the target url for the query
targetUrl = "https://api.digitalocean.com/v1/droplets/%s/destroy/?client_id=%s&api_key=%s" % (dropletId, clientId, apiKey)

# make the actual request
responseObj = jsonRequest(targetUrl)
return responseObj

######################################

apiKey = "GOGETYOUROWNAPIKEY"
clientId = "GOGETYOUROWNCLIENID"

droplets = getDroplets(apiKey, clientId)
for droplet in droplets:
#for key in droplet:
# print key
print "%s %s %s" % (droplet['ip_address'], droplet['name'], droplet['id'])
#print "destroying droplet %s" % (droplet['name'])
#destroyResult = destroyDroplet(apiKey, clientId, droplet['id'])
#print destroyResult['status']

#droplet = deployDroplet(apiKey, clientId, 'devstack')
#print "%s %s" % (droplet['name'], droplet['id'])
#deployDroplet(apiKey, clientId, 'tester4')
#deployDroplet(apiKey, clientId, 'tester5')
#destroyDroplet(apiKey, clientId, '1786240')

EMC ViPR Leverages 3rd Party OpenStack Cinder Plugins

EMC ViPR 2.0 was just announced today at EMC World.  It brings a distinct level of maturity to a product that was only just been brought to market, but is already the most advanced storage controller of heterogeneous arrays.  One of the most interesting aspects of the announcement was the leverage of Cinder to provide ViPR a much wider support capability for heterogeneous arrays!

ViPR LogoIf you haven’t heard of ViPR, it’s not “storage virtualization” in the sense that you’re familiar with.  It’s a control-plane product like Virtual Center rather than a virtualization product like the hypervisor ESXi.  ViPR discovers storage, SAN, and hosts and controls the provisioning and connectivity of those assets together.  It virtualizes (in a sense) heterogeneous storage “southbound” and provides the common higher function API “northbound” to any framework product that wants an easy way to communicate with the Software Defined Storage layer.

Enough similar products and management frameworks have been created over the years for the SNIA storage standards organization to create an API standard called SMIS.  EMC VMAX and VNX have SMIS providers for instance that ViPR can use to manage provisioning on these arrays.  Other vendors do not so easily provide or so fully support the SMIS standard, and as such ViPR can’t use a standards-based approach to managing them.  Thus the EMC ViPR team was left with creating plugins for each and every array to provide provisioning support–a daunting task indeed.

Enter the OpenStack community.  I’m sure as we were throwing our weight into the OpenStack community, some smart-gal-or-guy in our Advanced Software Division said, “Hang on a sec.  All these vendors are producing Cinder plugins for OpenStack.  Why can’t ViPR just leverage them?”  And where SMIS couldn’t deliver, a viable open source community has created a widely adopted API for OpenStack.

Let’s make sure you understand this.  We’ve had Cinder API support for “Northbound” into OpenStack.   This provides OpenStack the ability to provision block storage against ViPR-as-a-virtual-storage-array.  This new announcement leverages every other vendor’s Cinder support to provide ViPR the ability to communicate “Southbound” into the array and manage it with ViPR-as-a-SDS-controller.  This is a VERY different animal and instantly provides ViPR a wide range of heterogeneous storage management and automation capabilities.

Added to this common industry storage support, the ViPR team has now announced a plugin support (rich data services support) for Hitachi arrays, with more on the way.  Great stuff from the EMC ViPR team.

Size Does Matter

A colleague of mine Sean Cummins recently posted a very good summary of VMAX FAST VP best practices which got a little derision from some of the competition.

bitingoffMuch of the competitive argument is from those who sell an all flash array that can store about 100TB after deduplication. That’s a single RAID type.  That’s a single tier.  That’s a single price per GB.  That’s a single IO per GB. That’s great, but what about the other 90%+ of the capacity of the VMAX, and all of the other workloads that don’t require the highest of IOPS and lowest of latencies? EMC marketing has focused on the “optimization of flash” within the VMAX, but the flip side of that point-of-view is to drive costs out by a massive introduction of SATA without suffering the performance hit.

For the last several years, and for some time to come flash has been a means to an end. The real value of flash has been to enable the placement of as much capacity as possible onto cheap, high capacity, slow SATA. Customers can’t afford to place petabytes of data on all-flash solutions. There aren’t all-flash solutions that can accommodate that scale. The petabyte workloads have not required all-flash performance.

This to shall pass. Every major storage vendor has an all-flash product. There are even customers today (who shall not be named) that purchase all-flash VMAX! We’re starting to see how Latency is becoming the name of the game as opposed to IOPS. This will drive all-flash array consumption for those smaller workloads that can fit on an all-flash solution that don’t require enterprise federated consistency protection for BC/DR (that’s a whole other topic).

hammerThe world is changing rapidly. Data growth is exploding. Companies want to store everything in the hopes that they’ll be able to mine their data for customer insights. We’ll see what storage architectures emerge as the most effective for each of the various workloads in the world.  My warning is this. Whoever is selling you “only one way” of doing things is SELLING YOU on an idea that all of your workloads are the same. They have the proverbial hammer, and they’re trying to convince you all your problems are nails. Don’t buy it.

EMCs OpenStack strategy

EMC Openstack Like most things at EMC, the OpenStack integration strategy is focused on customer choice. It’s an attempt to meet the customer where they leveraging IT assets, specific workloads, and integration points into all of our products. Additional automation and deeper integration we provided by our software defined storage can control plane software called ViPR. Obviously all of the deep reporting, SRM tools, and storage data protection services will remain and be brought into the OpenStack integration.

To jump right into EMC integration with OpenStack Cinder, we provide native plug-ins for all of our block storage products, iSCSI Isilon and Scale I/O. Several major distributions such as Rack Space, Red Hat, and Canonical distribute the plug-in for VNX and VMAX, and versions of the Cinder plug-in support fiber Channel are available from Github or the EMC web site. Third parties have written the Cinder plug-ins for Isilon iSCSI and Swift support and iSCSI support for Scale I/O.

image EMC’s recommended approach would be to use the Cinder plug-in for our ViPR storage control-plane product, since through that single plug-in iSCSI and fiber channel support are provided across all our products, as well as object storage support for Swift, S3, and our own Atmos APIs. That plug-in is available from Github and will be coming to the major distributions soon. The advantage of using ViPR especially in fiber Channel environments is that ViPR will automate the zoning and the definition of rich service pools that provide fully automated storage tiers, replication settings, and high availability configuration.

Not only is there a single integration point with the ViPR Cinder plug-in for block storage, but with ViPR data services we bring Swift API and Amazon S3 API support. Finally, ViPR brings this automation support to all consumers of block storage, not only OpenStack but VMware and other virtualized or bare metal hosts.

Community for Sharing Storage Resource Management (SRM) Custom Reports

EMC SRM 3.0I was in an EMC and EMC Partner training class around SRM 3.0 (Storage Resource Management Suite) today, and came away very excited about one of the latest software assets in EMC’s portfolio (SRM home page).  SRM 3.0 is a comprehensive tool for reporting on capacity and performance of the largest storage enterprises in the world.  Using passive or standards based active discovery, SRM can provide comprehensive capacity and performance reporting and trending across your entire IT stack: VM’s, physical servers, SAN infrastructure, and storage elements.

EMC is becoming very good at providing a simple and easy out-of-the-box experience for its software products while preserving a very rich underlying functionality and flexibility that elevates EMC products above the competition. This balance is very evident in SRM 3.0, where predefined active dashboards show summary data, and allow click-through to the underlying metrics and detailed reports. (Watch a demo video here).

SRM-ReportDuring the class lecture, (and the first hands-on experience I had with the tool) I was able to modify one of the standard storage pool reports to show the data reduction associated with Thin Provisioning on VMAX and VNX. I discussed some of these formulas in a previous post (Some EMC VMAX Storage Reporting Formulas).

One of the best parts of SRM 3.0 is the ability to export and import Report Templates. As such, I exported the report I customized, and posted the XML here for you to download and import into your own SRM 3.0 installation.

I’d like to invite all of you to export reports that you have customized for your environment with the community, so that we can engage the power of everyone to advance the art of storage resource management.

My first python and JSON code

I’m not a developer. There I said it.

I’m a presales technologist, architect, and consultant.

But I do have a BS in Computer Science. Way back there in my hind brain, there exists the ability to lay down some LOC.

I was inspired today by a peer of mine Sean Cummins (@scummins), another Principal within EMC’s global presales organization. He posted an internal writeup of connecting to the VMAX storage array REST API to pull statistics and configuration information. Did you know there is a REST API for a Symmetrix?! Sad to say most people don’t.

He got me hankering to try something, so I plunged into Python for the first time, and as my first example project, I attached to Google’s public geocoding API to map street addresses to Lat/Lng coordinates. (Since I don’t have a VMAX in the basement)

So here it is. I think it’s a pretty good first project to learn a few new concepts. I’ll figure out the best way to parse a JSON package eventually. Anyone have any advise?

###################
# Example usage
###################
$ ./geocode.py
Enter your address:  2850 Premiere Parkway, Duluth, GA                 
2850 Premiere Parkway, Duluth, GA 30097, USA is at
lat: 34.002958
lng: -84.092877

###################
# First attempt at parsing Google's rest api
###################
#!/usr/bin/python

import requests          # module to make html calls
import json          # module to parse JSON data

addr_str = raw_input("Enter your address:  ")

maps_url = "https://maps.googleapis.com/maps/api/geocode/json"
is_sensor = "false"      # do you have a GPS sensor?

payload = {'address': addr_str, 'sensor': is_sensor}

r = requests.get(maps_url,params=payload)

# store the json object output
maps_output = r.json()

# create a string in a human readable format of the JSON output for debugging
#maps_output_str = json.dumps(maps_output, sort_keys=True, indent=2)
#print(maps_output_str)

# once you know the format of the JSON dump, you can create some custom
# list + dictionary parsing logic to get at the data you need to process

# store the top level dictionary
results_list = maps_output['results']
result_status = maps_output['status']

formatted_address = results_list[0]['formatted_address']
result_geo_lat = results_list[0]['geometry']['location']['lat']
result_geo_lng = results_list[0]['geometry']['location']['lng']

print("%s is at\nlat: %f\nlng: %f" % (formatted_address, result_geo_lat, result_geo_lng))

Enterprise Storage Scale Out vs Scale Up

A lot of what we hear in the cloud washed IT conversation today is that scale out architectures are the preferred method of deployment over scale up designs. As the theory goes, scale out architectures provide more and more compute connectivity and storage capacity as you add nodes.

Scale outMost vendor marketing has jumped onto the scale out bandwagon leaving implementations that also scale up indistinguishable from the rest. Properly conceived scale out architectures allow us to add additional discrete workloads to an existing system by adding nodes to the underlying infrastructure. The additional nodes accommodate processing required to support the additional separate workloads. This kind of share nothing design, allows us to stack a large quantity of discrete workloads together and manage the entire system is one entity, even though it can be very hard to balance workload across the nodes (silo’ed).

But what about the large quantity of monolithic workloads that need scalable performance but are not able to be distributed across independent nodes? Take for example a single database instance that begins life with just a few users, and is supported well on a single node of a scale out architecture. As this instance grows, more users are added, the database grows beyond the confines of the single node. In many pure scale out implementations, adding additional nodes is ineffectual. The additional nodes do not share in the burden of the workload from the first node. They are designed merely to accommodate additional discrete workloads.

Scale upEMC scale out and up products such as VMAX Isilon, provide the cost benefits of a scale out model with the performance benefits of a scale up model. Purchasing only a few nodes initially, IT gains the cost benefit of starting small, and as workloads increase, additional nodes can be brought online, and data and workload is automatically redistributed across all nodes. This creates a share everything approach that supports the ability to scale up discrete applications and while also providing the ability to scale out stacking additional applications into the array along with the first. Software-based controls provide for multi-tenancy, tiering and performance optimization, workload priorities, and quality of service. This is as opposed to scale out architectures where rigid hardware defined partitions separate components of the solution.

I hope this helps demystify “scale out” a little. Just because you can manage a cluster of independent nodes, doesn’t mean it’s an integrated system. Unless workload tasks are shared across the entire resource pool, scale out does not confer the benefits of scale up, where scale up-and-out designs provide the best of both worlds.