EMC Presence at OpenStack Tokyo

Every OpenStack summit there’s more and more to talk about with EMC and the Federation.  I’m sure much of the hallway chatter will be around welcoming our potential new private equity overlords.

The real meat of the conference will be inside the sessions, and here is how you can find EMC at the summit:

We are excited to announce that EMC is a Headline sponsor at OpenStack Tokyo!  Team OpenStack @ EMC will be present, ready to engage and collaborate on our OpenStack contributions, integrated solutions and industry-leading software-defined infrastructure.

How to find EMC at the event:

  • Come by our EMC booth to chat with our experts, see live demos (i.e. watch ScaleIO trump Ceph time and again), schedule customer meetings, and get lots of EMC gear.
  • To set up time with an EMC subject matter expert, email sweeney@emc.com or your EMC account manager with any inquiries.

Attend one of EMC’s sessions:

Battle of the Titans:  Real-Time Demonstration of Ceph vs. ScaleIO Performance for Block Storage

Orchestrate ALL The (Storage) Things: OpenStack Data Availability with CoprHD

Operating at Web-scale: Will Containers Crush the OpenStack Ecosystem?

Cloud Storage in your datacenter: Geo-scale SWIFT, S3 and more for Exabytes of Multi-tenant, Hadoop ready data

For complete and most up to date information on EMC at OpenStack Summit Tokyo, follow our ECN Community Page.


Do you really think brokerage of commodity clouds will be a thing?

Why do we think that one day we’ll broker usage of commodity clouds against each other for the lowest price per usage?

I’m in tune with the clouderatii.  I’ve read the books equating the current transformation of IT into a continuous delivery pipeline with the recent transformation of manufacturing into just in time.  I’ve participated in the conferences.  I’ve read the pundits’ blogs and tweets.  I’ve listened to the podcasts.

I still have a question…

Is there really a precedent for this idea of brokering multiple commodity clouds, and deploying workloads anywhere you get the best price assuming everyone provides basically the same features?

Is it like an integrator sourcing the same component from multiple manufacturers?  Is it like market brokering for the best price electricity from the grid?  Is it like hedging fuel if you’re an airline?

Maybe, but I’m not sure those are truly analogous.  Is there an IT based precedent?

Is it like having a multi-vendor strategy for hardware, software, networking, or storage?  (Does that really work anyway?)

I don’t think so.

I’ve had conversations of late about the reality (or not!) of multi-vendor cloud, and the need for vendor management of multiple cloud offerings.  The idea seems to be that there will be a commoditization of cloud service providers, and through some market brokerage based solution, you’ll be able to deploy your next workload wherever you can get the best deal.

Keep in mind, I deal primarily with enterprise customers… not Silicon Valley customers.

There are still very few workloads that I see in the real world that fit this deploy-anywhere model.  If they exist, these apps are very thin links in the value chain limited by network complexity and proximity to data.  They typically exist to manage end-user interaction, pushing data to a central repository that doesn’t move.  Maybe there are some embarrassingly parallel workloads that can be farmed out, but how big of a problem is data gravity?  (the payload can’t be that big or the network costs eat you up)  How often do you farm out work to the cloud?  Is it so frequent that you need a brokerage tool, or will you simply negotiate for the best rate the next time you need it?
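
To make the data gravity question concrete, here is a minimal sketch of the arithmetic a brokerage tool would have to do.  The provider names, prices, and job sizes are all made up for illustration, and the cost model (compute hours plus a per-GB charge for moving results back to the central repository) is a simplifying assumption, not any real provider’s pricing.

```python
# Hypothetical quotes in dollars: price per instance-hour and per GB moved
# back to the central repository. None of these numbers are real.
QUOTES = {
    "cloud_a": {"per_hour": 0.10, "egress_per_gb": 0.09},
    "cloud_b": {"per_hour": 0.08, "egress_per_gb": 0.12},
}


def job_cost(quote, hours, payload_gb):
    """Compute cost plus the cost of moving the payload back to the hub."""
    return quote["per_hour"] * hours + quote["egress_per_gb"] * payload_gb


def cheapest(quotes, hours, payload_gb):
    """Return (provider, cost) with the lowest total cost for this job."""
    costs = {name: job_cost(q, hours, payload_gb) for name, q in quotes.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


# Same compute time, very different amounts of data to move.
small = cheapest(QUOTES, hours=100, payload_gb=10)
large = cheapest(QUOTES, hours=100, payload_gb=1000)
print(small)
print(large)
```

With these invented numbers the provider with the cheaper compute rate (cloud_b) wins the small job, but cloud_a wins once the payload grows, because the data movement charge dominates.  The “best deal” depends on how much data the workload drags along with it.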

Multi-vendor IT management doesn’t come close to the dynamism suggested by such brokerage.  In my experience individual management ecosystems are developed by each vendor to differentiate and make themselves “sticky” to the consumer.  Some would say “lock-in,” but that’s not really fair if real value is gained.

Are all server vendors interchangeable such that you could have virtualization running on several vendor machines all clustered together?  Only if you want a support nightmare.  No one really does this at scale, do they?  They may have a few racks of red servers and a few racks of blue, but they’re not really interchangeable due to the way they are managed and monitored.

Linux is a common example of this commoditization in action… Are all Linux OS’s the same?  Can you just transition from RedHat to Ubuntu to SUSE on a whim?  Do you run them all in production and arbitrage between them?  You don’t, because even though they might be binary compatible Linux, each distribution is managed differently.  They have different package management toolchains.  They’ll all run KVM, true; but you’ll need to manage networking carefully.  What about security and patching?  Are they equally secure and on the same patch release cycle?

Shifts between vendors only really happen across multiple purchase cycles each with a 3-5 year term and with the cost of a lot of human effort.

Will the purchase cycle of cloud be 3-5 months of billing?  Yes, this I could imagine.  This would allow multiple cloud vendors to compete for business, and over 12 months or so a shop could transition from one cloud to another (depending on how frequently their workload instances are decommissioned and redeployed).  And yet, network complexity and data gravity imply extreme difficulty in making the switch between clouds if app instances are clustered or must refer to common data sets; or if the data sets themselves must transition from one cloud to another (again with the network complexity and data gravity).

The only way to really engage in the brokerage model is to have very thin apps whose deployment does not rely on differentiated features of the cloud providers.  You’ll have to create deployment artifacts that can run anywhere, and you’ll have to use network virtualization to allow them all to communicate back to the hub systems of record.  Then there’s the proximity to the data to be processed.  You’d better not be reliant on too much centralized data.

It’s too early to have all the answers, but I’m suggesting that the panacea of multi-cloud brokerage imagined by the pundits will never really materialize.  If the past is any guide to the future, the differentiation of the various systems won’t allow easy commoditization.  They’ll be managed differently, and it’ll be hard to move between them.  Any toolset that provides a common framework for management will reduce usage to the least common denominator functionality.  And nothing so far is really addressing network complexity or data gravity.

The issue is more complex than the pundits and the podcasts would have you believe.  I don’t know the answers.  Do you think you do?  I’d love to hear your opinions.

Big IT is Swallowing OpenStack Upstarts

OpenStack partners getting swallowed by big corporations

And with that, OpenStack is now unquestionably a big vendor driven set of projects.  EMC acquired CloudScaling a while back, Cisco has announced its acquisition of Piston Cloud, and IBM is acquiring Bluebox.  The only meaningful independent OpenStack generalist company now is Mirantis.  (Props to HP, RedHat, and Canonical, but they also do other things.)

It’s not that anyone really ever questioned that OpenStack was being driven by corporate interests.  The cliche has always been that it has more vendor sponsors than customers.  But does that matter?  The point of OpenStack is to provide a common IaaS layer that’s not owned by Amazon, so that all these corporate interests can collectively catch up to the head start Amazon enjoys.  At the same time, corporations that feel that hosting their own IaaS is strategic to their business are encouraged to consider OpenStack since their traditional IT vendors are also leveraging it as an emerging standard.

What do traditional IT vendors want with OpenStack upstarts?

What are these traditional players going to do with these OpenStack upstarts?  Ensure compatibility with existing solutions… build OpenStack-in-a-box products… provide service and support offerings around the platform… make sure that there’s just enough innovation within OpenStack solutions to be competitive, but not so much that it would devalue existing products too quickly… you know, the standard stuff.

From the Vancouver Summit, though there appear to be more direct customers using “OpenStack,” it’s more nuanced than that.  The nuance is OpenStack is not a product or a single project.  OpenStack is a collection of projects that encompass compute, networking, and storage.  Customers do not have to swallow the whole pill.  Many of the customers “using OpenStack” at the summit are really only using Nova, Glance, and Cinder; or Swift; or Ceph (not OpenStack BTW); and very few are leveraging most or all the projects for an all-encompassing deployment.

I think OpenStack has a future.  It’ll be up to the governance model to determine whether OpenStack remains a common playing field or diverges into separate incompatible offerings.  It’ll be fun to watch the run-by-committee model and see if it can produce a truly viable IaaS while such a thing is still a relevant need in IT.

Measuring the Value of Corporate Data

Increasingly, smart people are taking up the mantle of assigning economic value to the data or information assets within organizations. Steve Todd is producing a blog series on the topic.  Dave McCrory has begun looking at gravity theory applied to data (Data Gravity) in an attempt at some point to aid valuation or at least guide investments.  Various scholarly papers have been written since the 90’s discussing methods of economic valuation.

First, who cares?  We are fully involved in the digital economy. How do we estimate the value of an information based company?  Why isn’t the very information asset on which many digital businesses are based represented on the balance sheet?  For how much should we insure our data assets against loss?  What is a fair price to charge for access to information?  What value can we assign to information used to collateralize a loan?

And on the flip side, what is the negative value of leakage of information into the hands of bad guys?  What about the tax implications of an information economy?  What value is being traded by an information currency every time I barter my personal data for online services or discounts at the local grocery?

Economists, venture capitalists, insurers, CFOs, and shareholders all care how valuable a company’s information assets are, but how can they assign fair value?  That’s the topic of an EMC research study in partnership with University of California San Diego researcher Dr. Jim Short to explore “all things data value.” Jim is the Research Director for the Global Information Industry Center (GIIC).

To further this discussion, I’d like to comment on a paper I read several years ago on this topic titled “Measuring The Value Of Information: An Asset Valuation Approach” presented at the European Conference on Information Systems (ECIS ’99) by Daniel Moody and Peter Walsh.

In short, information is an asset in the technical economic sense, and it has measurable value.  Only the method of measurement is in question.  Moody and Walsh identify 7 “laws” of information, on which I will comment further:

  1. “Information is infinitely shareable.”  I agree, information is not generally “appropriable” in the sense of exclusive possession.  Anyone can make a perfectly valid copy or share access to the original if they are within the network universe.  In this sense the value can be cumulative of all shared points of use, and is more valuable the more it is shared.  This is the primary driver of “value in use” of information.  On the other hand, what about copies of data?  Is duplicated information more valuable?  What if it is duplicated for DR purposes?  Perhaps only if it is still “owned” by the original party.  Data piracy is an example of copies of information returning no or negative value to the original owner.
  2. “Value of Information increases with use.”  Yep, absolutely as opposed to other more tangible assets that depreciate over time.  This law also leads to one of the most important methods of valuation—measuring the frequency of access of information and by whom.
  3. “Information is Perishable.” They are suggesting that information’s value decreases over time.  I would tend to agree, but with the caveat that Big+Fast Data analytics are extracting value from old data in ways never imagined before.  There needs to be a method of predicting future value of presently unused information.  This type of future value speculation may be very hard to do.  I wonder if there are economic forces similar to holding land for long periods, in the hopes that one day gold may be discovered.  There may be categories of information holdings that are assumed to be more valuable due to past discoveries of value in similar types of datasets.
  4. “The value of information increases with accuracy.”  So I tend to think accuracy is overrated.  I mean, have you seen all the hoax articles on Facebook recently?  Fabrications and lies are valuable too.  But I get it.  Generally you want accuracy in your datasets.  This builds trust which in turn builds value.
  5. “The value of information increases when combined with other information.”  Absolutely a gold star on this one.  This is much of the premise of Data Gravity.  Information pulls other information to it, and data sets become ever larger.  This accumulation of valuable information increases its pull of other information and applications.  Another name for this is “context.”  Information is much more valuable in context, and the buying and selling of information is a huge business today.  This law drives the “value in exchange” portion of information valuation as companies buy and sell data.
  6. “More is not necessarily better.”  Hmmm.  This law is beginning to seem a little outdated.  The authors discuss human psychology and information overload.  These days the machines are doing the analytics on our behalf, and more of this good thing does seem to be more of a good thing.
  7. “Information is not depletable.”  You don’t reduce the quantity of information as it is used.  In fact the opposite is true.  More information is created through the use of information.  This “metadata” (information about information) is often as valuable or more valuable than original content (just ask the NSA).  In fact, it is through this metadata on information usage that many aspects of value themselves are determined.

Since information is sharable, imperishable, and nondepletable, we can look at summing both the utility “value in use” and the market “value in exchange” to find the true value of a dataset.  I wonder if any of the following industries have valuation models that are similar:

  • Research libraries and their value to a community
  • Land holdings and their value for mineral rights
  • Methods of appraisal of antiques or other goods of subjective worth
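
As a thought experiment, the idea of summing “value in use” and “value in exchange” can be sketched in a few lines of Python.  This is not Moody and Walsh’s method.  The access log, the dollar value per read for each role, and the exchange revenue figure are all hypothetical, and treating value in use as reads weighted by who is reading is only one possible reading of law 2.

```python
# Hypothetical access log for one dataset: (user role, number of reads).
ACCESS_LOG = [("analyst", 120), ("executive", 5), ("partner", 40)]

# Hypothetical dollar value of a single read by each role (pure assumption).
VALUE_PER_READ = {"analyst": 2.0, "executive": 50.0, "partner": 10.0}


def value_in_use(access_log, value_per_read):
    """Weight each read by who made it and add everything up."""
    return sum(reads * value_per_read[role] for role, reads in access_log)


def dataset_value(access_log, value_per_read, exchange_revenue):
    """Value in use plus value in exchange (what the data sold for)."""
    return value_in_use(access_log, value_per_read) + exchange_revenue


# exchange_revenue is a made-up figure for selling access to the dataset.
total = dataset_value(ACCESS_LOG, VALUE_PER_READ, exchange_revenue=1000.0)
print(total)
```

The hard part is obviously not the addition.  It is deciding what a read is worth and to whom, which is exactly where the valuation models from other industries might help.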

It’s an interesting topic to begin dialoguing about.  Much more needs to be discussed.  What do you think?


EMC VMAX3 is a Data Services Platform

VMAX3 is now Generally Available and represents the next generation of market leading platforms from EMC that have established themselves as the most reliable and highest performing data storage platforms on the planet. Let’s examine what makes VMAX3 the first Data Services Platform in the industry and why VMAX3 is an even more revolutionary step from VMAX than VMAX was from DMX. Along the way let’s dispel some myths about the product.

At launch VMAX3 is ready for the most mission critical workloads.  At its introduction, the original VMAX was a revolutionary platform, virtualizing the matrix interconnects between internal components and allowing for both scale-up and a scale-out expansion based entirely on Virtual Provisioning.  These were enormous leaps forward in the simplicity and scalability of Symmetrix.  EMC has developed rigorous testing, manufacturing processes, and incredible architecture designs that gave VMAX and then VMAX2 the title of “Most Reliable Symmetrix” ever produced.

VMAX3 is standing on the shoulders of the VMAX2 and is taking Reliability, Availability, and Serviceability to the next level.  Additional redundancy has been added on the backend DA connections to the DAEs.  More upgrades and service events leave the director online.  Tons of serviceability design changes for CS have been added, such as rear facing light bars (ever try to find one of these in a datacenter?), a work tray in the cabinets, etc., that set the VMAX3 as the RAS benchmark in the industry.

A Data Services Platform must be available, but it must also take functionality to the next level.  EMC’s goal for VMAX3 HyperMax OS is to make it the foundational component in our customers’ data centers, providing access to all of our best capabilities (FAST, SLO, FTS, Cloud, SRDF, TimeFinder, ProtectPoint, etc.).

HyperMax is our most comprehensive OS rewrite to date.  It takes the massively parallel preemptive multitasking kernel of VMAX3 and opens the door to running other data services apps inside the array.  The VMAX3 Data Services Platform can be extended online with additional software functionality as it becomes available.  Most critically to our customers, code upgrades are done without taking a single component offline.  There is no failover/failback, one-at-a-time reboots, ports offline, etc. No other array can do this!

VMAX3 is designed for Always On operations.  Combined with SRDF, VPLEX + RecoverPoint, or ProtectPoint, customer operations are always protected from site failure or any other outage situation.

We talk to our customers about “always on” Platform 3 infrastructure, and now we’ve built VMAX3 into an “always on” Data Services Platform.

VMAX3 Always On

Our phased release schedule enables us to get products to market faster.  We tell our customers how Agile development has revolutionized the way apps are brought to market and how the new normal is “fast.”  Get products out fast and iterate fast to provide the most important features first, prioritized by customer demand.  It’s one thing to do this in the Platform 3 web space, because the service is designed to be always on, and upgradable without degradation to the end user experience.  To do this in the hardware market, the platform must provide the same Always On availability to allow upgrades that enable the additional functionality.

In exactly this way, VMAX3 provides us a platform that is always available, and through HyperMax, additional data services can be added later online. Getting the base platform right is critical.   As previously discussed the upgradeability of VMAX3 allows us to introduce this revolutionary product now, which is crucial to maintaining market leadership position and building momentum while simultaneously creating a revolutionary new data services platform.

VMAX3 will carry customers toward Storage as a Service.  Fundamental to the value proposition of VMAX3, like VMAX Cloud Edition before it, is the idea that purchase decisions and provisioning are based on Service Level Objectives as opposed to rotational speed of the drives.  This outcome-based thinking is a wave carrying our customers toward the beachhead of Storage aaS.

Three years ago, EMC’s message for introducing ITaaS Transformation was 1) Have C-level sponsorship, 2) Pick a project and grow it, 3) Let the technology get you there.  Well, VMAX3 is getting our customers there.  Whether or not they have financial models that equate to the service levels inside VMAX3, they gain the simplicity and ability to automate processes that Service Level based provisioning provides.

The VMAX3 Data Services Platform can handle any performance Service Level Objective required by the customer.  Designed as an all-flash capable array, any of the VMAX3 family can support all-flash configurations.  Should the customer choose a higher capacity or lower cost design, spinning disk can be used in combination to provide various SLOs of performance and capacity within the array.  Front-end and Back-end CPU cores are now pooled, giving any port full line rate capability and doubling the IOPS possible on VMAX2.  PCI Gen 3 and 6Gb SAS connections to the DAEs deliver incredible bandwidth for DSS workloads, tripling the throughput of VMAX2.  VMAX3 provides both the rich data services functionality and the performance required to process the avalanche of new data in the datacenter.

In summary, VMAX3 is an “always-on” scale-up-and-out Data Services Platform that can be extended online via HyperMax OS to provide additional advanced software functionality within the array over its lifespan, with incredibly easy to use SLO based provisioning that meets any performance requirements of SAN or NAS attached hosts. This redefinition of the storage array makes VMAX3 a larger step forward from VMAX than VMAX was from DMX.


Daily Narrative rather than Daily Todo List

I’ve come to realize that I am much more motivated to perform routine tasks if I can identify two or more beneficial reasons for performing the task. If the task only has one use or benefit of its outcome, then I’m likely to think I can live without that outcome, and not perform the task.  I need these multiple input  motivations to stack together and push me to perform.

A convergence of multiple benefits for doing a task has emerged recently in planning for my day. One thing I would like to do is to be more organized and create a daily plan or task list to help me prioritize my goals for the day. Another thing I’d like to be able to do is practice writing on a daily basis, so that I can more frequently create blog posts and more comfortably articulate my thoughts in writing in the business world. Finally, I’ve always wanted to create a journal that perhaps I can pass on to my kids or use in long-term planning or personal review, and to look back and see the trends of my life.

Combining all of these things has led me to a practice of creating not a task list, but a narrative story of how I see my day unfolding. Avoiding the dreaded task list, visualizing the flow of my day, and practicing a storytelling narrative style all at the same time give me many reasons for engaging in the habit of daily planning. I think the narrative visualization is a very important aspect of daily planning, and forming what you expect to happen in your day into a story really engages that side of your brain to rehearse your day and make it flow that much more easily.

The day never works out as I had envisioned. And that’s okay. At the end of the day I can read my narrative, make notes and updates in the same document that describe what really happened, see differences between how I thought the day would unfold and how it actually unfolded, and then in the nightly review naturally see how my expectations were thwarted by the events of the day. This helps me plan the next day that much more effectively.

Go update your iPhone Syncplicity client now

Wow. Don’t read this. Just go get the new EMC Syncplicity iPhone client now. It’s amazing. It isn’t so much a sync and share app as it is a sexy remote control for enterprise document management policies.

It’s better than Finder.

Sure, it does enterprise grade sync and share, allowing users to securely collaborate with each other ad hoc, and corporate to push down their own folders of content.

Sure, without having to rely on “the magic folder” into which you put all of the stuff to be synced, it acts very much like the easiest backup and recovery client around.

Sure, it’s at the top right of Gartner’s Magic Quadrant (TM Gartner). (Dropbox was not top-right)

Sure, it allows your teams to find each other’s content by gravatar (if you can’t quite remember their name).

Sure, it adds native PDF and M$oft document annotation capabilities, and lets me add new slides to my PowerPoint while traveling.

Sure, it adds contextual menu options to keep clutter out of the interface, while exposing an extremely powerful set of document management tools.

Sure, it provides intelligent insights to recognize (for instance) by looking at your calendar, that you were just in a meeting, and have been creating documents. Would you like to share this document with the meeting participants? (and it pre-populates the email with their addresses).

Sure, it tracks who’s looking at the shared links you provide, and lets you track who is downloading your content and from where in the world (plotted on a map).

Yes, it has all that, but, man is it a sexy app to use! I just loaded it onto my phone, and I swear I think I can navigate my folder tree on my phone better than I can in Finder on the Mac.

What it does is provide a continual context for where you are in the folder tree by using the “cards” interface more commonly in use these days. All parent folders are cards that can be viewed by “riffling the deck from above” or by “fanning the cards” left to right.

I love that when you “fan the cards” the parent card UI is active, in that you can scroll down and jump straight into another folder without first having to “select that card, and then start scrolling”.

A few screenshots don’t really do it justice, so check out this UI video.

Great job Syncplicity.  You got this one right!

How I installed OpenStack at a Service Provider in 20 mins

I’ve been playing around with my skillz in python, puppet, OpenStack, deployment methods, etc for a few weeks.  Here’s a small example of deploying OpenStack in a non-production environment (no hardening or customization of any kind) in 20 mins.  Maybe it’ll help you in your self education.

For testing purposes I use a service provider called Digital Ocean. I’m sure what they do is competitive to all things EMC, so this is not an endorsement. They do, however, have VERY CHEAP costs if you’re just doing testing (on the order of < 1 penny per hour) with Linux based services (again YMMV, I have no idea how good they are).

Now when I say 20 mins, that’s how long it takes for the procedures to run once you figure out what to do. I’ve spent a couple of days playing with the site, VM’s, python, devstack, etc. to get the procedure in place. That said, it took 20 mins of wall clock time to get OpenStack running from the devstack.org package on a single machine deployment (ie: not production ready).

At the end of this post is the python code to talk to the Digital Ocean API and deploy the instance. Since it’s not very sophisticated code, I just edit it directly to do what I want to do as opposed to passing command line arguments (list, deploy, destroy).
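
If you’d rather not edit the script each time, here is a minimal sketch of what a command line front end for list, deploy, and destroy could look like using Python’s standard argparse module.  It assumes Python 3, and the `build_parser` and `describe` names are hypothetical.  The handler only returns a description of what it would do; a real version would call the API functions instead.

```python
import argparse


def build_parser():
    """Build a parser with one sub-command per action (list, deploy, destroy)."""
    parser = argparse.ArgumentParser(description="Manage test droplets")
    sub = parser.add_subparsers(dest="command")
    sub.required = True  # fail with a usage message if no sub-command is given
    sub.add_parser("list", help="list existing droplets")
    deploy = sub.add_parser("deploy", help="deploy a new droplet")
    deploy.add_argument("name")
    destroy = sub.add_parser("destroy", help="destroy a droplet by id")
    destroy.add_argument("droplet_id")
    return parser


def describe(args):
    """Return a description of the action; a real script would call the API here."""
    if args.command == "list":
        return "list droplets"
    if args.command == "deploy":
        return "deploy %s" % args.name
    return "destroy %s" % args.droplet_id


# Parse an explicit argument list; pass nothing to parse_args() to read sys.argv.
args = build_parser().parse_args(["deploy", "devstack"])
print(describe(args))
```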

So, to get a VM I called ‘devstack’ I executed my little script with settings hard-coded:


The script returned the following output, parsed from the JSON response: devstack 1790811

Then I did the following to install OpenStack:

$ echo "<droplet ip> devstack" | sudo tee -a /etc/hosts
$ ssh root@devstack
# adduser stack
# echo "stack ALL=(ALL:ALL) ALL" >> /etc/sudoers
# apt-get install git
# su - stack
$ git clone https://github.com/openstack-dev/devstack.git
$ cd devstack && ./stack.sh
Output: lots of output from stack.sh...
Horizon is now available at
Keystone is serving at
Examples on using novaclient command line is in exercise.sh
The default users are: admin and demo
This is your host ip:
stack.sh completed in 794 seconds.

Here is the python script I mentioned above.  It’s not great, but maybe it’ll give you some idea how to talk JSON to an online API and read back the results.


import json

import requests

## make the json call to the public api
def jsonRequest(targetUrl):
    # set the headers for how we want the response
    headers = {'content-type': 'application/json', 'accept': 'application/json'}

    # make the actual request
    # (verify=False skips TLS certificate checks; tolerable only for throwaway testing)
    r = requests.get(targetUrl, headers=headers, verify=False)

    # take the raw response text and deserialize it into a python object
    try:
        responseObj = json.loads(r.text)
    except ValueError:
        # the body was not valid JSON: show it, then re-raise
        print("Exception")
        print(r.text)
        raise
    #print(json.dumps(responseObj, sort_keys=False, indent=2))
    return responseObj

## return list of instances
def getDroplets(apiKey, clientId):
    # set the target url for the query
    targetUrl = "https://api.digitalocean.com/v1/droplets/?client_id=%s&api_key=%s" % (clientId, apiKey)

    # make the actual request
    resultList = jsonRequest(targetUrl)
    return resultList['droplets']

## Deploy a new instance
def deployDroplet(apiKey, clientId, hostName):
    sizeId = '66' #smallest; 62 = 2GB Ram
    imageId = '3240036' #ubuntu 64bit
    regionId = '4' #NY region
    sshKey1 = '153816' #brighs ssh key
    sshKey2 = ''
    privateNetworking = 'true' # create private network

    # set the target url for the query
    targetUrl = "https://api.digitalocean.com/v1/droplets/new?client_id=%s&api_key=%s&name=%s&size_id=%s&image_id=%s&region_id=%s&ssh_key_ids=%s,%s&private_networking=%s" % (clientId, apiKey, hostName, sizeId, imageId, regionId, sshKey1, sshKey2, privateNetworking)

    # make the actual request
    responseObj = jsonRequest(targetUrl)
    return responseObj['droplet']

## destroy an instance
def destroyDroplet(apiKey, clientId, dropletId):
    # set the target url for the query
    targetUrl = "https://api.digitalocean.com/v1/droplets/%s/destroy/?client_id=%s&api_key=%s" % (dropletId, clientId, apiKey)

    # make the actual request
    responseObj = jsonRequest(targetUrl)
    return responseObj



# credentials are not included in this post; fill in your own before running
apiKey = 'YOUR_API_KEY'
clientId = 'YOUR_CLIENT_ID'

droplets = getDroplets(apiKey, clientId)
for droplet in droplets:
    #for key in droplet:
    #    print(key)
    print("%s %s %s" % (droplet['ip_address'], droplet['name'], droplet['id']))
    #print("destroying droplet %s" % (droplet['name']))
    #destroyResult = destroyDroplet(apiKey, clientId, droplet['id'])
    #print(destroyResult['status'])

#droplet = deployDroplet(apiKey, clientId, 'devstack')
#print("%s %s" % (droplet['name'], droplet['id']))
#deployDroplet(apiKey, clientId, 'tester4')
#deployDroplet(apiKey, clientId, 'tester5')
#destroyDroplet(apiKey, clientId, '1786240')
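
One thing I’d probably do differently next time is let a library assemble the query string rather than formatting it by hand.  Here is a minimal sketch using `urlencode` from the Python 3 standard library.  The `build_url` helper is hypothetical, the credential values are placeholders, and dropping empty parameters (like an unused second ssh key) is my own choice rather than anything the API requires.

```python
from urllib.parse import urlencode

# Same v1 endpoint root the script above talks to.
API_ROOT = "https://api.digitalocean.com/v1"


def build_url(path, **params):
    """Return API_ROOT/path with params URL-encoded as the query string."""
    # Drop parameters with empty values, then let urlencode handle escaping.
    query = urlencode({k: v for k, v in params.items() if v != ""})
    return "%s/%s?%s" % (API_ROOT, path.strip("/"), query)


# Placeholder credentials; a host name with a space shows the escaping.
url = build_url(
    "droplets/new",
    client_id="CLIENT",
    api_key="KEY",
    name="dev stack",
    region_id="4",
    ssh_key_ids="",
)
print(url)
```

The resulting string can be handed to the same `requests.get` call as before, and you no longer have to count `%s` placeholders.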