Do you really think brokerage of commodity clouds will be a thing?

Why do we think that one day we’ll broker usage of commodity clouds against each other for the lowest price per usage?

I’m in tune with the clouderati.  I’ve read the books that equate the current transformation of IT into a continuous delivery pipeline with manufacturing’s earlier transformation into just-in-time.  I’ve participated in the conferences.  I’ve read the pundits’ blogs and tweets.  I’ve listened to the podcasts.

I still have a question…

Is there really a precedent for this idea of brokering multiple commodity clouds, and deploying workloads anywhere you get the best price assuming everyone provides basically the same features?

Is it like an integrator sourcing the same component from multiple manufacturers?  Is it like market brokering for the best-priced electricity from the grid?  Is it like hedging fuel if you’re an airline?

Maybe, but I’m not sure those are truly analogous.  Is there an IT based precedent?

Is it like having a multi-vendor strategy for hardware, software, networking, or storage?  (Does that really work anyway?)

I don’t think so.

I’ve had conversations of late about the reality (or not!) of multi-vendor cloud, and the need for vendor management of multiple cloud offerings.  The idea seems to be that there will be a commoditization of cloud service providers, and through some market brokerage based solution, you’ll be able to deploy your next workload wherever you can get the best deal.

Keep in mind, I deal primarily with enterprise customers… not Silicon Valley customers.

There are still very few workloads that I see in the real world that fit this deploy-anywhere model.  If they exist, these apps are very thin links in the value chain limited by network complexity and proximity to data.  They typically exist to manage end-user interaction, pushing data to a central repository that doesn’t move.  Maybe there are some embarrassingly parallel workloads that can be farmed out, but how big of a problem is data gravity?  (the payload can’t be that big or the network costs eat you up)  How often do you farm out work to the cloud?  Is it so frequent that you need a brokerage tool, or will you simply negotiate for the best rate the next time you need it?
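
To put rough numbers on the data gravity question, here is a minimal sketch in Python 3. Every price in it is a made-up placeholder, and the `total_cost` and `cheapest` helpers are hypothetical names of my own, not part of any brokerage tool. It only shows how the cost of moving the payload can wipe out a cheaper compute rate.

```python
# All prices below are hypothetical placeholders, not real provider rates.
def total_cost(compute_rate, hours, payload_gb, transfer_rate):
    """Cost of running a job somewhere: compute time plus moving the payload there."""
    return compute_rate * hours + payload_gb * transfer_rate


def cheapest(offers, hours, payload_gb):
    """Return the name of the offer with the lowest total cost for this job."""
    return min(
        offers,
        key=lambda name: total_cost(
            offers[name]["compute"], hours, payload_gb, offers[name]["transfer"]
        ),
    )


offers = {
    # "home" is where the data already lives, so moving it costs nothing
    "home": {"compute": 0.10, "transfer": 0.00},
    # "cheap" has a lower hourly rate, but the payload has to be shipped to it
    "cheap": {"compute": 0.06, "transfer": 0.10},
}

small_job = cheapest(offers, hours=100, payload_gb=1)
big_job = cheapest(offers, hours=100, payload_gb=500)
print(small_job, big_job)
```

With these placeholder numbers the 1 GB job goes to the cheaper cloud and the 500 GB job stays where the data is, which is the whole point: the broker only wins when the payload is small.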

Multi-vendor IT management doesn’t come close to the dynamism suggested by such brokerage.  In my experience individual management ecosystems are developed by each vendor to differentiate and make themselves “sticky” to the consumer.  Some would say “lock-in,” but that’s not really fair if real value is gained.

Are all server vendors interchangeable such that you could have virtualization running on several vendor machines all clustered together?  Only if you want a support nightmare.  No one really does this at scale, do they?  They may have a few racks of red servers and a few racks of blue, but they’re not really interchangeable due to the way they are managed and monitored.

Linux is a common example of this commoditization in action… Are all Linux distributions the same?  Can you just transition from Red Hat to Ubuntu to SUSE on a whim?  Do you run them all in production and arbitrage between them?  You don’t, because even though they might be binary compatible Linux, each distribution is managed differently.  They have different package management toolchains.  They’ll all run KVM, true; but you’ll need to manage networking carefully.  What about security and patching?  Are they equally secure and on the same patch release cycle?

Shifts between vendors only really happen across multiple purchase cycles, each with a 3-5 year term, and at the cost of a lot of human effort.

Will the purchase cycle of cloud be 3-5 months of billing?  Yes, this I could imagine.  This would allow multiple cloud vendors to compete for business, and over 12 months or so a shop could transition from one cloud to another (depending on how frequently their workload instances are decommissioned and redeployed).  And yet, network complexity and data gravity imply extreme difficulty in making the switch between clouds if app instances are clustered or must refer to common data sets; or if the data sets themselves must transition from one cloud to another (again with the network complexity and data gravity).

The only way to really engage in the brokerage model is to have very thin apps whose deployment does not rely on differentiated features of the cloud providers.  You’ll have to create deployment artifacts that can run anywhere, and you’ll have to use network virtualization to allow them all to communicate back to the hub systems of record.  Then there’s the proximity to the data to be processed.  You’d better not be reliant on too much centralized data.

It’s too early to have all the answers, but I’m suggesting that the panacea of multi-cloud brokerage imagined by the pundits will never really materialize.  If the past is any guide to the future, the differentiation of the various systems won’t allow easy commoditization.  They’ll be managed differently, and it’ll be hard to move between them.  Any toolset that provides a common framework for management will reduce usage to the least common denominator functionality.  And nothing so far is really addressing network complexity or data gravity.
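
The least common denominator point can be sketched as a set intersection. This is a toy in Python 3: the provider names and feature names are invented for illustration, and `portable_features` and `stranded_features` are hypothetical helpers, not functions from any real management framework.

```python
# Provider names and feature names are made up for illustration.
provider_features = {
    "cloud_a": {"vm", "block_storage", "load_balancer", "managed_db"},
    "cloud_b": {"vm", "block_storage", "load_balancer", "queue"},
    "cloud_c": {"vm", "block_storage", "object_storage"},
}


def portable_features(features_by_provider):
    """Features a cross-cloud tool can expose: those every provider supports."""
    return set.intersection(*features_by_provider.values())


def stranded_features(features_by_provider):
    """Per provider, the differentiated features the common layer cannot reach."""
    common = portable_features(features_by_provider)
    return {name: feats - common for name, feats in features_by_provider.items()}


common = portable_features(provider_features)
stranded = stranded_features(provider_features)
print(sorted(common))
print({name: sorted(feats) for name, feats in stranded.items()})
```

Each provider you add can only shrink the common set, and everything left in the stranded sets is exactly the differentiation each vendor is counting on to keep you.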

The issue is more complex than the pundits and the podcasts would have you believe.  I don’t know the answers.  Do you think you do?  I’d love to hear your opinions.

How I installed OpenStack at a Service Provider in 20 mins

I’ve been playing around with my skillz in Python, Puppet, OpenStack, deployment methods, etc. for a few weeks.  Here’s a small example of deploying OpenStack in a non-production environment (no hardening or customization of any kind) in 20 mins.  Maybe it’ll help you in your self-education.

For testing purposes I use a service provider called Digital Ocean. I’m sure what they do is competitive with all things EMC, so this is not an endorsement. They do, however, have VERY CHEAP costs if you’re just doing testing (on the order of < 1 penny per hour) with Linux-based services (again YMMV, I have no idea how good they are).

Now when I say 20 mins, that’s how long it takes for the procedures to run once you figure out what to do. I’ve spent a couple of days playing with the site, VMs, Python, devstack, etc. to get the procedure in place. That said, it took 20 mins of wall clock time to get OpenStack running from the devstack.org package on a single machine deployment (i.e., not production ready).

At the end of this post is the python code to talk to the Digital Ocean API and deploy the instance. Since it’s not very sophisticated code, I just edit it directly to do what I want to do as opposed to passing command line arguments (list, deploy, destroy).

So, to get a VM I called ‘devstack’ I executed my little script with settings hard-coded:

./go.py

The script returned the following output, parsed from the JSON response:

107.170.89.157 devstack 1790811

Then I did the following to install OpenStack:

$ echo "107.170.89.157 devstack" | sudo tee -a /etc/hosts
$ ssh root@devstack
# adduser stack
# echo "stack ALL=(ALL:ALL) ALL" >> /etc/sudoers
# apt-get install git
# su - stack
$ git clone https://github.com/openstack-dev/devstack.git
$ cd devstack && ./stack.sh
Output: lots of output from stack.sh...
Horizon is now available at http://107.170.89.157/
Keystone is serving at http://107.170.89.157:5000/v2.0/
Examples on using novaclient command line is in exercise.sh
The default users are: admin and demo
The password: NOTFORYOUREYESTOSEE
This is your host ip: 107.170.89.157
stack.sh completed in 794 seconds.

Here is the Python script I mentioned above. It’s not great, but maybe it’ll give you some idea of how to call an online API and parse the JSON results it returns.

#!/usr/bin/python

import requests, json, pprint, time, socket, sys

################
## make the json call to the public api
################
def jsonRequest(targetUrl):
    #set the headers for how we want the response
    headers = {'content-type': 'application/json','accept':'application/json'}

    # make the actual request
    r = requests.get(targetUrl, headers=headers, verify=False)

    #take the raw response text and deserialize it into a python object.
    try:
        responseObj = json.loads(r.text)
    except ValueError:
        # the body was not valid JSON: show it and re-raise, rather than
        # falling through and returning an undefined responseObj
        print("Exception")
        print(r.text)
        raise
    #print json.dumps(responseObj, sort_keys=False, indent=2)
    return responseObj

################
## return list of instances
################
def getDroplets(apiKey, clientId):
    #set the target url for the query
    targetUrl = "https://api.digitalocean.com/v1/droplets/?client_id=%s&api_key=%s" % (clientId, apiKey)

    # make the actual request
    resultList = jsonRequest(targetUrl)
    return resultList['droplets']

################
## Deploy a new instance
################
def deployDroplet(apiKey, clientId, hostName):
    sizeId = '66' #smallest; 62 = 2GB Ram
    imageId = '3240036' #ubuntu 64bit
    regionId = '4' #NY region
    sshKey1 = '153816' #brighs ssh key
    sshKey2 = ''
    privateNetworking = 'true' # create private network

    #set the target url for the query
    targetUrl = "https://api.digitalocean.com/v1/droplets/new?client_id=%s&api_key=%s&name=%s&size_id=%s&image_id=%s&region_id=%s&ssh_key_ids=%s,%s&private_networking=%s" % (clientId, apiKey, hostName, sizeId, imageId, regionId, sshKey1, sshKey2, privateNetworking)

    # make the actual request
    responseObj = jsonRequest(targetUrl)
    return responseObj['droplet']

################
## destroy an instance
################
def destroyDroplet(apiKey, clientId, dropletId):
    #set the target url for the query
    targetUrl = "https://api.digitalocean.com/v1/droplets/%s/destroy/?client_id=%s&api_key=%s" % (dropletId, clientId, apiKey)

    # make the actual request
    responseObj = jsonRequest(targetUrl)
    return responseObj

######################################

apiKey = "GOGETYOUROWNAPIKEY"
clientId = "GOGETYOUROWNCLIENTID"

droplets = getDroplets(apiKey, clientId)
for droplet in droplets:
    #for key in droplet:
    #    print key
    print("%s %s %s" % (droplet['ip_address'], droplet['name'], droplet['id']))
    #print "destroying droplet %s" % (droplet['name'])
    #destroyResult = destroyDroplet(apiKey, clientId, droplet['id'])
    #print destroyResult['status']

#droplet = deployDroplet(apiKey, clientId, 'devstack')
#print "%s %s" % (droplet['name'], droplet['id'])
#deployDroplet(apiKey, clientId, 'tester4')
#deployDroplet(apiKey, clientId, 'tester5')
#destroyDroplet(apiKey, clientId, '1786240')
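
Building those long query strings by hand with `%s` is easy to get wrong. As a hedged alternative, here is a small Python 3 sketch that lets the standard library do the encoding. The `build_url` helper and the credential values are hypothetical, the base URL is simply the one used in the script above, and nothing here actually contacts the API.

```python
from urllib.parse import urlencode  # Python 3; in Python 2 urlencode lives in urllib

API_BASE = "https://api.digitalocean.com/v1"  # base URL taken from the script above


def build_url(path, **params):
    """Join the base URL, a path, and URL-encoded query parameters."""
    return "%s/%s?%s" % (API_BASE, path.strip("/"), urlencode(params))


url = build_url(
    "droplets/new",
    client_id="MYCLIENTID",  # placeholder credentials, not real ones
    api_key="MYAPIKEY",
    name="dev stack",        # the space gets escaped for us
    region_id="4",
)
print(url)
```

The `requests` library can do the same job if you pass a dictionary as `params=` to `requests.get`, which is how the geocoding script later in this blog does it.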

EMC ViPR Leverages 3rd Party OpenStack Cinder Plugins

EMC ViPR 2.0 was just announced today at EMC World.  It brings a distinct level of maturity to a product that has only just been brought to market, but is already the most advanced storage controller of heterogeneous arrays.  One of the most interesting aspects of the announcement was the use of Cinder to give ViPR much wider support for heterogeneous arrays!

If you haven’t heard of ViPR, it’s not “storage virtualization” in the sense that you’re familiar with.  It’s a control-plane product like Virtual Center rather than a virtualization product like the hypervisor ESXi.  ViPR discovers storage, SAN, and hosts and controls the provisioning and connectivity of those assets together.  It virtualizes (in a sense) heterogeneous storage “southbound” and provides the common higher function API “northbound” to any framework product that wants an easy way to communicate with the Software Defined Storage layer.

Enough similar products and management frameworks have been created over the years for the SNIA storage standards organization to create an API standard called SMI-S.  EMC VMAX and VNX, for instance, have SMI-S providers that ViPR can use to manage provisioning on these arrays.  Other vendors do not so easily provide or so fully support the SMI-S standard, and as such ViPR can’t use a standards-based approach to managing them.  Thus the EMC ViPR team was left with creating plugins for each and every array to provide provisioning support, a daunting task indeed.

Enter the OpenStack community.  I’m sure as we were throwing our weight into the OpenStack community, some smart-gal-or-guy in our Advanced Software Division said, “Hang on a sec.  All these vendors are producing Cinder plugins for OpenStack.  Why can’t ViPR just leverage them?”  And where SMI-S couldn’t deliver, a viable open source community has created a widely adopted API for OpenStack.

Let’s make sure you understand this.  We’ve had Cinder API support for “Northbound” into OpenStack.   This provides OpenStack the ability to provision block storage against ViPR-as-a-virtual-storage-array.  This new announcement leverages every other vendor’s Cinder support to provide ViPR the ability to communicate “Southbound” into the array and manage it with ViPR-as-a-SDS-controller.  This is a VERY different animal and instantly provides ViPR a wide range of heterogeneous storage management and automation capabilities.
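
If the northbound/southbound distinction is still fuzzy, here is a toy sketch of the shape of it in Python. To be clear, this is NOT ViPR’s code or the Cinder driver API. Every class and method name below is hypothetical, invented only to show a controller that offers one call northbound and hands the work to whichever southbound driver is registered for the array.

```python
class ArrayDriver:
    """Southbound interface: one implementation per kind of storage array."""

    def create_volume(self, name, size_gb):
        raise NotImplementedError


class NativeDriver(ArrayDriver):
    """Stands in for a driver the controller vendor wrote itself."""

    def create_volume(self, name, size_gb):
        return "native:%s:%dGB" % (name, size_gb)


class BorrowedPluginDriver(ArrayDriver):
    """Stands in for a driver that wraps a plugin someone else already wrote."""

    def __init__(self, plugin_call):
        self.plugin_call = plugin_call

    def create_volume(self, name, size_gb):
        return self.plugin_call(name, size_gb)


class Controller:
    """Northbound interface: callers ask for storage, not for a specific driver."""

    def __init__(self):
        self.drivers = {}

    def register(self, array_name, driver):
        self.drivers[array_name] = driver

    def provision(self, array_name, name, size_gb):
        return self.drivers[array_name].create_volume(name, size_gb)


controller = Controller()
controller.register("array_a", NativeDriver())
# a lambda plays the part of a third-party plugin in this toy
controller.register("array_b", BorrowedPluginDriver(lambda n, s: "plugin:%s:%dGB" % (n, s)))

result_a = controller.provision("array_a", "vol1", 10)
result_b = controller.provision("array_b", "vol2", 20)
print(result_a, result_b)
```

The caller never knows or cares which kind of driver sits behind an array, which is why borrowing plugins that already exist widens array support so quickly.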

Added to this common industry storage support, the ViPR team has now announced plugin support (rich data services support) for Hitachi arrays, with more on the way.  Great stuff from the EMC ViPR team.

My first python and JSON code

I’m not a developer. There I said it.

I’m a presales technologist, architect, and consultant.

But I do have a BS in Computer Science. Way back there in my hind brain, there exists the ability to lay down some LOC.

I was inspired today by a peer of mine, Sean Cummins (@scummins), another Principal within EMC’s global presales organization. He posted an internal writeup of connecting to the VMAX storage array REST API to pull statistics and configuration information. Did you know there is a REST API for a Symmetrix?! Sad to say most people don’t.

He got me hankering to try something, so I plunged into Python for the first time, and as my first example project, I attached to Google’s public geocoding API to map street addresses to Lat/Lng coordinates. (Since I don’t have a VMAX in the basement)

So here it is. I think it’s a pretty good first project to learn a few new concepts. I’ll figure out the best way to parse a JSON package eventually. Anyone have any advice?

###################
# Example usage
###################
$ ./geocode.py
Enter your address:  2850 Premiere Parkway, Duluth, GA                 
2850 Premiere Parkway, Duluth, GA 30097, USA is at
lat: 34.002958
lng: -84.092877

###################
# First attempt at parsing Google's rest api
###################
#!/usr/bin/python

import requests          # module to make HTTP calls
import json          # module to parse JSON data

addr_str = raw_input("Enter your address:  ")

maps_url = "https://maps.googleapis.com/maps/api/geocode/json"
is_sensor = "false"      # do you have a GPS sensor?

payload = {'address': addr_str, 'sensor': is_sensor}

r = requests.get(maps_url,params=payload)

# store the json object output
maps_output = r.json()

# create a string in a human readable format of the JSON output for debugging
#maps_output_str = json.dumps(maps_output, sort_keys=True, indent=2)
#print(maps_output_str)

# once you know the format of the JSON dump, you can create some custom
# list + dictionary parsing logic to get at the data you need to process

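# pull the top-level fields out of the response dictionary
results_list = maps_output['results']
result_status = maps_output['status']

# stop early if the geocoder found nothing, so results_list[0] can't fail
if result_status != "OK" or not results_list:
    raise SystemExit("Geocoding failed with status: %s" % result_status)

formatted_address = results_list[0]['formatted_address']
result_geo_lat = results_list[0]['geometry']['location']['lat']
result_geo_lng = results_list[0]['geometry']['location']['lng']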

print("%s is at\nlat: %f\nlng: %f" % (formatted_address, result_geo_lat, result_geo_lng))
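
On the question of a better way to parse the JSON, one option is a small helper that walks the nested dictionaries and lists and returns a default when something is missing, instead of blowing up. This is only a sketch in Python 3. The `dig` helper is a hypothetical name of mine, and the sample below is hard-coded to mirror just the fields the script above reads, so it runs without calling Google.

```python
import json


def dig(obj, *path, default=None):
    """Walk nested dicts/lists by key or index; return default if any step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


# Hard-coded sample shaped like the fields the script above reads.
sample = json.loads("""
{"status": "OK",
 "results": [{"formatted_address": "2850 Premiere Parkway, Duluth, GA 30097, USA",
              "geometry": {"location": {"lat": 34.002958, "lng": -84.092877}}}]}
""")

lat = dig(sample, "results", 0, "geometry", "location", "lat")
missing = dig(sample, "results", 1, "geometry", "location", "lat", default="n/a")
empty = dig({"status": "ZERO_RESULTS", "results": []}, "results", 0, "formatted_address")
print(lat, missing, empty)
```

The trade-off is that a typo in a key name now quietly returns the default rather than raising, so this fits best where a missing field is expected and acceptable.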