<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Joe's Blog! &#187; Operations</title>
	<atom:link href="http://www.joeandmotorboat.com/category/operations/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.joeandmotorboat.com</link>
	<description></description>
	<lastBuildDate>Wed, 28 Jul 2010 03:00:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>SurgeCon 2010</title>
		<link>http://www.joeandmotorboat.com/2010/07/27/surgecon-2010/</link>
		<comments>http://www.joeandmotorboat.com/2010/07/27/surgecon-2010/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 03:00:34 +0000</pubDate>
		<dc:creator>joe</dc:creator>
				<category><![CDATA[Dev]]></category>
		<category><![CDATA[Operations]]></category>

		<guid isPermaLink="false">http://www.joeandmotorboat.com/?p=1038</guid>
		<description><![CDATA[If you haven&#8217;t heard about Surge, it&#8217;s a new web operations conference presented by the smart folks at OmniTI. They have amassed a good list of speakers including guys like John Allspaw and Theo Schlossnagle. I also happen to have been invited to talk about the cloud, Cloudant and all sorts of good stuff.]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone" title="surge" src="http://s.omniti.net/surge/i/present/logo-main.png" alt="" width="271" height="123" /></p>
<p>If you haven&#8217;t heard about <a href="http://omniti.com/surge/2010">Surge</a>, it&#8217;s a new web operations conference presented by the smart folks at OmniTI. They have amassed a good list of speakers including guys like John Allspaw and Theo Schlossnagle. I also happen to have been invited to talk about the cloud, <a href="https://cloudant.com/">Cloudant</a> and all sorts of good stuff. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.joeandmotorboat.com/2010/07/27/surgecon-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adding Health Checks to Deckard from Chef.</title>
		<link>http://www.joeandmotorboat.com/2010/07/19/adding-health-checks-to-deckard-from-chef/</link>
		<comments>http://www.joeandmotorboat.com/2010/07/19/adding-health-checks-to-deckard-from-chef/#comments</comments>
		<pubDate>Mon, 19 Jul 2010 20:52:09 +0000</pubDate>
		<dc:creator>joe</dc:creator>
				<category><![CDATA[Dev]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Operations]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.joeandmotorboat.com/?p=1029</guid>
		<description><![CDATA[Recently, we (at Cloudant) open sourced Deckard, an HTTP content-check monitoring system built on CouchDB. One of the best things about using Couch is that it gives you a REST API, and with Deckard that API can be used to add new health checks: a simple PUT adds a new URL to monitor. At Cloudant we [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, we (at <a href="https://cloudant.com/">Cloudant</a>) <a href="http://www.joeandmotorboat.com/2010/06/04/just-opensourced-gaff-and-deckard/">open sourced Deckard</a>, a HTTP content check monitoring system based on CouchDB. One of the best bits about using Couch is that it gives you a ReST API and with Deckard it can be used to add new health checks. Doing a simple PUT adds new URLs to monitor. At <a href="https://cloudant.com/">Cloudant</a> we love <a href="http://www.opscode.com/">Chef</a> and use it for everything. Chef has things called resources and providers. <a href="http://wiki.opscode.com/display/chef/Resources">Resources</a> are abstractions that describe the state you want a machine to be in. <a href="http://wiki.opscode.com/display/chef/Providers">Providers</a> perform the actions described by a resource. A good example is using the <a href="http://wiki.opscode.com/display/chef/Resources#Resources-Package">package</a> resource on Centos uses yum while on Ubuntu it uses apt-get. The resource abstracts that away, letting the provider (and node) deal with the specifics on how to install the package. This makes your recipes nice and DRY, use the same code to install packages on all sorts of platforms. There are resources and providers for anything from installing packages to even one I wrote for executing Erlang code via erl_call. One resource that works well with Deckard is the <a href="http://wiki.opscode.com/display/chef/Resources#Resources-HTTPRequest">HTTP request resource</a>, using it makes it very easy to add health checks from your cookbooks. We use something like the following code to add checks to new nodes at Cloudant:</p>
<p><script src="http://gist.github.com/481962.js"> </script></p>
<p>This code adds the document describing the check to the monitor_content_check database and then creates a file, so that we can use &#8220;not_if&#8221; and Chef won&#8217;t attempt to add the check twice. Pretty cool stuff, and even more reason why everything should have an API. Even cooler than this example would be using Chef Search to do the same thing, but I&#8217;ll save that for another blog post.</p>
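<p>The embedded gist uses Chef resources. As a rough, hypothetical sketch of the same logic outside Chef, the plain Ruby below PUTs a check document into the monitor_content_check database and uses a marker file as the guard, the role &#8220;not_if&#8221; plays in the recipe. The CouchDB URL (localhost:5984), the document fields and the helper names are assumptions for illustration, not Deckard&#8217;s actual schema or the code in the gist.</p>

```ruby
require 'json'
require 'net/http'
require 'uri'
require 'fileutils'

# Assumed CouchDB endpoint; the database name comes from the post.
COUCH_URL = 'http://localhost:5984'
CHECK_DB  = 'monitor_content_check'

# Build a check document. These field names are illustrative assumptions,
# not Deckard's documented schema.
def check_document(url, expected_content)
  { 'url' => url, 'content' => expected_content }
end

# PUT /db/docid creates a CouchDB document with that id.
# Returns true on 2xx, and also on 409 Conflict (a document with this id
# already exists), so an already-registered check counts as done.
def couch_put(doc_id, doc)
  uri = URI("#{COUCH_URL}/#{CHECK_DB}/#{URI.encode_www_form_component(doc_id)}")
  req = Net::HTTP::Put.new(uri)
  req['Content-Type'] = 'application/json'
  req.body = JSON.generate(doc)
  res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
  res.is_a?(Net::HTTPSuccess) || res.is_a?(Net::HTTPConflict)
end

# Same idea as the recipe's not_if guard: skip the PUT when the marker file
# exists, and only create the marker after the PUT succeeds.
def add_check_once(doc_id, doc, marker, put: method(:couch_put))
  return :skipped if File.exist?(marker)
  return :failed unless put.call(doc_id, doc)
  FileUtils.mkdir_p(File.dirname(marker))
  FileUtils.touch(marker)
  :added
end
```

<p>Passing the PUT in as a callable keeps the guard testable without a running CouchDB, and creating the marker only after a successful PUT means a failed run is retried on the next converge instead of being silently skipped.</p>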
]]></content:encoded>
			<wfw:commentRss>http://www.joeandmotorboat.com/2010/07/19/adding-health-checks-to-deckard-from-chef/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Availability, the Cloud and Everything</title>
		<link>http://www.joeandmotorboat.com/2010/05/31/availability-the-cloud-and-everything/</link>
		<comments>http://www.joeandmotorboat.com/2010/05/31/availability-the-cloud-and-everything/#comments</comments>
		<pubDate>Sat, 01 May 2010 17:37:42 +0000</pubDate>
		<dc:creator>joe</dc:creator>
				<category><![CDATA[Dev]]></category>
		<category><![CDATA[Operations]]></category>

		<guid isPermaLink="false">http://www.joeandmotorboat.com/?p=1016</guid>
		<description><![CDATA[I&#8217;ve finally posted the presentation I gave at Erlang Factory, the WTIA Cloud SIG and the Seattle Scalability Meetup here on the blog.]]></description>
			<content:encoded><![CDATA[<p>Finally posted my presentation at <a href="http://erlang-factory.com/conference/SFBay2010">Erlang Factory</a>, <a href="http://www.washingtontechnology.org/">WTIA Cloud SIG</a> and <a href="http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/">Seattle Scalability Meetup</a> here on the blog.</p>
<div style="width:425px" id="__ss_3567217"><strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/logicalstack/availability-the-cloud-and-everything" title="Availability, the Cloud and Everything">Availability, the Cloud and Everything</a></strong><object id="__sse3567217" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=availability-100326174512-phpapp02&#038;stripped_title=availability-the-cloud-and-everything" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse3567217" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=availability-100326174512-phpapp02&#038;stripped_title=availability-the-cloud-and-everything" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="padding:5px 0 12px">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/logicalstack">logicalstack</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.joeandmotorboat.com/2010/05/31/availability-the-cloud-and-everything/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Beyond BigData.</title>
		<link>http://www.joeandmotorboat.com/2010/05/31/beyond-bigdata/</link>
		<comments>http://www.joeandmotorboat.com/2010/05/31/beyond-bigdata/#comments</comments>
		<pubDate>Mon, 31 May 2010 16:54:23 +0000</pubDate>
		<dc:creator>joe</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Dev]]></category>
		<category><![CDATA[Operations]]></category>

		<guid isPermaLink="false">http://www.joeandmotorboat.com/?p=979</guid>
		<description><![CDATA[BigData is a big deal. It&#8217;s changing how we look at data and analytics, but it isn&#8217;t the end. What are the enablers of BigData? First and foremost, cheap computing resources (CPU, disks, memory, bandwidth, etc.), thanks to Moore&#8217;s Law. Today even startups can afford huge amounts of computing power, the [...]]]></description>
			<content:encoded><![CDATA[<p>BigData is a big deal. It&#8217;s changing how we look at data and analytics, but it isn&#8217;t the end. What are the enablers of BigData? First and foremost, cheap computing resources (CPU, disks, memory, bandwidth, etc) all thanks to <a href="http://en.wikipedia.org/wiki/Moore's_law">Moore&#8217;s Law</a>. Today even startups have the ability to afford huge amounts of computing power, the likes previously only the big boys could afford. Additionally, this has given rise to commodity hardware and cloud computing, which only furthers the proliferation of large amounts cheap, quickly-provisioned, computing resources. Second, to apply all that power, we have open source data processing systems based on years of distributed systems research, like <a href="http://hadoop.apache.org/">Hadoop</a>, and many incarnations of <a href="http://en.wikipedia.org/wiki/Nosql">NoSQL</a>. The development of open source data processing sytems has allowed proliferation of systems that scale, which only the highly capitalized could afford, until recently. These two things alone have allowed for the democratization of BigData. A guy in a garage can process terabytes of data with little more than a credit card and elbow grease.</p>
<p>With all these tools and recently acquired computing power, where are we going? Of course we can expect datasets to continue to grow, the computational complexity of our data processing to increase, and compute power to keep rising (GPGPUs, multicore and so on). In addition, I anticipate the emergence of something I&#8217;m calling <em>NewData</em>. NewData will build on what we currently have with BigData, but will include some trends that are just beginning to take off. First, the development of ubiquitous public APIs (<a href="http://stochasticresonance.wordpress.com/2009/04/01/meatcloud-manifesto/">Meatcloud Manifesto</a>). Public APIs have yet to reach all online systems, and as a consequence there is still a lot of screen scraping going on. Easily queryable and parseable datasets, available through ubiquitous APIs, make it easier for machines to consume the internet, which makes applying BigData more powerful. <a href="http://developer.netflix.com/">Netflix</a> is a good example of this. Second, and similarly enabling, will be the development of standardized public datasets. Current datasets are generally hard to find and use; standardized dataset formats will make BigData analysis more productive and cut the time wasted on munging. <a href="http://www.data.gov/">Data.gov</a> is a start. These two developments have yet to be fully realized in current systems, but they will allow for the rise of NewData. As they roll out, we will begin to see changes in how our BigData systems look. NewData systems will be less concerned with how big the data is and what it looks like, and will instead emphasize deriving more information from the data. <a href="http://techcrunch.com/2010/03/16/big-data-freedom/">Bradford Cross gets this</a>, and as a result <a href="http://flightcaster.com/">FlightCaster</a> is an early example of what I mean by <em>NewData</em>.</p>
<blockquote><p>The scale of data and computations is an important issue, but the data age is less about the raw size of your data, and more about the cool stuff you can do with it.</p></blockquote>
<p>Asking the right questions of the data is important, especially if you&#8217;re trying to do cool stuff. The <a href="http://freakonomics.blogs.nytimes.com/">Freakonomics</a> guys proved this a few times over. NewData will be about creating value from data, and asking the right questions is worth as much as the answers. The key enabler will be using newfound APIs and datasets to combine data from disparate sources in ways that BigData couldn&#8217;t, asking questions that we wouldn&#8217;t have thought to ask of BigData. Where BigData was about a handful of datasets at most, NewData will be about dozens. The mashup is the cornerstone of NewData.</p>
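<p>To make combining data from disparate sources a bit more concrete, here is a minimal Ruby sketch of a mashup as a hash join on a shared key. The flight and weather rows are made-up sample data, and the joined question (delay rate per weather condition) is only an illustration, not a description of how FlightCaster or anyone else actually works.</p>

```ruby
# Two hypothetical datasets from different sources, sharing an airport code.
FLIGHTS = [
  { airport: 'SEA', delayed: 12, total: 100 },
  { airport: 'SFO', delayed: 30, total: 120 }
]
WEATHER = [
  { airport: 'SEA', condition: 'rain' },
  { airport: 'SFO', condition: 'fog' },
  { airport: 'JFK', condition: 'snow' }
]

# Hash join: index the right-hand dataset by key, then merge every matching
# row into each left-hand row. Unmatched rows are dropped (an inner join).
def inner_join(left, right, key)
  index = right.group_by { |row| row[key] }
  left.flat_map { |l| index.fetch(l[key], []).map { |r| l.merge(r) } }
end

# A question neither dataset answers alone: delay rate per weather condition.
def delay_rate_by_condition(rows)
  rows.group_by { |r| r[:condition] }.transform_values do |rs|
    rs.sum { |r| r[:delayed] }.fdiv(rs.sum { |r| r[:total] })
  end
end
```

<p>With the sample rows, JFK has weather but no flights, so the inner join drops it; joining and then grouping gives a delay rate of 0.12 for rain and 0.25 for fog.</p>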
<p>That being said, we will need new systems to process this data and enable us to ask these questions. NewData analysis will need inter-process communication and collaboration. Currently, systems like Hadoop process data by splitting it up and processing the chunks in parallel on hundreds to thousands of machines, with each process isolated from the others. This will continue, but to ask deeper questions NewData will also require complex inter-process communication. Think of combining the simplicity of writing Map/Reduce jobs, the robustness of Hadoop, the workflow and dataflow of <a href="http://www.cascading.org/">Cascading</a> and <a href="http://research.microsoft.com/en-us/projects/dryadlinq/">DryadLINQ</a>, respectively, and the power of a message passing system like <a href="http://en.wikipedia.org/wiki/Message_Passing_Interface">MPI</a>. These jobs will likely include large in-memory collaborative computations across thousands of machines. Where data locality was key in BigData, both data locality and memory locality (<a href="http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access">NUMA/ccNUMA</a>) will be important in NewData.</p>
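<p>The isolated model described above can be sketched as a toy word count in plain Ruby. This is not Hadoop&#8217;s API: threads stand in for machines, and each mapper sees only its own chunk. The only communication is the reducer collecting results at the end, which is the limitation the paragraph above argues NewData will need to move beyond with MPI-style messaging between running tasks.</p>

```ruby
# Map step: count words in one chunk. It reads only its own input and shares
# no state with other mappers, so each call could run on a separate machine.
def map_chunk(lines)
  counts = Hash.new(0)
  lines.each { |line| line.split.each { |word| counts[word.downcase] += 1 } }
  counts
end

# Reduce step: merge the partial counts produced by the independent mappers.
def reduce_counts(partials)
  partials.each_with_object(Hash.new(0)) do |partial, total|
    partial.each { |word, n| total[word] += n }
  end
end

# Split the input into chunks and run one mapper per chunk in its own thread.
# The threads never talk to each other; the reducer only sees final results.
def word_count(lines, chunks: 4)
  raise ArgumentError, 'chunks must be positive' unless chunks.positive?
  size = [(lines.size.to_f / chunks).ceil, 1].max
  workers = lines.each_slice(size).map { |chunk| Thread.new { map_chunk(chunk) } }
  reduce_counts(workers.map(&:value))
end
```

<p>Because no mapper depends on another, the result is the same however the input is chunked; the trade-off is that no mapper can ask a question that needs another chunk&#8217;s data mid-computation.</p>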
<p>It is clear that BigData still has some runway before NewData takes over. However, if the democratization of compute and processing continues (beyond Hadoop and EC2), and APIs and datasets keep opening up online and off, NewData and its new questions, mashups and systems are inevitable. Where readily available compute resources and the software to use them defined BigData, NewData will be defined solely by asking the right questions, the algorithms to derive answers, and the systems used to produce them.</p>
<p><em>Thanks to <a href="http://twitter.com/mlmilleratmit">Mike Miller</a>, <a href="http://twitter.com/lusciouspear">Bradford Stephens</a> and my awesome wife <a href="http://twitter.com/xprimerw">Erin</a> for the help on this article.</em></p>
<p><strong><em>Follow me on <a href="http://twitter.com/williamsjoe">twitter</a>.<br />
</em></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.joeandmotorboat.com/2010/05/31/beyond-bigdata/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Biodynamic Agriculture Applied to Datacenters.</title>
		<link>http://www.joeandmotorboat.com/2009/12/15/biodynamic-agriculture-applied-to-datacenters/</link>
		<comments>http://www.joeandmotorboat.com/2009/12/15/biodynamic-agriculture-applied-to-datacenters/#comments</comments>
		<pubDate>Wed, 16 Dec 2009 06:51:36 +0000</pubDate>
		<dc:creator>joe</dc:creator>
				<category><![CDATA[Operations]]></category>

		<guid isPermaLink="false">http://www.joeandmotorboat.com/?p=952</guid>
		<description><![CDATA[While listening to the Green HPC podcast, I had the thought that biodynamic agriculture could be applied to managing datacenters. Now, I might be off my rocker, but I think it could be a worthwhile way to think about datacenters, hopefully without getting too hippy-ish. From Wikipedia: Biodynamic agriculture is a method of organic farming [...]]]></description>
			<content:encoded><![CDATA[<p>While listening to the <a href="http://insidehpc.com/2009/12/15/episode-5-of-the-green-hpc-podcast-series-turning-up-the-heat/">Green HPC podcast</a> I had the thought that <a href="http://en.wikipedia.org/wiki/Biodynamic_agriculture">biodynamic agriculture</a> could be applied to managing datacenters. Now I might be off my rocker but I think it might be a worthwhile way to think about it, hopefully without getting too hippy-ish.</p>
<p>From wikipedia:</p>
<blockquote><p>Biodynamic agriculture is a method of organic farming with homeopathic composts that treats farms as unified and individual organisms, emphasizing balancing the holistic development and interrelationship of the soil, plants, animals as a self-nourishing system without external inputs insofar as this is possible given the loss of nutrients due to the export of food.</p></blockquote>
<p>To me this totally has an analog in datacenters, server farms (pun intended) and machine rooms. To paraphrase the Wikipedia quote above:</p>
<blockquote><p>An <em><strong>electrodynamic</strong></em> datacenter is one that is treated as a unified and individual organism. That is, each datacenter is an autonomous entity and needs to be thought of as an organism in which all the components (CRACs, servers, network, power, etc.) are balanced and interrelated without external inputs, insofar as this is possible given the loss of capacity (bandwidth, compute, storage, etc.) due to the export of data, compute or other resources.</p></blockquote>
<p>Put like that, it seems pretty reasonable and leans toward making datacenters as efficient as possible. The goal is to reduce external inputs (power, bandwidth, etc.) while still getting the desired amount of output. Practices such as running datacenters hot, optimizing for data locality, or shutting down part (or all) of a datacenter while it is not needed would be commonplace. This would require tight monitoring, analysis, controls and automation on inputs and outputs. It also means developing a quantitative relationship between consumption/utilization and production, i.e. how much input is required for X amount of output. It is certainly an interesting problem to solve and system to build, although I imagine the Googles of the world have already implemented some level of this. While datacenters will likely never be self-sustaining, in the end this may be a reasonable way to think about datacenter controls and management, especially as we all try to go green for monetary and environmental reasons.</p>
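<p>One well-established way to start quantifying input versus output is Power Usage Effectiveness (PUE), defined by The Green Grid as total facility energy divided by the energy delivered to IT equipment. The Ruby sketch below computes PUE, energy per unit of work, and how many servers a given load needs, so that the rest could be powered down. The unit-of-work notion, the 25% default headroom and the sample numbers are assumptions, not measurements.</p>

```ruby
# Power Usage Effectiveness, The Green Grid's standard metric:
#   PUE = total facility energy / IT equipment energy (1.0 is the ideal)
def pue(facility_kwh, it_kwh)
  raise ArgumentError, 'IT energy must be positive' unless it_kwh.positive?
  facility_kwh.fdiv(it_kwh)
end

# Input per unit of output. What counts as a "unit of work" (requests served,
# jobs completed, ...) is an assumption to be chosen per datacenter.
def energy_per_unit(facility_kwh, units_of_work)
  raise ArgumentError, 'work must be positive' unless units_of_work.positive?
  facility_kwh.fdiv(units_of_work)
end

# Servers that must stay powered on for a given load plus a safety margin
# (assumed 25% by default); anything beyond this is a candidate for
# shutting down while idle.
def servers_needed(load, capacity_per_server, headroom: 0.25)
  raise ArgumentError, 'capacity must be positive' unless capacity_per_server.positive?
  (load * (1 + headroom)).fdiv(capacity_per_server).ceil
end
```

<p>For example, with hypothetical numbers, a facility that draws 1,500 kWh to deliver 1,000 kWh to its IT gear has a PUE of 1.5, and a load of 4,500 requests/s on servers that each handle 1,000 requests/s needs 6 servers once 25% headroom is added.</p>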
]]></content:encoded>
			<wfw:commentRss>http://www.joeandmotorboat.com/2009/12/15/biodynamic-agriculture-applied-to-datacenters/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
