Thursday May 21, 2009
I'm not generally passionate about network and system operations, I prefer to focus my attention and creativity on system and software architectures. However, infrastructure provisioning, application deployment, monitoring and maintenance are facts of life for online services. When those basic functions aren't functioning well, then I get passionate about them. When service continuity is impacted and operations staff are overworked, it really bothers me; it tells me that I or other developers I'm working with are doing a poor job of delivering resilient software. I've had many conversations with folks who've accepted as a given that development teams and operations teams have friction between them; some even suggest that they should. After all, so goes that line of thinking, the developers are graded on how rapidly they implement features and fix bugs whereas the operators are graded on service availability and performance. Well, you can sell that all you want but I won't buy it.
In my view, developers need to deliver software that can be operated smoothly and operators need to provide feedback on how smoothly the software is operating; dev and ops must collaborate. I accept as a given that developers
So I've been giving Chef a test-drive for this infrastructure-on-EC2 management project that's been cooking. The system implemented the following use cases:
There's a lot of energy in the Chef community (check out Casserole); combined with monitoring, log management and cloud technologies, I think there's a lot of IT streamlining ahead. Perhaps the old days of labor- and communication-intensive operations will give way to a new era of autonomic computing. I'll post further about some of the mechanics of working with Ruby, Rails, Chef, EC2, chef-deploy and other tools in the weeks ahead (particularly now that EC2 has native load balancing, monitoring and auto-scaling capabilities). I'll also talk a bit about this stuff at a Velocity BoF. If you're thinking about attending Velocity, O'Reilly is offering 30% off to the first 30 people to register today with the code vel09d30 (no, I'm not getting any kinduva kickback from O'Reilly). And you can catch Infrastructure in the Cloud Era with Adam Jacob (Opscode) and Ezra Zygmuntowicz (EngineYard) to learn more about Chef and cloud management.
chef puppet cfengine ec2 aws ruby cloud computing velocity
( May 21 2009, 12:30:07 PM PDT ) Permalink
Tuesday March 31, 2009
I bought a copy of VMware Fusion on special from Smith Micro (icing on the cake: they had a 40% off special that week) specifically so I could simulate a network of machines on my local MacBook Pro. While I've heard good things about VirtualBox, one of the other key capabilities I was looking for in Macintosh virtualization software was the ability to convert an existing Windows installation to a virtual machine. VMware reportedly has the best tools for that kind of thing. I have an aging Dell with an old XP installation that I'd like to preserve when I finally decide to get rid of the hardware; when it's time to Macify, I'll be good to go.
I started building my virtual network very simply, by creating a CentOS VM. Once I had my first VM running, I figured I could just grow the network from there; I was expecting to find a "clone" item in the Fusion menus but alas, no joy. So, it's time to hack. The artifacts that Fusion created are a bunch of files in a directory named for the VM. I started off by copying the directory, renaming the files it contained that had the virtual machine name as a component of the file name, and editing the metadata files ({vm name}.vmdk/.vmx/.vmxf). When I told Fusion to launch that machine, it asked whether this was a copied or a moved VM; I told it that it was copied and the launch continued. Both launched VMs could ping each other, so voila: my virtual network came into existence.
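For anyone wanting to script that copy-and-rename hack, here's a minimal Python sketch. It assumes a powered-off Fusion VM directory whose file names begin with the VM name and whose .vmx/.vmxf files are plain text; it also rewrites small text-format .vmdk descriptor files (those beginning with `# Disk DescriptorFile`) so they point at the renamed split-disk extents, and leaves binary disk data alone. The function names, the example names `centos`/`worker1`, and the renaming rule (only rewrite the name where it starts a VMware file name, plus the `displayName` line, so a value like `guestOS = "centos-64"` isn't clobbered) are my own assumptions, not anything VMware documents. Keep a backup.

```python
import re
import shutil
from pathlib import Path

# A text-format .vmdk (disk descriptor) starts with this marker; binary
# monolithic disks start with other magic bytes and are never rewritten.
VMDK_DESCRIPTOR_MARKER = b"# Disk DescriptorFile"
# Extensions whose files are small text metadata that may name the VM.
TEXT_SUFFIXES = {".vmx", ".vmxf"}


def is_text_metadata(path: Path) -> bool:
    """True for .vmx/.vmxf files and text .vmdk descriptors."""
    if path.suffix in TEXT_SUFFIXES:
        return True
    if path.suffix == ".vmdk":
        with path.open("rb") as f:
            return f.read(len(VMDK_DESCRIPTOR_MARKER)) == VMDK_DESCRIPTOR_MARKER
    return False


def rename_refs(text: str, old_name: str, new_name: str) -> str:
    """Swap old_name for new_name only where it begins a VMware file name,
    e.g. "centos.vmdk" or "centos-s001.vmdk", but not guestOS = "centos-64"."""
    pattern = re.compile(
        r"(?<![\w-])" + re.escape(old_name)
        + r"(?=[\w-]*\.(?:vmdk|vmxf|vmx|vmsd|nvram)\b)"
    )
    return pattern.sub(lambda _m: new_name, text)


def clone_vm(src_dir: Path, dst_dir: Path, old_name: str, new_name: str) -> None:
    """Copy a powered-off VM directory and point the copy at new_name."""
    shutil.copytree(src_dir, dst_dir)  # dst_dir must not exist yet
    display_name = re.compile(r'^(displayName\s*=\s*")[^"]*(")', re.MULTILINE)
    for path in sorted(dst_dir.iterdir()):
        if not path.is_file():
            continue
        # Rewrite references inside text metadata before renaming the file.
        if is_text_metadata(path):
            text = rename_refs(path.read_text(encoding="utf-8"), old_name, new_name)
            text = display_name.sub(lambda m: m.group(1) + new_name + m.group(2), text)
            path.write_text(text, encoding="utf-8")
        # Rename files such as centos.vmx -> worker1.vmx.
        new_file_name = rename_refs(path.name, old_name, new_name)
        if new_file_name != path.name:
            path.rename(path.with_name(new_file_name))
```

Usage would be something like `clone_vm(Path("centos.vmwarevm"), Path("worker1.vmwarevm"), "centos", "worker1")`, followed by answering Fusion's copied-or-moved prompt with "copied" as described above.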
I've since found another procedure to create "linked clones" in VMware Fusion. It looks like this will be really useful for my next scenario of having two different flavors of VMs running on my virtual network. The setup I want to get to is one where I can have a "manager" host (to run provisioning, monitoring and other management applications) and cookie-cutter "worker" hosts (webservers, databases, etc.). Ultimately, this setup will help me tool up for cloud platform operations; I have more Evil Plans there.
So all of this has me wondering: why doesn't VMware support this natively? Where's that menu option I was looking for? Is there an alternative to this hackery that I just overlooked?
vmware virtualization centos vmware fusion vm cloning
( Mar 31 2009, 09:12:22 AM PDT ) Permalink
Thursday March 12, 2009
Those "25 things you should know about me" memes circulating rarely interest me (honestly, I don't care that you have a collection of rare El Salvadoran currency). However, one thing that my friends know but regular readers may not is that I have a fairly eclectic background. Did you know that I used to hang around the art department's hot glass studio in college to blow glass? Did you know that I learned to program in Pascal when I was in college and hated it? Yeah, yeah, I don't care much anymore either. But anyway, back in the 80's I was friends with this Danish dude from LA who shared my interest in the underground heavy metal scene that was burgeoning, particularly in Britain ("New Wave of British Heavy Metal" AKA NWOBHM) and Europe. We used to trade records and demos (the first Def Leppard three-song EP on 9" vinyl: I was tired of it, so I traded it to him for a bunch of Tygers of Pan Tang and other crap I didn't own already). I think he, like myself, used to pick up copies of Melody Maker and Sounds at the local record store to read about what was going on overseas. Eventually, Kerrang! came out, providing fuller coverage of the metal scene, complete with glossy pictures. But in the meantime, back in San Francisco, I helped a friend of mine (Ron Quintana) operate his fanzine Metal Mania (don't be confused: the name was re-appropriated by various larger publishing concerns at different times in the years since, but none of them had any relationship to the original gangstas).
Back in the day, Howie Klein was a muckety muck in the music industry, haunting the local clubs like The Old Waldorf and Mabuhay Gardens. Howie hooked us up with a show on KUSF. I dubbed the show Rampage Radio, it ran in the wee hours every Saturday night (right after Big Rick Stuart finished up his late night reggae show with those dudes from Green Apple Records on Clement Street). In between hurling insults at "album oriented rock" and big-hair metal bands (posers!), we played a lot of stuff you couldn't hear anywhere else. Among the many obscure noises we aired were demos from East Bay metalheads Exodus. Amazingly, Rampage Radio is still on the air. Well, that Danish kid and one of the guys I befriended from Exodus were Lars Ulrich and Kirk Hammett, respectively. In short order, they would be playing together in a band Lars named Metallica (after haggling with Ron about not taking that name for the 'zine).
I eventually lost interest in the metal scene (not enough innovation, too many sound-alike derivatives to keep me listening); even though the music from then is still on my playlist, my repertoire has broadened widely (talk to me about gypsy-style string jazz, please). I've been peripherally in touch with friends from back then. Over the years, I've gone to a few Metallica shows, but the guys are always mobbed at the backstage parties, so there's not much of an opportunity to actually talk about anything. Anyway, we have little in common now. I develop software and crazy-assed online services; they tour the world to perform in front of throngs. And I don't drink Jaegermeister anymore. In 2000, I introduced one of the friends I've stayed in touch with, Brian Lew, who also had a fanzine Back In The Day, to editors at salon.com (where I was working at the time). He contributed a great article expressing a sentiment that I shared: dismay at Metallica's war on Napster. I don't think I've actually talked to Lars in 15 years. After seeing news coverage of him ranting about how people (his most valued asset: his fans) were ripping him off, I'm not sure I wanted to. But I think we're all over that now; let's just play Rock Band and fuhgedaboutit.
So here we are decades later and Metallica hasn't just warped the music industry, they are the industry. They're up there with Elvis and the Beatles and all of that (except, barring Cliff Burton, they're not dead). Last week, Brian pinged me that Q-Prime (Metallica's management company in New York) was trying to reach me. After a few phone calls, it turns out that Metallica is honoring a handful of us old-schoolers by inviting us to a big shindig in Cleveland for their Rock and Roll Hall of Fame induction next month. How cool is that?! I'm still kind of blown away that this is really happening (am I being punk'd??).
So, I may be leaving Technorati but I'm going to the Rock and Roll Hall of Fame! w00t! That tune keeps humming through my consciousness, "...living in sin with a safety pin, Cleveland rocks! Cleveland rocks!" but the way it sounds in my head, it's cranked up, roaring from a massive PA and a wall of Marshall stacks. So now you know what my plans will be in a few weeks, and you've learned a dozen or so things about me (if not 25) that you may not have known before.
metallica rock and roll hall of fame metal kusf cleveland nwobhm
( Mar 12 2009, 12:22:51 PM PDT ) Permalink
Tuesday March 10, 2009
My post last week focused on some of the technology changes that I've been spearheading at Technorati, but this time I have a personal change to discuss. When I joined Technorati in 2004, the old world of the web was in shambles. The 1990's banner-ads-on-a-CPM-basis businesses had collapsed. The editorial teams using big workflow-oriented content management system (CMS) infrastructure (which I worked on in the 90's) were increasingly eclipsed by the ecosystem of blogs. Web 2.0 wasn't yet the word on everyone's lips. But five years ago, Dave Sifry's infectious vision for providing "connective tissue" for the blog ecosystem, tapping the attention signals and creating an emergent distributed meta-CMS, helped put it there. Being of service to bloggers just sounded too good, so I jumped aboard.
Through many iterations of blogospheric expansion, building data flow, search and discovery applications, dealing with data center outages (and migrations) and other adventures, it's been a long strange trip. I've made a lot of fantastic friends, contributed a lot of insight and determination and learned a great deal along the way. I am incredibly proud of what we've built over the last five years. However, today it's time for me to move on; my last day at Technorati will be next week.
Technorati has a lot of great people, technology and possibilities. The aforementioned crawler rollout gives the technology platform a better foundation that I'm sure Dorion and the rest of the team will build great things on. The ad platform will create an abundance of valuable opportunities for bloggers and other social media publishers. I know from past experiences what a successful media business looks like and under Richard Jalichandra's leadership, I see all of the right things happening. The ad platform will combine Technorati's social media data assets with publisher and advertiser tools to make Technorati an ad delivery powerhouse. I'm going to remain a friend of the company and do what I can to help its continued success, but I will be doing so from elsewhere.
I want to take a moment to thank all of my colleagues, past and present, who have worked with me to get Technorati this far. The brainstorms, the hard work, the arguments and the epiphanies have been tremendously valuable to me. Thank You!
I'm not sure what's next for me. I feel strongly that the changes afoot in cloud infrastructure, open source data analytics, real time data stream technologies, location based services (specifically, GPS ubiquity) and improved mobile devices are going to build on Web 2.0. These social and technology shifts will provide primordial goo out of which new innovations will spring. And I intend to build some of them, so brace yourself for Web 3.0. It's times like these when the economy is athrash that the best opportunities emerge and running for cover isn't my style. The next few years will see incumbent players in inefficient markets crumble and more powerful paradigms take their place. I'm bringing my hammer.
( Mar 10 2009, 02:06:20 PM PDT ) Permalink
Wednesday March 04, 2009
Since its inception just 6 weeks ago, the White House Blog has had a tremendous center of gravity. I noted the volume of links coming in to the White House Blog after the first week. This is an existential moment for the blogosphere because today the White House Blog has 3216 links from 2323 blogs. And so it's official: the White House Blog has reached the Technorati Top 100.
I find myself reflecting on what the top 100 looked like four years ago, after the prior presidential inauguration, and what it looks like today; the blogosphere is a very different place. Further down memory lane, who recalls when Dave Winer and Instapundit were among the top blogs? Yep, most of the small publishers have been displaced by those with big businesses behind them. Well, at least BoingBoing endures but Huffpo and Gizmo better watch out, here comes Prezbo.
technorati white house inauguration blog
( Mar 04 2009, 10:59:16 PM PST ) Permalink
A lot of changes are afoot at Technorati. Over the last year or so, we've been looking inward at the infrastructure and asking ourselves, "How can we do this better?" The data spigot that Technorati builds on was the first thing to focus on; it's a critical part of one leg of the back-end infrastructure tripod. The tripod's legs are data acquisition, search and analytics. Within data acquisition, ping handling and queuing are relatively simple affairs, while the crawler is the most sophisticated subsystem. Its proper functioning is critical to the functioning of the other legs; when it doesn't function well, search and analytics don't either (GIGO = "garbage in, garbage out").
As Dorion mentioned recently, we're retiring the old crawler. Why are we giving the old crawler an engraved watch and showing it the door? Well, old age is one reason. The original spider is a technology that dates back to 2003; the blogosphere has changed a lot since then and we have a much better developed understanding of the requirements. The original spider code has presented enough GIGO-related and code maintenance challenges to warrant a complete rethinking. It contrasts starkly with the replacement.
Another change that we've made is to the legacy assumption that everything that pings is a blog. That assumption proved increasingly untenable as the ping meme spread among those who didn't really understand the difference between some random page and a blog, nefarious publishers (spammers) and other perpetrators of spings (spam pings). Over 90% of the pings hitting Technorati are rejected outright because they've been identified as invalid pings. A large portion of the remainder is later determined to be invalid, but we now have a rigorous system in place for filtering out the noise. We've reduced the spam level considerably (as mentioned in a prior post). For instance, there's a whole genre of pornography-focused splogs (hardcore pictures, paid affiliate links, etc.) that previously plagued our data; now we've eliminated a lot of that nonsense from the index.
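The post doesn't say what makes a ping invalid, so the following is only a hypothetical Python sketch of the kind of cheap first-pass screen that can reject pings outright before anything gets crawled. The three rules (an absolute http(s) URL, a host that isn't on a known-splog blocklist, and no repeat ping for the same URL within an assumed five-minute window) and the `PingFilter` name are illustrative assumptions, not Technorati's actual criteria; the later, content-based determination mentioned above isn't shown.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class PingFilter:
    """Hypothetical first-pass ping screen; every rule here is illustrative."""

    blocked_hosts: set = field(default_factory=set)  # lowercase host names
    min_interval: float = 300.0  # assumed seconds between accepted pings per URL
    _last_accepted: dict = field(default_factory=dict, init=False, repr=False)

    def accept(self, url: str, now: float) -> bool:
        """Return True if a ping for `url` at time `now` should be queued."""
        url = url.strip()
        parts = urlsplit(url)
        # Rule 1: only absolute http(s) URLs with a host name are plausible blogs.
        if parts.scheme not in ("http", "https") or not parts.hostname:
            return False
        # Rule 2: drop pings from hosts already identified as splogs.
        if parts.hostname in self.blocked_hosts:  # .hostname is already lowercase
            return False
        # Rule 3: throttle repeat pings for the same URL.
        last = self._last_accepted.get(url)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_accepted[url] = now
        return True
```

Keeping these checks stateless apart from a small per-URL timestamp table means they stay cheap enough to run on every incoming ping, which is the point of rejecting most traffic up front.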
Here are a pair of charts showing the daily occurrence of a particular porn term in the index.
As you can see, that's an order of magnitude reduction; 90% of the occurrences of that term were spam.
So what's next for the crawler? We've got some stragglers on the old spider; we're going to migrate them over in the next few days. There are still a lot of issues to shake out, as with any new software (for instance, there are still some error recovery scenarios to deal with). But it's getting better all of the time (love that song). We'll be rolling out new tools internally for identifying where improvements are needed; ultimately, we'd like to enable bloggers to help themselves publish, get crawled, and be found and recognized more effectively. And there are more changes afoot, stay tuned.
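The arithmetic behind "order of magnitude" is simple, assuming the filtering removed essentially all of the spam occurrences: if 90% of a term's daily occurrences were spam, only a tenth remain. Writing N for the daily occurrence count:

```latex
N_{\text{after}} = (1 - 0.90)\,N_{\text{before}} = 0.1\,N_{\text{before}}
\quad\Longrightarrow\quad
\frac{N_{\text{before}}}{N_{\text{after}}} = 10 = 10^{1}
```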
technorati web crawling software spam splogs
( Mar 04 2009, 08:31:16 PM PST ) Permalink
Friday February 27, 2009
Like many software developers, I confess that I have found myself on occasion coming back to old code that I wrote and asking, "What was I thinking?" Where it's my own code, this WWIT question doesn't come up very often these days, but in general I can't count the number of times I've encountered, and had to work on, code that was not written to be read. I think sometimes programmers write terse code and regard its brevity as a badge of honor: "If I'm wizardly enough to write this, then only True Wizards will read it." Or maybe it's just laziness or hurriedness; these code mysteries are akin to omitting comments, API documentation and other communication artifacts. When I see non-descriptive variable names, gratuitous indirection, excessive right indenting, monkey patching or unnecessary cyclomatic complexity, it's almost anti-social behavior; it's a communication fail more than a functional one. Likewise, gratuitous verbosity stifles communication in the opposite manner; this isn't pre-school, and grasp of the ABC's is assumed. So spelling out what code is doing in this belabored fashion is just silly:
# an array to collect permalinks
permalinks = []
# loop over the feed entries
for entry in feed.entries:
    permalinks.append(entry.link)
Whereas this is clear:
# extract an array of permalinks from the feed entries
permalinks = [entry.link for entry in feed.entries]
OK, I'm assuming the reader knows what a Python list comprehension does. The first one uses a lot of vertical space to satisfy a very simple intent. I often find that the opposite problem, excessive brevity, is authored by those enamored with their language's idioms. Software written in scripting languages often exhibits this; Perl is famous for expressiveness (I say this lovingly as a repentant ex-Perl Wizard), but even the languages whose adherents claim their tongue is more "readable" have those same users donning wizard hats, trying to be clever. Ever tried to maintain Python code riddled with nested list comprehensions containing lambdas? In Ruby, similar idiomatic norms abound; 'nuff said.
I've appreciated celebrations of wizardry (see A folding language) but there's more to wizardry than meta-programming and brevity. Coding like a wizard doesn't mean being so clever that only other wizards can collaborate. In my view, a true wizard has the wisdom to steer clear of verbose indulgences and terse spells; the wizard walks the middle path of clarity so that the code is not dumbed-down but the apprentice will grasp the intent. The wizard's code should read as poetry.
When code is unsocial (or anti-social), the quality suffers. Complex software needs a gene pool: lots of eyeballs, lots of variant perspectives and experiences. A small gene pool leads to in-bred ideas. Thus code from a lone wolf (even a kick-ass wolf) will usually be of lower quality than code developed by a plurality (unless it's a plurality of novices; then all bets are off).
My plea to fellow crafters of bits: please code for clarity. Don't be so brief that your intentions are unclear. And don't be so garrulous that your intention is lost in the verbiage. Again, I'm not claiming innocence of these sins of code. But over the years I've become considerably more aware of the costs and benefits in the choices between brevity and verbosity. Perhaps clarity is in the eye of the beholder or, perhaps more narrowly, in the eye of the author. But I try to look at my own code objectively and ask, "If I don't see this code for six months and then come back to it to do some maintenance, will today's intent be clear?" I hope the code I write will be approachable by those who come behind me to work on it, especially if it's me, lest I ask myself the WWIT question.
Sigh, I'm venting because I just got sidetracked refactoring some program code (and its single test) that lacked clarity. Thanks for indulging me this far. I'm gonna go listen to some old Social Distortion now, have a great weekend!
programming wizardry perl python ruby
( Feb 27 2009, 06:26:15 PM PST ) Permalink
Tuesday January 27, 2009
As mentioned last week on the Technorati blog, Technorati was crawling the new White House blog within a day of its launch. Most of the blogosphere doesn't require individual customization in our crawling framework but in some special cases, it must be done. The White House blog is a prime example of why this is so but I'm pleased to report that Technorati's new crawling technology makes what was impossible with our old crawler easy with the new one.
Given the volume of moderation that'd be required, it doesn't surprise me that the posts don't take comments. But there are other more basic blogging practices of concern:
rel="alternate" elements in the head section of the HTML document.
So how has the blog done in the last week? Well, it emerged in the top 1000 just five days after its inception. Keep an eye on the blog's Technorati blog info page; it's currently ranked 882 (1,876 links from 1,440 blogs). Many of the links are to the main blog page, citing its existence. But the majority of the links were to the kick-off post (date line reads: "Tuesday, January 20th, 2009 at 12:01 pm") and the inaugural address. Here are the posts and their link counts:
It's really encouraging to see the White House, specifically Macon Phillips, posting updates about President Obama's policies and political activities. Going forward, I hope to see more consistent publishing practices. For instance, should slug words be separated by underscores or hyphens? Should the proper names in URL slugs be mixed case (e.g. Timothy_Geithner_sworn_in) or downcased (e.g. now-comes-lilly-ledbetter)? It matters less which one; just pick one and stick to it! Also, the White House Blog should use durable URLs for permalinks: there's a posting with the path /blog_post/PressBriefingHighlights/ and another with /president-obama-delivers-your-weekly-address. What are they going to use for the next post with highlights from a press briefing, or for next week's weekly address? If Mr. Phillips needs any further advice on effective blog publishing technology, I'm easy to find and happy to be of service.
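Picking one convention is easiest when a single function owns it. Here's a minimal Python sketch, assuming lowercase hyphenated slugs and a date-prefixed path (/YYYY/MM/DD/slug/) as the durable-URL scheme; both choices and the helper names `slugify` and `permalink_path` are illustrative, not anything the White House Blog actually uses. The date prefix is one common way to keep recurring titles, like a weekly address, from colliding.

```python
import re
import unicodedata
from datetime import date


def slugify(title: str) -> str:
    """Lowercase, hyphen-separated slug: one convention, applied everywhere."""
    # Fold accented characters to plain ASCII, e.g. "é" -> "e".
    ascii_title = (unicodedata.normalize("NFKD", title)
                   .encode("ascii", "ignore").decode("ascii"))
    # Collapse every run of non-alphanumerics (spaces, underscores, "!") into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    return slug.strip("-")


def permalink_path(published: date, title: str) -> str:
    """Date-prefixed permalink, so same-titled posts on different days differ."""
    return f"/{published:%Y/%m/%d}/{slugify(title)}/"


print(permalink_path(date(2009, 1, 24), "President Obama Delivers Your Weekly Address"))
```

Two posts with identical titles on the same day would still collide under this scheme; a real publishing system would need a tie-breaker such as a numeric suffix.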
whitehouse white house blog blogging blog+technology obama technorati inbloguration
( Jan 27 2009, 12:41:39 PM PST ) Permalink
Thursday December 25, 2008
Just the other day, Data Center Knowledge asked Are Colocation Prices Heading Higher? My immediate reaction was: that's a stupid question. The last time VC funding went into hibernation, data center space was suddenly cheap and abundant. The article suggested that companies operating their own data centers will run to the colos as a cost-cutting measure. Maybe, but I'm not so sure. Data center migrations can be expensive, risky operations. Methinks that the F500s inclined to undertake a migration would have done so already. The article cited a report emphasizing a shift from capital expenses to operating expenses.
Tier 1 says demand for data center space grew 14 percent over the past 12 months, while supply grew by just 6 percent, "exacerbating an already lopsided supply/demand curve."
However, Tier 1 attributed the demand "especially, (to) the primacy of the Internet as a vehicle for service and application delivery." With the litany of TechCrunch deadpool reports, I'm finding it difficult to believe that the data center space supply/demand will continue skewing.
Sure, it's not all bad news. Fred Wilson reports that Union Square Ventures will be Investing In Thick and Thin, acknowledging that "it is easier to invest in thin times. The difficult business climate starts to separate the wheat from the chaff and the strong companies are revealed." Wilson goes on to say:
I don't feel that its possible, or wise, or prudent to attempt to time these (venture investment) cycles.
Our approach is to manage a modest amount of capital (in our case less than $300 million across two active funds) and deploy it at roughly $40 million per year, year in and year out no matter what part of the cycle we are in.
That way we'll be putting out money at the top of the market but also at the bottom of the market and also on the way up and the way down. The valuations we pay will average themselves out and this averaging allows us to invest in the underlying value creation process and not in the market per se.
Yes, the economy is gyrating in pain, but the four horsemen aren't galloping nearby. So take a pill, catch your breath and deal with it: the sun will come out, just don't bother trying to time it too carefully.
Now, there's no shortage of reasons for gloom and doom: mega-Ponzi schemes collapsing, banks and real estate combusting, the Big 3 in various states of failure, yet BMW North America will raise list prices 0.7%. Before the complete credit breakdown, real estate volume was actually rising in a lot of places (ergo: prices were aligning supply and demand). I was at a Williams-Sonoma store in Albuquerque the other day and the place was mobbed. My point is that while the economy is retrenching (or the country is rebooting), the detritus will be separated (Wilson's wheat from chaff) and data center space should be cheap and abundant. Everything seems fine to me. At least until the next bubble.
For those of you observing that sort of thing, Merry Christmas!
colocation data centers economy
( Dec 25 2008, 01:20:14 PM PST ) Permalink
Tuesday December 23, 2008
This post to one of the Hadoop mailing lists caught my eye, Announcing CloudBase-1.1 release. Wait, wasn't Cloudbase the embedded database company that IBM acquired several years back but ended up donating the product to the Apache Software Foundation as Derby? No, not that Cloudbase. This is apparently another project that aims to provide data warehousing on top of Hadoop.
I've been watching the emergence of HBase, Hypertable and, most recently, the proposed incubation of Facebook's Cassandra with great interest. The first two are modeled on Google's BigTable, but all are essentially horizontally scalable column-oriented databases. The developers of these systems explicitly steer away from having their technologies pegged as relational databases, with the refrain: "We don't do joins." The CloudBase project doesn't model itself on BigTable; instead, it aims to explicitly support joins between tables built on top of an HDFS cluster. It looks like they've posted extensive documentation and have released a JDBC driver, pretty cool! This is the most interesting database initiative I've seen since Greenplum announced their support for mapreduce.
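For readers wondering how a join can work on top of mapreduce at all, here's a toy, single-process Python sketch of the classic reduce-side (repartition) join. The map step tags each row with its source table and keys it by the join column, the shuffle groups rows by key, and the reduce step pairs up matching rows. This is the generic textbook technique, not a description of how CloudBase implements joins; the function names, sample tables and the in-memory shuffle are stand-ins for what Hadoop would do across a cluster.

```python
from collections import defaultdict
from itertools import chain, product


def map_rows(table, rows, key_field):
    """Map: emit (join_key, (table, row)) so the reducer knows each row's source."""
    for row in rows:
        yield row[key_field], (table, row)


def reduce_side_join(left, left_rows, right, right_rows, key_field):
    """Inner-join two lists of dicts on key_field, mapreduce style."""
    # Shuffle: group the tagged rows by join key (Hadoop does this between phases).
    groups = defaultdict(lambda: {left: [], right: []})
    for key, (table, row) in chain(map_rows(left, left_rows, key_field),
                                   map_rows(right, right_rows, key_field)):
        groups[key][table].append(row)

    # Reduce: per key, emit every pairing of a left row with a right row.
    # Keys present on only one side produce nothing (inner-join semantics).
    joined = []
    for key in sorted(groups):
        bucket = groups[key]
        for l_row, r_row in product(bucket[left], bucket[right]):
            joined.append({**l_row, **r_row})  # right-hand values win on name clashes
    return joined


blogs = [{"blog_id": 1, "name": "a"}, {"blog_id": 2, "name": "b"}]
posts = [{"blog_id": 1, "title": "x"}, {"blog_id": 1, "title": "y"},
         {"blog_id": 3, "title": "z"}]
print(reduce_side_join("blogs", blogs, "posts", posts, "blog_id"))
```

The cost is that every row of both tables crosses the shuffle, which is why a BigTable-style store that avoids joins can scale more simply.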
Yes, as far as scale-out data analytics, we live in interesting times.
mapreduce hadoop hbase hypertable jdbc cloudbase bigtable derby greenplum
( Dec 23 2008, 04:02:21 PM PST ) Permalink
Sunday December 21, 2008
I have some code I want to noodle on outside of work. Since I'm on a holiday break, I'm doing a bit of that (yes, this is what I do for fun, so?). In the past, I had used my own private CVS server for those kinds of things but these days, I could just as well live without CVS. I decided to roll a subversion server into my Apache build (the latest Apache + other modules aren't in the yum repositories for my distro, so I roll my own). While I'm putting a subversion server up, why not trac, too? Heh, that's where things got stuck.
When I installed the subversion dependencies (specifically, neon), I just used vanilla build params. After installing subversion, I was surprised that Trac couldn't access it. It turns out that the litmus test was this:
$ python
Python ...
>>> from svn import core
...it failed miserably. Various recompile efforts seemed to move the problem around. I saw a variety of the symptoms described in the troubleshooting section of the Trac-Subversion integration docs. The missing gss_delete_sec_context symbol error was apparently the telltale critical one; it originated from neon having been compiled without SSL support. The neon compile config that led to success was:
./configure --enable-shared --enable-static --with-ssl=openssl
Then the real key was to completely start over with the Subversion compile, not just the SWIG Python bindings.
make clean
./configure \
  --with-berkeley-db=/usr/local/BerkeleyDB.4.7 \
  --prefix=/usr/local \
  --with-apxs=/usr/local/httpd2.2.11/bin/apxs \
  --with-apr=/usr/local/apr \
  --with-apr-util=/usr/local/apr
make
make swig-py
make check-swig-py
make install
make install-swig-py
ldconfig
Only then did the litmus test above pass. One nuisance with this setup is that the Python bindings didn't get installed into site-packages, so mod_python was quite unhappy. Also, Trac seemed to want to put its eggs in the root directory. So the Apache server's envvars script exports these variables to work around those issues:
export PYTHONPATH=/usr/local/lib/svn-python
export LD_LIBRARY_PATH=/usr/local/lib/svn-python/libsvn
export PYTHON_EGG_CACHE=/data1/egg_cache
The result (including the requisite httpd.conf tweaks) is a working Subversion 1.5.4 and Trac 0.11 setup. It was more fiddling for the evening than I'd hoped for and I'm not sure my foibles and remedies were optimal (clearly, I missed an RTFM somewhere), but I hope this resolution helps at least one reader.
Happy Hanukkah and winter solstice!
apache mod_python trac neon subversion swig python
( Dec 21 2008, 02:55:24 PM PST ) Permalink
Monday December 15, 2008
Will Chuck-The-Shoe-At-The-World-Leader be an Olympic sport in the years ahead? Since finishing dinner this evening, I've found no fewer than three Flash games and a compendium of animated GIF satires.
Sunday December 14, 2008
The Wall Street Journal reported today that Google Wants Its Own Fast Track on the Web, describing it as an example of the decline of support for net neutrality among The Powers That Be (the usual suspects: Google, Yahoo, Microsoft, Amazon). Plenty of deals have been struck anyway between TPTB and data carriers (most prominently AT&T + Yahoo DSL), but outright transit preference doesn't seem to be at issue here. What Google appears to be getting into, called OpenEdge, sounds like an arrangement that amounts to co-locating their gear in the major carriers' datacenters. This would move serving capacity closer to the end-users of their services and thereby accelerate the user experience. Since it doesn't concern transit per se, this actually doesn't sound like a net neutrality issue at all; it sounds like another form of datacenter dispersion.
So what exactly is the big deal? All of the TPTB and loads of other online services have content delivery network (CDN) deals. Yahoo, Amazon, Facebook... they all operate or partner with a CDN in some shape or form (full disclosure: I've been working on a CDN evaluation for Technorati). With a CDN, publishers pay specifically to have their content cached at points-of-presence (PoPs) around the intertubes, which, through some DNS and routing magic, get web content to end-users more quickly. The next step beyond a CDN is to put equipment in the carrier's datacenter. Here's what the WSJ said:
Google's proposed arrangement with network providers, internally called OpenEdge, would place Google servers directly within the network of the service providers, according to documents reviewed by the Journal. The setup would accelerate Google's service for users. Google has asked the providers it has approached not to talk about the idea, according to people familiar with the plans.
Asked about OpenEdge, Google said only that other companies such as Yahoo and Microsoft could strike similar deals if they desired. But Google's move, if successful, would give it an advantage available to very few.
It seems perfectly logical, actually.
Nonetheless, I am concerned about wavering support for net neutrality. Lawrence Lessig, fresh off his Big News post about setting up shop at Harvard Law School, is quoted as saying:
There are good reasons to be able to prioritize traffic. If everyone had to pay the same rates for postal service, then you wouldn't be able to differentiate between sending a greeting card to your grandma versus sending an overnight letter to your lawyer.

But the counterargument says that there's a big difference. Grandma isn't trying to compete with your attorney (at least, not usually). If the big guys are paying more to be faster, who will be able to afford to challenge them? The intertubularly rich will get richer; the poor will stay poor. TPTB will ensconce themselves as dynastic media walking on paths paved with gold while all of us commoners walk in the gutter.
The dumb pipes should stay dumb. If an internet service wants to operate out of multiple datacenters, lease dedicated pipes to accelerate its inter-datacenter data distribution and peer with the carriers' PoPs near its datacenters, mazel tov. This can be augmented with CDNs. It can even be taken to the next step by installing equipment directly in the carriers' datacenters. But at the network exchanges and on the pipes connecting them, everyone's packets should remain equal.
UPDATE: GigaOM posted about a clarification from Google, which says that the WSJ was "confused". The hubbub in that article really was misplaced; it's a CDN deal.
google cdn content delivery network net neutrality wall street journal
( Dec 14 2008, 10:52:25 PM PST ) Permalink
Wednesday December 10, 2008
In the last few years, the scope of Amazon Web Services (AWS) has broadened to cover a range of infrastructure capabilities, and it has emerged as a game changer. The hype around AWS isn't all wrong: a whole ecosystem of tools and services has grown up around AWS that makes the offering compelling. However, the hype isn't all right either. At Technorati, we used AWS this year to develop and put into production a new crawler and a system that produces the web page screenshot thumbnails now seen on search result pages. Now that that chapter is coming to a close, it's time for a retrospective.
There's a prevailing myth that the elasticity of EC2 makes it cheaper to operate than fixed assets. The theory is that by shutting down unneeded infrastructure during the lulls, you save money. In a purely fixed infrastructure model, Technorati's data acquisition systems must be provisioned for their peak utilization. When utilization ebbs, a lot of that infrastructure sits relatively idle. That much is true, but flexible capacity only saves money on the capacity above your minimum requirements. So the theory only holds if your variability is high compared to your minimum: that is, if the difference between your minimum and maximum capacity is large, or if you're not operating a 365/7/24 system but episodically using a lot of infrastructure and then shutting it down. Neither is true for us. The normal operating mode of Technorati's data acquisition systems follows the ebb and flow of the blogosphere, which varies a lot but is always on. The sketch to the left shows the minimum capacity and the variable capacity distinguished.
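To make the "only relative to the minimum" argument concrete, here is a minimal sketch that prices three strategies for a demand profile: own enough for the peak, rent exactly what each hour needs, or own the minimum and rent the rest. The demand profiles, the per-machine-hour prices (in cents, with renting assumed to cost more per hour than owning) and the function names are all hypothetical, chosen for illustration rather than taken from Technorati's or Amazon's actual numbers.

```python
# Hypothetical hourly demand profiles, in machines needed per hour.
ALWAYS_ON = [60, 80, 100, 80]   # blogosphere-style: varies but never drops to zero
EPISODIC = [0, 0, 100, 0]       # batch-style: big job, then shut everything down

# Placeholder prices in cents per machine-hour (assumptions, not real quotes).
OWNED_RATE = 25   # amortized hardware + colo for a machine you own
CLOUD_RATE = 40   # on-demand rental price for a comparable machine


def fixed_cost(demand, owned_rate):
    """Own enough machines for the peak hour and pay for them every hour."""
    return max(demand) * len(demand) * owned_rate


def elastic_cost(demand, cloud_rate):
    """Rent exactly the machines each hour needs, nothing more."""
    return sum(demand) * cloud_rate


def hybrid_cost(demand, owned_rate, cloud_rate):
    """Own the minimum capacity and rent only what sits above it."""
    base = min(demand)
    owned = base * len(demand) * owned_rate
    burst = sum(d - base for d in demand) * cloud_rate
    return owned + burst


for name, demand in (("always-on", ALWAYS_ON), ("episodic", EPISODIC)):
    print(name,
          fixed_cost(demand, OWNED_RATE),
          elastic_cost(demand, CLOUD_RATE),
          hybrid_cost(demand, OWNED_RATE, CLOUD_RATE))
```

With these made-up prices, the always-on profile costs 10000 owned-for-peak, 12800 fully elastic and 9200 hybrid, so going all-elastic loses money; the episodic profile costs 10000 owned versus 4000 elastic, which is where elasticity shines.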
In response to some of the fallacies posted on an O'Reilly blog the other day (by George Reese), On Why I Don't Like Auto-Scaling in the Cloud, Don MacAskill from SmugMug wrote a really great post yesterday about his SkyNet system, On Why Auto-Scaling in the Cloud Rocks. Don also emphasizes SmugMug's modest requirements for operations staff. With an application that has sufficient simplicity and automation around it, it's easy to imagine a 365/7/24 service having meager ops burdens. I'd surmise that operating SmugMug with autonomic de/provisioning is cost-effective because it fits their operating model. I understand Reese's concern that folks may not do the hard work of really understanding their capacity requirements if they're too coddled by automation. However, that concern comes off as a shill for John Allspaw's capacity planning book (which I'm sure is great; I can't wait to read it). Bryan Duxbury from RapLeaf describes their use of AWS and how the numbers work out in his post, Rent or Own: Amazon EC2 vs. Colocation Comparison for Hadoop Clusters. Since the goal is to serve a Hadoop infrastructure, AWS gets a thumbs down in their case. Hadoop's performance is impaired by poor rack locality, and the latencies of Amazon's I/O systems clearly drag it down. If you're going to be running Hadoop on a continuous basis, use your own racks, with your own switches and your own disk spindles.
At Technorati, we're migrating the crawl infrastructure from AWS to our colo. While I love the flexibility that AWS provides and it's been great using it as a platform to ramp up on, the bottom line is that Technorati has a pre-existing investment in machines, racks and colo infrastructure. As much as I'd like our colo infrastructure to operate with lower labor and communication overhead, running on AWS has amounted to additional costs that we must curtail.
Cloud computing (or utility computing or flex computing or whatever it's called) is a game changer. So when do I recommend you use AWS? Ideally: anytime. If your application is architected to expand and contract its footprint with the demands put upon it, provision your minimum capacity requirements in your colo and use AWS to "burst" when your load demands it. Another case where using AWS is a big win is a total greenfield project. If you don't have a colo, are still determining the operating characteristics of your applications and need machines provisioned, AWS is an incredible resource. However, I think the flexibility vs. economy imperatives will always lead you to optimize your costs by provisioning your minimum capacity in infrastructure that you own and operate.
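The "provision your minimum in the colo, burst to AWS" pattern boils down to a per-interval calculation of how much load overflows the owned capacity. Below is a minimal sketch of that calculation under simplifying assumptions: demand and capacity are measured in a single hypothetical "work unit", every cloud instance handles the same fixed amount, and a percentage headroom is added on top of demand. The function name and parameters are illustrative, not part of any AWS API.

```python
def burst_instances(demand, colo_capacity, per_instance, headroom_pct=10):
    """Cloud instances to run each hour so colo + cloud covers demand plus headroom.

    demand        -- work units needed per hour (hypothetical unit)
    colo_capacity -- work units the owned colo machines handle
    per_instance  -- work units one cloud instance handles
    headroom_pct  -- extra margin above demand, in percent
    """
    plan = []
    for d in demand:
        # Integer ceiling of d * (1 + headroom_pct/100), avoiding float rounding.
        target = (d * (100 + headroom_pct) + 99) // 100
        overflow = max(0, target - colo_capacity)
        # Ceiling division: a partially used instance still has to be launched.
        plan.append((overflow + per_instance - 1) // per_instance)
    return plan


print(burst_instances([60, 80, 100], colo_capacity=80, per_instance=4))
# → [0, 2, 8]
```

In the quiet hour the colo absorbs everything and no cloud instances run; only the peak pays for rented capacity. A real system would also need launch lead time and cooldowns so it doesn't thrash, which this sketch deliberately leaves out.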
There's also another option: instead of buying and operating your own machines and racks, you may be able to optimize costs by renting machines provisioned to your specs in a contract from the services that have established themselves in that market (Rackspace, Server Beach, ServePath, LayeredTech, etc). Ultimately, I'm looking forward to the emergence of a compute market place where the decisions to incur capital expense, rent by the hour or rent under a contract will be easier to traverse.
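One way to frame the own vs. rent-by-contract vs. rent-by-the-hour decision is to treat each option as a fixed monthly charge plus a usage charge and pick the cheapest for the hours you actually expect to use. The sketch below does exactly that. The `Option` type, the `cheapest` helper and every price in it are invented for illustration (the owned option's hourly term stands in for usage-dependent costs such as power); the prices were chosen only so that each option wins in some utilization range, not to suggest which one wins in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    monthly_fixed: int  # cents per machine per month, owed whether used or not
    hourly: int         # cents per machine-hour actually used


# Invented prices, chosen only so that each option wins somewhere.
OPTIONS = [
    Option("hourly", monthly_fixed=0, hourly=40),
    Option("own", monthly_fixed=12000, hourly=10),
    Option("contract", monthly_fixed=18000, hourly=0),
]


def monthly_cost(opt, hours_used):
    """Fixed monthly charge plus usage charge for one machine."""
    return opt.monthly_fixed + opt.hourly * hours_used


def cheapest(options, hours_used):
    """Option with the lowest monthly cost at the given usage."""
    return min(options, key=lambda o: monthly_cost(o, hours_used))


for hours in (100, 500, 730):  # ~730 hours is a full month of uptime
    best = cheapest(OPTIONS, hours)
    print(hours, best.name, monthly_cost(best, hours))
```

With these numbers, hourly rental wins at 100 hours a month, owning wins at 500, and the contract wins at full uptime. Plug in real quotes and the crossover points move, which is the kind of decision a compute marketplace would make easier to navigate.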
amazon web services aws cloud computing rapleaf technorati smugmug hadoop oreilly data centers capacity planning
( Dec 10 2008, 11:53:19 PM PST ) Permalink
Tuesday December 09, 2008
Ten years ago, you might have been advised that solar energy, while sounding nice, was a bad investment. The installations were failure-prone and not cost-effective. I don't know if I bought that then; I know of solar panels in San Francisco installed in the '80s that paid for themselves, just slowly. But what we're seeing now isn't your father's solar panel. From Google's solar panels to residential rooftops, it seems pretty clear that the Economics of Solar Power Are Looking Brighter. Fast Company is running an article, The Solar Industry Gains Ground, that strikes a chord we're hearing a lot. Solar energy is getting more and more cost-effective. The projection is that the cost curve of solar power may share the exponential improvement of Moore's Law. The fabs that make the silicon enabling you to read this may also enable a giant leap forward in energy. The Germans have their own "Solar Valley", and their industry projection graph appears pretty Moorish (look at the large yellow area).
The big liftoff is 10 years away, but the investment that has been made in the area and the advances being made seem to put the benefits close at hand. The big win, when dependence on fossil fuels is in clear decline, is at least 10 to 20 years out. But I think it can happen; I think the solar decade is coming. It should be the coming decade. However, it will require an Apollo-mission-like focus from the Obama administration to succeed. And I hope we can make it a reality.
( Dec 09 2008, 11:55:24 PM PST ) Permalink