Joyent

Joyent Weblog

Structure 09 in SF

I’m moderating the first panel of the day at Om’s Structure09 conference today.

If you’re at the conference please make sure you say “Hi”.

Part 2, On Joyent and Cloud Computing "Primitives"

In the first part of this series I listed some of the key underlying ideas at Joyent, namely that a company or even a small development team should be able to:

  1. Participate in a multi-tenant service
  2. Have your own instantiations of this service
  3. Install (and “buy”) the software to run on your own infrastructure
  4. Get software and APIs from Joyent that allow for the integration of all of these based on business desires and policies.

And said

The successful future “clouds” have to be more accessible, easier to use and to operate, and every single part of the infrastructure has to be addressable via software, has to be capable of being introspected into and instrumented by software and this addressability means that one can write policies around access, performance, privacy, security and integrity. For example, most of our customers really don’t care about the details, they care about knowing what is capable of providing 99.99% of their end users some great experience 99.99% of the time. These concepts have to be baked in.

I continue to think that from a developer’s perspective the future is closer to the SMART platform. Ramin’s comment on an older Joyeur article about EC2 versus Accelerators is relevant here, so let me quote him:

Whoever has the fewest number of steps and the fastest build/deploy time is likely to attract the most developers. Whoever can show that the operating cost scales linearly with use will have developers casting flower petals in their path :-)

As an app developer, I don’t care that it runs on Solaris, FreeBSD, or Mac-OS. I want it to work. I want an optimized deployment workflow and a simple way to monitor and keep things running.

That all said.

In the second part of this series I wanted to start talking about “primitives”. I’m saying “start” because we’re going to be covering primitives over the next couple of posts.

I’m going to loosely define “Primitives” (now with a capital P) as all the stuff underneath your application, your language and the specific software you’re using to store your data. So yes, we’re talking about hardware and the software that runs that hardware. Even though most Primitives are supposed to eventually be hidden from a developer, they’re generally important to the business people and those that have to evaluate a technology platform. They are important parts of the architecture when one is talking about “access, performance, privacy, security and integrity”.

Previously, I’ve talked a bit about Accelerators (On Accelerators) and that fundamentally we deal with 6 utilities in cloud computing.

The fermions are the utilities where things take up space

1) CPU space
2) Memory space
3) Disc space

The bosons are the utilities where things are moving through space and time

4) Memory bus IO
5) Disc IO
6) Network IO

All of these utilities have physical maximums dictated by the hardware, and they have a limit I’d like to call How-Likely-Are-You-To-Do-This-From-One-Machine-Or-Even-At-All.

I’ll admit at this point to a particular way of thinking. I think “what is the thing?”, “how is it going to behave?”, “what are the minimums and maximums of this behavior?” and finally “why?”.

The minimum for us is easy. It’s zero. Software using 0% of the CPUs, 0 GB of memory, doing 0 MB/sec of disc IO and 0 Gbps of network traffic.

The maximums:

  1. Commercially available CPUs typically top out in the 3s of GHz
  2. “Normal” servers typically have <128 GB of memory in them and the ratio of 4GB of memory per CPU core is a common one from HPC (we use this and it would mean that a 128 GB system would have 32 cores)
  3. Drives are available up to a terabyte in size, but as they get larger you’re making performance trade-offs. And while you can get single namespaces into the petabyte range, ones >100 TB are still irritating to manage (because of either the increased fragility of a larger and larger “space”, or the variation in latencies between a lot of independent “storage nodes”).
  4. CPUs and memory talk at speeds set by the chip and hardware manufacturers. Numbers like 24 Gbps are common.
  5. Disc IO can be in the Gbps without much of an issue
  6. For a 125 kb page with 20 objects on it, 1 Gbps of traffic will give you 122,400,000 unique page views per day, and in a 30 day month this is 3,672,000,000 page views (Check my math). Depending on how much stuff you have going on, this basically puts you in as a top 100 web property. With the number of public websites at ~200 million (source), being in the top 100 is what … 0.00005% of the sites?
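
Since the 1 Gbps item invites a math check, here is a minimal sketch of one way to run it. The assumptions are mine, not the post's: decimal units, a page weight of 125 kilobytes with all 20 objects included, no protocol overhead, and a link kept saturated around the clock. Under those assumptions the result comes out lower than the figures quoted above, which shows how sensitive the estimate is to what “125 kb” means and to overhead.

```python
# Back-of-the-envelope check: page views a saturated link can serve.
# Assumptions (not from the original post): decimal units, a 125 kilobyte
# page with all objects included, no protocol overhead, 24/7 saturation.

SECONDS_PER_DAY = 24 * 60 * 60


def page_views_per_day(link_bits_per_sec: float, page_bytes: float) -> float:
    """Page views per day for a link that is kept fully saturated."""
    bytes_per_sec = link_bits_per_sec / 8
    return bytes_per_sec / page_bytes * SECONDS_PER_DAY


daily = page_views_per_day(1e9, 125e3)  # 1 Gbps link, 125 kB page
monthly = daily * 30                    # 30 day month

print(f"{daily:,.0f} page views per day")
print(f"{monthly:,.0f} page views per 30 day month")
```

Passing a different link speed or page weight to `page_views_per_day` (a hypothetical helper name) shows how quickly the estimate moves with either input.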

As something to think about and as an anchor, I remember seeing a benchmark of a “Thumper JBOD” attached to a system capable of saturating the 4×10Gbps NIC cards in the back of it. Yes the software was special, yes it was in C, and yes it was written with the explicit purpose of pushing that much data off of discs; however, think about that for a minute.

Imagine having a web property doing 120 billion monthly page views coming off of a single “system” that you can buy for a reasonable price. Starting from there, expand that architecture and I wonder with the “right software” and “primitives” where you would end up. If we change it from a web property to a gaming or a trading application, where would you end up? What is the taxonomy of applications out there (common and uncommon) and do we come up with the best architectures for each branch and leaf on that tree?

Please think about that anchor and a taxonomy for a few days and then I’m going to get into some of the key differentiators of our Primitives and answer some of the “Why?”.

Contestant Winners, Free Social Apps Infrastructure, Upcoming Events

The “Answer Questions about Jason” contest was a success and all the contest winners received a free entry-level Joyent Accelerator. Congratulations. I was proud to give away Accelerators to celebrate the launch of JSBin on Joyent.

Keeping with the free theme, we will be expanding the number of free slots we have open for developers of social networking applications (thanks to Sun and the Sun Startup Essentials partnerships). We plan to accommodate the entire waiting list for the Facebook, Bebo and OpenSocial programs. We look forward to seeing many more great applications developed on Joyent.

The marketing team at Joyent has been very busy. Here’s a list of upcoming events Joyeurs will be attending:

Hope to see you out there.

On Joyent and "Cloud Computing", Part 1 of Many

With this post, I’m starting a series where we’re going to be much more explicit about what we’re thinking, what we’re doing, how we’re doing it and where we’re going. I’m not interested in any of it being thought of as impersonal “marketing material”, so I hope you’ll allow the occasional use of “I” and the interweaving of my perspectives and perhaps a story or two. As the CTO and one of the founders whose lives have been this company, I’m going to go ahead, be bold and impose.

In May of this year, we’ll have been in business for five years, and looking back at it, it’s amazing that we’re still here. Looking back at it, it makes complete sense that we’re still here. Our idea was pretty simple: let’s take what is normally BIG infrastructure (i.e. 64GB of RAM, 16 CPU servers, 10 Gbps networking), virtualize the entire vertical stack and then go to people who normally buy some servers, some switches and some storage, and sell them complete architectures. All done and ready to go.

It is commonly said that businesses only exist out of necessity, that the best ones take away or take care of “pain”. Makes sense, doctors only exist because there is disease.

It is a pain for most companies (“Enterprises”) to expertly own and operate infrastructure that can scale up or down depending on their business needs. Infrastructure is more a garden than a monument but rarely treated as one.

It is difficult to be innovative in the absence of observing tens of thousands of different applications doing tens of thousands of different things.

It is easy to make the same mistakes that others have made.

That final point is an important one, by the way. I’ll tell you why in a slightly roundabout way.

I’m a scientist by training and have always been a consumer of “large compute” and “big storage” not just a developer of them (I can write about why that is a good thing another time). While at UCLA as a younger man, I was lucky enough to have had a graduate physiology course where one of the lecturers was Jared Diamond and later on I went to an early reading of one of his books that was coming out, Guns, Germs, and Steel. What reached out and permanently imprinted itself in my mind from that book was the Anna Karenina principle. It comes from the Tolstoy quote, “Happy families are all alike; every unhappy family is unhappy in its own way.”, and simply put it means that success is purely the absence of failure.

Success is purely the absence of failure.

Education and “products” are meant to save people and enterprises from mistakes. Both the avoidance of “known” mistakes (education) and the surviving of “unknown” mistakes (luck, will power, hustle, et al).

Internally I commonly say,

It’s fine to make mistakes as long as 1) no one has ever made that mistake before, if they have, you need to question your education or get some, 2) we’re not going to be repeating this mistake, if we do, we need to question our processes, our metrics and how we communicate and educate and 3) the mistake does not kill us

This is all important because I believe the primary interest in “cloud computing” is not cost savings, it’s in the participation of a larger network of companies and people like you. It is the consumption of products that

  1. Are more accessible and easy to use (usually from a developer’s perspective)
  2. Are easier to operate

However, that’s not it. I can say with complete certainty that the main barrier to adoption is trust. Trust specifically around privacy, security and integrity.

The successful future “clouds” have to be more accessible, easier to use and to operate, and every single part of the infrastructure has to be addressable via software, has to be capable of being introspected into and instrumented by software and this addressability means that one can write policies around access, performance, privacy, security and integrity. For example, most of our customers really don’t care about the details, they care about knowing what is capable of providing 99.99% of their end users some great experience 99.99% of the time. These concepts have to be baked in.

These are some of the underlying ideas behind everything that we’re doing. At Joyent, we believe that a company or even a small development team should be able to

  1. Participate in a multi-tenant service
  2. Have your own instantiations of this service
  3. Install (and “buy”) the software to run on your own infrastructure
  4. Get software and APIs from Joyent that allow for the integration of all of these based on business desires and policies.

Sometimes people refer to this as the “public and private cloud”. Oooook.

I happen to believe we’re capable of doing this quite well compared to most of the large players out there. We own and operate, and we’re making those APIs and software available (often open source).

Amazon Web Services is a competent owner and operator and allows you to participate in a multi-tenant service like S3, but there are no API extensions for integrity, you cannot get your own S3 separate from other people, you cannot install “S3” in your own datacenter, and of course there is no software that allows an object to move amongst these choices based on your policies.

Question: “If you have a petabyte in S3, how do you know it’s all there without downloading it all?” I could answer that question if I had a petabyte on NetApps in-house.
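
To make the integrity question concrete, here is a minimal sketch of how the in-house case could be answered. It rests on assumptions of mine: the storage is mounted as an ordinary filesystem, and a checksum manifest was recorded when the data was written. The function names (`sha256_of`, `build_manifest`, `verify`) are hypothetical, and this is an illustration of the idea, not a description of any Joyent or NetApp tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose contents changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

The check still reads every byte, but it does so next to the data rather than over a metered network link, which is the difference the question is pointing at.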

The Ciscos, Suns, HPs and IBMs (I’ll toss in the AT&Ts too) of the world want to sell you more co-lo, perhaps something best called co-habitation, more hardware, more software, less innovation and no ease of operating.

Larry Ellison says that this is all a marketing term. I think he’s wrong. We think he’s wrong, and that Joyent seems to be unique in focusing on making the above a reality.

Next week, I’ll be discussing “primitives” and “architectures”.

Why Joyent Banned all Employees from Attending South-by-Southwest Interactive This Year

We have been asked a number of times whether Joyent is going to the South-by-Southwest Interactive (aka SXSW) festival this year. The answer is “no”. All Joyent employees are, in fact, banned from SXSW for the following lucky seven reasons:

1) Drinking. There is lots and lots of drinking of alcoholic beverages. I think this is the most important thing to understand about SXSW. Lots and lots of drinking. Joyent co-sponsored the 16-bit party last year (I think they’re calling it 32-bit this year, haha, get the joke?) and I remember standing in this junk yard (the location of the party) being shocked while hundreds of people were jumping the fences to get into the party to: drink. Lots and lots. Then we got into these bicycle-drawn-carts and rode around in the dark. I couldn’t believe that ride cost $180. Seems high. Then I’m on an outside patio and there’s John Gruber and his lovely wife Melissa and they’re both talking about pixels. Too much. It just went on and on.

2) BBQ. As a native Texan (Dallas, 1966), it makes me sick to hear Yankees (non-Texans) talk about Bar-B-Que. Believe me, that is about all ya’ll hear about during SXSW when ya’ll not drinking and drinking. “Oh, we went to Salt Lick and had BBQgasm.” No self-respecting Texan talks like that. In fact, only the folks that moved to Texas from New Jersey go to the Salt Lick. I don’t care what Matt Mullenweg says. The “good food” in Texas is found in the back yard of someone’s house and only I and a few other folks know where to find it. If the SXSW crowd is there, well, I need a drink.

3) Social media. Be careful, we’re still on solid ground, but if you actually go to any of the presentations at SXSW you will tumble right into rapturous discussions of “starting the conversation” which is difficult to get excited about after all that drinking. So you’re sitting there in the audience and someone is going on about “bizarre versus convention center” when, I swear you look around and you realize you’re smack dab in the middle of…

4) San Francisco. What a pathetic excuse of a city. On almost all the levels and altitudes. It is no New York on the west coast. That would be Los Angeles. I can only repeat what my daughter recently said when I asked if she wanted to drive around San Francisco. “No, let’s go to the airport.” Amen. I need a drink.

5) Muxtape. If you aren’t already using Muxtape, I beg/urge you to get over there right now. However, if Muxtape were to break out during SXSW (Interactive AND Music), I don’t know that we could be so enthusiastic. Muxtape right now is like that silly little bar in the Bowery. We don’t want people streaming in from New Jersey muxing it all up. I have a bad feeling about this.

6) More than the “One Accelerator”? Microsoft, you’ve got to be kidding. There is only one Accelerator.

7) Austin. Finally, Austin the city and its environs. This may seem a strange reason to ban Joyent employees from SXSW. Don’t get me wrong. I personally love Austin. It is a wonderful city with rich cultural and historic offerings. My brother went to the University of Texas. Hook’em horns and all that. You can visit a French embassy to the Republic of Texas in Austin. Nice. But let’s face it. Austin is not Texas. It is cartoon Texas. I wouldn’t want to saddle New Jersey with Newark any more than I want to saddle Texas, and Joyent employees’ understanding of Texas, with Austin.

Maybe one day Joyent will be back at South-by-Southwest Interactive. I’m sure that will be the year before it winds down.

Light-weight, Collaborative Javascript Debugging: JS Bin on Joyent Accelerators

JS Bin is a very useful utility offering collaborative JavaScript debugging. From the About section:

JS Bin allows you to edit and test JavaScript and HTML (reloading the URL also maintains the state of your code – new tabs doesn’t). Once you’re happy you can save, and send the URL to a peer for review or help. They can then make further changes saving anew if required.

At Joyent we’re pleased to be offering the cloud infrastructure for JS Bin. We have high hopes for the future of JavaScript both on the client and the server.

To celebrate JS Bin on Joyent Accelerators we will give away five one-year subscriptions to entry-level Accelerators to people that answer one of the following five questions:

1) What place did Jason Hoffman come in for the Joyent 2007 weight loss contest? Jason came in ___________.
2) Please provide a URL to pictures showing Jason Hoffman before/after his tremendous weight gain? URL: _________________.
3) Much of Jason Hoffman’s earlier weight gain came from eating ____________.
4) Jason Hoffman’s current weight is _____________.
5) Jason Hoffman smokes ___________ packs of cigarettes daily.

You can answer questions in the comments. The first one to answer a question correctly will win an Accelerator. One win per player. No Joyent employees, thanks.

Update: congratulations to all the winners. Contest done. Prizes will be sent out soon.

Update: all prizes have been sent to the winners.

Google App Engine Misfit Toys: Come to Jill

Huh?

You’ve got to be kidding (from today’s O’Reilly Radar):

Google released App Engine less than a year ago (Radar post). It was the first chance for external developers to use the power of Google’s servers. The powerful platform supported Python and was free (within limits). It now supports 45,000 apps and those apps get over 100 million page views per month. Those pageviews were all free, but they had limits.

Only 100 million page views per month? Across 45,000(?) applications? Isn’t that something like 2222 page views per month per application (or 74 per day)? Is this the auto-scale platform we’ve been waiting for? (Please see update below.)
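
The per-application figures above can be reproduced in a couple of lines. This is only a sketch of the arithmetic, using the numbers as originally quoted (100 million page views per month across 45,000 applications) and assuming a 30 day month.

```python
# Figures as originally quoted; the 30 day month is an assumption.
total_monthly_page_views = 100_000_000
applications = 45_000

per_app_month = total_monthly_page_views / applications
per_app_day = per_app_month / 30

print(f"{per_app_month:.0f} page views per application per month")
print(f"{per_app_day:.0f} page views per application per day")
```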

Google also announced pricing for App Engine. It is essentially in-line with Amazon’s Web Services pricing.

App Engine Running on Joyent

I was interested to understand what Google App Engine applications, all of them, running on Joyent Accelerators might cost. We currently have a customer that pushes 2 billion page views a month. That customer spends $60,000 per month, or $0.00003 per page view, with Joyent. (Bandwidth and storage are bundled into the pricing.) This means the entire Google App Engine application portfolio, all 45,000 applications or 100 million page views, could run on Joyent for $3000 per month. Astonishing. Dear Google Operations: if you are spending more than $3000 per month running Google App Engine, please give us a call. We can save you some money. (Please see update below.)

These realities are layered on top of a closed, proprietary platform. Truly misfit.

Yes, Joyent is investing in a platform, based on Javascript (to begin with) that will compete with Google App Engine. It will be priced aggressively, will be completely open, and run on the same blazingly fast Joyent Accelerators serving up customers that need real (2 billion page view) performance.

Citizens of App Engine: come to Jill!

Update: the traffic for App Engine has been updated from 100 million pages per month to 100 million pages per day. So, if App Engine costs more than $90K/month to run…the offer remains the same.
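
The cost comparison in this post, including the updated traffic figure, works out as follows. This is a rough sketch that assumes cost scales linearly with page views at the quoted customer's rate and that a month has 30 days; both are simplifications of mine, and the variable and function names are hypothetical.

```python
# Quoted figures: one customer, $60,000 per month for 2 billion page views.
customer_monthly_cost_usd = 60_000
customer_monthly_page_views = 2_000_000_000

# Assumption: pricing scales linearly with page views.
cost_per_page_view = customer_monthly_cost_usd / customer_monthly_page_views


def monthly_cost(page_views_per_month: float) -> float:
    """Estimated monthly cost at the quoted customer's per-view rate."""
    return page_views_per_month * cost_per_page_view


original = monthly_cost(100_000_000)      # 100 million views per month
updated = monthly_cost(100_000_000 * 30)  # 100 million views per day

print(f"${cost_per_page_view:.5f} per page view")
print(f"${original:,.0f} per month at the original traffic figure")
print(f"${updated:,.0f} per month at the updated traffic figure")
```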

Open, Loving, Just Workingness: The Smart Platform and Javascript

About a month ago Joyent acquired Reasonably Smart and we were happy with the breadth of the coverage (e.g. @GigaOm). Much of the feedback was positive and there were also some important questions and comments: many around “what is open?” and the current choice of Javascript as the server-side programming language for the platform.

There were a few other factors that were important to us at Joyent, and that weren’t covered. David and I, the founders, happened to get along great with Reasonably Smart’s founders and could see ourselves working together for a long time. Basically, if I started a company today, I’d give those guys a call, and try and get them involved. I’d say that we’re also pretty happy about being able to make smart, targeted acquisitions in today’s economy, considering everything that’s going on in regards to technology companies.

But.

Back to the Big Technology Things.

On Javascript

There are a few things that have been sitting in my mind for a bit of time:

1) Our connector suite of software happens to be nearly 10% javascript (see the ohloh analysis). I’d find myself wondering why not 20%, 50% or 100%? Why aren’t we just writing all of our applications in straight Javascript? Once you begin to think of it as a server-side language, it gets pretty interesting.

2) Joel Spolsky wrote this great little piece called Can Your Programming Language Do This? and all the neat examples actually used Javascript as the programming language.

3) When talking to the guys at Sun’s Fishworks Project, I learned that they were writing their restricted shell for their storage appliance in Javascript (the CLI is 24227 lines of javascript). A shell. A command line shell. It’s in javascript. Javascript.

4) The realization that the problem with Javascript wasn’t the language, it was the DOM (Document Object Model) and how it’s implemented in different browsers.

Bruce Tate hits the nail on the head here

The beleaguered language sags under the weight of a complex programming model called the document object model (DOM), poor tools for implementation and debugging, and inconsistent browser implementations. Until recently, many developers had all but written off JavaScript as a necessary evil at best or a toy at worst.

Our own James puts it nicely “if you’ve not programmed JavaScript without all that tedious mucking about in the DOM, try it – you’ll be pleasantly surprised!”.

5) And yet … Javascript finds itself in the top 3 or 5 languages to know for the future (Red Canary’s survey and GigaOm coverage is just one example.)

6) In the same article, James points out that Javascript is a hardened language. That’s great for a service provider, nothing to strip away from a language like what had to be done with Python in Google App Engine.

7) And finally, the Javascript VM used in the Reasonably Smart platform could find itself supporting the syntax of other languages (for example, python and ruby) in the near future.

And we have the potentially interesting adoption pattern of client-side javascript to server-side javascript, and the possibility of more and more functionality being added to applications by just having the javascript that’s already there be extended and hooked directly into data stores et cetera. It was feasible to even look at our own current applications (or geez, future command line shells …) and see how they would evolve this way given the right platform, frameworks, backends.

Javascript, at least for me, emerged then as a fascinating language. And fascinating in several different ways: technically, culturally (for lack of a better word) and as a core for one of our service businesses.

On Smart Platforms

Now to that Service.

We’re calling it the Joyent Smart Platform now (don’t quite know if it’ll be the Joyent Smart, Joyent SMART or Joyent S.M.A.R.T platform — likely the first one because wouldn’t one expect S.M.A.R.T to mean something? — but I’ll let David and Bryan figure that one out).

And it’s actually a “platform”.

Joyent has been working on providing a rock-solid set of “primitives”: powerful, secure and flexible infrastructure (as a service …). These primitives have been able to crank away on respectably sized sites and problems. [Author’s note: I’m trying to limit the “as a service” trailer].

We’ve always wanted things to Just Work™, to be open, flexible, loving and happy.

Now why would one write to a platform?

You can write an application, use documented APIs and interfaces in that platform to do everything you need to do, and boom bam magic bim bam it’s deployed, up, running, instrumented and scalable. Fast, easy, beautiful, and you can focus on your actual business.

Why wouldn’t one write to a platform?

Three reasons: vendor lock-in, portability and junk parts.

A Loving Cloud is what I want, and Simon Wardley wants Happiness. Honestly I often think of Simon’s rules when looking at our own roadmap.

Rule 1: I want to run the service on my own machine.

Rule 2: I want to easily migrate the service from my machine to a cloud provider and vice versa with a few clicks of a button.

Rule 3: I want to easily migrate the service from one cloud provider to another with a few clicks of a button.

The Smart Platform, when deployed, will be provided as a supported and open source offering, and will always strive to use the best components. This means you can make use of your own installation of the platform or our service (we think the value in software emerges when done as a service) or both. You can easily access your application code and its data, and when integrated with all the primitives Joyent has underneath you will be able to clone the entire platform stack (from load balancers to smart datastores). As long as you have access to raw iron you will be able to do what’s best for you, and do it in the same exact way.

We’ll be Open, you’ll be Loved and hopefully from that, you’ll be very Happy.

The Dark Star Is Born

With Cisco’s announcement that the company is getting into the server business (news here), it is easily seen that all infrastructure businesses are collapsing into vertically integrated entities. This, in my opinion, is especially true/possible in the cloud. Why not offer all the primitives: compute, networking, storage? Why not offer cloud management software? Why not offer all the web services needed by developers and customers? Everything and all collapses into a dense, dark star.

Cisco just became a non-virtual, fully integrated systems company…just as everyone is moving on to be completely virtual.

The new library

Time was you did systems and networking work, you had The Library. Programming Perl, sendmail, DNS and BIND, TCP/IP Illustrated, UNIX System Administration Handbook. These books documented the core knowledge to operate effectively in our industry. You’d add others as your level of badassery increased, perhaps Mastering Regular Expressions, Internet Routing Architectures, or APUE, depending on your passions and your role.

Times have changed. You still need to have all the old knowledge, but you need quite a bit more to meet the bar in the modern, massive-scale, online services world. Thousands of servers, distributed storage, databases with billions of rows, real-time monitoring of it all. The books below should be in your library and considered core for you and anyone else in the field.

Scalable Internet Architectures by Theo Schlossnagle

Architecture is the most important thing in building and operating a scalable service, and this book is the best book around on the topic. Read it, internalize its message, build better systems. I’ve never met him, but all signs point to Theo being an awesome guy, to boot. His blog is here.

Solaris Performance and Tools by Richard McDougall, Jim Mauro, and Brendan Gregg

You’ve got your badass architecture laid out, code written, now what? You need visibility as far into your system as you can manage so you know the state of the world and where you can make the most impact in server and application performance. Enter DTrace. Yes, this is a book about Solaris, but DTrace is built into OS X and FreeBSD (and Apache, thanks, Theo!) now, as well. DTrace is a quiet revolution in system performance monitoring. Master it.

A couple of the authors blog often enough to matter: Richard McDougall’s is here and Brendan Gregg’s is here.

High Performance MySQL by Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, Jeremy Zawodny, Arjen Lentz, and Derek Balling

Database tuning has always been a bit of voodoo to me, perhaps because most of my exposure to it has been ninjas like the DB architects at Amazon tuning Oracle to within an inch of its life. This book demystifies the space, and thank goodness, because you are going to need it to get the most out of your infrastructure. How can you go wrong with folks like Jeremy Zawodny? You can’t. Here’s his blog.

High Performance Web Sites by Steve Souders

You’ve mastered your architecture, instrumented your servers, and learned to tune the crap out of your databases. Now you can, and should, turn your attention to the performance experienced by your users. That was the whole point, right? Steve Souders created YSlow while Chief Performance Yahoo! at the purple giant, and is now doing similarly useful work at the GOOG. Follow him here.

The Art of Capacity Planning by John Allspaw

Let’s get this out of the way: I love this book! Allspaw knows capacity planning for online services because he lives and breathes it and his passion comes through clearly in his writing. Think “the cloud” world of dynamic, on-demand resources frees you from having to do real capacity planning? Think again. Now you have to do it even faster and the flexibility of the new tools means you will discover and exploit all sorts of new capacity planning and management techniques. Get the latest from John here.

So, there you have it! Get to reading and then get to building.
