Joyent

Joyent Weblog

Part 3, On Joyent and Accelerators as Cloud Computing "Primitives"

In the last part of this series we ended by talking about 6 “simple” utilities that software uses on “servers”. They were

1) CPU space
2) Memory space
3) Disc space
4) Memory bus IO
5) Disc IO
6) Network IO

Along with their natural minimums (zero) and maximums.

Providing compute units that deliver these utilities

What we’ve always wanted to do at Joyent was provide “scalable network appliances”: online servers that just worked for given functions and were capable of both handling bursts and serving as logical lego-like building blocks for new and legacy architectures. Sometimes these appliances might contain our own software, sometimes not. They would be on a network where it would be difficult for a given piece of software to saturate the immediate parts of it.

For most workloads, a ratio of 1 CPU:4GB:2 spindles works pretty well. The faster the CPU (constrained by power) the better, memory is well … memory, and the size and speed of those spindles can be varied depending on what a node is going to be doing (one end of a workload possibility or the other). In other words, 1) CPU space, 2) Memory space and 3) Disc space aren’t terribly interesting or difficult to schedule and manage in most environments (in some ways, they’re a purchasing decision), and 4) Memory bus IO is set in silicon stone by their creators.

Which ones matter?

We’re left with disc and network IO as the key utilities, and the ability to move things on and off disc and in and out of the network comes from using more CPU. So when an application experiences a surge in activity, it typically results in more use of the CPU, disc IO and network IO. We decided the best approach was to put CPU and disc IO on a fair-share system, and standardize on an overbuilt network (10 Gbps interconnects, multi-gigabit out of the back of each physical node). Our physical servers exclusively use Intel NICs in a PCIe format. This way we can have multiple gigabits per node, physical segregation and failover as options, and we can standardize on a network driver and get away from on-board NICs changing from server to server.

The fair-share system overall is supposed to provide for the short-term bursting (it could be just minutes) needed to handle spikes in demand: spikes that occur too fast for hardware upgrades or even for spinning up new VMs on most systems (and that are often too fast to be noticed in 5-10 minute monitoring pings). This allows you to stop thinking about the servers that you pay for as constrained maximums, and start thinking about a “1 CPU, 4GB of RAM” server as a guaranteed minimum allotment.
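One way to picture "guaranteed minimum plus bursting" is a toy fair-share allocator. This is only a minimal sketch under stated assumptions, not the scheduler Accelerators actually run on: the function name `fair_share`, the share weights and the demand figures are all hypothetical.

```python
def fair_share(capacity, shares, demands):
    """Split `capacity` among tenants in proportion to their shares.

    Hypothetical model: a tenant that wants less than its proportional
    offer leaves the rest on the table, and that spare capacity is
    re-offered to the tenants that are still hungry (the "burst").
    """
    alloc = {name: 0.0 for name in shares}
    active = {name for name in shares if demands[name] > 0}
    remaining = float(capacity)

    while active and remaining > 1e-9:
        total_shares = sum(shares[name] for name in active)
        spare = 0.0
        satisfied = set()
        for name in active:
            offer = remaining * shares[name] / total_shares
            need = demands[name] - alloc[name]
            if need <= offer:
                alloc[name] += need          # fully served
                spare += offer - need        # leftover goes back to the pool
                satisfied.add(name)
            else:
                alloc[name] += offer         # takes everything offered
        remaining = spare
        active -= satisfied
        if not satisfied:
            break                            # everything has been handed out
    return alloc


# Four equal tenants on a 4-CPU node (made-up numbers).
quiet = fair_share(4, {"a": 1, "b": 1, "c": 1, "d": 1},
                   {"a": 3.0, "b": 0.2, "c": 0.2, "d": 0.2})
busy = fair_share(4, {"a": 1, "b": 1, "c": 1, "d": 1},
                  {"a": 3.0, "b": 3.0, "c": 3.0, "d": 3.0})
```

In the `quiet` case tenant `a` bursts well past one CPU because its neighbours are idle; in the `busy` case every tenant still receives its one-CPU share, which is the "guaranteed minimum" reading of the plan size.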

This was at least why we stopped calling them “Grid Containers” and started calling them “Accelerators”.

Moving up the stack

The next installments are going to be talking about some kernel and userland experiences and choices, our desire to use ever more “dense” hardware, 32 and 64 bit environments, and rapid reboot and recovery times.

Google App Engine Misfit Toys: Come to Jill

Huh?

You’ve got to be kidding (from today’s O’Reilly Radar):

Google released App Engine less than a year ago (Radar post). It was the first chance for external developers to use the power of Google’s servers. The powerful platform supported Python and was free (within limits). It now supports 45,000 apps and those apps get over 100 million page views per month. Those pageviews were all free, but they had limits.

Only 100 million page views per month? Across 45,000(?) applications? Isn’t that something like 2222 page views per month per application (or 74 per day)? Is this the auto-scale platform we’ve been waiting for? (Please see update below.)

Google also announced pricing for App Engine. It is essentially in-line with Amazon’s Web Services pricing.

App Engine Running on Joyent

I was interested to understand what Google App Engine applications, all of them, running on Joyent Accelerators might cost. We currently have a customer that pushes 2 billion page views a month. That customer spends $60,000 per month or $0.00003 per page per month with Joyent. (Bandwidth and storage are bundled into the pricing.) This means the entire Google App Engine application portfolio, all 45,000 applications or 100 million page views, could run on Joyent for $3000 per month. Astonishing. Dear Google Operations: if you are spending more than $3000 per month running Google App Engine, please give us a call. We can save you some money. (Please see update below.)

These realities are layered on top of a closed, proprietary platform. Truly misfit.

Yes, Joyent is investing in a platform, based on JavaScript (to begin with), that will compete with Google App Engine. It will be priced aggressively, will be completely open, and will run on the same blazingly fast Joyent Accelerators serving up customers that need real (2 billion page view) performance.

Citizens of App Engine: come to Jill!

Update: the traffic for App Engine has been updated from 100 million pages per month to 100 million pages per day. So, if App Engine costs more than $90K/month to run…the offer remains the same.
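For anyone who wants to check the back-of-the-envelope numbers in this post, here they are as a short calculation. It uses only the figures quoted above; the 30-day month is an assumption, and the variable names are just for illustration.

```python
# Figures quoted in the post.
CUSTOMER_VIEWS_PER_MONTH = 2_000_000_000
CUSTOMER_COST_PER_MONTH = 60_000          # dollars
APP_ENGINE_APPS = 45_000
DAYS_PER_MONTH = 30                       # assumption: a 30-day month

cost_per_view = CUSTOMER_COST_PER_MONTH / CUSTOMER_VIEWS_PER_MONTH

# Original claim: 100 million page views per month across all apps.
views_per_app_month = 100_000_000 / APP_ENGINE_APPS
views_per_app_day = views_per_app_month / DAYS_PER_MONTH
original_estimate = 100_000_000 * cost_per_view

# Updated claim: 100 million page views per day.
updated_estimate = 100_000_000 * DAYS_PER_MONTH * cost_per_view

print(f"cost per page view: ${cost_per_view:.5f}")
print(f"views per app: {views_per_app_month:.0f}/month, {views_per_app_day:.0f}/day")
print(f"original estimate: ${original_estimate:,.0f}/month")
print(f"updated estimate:  ${updated_estimate:,.0f}/month")
```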

Joyent Two for One Christmas Sale!

’Tis the season and Joyent is happy to announce the always popular and often demanded TWO FOR ONE Joyent Accelerator sale!

This sale runs for a limited time, from now through the end of December.

How it works is that you get 24 months of Joyful Accelerator goodness for the price of 10 months. If you do the math, that’s quite a deal.

Let’s look at a 1 GiB example:

Regular Monthly Pricing
1 GiB Monthly = $125 / month or $1500 / yr

Regular Annual Pricing
1 GiB Annual = $1250 / yr (equivalent to $104.17 / month)

December Sale Pricing
1 GiB Sale = equivalent to $52.08 / GiB per Month when you pay $1250 for 24 months!

  • All pricing includes 10 TB of data transfer per month per customer.
  • All Accelerator plans have a one-time setup fee equal to the regular one-month charge for that size Accelerator. For example, the setup fee on a 1 GiB Accelerator is $125.

To order, just use our secure order form:

https://secure.joyent.com/

All Our Best During the Holiday Seasons,

Team Joyent

Lessons Learned

This video from a Frontline documentary speaks volumes about what the state-of-the-art is for computers providing autoscale in the cloud. It still comes down to piloting (architectures).

A Loving Cloud

Yesterday several members of Joyent’s team attended Structure ’08. Jason Hoffman was on a panel that produced some interesting debate about whether clouds should aim to be open. The story was even picked up by the Wall Street Journal’s Don Clark in an article entitled Finding A Friendly Cloud

Jason Hoffman, founder and chief technology officer of a cloud-computing specialist called Joyent, was particularly pointed in warning that Google’s App Engine could represent a lock-in to developers. It is possible to build “a loving cloud,” he argued, that would make it easier to create applications that could be easily moved among different services. Other panelists kept calling Google’s App Engine “proprietary,” which to many techies is equivalent to labeling it both evil and outdated at the same time.

Here’s video of the exchange Jason had with Google’s Christophe Bisciglia

1 Billion Page Views a Month

Here’s a video detailing how LinkedIn built an application (Bumpersticker on the Facebook platform) using Rails (and C Ruby!) that serves up more than 1 billion page views a month.

In my opinion, this ends the debate about whether Rails scales. Rails is a component; the magic lies in how the components are architected and delivered. LinkedIn did amazing work taking advantage of Joyent’s technology stack, including innovative ways of leveraging Joyent Accelerators and our BigIP load balancers.

Congratulations to the LinkedIn team. Great accomplishment! You can read more about how LinkedIn scaled Bumpersticker on Joyent in a post on their blog entitled Web Scalability Practices: Bumper Sticker on Rails

View the video.

Update: ZDNet writes about the story.

Amazon Web Services or Joyent Accelerators: Reprise

In the Fall of 2006, I wrote a piece On Grids, the Ambitions of Amazon and Joyent, and followed up with Why EC2 isn’t yet a platform for ‘normal’ web applications and the recognition that When you’re really pushing traffic, Amazon S3 is more expensive than a CDN.

The point of these previous articles was to put what wasn’t yet called “cloud computing” into some perspective and to contrast what Amazon was doing with what we were doing. I ventured that EC2 is fine when you’re doing batch, parallel things on data that’s sitting in S3, and that S3 is economically fine as long as you’re not externally interacting with that data to a significant degree (then the request pricing kicks in). Basically, it is incorrect to say that each is universally applicable to all problems and goals in computing, or that they’re always cost-effective. An example of a good use case is a spidering application: one launches a number of EC2 instances, crawls a bunch of sites, puts that information into S3, and then launches a number of EC2 instances to build an index of that data and further store it on S3.

Beyond point-by-point features and cost differences, I believe there are inherent philosophical, technical and directional differences between Joyent and Amazon Web Services. This is and has been our core business, and it’s a business model, in my opinion, that competes directly with hardware vendors and with customers taking direct possession of hardware and racking-and-stacking it in their own datacenters.

Cloud computing is meant to be inherently “better” than what most people can do themselves.

What’s changed with S3 and EC2 since these articles?

For S3? Nothing really. There are some additional data “silo” services now. SimpleDB is out and there have been some updates to SQS, but I would say that S3 is by far the most popular of the three. The reason is simple: it’s still possible for people to do silly things when storing files on a filesystem (like putting a million directories in one directory), but it’s more difficult to do things as silly with a relational database (you still can, but they’re ultimately handled within the RDBMS itself, for example, bad queries).

I’m consistently amazed by how many times I have to go over the idea of hashed directory storage.
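Since it keeps coming up, here is a minimal sketch of what hashed directory storage can look like. The function name `hashed_path`, the choice of SHA-1, and the two-level, two-character layout are assumptions for illustration, not a description of any particular system.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(root, filename, levels=2, width=2):
    """Return a storage path that spreads files across nested directories.

    The leading hex characters of the filename's hash pick the
    subdirectories, so no single directory ends up holding every entry.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, filename)


# Hypothetical file name; the same name always maps to the same path.
path = hashed_path("/data/uploads", "avatar-12345.png")
print(path)
```

With two levels of two hex characters each there are 256 × 256 = 65,536 leaf directories, so a million files average out to roughly fifteen per directory instead of a million in one.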

For EC2 there have been some improvements.

Annotating the list from “Why EC2 isn’t yet a platform for ‘normal’ web applications” we get:

1. No IP address persistence. EC2 now NATs and EC2 instances are on a private network. That helps. Are you able to get permanently assigned, VLAN’ed network address space? It’s not clear to me.

2. No block storage persistence. There is now an option to mount persistent storage in a “normal” way. Presumably it’s block storage over iSCSI (there aren’t many options for doing this); hopefully it’s not a formalized FUSE to S3. We’ll see how this holds up performance-wise. There’s now a bit more predictability in data stored in EC2, but experience has shown me that it only takes one really busy database to tap out storage that’s supposed to be serving 10-100 customers. Scaling I/O is still non-trivial.

3. No opportunity for hardware-based load balancing. This is still the case.

4. No vertical scaling (you get a 1.7 GHz CPU and 1 GB of RAM, that’s it). There are now larger instances but the numbers are still odd. 7.5GB of RAM? I like powers of 2 and 10 (so does computer science).

5 & 6. Creation and handling of AMIs. Experience like this is still quite common, it seems.

Structure of modern applications

The three tiers of “web”, “application” and “database” are long dead.

Applications that have to serve data out (versus just pulling in like the spidering example earlier) are now typically structured like: Load Balancers/Application Switches (I prefer the second term) <-> Cache <-> Application <-> Cache <-> Data. Web and gaming applications are exhibiting similar structures. The caching tiers are optional and each can exist either as a piece of middleware or as part of one of the sandwiching tiers. For example, you might cache as part of the application, or in memcached, or you might just be using the query cache in the database itself. And while there are tiers, there are also silos that exist under their own namespaces. You don’t store static files in a relational database: your static assets are CDN’ed and served from e.g. assets[1-4].yourdomain.com, the dynamic sites from yourdomain.com and logged-in users at login.yourdomain.com. Those are different silos.
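To make the "Application <-> Cache <-> Data" leg concrete, here is a minimal cache-aside sketch. Everything in it is an assumption for illustration: the class name `CacheAside` is made up, a plain dict stands in for something like memcached, and `load_profile` stands in for a real database query.

```python
class CacheAside:
    """Look in the cache first; only go to the data tier on a miss."""

    def __init__(self, load_from_store):
        self._load = load_from_store   # callable that hits the data tier
        self._cache = {}               # stand-in for memcached
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load(key)        # the expensive trip to the back end
        self._cache[key] = value
        return value


store_calls = []


def load_profile(user_id):
    """Hypothetical data-tier lookup; records each call it receives."""
    store_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


profiles = CacheAside(load_profile)
first = profiles.get(42)    # miss: goes to the data tier
second = profiles.get(42)   # hit: served from the cache
```

The same shape applies whether the cache lives inside the application process, in a separate middleware tier, or in the database's own query cache; only where `_cache` lives changes.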

How to scale each part and why do people have problems in the first place?

Each tier either has state or not. Web applications run over HTTP, an inherently stateless protocol. So as long as one doesn’t introduce state into the application, the application layer is stateless and “easy” to horizontally scale. However, since one is limited in the number of IP addresses one can use to get to the application, and network latency will have an impact at a point, the “front” has state. Finally, the back-end data stores have state, by definition. We end up with: stateful front (Network) <-> stateless middle <-> stateful back. So our options for scaling would be: Load Balancers/Application Switches/Networking (Vertical) <-> Cache (Horizontal or Vertical) <-> Application (Horizontal) <-> Cache (Horizontal or Vertical) <-> Data (Vertical).

The limit to horizontal scale is the network and its latency. For example, you can horizontally scale out multi-master MySQL nodes (with a small and consistent dataset), but you’ll reach a point (somewhere in the 10-20 node range on a gigabit network) where latency now significantly impacts replication time around that ring.
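A toy model shows why the ring gets slower as it grows. This is a sketch under loud assumptions: it treats replication as strictly hop-by-hop around the ring, and the per-hop delay is a made-up number, so it illustrates the linear growth rather than reproducing the 10-20 node figure.

```python
def ring_propagation_ms(nodes, per_hop_ms):
    """Time for one write to reach every other node in a replication ring.

    Assumes each node forwards the change to the next one in turn, so the
    write has to cross (nodes - 1) hops before the last node has it.
    """
    if nodes < 2:
        return 0
    return (nodes - 1) * per_hop_ms


# Hypothetical 5 ms per hop (network round trip plus applying the change).
for size in (2, 5, 10, 20):
    print(size, "nodes:", ring_propagation_ms(size, 5), "ms")
```

Every node added to the ring adds another hop to the worst case, which is why the application tier scales out happily while a replicated data tier eventually has to scale up instead.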

Developing and scaling a “web” application means that you (or someone) have to deal with networking and data management (and different types of data for that matter) if you want to be cost-effective and scalable.

The approach one takes through this stack matters: platform directions

With the view above you can see the different approaches one can take to provide a platform. Amazon started with data stores, made them accessible via APIs, offered an accessible batch compute service on top of those data stores, introduced some predictability into the compute service (by offering some normal persistence), and has yet to deal with load-balancing and traffic-direction as a service. Basically they started with the back and should be working their way to the front.

At Joyent, we had different customers: customers making the choice between staying with their own hardware and running on Joyent Accelerators. We started with the front (great networking, application switching) and persistence, we let people keep their normal backends (and made them fast), and we are working on better (horizontal) solutions for data stores. Solving data storage needs wasn’t as pressing because many customers were already wedded to a solution like MySQL or Oracle. An example of solving problems at the outermost edge of the network would be the article, The wonders of fbref and irules serving pages from Facebook’s cache. This is an example of programming in application switches to offload 5 pages responsible for 80% of an application’s traffic.

Joyent’s product progression is the opposite of AWS’s. We solved load-based scale with a platform that starts with great networking, well-performing Accelerators, and Accelerators that are more focused on particular tasks (e.g. a MySQL cluster). We are working on data distribution for geographic scale, and on making it all easier to use and more transparent (solving the final “scale”, administrative scale).

The technology stack of choice does matter: platform technology choices

Joyent Accelerators are uniquely built on the three pillars of Solaris: ZFS, DTrace and Zones. This trio is currently only present in OpenSolaris. What you put on metal is your core “operating system”. Period. Even if you call it a hypervisor, it’s basically an OS that’s running other operating systems. We put a solid kernel on our hardware.

Accelerators are meant to be inherently more performant than a Xen-based EC2 instance per unit of hardware, and to do so within normal ratios: 1 CPU/4GB RAM, with utilities available in 1, 2, 4, 8, 16, 32 and 64 GB-sized chunks. The uniqueness of DTrace adds unparalleled observability: it makes it possible for us to figure out exactly what’s going on in kernel and userland and act upon it for customers in production.

ZFS lets us wrap each accelerator in a portable dataset, and as we’ve stated many times before, it makes any “server” a “storage appliance”.

Add to this Joyent’s use of F5 BigIP load-balancers, Force10 networking fabric, and dual-processor, quad-core, 32GB RAM servers.

Open and portable: platform philosophy

At Joyent, I don’t see us having an interest in running large, monolithic “services” for production applications and services. Things need to remain modular, and breakage in a given part needs to have zero to minimal impact on customers. Production applications shouldn’t use a service like S3 to serve files; they should have access to software with the same functionality and be able to run it on their own set of Accelerators.

We want software that powers services to be open, available, and enable you to run it yourself here on Accelerators, or actually anywhere you want. We develop applications ourselves exactly like you do, we tend to open source them and this is exactly what we would want from a “vendor”. This route also minimizes request (“tick”) pricing. We don’t want to entirely replace people’s choices in databases; instead Accelerators have to be made to be a powerful, functional base unit for them. Want to run MySQL, PostgreSQL, Oracle, J-EAI/ejabberd, … then by all means do that. No vendor lock-in.

For both platforms, we have our work cut out for us.

Joyent Mentioned in Forbes.com

Andy Greenberg just wrote a piece called Tiny Firms Offer Big Computing Services in which Joyent gets a little shout out.

The word ‘tiny’ in the title makes me giggle a bit as I guess five (5) billion page views a month on our infrastructure is considered tiny to some folks. Has me wondering how many Web properties actually get more views than that? Can’t be more than 30, right?

From that perspective, we should not be considered tiny. Nimble, yes. Tiny, no.

Fermions, Bosons and the 6 Utilities

When I used to teach university chemistry, I’d always start with the statement:

The universe (at one level) is made of two things and two things only: fermions and bosons.

Fermions are the things that have “stuff”: they have mass and can be charged (or not). Bosons are the things that have no “stuff”: they do not have mass nor do they have charge. Bosons in many ways are the things that move fermions. This comes from Quantum Mechanics, where we see that Fermions have half-integer spins (such as +1/2 or -1/2), and Bosons have integer spins (such as 1). This is the baseline, and the binary division is given to us by the Standard Model.

We also already had an understanding of this division: Fermions are matter, and Bosons are energy. Matter is the stuff of the universe, and energy moves matter.

A simple, appealing, mutually exclusive, yin-and-yang description of things. I don’t mind things that end up being in powers of 2 or 10, or form a nice little tree.

I like to think we have a similar division in compute utilities: things that take up space (Fermions/Matter) and things that move or are the movement of stuff (Bosons/Energy).

Conceptually I group them as

The fermions

1) CPU space
2) Memory space
3) Disc space

The bosons

4) Memory bus IO
5) Disc IO
6) Network IO

These, in my mind, form the 6 Utilities that we must have fine-grained, differential controls and metrics on in a “cloud computer” that fairly serves many people. We have to understand the possible minimum and maximum values, and we have to figure out how to balance them all with real workloads. These are the prerequisites that we watch, measure and learn from so we can ask and answer questions such as “How do I pair together one customer that’s CPU-intensive and another that’s disc IO-intensive and have the sum appear just like a single, well-performing CPU- and disc IO-intensive application?”.

The reality is that most operating systems still don’t have a complete set of tools around the 6 Utilities in terms of resource management, QoS (quality of service), virtualization and teasing these apart in a way that serves a number of people sharing physical resources. Operating systems still are basically for a single person using a single “computer” at a time, and there are real challenges around saying that we should just use BIG servers and divvy them up. There are even challenges around many cores and lots of RAM.
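The pairing question can be sketched as a small calculation over the 6 Utilities. This is a toy model, not how placement is actually done: the usage numbers are invented fractions of one node's capacity, and the helper names (`profile`, `combined`, `fits`, `best_partner`) are hypothetical.

```python
UTILITIES = ("cpu", "memory", "disc",                  # the "fermions"
             "memory_bus_io", "disc_io", "network_io")  # the "bosons"


def profile(**usage):
    """Build a usage profile; any utility not named defaults to 10% of a node."""
    return {u: usage.get(u, 0.1) for u in UTILITIES}


def combined(a, b):
    """Per-utility usage if two customers share one node."""
    return {u: a[u] + b[u] for u in UTILITIES}


def fits(a, b, limit=1.0):
    """True if no utility is pushed past the node's capacity."""
    return all(value <= limit for value in combined(a, b).values())


def best_partner(customer, candidates):
    """Pick the candidate that leaves the lowest peak utilization."""
    peaks = {name: max(combined(customer, c).values())
             for name, c in candidates.items() if fits(customer, c)}
    return min(peaks, key=peaks.get) if peaks else None


cpu_heavy = profile(cpu=0.7)
candidates = {
    "disc_io_heavy": profile(disc_io=0.7),
    "also_cpu_heavy": profile(cpu=0.6),
}
choice = best_partner(cpu_heavy, candidates)
```

Here the CPU-heavy customer is matched with the disc IO-heavy one, because their peaks land on different utilities; two CPU-heavy customers together would overrun the node.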

I wonder if we can in fact have a single purpose operating system that serves both the single user and the “cloud”, and based on the work we’ve been doing with OpenSolaris, I’d say “No”.

NetApp versus Sun, Sun versus NetApp, and Both versus Common Sense

As you might have heard and likely read in the back-and-forth blogging of Dave Hitz (a NetApp founder) and Jonathan Schwartz (CEO of Sun Microsystems), the two are at each other’s throats. Well not really at each other’s throats: NetApp went nuclear and Sun hit back even harder.

Basically NetApp says that Sun’s ZFS steals its existence from NetApp’s WAFL, an earlier Copy-On-Write (COW) file system. Sun in return wants to ensure that NetApp cannot sell its filers and to pull NetApp’s ability to legally use NFS.

For those of you that need reminding, Sun invented NFS (the wikipedia page has a sufficient history) but NetApp did the best commercial appliance implementation. I forget if NetApp handed Sun their asses in the marketplace with a superior product, or if Sun just never delivered a filer-like appliance (I say “I forget” because while I can recall using NetApp, I can’t remember using a single offering from Sun beyond just a workstation).

NetApp filers do happen to be one of the best NFS appliances out there, and they happen to combine that with iSCSI and Fibre Channel in one rig. Pretty flexible, and if you look around you’ll see that it’s pretty unique. For example, an EMC Clariion does iSCSI or Fibre Channel, and if you want it to do NFS, then you basically buy a “gateway” (just a server attached by fiber).

We at Joyent are in a bit of an odd and unique position (I think). We happen to use what I call The Holy Z2D Trinity (ZFS, Zones and DTrace), and not just that: we use ZFS on top of block storage from NetApp filers. And the team here, from myself to Mark to Ben, has written tools for filers, has managed many petabytes’ worth of them (often petabytes all at one time), and has been around them since they came out.

Hmm.

In fact, besides my common boast that we’re one of the larger or the largest OpenSolaris/Solaris Nevada installations in the world, I’d venture to guess that we have more ZFS on top of NetApp filers via iSCSI than just about anyone else.

Now let’s think how we even got here in the first place.

  • NetApp developed a nice little filesystem they call WAFL.
  • NetApp contributes to FreeBSD and FreeBSD is the core OS in some of their products (yes please don’t try and “educate” me on a rig I’ve used for a decade, I said “some”, I know they have their own OS).
  • WAFL or something like it has presumably been ported to FreeBSD.
  • When pulling up block storage LUNs from our filers, we still need to put a filesystem on them. (Got that? Not everything is NFS. There still needs to be a filesystem on the server.)
  • Nothing like WAFL or a nice consistent COW filesystem was ever contributed back to FreeBSD.
  • FreeBSD instead just has UFS2, and while it has soft updates, you’ll still need to run an fsck in single user mode for hours once you’re in the 100s of GBs, and ironically the busier the system is, the worse that is (again, yes I know there’s a background fsck, try running that on a busy mail server, it’ll crash, guaranteed).

So larger drives (or LUNs) plus a shitty, non-resilient file system meant that all of FreeBSD had to go. Despite everything great about it and a lot of time invested by the team over a decade in using it. We had to leave FreeBSD and go to wherever we could to get a good, easy to use, resilient, modern file system.

That file system happened to be ZFS and the operating system was OpenSolaris. The additions of DTrace and Zones (we used Jails before) formed our three pillars.

But stop and imagine for a minute that NetApp had been kind enough to give WAFL or a similar filesystem back to the FreeBSD project. Imagine that we had something like WAFL to put on top of our block storage LUNs that were coming up from the NetApp filers.

Got that?

We didn’t want or need WAFL on a real operating system to develop a competitive storage product, we needed WAFL-like on a real operating system in order to simply use our NetApps.

Then we wouldn’t have ever made our first step to OpenSolaris. We’d currently have a business based on FreeBSD 7 with WAFL, Jails and DTrace (hooray for that port!), and believe me leaving FreeBSD was painful in many ways, with the salve being ZFS and DTrace.

While I think there is a good degree of posturing going on between the two companies, and it’s fascinating to see it going on in blogs, both parties are full of it and don’t quite get it.

NetApp Dave:

You should have given WAFL or a WAFL-lite file system to FreeBSD and then all of us could happily use it on top of iSCSI or fibre-channel block storage. You would have made it the best operating system to put on top of any networked block storage, being smart guys, you could have figured out how to do it while making NetApp even more money, and without spawning a bunch of FreeBSD storage appliance clones. The fact that you didn’t do this is why Joyent uses Solaris, and you’re responsible for the new market need that’s out there for ZFS and via ZFS, Solaris itself.

Stop being so shortsighted and give a little, think FreeBSD 7.5 + WAFL.

We’re the poster child for OpenSolaris adoption, and the fact that we’re using it … you could have prevented it.

Sun Jonathan:

It’s great that ZFS was developed, magnificent that it was open-sourced, and I know that it was a valid and true creation of Jeff Bonwick (who we count as a good friend). You make hardware, you created NFS, you’ve always touted the importance of the network in computing, yet I’ve always had to use NetApps to get fast and reliable NFS and iSCSI block storage (even to put my ZFS filesystems on).

Ship a decent piece of hardware + software capable of fast and reliable NFS + iSCSI. NetApp only exists because of Sun’s failures in actual product development.

“What Ifs”

If NetApp wins, there goes our beloved ZFS (yes I understand indemnity, but I care more about continued development) and NetApp, you don’t have an alternative for us. Thanks for nothin’.

If Sun wins, there go our NetApps, and Sun, we can’t get an alternative from you. Thanks for nothin’.

And finally

I think we’ve been a good customer of both of you. When you fight, it hurts us more than anyone else.
