Joe Miller: Lead Systems Engineer at Pantheon
Matt Pfeil: Co-Founder at DataStax
TL;DR: Pantheon is a platform for Drupal web sites that lets Drupal developers and Drupal development shops easily and cost-effectively build web sites on the Internet based around Drupal.
Pantheon chose Cassandra because they needed a database technology that could be hosted in the cloud and that could scale. Their first use of Cassandra is their core API and core database of customer data, sites, things that they need to provision, and web sites on the platform; the Cassandra cluster for this use case contains 3 nodes. Their other Cassandra cluster, which backs the Valhalla file system, contains 6 nodes.
Hello, Planet Cassandra, this is Matt Pfeil. Today I’m here with Joe Miller, Lead Systems Engineer at Pantheon. Joe, thanks for taking some time with us this morning; why don’t you tell everyone what you do and what Pantheon does?
Sure. What I do at Pantheon is primarily operations engineering. Pantheon is a platform for Drupal web sites. We have a platform that allows Drupal developers and Drupal development shops to easily and cost-effectively build web sites on the Internet based around Drupal; we also launch and scale those web sites. We have some great features, and we help them with automating those practices and spinning up dev, test, and live environments with just a few clicks of a button.
For the few people out there who aren’t aware of what Drupal is, why don’t you give them a quick overview of that, as well?
Sure. Drupal is a content management system built on PHP. It’s also an open source project. It’s used primarily on content-heavy web sites. Some popular ones are The New Republic, whitehouse.gov, and Economist.com.
That’s awesome. I wasn’t aware that whitehouse.gov is running atop Drupal.
Yeah, Drupal has seen a lot of deployment in the government sector and with higher education institutions.
That’s awesome. At Pantheon, what was your motivation for looking at Cassandra?
For Cassandra, we were looking for something that would work well in the cloud. We are primarily hosted in a cloud environment, so we needed something that could scale well with us as we grew with the business. Like most start-ups, we started small and had a feeling we were going to keep growing; Cassandra was definitely right up there at the top of our list. Adapting and building our core API and our file systems around Cassandra was actually pretty easy; primarily, our attraction to it was the scale and ease of use.
Very cool. So when you say scaling, is that primarily driven by read/write requests or by sheer volume of data?
We have two main use cases with Cassandra. The first is our core API and core database of customer data, sites, things that we need to provision, and web sites on the platform. The other is for our Valhalla file system. We actually use Cassandra to scale out the metadata aspect of this clustered, scalable file system in the cloud.
So you’re using Cassandra for all of the metadata around the file itself?
Right. We’ve actually used it to cache file content as well but will be moving away from that and storing only metadata. The core API’s dataset is much smaller in terms of raw data, but very high in terms of read and write operations; it’s very transactional.
That’s funny, talk about full circle. Jonathan Ellis and I, before we started DataStax, met at Rackspace, and at the time he was actually working on Cassandra. One of the first use cases was a metadata store for Rackspace’s block storage in the cloud. That’s actually how we met: over a conference call on that project. It’s good to see that the idea has lived on elsewhere.
Interesting, yeah. It’s pretty cool and a very core part of our platform that no other PaaS (platform as a service) has right now.
Can you tell us a little information about what your deployment looks like?
Sure. We’re on the smaller side, in comparison to some of the massive data pipeline-type use cases that you typically see with Cassandra; we have two primary clusters at the moment, hosted in the Rackspace cloud.
We are evaluating physical hardware but have no current plans to move Cassandra. Our API system that I mentioned earlier is a three-node cluster, and our file system cluster, the Valhalla system, is six nodes. What we do with these nodes is spin up the largest nodes we can in the cloud in order to get the most disk IO. The API cluster is especially CPU intensive, but deploying on the largest nodes is the best way we have found to avoid bad or noisy neighbor effects for both CPU and IO.
Excellent. Joe, I think that’s all the questions I have for you today. Thank you very much for your time. For everyone out there, check out GetPantheon.com if you have any needs.