We are the number one source for weather on all platforms: TV, web, mobile, and API. I manage a team in our digital group that provides backend services across all our platforms (around 10 billion requests per day). Our team is also responsible for delivering scheduled and severe weather alerts.
As the number one destination for weather, we have a global brand that’s on 24×7. Downtime at any time of day means someone isn’t getting served, which means lost revenue, lost credibility, and missed notifications for people who are expecting us to tell them when the weather gets rough. The ability to easily scale now and in the future is critical for us, as we are experiencing rapid growth on our digital platforms as we increase our market share and expand internationally.
We started running Cassandra in a limited capacity in spring of 2013, primarily because of my past experience and desire to introduce it at TWC (The Weather Channel). That first use case was for tracking application statistics – basically keeping all sorts of metrics about what the system was doing and how well it was performing.
A few months later we introduced a second use case: creating data mashups in our content generation system (CGS). Various applications make requests to CGS to populate weather data templates, and the system is then responsible for gathering that data from various disparate services and producing a sort of materialized view for the consumer. We use Cassandra under the hood to cache those mashups for faster lookup. The system supports nearly every imaginable type of content: observations, forecasts, marine data, pollen, video content, ads, etc.
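To make the caching idea concrete, here is a minimal read-through cache sketch in Python. It is only an illustration, not how CGS is actually built: the `MashupCache` class, the fetcher names, and the sample values are all made up, and a plain dict stands in for the Cassandra table that would hold the cached mashups.

```python
from typing import Callable, Dict, Tuple


class MashupCache:
    """Read-through cache: assemble a mashup once, then serve the stored copy."""

    def __init__(self, fetchers: Dict[str, Callable[[str], dict]]):
        # Hypothetical upstream services, keyed by the content type they return.
        self.fetchers = fetchers
        # Stand-in for a Cassandra table keyed by (template, location).
        self.store: Dict[Tuple[str, str], dict] = {}
        self.misses = 0

    def get(self, template: str, location: str) -> dict:
        key = (template, location)
        if key not in self.store:
            # Cache miss: gather data from every service and store the result.
            self.misses += 1
            self.store[key] = {
                name: fetch(location) for name, fetch in self.fetchers.items()
            }
        return self.store[key]


# Made-up fetchers that return fixed sample data.
cache = MashupCache({
    "observation": lambda loc: {"temp_f": 72},
    "pollen": lambda loc: {"level": "high"},
})

first = cache.get("today", "ATL")   # assembled from the services
second = cache.get("today", "ATL")  # served from the cache
```

The second lookup never touches the upstream services, which is the whole point of caching the mashup. A real version would also need an expiry policy so stale weather data ages out.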
Our newest feature, Social Weather, was also launched using Cassandra.
Our number of transactions varies significantly over time, but we get about 100M transactions per day on average against our busiest Cassandra-backed service, with a heavy day seeing more like 180-200M transactions. This load tends to be bursty rather than constant.
We have grown our node count from 3 at this time last year to 36 today, and this growth has happened incrementally over time. VNodes enable us to successfully scale no matter how large our cluster becomes. They also make the initial bootstrap and repair processes less cumbersome for the cluster. And frankly, they also make it much easier to sell Cassandra to the operations guys!
We are now running 36 nodes in AWS, distributed over US East, US West, and EU West. We use c3.2xlarge instance types with two ephemeral SSDs (one for data and one for commit log). We are on version 2.0.5.
If you had a look in the past, you may have found Cassandra had a steep learning curve and a fair amount of complexity. CQL3, the native drivers, and virtual nodes have changed the game entirely, making Cassandra a much more accessible and friendly platform. But CQL’s strong resemblance to SQL can lure an unsuspecting developer into a relational data model that will perform quite poorly. When modeling data, I try to think of it as a set of covering indexes, where the data is written multiple ways to directly answer whatever queries I’ll need to write. This is absolutely critical if you want to be successful using Cassandra.
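Here is a rough sketch of that "covering index" way of thinking, in plain Python. It is an illustration under assumed names, not our actual schema: each dict stands in for a Cassandra table whose partition key matches one query, and the station codes, dates, and temperatures are invented sample data.

```python
from collections import defaultdict

# Each dict plays the role of one table, keyed the way one query needs it.
obs_by_station = defaultdict(list)  # answers: all observations for a station
obs_by_day = defaultdict(list)      # answers: all observations on a given day


def record_observation(station: str, day: str, temp_f: float) -> None:
    """Write the same observation once per query it has to answer."""
    row = {"station": station, "day": day, "temp_f": temp_f}
    obs_by_station[station].append(row)
    obs_by_day[day].append(row)


def temps_for_station(station: str) -> list:
    # Single-key lookup, no join and no scan over unrelated rows.
    return [row["temp_f"] for row in obs_by_station[station]]


def stations_on_day(day: str) -> list:
    return [row["station"] for row in obs_by_day[day]]


record_observation("KATL", "2014-03-01", 61.0)
record_observation("KATL", "2014-03-02", 58.0)
record_observation("KSFO", "2014-03-01", 55.0)
```

Each read is a lookup by a single key, at the cost of writing every observation twice. That trade of extra writes for cheap reads is the opposite of what a normalized relational model would suggest.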
I have been involved since the 0.5 release, so I’ve been around a while. I used to spend all day in the IRC channel, and I have contributed a little to the Cassandra codebase, attended Cassandra Summit, founded the Atlanta Cassandra Users Group, spoken about and evangelized Cassandra at local functions, answered lots of Stack Overflow questions, written an early C# client library, and implemented composite support for the Cassie library. I was also the very first certified Cassandra developer.