Mina Naguib: Director of Infrastructure at AdGear Technologies
AdGear is an independent advertising platform for publishers and advertisers. We empower digital media innovators by offering a wide range of ad technologies spanning display advertising, video, mobile and real-time bidding.
As Director of Infrastructure Engineering, I co-ordinate the teams of software developers, systems admins and developer operations to get everything to work well together. I do a little bit of development myself, but mostly it’s helping everyone get their pieces to work and grow well within our ecosystem.
Targeted ads with Cassandra
Our primary use case for Apache Cassandra is what we call our user data store. For online advertising, we collect some information about our users. In our sphere, a “user” isn’t an identifiable person, but simply a web browser or device. Cassandra is the data store where we collect several pieces of data about these browsers, both to help us deliver better, better-targeted ads and to enable some of the functionality that AdGear relies on.
Path to Cassandra
For the most part, it was a volume game. At one point, when AdGear was small and we were not interacting with that many web browsers, we did not have a server-side data store at all, so we relied completely on cookies; that only works up to a certain point. When we looked at our options for hosting this data in a data store, instead of distributing it across the web browsers themselves, there were not that many options at the time. Cassandra, at that point, was at version 0.6.
I’d say between three and four years ago is when we seriously looked into Cassandra and played with it. It was still a little bit hairy at that point in time, but it was still a great product. Honestly, there weren’t that many candidates out there that compared.
Availability, latency with Cassandra
I hate to gush, but all of the features that Cassandra offers are features we needed. We manage four distinct data centers, and we needed different replication properties per data center. We are also extremely latency sensitive, and the different calls coming into the AdGear platform have different characteristics. In some cases, especially first-party ad serving, AdGear simply cannot be down.
We invest heavily in making sure that we are highly available and that we’re responding fast on different layers. We needed a database that could keep up and, when it comes to the actual database operations, not slow us down or force us to make many compromises. Cassandra fit the bill perfectly.
We’re in Montreal, Quebec, Canada. We have a data center here in Montreal and a secondary backup data center in the city as well. We also operate data centers in Los Angeles, New York, and London.
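To make the per-data-center replication idea concrete, here is a minimal sketch of how such a setup is typically declared in CQL with Cassandra's NetworkTopologyStrategy. The keyspace name, the data center names and the replica counts are illustrative assumptions, not AdGear's actual configuration, and the helper function is hypothetical; it only builds the statement text.

```python
from typing import Mapping


def create_keyspace_cql(keyspace: str, replicas_per_dc: Mapping[str, int]) -> str:
    """Build a CREATE KEYSPACE statement with one replica count per data center.

    The keyspace and data center names passed in are assumptions chosen for
    illustration; real names must match what the cluster's snitch reports.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")

    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc_name, replica_count in replicas_per_dc.items():
        if replica_count < 0:
            raise ValueError("replica count cannot be negative: " + dc_name)
        options.append("'" + dc_name + "': " + str(replica_count))

    return (
        "CREATE KEYSPACE " + keyspace
        + " WITH replication = {" + ", ".join(options) + "};"
    )


if __name__ == "__main__":
    # Hypothetical layout: more replicas close to home, fewer in remote sites.
    print(create_keyspace_cql("user_data", {"montreal": 3, "london": 2}))
```

The point of the design is that each data center gets its own replica count, so a busy or critical site can hold more copies than a remote one.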
We run on some medium- and high-end Dell servers. Initially, everything was on spinning disks. When we took a closer look at our Cassandra histograms and actually engaged DataStax, the biggest recommendation was to switch to SSDs. We made that switch around a year and a half ago. Since then, our read latencies have been steady and low.
The best advice I can give is simply to read the documentation, as cliché as that is. It details fairly well what to expect from this database and what the tunables are. Most importantly, don’t treat it as a relational database; it doesn’t work that way. You have to adapt your access patterns to how the database is designed. Once you do that, it will serve you very well.
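As a rough illustration of adapting access patterns to the database, the sketch below models a user data store in plain Python where everything known about one browser lives under a single key, so serving an ad needs one lookup rather than a join. This is not Cassandra's API; the class, method and attribute names are all hypothetical, and the in-memory dictionary only stands in for a partition keyed by browser ID.

```python
from collections import defaultdict
from typing import Dict


class UserDataStore:
    """In-memory stand-in for a store partitioned by browser ID (hypothetical)."""

    def __init__(self) -> None:
        # One "partition" per browser ID, holding attribute -> value pairs.
        self._partitions: Dict[str, Dict[str, str]] = defaultdict(dict)

    def record(self, browser_id: str, attribute: str, value: str) -> None:
        """Write one attribute into the browser's partition."""
        self._partitions[browser_id][attribute] = value

    def profile(self, browser_id: str) -> Dict[str, str]:
        """Read everything about one browser with a single key lookup."""
        return dict(self._partitions.get(browser_id, {}))


if __name__ == "__main__":
    store = UserDataStore()
    store.record("browser-123", "segment", "sports")
    store.record("browser-123", "last_seen", "2014-01-01")
    print(store.profile("browser-123"))
```

The data is laid out for the one query that matters (fetch a browser's profile by ID), which is the opposite of normalizing first and deciding on queries later.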
My interaction with the community has mostly been online, whether it’s the mailing list, contacting DataStax professionally, or joining a chat on the IRC channel.
Join the Montreal Cassandra Users group.