Parse started out as a platform providing tools that let mobile developers focus on building their apps rather than worrying about databases and servers. Since that initial goal, they’ve expanded to multiple platforms, and they now support some web products as well. Parse offers a data store, an API for sending push notifications to the various platforms, wrappers for common social APIs, web hosting, file hosting, custom server code and hooks, and now an analytics product. Christine is a Software Engineer focused on analytics.
We used MongoDB, which works really well for developers who want to store their data in flexible schemas without worrying about migrations, but it is not as good a fit for the load we anticipate a solid analytics product will need: namely, high write capacity and high availability, so that it can capture as much data as we throw at it.
We researched a number of different databases and Cassandra continued to surface as an option that the community was very happy with and it seemed particularly suited for our use cases.
Our architecture actually has several other components, not just the MongoDB and Cassandra components. We also use Redis to help store some of our push notification information and generic queues. From working with both Mongo and Redis, we are very familiar with the trials and tribulations of keeping clusters up. We knew that we wanted something that would be able to handle nodes arbitrarily going down and the ability to grow the cluster flexibly.
Specifically, when requests come in, a fleet of app servers handles most API requests. Depending on which part of the system a request hits (accessing the app’s data, writing to analytics, or queuing something to be handled in the background), it gets routed to the appropriate data store, and the response comes back.
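The routing described above can be sketched as a simple dispatch table. This is a hypothetical illustration of the idea, not Parse’s actual internals; the backend names simply reflect the stores mentioned in this interview.

```python
def route(request_type):
    """Map an incoming API request type to a backing data store.
    A toy sketch of the routing layer described above."""
    backends = {
        "data": "mongodb",         # app object reads/writes
        "analytics": "cassandra",  # high-write analytics counters
        "queue": "redis",          # push-notification and job queues
    }
    try:
        return backends[request_type]
    except KeyError:
        raise ValueError(f"unknown request type: {request_type}")

# An analytics write would be sent to the Cassandra cluster:
# route("analytics") -> "cassandra"
```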
From a developer perspective working with our platform, you don’t need to worry at all about the data model or the schema you’re working with. You can just get coding, and Parse handles, under the covers, where the data gets stored. The only thing you need to worry about is that once you define a column as a particular data type, it stays that type: if you start storing integers under a key, you won’t be able to suddenly start storing strings. But otherwise, when you first start playing with a Parse object, it’s intended to just feel like a dictionary. You can just put objects in and they get saved to our backend.
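That dictionary-like feel, with columns locking to the type of the first value stored, could be modeled roughly like this. The class and method names here are invented for illustration and are not the real Parse SDK.

```python
class ParseLikeObject:
    """Toy sketch of a schemaless, dictionary-like object whose
    columns lock to the type of their first stored value.
    Hypothetical names, not the actual Parse SDK."""

    _column_types = {}  # shared per-class schema: column -> type

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        locked = self._column_types.get(key)
        if locked is None:
            # First write to a column defines its type.
            self._column_types[key] = type(value)
        elif not isinstance(value, locked):
            raise TypeError(f"column '{key}' is locked to {locked.__name__}")
        self._data[key] = value

    def get(self, key):
        return self._data[key]

score = ParseLikeObject()
score.put("points", 1337)   # locks "points" to int
score.put("points", 2048)   # fine: still an int
# score.put("points", "hi")  # would raise TypeError
```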
Again, to make it even easier for developers, we provide save-in-background functionality. We also provide a “save eventually” option which allows the request to be serialized in case the device doesn’t have connectivity at the time, and sent to the server when the device is able to connect to the internet again.
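The “save eventually” behavior amounts to queuing serialized saves while the device is offline and flushing them on reconnect. Here is a minimal sketch of that pattern, with invented names; it is not the real SDK implementation.

```python
from collections import deque

class EventualSaver:
    """Toy sketch of "save eventually": while offline, saves are
    serialized into a local queue and flushed once connectivity
    returns. Illustrative only, not the Parse SDK."""

    def __init__(self):
        self.online = True
        self.pending = deque()
        self.server = []  # stands in for the Parse backend

    def save_eventually(self, obj):
        if self.online:
            self.server.append(obj)
        else:
            self.pending.append(obj)  # hold locally until reconnect

    def reconnect(self):
        self.online = True
        while self.pending:
            self.server.append(self.pending.popleft())

saver = EventualSaver()
saver.online = False
saver.save_eventually({"event": "level_complete"})
saver.reconnect()  # the queued save now reaches the server
```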
In terms of what our environment looks like, we’re using ephemeral storage on AWS, and are across three availability zones.
We see a lot of game companies that use Parse for Facebook games. We also see a lot of photo-sharing apps, which may just be a reflection of the app economy as a whole.
I think my favorite example is that Sesame Street recently built a Cookie Monster app and an Elmo app. They’re a great example of a company where, you can imagine, they don’t want to spend time and engineers worrying about servers and file storage and billing. They want to devote their software engineers to building the best mobile experience they can for their users, and we can handle all the backend for them to have it be a seamless experience.
Because we’re only storing basic data for analytics (everything is counts and counters right now), our data is actually much denser than that of someone storing more complex information.
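Counter-style storage like this can be pictured as incrementing a cell keyed by app, metric, and time bucket, rather than storing a full payload per event, which is what keeps the data so dense. A toy model of that idea, with made-up key names:

```python
from collections import defaultdict

# Toy model of counter-style analytics storage: each data point is an
# increment to a counter keyed by (app, metric, hour bucket), so many
# events collapse into one small, dense cell instead of one row per
# event. The key structure here is illustrative, not Parse's schema.
counters = defaultdict(int)

def record(app_id, metric, hour_bucket):
    counters[(app_id, metric, hour_bucket)] += 1

for _ in range(3):
    record("app1", "api_request", "2013-07-01T10")
# Three events have collapsed into a single counter cell with value 3.
```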
We’re currently using 12 nodes, and I believe somewhere on the order of hundreds of GBs, so nothing unreasonable. Our use right now is not huge, but we have been slowly ramping up our analytics product and recently released custom analytics, which we expect will grow over the next several months as users start taking advantage of being able to store free-form analytics. We expect that our total usage will grow in the near future.
Presently, we are running on version 1.1.8 of Cassandra. I think 1.2 came out right after we felt like our system was stable enough to start using Cassandra internally, and we didn’t feel like we were ready quite yet to move to 1.2. As for whether we’ll skip that and just go to 2, I know there are a number of things in 1.2 that are tempting. I think, at this point, we will explore that as our needs also grow. Also, with 2.0, the driver support is just getting better and better, so that’s something we may see benefits from. However, right now I think 1.1.8 is solid enough to make sure it can handle all the growth over the next few months, and we have enough on our plate with the move to Facebook.
One more thing: I have to thank you guys. The DataStax documentation was immensely helpful in becoming familiar with the capabilities and limitations of Cassandra, so I wanted to thank you for that as well.