Matt Pfeil: Co-Founder at DataStax
Bryan Hawkins: Senior Software Engineer at Proofpoint
Jeff Sabin: Senior Software Engineer at Proofpoint
Matt: My name is Matt Pfeil and I’m here today with Bryan Hawkins and Jeff Sabin of Proofpoint. These are the guys who built the KairosDB utility, and I’d like to learn more from both of them about what they’re doing. First of all guys, thank you for joining us. Why don’t you tell us a little bit about what Proofpoint does and what your role is at the company?
Bryan: My name is Bryan Hawkins. I’m a Senior Software Engineer here at Proofpoint. Proofpoint actually has several products around email and spam filtering. Internally we want to track utilization of our systems and find out what’s going on. So we’re currently working on a project to monitor internal systems, to find out whether things are performing or not and that’s where we’re starting to use KairosDB.
Jeff: Proofpoint is security as a service, so we host most of our solutions. It’s important for us to be able to monitor the status of our applications, to make sure that customers’ response time is very quick.
Matt: This work with email and security led to the evolution of KairosDB. You guys built this. What is its history and can you tell us more about exactly the specific use case, what it was built for?
Bryan: Yes. We were looking specifically at monitoring the service level agreements that we have with our products. I had experience with Cassandra at my last company. We first started out looking at OpenTSDB, which is a time series database that stores data points associated with time. OpenTSDB works on HBase, and we decided, from my experience with Cassandra, that Cassandra would be a better fit for this application. So we forked the original project, rewrote it, put it on Cassandra and it’s been really nice to work with Cassandra from that point on.
Jeff: I’m new to Cassandra. This is my first experience but it’s been a very enjoyable experience. It’s very easy to use. It’s performed very well for us. It’s very quick. We’ve been amazed at how quick the response times are. We’re really pleased that we’re using Cassandra.
Matt: Other than performance, what makes Cassandra such a great fit for KairosDB and time series data in particular?
Bryan: Time series data, like I said, is a data point in time, so you have a timestamp, value and with KairosDB you can associate a key value pair with each of your data points. With Cassandra, because of the wide rows that it supports, I can write a lot of data to a single row for a particular metric and that makes getting the data out really fast and putting it in really fast.
Matt: In other words, it’s something like you have a given thing that’s producing metrics and you assign that to a row and use columns to assign the individual metric points. Is that the general data model?
Bryan: Yes, the columns represent the timestamp, basically.
Matt: Cassandra’s ability to support two billion of those per row really comes in handy.
Bryan: It does. Being able to pull out a lot in one chunk because of the way you can slice columns, it works really fast for us.
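The wide-row layout described above can be illustrated with a small in-memory sketch. This is not KairosDB's actual schema or Cassandra client code; the `WideRowStore` class, its row-key layout, and the metric and tag names are all hypothetical, chosen only to show one row per metric (plus its key-value tags), one column per timestamp, and a range read as a slice over sorted columns.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class WideRowStore:
    """Toy in-memory stand-in for a wide-row table (not Cassandra itself)."""

    def __init__(self):
        # row key -> list of (timestamp, value) columns, kept sorted by timestamp
        self._rows = defaultdict(list)

    @staticmethod
    def row_key(metric, tags):
        # Hypothetical key layout: metric name plus its sorted tag pairs.
        return (metric, tuple(sorted(tags.items())))

    def put(self, metric, tags, timestamp, value):
        # Insert the new column in timestamp order within its row.
        insort(self._rows[self.row_key(metric, tags)], (timestamp, value))

    def slice(self, metric, tags, start, end):
        """Return columns with start <= timestamp <= end, in time order."""
        columns = self._rows.get(self.row_key(metric, tags), [])
        lo = bisect_left(columns, (start,))
        hi = bisect_right(columns, (end, float("inf")))
        return columns[lo:hi]


store = WideRowStore()
tags = {"host": "web01"}  # assumed tag, for illustration only
for ts, value in [(1000, 0.20), (3000, 0.95), (2000, 0.35)]:
    store.put("cpu.utilization", tags, ts, value)

# Read one contiguous chunk of the row, as a column slice would.
print(store.slice("cpu.utilization", tags, 1500, 3000))
```

Because the columns in a row stay ordered by timestamp, a time-range query is a single contiguous read, which is the property Bryan credits for the fast reads.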
Matt: That’s awesome. In terms of when the rest of the world should be looking at KairosDB, can you maybe give some insight or some flavor into some generic use cases you see where it’s a good fit?
Bryan: Yes. If ever you have a question of what happened to my system as far as an application or whatever, you can use KairosDB. Actually, you should have been using KairosDB before that. KairosDB can be storing metrics, such as CPU utilization or query times from the web server. That kind of information can be stored in the system and then you can query that back and then correlate events that happened within your network and say “hey, I have a high CPU utilization on my cluster, what’s going on?” You can also find out, “hey, I have a backup on my SQL Server. The response times went way up on that, maybe I’m running out of disk space.” You can find solutions to your problems based on that information.
Jeff: We actually had an interesting experience where we were demoing the product to some internal employees. One of them was from our operations team and he was looking at one of the charts that Kairos produced and asked us if that was live data and we said yes, this is live data. He asked if we could show him the data for today. We showed him the data for today and he was interested because they had just made some changes to their servers and he wanted to see if the changes they had made had made an effect or not. It was very easy for him to see based on the charts that Kairos had built for him.
Matt: That’s awesome. If someone wanted to get started with the technology right now, what should they do?
Bryan: We have a project up on Google Code. Googlecode/p/kairosdb, there’s a getting started guide. You can download KairosDB. You can start it even without running a Cassandra cluster. The Kairos system actually runs by default on an internal memory database, so you can just play with the architecture and then hook in a Cassandra node afterwards. There’s also a discussion group link off the main page. Click on that and ask questions. There’s lots of people. We’re getting a good following. People are out there wanting to help out.
Jeff: We’re interested in suggestions and enhancements to the product, so please offer suggestions and join our discussion group.
Matt: Guys, I really want to thank you for your time today. I’ll say this to everyone who’s listening or reading this: open source projects such as KairosDB really thrive not only from other developers helping build them, but also from people who use them and provide their feedback. Whenever it comes to things like time series data, please check out this project and please see if it’s a great fit for your use case, because if so, these guys have put a lot of blood, sweat and tears into it and they would actually love your feedback. Again gentlemen, thanks for your time today.
Bryan: Thanks Matt.
Jeff: Thank you.
Matt: Good luck with the project.