David Johnson: CEO at Engine
Curtis Lacy: VP of Technology at Engine
Engine provides an e-mail productivity tool that allows people to answer their e-mail much more easily than they do today. Engine connects and indexes all your different sources of data so when you get an e-mail, Engine sees what might be related to the e-mail and passes it on to you.
Engine started out with MongoDB and still uses it for lighter-weight tasks such as authentication. Scale became the biggest factor, so they looked toward Cassandra. Cassandra provided them a straightforward way to basically grow without limit, which was a big draw. They found a Cassandra cluster relatively straightforward and far easier to work with than Mongo.
Engine is using DataStax Enterprise with Cassandra to power the indexing of files and other data that is inserted and flows into the database when you sign onto the service. Engine currently operates in the cloud with nodes spread across multiple cloud availability zones.
David, can you provide some background on Engine?
DJ: Engine provides an e-mail productivity tool that allows people to answer their e-mail much more easily than they do today. We believe that one of the biggest pain points for e-mail users is receiving a lot of different requests from people for information: files, or details they have to look up from previous e-mails or other applications.
A lot of times somebody says, “Hey, where are those files from last week or where are those notes from the last conversation or where is the contract?” For that kind of stuff, there is no reason we should be doing that type of manual search anymore.
So, we connect and index all your different sources of data, so when you get an e-mail, Engine sees the person e-mailing you, the context and all the other factors about what might be related to the information in the e-mail. Then we go and fetch that information, bring it to the sidebar and display it right next to the e-mail that the user is reading so they no longer have to go and manually search for those files. So if somebody requests files or notes or a date or contact information, or anything of that nature, it pops up automatically. That is really our overarching goal: to see how much time we can knock out of people’s day in the e-mail category.
That’s fantastic. I can’t tell you how helpful that would be. I get email requests all the time and have to go searching for the document that people want. Can I download it today?
DJ: Yes, you can download it now. And your reaction is very much what drove me to make it for myself in a way. I had 50,000 e-mails in my inbox in 2010 and realized there is just a huge need in this area. If you look at the studies, people spend between two and four hours a day answering e-mail.
If you go to Engine.co, you can create an account by logging in with your Google ID, and then you can connect your e-mail accounts, your calendars, your contacts, your LinkedIn, Facebook, Twitter, Dropbox and whatever else you would like. Then we index all that information, and once you download the Gmail extension, the results show up in your sidebar.
Is there any additional background you can provide about Engine?
DJ: We recently crossed 20,000 total interactions where people are clicking on results in the sidebar or hovering over them and reading the full text and using that information to answer e-mails. We have had a 1,000% increase in the number of user interactions in the last month. The big metric we really track is how much people are actually using the results we are providing, and it’s growing very fast.
Can you give me an idea about the software infrastructure that you use to pull this off? I am assuming you guys are cloud-based?
CL: We are cloud-based at the moment, but we are exploring a couple of different options. We have a small cluster of machines doing front-end work and some other things like downloading data, scanning and indexing data, but the core of our system is really a collection of servers running DataStax Enterprise, using both Cassandra and Solr.
What factors drove you toward using NoSQL technology versus a relational database?
CL: We started out with MongoDB and still use it for some things that require lighter lifting, such as authentication and that sort of thing. But the big draw of NoSQL for me was that we did not know at the outset what everything would look like, what data we were eventually going to need and what we were going to have indexed as we went. So, not needing a firm schema up front was absolutely huge. As the product evolves, we still have a lot of freedom through NoSQL to change things without having to spend a whole lot of time on database migrations, which is really nice.
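The schema flexibility Curtis describes can be illustrated with a minimal Python sketch. The record shapes and field names below are hypothetical and are not Engine's actual data model; the point is only that application code can tolerate older and newer record shapes side by side, without a migration step.

```python
from typing import Any

def describe(record: dict[str, Any]) -> str:
    """Build a display label, tolerating fields that older records lack."""
    # Hypothetical field names: "source", "title" and "subject" are assumptions.
    source = record.get("source", "unknown")
    title = record.get("title") or record.get("subject", "(untitled)")
    return f"{source}: {title}"

records = [
    {"id": 1, "subject": "Contract draft"},                # early record shape
    {"id": 2, "source": "dropbox", "title": "notes.txt"},  # later shape with new fields
]

labels = [describe(r) for r in records]
```

Both record shapes are handled by the same function, so adding a field later does not require rewriting the records that were stored earlier.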
Outside of the more flexible schema that you find in NoSQL, what were the things that caused you to really center on Cassandra?
CL: Scale was the biggest factor. Cassandra provides a straightforward way to basically grow without limit. That was really the big draw. It is possible to make Mongo scale as well, but it gets bogged down quickly. A Cassandra cluster is relatively straightforward and far easier to work with than Mongo in my experience.
You mentioned the importance of search earlier, so I assume that you switched to DataStax Enterprise from open source because of Solr and search integration?
CL: Yes, the Solr integration was a major draw.
How do the various components of DataStax Enterprise power your application?
CL: Solr powers the sidebar that displays alongside your e-mail, surfacing the associated files so you don’t need to search for them manually. And Cassandra powers the indexing of files and other data that is inserted and flows into the database when you sign onto the service.
Do you see yourself going to multiple-cloud availability zones for high-availability purposes or anything like that?
CL: We actually already have the nodes in separate availability zones, so we are already doing a certain amount of that. The nodes all operate as peers, so any of our client machines can read from any of them or write to any of them. We are aiming to provide a lot of that capability already so that we can patch one node and not actually have any downtime as a result.
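The peer arrangement described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Engine's setup or the Cassandra driver's API: the node names, zone names and the `pick_node` helper are all hypothetical, and data replication between nodes is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    zone: str
    up: bool = True

def pick_node(nodes: list[Node], preferred_zone: str) -> Node:
    """Return an available node, preferring the caller's zone when possible."""
    available = [n for n in nodes if n.up]
    if not available:
        raise RuntimeError("no nodes available")
    for node in available:
        if node.zone == preferred_zone:
            return node
    # Every node is a peer, so any other available node can serve the request.
    return available[0]

nodes = [
    Node("node-a", "zone-1"),
    Node("node-b", "zone-2"),
    Node("node-c", "zone-3"),
]

before = pick_node(nodes, "zone-1").name
nodes[0].up = False  # take node-a offline for patching
after = pick_node(nodes, "zone-1").name
```

Because no node is special, taking one offline only changes which peer answers; the client keeps working as long as at least one node is up.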
If you were to summarize the benefits that you have realized from DataStax Enterprise, what stands out?
DJ: The flexibility definitely aided our development time. Knowing that the scalability and redundancy are there has meant that I haven’t needed to design for them as much at the outset. So, I know that the scalability is there when I need it and can focus more on the application itself.