Shion Deysarkar: CEO at Datafiniti
Phil Coleman: Data Team Lead at Datafiniti
Michael Pellon: Operations Engineer at Datafiniti
Guys, thanks for making the time to talk with us today. Can you give us a quick overview of Datafiniti?
Datafiniti is a search engine for data. We’ve built a catalog of all structured data available on the Web: things like businesses, people, products, and more. We keep a massive database of all this information, built on DataStax Enterprise (DSE), that can be searched and used to generate custom output that meets each customer’s criteria.
We were originally focused on Web crawling technologies at a different company for about three years and then we pivoted that into our new company, which is Datafiniti. Our customers subscribe to our service and pay us based on the amount of data they receive from us.
How do you technically make all that happen?
There are two major components to our stack. One is the crawling infrastructure and the other is the search part, all of which is hosted in our data center. Where the crawling part is concerned, we have a number of servers that connect to volunteer computers all over the world that help collect our data for us.
Our search component makes use of DataStax Enterprise with Cassandra and Solr, with an API that sits on top of that for customers to query our database.
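As a rough illustration of what an API layer like the one described above might do, the sketch below turns a customer's search criteria into a Lucene-style query string of the kind Solr accepts. The function name and the field names (`name`, `city`) are hypothetical and are not taken from Datafiniti's actual API; this is a minimal sketch, not their implementation.

```python
def build_solr_query(criteria):
    """Combine field/value pairs into a Lucene-style query string.

    `criteria` is a dict of field name -> required value (hypothetical fields).
    """
    def quote(value):
        # Inside a quoted phrase, backslashes and double quotes must be escaped.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    # Sort the fields so the generated query is deterministic.
    clauses = [f"{field}:{quote(value)}" for field, value in sorted(criteria.items())]
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_solr_query({"name": "Acme Hardware", "city": "Austin"}))
```

Quoting every value as a phrase keeps the escaping rules simple; a real API would likely also support ranges, wildcards, and OR clauses.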
Did you guys start out using NoSQL for your database or transition from an RDBMS?
We didn’t consider relational technology, but started out by looking at the various NoSQL options like HBase and search software like Elasticsearch. We ended up deciding on Cassandra for its non-centralized approach and easy scaling. Cassandra allows us to store the large volumes of data that we need to consume and manage.
What was missing in Cassandra was the Google-type search functionality that we needed. When we saw that DataStax had integrated Solr with Cassandra in DataStax Enterprise, it was just a natural evolution for us to use it as our database.
Also, when we tested Solr in DataStax Enterprise, we saw that it worked and performed better than open source Solr, which was also a win for us.
Did anything else come into play with your decision-making process?
At the time, manageability was something important to us – we wanted a database that would be easy to install, manage, and grow. DataStax Enterprise was just the best option for what we do and need.
We also looked at various benchmarks and saw that Cassandra ran faster than the other options we were considering.
What are some of the business benefits you’ve experienced with DataStax Enterprise?
The primary benefit is that we’re able to deliver much faster search operations to our customers with DSE, and as everyone knows, customers don’t like to wait long when it comes to searching for what they want.
What advice would you give to people who are just starting out with NoSQL and/or DataStax Enterprise?
Pay close attention to the demos and samples that ship with DSE because they will help you quickly get things set up and understand how things work. Also, make sure you understand how the individual components of DSE – Cassandra, Hadoop, and Solr – work independently of each other.
Lastly, it’s good to know up front how you go from Cassandra, which has a very flexible and fluid schema model, to something more restrictive like Solr.
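To make the flexible-to-restrictive point concrete, here is a minimal sketch of one way to coerce loosely structured rows into a fixed set of typed search fields. The schema, field names, and function are hypothetical and chosen purely for illustration; they do not reflect how DSE itself maps Cassandra columns to Solr fields.

```python
# Hypothetical fixed search schema: field name -> type the search index expects.
SEARCH_SCHEMA = {"name": str, "city": str, "price": float}


def to_search_doc(row, schema=SEARCH_SCHEMA):
    """Project a free-form row (dict) onto the fixed schema.

    Returns the typed document plus the sorted list of columns that had
    no place in the schema, so they can be reviewed rather than silently lost.
    """
    doc = {}
    for field, cast in schema.items():
        if row.get(field) is not None:
            # Coerce to the declared type; raises ValueError on bad data.
            doc[field] = cast(row[field])
    dropped = sorted(key for key in row if key not in schema)
    return doc, dropped


if __name__ == "__main__":
    row = {"name": "Widget", "price": "9.99", "color": "red"}
    print(to_search_doc(row))
```

Reporting the dropped columns is a deliberate choice here: it surfaces early which parts of the flexible data model the stricter search schema cannot represent.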
Guys, thanks for the time.