Pawel Zarzycki, CEO at Cognitum
Cognitum is a company from Warsaw, Poland. We focus on Big Data and semantic technologies. We have been working with DataStax on .NET connectivity for the Cassandra project, which means we are involved in .NET driver development and a few other things. At the same time, we are a Cassandra user ourselves: one of our projects is a semantic knowledge management system that can run both on premises and in the cloud on Windows Azure, which is a great option for scaling the required resources.
The main problem that we wanted to address with a Cassandra implementation was the high availability and high scalability of the overall solution, especially in the context of complex data managed with semantic technology. High availability requires an efficient store for the materialized results of the rules engine and for the internal graph representation, and both can be handled with Cassandra. We also use Cassandra as a document repository that can be searched with Solr. All of this can be clustered and used to store and search our data graph with a semantic stack. Since we are also a Microsoft partner in Poland, we have a ready deployment for Windows Azure that lets us quickly enable new instances for new customers. At the same time, we can quickly scale up the solution when a customer needs it, for example when more and more users want to access the repository or the amount of stored information keeps growing.
Cognitum’s semantic platform
Our platform combines existing semantic web standards like OWL, RDF, and SPARQL with arbitrary free-text sources like social media or logs from various IT systems. It can also be connected indirectly to legacy SQL data sources. From the user's perspective, all of that can be accessed via a natural language interface that is semantically equivalent to the SPARQL query language. So the end user, typically a business analyst, has a pretty nice, easy-to-learn interface to all the different kinds of data that are integrated within the cluster, part of which is Cassandra at the underlying level.
Cassandra for growth
Basically, in terms of the data store, it comes down to how easy it is to scale as the data grows. That was our primary driver for this kind of approach. Cassandra is also pretty easy to work with from the development perspective: it was quite easy to build the application interfaces, and it is a very convenient way to run a NoSQL solution.
Typically, we were running a few nodes in one cluster. In our case, the data typically grows exponentially, since we are using a materialization engine from OWL to RDF. That is why the end user gets a very quick response time for queries: the required reasoning has already been performed. People who are familiar with OWL know that reasoning requires a pretty extensive algorithm. Since we do this ahead of time, when new information is entered into the system, we can respond to the end user's query very quickly. The cost of this approach is that a large amount of data must be cached within the system, and this is where Cassandra is very powerful.
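To make the idea of materialization more concrete, here is a minimal sketch in Python. It is not Cognitum's engine: it assumes only two simple RDFS-style rules (subclass transitivity and type inheritance), and a plain in-memory set stands in for the Cassandra store. The names `materialize` and `instances_of` are made up for illustration. The point is only to show why inference at write time makes reads cheap and why the stored data grows.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Forward-chain two RDFS-style rules until no new triples appear.

    Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A),       (A subClassOf B) => (x type B)
    """
    closure = set(triples)
    while True:
        inferred = set()
        for s, p, o in closure:
            if p not in (SUBCLASS, TYPE):
                continue
            # Both rules join on "o is a subclass of something".
            for s2, p2, o2 in closure:
                if p2 == SUBCLASS and s2 == o:
                    inferred.add((s, p, o2))
        if inferred <= closure:  # fixpoint reached, nothing new
            return closure
        closure |= inferred


def instances_of(store: Set[Triple], cls: str) -> Set[str]:
    """Query-time lookup: a plain filter, no reasoning needed."""
    return {s for s, p, o in store if p == TYPE and o == cls}


# Hypothetical example data.
facts = {
    ("Dog", SUBCLASS, "Mammal"),
    ("Mammal", SUBCLASS, "Animal"),
    ("rex", TYPE, "Dog"),
}

store = materialize(facts)           # done once, when data is entered
animals = instances_of(store, "Animal")  # cheap read at query time
```

In this toy case three asserted triples become six stored ones, and asking for all instances of `Animal` is a simple lookup that finds `rex` without any inference at query time. A real OWL reasoner applies many more rules, so the stored closure grows much faster than this.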
Cassandra in Poland
Well, there is a growing Cassandra community in Poland. We have already held a joint seminar about Big Data together with Microsoft and DataStax. Just recently we introduced DataStax Enterprise to the IT community in Warsaw, and we are planning to jointly deliver another seminar for the DataStax development community, also here in Warsaw, and then expand to other cities in Poland. So we see that the Big Data topic is very lively and the Cassandra user group keeps growing.