What is Apache Cassandra?

Apache Cassandra, a top-level Apache project born at Facebook and built on Amazon’s Dynamo and Google’s BigTable, is a distributed database for managing large amounts of structured data across many commodity servers, while providing highly available service and no single point of failure. Cassandra offers capabilities that relational databases and other NoSQL databases simply cannot match, such as continuous availability, linear scale performance, operational simplicity, and easy data distribution across multiple data centers and cloud availability zones.

Ring Architecture

Cassandra’s architecture is responsible for its ability to scale, perform, and offer continuous uptime. Rather than using a legacy master-slave architecture or a manual and difficult-to-maintain sharded architecture, Cassandra has a masterless “ring” design that is elegant, easy to set up, and easy to maintain.

In Cassandra, all nodes play an identical role; there is no master node, and all nodes communicate with each other as equals. Cassandra’s built-for-scale architecture means that it can handle large amounts of data and thousands of concurrent users or operations per second, even across multiple data centers, as easily as it manages much smaller amounts of data and user traffic. The architecture also means that, unlike master-slave or sharded systems, Cassandra has no single point of failure and can therefore offer true continuous availability and uptime: new nodes can be added to an existing cluster without taking it down.
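
To make the masterless ring idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Cassandra's implementation: the class name TokenRing, the node names, the use of MD5 as the hash, and one token per node are all choices made for this example. The point it shows is that any node can work out which nodes hold a given key without asking a master, and that a new node joins simply by taking a position on the ring.

```python
import bisect
import hashlib


class TokenRing:
    """Toy masterless ring: any node can compute replica placement locally."""

    def __init__(self, nodes, replication_factor=3):
        # Cap the replication factor at the number of nodes we start with.
        self.replication_factor = min(replication_factor, len(nodes))
        # Assumption for this sketch: one token per node, derived by hashing
        # the node name. Real clusters typically use many tokens per node.
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(value):
        # MD5 is only a stand-in hash here; any uniform hash would do.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def replicas(self, partition_key):
        """Walk clockwise from the key's token and collect the next nodes."""
        start = bisect.bisect_left(self._tokens, self._token(partition_key))
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]

    def add_node(self, node):
        """A new node takes a position on the ring; nothing is shut down."""
        bisect.insort(self._ring, (self._token(node), node))
        self._tokens = [t for t, _ in self._ring]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))   # three distinct nodes that own this key
ring.add_node("node-e")           # the ring keeps serving lookups
print(ring.replicas("user:42"))
```

Because placement depends only on the key and the list of tokens, every node that knows the ring membership arrives at the same answer, which is why no coordinating master is needed in this sketch.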

Many companies have successfully deployed and benefited from Apache Cassandra, including large companies such as Apple, Comcast, Instagram, Spotify, eBay, Rackspace, Netflix, and many more. The larger production environments have petabytes of data in clusters of over 75,000 nodes. Cassandra is available under the Apache 2.0 license.

Development

Always-On Architecture – A true masterless architecture (unlike master/slave RDBMS and NoSQL databases) delivers continuous availability for your applications.

Natively Distributed – The gold standard in multi-data center and cloud replication supplies real write/read anywhere capabilities, allowing you to easily put data where it’s needed anywhere in the world.

Fast Linear-Scale Performance – Enables millisecond response times with linear scalability (double your throughput with two nodes, quadruple it with four, and so on) to deliver the response times your customers have come to expect.

Flexible Data Model – The Apache Cassandra data model allows new entities or attributes to be added over time, so you are not restricted to a rigid data model that can’t evolve with the needs of the business application. For example, you can add a new complex data structure that is unique to your environment, or add a new column to a column family.

Language Drivers – Cassandra supports a wide array of language drivers so that your application runs well on Cassandra, including Python, C#/.NET, C++, Ruby, Java, Go, and many more.

Operational and Developmental Simplicity – With all nodes in a cluster being the same, there are no complex software tiers to manage, so administration duties are greatly simplified. Plus, the Cassandra Query Language (CQL) closely resembles SQL, which eases the move to Cassandra from an RDBMS.

Strong Developer Community – A rich developer community surrounds Apache Cassandra and strives to support developers working on the project, as well as those developing applications that use the database. Active in the IRC chat room and on the mailing lists, the Cassandra developer community is one of the most active for an open source project.

Operations

Always Online Architecture —  Cassandra’s masterless “ring” architecture provides your application’s end user with always-on access to their data, even in the event of rack, machine, or entire data center failure.

Native Multi-Data Center Replication – Cross data center (in multiple geographies) and multi-cloud availability zone support for writes/reads.

Transparent Fault Detection and Recovery – Nodes that fail can easily be restored or replaced.

Tunable Data Consistency – Support for strong or eventual data consistency across a widely distributed cluster.

OpsCenter Monitoring/Management Tool — A graphical management and monitoring tool for Cassandra that provides a view of the system from a centralized dashboard. OpsCenter installs seamlessly, and gives system operators the flexibility to monitor and manage even the most complex workloads with ease from any web browser.
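
The Tunable Data Consistency item above can be illustrated with a little arithmetic. The sketch below is a hedged example, not Cassandra code: the function names quorum and is_strongly_consistent are invented for illustration. It applies the commonly cited rule that reads are strongly consistent when the number of replicas read plus the number of replicas written exceeds the replication factor, so that every read overlaps the most recent write.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when the replicas read and written are guaranteed to overlap."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # hypothetical replication factor for this example

# Quorum writes plus quorum reads: 2 + 2 > 3, so reads see the latest write.
print(is_strongly_consistent(quorum(RF), quorum(RF), RF))  # True

# One replica for writes and one for reads: 1 + 1 <= 3, so a read may reach
# a replica the write has not reached yet (eventual consistency).
print(is_strongly_consistent(1, 1, RF))  # False
```

Choosing smaller read and write counts trades consistency for lower latency and higher availability, which is the tuning knob the feature refers to.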

Central IT / CIO

Runs on Commodity Hardware – Apache Cassandra is built to run on commodity hardware and is unparalleled in value. Don’t waste another dime on disaster recovery, high-end hardware, or revenue loss due to downtime. Focus your resources on building a great application, not on maintaining an expensive backend.

Mitigate Risks of Downtime — Apache Cassandra’s architecture is built with no single point of failure. If a node (rack, machine, or entire data center) goes down, another is available to take its place and serve read/write requests without interruption.

Improved Customer Experience – Apache Cassandra’s high availability and superior performance give businesses, and their mission-critical applications, the ability to provide customers with a superior user experience.

Faster Time to Market – DataStax goes beyond standard open-source deployments by providing resources that make it easier to deliver Apache Cassandra in a single data center or across multiple data centers and clouds.

DataStax Enterprise – DataStax offers DataStax Enterprise, production-certified Apache Cassandra, with 24 x 7 x 365 support and the latest integrated big data tools to help you succeed in deploying, operating, and maintaining your production deployment.
