Cassandra – 2 – Basics of Cassandra

The name Cassandra comes from Cassandra, the prophetess of ancient Greek mythology.

Apache Cassandra is a distributed NoSQL database system. It is not a relational database management system (RDBMS) like Oracle.

High availability and linear scalability are two key benefits of Cassandra. It is designed for high-volume, low-latency cloud applications.

Cassandra itself is an open source database. DataStax is a major organization that supports its use and sells commercial Cassandra products, which make it easier for clients to install Cassandra in production systems.

Some key benefits of the Cassandra database are:

Open Source – There is no license fee, and you can customize it according to your needs.
High Fault Tolerance – Data is replicated automatically to multiple nodes, so losing one node does not lose the data.
High Throughput – Cassandra can sustain a high rate of reads and writes, especially writes.
No single point of failure – The architecture is distributed and all nodes are identical; there is no master-slave relationship.
Familiar SQL-like command line – Many commands of Cassandra Query Language (CQL) are similar to SQL.
Awesome Scalability – You can scale the database without downtime or interruption. Horizontal scaling means adding more nodes to the cluster.
Flexible database – For example, a table can have a varying number of columns among its rows.
Less costly hardware – You can build the cluster on low-cost commodity hardware rather than specialized servers.
Easy Administration – There are fewer moving parts than in an Oracle database, which makes it easier to administer. Adding more nodes does not mean more complexity. It is easy to install too.

Cassandra is a partitioned data store. “Partitioned” here means that the database uses a key from each row (the partition key) to distribute the rows across multiple nodes.
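As a rough illustration of partitioning, the sketch below hashes a partition key to a token and walks a sorted ring of node tokens to pick the nodes that store the row. This is a simplified toy model, not Cassandra's actual implementation: the MD5 hash, the ring size, the node names, and the functions `token_for`, `build_ring`, and `replicas_for` are all assumptions made for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed token space for this toy model


def token_for(key: str) -> int:
    """Map a key to a token. MD5 is only a stand-in for a real partitioner."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def build_ring(nodes):
    """Give every node one token and return the ring sorted by token."""
    return sorted((token_for(name), name) for name in nodes)


def replicas_for(partition_key, ring, replication_factor=3):
    """Return the nodes that hold a copy of the row with this partition key."""
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical four-node cluster.
ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
for key in ("user:1", "user:2", "user:3"):
    print(key, "->", replicas_for(key, ring))
```

The same key always maps to the same nodes, so any node can work out where a row lives without consulting a central coordinator.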

Some key features that make Cassandra a high-performance database are:

Compression – Data is compressed to reduce the volume stored on disk and to lower disk I/O.
Data Cache – Frequently read data is cached in memory to avoid disk reads. Because data is replicated, when one node becomes unavailable the client can read from another replica, which has its own cache. The partition key cache and the row cache are the two types of data cache used.
Bloom Filters – A Bloom filter is a probabilistic data structure designed to tell you, rapidly and memory-efficiently, whether an element may be present in a set. It can return false positives but never false negatives, so Cassandra can use it to skip data files (SSTables) that cannot contain the requested key.
Compaction – Deleted data is not removed at once; it is first marked with a tombstone. In the background, Cassandra compacts the data files according to the configured rules and eventually purges the deleted data.
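To make the Bloom filter idea from the list above concrete, here is a minimal sketch. It is not Cassandra's implementation; the class name `BloomFilter`, the default sizes, and the SHA-256 based hashing are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: answers 'definitely absent' or 'possibly present'."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, item: str):
        # Derive num_hashes positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical filter for the keys stored in one data file.
sstable_filter = BloomFilter()
for key in ("user:1", "user:2"):
    sstable_filter.add(key)

print(sstable_filter.might_contain("user:1"))  # True: the key was added
```

A key that was never added usually returns `False`, but it can occasionally return `True` (a false positive). That is acceptable here, because the cost is only an unnecessary disk read, never a missed row.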

Some drawbacks/limitations of Cassandra are:

Consistency is not guaranteed at all times. Data propagation between replicas involves some latency, so a read may briefly return stale data.
No joins and no foreign keys in Cassandra, and only limited secondary indexes. RDBMS DBAs are very familiar with these terms.
Transaction concepts such as locking and rollback do not apply here, although Cassandra does offer lightweight transactions for some cases.
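On the consistency point in the list above: Cassandra lets the client choose, per request, how many replicas must acknowledge a read or a write. A common rule of thumb is that a read is guaranteed to see the latest write when the number of replicas written plus the number of replicas read exceeds the replication factor. The sketch below only shows that arithmetic; the function names are hypothetical and it does not talk to a real cluster.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3  # assumed replication factor
print(quorum(rf))                                          # 2
print(read_sees_latest_write(quorum(rf), quorum(rf), rf))  # True
print(read_sees_latest_write(1, 1, rf))                    # False
```

Writing and reading at quorum trades some latency for stronger consistency, while reading and writing a single replica is faster but may return stale data.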

Let us list some differences in data and data models between RDBMS and Cassandra:

PARAMETER	RDBMS	CASSANDRA
Type of Data	Structured	Unstructured
Type of schema	Fixed	Flexible
What is the significance of a column?	Columns represent a relation’s attributes.	Columns are a unit of storage
What is the significance of a row?	A row is a single record	A row is a unit of replication
How are relations defined?	Uses foreign keys and joins	Uses collections to represent relationships
What is the significance of a table?	array of arrays (records)	list of nested key-value pairs
What is the container for all data?	Database is the outermost data container	Keyspace is the outermost container for data
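The "list of nested key-value pairs" view from the table can be pictured with plain Python dictionaries. This is only a mental model under assumed names (the `users` table, its keys, and the `columns_in` helper are made up for illustration), not how Cassandra stores data on disk.

```python
# keyspace -> table -> partition key -> row (column name -> value)
keyspace = {
    "users": {
        "user:1": {"name": "Asha", "email": "asha@example.com"},
        # A different set of columns, including a collection-valued column.
        "user:2": {"name": "Ben", "phones": ["555-0100", "555-0101"]},
    }
}


def columns_in(table: str, partition_key: str):
    """Return the column names present in one row, sorted for display."""
    return sorted(keyspace[table][partition_key])


print(columns_in("users", "user:1"))  # ['email', 'name']
print(columns_in("users", "user:2"))  # ['name', 'phones']
```

The two rows live in the same table but carry different columns, and the `phones` list stands in for a collection column that would need a separate joined table in an RDBMS.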

Fast performance, support for a large number of complex data types, and distributed data are the kinds of features that have made Cassandra a popular NoSQL database.

Cassandra works great for data that sees few updates and deletes and needs fast writes. Also, Cassandra itself is Java-based, so as an administrator, a basic knowledge of Java will help you understand the errors that sometimes pop up, tune the JVM, and monitor it efficiently.

Author

Brijesh Gogia

I’m an experienced Cloud/Oracle Applications/DBA Architect with more than 15 years of full-time DBA/Architect experience. I have gained wide knowledge on Oracle and Non-Oracle software stack running on-prem and on Cloud and have worked on several big projects for multi-national companies. I enjoy working with leading-edge technology and have a passion for Cloud architecture, automation, database performance, and stability. Thankfully my work allows me time for researching new technologies (and to write about them).
