
Category: DATA ENGINEERING

Cassandra – 4 – Cassandra Architecture terminology

Cassandra is a distributed database system that runs on multiple nodes at once. The nodes exchange information about each other (for example, which nodes are up) using a peer-to-peer communication protocol called gossip.
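The peer-to-peer exchange mentioned above is known in Cassandra as gossip. The snippet below is a toy, single-process simulation of gossip-style propagation, not Cassandra's actual implementation. The `Node` class and `simulate` function are hypothetical names for illustration. Under these simplifying assumptions, each node keeps a map of peer name to heartbeat version, and in every round each node contacts one random peer and both sides keep the newer version of every entry.

```python
import random


class Node:
    """A toy cluster member that tracks the latest heartbeat it has seen per peer."""

    def __init__(self, name: str) -> None:
        self.name = name
        # peer name -> highest heartbeat version known for that peer
        self.state: dict[str, int] = {name: 0}

    def beat(self) -> None:
        # A node only ever increments its own heartbeat.
        self.state[self.name] += 1

    def gossip_with(self, other: "Node") -> None:
        # Exchange state in both directions, keeping the newer version of each entry.
        merged = dict(self.state)
        for peer, version in other.state.items():
            merged[peer] = max(merged.get(peer, -1), version)
        self.state = dict(merged)
        other.state = dict(merged)


def simulate(num_nodes: int = 5, rounds: int = 10, seed: int = 42) -> list[Node]:
    rng = random.Random(seed)
    nodes = [Node(f"node{i}") for i in range(num_nodes)]
    for _ in range(rounds):
        for node in nodes:
            node.beat()
        for node in nodes:
            # Each node contacts one random peer per round; there is no central master.
            peer = rng.choice([n for n in nodes if n is not node])
            node.gossip_with(peer)
    return nodes


if __name__ == "__main__":
    for node in simulate(rounds=50):
        print(node.name, sorted(node.state))
```

After enough rounds every node learns about every other node without any coordinator, which is the property that lets a Cassandra cluster operate without a single master.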

Some key points regarding Cassandra architecture/functioning:

  • Data is stored as rows in tables, and every table must have a primary key.
  • Writes are first appended to a commit log for durability and then written to an in-memory structure (the memtable), somewhat like a cache in RDBMS databases. When the memtable is full, its contents are flushed to disk as an immutable SSTable.
  • Data is automatically partitioned and replicated across the nodes of the cluster.
  • Periodically, Cassandra compacts the data on disk by merging SSTables.
  • A client can connect to any node; that node acts as the coordinator and fetches the requested data from the replica nodes that own it.
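The write path described in the list above (log first, then an in-memory structure, then disk) and the periodic compaction can be sketched as a toy, in-process model. In Cassandra these pieces are called the commit log, the memtable, and SSTables. The `ToyStorageEngine` class and its `memtable_limit` parameter are hypothetical. The sketch also simplifies heavily: it has no real disk I/O, resolves conflicts by SSTable order instead of the per-write timestamps Cassandra actually uses, and ignores deletes (tombstones).

```python
from typing import Optional


class ToyStorageEngine:
    """Toy model of a Cassandra-style write path: commit log -> memtable -> SSTables."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.memtable_limit = memtable_limit
        self.commit_log: list[tuple[str, str]] = []  # append-only, for durability
        self.memtable: dict[str, str] = {}           # in-memory, mutable
        self.sstables: list[dict[str, str]] = []     # immutable "on-disk" tables, oldest first

    def write(self, key: str, value: str) -> None:
        # 1. Append to the commit log first so the write could be replayed after a crash.
        self.commit_log.append((key, value))
        # 2. Apply the write to the in-memory memtable.
        self.memtable[key] = value
        # 3. When the memtable is "full", flush it to a new immutable SSTable.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        if not self.memtable:
            return
        # SSTables are sorted by key and never modified after being written.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        # The flushed data is now in an SSTable, so its commit log entries can be dropped.
        self.commit_log = []

    def read(self, key: str) -> Optional[str]:
        # The memtable holds the newest data; then search SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

    def compact(self) -> None:
        # Merge all SSTables into one; newer tables overwrite values from older ones.
        merged: dict[str, str] = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))] if merged else []


if __name__ == "__main__":
    engine = ToyStorageEngine(memtable_limit=2)
    engine.write("user:1", "alice")
    engine.write("user:2", "bob")      # memtable full: flushed to SSTable #1
    engine.write("user:1", "alicia")
    engine.write("user:3", "carol")    # memtable full: flushed to SSTable #2
    print(len(engine.sstables), engine.read("user:1"))  # 2 alicia
    engine.compact()
    print(len(engine.sstables), engine.read("user:1"))  # 1 alicia
```

Because SSTables are never updated in place, an overwritten key can exist in several of them; compaction is what eventually merges those copies and keeps reads from having to check many files.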


Cassandra – 3 – Related Terms: ACID, BASE, CAP Theorem

Oracle/MySQL database administrators are well aware of the term ACID.

ACID stands for Atomicity, Consistency, Isolation, and Durability, and these properties are at the foundation of RDBMS success.

ATOMICITY: If one part of a transaction fails, the entire transaction fails and is rolled back, which preserves the integrity of the database.

CONSISTENCY: The database is always in a consistent state, both at the beginning and at the end of a transaction.

ISOLATION: No transaction can see the changes made by another in-flight (uncommitted) transaction.

DURABILITY: Once a transaction completes, the database records it in persistent storage (the data files), so a power failure or system crash will not undo the completed transaction.
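Atomicity is the easiest of these properties to see in action. The sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in for an RDBMS such as Oracle or MySQL; it is not Cassandra code. The `accounts` table and the `make_bank`, `transfer`, and `balances` helpers are hypothetical. A transfer debits one account, and if a check then fails, the whole transaction is rolled back so the debit disappears too.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    with conn:  # commit the initial rows
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move money between accounts as one all-or-nothing transaction."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            # Raising here undoes the debit above: the transaction is atomic.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    conn = make_bank()
    transfer(conn, "alice", "bob", 30)
    print(balances(conn))  # {'alice': 70, 'bob': 80}
    try:
        transfer(conn, "alice", "bob", 500)
    except ValueError:
        pass
    print(balances(conn))  # still {'alice': 70, 'bob': 80}: the partial debit was rolled back
```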


Cassandra – 2 – Basics of Cassandra

The name Cassandra was taken from Cassandra, the prophetess of ancient Greek mythology.

Apache Cassandra is a distributed NoSQL database system. Unlike Oracle, it is not a relational database (RDBMS).

High availability and linear scalability are key benefits of Cassandra. It is designed for high-volume, low-latency cloud applications.


Cassandra – 1 – NoSQL Database

As they say, “Necessity is the mother of invention.” There would have been no RDBMS databases like Oracle if there were no computers.

Similarly, there would have been no need for NoSQL databases without the Internet explosion, Facebook and other social media, cell phones, and so on. These new trends generated such huge volumes of data at such a fast pace that RDBMS systems simply could not cope. NoSQL helped solve the problem of storing this new, huge, varied, and fast-growing data.
