
Overview of various Hadoop Distributions

Most big data enterprises today use Apache Hadoop in one way or another. Hadoop is by no means an out-of-the-box solution. It is an open source system, but to simplify working with it, several enterprise distributions such as Cloudera, MapR and Hortonworks are available in the market. Vendor distributions aim to overcome the issues that users typically encounter with the standard Apache release. Think of the various distributions of Linux, such as Red Hat.

At the time of writing this post, these three are the prominent Hadoop distributions:

Cloudera

  • First company to develop and distribute Apache Hadoop-based software
  • Growing at a fast rate. It received $740m in financing from chipmaker Intel in 2014.
  • Cloudera believes that Hadoop technology can replace good old data warehousing.
  • With Hadoop at its core, it also offers the proprietary Cloudera Management Suite, which helps automate installation, reduce deployment time, etc.

Cloudera offers a free 60-day trial.

Hortonworks

  • It is younger than Cloudera.
  • Unlike Cloudera, it does NOT think that Hadoop technology can replace good old data warehousing. Instead, it believes that both systems should co-exist.
  • Yahoo and Teradata have a combined share of 20%+ in this company.
  • Riding high on innovations like YARN.

Hortonworks’ distribution, HDP 2.0, can be downloaded directly from their website free of cost! It contains no proprietary software and is committed to open source.

MapR

  • MapR replaces the HDFS component with its own proprietary file system, called MapRFS. This is the major difference from the other two distributions.
  • It thinks that the open source world is moving too slowly and wants to make Hadoop enterprise production ready quickly.
  • According to MapR, MapRFS is more stable, more efficient and easier to use than HDFS.
  • MapR has a partnership with Canonical, the creator of the Ubuntu operating system, to offer Hadoop as a default component of Ubuntu.

It has a free version (M3) as well as paid versions (M5 and M7). As expected, the free version lacks some of the proprietary features, such as JobTracker HA, NameNode HA, NFS-HA, mirroring and snapshots.

To describe each in one word at this stage: Cloudera is about ‘maturity’, Hortonworks is about ‘vision’, and MapR is about ‘momentum’.

Brijesh Gogia