With Hortonworks ‘exiting the market’ (merging with Cloudera), the only provider of free Apache Hadoop packages is now BigTop. Why haven’t you heard of BigTop before? Well, you have; you just didn’t know it. I like to think of BigTop as the original provider of .deb and .rpm packages for Hadoop. They actually provided a lot of the packages that Cloudera and Hortonworks redistributed. If you ever wondered what Amazon/Google use to build Hadoop, the answer is BigTop. When I installed Hadoop for the very first time, BigTop is what I used; I didn’t build from source. They have taken the work out of turning Apache project code into installable packages. BigTop made sure all the packages worked together and then provided easy tools to install them. They simply became a little less important while CDH and HDP were making and distributing their own packages.
Now, with all of the CDP binaries behind a paywall (CDP being the love child of Cloudera’s CDH and Hortonworks’ HDP), I’m once again turning to BigTop to provide me with packages. In the years since I last used BigTop, they have also gotten much more advanced with their build/deploy/test scripts. They have even created Gradle tooling that lets you specify a git repo and a release, and then create a build from the newest code. This means that if you see a fix in git and you want it on your system, you can pull down the code and build the latest and greatest. Those of us who have been around the block know that this should never be done without rigorous testing, but on the whole it is a very promising way to get fixes from ‘the open source’ onto your cluster.
You really can pick a version of an open source Apache Hadoop project, deploy it, and have a series of tests run against it. This makes it trivial to test any 3rd party code or product against new versions of Apache Hadoop. (No wonder Cloudera hired all the contributors at some point in their careers.) Just the fact that you can package new Apache project code with simple commands is a huge upside, and a great secret that companies like Cloudera don’t want you to know.
If you are looking at life after Hortonworks and you don’t want to move forward with Cloudera, this is the open source option. Don’t get me wrong: Cloudera makes the best tools for working with open source. Their Cloudera Data Science Workbench is the notebook I always wanted. CDP in the cloud is the deployment tool I dreamed of for making clusters fast and efficient. Cloudera got even better when they could adopt any technology. Gone are the days of avoiding specific tech because Hortonworks was behind it. (I’m looking at you, SDX, the best of breed for security and data lineage.) Cloudera has gotten even better because it is open source. They did, however, put up a paywall, so you can no longer get the RPMs/Debs for free. If that bothers you and you want your freedom back, then BigTop is your answer.