Cloudera Data Science Workbench – A Non-Cloudera Opinion

Cloudera Data Science Workbench (CDSW) is actually a really solid product.  It is head and shoulders better than Zeppelin.  I just took an entire course in it, and here are my thoughts.

It’s an outstanding notebook.  It has the flexibility of Jupyter.  If you have it set up correctly, you have the freedom of JupyterHub.  You can install all the libraries your heart desires in your own Docker container.  It can be set up for SSO.  For those who like details: the libraries stay inside your Docker container and don’t leak into other containers or install themselves on the cluster.  (This is on par with JupyterHub, but I do wish for a “push to the cluster” button.)

It’s got integration with Git, which I think is amazing.  Simply point it at an internal or external Git repository and your project can be imported or exported.  This is the way things should be done.

I also like that it combines the concepts of project, notebook, and console.  I like picking which files or commands to run and seeing the output in the console.  The console keeps the previous output and isn’t fazed by me changing files.  This is a welcome change from Zeppelin-style notebooks, which would force me to combine all the code into one notebook to get the same result.  Like the Scala console, it lets you type “! hdfs dfs -ls ” and have the command run on the command line.  This is amazingly handy and really great to see in a notebook experience.
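For anyone who hasn’t used a “!” shell escape before, here is a rough sketch of the same idea in plain Python.  This is not how CDSW implements it; the helper name run_shell is made up for illustration, and the echo command is just a stand-in for something like hdfs dfs -ls.

```python
import shlex
import subprocess


def run_shell(command: str) -> str:
    """Run a command line and return its standard output as text.

    Hypothetical helper for illustration; it is not part of CDSW.
    """
    result = subprocess.run(
        shlex.split(command),   # split the string into an argument list
        capture_output=True,    # collect stdout and stderr
        text=True,              # decode bytes to str
        check=True,             # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command; on a cluster node this could be an hdfs command instead.
    print(run_shell("echo hello from the shell"))
```

In the workbench you get this for free by starting a line with “!”, which is exactly why I find it so handy.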

I love the way CDSW lets you put Markdown in Python comments, which makes it easy to mix code and presentation.  Even better, Python is free to import any library it needs, so you can use whatever presentation libraries you like to get a really good presentation experience.
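To give a feel for the Markdown-in-comments style, here is a small sketch of what such a script might look like.  It assumes the workbench treats comment lines as Markdown, as described above; the data and variable names are invented for illustration, and run as ordinary Python the comments are simply ignored.

```python
# # Sales summary
#
# This comment block is written as Markdown: a heading, then a paragraph.
# The intent is for it to show up as formatted text alongside the output.

# ## Sample data
# The numbers below are made up purely for illustration.
monthly_sales = {"Jan": 120, "Feb": 95, "Mar": 143}

# ## Total and best month
# Sum all values, then pick the key with the largest value.
total = sum(monthly_sales.values())
best_month = max(monthly_sales, key=monthly_sales.get)

print(f"Total sales: {total}")
print(f"Best month: {best_month}")
```

The nice part is that the file is still a perfectly normal Python script, so it works the same in Git, in an IDE, or anywhere else.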

It does have other abilities, but the one feature I still miss, and the only thing holding me back from using it 100% of the time, is code completion.  I’m lazy and prone to typos, and this is the next feature I hope Cloudera brings to the platform.  Then I can finally say goodbye to my IDE and live in the Cloudera Data Science Workbench.