Some people are uncomfortable connecting to Hive through the ZooKeeper namespace. It’s possible they don’t understand it, or perhaps they’re just old school and like a good old-fashioned host and port.
For this tutorial, our Hive server is located at hive.server.com:10000.
Let me demystify what happens when you use the ZooKeeper namespace.
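When you connect via beeline, you likely use a JDBC URL that looks like this:
jdbc:hive2://<ZOOKEEPER QUORUM>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Specifically, you likely run a command that looks like this:
beeline -u "jdbc:hive2://server1.com:2181,server2.com:2181,server3.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
So let’s break down this URL-like string to explain what it does and why you should use it.
jdbc:hive2://<ZOOKEEPER QUORUM>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
jdbc:hive2:// is the prefix that tells the JDBC driver this is a HiveServer2 connection.
<ZOOKEEPER QUORUM> is a comma-separated list of ZooKeeper host:port pairs (server1.com:2181,server2.com:2181,server3.com:2181 above), not the Hive server itself.
The / ends the host list. A database name can go right after it; leaving it empty means the default database.
serviceDiscoveryMode=zooKeeper tells the driver to ask ZooKeeper where HiveServer2 is instead of connecting to the listed hosts directly.
zooKeeperNamespace=hiveserver2 is the ZooKeeper node (znode) under which HiveServer2 instances register themselves.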
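To see the pieces concretely, the URL can be split apart in the shell. This is only a sketch: parse_hive2_url is a hypothetical helper written for illustration (it is not part of Hive or beeline), and it assumes the simple form used here, a host list followed by ;-separated key=value parameters.

```shell
#!/usr/bin/env bash
# parse_hive2_url is a hypothetical helper, not part of Hive or beeline.
# It splits a ZooKeeper-style HiveServer2 JDBC URL into the host list and
# the ;-separated session parameters, printing one item per line.
parse_hive2_url() {
  local rest quorum params
  rest="${1#jdbc:hive2://}"          # drop the protocol prefix
  quorum="${rest%%[/;]*}"            # the host list ends at the first "/" or ";"
  case "$rest" in
    *';'*) params="${rest#*;}" ;;    # everything after the first ";"
    *)     params="" ;;
  esac
  printf 'quorum=%s\n' "$quorum"
  if [ -n "$params" ]; then
    printf '%s\n' "$params" | tr ';' '\n'   # one key=value per line
  fi
}

parse_hive2_url "jdbc:hive2://server1.com:2181,server2.com:2181,server3.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
```

For the example URL this prints the ZooKeeper quorum on the first line, followed by serviceDiscoveryMode=zooKeeper and zooKeeperNamespace=hiveserver2.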
OK, so now we understand how the URL works. Big deal; why do we care?
Well, let’s use this info to see what ZooKeeper is doing (notice that we reuse both the ZOOKEEPER QUORUM and the zooKeeperNamespace from the URL):
/path/to/zookeeper/bin/zkCli.sh -server server1.com:2181,server2.com:2181,server3.com:2181 ls /hiveserver2
This command produces a lot of output, but look at its final line:
[serverUri=hive.server.com:10000;version=1.2.1000.2.6.5.178-1;sequence=0000000197]
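If you really do want the old-school host and port, you can pull it out of that output yourself. The sketch below assumes the znode names follow the serverUri=<host>:<port>;version=...;sequence=... pattern shown above; extract_server_uris is a hypothetical helper, and the live zkCli.sh pipeline is left as a comment because it needs a running ZooKeeper.

```shell
#!/usr/bin/env bash
# extract_server_uris is a hypothetical helper, not part of ZooKeeper or Hive.
# It reads zkCli.sh output on stdin and prints every registered HiveServer2
# host:port, one per line. Assumes entries look like
# serverUri=<host>:<port>;version=...;sequence=...
extract_server_uris() {
  # -o prints only the matched text; the match stops at the ";" (or ",")
  # that follows host:port, then sed strips the "serverUri=" prefix.
  grep -o 'serverUri=[^;,]*' | sed 's/^serverUri=//'
}

# Against a live cluster (same assumed paths and hosts as the command above):
# /path/to/zookeeper/bin/zkCli.sh -server server1.com:2181,server2.com:2181,server3.com:2181 ls /hiveserver2 | extract_server_uris

# Offline example using the output line shown above:
zk_output='[serverUri=hive.server.com:10000;version=1.2.1000.2.6.5.178-1;sequence=0000000197]'
uri="$(printf '%s\n' "$zk_output" | extract_server_uris | head -n 1)"
echo "$uri"   # prints hive.server.com:10000

# The same server reached the old-school way, with a plain host and port:
# beeline -u "jdbc:hive2://${uri}/"
```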
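So all beeline is actually doing is looking up, in ZooKeeper, the host and port that HiveServer2 registered there.
Well, if that’s all it’s doing, why not just use the host and port and skip the discovery?
Basically, you get free load balancing and you don’t have to hardcode a server. Every HiveServer2 instance registers its own entry under the namespace, and the driver picks one of them at random, so connections are spread across instances. If an instance goes down, its entry disappears from ZooKeeper, and you can add or move servers without touching anyone’s connection string.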
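The load-balancing part is easy to picture: with several HiveServer2 instances registered, the client simply chooses one. The snippet below imitates that choice in the shell. It is an illustration only: pick_random_server and the hive1/hive2.example.com hosts are hypothetical, and in practice the JDBC driver makes this choice for you.

```shell
#!/usr/bin/env bash
# pick_random_server is a hypothetical helper that imitates client-side
# load balancing: it reads one host:port per line on stdin and prints
# one of them chosen at random (nothing is printed for empty input).
pick_random_server() {
  awk 'BEGIN { srand() }
       { servers[NR] = $0 }
       END { if (NR > 0) print servers[int(rand() * NR) + 1] }'
}

# Hypothetical hosts standing in for two registered HiveServer2 instances.
printf '%s\n' hive1.example.com:10000 hive2.example.com:10000 | pick_random_server
```

Running it a few times should print each host some of the time, which is the same effect you get from the ZooKeeper URL without writing any of this yourself.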