Apache Kafka – 5 Best Practice Tips

Apache Kafka can be a powerful tool for your data streams, but in an unfavorable environment it can become a bottleneck. Below are five best-practice tips to keep your Kafka cluster running fast.

  1. If your environment is running the latest Java 1.8 update, you can tune your JVM settings to assist the G1 collector by directly modifying the Kafka environment variables in Ambari (kafka-env).
  2. Kafka needs throughput from its disks to read and write messages optimally, as it is a very “chatty” application. Providing Kafka with the fastest drives you can will allow it to perform better. Kafka should not share drives with other applications, as it needs them available at all times to convey messages and write log data.
    If you want to benchmark Kafka’s write speed, check out this LinkedIn engineering article – https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines (a rough do-it-yourself Java sketch also appears after the acknowledgement list below).
  3. Keep Kafka’s filesystem on physical drives, NOT network drives, virtual drives, or remote file systems; Kafka has known issues with network storage setups due to how it indexes its log segments. It is best practice to use an EXT4 filesystem for any drive Kafka will be utilizing.
  4. Keep ZooKeeper on separate disks from Kafka. ZooKeeper is just as “chatty” as Kafka in how it manages and coordinates a Kafka cluster. With ZooKeeper tracking the cluster’s broker, topic, and partition state, it can be very resource intensive, so it is usually a good idea to keep the services separate, especially in a production environment. (*NOTE*: This advice applies to clusters that use ZooKeeper; starting with Kafka 2.8, KRaft mode allows Kafka to run without ZooKeeper.)
  5. Configure Kafka to utilize producer acknowledgements. While this will slow down your throughput, it protects the integrity of your data. Without acknowledgements, messages can be lost, since the producer has no guarantee that the write or its replicas ever succeeded. In brief, these are the available acknowledgement settings (a short Java producer sketch using them follows the list):

  • acks = 0
    • The producer just sends messages and does not wait for any acknowledgement. (FAST++, Reliable–)
  • acks = 1
    • The broker sends an acknowledgement once the message is confirmed to be written on the leader. (FAST+, Reliable+)
  • acks = all
    • The broker sends an acknowledgement once the message is confirmed to be written on all in-sync replicas. (FAST–, Reliable++)
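
To make the trade-off concrete, here is a minimal sketch of a Java producer configured with acks = all, using the standard kafka-clients API. The broker address ("broker1:9092"), topic name ("example-topic"), and message contents are made-up placeholders for illustration, not values from this post.

  import java.util.Properties;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerConfig;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import org.apache.kafka.common.serialization.StringSerializer;

  public class AcksAllProducer {
      public static void main(String[] args) {
          Properties props = new Properties();
          // Placeholder broker address - replace with your cluster's brokers.
          props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
          props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
          props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
          // "all" waits for every in-sync replica to confirm the write (slowest, most reliable).
          // Use "1" for leader-only acknowledgement or "0" for fire-and-forget.
          props.put(ProducerConfig.ACKS_CONFIG, "all");

          try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
              ProducerRecord<String, String> record =
                      new ProducerRecord<>("example-topic", "key-1", "hello kafka");
              // The callback fires once the broker has acknowledged (or rejected) the write.
              producer.send(record, (metadata, exception) -> {
                  if (exception != null) {
                      exception.printStackTrace();
                  } else {
                      System.out.printf("Written to partition %d at offset %d%n",
                              metadata.partition(), metadata.offset());
                  }
              });
              producer.flush();
          }
      }
  }

Changing ProducerConfig.ACKS_CONFIG between "0", "1", and "all" is the only switch needed to move along the speed/reliability scale above.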

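Coming back to tip 2: the LinkedIn article linked above covers full-scale benchmarking, but if you just want a rough write-speed number from your own code, a sketch along these lines will do. The broker address, topic name, record count, and record size are made-up values for illustration; for real measurements, the kafka-producer-perf-test tool that ships with Kafka is the better choice.

  import java.util.Properties;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerConfig;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import org.apache.kafka.common.serialization.ByteArraySerializer;

  public class RoughWriteBenchmark {
      public static void main(String[] args) {
          Properties props = new Properties();
          // Placeholder broker address - replace with your cluster's brokers.
          props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
          props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
          props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
          props.put(ProducerConfig.ACKS_CONFIG, "1"); // leader-only acks keep this test fast

          int numRecords = 1_000_000;   // made-up volume for illustration
          int recordSize = 100;         // bytes per message
          byte[] payload = new byte[recordSize]; // zero-filled dummy payload

          long start = System.currentTimeMillis();
          try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
              for (int i = 0; i < numRecords; i++) {
                  producer.send(new ProducerRecord<>("perf-test-topic", payload));
              }
              producer.flush(); // wait until everything has actually been sent
          }
          double seconds = (System.currentTimeMillis() - start) / 1000.0;
          double mb = numRecords * (double) recordSize / (1024 * 1024);
          System.out.printf("%d records in %.1fs: %.0f records/sec, %.2f MB/sec%n",
                  numRecords, seconds, numRecords / seconds, mb / seconds);
      }
  }
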
These practices should be followed when setting up your Kafka cluster and will help give it a healthy start. Kafka is designed to scale, and these tips are only a starting point. For more information on how to scale Kafka, check out this Cloudera blog entry – https://blog.cloudera.com/scalability-of-kafka-messaging-using-consumer-groups/

Written by Ryan St. Louis