This was not my work at all, but it is a great strategy that I’ll keep in mind for future use. I love it when engineers take the time to write up tricks like this, because the community at large gets to learn from them. You might have been able to come up with the idea on your own, but now that you have seen it you can’t unsee it.
Read the article here:
https://engineeringblog.yelp.com/2019/01/migrating-kafkas-zookeeper-with-no-downtime.html
I think ideas like this should be generalized into a building block for future use: temporarily grow a cluster, then cut it in two with a rolling restart. The author of the article calls this Mitosis. It may have other uses and is worth keeping in your back pocket.
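To make the grow-then-split idea concrete for myself, here is a toy model in Python. It is only a sketch of the general pattern, not the procedure from the article: the node names are made up, and the only real rule it leans on is that a ZooKeeper-style ensemble needs a strict majority of its members to stay available.

```python
def quorum_size(ensemble):
    """Strict majority needed for the ensemble to stay available."""
    return len(ensemble) // 2 + 1


def grow(source, destination):
    """Phase 1: join the new nodes to the existing ensemble."""
    return source | destination


def split(joint, destination):
    """Phase 2: cut the joint ensemble into the old half and the new half."""
    return joint - destination, joint & destination


# Hypothetical node names, for illustration only.
source = {"zk-old-1", "zk-old-2", "zk-old-3"}
destination = {"zk-new-1", "zk-new-2", "zk-new-3"}

joint = grow(source, destination)
old_half, new_half = split(joint, destination)

print(len(joint), quorum_size(joint))        # 6 4
print(len(new_half), quorum_size(new_half))  # 3 2
```

The thing to notice is that the joint ensemble needs 4 of 6 nodes, while each half only needs 2 of 3 once it stands alone, so the quorum math has to be checked at every step of the rolling restart.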
If you enjoy good write-ups of experiences from the field, you may also enjoy this one:
https://engineeringblog.yelp.com/2019/01/migrating-kafkas-zookeeper-with-no-downtime.html
It’s another story about an architect who chose speed over safety and planned to just “rebuild the lost data”. The only failure was that they didn’t test their recovery strategy, and they later learned that it was quite painful.
I also found a great article pointing out that, with the default settings, message order is not guaranteed within a partition. The risk is low but present. If you are counting on order within a partition, as I have many times, you should read this article to understand how to actually guarantee it. (Affects Kafka >2.1.0.)
https://blog.softwaremill.com/does-kafka-really-guarantee-the-order-of-messages-3ca849fd19d2
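As a rough illustration of the settings involved, here is a small Python check I would run against a producer config. The config keys are real Kafka producer settings, but the helper `preserves_partition_order` is my own, and the fallback defaults in it (idempotence off, 5 in-flight requests, effectively unlimited retries) are an assumption about pre-3.0 clients that you should verify against the client version you run.

```python
def preserves_partition_order(config):
    """Return True if retried sends cannot reorder messages within a partition."""
    # Assumed defaults for older clients; check your client's documentation.
    idempotent = config.get("enable.idempotence", False)
    in_flight = config.get("max.in.flight.requests.per.connection", 5)
    retries = config.get("retries", 2147483647)

    if idempotent:
        # The idempotent producer keeps order with up to 5 in-flight requests.
        return in_flight <= 5
    # Without idempotence, a retried batch can land behind a later batch
    # unless there are no retries or only one request is in flight.
    return retries == 0 or in_flight == 1


assumed_defaults = {}
one_in_flight = {"max.in.flight.requests.per.connection": 1}
idempotent = {"enable.idempotence": True}

print(preserves_partition_order(assumed_defaults))  # False
print(preserves_partition_order(one_in_flight))     # True
print(preserves_partition_order(idempotent))        # True
```

Of the two safe options, enabling idempotence is usually the nicer one, since limiting the producer to a single in-flight request costs throughput.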