> ZooKeeper is rock solid. Moving off it is a mistake, IMO.
I’m agnostic about Kafka but ZooKeeper is problematic for many use cases based on personal experience and I wouldn’t recommend it. It can be “rock solid” and still not very good. I’ve seen ZK replaced with alternatives at a few different organizations now because it didn’t work well in practice, and what it was replaced with worked much better in every case.
ZooKeeper works, sort of, but I wouldn’t call it “good” in some objective sense.
To be fair, a lot of people use ZK wrong, then complain about it.
For example, if you use it like a general purpose KV store like Redis, you'll have a bad time.
Another common mistake: people assume that because ZK doesn't need to store much data, they can deploy it on a server with a slow disk or network. Big mistake. Every write to ZK has to be broadcast to the ensemble and synced to disk, so a bottleneck in disk or network IOPS will kill your ensemble.
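If you want a rough feel for whether a disk can handle that, here's a minimal sketch that times small append-and-fsync cycles, which is approximately the pattern a transaction log produces. It's not how ZK itself measures anything; `fsync_latencies` and its parameters are names I made up for illustration, and the numbers only mean something if you point it at the directory you actually plan to use for the ZK log.

```python
import os
import tempfile
import time


def fsync_latencies(directory, writes=200, payload_size=512):
    """Time small append+fsync cycles in `directory`.

    Returns a list with the duration of each cycle in seconds.
    Hypothetical helper for illustration, not part of any ZK tooling.
    """
    payload = b"x" * payload_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the write down to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


if __name__ == "__main__":
    # Assumption: replace the temp dir with your intended ZK log directory.
    # The system temp dir may be memory-backed and look unrealistically fast.
    samples = sorted(fsync_latencies(tempfile.gettempdir()))
    p50 = samples[len(samples) // 2]
    p99 = samples[int(len(samples) * 0.99)]
    print(f"fsync p50={p50 * 1e3:.2f} ms  p99={p99 * 1e3:.2f} ms")
```

Run it while whatever else shares that disk is doing its normal work. If the p99 is far worse than the median, ZK writes will stall the same way.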
This has also been my experience when I've seen unreliable ZK deployments: the OS, ZK, and maybe even some other services all share the same disk, and sometimes there's software RAID or something on top of that.
I don't think teams who can't run ZK will have much luck running other distributed systems. (Maybe KRaft, if they're Kafka experts.) Most of the alternatives proposed here have been "let someone else run the hard part." (Which isn't a bad choice, but it's not technically a solution.)