Last weekend, I learned something interesting during the production release of a big data Cassandra application. The major change was the switch from RandomPartitioner to Murmur3Partitioner. Let me write about it.
A Cassandra partitioner determines how data, including replicas, is distributed across the nodes in the cluster. In essence, a partitioner is a hash function that computes a token from a row key. Each row of data is uniquely identified by its row key and is placed in the cluster according to the value of its token.
Both the Murmur3Partitioner and RandomPartitioner use tokens to help assign equal portions of data to each node and evenly distribute data from all the tables throughout the ring or other grouping, such as a keyspace. This is true even if the tables use different row keys, such as usernames or timestamps.
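To make the token idea concrete, here is a minimal Python sketch of how a RandomPartitioner-style token could be computed and mapped to a node. It is an illustration, not Cassandra's actual code: it assumes the token is the absolute value of the MD5 digest read as a signed big-endian integer (which is my understanding of RandomPartitioner), and the three-node ring, the node names, and the helper names `random_partitioner_token` and `owner` are made up for the example.

```python
import bisect
import hashlib


def random_partitioner_token(row_key: bytes) -> int:
    """Hash a row key to a token in the style of RandomPartitioner (assumed:
    MD5 digest read as a signed big-endian integer, then made non-negative)."""
    digest = hashlib.md5(row_key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def owner(token: int, ring: list) -> str:
    """Return the node that owns a token.

    `ring` is a list of (token, node_name) pairs sorted by token. A node owns
    the range that ends at its own token, so the owner is the first node whose
    token is >= the row's token, wrapping around to the first node.
    """
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token)
    return ring[idx % len(ring)][1]


# Hypothetical three-node ring with evenly spaced tokens in [0, 2**127).
RING = [(i * (2**127 // 3), f"node{i + 1}") for i in range(3)]

if __name__ == "__main__":
    for key in (b"alice", b"bob", b"2014-01-01T00:00:00"):
        token = random_partitioner_token(key)
        print(key, token, owner(token, RING))
```

Because the hash output is effectively uniform, keys as different as usernames and timestamps land on the nodes in roughly equal shares, which is the even distribution described above.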
Two key implementation differences are:
- Murmur3Partitioner distributes data uniformly across the cluster based on MurmurHash hash values, whereas RandomPartitioner uses MD5 hash values.
- In the cassandra.yaml file, the partitioner setting is org.apache.cassandra.dht.Murmur3Partitioner for Murmur3Partitioner and org.apache.cassandra.dht.RandomPartitioner for RandomPartitioner.
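One practical consequence of the different hash functions, which goes slightly beyond the points above, is that the two partitioners use different token ranges: Murmur3Partitioner tokens run from -2^63 to 2^63 - 1, while RandomPartitioner tokens run from 0 to 2^127 - 1. The sketch below computes evenly spaced initial tokens for a single-token-per-node ring under each partitioner. The function names are my own, and it assumes one token per node rather than virtual nodes.

```python
def murmur3_initial_tokens(node_count: int) -> list:
    """Evenly spaced tokens over the Murmur3Partitioner range [-2**63, 2**63 - 1]."""
    step = 2**64 // node_count
    return [step * i - 2**63 for i in range(node_count)]


def random_initial_tokens(node_count: int) -> list:
    """Evenly spaced tokens over the RandomPartitioner range [0, 2**127 - 1]."""
    step = 2**127 // node_count
    return [step * i for i in range(node_count)]


if __name__ == "__main__":
    # Tokens for a hypothetical four-node cluster under each partitioner.
    print(murmur3_initial_tokens(4))
    print(random_initial_tokens(4))
```

The two lists have nothing in common, so tokens assigned under one partitioner do not carry over to the other.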