Introduction

Consistent Hashing is an amazing hashing technique which is the technology behind various services like DynamoDB, Cassandra, Akamai etc. How? We will learn in detail.

What is Hashing?

I know you are more interested in learning consistent hashing but let’s quickly cover the basics as well.😁

Hashing is a simple technique of assigning or mapping a unique identifier (a key or a name etc.) to a set of data. This identifier or a key assigned to a particular data object is called a hash key and that object is called hash value. Even you are a hash too ! Your name is a hash key, you are a hash value. People call your name, and you respond. Isn’t it ? The hash key can be an integer or a string.

Now how do you assign a key to an object?
You follow a mechanism or an algorithm or some method to do so. This algorithm you use to assign a key to the object is called hashing algorithm. This algorithm can be a mathematical formula or a logical assignment or even a dedicated algorithm. Hashing is actually everywhere –

There are various hash functions and hashing techniques widely used to send end-to-end encrypted data. One such technique is HMAC (Hash-based Message Authentication Code).

Read more on hashing – here


What’s wrong with standard hashing algorithm?

We will understand this with an example – read here.

According to the example mentioned in the above link, if you have 5 databases servers (shards), system is working fine, but if you want to add 6th server, our hashing algorithm becomes obsolete because it was (modulo 5). Same is the case if we want to remove a server, leaving only 4 shards. With 5 shards and hashing algorithm being modulo 5, it was working well. See the distribution of keys (customer_ids) below in 5 servers-

Now let’s say if Server 2 (S2) is down in the above diagram. So now we only have 4 remaining servers. Now our hashing algorithm becomes modulo 4. Let’s see how the existing keys are now re-assigned in each server.

See the impact of this hashing algorithm. Now one server is down, the server-to-key mapping was re-calculated. As a result, most of the keys (customer ids) were re-assigned a different server id. Only 1 (in green) key was not touched. If a request to retrieve data for customer id = 125 arrives, it is now present in S1 but earlier it was in S0. This results in a lot of incorrect server calls to the database.

So, a normal (standard) hashing algorithm like (modulo n) where n is the number of server won’t work when servers are increased or decreased. It only works when the number of servers are fixed.


Presenting – Consistent Hashing

Now we see the need of having a new hashing technique that allows us to scale servers horizontally (database servers or shards) such that when number of servers scale up or down, it does not impact the incoming requests. This is possible only if we ensure that when we increase or decrease our servers, number of keys that are re-assigned (disturbed) are minimal.

Let’s say we define the numbers of maximum servers we can ever scale up to is M. M can be infinitely large (M can be as large as 2128 -1. ).

Each server has a server id (S0, S1, S2, and so on).

Let’s define a virtual strip with each slot of 0 to M. So now we have x0, x1, x2, x3, ….. xM as a strip.

We will try to map our servers on this strip and we will simply use the hash function

Let’s join both the ends of the strip x0 and xM so that it becomes a circular ring. This ring is called a hash ring.

Since M is infinitely big, we can consider this ring as a circle with each dot as xi. Now let’s say we have 5 servers which will occupy slots on this ring possibly uniformly like this.

Now we will place keys on the rings and map the keys (customer_ids) to the servers, but we will not map it using the hash function (modulo), instead we will place the keys on the ring and map the server in the following way –

  • Take the key and move in clockwise direction and the first server you get, you map the key to that server. This means that server will store the key.
  • Repeat it for all the keys. [See diagram below – ]

… and let’s see Consistent Hashing in Action

Alright, so far we have come up with a virtual hash ring with servers, and keys assigned to their nearest server moving clockwise. Notice how we did not use modulo n (% n) algorithm to assign keys to servers (where n is the number of servers).

Now in standard hashing function (modulo n) we saw that when we add or remove a server, we see almost all keys had to be re-mapped. Let’s see what happens when we add/remove a new server to the hash ring.

  • Adding a new server :

See the diagram below –

When we add a new server (S5), only one key (customer_id) had to be re-mapped (from S0 to S5), rest all keys remained in their assigned servers.


  • Removing a new server :

See the diagram below –

When we remove a server (S0), again only one key (customer_id) had to be re-mapped (from S0 to S1), rest all keys remained in their assigned servers.


Let’s implement above in our code ! Ready for fun?

We will implement the code in java. We will create the hash ring using a TreeMap. Initially the servers are 0.

Following is our Customer class.

package com.example.consistent_hashing;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
public class Customer {
    private Integer customerId;
    private String name;
    private String email;
    private Integer serverId;

    public Customer(Integer customerId, String name, String email) {
        this.customerId = customerId;
        this.name = name;
        this.email = email;
    }

    @Override
    public String toString() {
        return "{" +
                "customerId : " + customerId +
                " name : '" + name + '\'' +
                " email : '" + email + '\'' +
                " serverId: " + serverId +
                '}';
    }
}

Let’s write the implementation of Consistent Hashing !

package com.example.consistent_hashing;

import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Implement Consistent Hashing in java using a TreeMap as hash-ring
public class ConsistentHashing {

    // serverId mapped to keys (in our case it is customer details)
    // we will maintain serverId as S0, S1, S3, ...
    int serverNumber = -1; // initialising it to -1. First server will be S0.
    private final SortedMap<Integer, List<Customer>> hashRing = new TreeMap<>();

    public void addNewServer(){
        // new server will be added next to the last server in the hashRing
        int lastServer = serverNumber;
        // new serverId to be added
        int newServerIdToAdd = ++serverNumber;
        // check if the last server stores any customer data.
        if(hashRing.get(lastServer) == null || hashRing.get(lastServer).isEmpty()){
            // the last server holds no customer data.
            // Simply add the new server next to it and return.
            hashRing.put(newServerIdToAdd, null);
            System.out.println("New Server added successfully! \nTotal Servers = 
            "+getServerCount());

            return;
        }
        // when we add the new server next the last server
        // we need to re-map the customers stored in last server to the next server
        // find the list of customer data in last server
        List<Customer> customers = hashRing.get(lastServer);
        System.out.println("List of customers stored in last server = "+customers);
        for(int i = 0 ; i < customers.size(); i++){
            // update the serverId for each server to re-map to new serverId
            Customer customer = customers.get(i);
            customer.setServerId(newServerIdToAdd);
        }
        // now update the hashRing.
        hashRing.put(newServerIdToAdd, customers);
        hashRing.put(lastServer, null);
        System.out.println("New Server added successfully. Re-mapping completed ! \nTotal Servers 
        = "+getServerCount());
        System.out.println();
    }

    public void removeServer(int serverId){
        if(!hashRing.containsKey(serverId)){
            System.out.println("ServerId = "+serverId+" does not exist.");
            return;
        }
        // find out if there are customers stored in this serverId.
        if(hashRing.get(serverId) == null || hashRing.get(serverId).isEmpty()){
            // no customers to re-map. simply delete and return;
            hashRing.remove(serverId);
            System.out.println("ServerId = "+serverId+" removed successfully !");
            return;
        }
        // find list of customers stored in the serverId
        List<Customer> customers = hashRing.get(serverId);
        System.out.println("ServerId = "+serverId+" to be removed has customers = "+customers);
        // map these customers to the next server than serverId.
        int nextServerToReMap = serverId == hashRing.size() - 1 ? 0: serverId + 1;
        System.out.println("Re-mapping these customers to next serverId = "+nextServerToReMap);
        for(int i = 0 ; i < customers.size(); i++){
            customers.get(i).setServerId(nextServerToReMap);
        }
        // append the customers in serverId
        if(hashRing.get(nextServerToReMap) == null){
            hashRing.put(nextServerToReMap, customers);
        }
        else {
            hashRing.get(nextServerToReMap).addAll(customers);
        }
        // remove server
        hashRing.remove(serverId);
        System.out.println("ServerId = "+serverId+" removed successfully. Re-mapping completed ! 
        \nTotal Servers = "+getServerCount());
        System.out.println();
    }


    public void addCustomer(Customer customer){
        if(hashRing.isEmpty()){
            System.out.println("No Servers present. Please add servers to store customer data");
        }
        // in order to distribute customer data evenly
        // across the servers, we will find the server with
        // the least customers.
        int serverToAddCustomer = 0;
        int minCustomers = Integer.MAX_VALUE;
        for(int i = 0 ; i < hashRing.size(); i++){
            int customers = hashRing.get(i) == null ? 0 : hashRing.get(i).size();
            if(customers < minCustomers){
                minCustomers = customers;
                serverToAddCustomer = i;
            }
        }
        customer.setServerId(serverToAddCustomer);
        hashRing.computeIfAbsent(serverToAddCustomer, k -> new ArrayList<>());
        hashRing.get(serverToAddCustomer).add(customer);
        System.out.println();
    }

    public void removeCustomer(Customer customer){
        System.out.println("Deleting customer data with customerId = "+customer.getCustomerId()+" 
        and name = "+customer.getName());
        // find the server in which this customer data is stored.
        int serverId = customer.getServerId();
        hashRing.get(serverId).remove(customer);
        customer.setServerId(null);
        System.out.println();
    }

    public void addNewServers(int count){
        while(count-- > 0)
            addNewServer();
        System.out.println();
    }

    public void showHashRing(){
        hashRing.forEach((serverId, customers ) -> {
            System.out.println(" Server: "+serverId+" has customers: \n"+
                    (customers == null? "none" : customers));
        });
        System.out.println();
    }

    public int getServerCount(){
        return hashRing.size();
    }
}

Let’s test the application –

package com.example.consistent_hashing;

public class ConsistentHashingApplication {

    public static void main(String[] args) {
        ConsistentHashing consistentHashing = new ConsistentHashing();

        // add 5 servers
        consistentHashing.addNewServers(5);

        // create 4 customers
        Customer alice = new Customer(101, "Alice", "alice@example.com");
        Customer bob = new Customer(102, "Bob", "bob@example.com");
        Customer john = new Customer(103, "John", "john@example.com");
        Customer rick = new Customer(104, "Rick", "rick@example.com");


        // store the customer data
        consistentHashing.addCustomer(alice);
        consistentHashing.addCustomer(bob);
        consistentHashing.addCustomer(john);
        consistentHashing.addCustomer(rick);


        // show hashRing
        consistentHashing.showHashRing();

        // remove serverId = 1
        consistentHashing.removeServer(1);
        consistentHashing.showHashRing();

        // remove customer rick
        consistentHashing.removeCustomer(rick);
        consistentHashing.showHashRing();
    }

}

Let’s see the output –

New Server added successfully! 
Total Servers = 1
New Server added successfully! 
Total Servers = 2
New Server added successfully! 
Total Servers = 3
New Server added successfully! 
Total Servers = 4
New Server added successfully! 
Total Servers = 5


 Server: 0 has customers: 
[{customerId : 101 name : 'Alice' email : 'alice@example.com' serverId: 0}]
 Server: 1 has customers: 
[{customerId : 102 name : 'Bob' email : 'bob@example.com' serverId: 1}]
 Server: 2 has customers: 
[{customerId : 103 name : 'John' email : 'john@example.com' serverId: 2}]
 Server: 3 has customers: 
[{customerId : 104 name : 'Rick' email : 'rick@example.com' serverId: 3}]
 Server: 4 has customers: 
none

ServerId = 1 to be removed has customers = [{customerId : 102 name : 'Bob' email : 'bob@example.com' serverId: 1}]
Re-mapping these customers to next serverId = 2
ServerId = 1 removed successfully. Re-mapping completed ! 
Total Servers = 4

 Server: 0 has customers: 
[{customerId : 101 name : 'Alice' email : 'alice@example.com' serverId: 0}]
 Server: 2 has customers: 
[{customerId : 103 name : 'John' email : 'john@example.com' serverId: 2}, {customerId : 102 name : 'Bob' email : 'bob@example.com' serverId: 2}]
 Server: 3 has customers: 
[{customerId : 104 name : 'Rick' email : 'rick@example.com' serverId: 3}]
 Server: 4 has customers: 
none

Deleting customer data with customerId = 104 and name = Rick

 Server: 0 has customers: 
[{customerId : 101 name : 'Alice' email : 'alice@example.com' serverId: 0}]
 Server: 2 has customers: 
[{customerId : 103 name : 'John' email : 'john@example.com' serverId: 2}, {customerId : 102 name : 'Bob' email : 'bob@example.com' serverId: 2}]
 Server: 3 has customers: 
[]
 Server: 4 has customers: 
none

Awesome ! We have successfully implemented the consistent hashing.

In the next article, we will see what are some issues with consistent hashing and how we can enhance it !

Happy Reading !

Leave a Reply

Your email address will not be published. Required fields are marked *