Hashing Data Structure

Hashing is mainly used to implement Dictionaries/Maps, where we store key-value pairs. It is also used to implement Sets, which hold only unique values. In a Map/Dictionary the keys themselves form a set, so they are also unique. The most interesting fact about hashing is that it provides search, insert, and delete in O(1) time on average.

Data Structure        Search      Insert      Delete
Sorted Array          O(log n)    O(n)        O(n)
Unsorted Array        O(n)        O(1)        O(n)
Binary Search Tree    O(log n)    O(log n)    O(log n)
Hashing               O(1)        O(1)        O(1)

Hashing is not useful in these cases:-

  1. Finding the closest value (Like if the key is not present then find the nearest smaller/greater value)
  2. Keeping data in sorted order.
  3. Prefix searching.

In the case of the 1st and 2nd points, we generally use the AVL tree or Red-Black tree. For the 3rd point, we use the Trie data structure.
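
As a quick illustration, Java's TreeMap (a Red-Black tree implementation) supports exactly the ordered operations that a hash table cannot provide; the class name and values below are only for illustration:

import java.util.TreeMap;

public class ClosestValueExample {
    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>(); // Red-Black tree based map
        map.put(10, "A");
        map.put(20, "B");
        map.put(30, "C");

        System.out.println(map.floorKey(25));   // 20 -> nearest smaller (or equal) key
        System.out.println(map.ceilingKey(25)); // 30 -> nearest greater (or equal) key
        System.out.println(map.keySet());       // [10, 20, 30] -> keys in sorted order
    }
}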

Applications of Hashing
After Array/ArrayList, hashing (Hashtable/HashMap) is the most commonly used data structure in computer science. Here are a few common applications of hashing:-

  1. Dictionaries
  2. Database Indexing
  3. Cryptography
  4. Caches
  5. Symbol Tables in Compilers/Interpreters
  6. Routers
  7. Getting data from databases

Direct Address Table

Imagine a situation where you have 1000 keys in the range 0 to 999. How would you implement the following operations in O(1) time:-

  1. Search
  2. Insert
  3. Delete

Sample operations to perform:- insert(10), insert(20), search(10), search(20), delete(119), etc.

To implement search, insert, and delete operations in O(1) time for a set of keys with values from 0 to 999, you can use a Direct Address Table (DAT). This approach leverages the fact that the range of keys is known and limited.

Direct Address Table Implementation:-

  1. Initialize the Table: Create an array of size 1000. Each index in the array represents a key.
  2. Insert Operation: Place the value directly at the index corresponding to the key. Here -1 indicates that a key is not present.
  3. Search Operation: Directly access the value at the index corresponding to the key. Arrays support random access, so this takes O(1).
  4. Delete Operation: Set the value at the index corresponding to the key to -1 to indicate deletion.
public class DirectAddressTable {
    private int[] table;

    public DirectAddressTable() {
        table = new int[1000];
        // Initialize all keys to -1 (indicating empty)
        for (int i = 0; i < table.length; i++) {
            table[i] = -1;
        }
    }

    public void insert(int key, int value) {
        table[key] = value;
    }

    public int search(int key) {
        return table[key];
    }

    public void delete(int key) {
        table[key] = -1;
    }

    public static void main(String[] args) {
        DirectAddressTable dat = new DirectAddressTable();
        dat.insert(105, 500);
        System.out.println("Value at key 105: " + dat.search(105));
        dat.delete(105);
        System.out.println("Value at key 105 after deletion: " + dat.search(105));
    }
}

Even if our keys range from 1000 to 1999, we can still use a table of size 1000 by indexing it as table[key - 1000].

Limitation of Direct Address Table

The Direct Address Table (DAT) is highly efficient for certain scenarios, but it does come with a few limitations:

  • Space Utilization: High Memory Usage. It requires a large amount of memory proportional to the key range. If the key range is large but sparsely populated, it can be inefficient.
  • Fixed Key Range: Rigid Structure. It only works well when the key range is known and limited. It’s not suitable for applications where keys can be arbitrary or dynamic.
  • Lack of Flexibility: It is not suitable for data types beyond integers without additional mapping logic. We can’t use String/Address as a key.
  • No Handling of Collisions: Assumes Unique Keys. It assumes keys are unique and does not inherently handle collisions like hashing does.

Hashing

Key principles of Hashing:-

  1. Hash Function: It converts a large range of keys into a smaller, fixed-size range (hash table indices). Ideally, it distributes keys uniformly across the hash table to minimize collisions.
  2. Collision Handling:
  • Separate Chaining: Uses linked lists to store multiple keys at the same index.
  • Open Addressing: Finds another empty slot within the table using techniques like linear probing, quadratic probing, or double hashing.
  3. Flexibility:
  • Dynamic Range: Can handle arbitrary key types (integers, strings, etc.) by converting them into a numerical index through the hash function.
  • Scalability: Hash tables can grow dynamically to accommodate more keys without significant performance degradation.
import java.util.Map;
import java.util.HashMap;

public class HashingExample {
    public static void main(String[] args) {
        Map<Integer, String> hashMap = new HashMap<>();

        // Insert
        hashMap.put(105, "Alice");
        hashMap.put(210, "Bob");

        // Search
        System.out.println("Key 105: " + hashMap.get(105)); // Output: Alice

        // Delete
        hashMap.remove(105);
        System.out.println("Key 105 after deletion: " + hashMap.get(105)); 
        // Output: null
    }
}

How do Hash functions work?

  • It should always map a large key to the same small key.
  • It should generate values from 0 to m-1.
  • It should be fast: O(1) for integers and O(len) for a string of length “len”.
  • It should uniformly distribute large keys into Hash table slots.

Sample hash function:-

public int simpleHash(String key, int tableSize) {
    int hash = 0;
    for (char c : key.toCharArray()) {
        hash += c; 
    }
    return hash % tableSize;
}

Example Hash Functions:-

  1. hash(large_key) = large_key % m
  2. For strings, a weighted (polynomial) sum, e.g. for str = "abcd":
    (str[0] * pow(x, 0) + str[1] * pow(x, 1) + str[2] * pow(x, 2) + str[3] * pow(x, 3)) % m
    (a Java sketch of this idea follows this list)
  3. Universal Hashing:- Universal hashing is a technique used to minimize collisions in hash tables by randomly selecting a hash function from a family of hash functions. A hash function is chosen at random from this family for each new instance of the hash table. This makes it difficult for an adversary to predict which hash function will be used.
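
Below is a minimal Java sketch of the weighted-sum (polynomial) hash from point 2; the multiplier x = 31 and the step-by-step modulo are illustrative choices, not mandated above:

public class PolynomialHash {
    // Weighted-sum string hash: (str[0]*x^0 + str[1]*x^1 + ... + str[len-1]*x^(len-1)) % m
    public static int hash(String key, int m) {
        final int x = 31;   // small prime multiplier (an assumption for this sketch)
        long hash = 0;
        long power = 1;     // x^0, x^1, x^2, ... kept modulo m to avoid overflow
        for (char c : key.toCharArray()) {
            hash = (hash + c * power) % m;
            power = (power * x) % m;
        }
        return (int) hash;
    }

    public static void main(String[] args) {
        System.out.println(hash("abcd", 101)); // an index in the range 0..100
    }
}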

Collision Handling

Let us first see the Birthday Paradox. The Birthday Paradox is a fascinating concept in probability theory that demonstrates how counterintuitive probabilities can be. It explores the likelihood that, in a group of people, at least two individuals share the same birthday.

Surprising Result: In a group of just 23 people, there’s a more than 50% chance that at least two people have the same birthday. For 70 people, the probability that at least two people share the same birthday is nearly 99.9%.

This isn’t because birthdays cluster around specific dates, but rather due to the sheer number of possible pairs. Each person can pair with every other person, rapidly increasing the chance of a match.
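
The numbers above are easy to verify with a few lines of Java; here is a small sketch assuming 365 equally likely birthdays:

public class BirthdayParadox {
    // P(at least two of n people share a birthday) = 1 - P(all n birthdays are distinct)
    static double probability(int n) {
        double allDistinct = 1.0;
        for (int i = 0; i < n; i++) {
            allDistinct *= (365.0 - i) / 365.0;
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        System.out.printf("n = 23: %.4f%n", probability(23)); // about 0.5073
        System.out.printf("n = 70: %.4f%n", probability(70)); // about 0.9992
    }
}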

Now we can imagine how serious the collision problem is. Collisions are bound to happen if we don’t know the keys in advance.

If we know the keys in advance, then we can use the perfect hashing method, which guarantees that collision won’t happen. But if we do not know the key in advance, then we use one of the following:-

  1. Chaining
  2. Open Addressing
    • Linear Probing
    • Quadratic Probing
    • Double Hashing

Chaining

We maintain an array of LinkedList. The array size is 7 (from 0 to 6). Assume:-
hash(key) = key % 7
Input Keys:- 50, 21, 58, 17, 15, 49, 56, 22, 23, 25

Whenever the collision happens we will insert the item at the end of the LinkedList.

Let us insert the keys:-

  • 50 % 7 = 1: Stored in Bucket 1.
  • 21 % 7 = 0: Stored in Bucket 0.
  • 58 % 7 = 2: Stored in Bucket 2.
  • 17 % 7 = 3: Stored in Bucket 3.
  • 15 % 7 = 1: Stored in Bucket 1 (collision, added to the list).
  • 49 % 7 = 0: Stored in Bucket 0 (collision, added to the list).
  • 56 % 7 = 0: Stored in Bucket 0 (collision, added to the list).
  • 22 % 7 = 1: Stored in Bucket 1 (collision, added to the list).
  • 23 % 7 = 2: Stored in Bucket 2 (collision, added to the list).
  • 25 % 7 = 4: Stored in Bucket 4.
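
After all ten inserts, the chained table looks like this:

Bucket 0: 21 -> 49 -> 56
Bucket 1: 50 -> 15 -> 22
Bucket 2: 58 -> 23
Bucket 3: 17
Bucket 4: 25
Bucket 5: (empty)
Bucket 6: (empty)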

Performance of Chaining

  • m = Number of slots in Hashtable
  • n = Number of keys to be inserted
  • Load factor (α) = n / m
  • Expected chain length = α
  • Expected time to search = O(1 + α)
  • Expected time to insert/delete = O(1 + α)

Data Structures to Store Chains:-

  1. Linked List:- It takes O(len) to search, delete, and insert. It is not cache-friendly and uses extra space to store references.
  2. Dynamic Sized Arrays (ArrayList in Java):- Because the elements are kept in unsorted order, search and delete take O(len), while insert takes O(1) amortized. It is cache-friendly because of its contiguous memory locations.
  3. Self Balancing BST (AVL Tree, Red Black Tree):- It takes O(log n) to search, delete, and insert. It is not cache-friendly.

Implementation of Chaining Using LinkedList

import java.util.LinkedList;

class HashMap<K, V> {
    private static final int INITIAL_CAPACITY = 7; // initial capacity of hash table
    private LinkedList<Entry<K, V>>[] buckets;

    public HashMap() {
        buckets = new LinkedList[INITIAL_CAPACITY];
        for (int i = 0; i < INITIAL_CAPACITY; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int getIndex(K key) {
        // Math.floorMod keeps the index non-negative even when hashCode() is negative
        return Math.floorMod(key.hashCode(), INITIAL_CAPACITY);
    }

    public void put(K key, V value) {
        int index = getIndex(key);
        LinkedList<Entry<K, V>> bucket = buckets[index];
        for (Entry<K, V> entry : bucket) {
            if (entry.key.equals(key)) {
                entry.value = value; // update value if key already exists
                return;
            }
        }
        bucket.add(new Entry<>(key, value)); // add new entry
    }

    public V get(K key) {
        int index = getIndex(key);
        LinkedList<Entry<K, V>> bucket = buckets[index];
        for (Entry<K, V> entry : bucket) {
            if (entry.key.equals(key)) {
                return entry.value;
            }
        }
        return null; // return null if key is not found
    }

    public void remove(K key) {
        int index = getIndex(key);
        LinkedList<Entry<K, V>> bucket = buckets[index];
        bucket.removeIf(entry -> entry.key.equals(key)); // remove entry if key is found
    }
}

class Entry<K, V> {
    K key;
    V value;

    public Entry(K key, V value) {
        this.key = key;
        this.value = value;
    }
}

Open Addressing

Open addressing is an efficient collision resolution technique used in hash tables. Instead of storing all entries that hash to the same index in a linked list (like in chaining), open addressing stores all entries directly within the array and uses a systematic method to resolve collisions by finding another open slot within the array.

The basic requirement for open addressing:- Number of slots in Hash table >= Number of keys to be inserted. It is cache-friendly.

Key Methods of Open Addressing:

  1. Linear Probing: If a collision occurs at index i, check the next slot (i + 1) % TableSize, and continue this process until an open slot is found.
  2. Quadratic Probing: If a collision occurs at index i, check the slots i + 1^2, i + 2^2, i + 3^2, and so on, modulo TableSize, to find an open slot.
  3. Double Hashing: Uses a secondary hash function to calculate the step size when a collision occurs. This ensures a more uniform probe sequence.
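
The three probe sequences can be summarized in a few lines of Java; the TABLE_SIZE of 7 and the helper hash functions h and h2 below are assumptions for this sketch, not part of any particular implementation:

public class ProbeSequences {
    static final int TABLE_SIZE = 7;

    static int h(int key)  { return Math.floorMod(key, TABLE_SIZE); }          // primary hash
    static int h2(int key) { return 1 + Math.floorMod(key, TABLE_SIZE - 1); }  // secondary hash, never 0

    static int linearProbe(int key, int i)     { return (h(key) + i) % TABLE_SIZE; }
    static int quadraticProbe(int key, int i)  { return (h(key) + i * i) % TABLE_SIZE; }
    static int doubleHashProbe(int key, int i) { return (h(key) + i * h2(key)) % TABLE_SIZE; }

    public static void main(String[] args) {
        // first few probe positions for key 50 under each strategy
        for (int i = 0; i <= 3; i++) {
            System.out.println(linearProbe(50, i) + " " + quadraticProbe(50, i) + " " + doubleHashProbe(50, i));
        }
    }
}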

Linear Probing

Linear probing is a collision resolution technique in hash tables where if a collision occurs, the algorithm searches for the next available slot in a sequential manner.

Insert Operation Steps

  1. Calculate Hash: Compute the hash index for the key. int index = hash(key);
  2. Check for Collision: If the slot at the computed index is already occupied, move to the next slot.
  3. Find Next Available Slot: Continue checking subsequent slots (wrap around if necessary) until an empty slot is found.
  4. Insert Key-Value Pair: Place the key-value pair in the found slot.

Let’s insert keys 50, 21, 58, 17, and 15 into a hash table of size 7 using linear probing.

  • Insert 50:
    • 50 % 7 = 1
    • Slot 1 is empty → Insert 50 at index 1.
  • Insert 21:
    • 21 % 7 = 0
    • Slot 0 is empty → Insert 21 at index 0.
  • Insert 58:
    • 58 % 7 = 2
    • Slot 2 is empty → Insert 58 at index 2.
  • Insert 17:
    • 17 % 7 = 3
    • Slot 3 is empty → Insert 17 at index 3.
  • Insert 15:
    • 15 % 7 = 1
    • Slot 1 is occupied → Check the next slot (index 2).
    • Slot 2 is occupied → Check the next slot (index 3).
    • Slot 3 is occupied → Check the next slot (index 4).
    • Slot 4 is empty → Insert 15 at index 4.

Search Operation Steps

  1. Calculate Hash: Compute the hash index for the key.
  2. Check Slot: If the slot at the computed index contains the key, return the value.
  3. Check Subsequent Slots: If the key is not found, continue checking subsequent slots (wrap around if necessary) until an empty slot is encountered (indicating the key is not present) or all slots have been traversed.

Example: Let’s search for key 15 in the hash table after the above insertions. Search for 15: 15 % 7 = 1.

  • Slot 1 contains 50 (not 15) → Check the next slot (index 2).
  • Slot 2 contains 58 (not 15) → Check the next slot (index 3).
  • Slot 3 contains 17 (not 15) → Check the next slot (index 4).
  • Slot 4 contains 15 → Key found, return value.

Delete Operation

In linear probing, the delete operation requires special handling to ensure that subsequent search operations can still find elements that were inserted after the deleted element.

Steps for Delete Operation:-

  • Calculate Hash: Compute the hash index for the key.
  • Locate Key: Use linear probing to locate the key to be deleted.
  • Remove Key: Either set the slot to a special "deleted" marker (a tombstone) or physically empty it (e.g., set it to null).
  • Rehash Subsequent Elements: If the slot is physically emptied rather than marked, continue probing and re-insert the subsequent elements of the cluster so that later searches do not stop early at the gap.

Suppose you have the following hash table after inserting the keys: 50, 21, 58, 17, 15:

Index:  0   1   2   3   4   5   6
Keys:   21  50  58  17  15  -   -

If you delete the key 21:

  • Calculate Hash: 21 % 7 = 0
  • Locate Key: Found at index 0.
  • Remove Key: Set table[0] to null (or to a deleted marker).
  • Rehash Subsequent Elements: If the slot was emptied, reinsert the keys following the deleted element so later searches don't fail.

Clustering

Clustering is a problem that arises with probing. It refers to the phenomenon in hash tables where a sequence of adjacent slots becomes filled due to collisions, causing subsequent insertions to experience even more collisions. This significantly degrades the performance of hash table operations.

Types of Clustering in Linear Probing:-

  1. Primary Clustering
  2. Secondary Clustering

Primary clustering occurs when a collision causes a contiguous block of slots to be occupied. It Increases the length of probe sequences and the probability of future collisions within the cluster.

Example of Primary Clustering in Linear Probing: Suppose we insert the keys 50, 21, 58, 17, and 15 into a hash table of size 7 using linear probing.

Index:  0   1   2   3   4   5   6
Keys:   21  50  58  17  15  -   -

Here, primary clustering has occurred from index 1 to 4. Future insertions in this cluster will likely result in more collisions.

Mitigating Clustering

How to handle clustering problems with linear probing? In linear probing, the hash function was calculated as:- hash(key, i) = (h(key) + i) % TABLE_SIZE. Here h(key) = key % TABLE_SIZE.

  1. Better Hash Functions: Use hash functions that provide a more uniform distribution of keys across the hash table. Example: Universal hashing or double hashing can help minimize clustering.
  2. Different Probing Methods:
    • Quadratic Probing: Instead of linear increments, use quadratic increments to reduce clustering. hash(key, i) = (h(key) + i^2) % TABLE_SIZE;
      • Quadratic probing still suffers from a milder form of clustering (secondary clustering). Its disadvantage compared to linear probing is that it may fail to find a slot even though an empty slot exists.
      • It is a mathematically proven fact that quadratic probing is guaranteed to find an empty slot only when the load factor is less than 0.5 and TABLE_SIZE (m) is a prime number.
    • Double Hashing: Use a secondary hash function to calculate the probe step size. hash(key, i) = (h(key) + i * secondaryHash(key)) % TABLE_SIZE;
  3. Dynamic Resizing: Increase the size of the hash table and rehash all keys when the load factor (number of elements/size of table) exceeds a certain threshold. This helps maintain a lower load factor and reduce clustering.

Double Hashing

  • In Double Hashing we use two hash functions:- hash(key, i) = (h(key) + i * secondaryHash(key)) % TABLE_SIZE; where h(key) = key % TABLE_SIZE. The secondaryHash(key) should never return 0.
  • If secondaryHash(key) is relatively prime to TABLE_SIZE, then it will always find a free slot if one exists.
  • It distributes keys more uniformly than linear probing and quadratic probing.
  • It largely avoids the clustering problems of the other probing methods.

Example:-
h(key) = key % TABLE_SIZE
h2(key) = 6 - (key % (TABLE_SIZE - 1))
hash(key, i) = (h(key) + i * h2(key)) % TABLE_SIZE;

Insert 49, 63, 52, 54, 48.

Hash Functions:

  • Primary Hash Function: h(key) = key % 7
  • Secondary Hash Function: h2(key) = 6 – (key % 6)
  • Combined Hash Function for Probing: hash(key, i) = (h(key) + i * h2(key)) % 7

Inserting Keys:-

  1. Insert 49: Primary Hash: 49 % 7 = 0. Slot 0 is empty, insert 49 at index 0.
  2. Insert 63:
    • Primary Hash: 63 % 7 = 0
    • Collision occurs.
    • Secondary Hash: 6 – (63 % 6) = 6 – 3 = 3
    • Probe 1: (0 + 1 * 3) % 7 = 3
    • Slot 3 is empty → Insert 63 at index 3.
  3. Insert 52:
    • Primary Hash: 52 % 7 = 3
    • Collision occurs.
    • Secondary Hash: 6 – (52 % 6) = 6 – 4 = 2
    • Probe 1: (3 + 1 * 2) % 7 = 5
    • Slot 5 is empty → Insert 52 at index 5.
  4. Insert 54:
    • Primary Hash: 54 % 7 = 5
    • Collision occurs.
    • Secondary Hash: 6 – (54 % 6) = 6 – 0 = 6
    • Probe 1: (5 + 1 * 6) % 7 = 4
    • Slot 4 is empty → Insert 54 at index 4.
  5. Insert 48: Primary Hash: 48 % 7 = 6. Slot 6 is empty → Insert 48 at index 6.
Index:  0   1   2   3   4   5   6
Keys:   49  -   -   63  54  52  48

Why h2(key) and m (TABLE_SIZE) should be relatively prime? It ensures that the entire hash table is probed before repeating any slots. This avoids clustering and ensures better distribution of keys, reducing the chances of collisions. Example:- h2(key) = 6, so (i * h2(key)) % m returns the following:-

(1 * 6) % 7 = 6
(2 * 6) % 7 = 5
(3 * 6) % 7 = 4
(4 * 6) % 7 = 3
(5 * 6) % 7 = 2
(6 * 6) % 7 = 1

void doubleHashingInsert(key) {
    if (table is full) return error;
    probe = h1(key);
    offset = h2(key);
    while (table[probe] is occupied) {
        // step by the same offset each time, which visits the sequence
        // h1(key), h1(key) + h2(key), h1(key) + 2*h2(key), ... (mod m)
        probe = (probe + offset) % m;
    }
    table[probe] = key;
}

In the case of linear probing, it uses offset = 1.
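
Putting the pieces together, here is a minimal runnable sketch of insertion with double hashing, using the example hash functions from this section; the class and field names are illustrative:

import java.util.Arrays;

public class DoubleHashingTable {
    private final int m = 7;                 // table size (illustrative)
    private final int[] table = new int[m];
    private int size = 0;

    public DoubleHashingTable() {
        Arrays.fill(table, -1);              // -1 marks an empty slot
    }

    private int h1(int key) { return key % m; }
    private int h2(int key) { return 6 - (key % 6); } // same secondary hash as the example above

    public boolean insert(int key) {
        if (size == m) return false;         // table is full
        int probe = h1(key);
        int offset = h2(key);
        while (table[probe] != -1) {         // keep probing while the slot is occupied
            probe = (probe + offset) % m;
        }
        table[probe] = key;
        size++;
        return true;
    }

    public static void main(String[] args) {
        DoubleHashingTable t = new DoubleHashingTable();
        for (int k : new int[] { 49, 63, 52, 54, 48 }) {
            t.insert(k);
        }
        System.out.println(Arrays.toString(t.table)); // [49, -1, -1, 63, 54, 52, 48]
    }
}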

Performance Analysis of Unsuccessful Search Operation

Load Factor (α): α = n / m, where n is the number of elements in the table and m is the table size. The load factor should be ≤ 1.

Assumption: Every probe sequence looks at a random location. Since a (1 - α) fraction of the table is empty, the expected number of probes required for an unsuccessful search can be calculated.

Expected Number of Probes: For an open addressing hash table, the expected number of probes E for an unsuccessful search is given by:-

E = 1 / (1 - α)

This formula reflects the inverse relationship between the fraction of empty slots and the number of probes required. As the table gets fuller (α approaches 1), the number of probes increases, indicating more effort is needed to determine that an element is not present.

This means:

  • When the table is empty (α = 0), only one probe is needed (E = 1).
  • When the table is half full (α = 0.5), on average two probes are required (E = 2).
  • When the table is almost full (α approaches 1), the number of probes required approaches infinity (E approaches infinity).
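
A quick, purely illustrative check of the formula in code:

public class UnsuccessfulSearchCost {
    // Expected probes for an unsuccessful search in open addressing: E = 1 / (1 - alpha)
    static double expectedProbes(double alpha) {
        return 1.0 / (1.0 - alpha);
    }

    public static void main(String[] args) {
        System.out.println(expectedProbes(0.0)); // 1.0
        System.out.println(expectedProbes(0.5)); // 2.0
        System.out.println(expectedProbes(0.9)); // roughly 10
    }
}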

Implementation of Open Addressing

public class MyHash {
    private int[] hashTable;
    private int capacity;
    private int size;

    public MyHash(int capacity) {
        this.capacity = capacity;
        this.size = 0;
        this.hashTable = new int[capacity];
        for (int i = 0; i < capacity; i++) {
            hashTable[i] = -1; // -1 indicates an empty slot
        }
    }

    private int hash(int key) {
        return key % capacity;
    }

    // linear probing
    public boolean search(int key) {
        int index = hash(key);
        int startIndex = index;
        while (hashTable[index] != -1) {
            if (hashTable[index] == key) {
                return true;
            }
            index = (index + 1) % capacity;
            if (index == startIndex) {
                return false;
            }
        }
        return false;
    }

    // linear probing
    public boolean insert(int key) {
        if (size == capacity) {
            System.out.println("Hash table is full");
            return false;
        }
        int index = hash(key);
        while (hashTable[index] != -1  // slot is not empty (-1 marks empty)
            && hashTable[index] != -2  // slot is not deleted (-2 marks deleted)
            && hashTable[index] != key) { // and the key has not been found yet
            index = (index + 1) % capacity;
        }
        if (hashTable[index] != key) {
            hashTable[index] = key;
            size++;
            return true;
        } else {
            return false;
        }
    }

    // very similar to search() method
    public void erase(int key) {
        int index = hash(key);
        int startIndex = index;
        while (hashTable[index] != -1) {
            if (hashTable[index] == key) {
                hashTable[index] = -2; // -2 indicates a deleted slot
                size--;
                return;
            }
            index = (index + 1) % capacity;
            if (index == startIndex) {
                return;
            }
        }
    }

    public void print() {
        for (int i = 0; i < capacity; i++) {
            if (hashTable[i] == -1) {
                System.out.print("-1 ");
            } else if (hashTable[i] == -2) {
                System.out.print("-2 ");
            } else {
                System.out.print(hashTable[i] + " ");
            }
        }
        System.out.println();
    }

    public static void main(String[] args) {
        MyHash mh = new MyHash(7);
        mh.insert(49);
        mh.insert(56);
        mh.insert(72);

        if (mh.search(56)) {
            System.out.println("Yes!");
        } else {
            System.out.println("No");
        }

        mh.erase(56);

        if (mh.search(56)) {
            System.out.println("Yes!");
        } else {
            System.out.println("No");
        }

        mh.print();
    }
}

How to handle the case when -1 and -2 are the input keys? To handle the case when -1 and -2 are input keys, you can use a special wrapper class for keys and values. This way, you avoid using -1 and -2 as special markers directly. Instead, use an additional flag within the wrapper class to indicate if a slot is empty or deleted.

public class MyHash {
    class Entry {
        int key;
        String value;
        boolean isDeleted;

        public Entry(int key, String value, boolean isDeleted) {
            this.key = key;
            this.value = value;
            this.isDeleted = isDeleted;
        }
    }

    private Entry[] hashTable;
    private int capacity;
    private int size;

    public MyHash(int capacity) {
        this.capacity = capacity;
        this.size = 0;
        this.hashTable = new Entry[capacity];
    }

    private int hash(int key) {
        // Math.floorMod keeps the index non-negative even for negative keys such as -1 and -2
        return Math.floorMod(key, capacity);
    }

    public void insert(int key, String value) {
        if (size == capacity) {
            System.out.println("Hash table is full");
            return;
        }
        int index = hash(key);
        while (hashTable[index] != null && 
               !hashTable[index].isDeleted && 
               hashTable[index].key != key) {
            index = (index + 1) % capacity;
        }
        if (hashTable[index] == null || hashTable[index].isDeleted) {
            hashTable[index] = new Entry(key, value, false);
            size++;
        } else {
            hashTable[index].value = value; // Update existing key
        }
    }

    public String search(int key) {
        int index = hash(key);
        int startIndex = index;
        while (hashTable[index] != null) {
            if (hashTable[index].key == key && !hashTable[index].isDeleted) {
                return hashTable[index].value;
            }
            index = (index + 1) % capacity;
            if (index == startIndex) {
                return null;
            }
        }
        return null;
    }

    public void erase(int key) {
        int index = hash(key);
        int startIndex = index;
        while (hashTable[index] != null) {
            if (hashTable[index].key == key && !hashTable[index].isDeleted) {
                hashTable[index].isDeleted = true;
                size--;
                return;
            }
            index = (index + 1) % capacity;
            if (index == startIndex) {
                return;
            }
        }
    }

    public void print() {
        for (int i = 0; i < capacity; i++) {
            if (hashTable[i] == null) {
                System.out.print("null ");
            } else if (hashTable[i].isDeleted) {
                System.out.print("deleted ");
            } else {
                System.out.print(hashTable[i].key + ":" + hashTable[i].value + " ");
            }
        }
        System.out.println();
    }

    public static void main(String[] args) {
        MyHash mh = new MyHash(7);
        mh.insert(49, "Value49");
        mh.insert(56, "Value56");
        mh.insert(72, "Value72");
        mh.insert(-1, "ValueMinus1");
        mh.insert(-2, "ValueMinus2");

        System.out.println("Search for key 56: " 
             + (mh.search(56) != null ? "Found" : "Not Found"));
        System.out.println("Search for key -1: " 
             + (mh.search(-1) != null ? "Found" : "Not Found"));

        mh.erase(56);
        System.out.println("Search for key 56 after deletion: " 
             + (mh.search(56) != null ? "Found" : "Not Found"));

        mh.print();
    }
}

Chaining vs Open Addressing

Chaining                                        Open Addressing
The hash table never fills                      The table may become full; resizing becomes mandatory
Less sensitive to hash functions                Extra care is required to avoid clustering
Poor cache performance                          Cache friendly
Extra space for links (in case of LinkedList)   Extra space might be needed to achieve the same performance as chaining
Unsuccessful search costs 1 + α                 Unsuccessful search costs 1 / (1 – α)

If α = 0.9 then chaining will require 1 + 0.9 = 1.9 comparisons, whereas open addressing would require 1 / (1 – 0.9) = 1 / 0.1 = 10 comparisons. To minimize the number of comparisons in open addressing we have to increase the size of the table (capacity).

Common Usage

  • Chaining: Often used when the load factor might exceed 0.7 or when simplicity and flexibility are more important.
  • Open Addressing: Preferred in scenarios with a high cache locality requirement and when memory overhead needs to be minimized.

In summary, chaining is more flexible and handles higher load factors well, while open addressing offers better space efficiency and cache performance but requires more careful management to avoid clustering.

HashSet in Java

It is used to store a set of keys and uses the hashing technique internally. It has add(), contains(), remove(), and iterator() operations; add(), contains(), and remove() run in O(1) time on average.

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class Test {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("Java");
        set.add("Courses");
        set.add("IDE");

        System.out.println(set);
        System.out.println(set.contains("IDE"));

        Iterator<String> iterator = set.iterator();
        while (iterator.hasNext()) {
            System.out.println(iterator.next() + " ");
        }

        set.remove("Java");
        System.out.println(set);
    }
}

Points to be noted:-

  • The add() method returns true if the element is successfully added, else it ignores the item if it is already present and returns false.
  • When we traverse the HashSet there is no guarantee about the order of the output.
  • Set classes don’t have the get(index) method, for that we have to convert it to a list.
import java.util.HashSet;
import java.util.Set;

public class Test {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("Java");
        set.add("Courses");
        set.add("IDE");

        System.out.println(set.size());
        set.remove("Java");
        System.out.println(set.size());
        System.out.println(set.isEmpty());

        for (String str : set) {
            System.out.println(str);
        }

        set.clear();
        System.out.println(set.isEmpty()); // true
    }
}

If we know the number of keys in advance, then we can pass an initial capacity:-

Set<String> set = new HashSet<>(3);

The HashSet class in Java does not provide a method to directly get its capacity. The capacity of a HashSet is based on its underlying HashMap, but this detail is not exposed. If you need to manage or track the capacity, you might want to use a HashMap directly or consider designing a custom set class.

Time Complexity:-

  • The size() and isEmpty() takes O(1).
  • The add(), remove(), and contains() takes O(1) on average.

HashMap in Java

A HashSet stores only keys, while a HashMap stores key-value pairs.

import java.util.HashMap;
import java.util.Map;

public class Test {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("Java", 1);
        map.put("Rocco", 100);
        map.put("C", 11);

        System.out.println(map);
        System.out.println(map.size());
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}

There is no guarantee of the order of keys.

If we know the number of keys in advance, then we can pass an initial capacity:-

Map<String, Integer> map = new HashMap<>(3);

More Methods

The containsKey() method checks whether the key exists in the key set or not and returns true/false.

if (map.containsKey("Java")) {
    System.out.println("Yes");
}
map.remove("Java");
System.out.println(map.size()); // 2
System.out.println(map.containsKey("Java")); // false

The remove() method deletes the key-value pair and returns the value. If the key is not present then it returns null.

System.out.println(map.remove("C")); // 11
System.out.println(map.remove("C")); // null

The containsValue() checks whether the value is present in the HashMap or not.

System.out.println(map.containsValue(100)); // true

The get() method is used to fetch the value corresponding to the key. If the key is not present in the map then it returns null.

System.out.println(map.get("Rocco")); // 100
System.out.println(map.get("C")); // null

Time complexity:-

  • put(), containsKey(), remove(), get() takes O(1) on average.
  • The size() and isEmpty() takes O(1).

1. Count Distinct Elements in Array

Array = {15, 12, 13, 12, 13, 13, 18}
Distinct elements = 4

import java.util.HashSet;
import java.util.Set;

public class Test {
    public static void main(String[] args) {
        int[] arr = { 15, 12, 13, 12, 13, 13, 18 };
        Set<Integer> set = new HashSet<>();
        for (int k : arr) {
            set.add(k);
        }
        System.out.println("Distinct elements: " + set.size()); // 4
    }
}

Time complexity: O(n)
Auxiliary Space: O(n)

This program can be further simplified as follows; however, the elements must be of the Integer wrapper class type:-

// use wrapper class
Integer[] arr = { 15, 12, 13, 12, 13, 13, 18 };
Set<Integer> set = new HashSet<>(Arrays.asList(arr));

Arrays.asList(arr) returns a fixed-size list backed by the original array, so no extra space is needed for the list elements. The HashSet requires additional space to store the unique elements. This space usage is O(n) in the worst case, where n is the number of elements in the input array.

Time complexity: O(n), and auxiliary space: O(n).

In the case of primitives we can convert them as follows:-

int[] arr = { 15, 12, 13, 12, 13, 13, 18 };
Integer[] arrInteger = Arrays.stream(arr).boxed().toArray(Integer[]::new);
Set<Integer> set = new HashSet<>(Arrays.asList(arrInteger));

However, using Arrays.stream(arr).boxed().toArray(Integer[]::new) creates a new array.

2. Frequencies of Array Elements

Array = { 15, 12, 13, 12, 13, 13, 18 };
{18=1, 12=2, 13=3, 15=1}

import java.util.HashMap;
import java.util.Map;

public class Test {
    public static void main(String[] args) {
        int[] arr = { 15, 12, 13, 12, 13, 13, 18 };
        Map<Integer, Integer> frequency = new HashMap<>();
        for (int k : arr) {
            frequency.put(k, frequency.getOrDefault(k, 0) + 1);
        }
        System.out.println(frequency);
    }
}

Time complexity: O(n), and auxiliary space: O(n). Also see:- Frequency in Sorted Array

3. In-Place Frequency Count in an Array

Given: An array of integers where elements range from 1 to p, and p ≤ n (with n being the length of the array).

Objective: Count the frequency of each element in the array and update the array such that the ith index contains the frequency of the element i in the array.

Constraints: Perform the task in place, meaning without using extra space for another array or hash map.

int[] arr = { 1, 2, 2, 1, 1, 3, 2 };
[3, 3, 1, 0, 0, 0, 0]
import java.util.Arrays;

public class Test {
    public static void main(String[] args) {
        int[] arr = { 1, 2, 2, 1, 1, 3, 2 };
        int n = arr.length;

        // count frequencies using (n+1) as the increment
        for (int i = 0; i < n; i++) {
            int value = arr[i] % (n + 1);
            arr[value - 1] += (n + 1);
        }

        // Extract frequencies by dividing by (n + 1)
        for (int i = 0; i < n; i++) {
            arr[i] = arr[i] / (n + 1);
        }
        System.out.println(Arrays.toString(arr));
        // [3, 3, 1, 0, 0, 0, 0]
    }
}

Count Frequencies:

  • For each element, increase the value at the corresponding index by n + 1 (which is 8).
  • Element 1: arr[0] += 8 → [9, 2, 2, 1, 1, 3, 2]
  • Element 2: arr[1] += 8 → [9, 10, 2, 1, 1, 3, 2]
  • Element 2: arr[1] += 8 → [9, 18, 2, 1, 1, 3, 2]
  • Element 1: arr[0] += 8 → [17, 18, 2, 1, 1, 3, 2]
  • Element 1: arr[0] += 8 → [25, 18, 2, 1, 1, 3, 2]
  • Element 3: arr[2] += 8 → [25, 18, 10, 1, 1, 3, 2]
  • Element 2: arr[1] += 8 → [25, 26, 10, 1, 1, 3, 2]

Divide each element by n + 1 (which is 8):

  • 25 // 8 = 3
  • 26 // 8 = 3
  • 10 // 8 = 1
  • 1 // 8 = 0
  • 1 // 8 = 0
  • 3 // 8 = 0
  • 2 // 8 = 0

Final Array: [3, 3, 1, 0, 0, 0, 0]

4. Intersection of Two Arrays

Count distinct common elements. Problem: 349

Array1 = {10, 15, 20, 15, 30, 30, 5}
Array2 = {30, 5, 30, 80}
Common elements: 2

Steps:-

  • Store Elements of First Array: set1 will hold the unique elements from array1.
  • Find Common Elements: Iterate through array2 and add elements to commonElements if they are in set1.
  • Since commonElements is a HashSet, it will only store distinct elements and ignore duplicates.
  • Count Common Elements: The size of commonElements gives the number of distinct common elements.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class Test {
    public static void main(String[] args) {
        Integer[] array1 = { 10, 15, 20, 15, 30, 30, 5 };
        Integer[] array2 = { 30, 5, 30, 80 };

        Set<Integer> set1 = new HashSet<>(Arrays.asList(array1));
        Set<Integer> commonElements = new HashSet<>();
        for (int k : array2) {
            if (set1.contains(k)) {
                commonElements.add(k);
            }
        }
        System.out.println(commonElements.size());
    }
}

Time complexity: O(m + n). Building the Set from array1: This operation takes O(m) time. Finding Common Elements in array2: This operation takes O(n) time.

The combined auxiliary space required is O(m + min(m, n)).

It can be also done using retainAll(). The retainAll() keep only the elements that are present in both sets:-

Set<Integer> set1 = new HashSet<>(Arrays.asList(array1)); 
Set<Integer> set2 = new HashSet<>(Arrays.asList(array2)); 
set1.retainAll(set2); 
System.out.println(set1.size()); // Output: 2

Also see:- Intersection of two sorted array

Extended Problem: 350

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Solution {
    public int[] intersect(int[] nums1, int[] nums2) {
        Map<Integer, Integer> map = new HashMap<>();
        List<Integer> list = new ArrayList<>();
        for (int k : nums1) {
            map.put(k, map.getOrDefault(k, 0) + 1);
        }
        for (int k : nums2) {
            if (map.containsKey(k)) {
                list.add(k);
                if (map.get(k) > 1) {
                    map.put(k, map.get(k) - 1);
                } else {
                    map.remove(k);
                }
            }
        }
        int[] array = new int[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i);
        }
        return array;
    }
}

5. Union of Two Unsorted Arrays

Ignore the duplicates in the union and count the number of elements in the union.

Array1 = {15, 20, 5, 20}
Array2 = {15, 15, 15, 20, 10}
Union elements: 4

Integer[] array1 = { 15, 20, 5, 20 };
Integer[] array2 = { 15, 15, 15, 20, 10 };

Set<Integer> unionSet = new HashSet<>(Arrays.asList(array1));
unionSet.addAll(Arrays.asList(array2));
System.out.println(unionSet.size()); // 4

The time complexity is O(m + n) and the auxiliary space complexity is O(m + n), where m and n are the sizes of array1 and array2, respectively.

In the case of sorted arrays, The two-pointer technique is more straightforward for sorted arrays and avoids the overhead of set operations, making it a better choice for this specific scenario. Also see:- Union of two sorted array
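
For completeness, here is a hedged sketch of the two-pointer union count on sorted arrays, run on the sorted versions of the example arrays above (the method name unionCount is just for illustration):

public class SortedUnionCount {
    // Two-pointer union count for two sorted arrays, ignoring duplicates
    static int unionCount(int[] a, int[] b) {
        int i = 0, j = 0, count = 0;
        while (i < a.length && j < b.length) {
            if (i > 0 && a[i] == a[i - 1]) { i++; continue; }   // skip duplicate in a
            if (j > 0 && b[j] == b[j - 1]) { j++; continue; }   // skip duplicate in b
            if (a[i] < b[j])      { count++; i++; }
            else if (a[i] > b[j]) { count++; j++; }
            else                  { count++; i++; j++; }        // common element, count once
        }
        while (i < a.length) { if (i == 0 || a[i] != a[i - 1]) count++; i++; }
        while (j < b.length) { if (j == 0 || b[j] != b[j - 1]) count++; j++; }
        return count;
    }

    public static void main(String[] args) {
        int[] a = { 5, 15, 20, 20 };        // sorted version of {15, 20, 5, 20}
        int[] b = { 10, 15, 15, 15, 20 };   // sorted version of {15, 15, 15, 20, 10}
        System.out.println(unionCount(a, b)); // 4 -> {5, 10, 15, 20}
    }
}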

6. Pair with the given Sum in an unsorted array

Array = {3, 2, 8, 15, -8}
Sum = 17
Pair exist = Yes (15, 2)

import java.util.HashSet;
import java.util.Set;

public class Test {
    public static void main(String[] args) {
        int[] array = { 3, 2, 8, 15, -8 };
        int sum = 17;
        Set<Integer> set = new HashSet<>();
        for (int k : array) {
            if (set.contains(sum - k)) {
                System.out.println("Yes: (" + k + ", " + (sum - k) + ")");
                return;
            }
            set.add(k);
        }
        System.out.println("No");
    }
}

Time complexity: O(n), auxiliary space: O(n).

If you want to return the index of the pairs then we can use HashMap where the key is arr[i] and the value is i. Check the program below.

7. Two Sum

Given an array of integer nums and an integer target, return indices of the two numbers such that they add up to the target. Problem: 1

You may assume that each input would have exactly one solution, and you may not use the same element twice.

You can return the answer in any order.

Input: nums = [2,7,11,15], target = 9
Output: [0,1]
Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].

class Solution {
    public int[] twoSum(int[] nums, int target) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < nums.length; i++) {
            if (map.containsKey(target - nums[i])) {
                return new int[] { i, map.get(target - nums[i]) };
            }
            map.put(nums[i], i);
        }
        return new int[] { -1, -1 };
    }
}

Also see:- Find Sum Pair in Sorted Array [two-pointer approach]

8. Subarray with Zero Sum

Find whether there is a subarray with sum=0 or not.

Array = {1, 4, 13, -3, -10, 5}
Output: Yes

Array = {-1, 4, -3, 5, 1}
Output: Yes

Array = {5, 6, 0, 8}
Output: Yes

Array = {3, 1, -2, 5, 6}
Output: No

A subarray is a contiguous block of elements in the array. The subarrays of {5, 10, 15} are:-

  • Subarray starting from index 0: {5}, {5, 10}, {5, 10, 15}
  • Subarray starting from index 1: {10}, {10, 15}
  • Subarray starting from index 2: {15}

To solve this problem we can use prefixSum and hashing. Calculate the prefix sum, store it in the HashSet, and find whether the current prefix_sum already exists or not.

Steps:

  • Prefix Sum: Calculate the cumulative sum of elements as you iterate through the array.
  • Hash Set: Store each prefix sum in a hash set.
  • Check Conditions: If a prefix sum is repeated or equals zero, a subarray with sum 0 exists.
import java.util.HashSet;
import java.util.Set;

public class Test {
    public static void main(String[] args) {
        int[] array = { 1, 4, 13, -3, -10, 5 };
        System.out.println(subArrayWith0Sum(array));
    }

    private static boolean subArrayWith0Sum(int[] array) {
        Set<Integer> set = new HashSet<>();
        int prefixSum = 0;
        for (int k : array) {
            prefixSum += k;
            if (set.contains(prefixSum)) {
                return true;
            }

            // handle the case where prefixSum =0 exist from 0 to j
            // like {-3, 2, 1, 4}
            if (prefixSum == 0) {
                return true;
            }
            set.add(prefixSum);
        }
        return false;
    }
}

This method ensures an O(n) time complexity with O(n) auxiliary space, providing an efficient solution.

9. Subarray With Given Sum

Array = {5, 8, 6, 13, 3, -1}
Sum = 22
Output: Yes

Array = {15, 2, 8, 10, -5, -8, 6}
Sum = 3
Output: Yes

Array = {3, 2, 5, 6}
Sum = 10
Output: No

We can use a HashSet of prefix sums to check whether a subarray with the given sum exists.

It is very similar to the previous problem. Calculate the prefix sum, store it in a HashSet, and check whether (prefix_sum – sum) has already occurred. (In the special case sum = 0, this reduces to checking for a duplicate prefix sum, as in the previous problem.)

  • Prefix Sum: Calculate the cumulative sum (prefix sum) as you iterate through the array.
  • Hash Set: Store every prefix sum seen so far in a hash set.
  • Check for Subarray: For each element, check if the difference between the current prefix sum and the target sum exists in the hash set. If it does, there is a subarray with the given sum.
import java.util.HashSet;
import java.util.Set;

public class Test {
    public static void main(String[] args) {
        int[] array = { 5, 8, 6, 13, 3, -1 };
        System.out.println(subArrayWithSum(array, 22));
    }

    private static boolean subArrayWithSum(int[] array, int sum) {
        Set<Integer> set = new HashSet<>();
        int prefixSum = 0;
        for (int k : array) {
            prefixSum += k;
            if (set.contains(prefixSum - sum)) {
                return true;
            }

            // handle the case where prefixSum = sum exist from 0 to j
            if (prefixSum == sum) {
                return true;
            }
            set.add(prefixSum);
        }
        return false;
    }
}

Time complexity O(n), auxiliary space O(n).

10. Count Subarray with Given Sum

Given an array of integers nums and an integer k, return the total number of subarrays whose sum equals to k. Problem: 560. Solution

A subarray is a contiguous non-empty sequence of elements within an array.

Input: nums = [1,1,1], k = 2
Output: 2
Example 2:

Input: nums = [1,2,3], k = 3
Output: 2

Prefix Sum and HashMap: The solution uses the concept of prefix sums to efficiently find subarrays that sum up to k. The prefix sum up to the current element is stored in prefixSum, and a HashMap (map) is used to track the frequency of each prefix sum encountered.

Finding Subarrays: When iterating through the array, the key insight is that if prefixSum – k is found in the map, it indicates that there is a subarray (or multiple subarrays) ending at the current index that sums to k. By adding the frequency of prefixSum – k to count, we ensure all such subarrays are counted.

Efficient Update: The map is updated with the current prefixSum to keep track of all the prefix sums encountered so far, allowing for efficient lookups and updates.

class Solution {
    public int subarraySum(int[] nums, int k) {
        int count = 0, prefixSum = 0;
        Map<Integer, Integer> map = new HashMap<>();
        map.put(0, 1);
        for (int i : nums) {
            prefixSum += i;
            if (map.containsKey(prefixSum - k)) {
                count += map.get(prefixSum - k);
            }
            map.put(prefixSum, map.getOrDefault(prefixSum, 0) + 1);
        }
        return count;
    }
}

11. Longest Subarray with given Sum

Given an array of integers and a target sum, find the length of the longest subarray that sums to the target sum. If no such subarray exists, return 0. Problem. Problem with 0 sum

Array = {5, 8, -4, -4, 9, -2, 2}
Sum = 0
Output: 3 [max_len((8, -4, -4), (-2, 2))]

Array = {3, 1, 0, 1, 8, 2, 3, 6}
Sum = 5
Output: 4

Array = {8, 3, 7}
Sum = 15
Output: 0

We can use a combination of prefix sums and a hash map to efficiently find the longest subarray with the given sum. This ensures an O(n) time complexity.

We use the prefix sum and check whether prefix_sum – sum has already occurred among the earlier prefix sums. If yes, then the subarray between that earlier point and the current index has the required sum. In the HashMap we store each prefix sum together with the index of its leftmost appearance.

For the example array used below, prefix_sum = {8, 11, 12, 17, 11, 17, 19, 21}. If a value appears multiple times, store only its first appearance.
Prefix sum Map = {(8, 0), (11, 1), (12, 2), (17, 3), (19, 6), (21, 7)}

import java.util.HashMap;
import java.util.Map;

public class Test {
    public static void main(String[] args) {
        int[] array = { 8, 3, 1, 5, -6, 6, 2, 2 };
        System.out.println(longestSubArrayWithSum(array, 4)); // 4 from {-6, 6, 2, 2}
    }

    private static int longestSubArrayWithSum(int[] array, int sum) {
        Map<Integer, Integer> map = new HashMap<>();
        int prefixSum = 0, res = 0;
        for (int i = 0; i < array.length; i++) {
            prefixSum += array[i];

            // Check if the prefix sum is equal to the target sum
            if (prefixSum == sum) {
                res = i + 1;
            }

            // If the prefix sum is not seen before, put it in the map
            if (!map.containsKey(prefixSum)) {
                map.put(prefixSum, i);
            }

            // Check if there exists a subarray with the target sum
            if (map.containsKey(prefixSum - sum)) {
                res = Math.max(res, i - map.get(prefixSum - sum));
            }

        }
        return res;
    }
}

Time complexity O(n), auxiliary space O(n).

12. Longest Subarray with Equal Number of 0’s and 1’s

It is one of the popular interview questions. We have a binary array. Find the longest subarray with an equal number of 0’s and 1’s. Example:- {0, 0, 0, 1, 1, 0, 0, 0, 0, 1}. The longest subarray where 0’s and 1’s are the same:- {0, 0, 1, 1}, {0, 1, 1, 0}, {1, 1, 0, 0}. Problem: 525

This problem is going to be reduced to the problem of finding the length of the longest subarray with 0 sum.

If we replace all the 0 with -1, then the array becomes {-1, -1, -1, 1, 1, -1, -1, -1, -1, 1}, and the problem becomes finding the longest subarray with 0 sum.

Therefore, while computing the prefix sum, treat each 0 as -1.

import java.util.HashMap;
import java.util.Map;

public class Test {
    public static void main(String[] args) {
        int[] array = { 1, 0, 1, 1, 1, 0, 0 };
        System.out.println(longestEqual0And1(array)); // 6
    }

    private static int longestEqual0And1(int[] array) {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(0, -1); // Initialize to handle the case when subarray starts from index 0

        int prefixSum = 0, res = 0;
        for (int i = 0; i < array.length; i++) {
            int key = array[i] == 0 ? -1 : 1;
            prefixSum += key;

            // Check if the prefix sum is equal to the target sum
            if (prefixSum == 0) {
                res = i + 1;
            }

            // If the prefix sum is not seen before, put it in the map
            if (map.containsKey(prefixSum)) {
                res = Math.max(res, i - map.get(prefixSum));
            } else {
                map.put(prefixSum, i);
            }
        }
        return res;
    }
}

13. Longest Common Span with Same Sum in Two Binary Arrays

Given two binary arrays arr1 and arr2 of the same size, the goal is to find the length of the longest span (subarray) over which the sum of elements is the same in both arrays.

Example-1:
Array 1: [0, 1, 0, 0, 0, 0]
Array 2: [1, 0, 1, 0, 0, 1]

Longest Span Length: 4

Consider spans where sums are equal in both arrays:
The span from index 1 to 4:
arr1 sum: 1
arr2 sum: 1
Both spans have equal sums and the length is 4.

Example-2:-
Array 1: {0, 1, 0, 1, 1, 1, 1}
Array 2: {1, 1, 1, 1, 1, 0, 1}
Longest Span Length: 6 (From 1st index to end in both arrays)

Example-3:-
Array 1: {0, 0, 0}
Array 2: {1, 1, 1}
Longest Span Length: 0

Example-4:-
Array 1: {0, 0, 1, 0}
Array 2: {1, 1, 1, 1}
Longest Span Length: 1 (At 2nd index)

This problem is going to be reduced to the problem of finding the longest subarray with 0 sum.

  1. Subtract one array from another.
  2. Find the longest sub-array with 0 sum in the resultant array.

We subtract one array from the other because positions with 0-0 or 1-1 become 0 and do not disturb the sum. Positions with 1-0 and 0-1 become 1 and -1 respectively, so a span in which the 1s and -1s cancel out (i.e., the difference array sums to 0) is exactly a span where both arrays have the same sum.

import java.util.HashMap;
import java.util.Map;

public class Test {
    public static void main(String[] args) {
        int[] array1 = { 0, 1, 0, 1, 1, 1, 1 };
        int[] array2 = { 1, 1, 1, 1, 1, 0, 1 };
        System.out.println(longestCommonSpan(array1, array2)); // 6
    }

    private static int longestCommonSpan(int[] arr1, int[] arr2) {
        // subtract one array from the other; the difference array contains -1, 0, and 1
        for (int i = 0; i < arr1.length; i++) {
            arr1[i] -= arr2[i];
        }
        // reuse the "longest subarray with given sum" helper from problem 11 with sum = 0
        return longestSubArrayWithSum(arr1, 0);
    }

    // longestSubArrayWithSum() from problem 11
}

Time complexity: O(n), and auxiliary space: O(n).

14. Longest Consecutive Subsequence

Given an array of integers, the goal is to find the length of the longest subsequence of consecutive integers in the array. Problem: 128

Array: [100, 4, 200, 1, 3, 2]
The longest consecutive subsequence is [1, 2, 3, 4] with a length of 4.

Method-1: Use sorting. In the sorted array, find the max length of the consecutive elements. The time complexity will be O(n log n) + O(n) = O(n log n).
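
A brief sketch of Method-1 (the helper name is illustrative):

import java.util.Arrays;

public class LongestConsecutiveBySorting {
    // Sorting-based approach: O(n log n) time, skipping duplicates while scanning
    static int longestConsecutive(int[] arr) {
        if (arr.length == 0) return 0;
        Arrays.sort(arr);
        int best = 1, current = 1;
        for (int i = 1; i < arr.length; i++) {
            if (arr[i] == arr[i - 1]) continue;        // duplicate, run unchanged
            if (arr[i] == arr[i - 1] + 1) current++;   // consecutive run continues
            else current = 1;                          // run broken, start over
            best = Math.max(best, current);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestConsecutive(new int[] { 100, 4, 200, 1, 3, 2 })); // 4
    }
}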

Method 2: Use Hashing.

  • Use a HashSet: Insert all elements of the array into a hash set to allow O(1) time complexity for lookups.
  • Find the Starting Points: Iterate through the array, and for each element, check if it is the start of a sequence (i.e., element – 1 is not in the hash set).
  • Count the Sequence Length: For each starting point, count the length of the consecutive sequence by checking subsequent elements (element + 1, element + 2, etc.).
  • Track the Maximum Length: Keep track of the maximum length of consecutive sequences found.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class Test {
    public static void main(String[] args) {
        Integer[] arr = { 100, 4, 200, 1, 3, 2 };
        System.out.println(longestConsSequence(arr)); // 4
    }

    private static int longestConsSequence(Integer[] arr) {
        int max = 0;
        // Insert all array elements into the hash set
        Set<Integer> set = new HashSet<>(Arrays.asList(arr));

        for (int k : arr) {
            // Check if k is the start of a sequence
            if (!set.contains(k - 1)) {
                int count = 1;
                int num = k;
                // Count the length of the sequence
                while (set.contains(++num)) {
                    count++;
                }
                // Update the maximum length of consecutive sequence found
                max = Math.max(max, count);
            }
        }
        return max;
    }
}

The expected time complexity is O(n). However, the version above can still end up traversing the same elements multiple times, which may cause a time limit exceeded error for very large inputs. Here's a tightened version of the same loop that avoids unnecessary checks:

for (int k : nums) {
    count = 1;
    // Check if current element is the start of a sequence
    if (!set.contains(k - 1)) {
        // Count the length of the sequence
        while (set.contains(++k)) {
            count++;
        }
    }
    // Update the result with the maximum length found
    res = Math.max(res, count);
}

15. More Than n/k Occurrences

For a given array, k is given. Print all those elements which occur more than n/k times. Here n is the length of the array.

Array = {30, 10, 20, 20, 10, 20, 30, 30};
k = 4
Output: 20, 30 (As n = 8, n/k = 2)

Array = {30, 10, 20, 30, 30, 40, 30, 40, 30};
k = 2
Output: 30 (As n = 9, n/k = 4.5)

import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

public class Test {
    public static void main(String[] args) {
        int[] arr = { 30, 10, 20, 20, 10, 20, 30, 30 };
        int k = 4;
        nkOccurrence(arr, k); // 20 30
    }

    private static void nkOccurrence(int[] arr, int k) {
        int n = arr.length;
        Map<Integer, Integer> map = new HashMap<>();
        for (int a : arr) {
            map.put(a, map.getOrDefault(a, 0) + 1);
        }
        for (Entry<Integer, Integer> e : map.entrySet()) {
            if (e.getValue() > n / k) { // integer division works here: count > n/k (real) iff count > floor(n/k)
                System.out.println(e.getKey());
            }
        }
    }
}

Time complexity: O(n), auxiliary space: O(n).

O(n * k) Solution

Assume N is very large compared to k. In that case, we can use an extended version of the Boyer-Moore Voting Algorithm to handle the n/k occurrences case.

Let ‘res_count’ be the number of elements in the result. Then res_count <= k-1, by the following argument:

Suppose res_count >= k. Every result element occurs more than n/k times, so the total number of elements would be more than k * (n/k) = n.

That can’t be true, therefore the number of results <= k-1.

  1. Create an empty candidate map. It holds at most k candidates at any time (the actual answers are at most k-1, as argued above).
  2. Traverse the elements of the array.
  • If the element is already in the map then increment its count.
  • Else if the map has fewer than k candidates then store map.put(element, 1).
  • Otherwise, decrease all values in the map by one. If a value becomes 0, then remove that element.
  3. Verify: Count the occurrences of each remaining candidate and print those that actually appear more than n/k times.
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Map.Entry;

public class Test {
    public static void main(String[] args) {
        int[] arr = {30, 10, 20, 20, 10, 20, 30, 30};
        int k = 4;
        nkOccurrence(arr, k); // 20 30
    }

    private static void nkOccurrence(int[] arr, int k) {
        int n = arr.length;
        // Find possible candidates
        Map<Integer, Integer> cand = new HashMap<>();
        for (int a : arr) {
            if (cand.containsKey(a)) {
                cand.put(a, cand.get(a) + 1);
            } else if (cand.size() < k) {
                cand.put(a, 1);
            } else {
                // Decrease count of existing candidates.
                // Use an explicit Iterator so removal during iteration is safe
                // (calling cand.remove() inside a for-each loop would throw
                // ConcurrentModificationException).
                Iterator<Entry<Integer, Integer>> it = cand.entrySet().iterator();
                while (it.hasNext()) {
                    Entry<Integer, Integer> e = it.next();
                    if (e.getValue() > 1) {
                        e.setValue(e.getValue() - 1);
                    } else {
                        it.remove();
                    }
                }
            }
        }

        // Verify candidates
        for (Integer key : cand.keySet()) {
            cand.put(key, 0); // Reset count
        }
        for (int a : arr) {
            if (cand.containsKey(a)) {
                cand.put(a, cand.get(a) + 1);
            }
        }

        // Print elements that occur more than n/k times
        for (Entry<Integer, Integer> e : cand.entrySet()) {
            if (e.getValue() > n / k) {
                System.out.println(e.getKey());
            }
        }
    }
}

Time complexity: O(n * k), because whenever we decrease the candidate counts, the inner loop runs up to k times. Auxiliary space: O(k).

The previous solution took O(n) time and O(n) auxiliary space; since here n is assumed to be very large compared to k, we instead used an extended version of the Boyer-Moore Voting Algorithm, which takes O(n * k) time but only O(k) auxiliary space.

If you enjoyed this post, share it with your friends. Do you want to share more information about the topic discussed above or do you find anything incorrect? Let us know in the comments. Thank you!
