
# A simple Hash Table implementation from the first principles

## Why soft delete?

The idea behind soft delete: suppose we have elements at indices 3, 5, 7, and 10, and we delete the key whose hash lands on index 5. We do not "hard delete" it, because that would leave an empty slot separating index 3 from 7 and 10, and lookups that probe past index 5 would stop early and miss keys that are actually present.
Instead, we mark the item at index 5 with soft_deleted=true and do not free its pointer for now.
When we resize, we run a hard delete on every item marked soft_deleted=true and move only the remaining pointers into the resized table.

To support this we keep two counters: active_key_count, the number of keys actually present in the table, and used_key_count, the number of occupied slots, i.e. active keys + soft-deleted keys.
A soft_delete_item operation decrements only active_key_count. This also changes how we reason about the load factor: soft-deleted slots still occupy probe positions, so used_key_count is the count that matters when deciding to grow.
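A minimal sketch of what this can look like (illustrative C with linear probing assumed; the slot layout, type names, and function signature are my assumptions, not necessarily the repository's code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slot and table layout, for illustration only. */
typedef struct {
    char *key;          /* owned key string; NULL means never used */
    int   value;
    bool  soft_deleted; /* true => logically removed, pointer kept */
} item_t;

typedef struct {
    item_t *slots;
    size_t  size;             /* capacity of the array */
    size_t  active_key_count; /* keys actually present */
    size_t  used_key_count;   /* active + soft-deleted slots */
} hash_table_t;

/* Soft delete: mark the slot instead of freeing it, so probe chains
 * running through this index stay intact until the next resize. */
static bool soft_delete_item(hash_table_t *t, const char *key, size_t hash) {
    for (size_t i = 0; i < t->size; i++) {
        size_t idx = (hash + i) % t->size;   /* linear probing */
        item_t *it = &t->slots[idx];
        if (it->key == NULL)
            return false;                    /* empty slot: key absent */
        if (!it->soft_deleted && strcmp(it->key, key) == 0) {
            it->soft_deleted = true;
            t->active_key_count--;           /* used_key_count unchanged */
            return true;
        }
    }
    return false;
}
```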

During a search, when probing lands on a slot we check whether it is soft-deleted; if it is, we do not take that value, and keep searching for the same key at the next probe attempt.
During an insert, since the pointers of soft-deleted items are not freed yet, we can simply continue with the normal probing approach.
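Under the same assumed layout as above, that lookup rule might look like this (search_item is an illustrative name):

```c
/* Lookup: a soft-deleted slot is skipped, not treated as "key absent",
 * because the probe chain may continue past it. */
static item_t *search_item(hash_table_t *t, const char *key, size_t hash) {
    for (size_t i = 0; i < t->size; i++) {
        size_t idx = (hash + i) % t->size;
        item_t *it = &t->slots[idx];
        if (it->key == NULL)
            return NULL;                 /* truly empty: stop probing */
        if (it->soft_deleted)
            continue;                    /* skip, try the next attempt */
        if (strcmp(it->key, key) == 0)
            return it;
    }
    return NULL;
}
```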

## A little about amortized analysis

The basic idea of amortized analysis is that some operations may be expensive, but if the expensive ones happen rarely enough, the average cost per operation stays small. In this hash table implementation, amortized analysis can be observed when we resize the table.

## Doubling the size of the hash table (dynamic array)

Say we start with a hash table of size 1 holding one element. To insert a second element we must create a new table of twice the size (2) and copy the existing element over, which costs 4 operations: 2 to allocate the doubled array, 1 to copy the element, and 1 to insert the new element. As we keep doubling, the operation counts grow as follows:

| no. of elements | size | operations |
| --- | --- | --- |
| 1 | 1 | 1 (single insertion) |
| 2 | 2 | 2 + 1 + 1 = 4 (2 ops for the size-2 table, 1 to copy, 1 to insert) |
| 3 | 4 | 4 + 2 + 1 = 7 (4 ops for the size-4 table, 2 to copy, 1 to insert) |
| 4 | 4 | 1 (capacity already available, 1 op to insert) |
| 5 | 8 | 8 + 4 + 1 = 13 (8 ops for the size-8 table, 4 to copy, 1 to insert) |
| 6 | 8 | 1 (capacity already available, 1 op to insert) |
| 7 | 8 | 1 (capacity already available, 1 op to insert) |

The cheap operations occur far more often than the expensive ones, which brings the total for n insertions down from the naive estimate of O(n^2) to roughly O(n), i.e. amortized O(1) per insertion.
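To make that concrete, here is a rough aggregate sum under the cost model of the table above (allocating a table of size $s$ costs $s$ ops, and each copied element costs 1 op). Inserting $n$ elements costs about:

$$n + \sum_{k=1}^{\log_2 n}\left(2^k + 2^{k-1}\right) \le n + 2n + n = 4n = O(n)$$

so the average works out to a small constant (about 4 ops) per insertion.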

## Halving the size of the hash table

When we want to grow, we double the size. Would shrinking as soon as the element count drops below half the array size be beneficial? No. Suppose we have 1,2,3,4,5,6,7,8, so the size has grown to 16. If we remove 8 and immediately shrink to size 8, we hold 1,2,3,4,5,6,7, and one more insertion forces us to double again; deletes and inserts near that boundary would trigger expensive resizes over and over.
Should we shrink at 1/4 instead? Suppose we have 16 elements, 1 through 16, so the size is 32. Remove the last 8 elements: 1 through 8 remain in a space of 32. With the grow threshold at 0.5, even if we halve the size now, to 16, we sit at exactly 8 elements in a size of 16, and one more insertion breaks the 0.5 threshold and forces a doubling. (n/4 is still considered a reasonable spot if the grow threshold is above 0.5.)

We take 1/8 as our sweet spot: after a shrink, a few immediate insertions will not cause an upsize. For example, once only 2 elements remain in a size-16 table (16/8 == 2), the table shrinks to size 8.
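A sketch of this resize policy, continuing the assumed layout from the earlier snippets; hash_of() and MIN_SIZE are stand-ins for whatever hash function and minimum capacity the real implementation uses:

```c
#include <stdlib.h>

#define MIN_SIZE 8                  /* hypothetical floor for shrinking */
size_t hash_of(const char *key);    /* assumed hash function */

/* Rebuild the table at new_size. This is also the point where
 * soft-deleted items are finally hard deleted: only active keys
 * are carried into the new array. */
static void resize(hash_table_t *t, size_t new_size) {
    item_t *old = t->slots;
    size_t old_size = t->size;

    t->slots = calloc(new_size, sizeof(item_t)); /* all slots empty */
    t->size = new_size;
    t->used_key_count = t->active_key_count;     /* tombstones are gone */

    for (size_t i = 0; i < old_size; i++) {
        if (old[i].key == NULL)
            continue;
        if (old[i].soft_deleted) {               /* hard delete happens here */
            free(old[i].key);
            continue;
        }
        /* move the surviving pointer into the new table by linear probing */
        size_t idx = hash_of(old[i].key) % t->size;
        while (t->slots[idx].key != NULL)
            idx = (idx + 1) % t->size;
        t->slots[idx] = old[i];
    }
    free(old);
}

/* Grow when occupied slots (active + soft-deleted) pass 1/2 of the
 * capacity; shrink when active keys fall to 1/8 of the capacity. */
static void maybe_resize(hash_table_t *t) {
    if (t->used_key_count > t->size / 2)
        resize(t, t->size * 2);
    else if (t->size > MIN_SIZE && t->active_key_count <= t->size / 8)
        resize(t, t->size / 2);
}
```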
