A simple Hash Table implementation from first principles
The concept of soft delete: say we have elements at indexes 3, 5, 7 and 10, and we delete a key whose hash lands on index 5. We do not "hard delete" it, because that would leave an empty slot between 3 and 7, 10, and a lookup probing past index 5 would stop at the gap and miss keys that are actually in the table.
Instead we mark the item at index 5 with soft_deleted=true and do not free its pointer for now.
When we next resize, we run a hard delete for every item that had soft_deleted=true and move only the remaining live pointers into the resized table.
To address the same problem we also keep two counters: active_key_count, which tells how many live keys/items are actually in the hash table, and used_key_count, which tells how many slots are occupied at all (active keys + soft-deleted keys).
So a soft_delete_item operation decreases only the active_key_count; used_key_count stays the same until the tombstones are hard-deleted on resize.
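As a rough sketch of this bookkeeping (in C, assuming a pointer-array table; the struct layout and the name ht_item are assumptions, while soft_deleted, active_key_count, used_key_count and soft_delete_item come from the description above):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot and table layout -- only the soft_deleted flag and the
 * two counters are taken from the description above, the rest is assumed. */
typedef struct {
    char *key;
    char *value;
    bool  soft_deleted;          /* tombstone flag, pointer is kept alive  */
} ht_item;

typedef struct {
    ht_item **items;             /* slot pointers, NULL means truly empty  */
    size_t    size;              /* number of slots                        */
    size_t    active_key_count;  /* live keys only                         */
    size_t    used_key_count;    /* live keys + soft-deleted keys          */
} hash_table;

/* Soft delete: mark the slot instead of freeing it, so later probes can
 * still walk past it.  Only active_key_count goes down; used_key_count
 * stays the same until the next resize hard-deletes the tombstones. */
void soft_delete_item(hash_table *ht, size_t index)
{
    ht_item *item = ht->items[index];
    if (item != NULL && !item->soft_deleted) {
        item->soft_deleted = true;
        ht->active_key_count--;
    }
}
```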
This also lets us handle the load factor in a new way, since the table's fullness can now be measured with either counter.
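The exact change is only hinted at above, but one plausible reading (an assumption, not something stated in the text) is that the grow check measures fullness with used_key_count, because soft-deleted slots still occupy probe space:

```c
/* Assumed reading: the load factor that triggers a grow counts tombstones
 * too, so it is based on used_key_count rather than active_key_count. */
double load_factor(const hash_table *ht)
{
    return (double)ht->used_key_count / (double)ht->size;
}
```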
While doing a search, we check whether the slot the key maps to (for the current table size) holds a soft_deleted item; if it does, we don't take that value and keep searching for the same key on the next probe attempt.
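A lookup sketch under the same assumptions (linear probing and the small hash helper below are assumptions; the "skip soft-deleted slots and keep probing" rule is the part taken from the text):

```c
#include <string.h>

/* Minimal djb2-style hash, only here so the sketch is self-contained. */
static size_t hash(const char *key)
{
    size_t h = 5381;
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return h;
}

ht_item *search_item(hash_table *ht, const char *key)
{
    size_t index = hash(key) % ht->size;
    for (size_t probes = 0; probes < ht->size; probes++) {
        ht_item *item = ht->items[index];
        if (item == NULL)
            return NULL;                  /* real gap: key is not in the table       */
        if (!item->soft_deleted && strcmp(item->key, key) == 0)
            return item;                  /* live item with the matching key         */
        index = (index + 1) % ht->size;   /* soft-deleted or other key: next attempt */
    }
    return NULL;
}
```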
While doing an insert, the pointers of soft-deleted items are not freed yet, so we can simply continue with our normal approach.
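An insert sketch under one reading of "continue with our normal approach": a soft-deleted slot still holds a valid pointer, so the probe loop treats it like any occupied slot and moves on. This assumes the table always keeps at least one empty slot (resizing holds the load below the grow threshold) and leaves out updating an existing key:

```c
void insert_item(hash_table *ht, ht_item *item)
{
    size_t index = hash(item->key) % ht->size;
    while (ht->items[index] != NULL)          /* occupied or soft-deleted: keep probing */
        index = (index + 1) % ht->size;
    ht->items[index] = item;
    ht->active_key_count++;
    ht->used_key_count++;
    /* a full implementation would check the load factor here and grow when needed */
}
```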
The basic idea of amortized analysis is that some individual operations may be expensive, but when the expensive ones are spread over all operations, the average cost per operation stays small. For this hash table, amortized analysis shows up when we resize.
Say we start with a hash table of size 1. To insert a second element we need to create a new table of twice the size (2), copy the existing element over, and then insert the new one; under the counting used below this costs 4 operations (2 for allocating the doubled array, 1 for copying the element, 1 for inserting the new element). Doubling like this, the cost per insertion goes as follows:
| no. of elements | size | operations for this insertion |
|---|---|---|
| 1 | 1 | 1 (single insertion into the size-1 table) |
| 2 | 2 | 2+1+1 = 4 (2 ops for the size-2 table, 1 op for copy, 1 op for insert) |
| 3 | 4 | 4+2+1 = 7 (4 ops for the size-4 table, 2 ops for copy, 1 op for insert) |
| 4 | 4 | 1 (space already available, only 1 op for insert) |
| 5 | 8 | 8+4+1 = 13 (8 ops for the size-8 table, 4 ops for copy, 1 op for insert) |
| 6 | 8 | 1 (space already available, only 1 op for insert) |
| 7 | 8 | 1 (space already available, only 1 op for insert) |
Here the cheap operations happen far more often than the expensive ones, which brings the cost of n insertions down from the naive O(n^2) estimate to roughly O(n) in total, i.e. O(1) amortized per insertion.
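Adding up the operations column above: the first 7 insertions cost 1 + 4 + 7 + 1 + 13 + 1 + 1 = 28 operations, about 4 per insertion. That ratio stays bounded as n grows, because under this counting each doubling to a table of size s costs s (allocation) + s/2 (copy) operations, and the doubled sizes 2 + 4 + 8 + ... + n sum to less than 2n, so all the resize work over n insertions is O(n) on top of the n cheap inserts.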
When we want to increase the size, we do it by doubling.
Would it help to shrink as soon as the number of elements drops below half the array size?
No. Suppose we have the elements 1,2,3,4,5,6,7,8, so the size would be 16. If we remove 8 we would immediately shrink the size to 8, leaving the elements 1,2,3,4,5,6,7 in a table of size 8.
But one more insertion would force us to double the size again, so we would be paying for expensive resizes far too frequently.
Should we shrink at 1/4 instead?
Suppose we have 16 elements, 1 through 16, so the size would be 32.
Remove the last 8 elements; we are left with 1,2,3,4,5,6,7,8 in a table of 32 slots.
(Considering the grow threshold is at 0.5:) even if we halve the size now, taking it to 16, we already have 8 elements in a size-16 table, so a single insertion would break the 0.5 threshold and force a grow straight away.
(Although 1/4 is still considered a reasonable shrink point if the grow threshold is higher than 0.5.)
We will take 1/8 as our sweet spot: after shrinking, a few immediate insertions will not cause an upsize. So only when 2 elements remain in a size-16 table (16/8 == 2) does it shrink to size 8.
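Putting the grow and shrink rules together, a resize check might look like the sketch below. The maybe_resize name, the 0.5 grow threshold (taken from the example above), and the minimum size are assumptions; the 1/8 shrink rule and the "hard delete tombstones, move only live pointers" step are from the text.

```c
#include <stdlib.h>

#define MIN_TABLE_SIZE 8   /* hypothetical floor so we never shrink to nothing */

void maybe_resize(hash_table *ht)
{
    size_t new_size;

    if (ht->used_key_count * 2 >= ht->size)            /* load >= 0.5: grow        */
        new_size = ht->size * 2;
    else if (ht->active_key_count * 8 <= ht->size &&   /* live keys <= 1/8: shrink */
             ht->size > MIN_TABLE_SIZE)
        new_size = ht->size / 2;
    else
        return;

    ht_item **old_items = ht->items;
    size_t    old_size  = ht->size;

    ht->items = calloc(new_size, sizeof(ht_item *));
    ht->size  = new_size;
    ht->active_key_count = 0;
    ht->used_key_count   = 0;

    for (size_t i = 0; i < old_size; i++) {
        ht_item *item = old_items[i];
        if (item == NULL)
            continue;
        if (item->soft_deleted) {      /* hard delete the tombstone now            */
            free(item->key);           /* assumes key/value are heap-owned         */
            free(item->value);
            free(item);
        } else {
            insert_item(ht, item);     /* only the live pointers move over         */
        }
    }
    free(old_items);
}
```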