Detecting and Removing Duplicate Values in a File

meena moon

4y
I have a very large dataset of integers stored in a file.
I would like to find the duplicate values and remove them from the file as quickly as possible.
What would be a good algorithm for this?
I have been reading about the MinHash algorithm. Is it a good fit for this purpose, or is there a better way?
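
To make the task concrete, here is a minimal sketch of the exact-duplicate removal I have in mind. It assumes the file holds one integer per line and that the set of distinct values fits in memory. The names `unique_ints` and `dedupe_file` are only illustrative, not from any library.

```python
from typing import Iterable, Iterator


def unique_ints(lines: Iterable[str]) -> Iterator[int]:
    """Yield each integer the first time it appears, skipping blank lines."""
    seen = set()  # every distinct value seen so far (assumed to fit in memory)
    for line in lines:
        text = line.strip()
        if not text:
            continue
        value = int(text)
        if value not in seen:
            seen.add(value)
            yield value


def dedupe_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, keeping only the first copy of each value."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for value in unique_ints(src):
            dst.write(f"{value}\n")
```

This keeps the original order and reads the file once, but memory use grows with the number of distinct values. If the distinct values do not fit in memory, sorting the file externally and dropping adjacent equal lines (for example with `sort -n -u`) may be the more practical route. As far as I understand, MinHash estimates how similar two sets are, so it may not be the right tool for finding exact duplicate values.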