
Detecting and Removing Redundancy in a File

meena moon


I have a very large dataset (integer data) in a file.

I would like to find duplicate values (ints) and then remove them from the file quickly.

What would be a good algorithm for this?
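One common approach, if the set of *distinct* values fits in memory, is a single pass with a hash set: keep each value the first time it is seen and skip later repeats. This runs in roughly linear time and preserves the original order. The sketch below assumes a text file with one integer per line; `unique_in_order` and `dedupe_file` are hypothetical names chosen for illustration.

```python
from typing import Iterable, Iterator


def unique_in_order(values: Iterable[int]) -> Iterator[int]:
    """Yield each value the first time it appears; later repeats are skipped."""
    seen = set()  # hash set of values already emitted (must fit in memory)
    for v in values:
        if v not in seen:
            seen.add(v)
            yield v


def dedupe_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: read one integer per line from src_path and write
    the first occurrence of each value to dst_path, preserving order.
    Assumes a plain-text file with one integer per line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # int() tolerates surrounding whitespace; blank lines are skipped
        numbers = (int(line) for line in src if line.strip())
        for v in unique_in_order(numbers):
            dst.write(f"{v}\n")
```

For example, `list(unique_in_order([3, 1, 3, 2, 1]))` keeps `[3, 1, 2]`. Writing to a separate output file (and renaming it afterwards) avoids rewriting the input in place while it is still being read.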

I'm reading about the MinHash algorithm. Is it a good fit for this purpose, or is there another way?
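On MinHash: it estimates the Jaccard similarity between sets and is mainly used for *near*-duplicate detection (for example, similar documents). It is probabilistic, so it is not needed to find exact repeated integers. If the distinct values are too many to hold in memory, a common alternative is an external sort: sort the data in memory-sized chunks, spill each sorted chunk to disk, then do a k-way merge where duplicates become adjacent and can be dropped. The sketch below is one way to do that under stated assumptions: `external_unique` and its `chunk_size` parameter are hypothetical, the output comes out in ascending order (not the original order), and one temporary file is kept open per chunk during the merge.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_sorted_chunk(chunk: List[int]) -> str:
    """Sort one in-memory chunk, drop its internal repeats, and spill it to a temp file."""
    fd, path = tempfile.mkstemp(suffix=".chunk")
    with os.fdopen(fd, "w") as f:
        for v in sorted(set(chunk)):
            f.write(f"{v}\n")
    return path


def _read_chunk(path: str) -> Iterator[int]:
    """Stream a sorted chunk back from disk, one integer per line."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_unique(values: Iterable[int], chunk_size: int = 1_000_000) -> Iterator[int]:
    """Yield the distinct values in ascending order, holding at most
    chunk_size values in memory at once (hypothetical helper)."""
    paths: List[str] = []
    chunk: List[int] = []
    try:
        # Phase 1: cut the input into sorted runs on disk
        for v in values:
            chunk.append(v)
            if len(chunk) >= chunk_size:
                paths.append(_write_sorted_chunk(chunk))
                chunk = []
        if chunk:
            paths.append(_write_sorted_chunk(chunk))
        # Phase 2: k-way merge; equal values from different runs end up adjacent
        previous = None
        for v in heapq.merge(*(_read_chunk(p) for p in paths)):
            if v != previous:
                yield v
                previous = v
    finally:
        # Clean up the temporary run files
        for p in paths:
            os.remove(p)
```

The trade-off versus a hash set is extra disk I/O in exchange for memory use bounded by `chunk_size`; if the original order matters, the hash-set approach (or a variant that sorts `(value, position)` pairs) is a better fit.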
