How to remove duplicate words from a string in C#

Introduction

Removing duplicate words from a string is a common programming task in C#. For instance, you may want to remove duplicate words from a user's input in a search bar to get more accurate search results. There are several ways to accomplish this. In this article, we will look at three of them, with examples and explanations.

Methods to remove duplicate words from a string

  • Using Regular Expressions
  • Using Split() and Distinct()
  • Using Dictionary

Method 1. Using Regular Expressions

Regular expressions are a powerful tool for pattern matching in strings. We can use one to find a word that is immediately followed by the same word and keep only a single copy. Here's how:

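using System;
using System.Text.RegularExpressions;

string input = "C# Corner is a popular online online community";
// Replace a word followed by whitespace and the same word with a single copy.
string output = Regex.Replace(input, @"\b(\w+)\s+\1\b", "$1");

Console.WriteLine(output); // C# Corner is a popular online community
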
  • First, we import the System.Text.RegularExpressions namespace to use regular expressions.
  • Then, we define a string variable input with the input string that we want to remove duplicates from.
  • Next, we use the Regex.Replace() method to match and replace duplicate words in the input string.
  • The regular expression \b(\w+)\s+\1\b matches one or more word characters (\w+), captured as group 1, followed by one or more whitespace characters (\s+) and then the same text again (\1). The \b at the beginning and end ensure that the match is a whole word, not just a part of a larger word. Because the repeat must come immediately after the original, this pattern removes only adjacent duplicates, and a word repeated three or more times in a row needs more than one pass.
  • Finally, we replace the matched pair with a single copy of the word ($1) using the regular expression replacement syntax.
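
The pattern above removes one adjacent repeat per match. As a minimal sketch, a variant that collapses a whole run of repeats in a single pass might look like the following. The helper name CollapseRepeatedWords and the sample sentence are illustrative assumptions, not part of the original example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: collapses a run of the same word ("is is is") into one copy.
// (?:\s+\1\b)+ repeats the "whitespace + same word" part one or more times.
string CollapseRepeatedWords(string text) =>
    Regex.Replace(text, @"\b(\w+)(?:\s+\1\b)+", "$1");

Console.WriteLine(CollapseRepeatedWords("this is is is a test test"));
// prints: this is a test
```

Like the original pattern, this variant still leaves duplicates that are not next to each other in place.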

Method 2. Using Split() and Distinct()

Another way to remove duplicate words from a string in C# is to use the Split() method to split the string into an array of words, then use the Distinct() method to remove duplicates, and finally join the array back into a string. Here's an example:

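using System;
using System.Linq;

string input = "C# Corner is a popular online community popular online community";
string[] words = input.Split(' ');
// Distinct() is a LINQ extension method, so it needs System.Linq.
string[] distinctWords = words.Distinct().ToArray();
string output = string.Join(" ", distinctWords);
Console.WriteLine(output); // C# Corner is a popular online community
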
  • First, we define a string variable input with the input string that we want to remove duplicates from.
  • Then, we use the Split() method to split the input string into an array of words, using a space character as the separator.
  • Next, we use the Distinct() method (from System.Linq) to remove duplicates from the array of words, and ToArray() to turn the result back into an array.
  • Finally, we join the distinct words back into a string using the string.Join() method, again using a space character as the separator.
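
Splitting on a single space leaves empty entries behind when the input contains consecutive spaces, and it does not treat tabs or line breaks as separators. A hedged sketch of one way to handle that is shown below. The helper name RemoveDuplicateWords, the separator list, and the sample input are assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper: Split + Distinct, tolerant of repeated or mixed whitespace.
string RemoveDuplicateWords(string text)
{
    char[] separators = { ' ', '\t', '\r', '\n' };
    // RemoveEmptyEntries drops the empty strings produced by consecutive separators.
    string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    return string.Join(" ", words.Distinct());
}

Console.WriteLine(RemoveDuplicateWords("one  two   one\tthree two"));
// prints: one two three
```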

Method 3. Using Dictionary

We can also use a dictionary to remove duplicate words from a string in C#. Here's how:

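using System;
using System.Collections.Generic;

string input = "C# Corner is a popular online community popular online community";
string[] words = input.Split(' ');
// Maps each distinct word to the number of times it occurs.
Dictionary<string, int> dict = new Dictionary<string, int>();

foreach (string word in words)
{
    // First time we see this word: start its count at 0.
    if (!dict.ContainsKey(word))
    {
        dict.Add(word, 0);
    }

    dict[word]++;
}

// The keys of the dictionary are the distinct words.
string output = string.Join(" ", dict.Keys);

Console.WriteLine(output);
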
  • First, we define a string variable input with the input string that we want to remove duplicates from.
  • Then, we use the Split() method to split the input string into an array of words, using a space character as the separator.
  • Next, we define a dictionary dict that we will use to keep track of the word occurrences.
  • We iterate over each word in the words array using a foreach loop.
  • For each word, we check if it exists in the dictionary using the ContainsKey() method. If it doesn't exist, we add it to the dictionary with an initial count of 0 using the Add() method.
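  • Then, still inside the loop, we increment the count of the word in the dictionary by 1 using the ++ operator.
  • After all the words have been processed, we join the dictionary's keys, which are the distinct words, using the Keys property and the string.Join() method. Note that the documentation does not guarantee the order of Keys, so the original word order is not guaranteed to be preserved.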
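
The documented order of a dictionary's keys is unspecified, so when word order matters and counts are not needed, a HashSet paired with a List is one alternative. The following is a sketch under that assumption. It swaps the dictionary for a different data structure, and the helper name RemoveDuplicateWordsOrdered is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps the first occurrence of each word, in input order.
string RemoveDuplicateWordsOrdered(string text)
{
    var seen = new HashSet<string>();   // words encountered so far
    var kept = new List<string>();      // first occurrences, in input order
    foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
    {
        // HashSet<T>.Add returns false when the word is already present.
        if (seen.Add(word))
        {
            kept.Add(word);
        }
    }
    return string.Join(" ", kept);
}

Console.WriteLine(RemoveDuplicateWordsOrdered(
    "C# Corner is a popular online community popular online community"));
// prints: C# Corner is a popular online community
```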

FAQs

Q- What is the difference between Method 2 and Method 3?
A- Method 2 uses the Split() and Distinct() methods and is the shorter of the two. Method 3 uses a dictionary that also records how many times each word occurs, so it is easier to adapt to related tasks such as counting the occurrences of each word.

Q- Can I use these methods to remove duplicate characters from a string?
A- Not as written. Each method works on whole words, either by matching word patterns or by splitting the string on spaces. To remove duplicate characters, you can apply Distinct() to the characters of the string, or loop over the characters and skip any you have already seen.
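
As a rough illustration of the character-level approach, a string can be treated as a sequence of characters and passed through Distinct(). The helper name RemoveDuplicateChars and the sample input are assumptions for this sketch.

```csharp
using System;
using System.Linq;

// Hypothetical helper: keeps the first occurrence of each character, including spaces.
string RemoveDuplicateChars(string text) => new string(text.Distinct().ToArray());

Console.WriteLine(RemoveDuplicateChars("hello world"));
// prints: helo wrd
```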

Q- Are these methods case-sensitive?
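A- Yes, all three methods compare words case-sensitively, so "Online" and "online" count as different words. Converting the input with ToLower() or ToUpper() before processing makes them case-insensitive, but it also changes the casing of the output. To keep the original casing, you can pass StringComparer.OrdinalIgnoreCase to Distinct() or to the Dictionary constructor, or pass RegexOptions.IgnoreCase to Regex.Replace().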
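
A minimal sketch of two case-insensitive variants follows, assuming ordinal (culture-independent) comparison is acceptable for the input. The sample sentence and variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "The the quick QUICK brown fox";

// Distinct with a case-insensitive comparer keeps the first spelling of each word.
string viaDistinct = string.Join(" ",
    input.Split(' ').Distinct(StringComparer.OrdinalIgnoreCase));

// RegexOptions.IgnoreCase lets the backreference \1 match regardless of case.
string viaRegex = Regex.Replace(input, @"\b(\w+)\s+\1\b", "$1", RegexOptions.IgnoreCase);

Console.WriteLine(viaDistinct); // The quick brown fox
Console.WriteLine(viaRegex);    // The quick brown fox
```

Both variants keep the casing of the first occurrence rather than lowercasing the whole string.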
