AI Techs :: Minimum Edit Distance Method in Unicode Strings in C++
By Yilmaz Yoru July 1, 2021
In Artificial Intelligence, mostly in Natural Language Processing (NLP) and computational linguistics, and in other fields of computer science, the Edit Distance method is a way of quantifying how dissimilar two strings are, character by character, by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in NLP, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question. The method is also used in bioinformatics to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T.
As described in Wikipedia, different types of edit distance methods allow different sets of string operations. For instance:
- The Levenshtein distance allows deletion, insertion and substitution.
- The longest common subsequence (LCS) distance allows only insertion and deletion, not substitution.
- The Hamming distance allows only substitution; hence, it only applies to strings of the same length.
- The Damerau-Levenshtein distance allows insertion, deletion, substitution, and the transposition of two adjacent characters.
- The Jaro distance allows only transposition.
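To make the difference between these variants concrete, here is a minimal sketch of the simplest one, the Hamming distance, in standard C++. The function name HammingDistance and the use of std::u32string are assumptions made for illustration; they do not come from the original post.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: counts the positions where two equal-length strings differ.
// Only substitution is allowed, so strings of different length are rejected.
int HammingDistance(const std::u32string& a, const std::u32string& b)
{
    if (a.size() != b.size())
        throw std::invalid_argument("Hamming distance needs equal-length strings");

    int distance = 0;
    for (std::size_t k = 0; k < a.size(); ++k)
        if (a[k] != b[k]) ++distance;   // one substitution per mismatching position
    return distance;
}
```

Because insertion and deletion are not available here, a single missing letter at the start of a word makes every following position mismatch, which is why the variants that allow insertion and deletion are usually preferred for spelling correction.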
Most C++ examples of minimum edit distance are written for char arrays, that is, ASCII strings.
In this post we adapt the Minimum Edit Distance method to Unicode strings for C++ Builder.
Basically, we use two Unicode strings (source and dest) in this method, and for these two inputs:
• We define T[i][j] as the entry of the edit distance matrix for the first i characters of source and the first j characters of dest
• Here we compare all characters of the source string with all characters of the dest string
• The edit distance between source and dest is
C++:
T[source.Length()][dest.Length()]
which we return as the output of this function.
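Written as a recurrence, the table described above is filled as follows. This is a sketch of the standard unit-cost recurrence that matches the allowed operations of insertion, deletion and substitution; n and m are shorthand introduced here for source.Length() and dest.Length(), and the strings are indexed from 1.

```latex
\begin{aligned}
T[i][0] &= i, \qquad T[0][j] = j, \\
T[i][j] &=
\begin{cases}
T[i-1][j-1], & \text{if } \mathrm{source}[i] = \mathrm{dest}[j], \\
1 + \min\bigl(T[i-1][j],\; T[i][j-1],\; T[i-1][j-1]\bigr), & \text{otherwise},
\end{cases}
\qquad 1 \le i \le n,\ 1 \le j \le m.
\end{aligned}
```

The three terms inside the minimum correspond to deleting a character from source, inserting a character into source, and substituting one character for another.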
Here is the full example of the Minimum Edit Distance method,
C++:
#include <vector>
#include <algorithm>

int MinEditDistance(UnicodeString source, UnicodeString dest)
{
    const int n = source.Length();
    const int m = dest.Length();

    // T[i][j] = edit distance between the first i chars of source
    // and the first j chars of dest.
    // std::vector is used instead of a variable-length array,
    // which is not standard C++.
    std::vector<std::vector<int> > T(n + 1, std::vector<int>(m + 1, 0));

    for (int i = 0; i <= n; i++) T[i][0] = i;   // delete all i chars
    for (int j = 0; j <= m; j++) T[0][j] = j;   // insert all j chars

    for (int i = 1; i <= n; i++)
    {
        for (int j = 1; j <= m; j++)
        {
            // UnicodeString indexing is 1-based here; each element is a UTF-16 code unit
            if (source[i] == dest[j])
            {
                T[i][j] = T[i-1][j-1];          // chars match, no extra cost
            }
            else
            {
                int sol1 = T[i-1][j] + 1;       // deletion
                int sol2 = T[i][j-1] + 1;       // insertion
                int sol3 = T[i-1][j-1] + 1;     // substitution
                T[i][j] = std::min(sol1, std::min(sol2, sol3));
            }
        }
    }
    return T[n][m];
}
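We can use this function as given below,
C++:
int distance;
distance = MinEditDistance("intention", "execution");   // 5
distance = MinEditDistance("intention", "intentoin");   // 2, swapped letters count as two edits
The returned distance tells us how far apart the two words are; if they match, the distance is 0.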
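For readers without C++ Builder, the same table-filling idea can be sketched in standard C++ and applied to the spelling-correction use mentioned in the introduction. In this sketch EditDistance, Suggest, the maxDistance parameter and the word list passed in are all hypothetical names chosen for illustration, and std::u32string (one element per Unicode code point, 0-indexed) stands in for UnicodeString.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the same dynamic-programming table over std::u32string (0-indexed).
int EditDistance(const std::u32string& source, const std::u32string& dest)
{
    const std::size_t n = source.size();
    const std::size_t m = dest.size();
    std::vector<std::vector<int>> T(n + 1, std::vector<int>(m + 1, 0));

    for (std::size_t i = 0; i <= n; ++i) T[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= m; ++j) T[0][j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= n; ++i)
    {
        for (std::size_t j = 1; j <= m; ++j)
        {
            if (source[i - 1] == dest[j - 1])
                T[i][j] = T[i - 1][j - 1];                 // characters match, no cost
            else
                T[i][j] = 1 + std::min({T[i - 1][j],       // deletion
                                        T[i][j - 1],       // insertion
                                        T[i - 1][j - 1]}); // substitution
        }
    }
    return T[n][m];
}

// Hypothetical helper: returns the dictionary words within maxDistance edits of word.
std::vector<std::u32string> Suggest(const std::u32string& word,
                                    const std::vector<std::u32string>& dictionary,
                                    int maxDistance)
{
    std::vector<std::u32string> candidates;
    for (const std::u32string& entry : dictionary)
        if (EditDistance(word, entry) <= maxDistance)
            candidates.push_back(entry);
    return candidates;
}
```

With a dictionary holding U"intention" and U"execution", calling Suggest(U"intentoin", dictionary, 2) keeps only U"intention". Working on code points rather than bytes means a non-ASCII letter such as ç counts as a single character, so replacing it with c costs one edit.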