Advertisement
dipBRUR

Longest Common substring of two string

Aug 21st, 2017
355
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
Ada 2.94 KB | None | 0 0
  1. 7. Longest Common substring of two string :
  2.                                            Given two string S and T. It is required to find their
  3.    longest common substring, that is, a string X, which is a substring of S and T at the same time.
  4.    Complexity Requirement's : O(length(S) + length(T))
  5.    Algorithm : We build the suffix automaton for the string S.
  6.    We follow the string T on the automaton and look for the longest suffix that each prefix has
  7.    appeared in S. In other words, for each position in the string T, we want to find the longest
  8.    common substring for which S and T ends at that position.
  9.    To achieve this, we define two variables
  10.                                 v = current state
  11.                                 l = current length
  12.    These two variables describe the current matching part, its length and state, which string
  13.    (which cannot be determined if the length is not stored, since a state may match multiple
  14.    strings of different lengths).
  15.    Initially, v = To
  16.               l = 0
  17.    Now, we consider the character T[i], we want to find the answer to this position.
  18.         1. If the state v in the automaton has a transfer of the symbol T[i], we can simply take
  19.            the transfer and then add the length l to one.
  20.         2. If the state v does not have the branch, we should try to shorten the current matching
  21.            part, for which we go along the suffix link v = link(v).
  22.            In this case, the current matching length must be reduced, but the remaining part is
  23.            as much as possible many. Obviously, let l = len(v).
  24.            If the newly arrived state still does not have the transfer we want, then we must go along
  25.            the suffix link again and decrease the value of l until we find (That is, the character T[i]
  26.            does not appear in S, so let v = l = 0 and then continue processing the next i).
  27.     The answer to the original question is the maximum that has been reached.
  28.     The progressive complexity of this traversal method is O(length(T)), because in a move we either add
  29.     one, or follow the suffix link several times, each time will be strictly reduced l. Thus, the sum of
  30.     the total reductions cannot exceed length(T), which means linear time complexity.
  31.  
  32.  
  33. int LCS(string s, string t)
  34. {
  35.     init();
  36.  
  37.     int len_s = s.length();
  38.     int len_t = t.length();
  39.  
  40.     for (int i=0; i<len_s; i++)
  41.     {
  42.         sz_extend(s[i]);
  43.     }
  44.  
  45.     int v = 0;
  46.     int l = 0;
  47.     int best = 0;        // LCS length
  48.     int bestPos = 0;     // LCS starting pos in T
  49.  
  50.     for (int i=0; i<len_t; i++)
  51.     {
  52.         while (v && !st[v].next.count(t[i]))
  53.         {
  54.             v = st[v].link;
  55.             l = st[v].len;
  56.         }
  57.  
  58.         if (st[v].next.count(t[i]))
  59.         {
  60.             v = st[v].next[t[i]];
  61.             l++;
  62.         }
  63.  
  64.         if (l > best)
  65.         {
  66.             best = l;
  67.             bestPos = i;
  68.         }
  69.     }
  70.     return t.substr(bestPos-best+1, best);
  71. }
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement