Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- 6. All appear location query :
- Given a text T, the query format is as follows :
- Given the pattern string P, it is required to give all occurences of P in T
- (The interval can intesect).
- Complexity requirements : Pre-process O(length(T)), single pass O(length(P) +
- answer(P)), where answer(P) is the size of the answer collection. i.e requires
- time complexity and input and output rank.
- Algorithm : for the text T to establish a suffix automaton, similar to the prev-
- ious problem, in the process of building automaton for each state to calculate
- the first occurrence of the end of the first pos.
- Suppose we have received an inquiry - the string P. We found the corresponding
- state t.
- Obviously should return firstpos(t). What are the locations? We consider the states
- of the automaton that contain the string P, that is, those whose P is its suffix.
- In other words, we need to find out all the state through the suffix link to the
- state t.
- Therefore, in order to solve this problem, we need to store for each node all the
- suffixes points to it. In order to find the answer, we need to carry out DFS/BFS
- along with these rollover suffix links starting from state t.
- This traversal will end in O (answer (P)) time because we will not access the same
- state twice (since each suffix link points to only one point, so there are no two
- paths to the same state).
- However, the firstpos value of the two states may be the same if a state is copied
- from another copy. But this does not affect progressive complexity because each
- non-replicated node will only have one copy.
- In addition, you can easily remove those repetitive locations if we do not consider
- those copies of the state of the firstpos. In fact, all copies of the state are the
- "parent" state of the suffix link point. So, we record the tag is_clon for each node,
- and we do not consider the firstpos that is the state of is_clon = true. So that we
- get the answer (P) non-repetitive state.
- Give an offline version of the implementation:
- struct state {
- ....
- bool is_clon;
- int first_pos;
- vector <int> inv_link;
- };
- .... suffix automato build
- for (int v=1; v<sz; v++)
- st[st[v].link].inv_link.push_back(v);
- ...ans : return all occurrences (may be overlap)
- void output_all_occurences(int v, int P_length)
- {
- if (!st[v].is_clon)
- cout << st[v].first_pos - P_length + 1 << endl;
- for (size_t i=0; i<st[v].inv_link.size(); i++)
- output_all_occurences(st[v[.inv_link[i], P_length);
- }
- 8. The query is not the shortest string that appears in the text :
- Problem : Given string S and alphabet. Ask to find a shortest string, so it is not a
- string of S.
- Complexity requirements. O (length (s)).
- Algorithm : Dynamic programming is performed on the suffix automaton of the string S.
- Let d [v] be the answer to node v, that is, we have entered a part of the string,
- matching v, and we want to find the minimum number of characters to be added to reach
- a nonexistent transfer.
- Calculating d [v] is very simple. If a transfer does not exist at v, then d [v] = 1: use
- this character to "jump out" the automaton to get the string.
- Otherwise, a string can not meet the requirements, so we have to take the smallest of all
- the characters in the answer:
- d[v] = 1 + min(d[w])
- The answer to the original question is equal to d [t_0], and the desired string can be
- obtained by the method of recording the traversing path.
Add Comment
Please, Sign In to add comment