#!/bin/sed -f
# Removes duplicate fields in a | separated file
# e.g. foo|bar|foo|quz|bar
# becomes foo|bar|quz
: restart
# The s instruction needs some explanation.
# The regular expression consists of the following parts:
# \1: \(^\||\)
#   Beginning of line or the separator that ends the previous field
#   Note that we use | as field separator
# \2: \([^|]\+\)
#   A complete, non-empty field following \1
#   We can use the \+ extension because \1 and \5 already need the \| extension
# \3: \(|.*\)\?
#   Everything between \2 and \4
#   It is either empty or starts with a field separator, so \2 cannot match
#   only the prefix of a longer field (otherwise foo|bar|fo would lose "fo")
# \4: \(|\2\)
#   A field separator followed by a field identical to \2
# \5: \(|\|$\)
#   Field separator closing \4 or end of line
#
# The replacement \1\2\3\5 excludes \4, so the duplicated field is removed
s/\(^\||\)\([^|]\+\)\(|.*\)\?\(|\2\)\(|\|$\)/\1\2\3\5/
# Loop as long as the s instruction matches, until all duplicates are gone
# s///g does not work in this case as matches may overlap
t restart
# Repeated empty fields have to be handled separately
# The regex matches || or | followed by end of line
# The replacement is a single | unless we matched the end of line;
# in that case it is the empty string matched by $
#
# The suffix 2g is a GNU extension and replaces all but the first match
# For non-GNU sed, it may be replaced with another loop
s/|\(|\|$\)/\1/2g
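
A minimal usage sketch, assuming GNU sed is installed as `sed`. The function name `dedupe_fields` is hypothetical and not part of the script; it inlines the same loop as `-e` expressions so the example runs without a script file. The middle group is written as `\(|.*\)\?` here so that only whole fields are compared.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration): feeds one line
# through the restart loop and the empty-field cleanup.
# Requires GNU sed for \|, \+, \? and the 2g flag.
dedupe_fields() {
    printf '%s\n' "$1" | sed \
        -e ':restart' \
        -e 's/\(^\||\)\([^|]\+\)\(|.*\)\?\(|\2\)\(|\|$\)/\1\2\3\5/' \
        -e 't restart' \
        -e 's/|\(|\|$\)/\1/2g'
}

dedupe_fields 'foo|bar|foo|quz|bar'   # prints foo|bar|quz
dedupe_fields 'foo|bar|fo'            # prints foo|bar|fo ("fo" is not a duplicate of "foo")
dedupe_fields 'a||b||c'               # prints a||b|c (first empty field is kept)
```

The loop removes one duplicate per pass, so a line with n duplicated fields takes n passes before `t restart` falls through to the empty-field cleanup.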