- report zz_patterns_with_backtracking.
- * An ABAP implementation of a regex-like search strategy
- * with support for backtracking.
- * The basic interface is lif_pattern, with a single method
- * match( ). It returns the length of the matching prefix of the
- * given string, or raises lcx_invalid if the string does not match.
- * lcl_pattern_charset implements patterns of the form [...]{n,m}:
- * a run of characters taken from a given set, n <= length <= m.
- * lcl_pattern_sequence combines several lif_pattern instances
- * and requires them to match one after another.
- * This is where backtracking is implemented.
- * lcl_matcher is the object that applies a pattern to a given
- * string and searches for a single hit or for all hits.
- * See the SCN discussion http://scn.sap.com/message/13690938#13690938 for the
- * motivation.
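- *
- * Usage sketch (an illustration only, not part of the original
- * program): searching a single string for its first hit with
- * match_one( ). The variable names are hypothetical; the classes
- * and methods are the ones defined in this report.
- *   data: lo_matcher type ref to lcl_matcher,
- *         ls_match   type ty_match_result.
- *   lo_matcher = lcl_matcher=>create(
- *     lcl_pattern_charset=>word( iv_min = 3 ) ).
- *   try.
- *       lo_matcher->match_one(
- *         exporting iv_text  = `At VBAP-MATNR accusam`
- *         importing es_match = ls_match ).
- *       write: / ls_match-offset, ls_match-length, ls_match-text.
- *     catch lcx_invalid.
- *       write: / 'No match'.
- *   endtry.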
- types: begin of ty_match_result,
- line type i,
- offset type i,
- length type i,
- text type string,
- end of ty_match_result,
- ty_match_result_tab type standard table of ty_match_result.
- * Raised when a pattern does not match (stop condition for matching)
- class lcx_invalid definition inheriting from cx_static_check.
- endclass. "lcx_invalid DEFINITION
- *----------------------------------------------------------------------*
- * INTERFACE lif_pattern DEFINITION
- *----------------------------------------------------------------------*
- * An "atom" of a pattern
- *----------------------------------------------------------------------*
- interface lif_pattern.
- methods match
- importing
- iv_text type csequence
- iv_max type i default -1 " upper length limit for backtracking, -1 = none
- returning value(ev_length) type i
- raising lcx_invalid.
- endinterface. "lif_pattern DEFINITION
- *----------------------------------------------------------------------*
- * CLASS lcl_matcher DEFINITION
- *----------------------------------------------------------------------*
- class lcl_matcher definition create private.
- public section.
- class-methods: create
- importing io_pattern type ref to lif_pattern
- returning value(eo_matcher) type ref to lcl_matcher.
- methods: match_one importing iv_text type csequence
- exporting es_match type ty_match_result
- raising lcx_invalid,
- match_all importing iv_text type csequence
- exporting et_matches type ty_match_result_tab,
- match_all_in_table
- importing it_text type table
- exporting et_matches type ty_match_result_tab.
- private section.
- data: go_pattern type ref to lif_pattern.
- endclass. "lcl_matcher DEFINITION
- *----------------------------------------------------------------------*
- * CLASS lcl_pattern_sequence DEFINITION
- *----------------------------------------------------------------------*
- class lcl_pattern_sequence definition create private.
- public section.
- interfaces lif_pattern.
- class-methods:
- create importing io_pattern type ref to lif_pattern
- returning value(eo_sequence) type ref to lcl_pattern_sequence.
- methods:
- add importing io_pattern type ref to lif_pattern
- returning value(eo_sequence) type ref to lcl_pattern_sequence.
- private section.
- data: go_left type ref to lif_pattern,
- go_right type ref to lif_pattern.
- endclass. "lcl_pattern_sequence DEFINITION
- *----------------------------------------------------------------------*
- * CLASS lcl_pattern_charset DEFINITION
- *----------------------------------------------------------------------*
- class lcl_pattern_charset definition
- create private.
- public section.
- interfaces lif_pattern.
- class-methods:
- create
- importing
- iv_min type i default 1
- iv_max type i default -1
- iv_charset type csequence
- returning value(eo_pattern) type ref to lif_pattern,
- word
- importing
- iv_min type i default 1
- iv_max type i default -1
- returning value(eo_pattern) type ref to lif_pattern,
- alpha
- importing
- iv_min type i default 1
- iv_max type i default -1
- returning value(eo_pattern) type ref to lif_pattern,
- char
- importing
- iv_min type i default 1
- iv_max type i default 1
- iv_char type csequence
- returning value(eo_pattern) type ref to lif_pattern.
- private section.
- data:
- gv_min type i,
- gv_max type i,
- gv_charset type string.
- endclass. "lcl_pattern_charset DEFINITION
- start-of-selection.
- perform start.
- * ---
- form start.
- data: lt_text type tttext255,
- lo_matcher type ref to lcl_matcher,
- lo_pattern type ref to lcl_pattern_sequence,
- lt_matches type ty_match_result_tab.
- field-symbols: <ls_match> type ty_match_result.
- * Make a test table
- perform make_test_text changing lt_text.
- lo_pattern = lcl_pattern_sequence=>create(
- lcl_pattern_charset=>alpha( iv_max = 1 ) ).
- lo_pattern->add(
- lcl_pattern_charset=>word( iv_min = 3 iv_max = 13 ) )->add(
- lcl_pattern_charset=>char( '-' ) )->add(
- lcl_pattern_charset=>word( iv_min = 1 iv_max = 30 ) ).
- lo_matcher = lcl_matcher=>create( lo_pattern ).
- lo_matcher->match_all_in_table(
- exporting it_text = lt_text
- importing et_matches = lt_matches ).
- * Print the found texts
- loop at lt_matches assigning <ls_match>.
- write: / <ls_match>-text.
- endloop.
- endform. "start
- *----------------------------------------------------------------------*
- * CLASS lcl_matcher IMPLEMENTATION
- *----------------------------------------------------------------------*
- class lcl_matcher implementation.
- method create.
- create object eo_matcher.
- eo_matcher->go_pattern = io_pattern.
- endmethod. "create
- method match_one.
- data: lv_pos type i,
- lv_strlen type i.
- lv_strlen = strlen( iv_text ).
- while lv_pos < lv_strlen.
- try.
- es_match-offset = lv_pos.
- es_match-length = go_pattern->match( iv_text+lv_pos(*) ).
- es_match-text = iv_text+lv_pos(es_match-length).
- return.
- catch lcx_invalid.
- add 1 to lv_pos.
- endtry.
- endwhile.
- raise exception type lcx_invalid.
- endmethod. "match_one
- method match_all.
- data: ls_match type ty_match_result,
- lv_pos type i,
- lv_strlen type i.
- lv_strlen = strlen( iv_text ).
- clear et_matches.
- while lv_pos < lv_strlen.
- try.
- call method match_one
- exporting
- iv_text = iv_text+lv_pos(*)
- importing
- es_match = ls_match.
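- * match_one reports the offset relative to the substring; make it absolute
- ls_match-offset = ls_match-offset + lv_pos.
- append ls_match to et_matches.
- lv_pos = ls_match-offset + ls_match-length.
- if ls_match-length is initial.
- return. " avoid an endless loop on zero-length matches
- endif.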
- catch lcx_invalid.
- return.
- endtry.
- endwhile.
- endmethod. "match_all
- method match_all_in_table.
- data: lt_matches type ty_match_result_tab,
- lv_line type i.
- field-symbols: <lv_text> type csequence,
- <ls_match> type ty_match_result.
- loop at it_text assigning <lv_text>.
- lv_line = sy-tabix.
- call method match_all
- exporting
- iv_text = <lv_text>
- importing
- et_matches = lt_matches.
- loop at lt_matches assigning <ls_match>.
- <ls_match>-line = lv_line.
- append <ls_match> to et_matches.
- endloop.
- endloop.
- endmethod. "match_all_in_table
- endclass. " lcl_matcher
- *----------------------------------------------------------------------*
- * CLASS lcl_pattern_charset IMPLEMENTATION
- *----------------------------------------------------------------------*
- class lcl_pattern_charset implementation.
- method lif_pattern~match.
- data: lv_strlen type i,
- lv_pos type i,
- lv_matches type flag,
- lv_max type i.
- if iv_max > -1.
- lv_max = iv_max.
- else.
- lv_max = gv_max.
- endif.
- lv_strlen = strlen( iv_text ).
- lv_pos = gv_min.
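- * Greedy match: lv_pos is the length of the candidate prefix, so it
- * may grow up to and including the full length of the text
- while ( lv_pos <= lv_max or lv_max = -1 ) and lv_pos <= lv_strlen.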
- if iv_text(lv_pos) cn gv_charset.
- exit.
- endif.
- lv_matches = abap_true.
- add 1 to lv_pos.
- endwhile.
- if lv_matches eq abap_true.
- ev_length = lv_pos - 1.
- else.
- raise exception type lcx_invalid.
- endif.
- endmethod. "lif_pattern~match
- method create.
- data: lo_pattern type ref to lcl_pattern_charset.
- create object lo_pattern.
- lo_pattern->gv_min = iv_min.
- lo_pattern->gv_max = iv_max.
- lo_pattern->gv_charset = iv_charset.
- eo_pattern = lo_pattern.
- endmethod. "create
- method word.
- eo_pattern = create( iv_min = iv_min
- iv_max = iv_max
- iv_charset =
- `_0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ` ).
- endmethod. "word
- method char.
- eo_pattern = create( iv_min = iv_min
- iv_max = iv_max
- iv_charset = iv_char ).
- endmethod. "char
- method alpha.
- eo_pattern = create( iv_min = iv_min
- iv_max = iv_max
- iv_charset =
- `abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ` ).
- endmethod. "alpha
- endclass. " lcl_pattern_charset
- *----------------------------------------------------------------------*
- * CLASS lcl_pattern_sequence IMPLEMENTATION
- *----------------------------------------------------------------------*
- class lcl_pattern_sequence implementation.
- method create.
- create object eo_sequence.
- eo_sequence->go_left = io_pattern.
- endmethod. "create
- method add.
- go_right = eo_sequence = create( io_pattern ).
- endmethod. "add
- method lif_pattern~match.
- data: lv_pos type i,
- lv_max type i value -1.
- while lv_pos >= 0.
- lv_pos = go_left->match( iv_text = iv_text iv_max = lv_max ).
- if go_right is bound.
- try.
- lv_pos = lv_pos + go_right->match( iv_text+lv_pos(*) ).
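- catch lcx_invalid.
- if lv_pos = 0.
- * Nothing left to release; lv_max = -1 would mean "no limit" and loop forever
- raise exception type lcx_invalid.
- endif.
- lv_max = lv_pos - 1. " Release 1 char (backtracking)
- continue.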
- endtry.
- endif.
- ev_length = lv_pos.
- return.
- endwhile.
- endmethod. "lif_pattern~match
- endclass. "lcl_pattern_sequence IMPLEMENTATION
- * Make a test text
- form make_test_text changing ct_text type tttext255.
- define _append.
- append &1 to ct_text.
- end-of-definition.
- _append :
- `Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy`,
- `eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam`,
- `voluptua. At VBAP-MATNR accusam et justo duo dolores et ea rebum. Stet `,
- `clita kasd gubergren, no sea sanctus est Lorem ipsum dolor sit`,
- `amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam `,
- `nonumy eirmod tempor in EKPO-PSTYP labore et dolore magna aliquyam erat,s`,
- `ed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum`.
- endform. "make_test_text