mayankjoin3

Regex

Apr 13th, 2025
  1. Set 1: Basics – Literal Matches and Simple Character Classes
  2. Focus: Matching exact strings and basic patterns
  3. Match the word "India"
  4.  
  5. Regex: India
  6.  
  7. Explanation: Matches the literal string "India" anywhere in the text, including inside longer words such as "Indian".
  8.  
  9. Match either "Delhi" or "Mumbai"
  10.  
  11. Regex: Delhi|Mumbai
  12.  
  13. Explanation: Matches either of the two city names.
  14.  
  15. Match any single digit (e.g., part of a pincode)
  16.  
  17. Regex: [0-9]
  18.  
  19. Explanation: Matches any single digit from 0 to 9.
  20.  
  21. Match all vowels (useful for analyzing names)
  22.  
  23. Regex: [aeiouAEIOU]
  24.  
  25. Explanation: Matches any vowel (both lowercase and uppercase).
  26.  
  27. Match the word "Namaste" at the start of a line
  28.  
  29. Regex: ^Namaste
  30.  
  31. Explanation: ^ anchors the pattern to the beginning of the string (or of each line, when multiline mode is on).
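To try the Set 1 patterns, here is a minimal sketch using Python's built-in re module. The sample sentence and the variable names are made up for illustration.

```python
import re

# Made-up sample sentence for illustration.
text = "Namaste from India. Delhi and Mumbai are big cities; pincode 110001."

# Literal match: finds the substring "India" wherever it occurs.
india = re.findall(r"India", text)

# Alternation: either city name, in order of appearance.
cities = re.findall(r"Delhi|Mumbai", text)

# Character class: every digit comes back as its own match.
digits = re.findall(r"[0-9]", text)

# Vowel class applied to a name.
vowels = re.findall(r"[aeiouAEIOU]", "Aishwarya")

# ^ only succeeds when the text begins with "Namaste".
starts = bool(re.search(r"^Namaste", text))
not_start = bool(re.search(r"^Namaste", "He said Namaste"))

print(india, cities, digits, vowels, starts, not_start)
```

Note that [0-9] returns six separate one-character matches for the pincode, which is why quantifiers are needed to match the number as a whole.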
  32.  
  33. Set 2: Quantifiers and Common Formats
  34. Focus: Using +, *, {n}, ?
  35. Match a sequence of 6 digits (Indian pincode)
  36.  
  37. Regex: \b\d{6}\b
  38.  
  39. Explanation: Matches numbers of exactly 6 digits; the \b word boundaries on both sides keep it from matching part of a longer number.
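A quick way to see what the word boundaries do, sketched in Python with a made-up sentence (the name PINCODE is only for illustration):

```python
import re

PINCODE = re.compile(r"\b\d{6}\b")

# Made-up text containing a 6-digit, a 4-digit, and a 7-digit number.
text = "Send it to 560001, not 5600 or 5600012."

# Only the standalone 6-digit number is returned.
pincodes = PINCODE.findall(text)
print(pincodes)
```

Without the \b on each side, the first six digits of the 7-digit number would also match.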
  40.  
  41. Match a word like “Ram”, “Rama”, “Ramaa”, etc.
  42.  
  43. Regex: Rama*
  44.  
  45. Explanation: Matches "Ram", "Rama", "Ramaa", etc. (a* = 0 or more a's)
  46.  
  47. Match names starting with 'A' followed by any 2 letters
  48.  
  49. Regex: A[a-zA-Z]{2}
  50.  
  51. Explanation: Matches strings like “Ami”, “Aru”, etc.
  52.  
  53. Match a string that ends with “ji” (e.g., Gandhi ji)
  54.  
  55. Regex: ji$
  56.  
  57. Explanation: $ anchors to the end of the string.
  58.  
  59. Match any word containing at least one vowel
  60.  
  61. Regex: \b\w*[aeiouAEIOU]\w*\b
  62.  
  63. Explanation: [aeiouAEIOU] requires one vowel, and the \w* on either side matches the rest of the word; \b keeps the match to whole words.
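The Set 2 quantifier and anchor patterns can be checked the same way. This is a small Python sketch, and all test strings are invented:

```python
import re

# a* allows zero or more trailing a's; "Rm" has no "Ra" so it is skipped.
ram_forms = re.findall(r"Rama*", "Ram Rama Ramaa Rm")

# 'A' plus two letters. The pattern is unanchored, so it also
# matches the first three letters of "Amit".
a_names = re.findall(r"A[a-zA-Z]{2}", "Ami Aru A1b Amit")

# $ requires "ji" to be the last thing in the string.
ends_ji = bool(re.search(r"ji$", "Gandhi ji"))
not_ends_ji = bool(re.search(r"ji$", "ji ne kaha"))

# Words with at least one vowel; "sky" and "rhythm" have none.
vowel_words = re.findall(r"\b\w*[aeiouAEIOU]\w*\b", "sky rhythm chai India")

print(ram_forms, a_names, ends_ji, not_ends_ji, vowel_words)
```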
  64.  
  65. Would you like me to continue with the next 5 examples? The next set will start exploring groups, more complex alternation, and Indian-specific formats like phone numbers and vehicle registration numbers.
  66.  
  67. You said:
  68. Next 5
  69. ChatGPT said:
  70. Great! Here's the next batch.
  71.  
  72. Set 3: Grouping, Alternation, and Indian-Specific Formats
  73. Focus: Using parentheses (), grouping with |, and applying to real-life Indian formats
  74. Match either “Kolkata” or “Chennai” or “Bengaluru”
  75.  
  76. Regex: (Kolkata|Chennai|Bengaluru)
  77.  
  78. Explanation: Grouping with | lets you match any one of these cities.
  79.  
  80. Match Indian mobile numbers starting with 7, 8, or 9
  81.  
  82. Regex: [789]\d{9}
  83.  
  84. Explanation: Matches a 10-digit mobile number starting with 7, 8, or 9.
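When scanning free text, the pattern above can also match the first 10 digits of a longer number. One possible guard, shown here as an assumption rather than part of the original pattern, is to wrap it in \b word boundaries. The phone numbers are invented.

```python
import re

# Original pattern plus \b on both sides (the \b is an added assumption).
MOBILE = re.compile(r"\b[789]\d{9}\b")

text = "Call 9876543210 or 1234567890; ignore 98765432101."

# Only the 10-digit number starting with 9 is returned; the one starting
# with 1 and the 11-digit one are skipped.
mobiles = MOBILE.findall(text)
print(mobiles)
```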
  85.  
  86. Match vehicle registration numbers like KA01AB1234
  87.  
  88. Regex: [A-Z]{2}\d{2}[A-Z]{2}\d{4}
  89.  
  90. Explanation: Matches Indian vehicle plates (e.g., KA01AB1234, DL09CD6789).
  91.  
  92. Match PAN card format (e.g., ABCDE1234F)
  93.  
  94. Regex: [A-Z]{5}[0-9]{4}[A-Z]
  95.  
  96. Explanation: Matches PAN structure – 5 uppercase letters, 4 digits, 1 uppercase letter.
  97.  
  98. Match Aadhaar numbers (12-digit format, optional spaces)
  99.  
  100. Regex: \d{4} ?\d{4} ?\d{4}
  101.  
  102. Explanation: Matches Aadhaar numbers with or without spaces (e.g., 123456789012 or 1234 5678 9012).
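For validating a single field (rather than searching inside text), re.fullmatch requires the whole string to fit the pattern. Below is a minimal sketch; the classify helper and the PATTERNS table are hypothetical names, and the sample values are made up.

```python
import re

# The three ID patterns from this set, keyed by a label.
PATTERNS = {
    "vehicle": re.compile(r"[A-Z]{2}\d{2}[A-Z]{2}\d{4}"),
    "pan": re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]"),
    "aadhaar": re.compile(r"\d{4} ?\d{4} ?\d{4}"),
}

def classify(value):
    """Return the labels of all patterns that match the whole string."""
    return [name for name, pat in PATTERNS.items() if pat.fullmatch(value)]

print(classify("KA01AB1234"))
print(classify("ABCDE1234F"))
print(classify("1234 5678 9012"))
print(classify("ka01ab1234"))  # lowercase input matches nothing
```

These patterns check only the shape of the value. They do not confirm that a number was actually issued.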
  103.  
  104. Would you like the next 5 examples to focus on email validation, name patterns, and Indian website formats? Or shall we go deeper into advanced regex concepts like lookaheads, backreferences, etc.?
  105.  
  106. You said:
  107. Next 5
  108. ChatGPT said:
  109. Perfect! Here's the next set.
  110.  
  111. Set 4: Indian Names, Emails, and Web Patterns
  112. Focus: Character classes, optional elements, and realistic Indian data
  113. Match typical Indian names starting with capital (e.g., “Amit”, “Suresh”, “Priya”)
  114.  
  115. Regex: [A-Z][a-z]{2,}
  116.  
  117. Explanation: Matches a capital letter followed by at least two lowercase letters – good for basic name detection.
  118.  
  119. Match common Indian email addresses ending in .com or .in
  120.  
  121. Regex: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.(com|in)
  122.  
  123. Explanation: Matches Gmail, Yahoo, etc. ending in .com or .in – commonly used domains in India.
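One practical detail in Python: re.findall returns only the captured group when a pattern contains one, so with (com|in) it would return just "com" or "in". The sketch below swaps in a non-capturing group (?:...) to get whole addresses back. That swap is an adjustment for the demo, and the addresses are invented.

```python
import re

# Same pattern, with a non-capturing group so findall returns full matches.
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.(?:com|in)")

text = ("Write to ravi.kumar@example.com or priya@example.co.in, "
        "not to admin@example.org.")

emails = EMAIL.findall(text)
print(emails)
```

The .org address is skipped because the pattern only allows .com or .in as the final label.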
  124.  
  125. Match Indian website URLs like www.irctc.co.in
  126.  
  127. Regex: www\.[a-zA-Z0-9\-]+\.(com|in|co\.in)
  128.  
  129. Explanation: Matches websites ending in common Indian domains like .in and .co.in.
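A short Python check of the URL pattern, again using a non-capturing group so that re.findall returns the full match. The sample sentence is made up for illustration.

```python
import re

URL = re.compile(r"www\.[a-zA-Z0-9\-]+\.(?:com|in|co\.in)")

text = "Book on www.irctc.co.in or visit www.example.com and www.india.gov.in"

# The pattern allows exactly one label between "www." and the suffix,
# so a host like www.india.gov.in is not matched.
urls = URL.findall(text)
print(urls)
```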
  130.  
  131. Match names with a title (e.g., Mr. Sharma, Mrs. Rani)
  132.  
  133. Regex: (Mr|Mrs|Ms)\.?\s[A-Z][a-z]+
  134.  
  135. Explanation: Matches a title (Mr, Mrs, or Ms) with an optional period, then one whitespace character and a capitalized surname.
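Because the title is inside a capturing group, it can be pulled out separately from the full match. Here is a small sketch with re.finditer; the sentence is invented.

```python
import re

TITLE = re.compile(r"(Mr|Mrs|Ms)\.?\s[A-Z][a-z]+")

text = "Mr. Sharma met Mrs Rani and Ms. Iyer, but not Dr. Rao."

# group(0) is the whole match, group(1) is just the title.
found = [(m.group(0), m.group(1)) for m in TITLE.finditer(text)]
print(found)
```

"Dr. Rao" is skipped because Dr is not one of the listed titles.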
  136.  
  137. Match hashtags used in Indian context (e.g., #India, #IPL2025)
  138.  
  139. Regex: #\w+
  140.  
  141. Explanation: Matches hashtags: a # followed by one or more letters, digits, or underscores.
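A quick Python check with a made-up post:

```python
import re

text = "Watching #IPL2025 in #India! #go_team wins #"

# \w+ needs at least one character, so the lone "#" at the end is ignored,
# and the "!" after #India is not part of the tag.
tags = re.findall(r"#\w+", text)
print(tags)
```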
  142.  
  143. Let me know if you'd like to continue to the next 5. We can start getting into lookaheads, backreferences, or custom validations like matching date formats (e.g., 13/04/2025), or GSTIN numbers.
  144.  
  145. You said:
  146. Next
  147. ChatGPT said:
  148. Awesome, let’s keep it rolling!
  149.  
  150. Set 5: Dates, Times, and Advanced Indian Format Validation
  151. Focus: Date/time patterns, character escapes, and real-world validation
  152. Match Indian date format DD/MM/YYYY (e.g., 13/04/2025)
  153.  
  154. Regex: \b\d{2}/\d{2}/\d{4}\b
  155.  
  156. Explanation: Matches dates in day/month/year format with slashes. It checks only the shape (two digits, two digits, four digits), so 99/99/2025 also matches.
  157.  
  158. Match 24-hour time format (e.g., 14:35, 09:00)
  159.  
  160. Regex: \b([01]\d|2[0-3]):[0-5]\d\b
  161.  
  162. Explanation: Validates 00:00 to 23:59 format.
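The time pattern restricts the digit ranges, but the date pattern does not. One possible way to reject impossible dates, sketched here as an assumption rather than as part of the original answer, is to let the regex check the shape and then let datetime.strptime check the calendar. The helper name is_real_date is hypothetical.

```python
import re
from datetime import datetime

DATE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")
TIME = re.compile(r"\b([01]\d|2[0-3]):[0-5]\d\b")

def is_real_date(s):
    """True if s has the DD/MM/YYYY shape and is a real calendar date."""
    if not DATE.fullmatch(s):
        return False
    try:
        datetime.strptime(s, "%d/%m/%Y")
        return True
    except ValueError:
        # Right shape, but not a valid day/month combination.
        return False

print(is_real_date("13/04/2025"))     # valid date
print(is_real_date("31/02/2025"))     # February has no 31st
print(bool(TIME.fullmatch("14:35")))  # valid time
print(bool(TIME.fullmatch("24:00")))  # hour out of range
```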
  163.  
  164. Match Indian currency values (e.g., ₹500, ₹12,34,567.89)
  165.  
  166. Regex: ₹[0-9,]+(\.\d{2})?
  167.  
  168. Explanation: Matches ₹ followed by Indian-style numerals with optional paise.
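Here is a small Python sketch of the currency pattern, assuming the currency symbol is the rupee sign ₹ (U+20B9). The sentence and amounts are made up.

```python
import re

# Assumes the leading symbol is the rupee sign.
RUPEES = re.compile(r"₹[0-9,]+(\.\d{2})?")

text = "Tea costs ₹500 and the car costs ₹12,34,567.89."

# finditer + group(0) returns the full amounts, not just the paise group.
amounts = [m.group(0) for m in RUPEES.finditer(text)]
print(amounts)

# The character class is loose: it does not check where the commas fall.
loose = bool(RUPEES.fullmatch("₹1,,2"))
print(loose)
```

The pattern is fine for spotting amounts in text, but it is not a strict check of Indian digit grouping.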
  169.  
  170. Match Indian GSTIN (Goods and Services Tax Identification Number)
  171.  
  172. Regex: \d{2}[A-Z]{5}\d{4}[A-Z]{1}[A-Z\d]{1}[Z]{1}[A-Z\d]{1}
  173.  
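  174. Explanation: Matches the 15-character GSTIN format: a 2-digit state code, the 10-character PAN, one alphanumeric entity code, the letter Z, and one alphanumeric check character.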
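The same GSTIN pattern can be written with named groups so the parts are easy to pull out. The group names, the parse_gstin helper, and the sample value below are all invented for illustration, and the check character is not actually verified here.

```python
import re

# Accepts the same strings as the pattern above, split into named parts.
GSTIN = re.compile(
    r"(?P<state>\d{2})"
    r"(?P<pan>[A-Z]{5}\d{4}[A-Z])"
    r"(?P<entity>[A-Z\d])"
    r"Z"
    r"(?P<check>[A-Z\d])"
)

def parse_gstin(value):
    """Return the named parts of a GSTIN-shaped string, or None."""
    m = GSTIN.fullmatch(value)
    return m.groupdict() if m else None

sample = "27ABCDE1234F1Z5"  # made-up value with the right shape
parsed = parse_gstin(sample)
print(parsed)
print(parse_gstin("27ABCDE1234F1X5"))  # 14th character is not Z
```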
  175.  
  176. Match Indian STD codes (e.g., 011, 022, 080)
  177.  
  178. Regex: 0\d{2,3}
  179.  
  180. Explanation: Matches STD codes starting with 0 followed by 2 or 3 more digits.
  181.  
  182. We're halfway through!
  183. Ready for the next 5? Shall we continue with lookaheads, validations, or regional language elements (like Hindi text in Unicode)?
  184.  
  185. You said:
  186. Next
  187. ChatGPT said:
  188. Love the momentum! Here's the next batch.
  189.  
  190. Set 6: Lookaheads, Word Boundaries, and Unicode (Indian Language Basics)
  191. Focus: Lookaheads, word boundaries, and basic Hindi character matching
  192. Match a word that starts with "Bharat" but doesn’t end with "Mata"
  193.  
  194. Regex: \bBharat(?!\w*Mata\b)\w*\b
  195.  
  196. Explanation: The negative lookahead rejects the word if its remaining letters end in "Mata"; otherwise \w* consumes the rest of the word.
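A negative lookahead can look at just the current word or at the whole rest of the line, and the two behave differently. The sketch below compares both forms; the names same_word and line_wide and the test strings are made up for illustration.

```python
import re

# Looks only at the rest of the current word.
same_word = re.compile(r"\bBharat(?!\w*Mata\b)\w*\b")

# Looks at everything after "Bharat" on the line.
line_wide = re.compile(r"\bBharat(?!.*Mata)\b")

# "BharatMata" is rejected; the other two words are kept.
words = same_word.findall("Bharat Bharatiya BharatMata")
print(words)

# Any later "Mata" on the line blocks the line-wide form.
blocked = line_wide.search("Bharat Mata ki jai")
allowed = line_wide.search("Jai Bharat")
print(blocked, allowed.group(0))
```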
  197.  
  198. Match "India" only if it is followed by " is great"
  199.  
  200. Regex: India(?= is great)
  201.  
  202. Explanation: Positive lookahead matches "India" only when " is great" follows; the lookahead text itself is not part of the match.
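A lookahead checks what follows without consuming it. This short Python sketch shows the effect on the match span and on a substitution; the sentences are invented.

```python
import re

pattern = re.compile(r"India(?= is great)")

m = pattern.search("India is great")
matched_text = m.group(0)   # only "India", not the lookahead text
span = m.span()             # covers 5 characters

no_match = pattern.search("India is big")

# Only the "India" that is followed by " is great" is replaced.
replaced = pattern.sub("Bharat", "India is great. India is big.")
print(matched_text, span, no_match, replaced)
```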
  203.  
  204. Match exact word "chai" as a standalone word
  205.  
  206. Regex: \bchai\b
  207.  
  208. Explanation: Word boundaries (\b) ensure it doesn’t match "chaibreak" or "masalachai".
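A one-line check of the word boundaries in Python, using a made-up sentence:

```python
import re

text = "I like chai, not chaibreak or masalachai."

# Only the standalone word is matched; the comma counts as a boundary.
standalone = re.findall(r"\bchai\b", text)

# Without \b, the substring inside the longer words matches too.
anywhere = re.findall(r"chai", text)
print(standalone, len(anywhere))
```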
  209.  
  210. Match Hindi characters (Devanagari script)
  211.  
  212. Regex: [\u0900-\u097F]+
  213.  
  214. Explanation: Matches one or more characters in the Devanagari Unicode range.
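Python's re module understands \uXXXX escapes inside patterns, so the range can be used as written. This is a minimal sketch, and the mixed-script sample string is made up.

```python
import re

DEVANAGARI = re.compile(r"[\u0900-\u097F]+")

text = "Hello नमस्ते India भारत"

# Each run of consecutive Devanagari characters is one match;
# the Latin words and spaces are skipped.
runs = DEVANAGARI.findall(text)
print(runs)
```

The range covers the vowel signs and the virama as well as the base letters, which is why a word like नमस्ते comes back as a single run.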
  215.  
  216. Match sentences ending with "hai" (common in spoken Hindi)
  217.  
  218. Regex: \b\w+\s+hai\.?$
  219.  
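  220. Explanation: Matches the last word before "hai" plus "hai" itself at the end of a line, optionally followed by a period. For text with several lines, multiline mode is needed so that $ matches at the end of each line.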
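In Python, $ matches only at the end of the whole string by default, so searching several lines at once needs the re.MULTILINE flag. Here is a small sketch with made-up romanized Hindi lines:

```python
import re

text = "Yeh chai hai.\nKya baat hai\nHai na?"
pattern = r"\b\w+\s+hai\.?$"

# With MULTILINE, $ matches at the end of every line.
per_line = re.findall(pattern, text, flags=re.MULTILINE)

# Without it, only the very end of the text counts, and that line
# does not end with "hai".
whole_text = re.findall(pattern, text)
print(per_line, whole_text)
```

The pattern is case-sensitive, so the capitalized "Hai" on the last line is not matched.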