Python253

ipp7_1_count_digrams

Jun 2nd, 2024
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Filename: ipp7_1_count_digrams.py
# Version: 1.0.0
# Author: Jeoi Reqi

"""
Description:
    - This script demonstrates "Chapter 3: Practice Project #1: Finding Digrams: Count Digrams" from the book "Impractical Python Projects" by Lee Vaughan.
    - Generates the adjacent letter pairs (digrams) in the name 'Voldemort' and counts how often each occurs in an English dictionary file.
    - Requires a dictionary.txt file (one English word per line) in the current working directory.

Requirements:
    - Python 3.x
    - The following standard-library modules:
        - sys
        - collections

Functions:
    - load(file):
        Open a text file and return its lines as a list of lowercase strings.
    - main():
        Generate the digrams in 'Voldemort' and count their occurrences in the dictionary.

Usage:
    - Run the script directly in a Python 3.x environment:

            $ python ipp7_1_count_digrams.py

Additional Notes:
    - The set of digrams is unique: a pair that appears more than once in the name is only counted once.
    - Counts are totals over every word in dictionary.txt, so a word containing a digram twice contributes 2.
"""

import sys
from collections import defaultdict


def load(file):
    """
    Open a text file and turn its contents into a list of lowercase strings.

    Arguments:
        file (str): The name of the text file to open.

    Returns:
        list: A list of lowercase strings, one per line of the file.
    """
    try:
        with open(file, encoding='utf-8') as in_file:
            loaded_txt = in_file.read().strip().split('\n')
            loaded_txt = [x.lower() for x in loaded_txt]
            return loaded_txt
    except OSError as e:
        # Report the failure on stderr and exit with a non-zero status
        print("{}\nError opening {}. Terminating program.".format(e, file), file=sys.stderr)
        sys.exit(1)


def main():
    """
    Generate the digrams in 'Voldemort' and count their occurrences in the dictionary.
    """
    # Load dictionary (load() already lowercases every word)
    print("Loading Dictionary...\n")
    word_list = load('dictionary.txt')

    # Define name and convert to lowercase
    name = 'Voldemort'  # (anagram: tmvoordle)
    print("Name:", name, "\n\nGathering Digrams...\n")
    name = name.lower()

    # Generate unique adjacent letter pairs from name
    digrams = {''.join(pair) for pair in zip(name, name[1:])}
    print(*sorted(digrams), sep='\n')
    print("\nNumber of Digrams = {}\n".format(len(digrams)))

    # Count occurrences of each digram across all dictionary words
    mapped: defaultdict[str, int] = defaultdict(int)
    for word in word_list:
        for digram in digrams:
            mapped[digram] += word.count(digram)

    print("Digram Frequency Count:\n")
    for k in sorted(mapped):
        print("{} {}".format(k, mapped[k]))
    print("")

    print("_" * 100)
    print("\nThis concludes the demonstration of \"Chapter 3: Practice Project #1: Finding Digrams: Count Digrams\"\n\n\t\t\t   Thank you for your attention...   Goodbye!")
    print("_" * 100)


if __name__ == '__main__':
    main()
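
To try the counting step without a dictionary.txt file, here is a minimal sketch that applies the same digram logic to a small in-memory word list. The count_digrams helper and the sample_words list are hypothetical names introduced only for illustration; they are not part of the script above.

```python
def count_digrams(name, words):
    """Count how often each adjacent letter pair of `name` occurs in `words`.

    Hypothetical helper for illustration; mirrors the loop in main().
    """
    name = name.lower()
    # Adjacent letter pairs, de-duplicated with a set
    digrams = {a + b for a, b in zip(name, name[1:])}
    # Start every digram at zero so unseen pairs still appear in the result
    counts = dict.fromkeys(digrams, 0)
    for word in words:
        word = word.lower()
        for digram in digrams:
            # str.count tallies non-overlapping occurrences of the substring
            counts[digram] += word.count(digram)
    return counts


# Made-up sample standing in for dictionary.txt
sample_words = ["older", "demo", "short", "volt", "model"]

result = count_digrams("Voldemort", sample_words)
for digram in sorted(result):
    print(digram, result[digram])
```

Returning a plain dict keyed by every digram keeps zero counts visible, which makes it easy to spot letter pairs that never occur in the word list.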