Python253

remove_diacritics

Mar 5th, 2024
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Filename: remove_diacritics.py
# Author: Jeoi Reqi

"""
This Python script removes diacritics (e.g., é, è, ê, ñ, ü, ç, ả, ă and å) from a specified text file and saves the result to a new file.

Requirements:
- Python 3
- unidecode library (install using 'pip install unidecode')
"""

import os
import sys

from unidecode import unidecode


def remove_diacritics(input_file, output_file):
    """Read input_file as UTF-8, transliterate it to ASCII, and write the result to output_file."""
    try:
        with open(input_file, 'r', encoding='utf-8') as file:
            text = file.read()

        # unidecode replaces each character with an ASCII approximation, which drops the
        # diacritics; note that it also transliterates non-Latin scripts
        text_without_diacritics = unidecode(text)

        with open(output_file, 'w', encoding='utf-8') as file:
            file.write(text_without_diacritics)

        print(f"Diacritics removed successfully. Output saved to '{output_file}'.")

    except Exception as e:
        # Report failures (missing file, decoding error, ...) on stderr
        print(f"Error: {e}", file=sys.stderr)


if __name__ == "__main__":
    input_filename = "input.txt"  # Change 'input.txt' to the name of your input file
    output_filename = "output.txt"  # Change 'output.txt' to the desired name for the output file

    # Both files are resolved relative to the current working directory
    input_path = os.path.join(os.getcwd(), input_filename)
    output_path = os.path.join(os.getcwd(), output_filename)

    remove_diacritics(input_path, output_path)
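
For readers who want to see how diacritic removal can work without a third-party dependency, here is a minimal sketch using only the standard library. It is not the method the script above uses: instead of unidecode, it decomposes each character with `unicodedata.normalize` and drops the combining marks. The function name `strip_combining_marks` is hypothetical and chosen only for illustration.

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Remove diacritics by decomposing characters and dropping combining marks.

    Hypothetical helper for illustration; it is not part of remove_diacritics.py.
    """
    # NFD splits a precomposed character such as 'é' into 'e' + U+0301 (combining acute)
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual precomposed form
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    print(strip_combining_marks("é è ê ñ ü ç ả ă å"))  # prints: e e e n u c a a a
```

This approach only affects characters that have a Unicode decomposition. Letters such as 'ø' or 'ß' pass through unchanged, whereas unidecode would replace them with ASCII approximations, so the two approaches are not interchangeable.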