Python253

pdf2csv

Mar 14th, 2024
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Filename: pdf2csv.py
# Version: 1.0.0
# Author: Jeoi Reqi

"""
Description:
This script converts a PDF file (.pdf) to a CSV file (.csv).
It extracts the text of each page and writes every line of text
as a single-column CSV row.

Requirements:
- Python 3.x
- PyMuPDF library (install using: pip install PyMuPDF)

Usage:
1. Save this script as 'pdf2csv.py'.
2. Place your PDF file ('example.pdf') in the directory you run the script from.
3. Install the PyMuPDF library using the command: 'pip install PyMuPDF'
4. Run the script: 'python3 pdf2csv.py'

Note: Adjust the 'pdf_filename' and 'csv_filename' variables in the script as needed.
"""

import csv

import fitz  # PyMuPDF


def pdf_to_csv(pdf_filename, csv_filename):
    """Write each line of text in the PDF as one single-column CSV row."""
    # The context managers close both files even if extraction fails.
    with fitz.open(pdf_filename) as pdf_document, \
            open(csv_filename, 'w', newline='', encoding='utf-8') as csv_file:
        csv_writer = csv.writer(csv_file)

        for page in pdf_document:
            # splitlines() avoids the empty trailing row that split('\n')
            # produces when the page text ends with a newline.
            for line in page.get_text().splitlines():
                csv_writer.writerow([line])


if __name__ == "__main__":
    # Set the filenames for the PDF and CSV files
    pdf_filename = 'example.pdf'
    csv_filename = 'pdf2csv.csv'

    # Convert the PDF to a CSV file
    pdf_to_csv(pdf_filename, csv_filename)

    print(f"Converted '{pdf_filename}' to '{csv_filename}'.")
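The script writes one bare column, so the resulting CSV does not record which page a line came from. The sketch below shows one possible variant that adds a header plus page and line numbers and skips blank lines. The helper name `write_page_lines` and the three column names are hypothetical and not part of the original script. To keep the example runnable without a PDF, it takes plain strings as stand-ins for the text that `page.get_text()` would return.

```python
import csv
import io


def write_page_lines(page_texts, csv_file):
    """Write one CSV row (page, line, text) per non-blank line.

    page_texts: iterable of strings, one per page (hypothetical stand-in
    for the output of page.get_text() in the script above).
    Returns the number of data rows written.
    """
    writer = csv.writer(csv_file)
    writer.writerow(["page", "line", "text"])  # assumed column names
    rows_written = 0
    for page_num, text in enumerate(page_texts, start=1):
        for line_num, line in enumerate(text.splitlines(), start=1):
            if line.strip():  # skip empty or whitespace-only lines
                writer.writerow([page_num, line_num, line])
                rows_written += 1
    return rows_written


# Stand-in strings for the text of a two-page document.
sample_pages = ["Title\n\nFirst, second", "Total: 42\n"]
buffer = io.StringIO(newline="")
count = write_page_lines(sample_pages, buffer)
print(count)
print(buffer.getvalue())
```

With a real PDF, the same helper could presumably be fed a generator such as `(page.get_text() for page in pdf_document)` and a file opened with `newline=''`, as in the script. Note that the `csv` module quotes any line containing a comma, so such lines stay in a single `text` field.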