bob_f

read_para_text_from_an_epub.py

Jan 9th, 2024
import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup


def chapter_to_str(chapter):
    """Return the text of every <p> element in an EPUB document, joined by spaces."""
    soup = BeautifulSoup(chapter.get_body_content(), 'html.parser')
    text = [para.get_text() for para in soup.find_all('p')]
    return ' '.join(text)
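
# Path to the EPUB file to read.
file_name: str = r'C:\Users\C191773\OneDrive - Thomson Reuters Incorporated\Documents\The Data Vault Guru_ a pragmati - Patrick Cuba.epub'
book = epub.read_epub(file_name)

# ITEM_DOCUMENT items are the book's XHTML content documents (chapters, front matter, etc.).
documents = list(book.get_items_of_type(ebooklib.ITEM_DOCUMENT))

# Map each document's path inside the EPUB to its paragraph text.
texts = {}
for document in documents:
    texts[document.get_name()] = chapter_to_str(document)

# Report how much paragraph text was extracted from each document.
for name, text in texts.items():
    print(f'{name}: {len(text)} characters')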
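The script above relies on `ebooklib` and `bs4`, and `get_items_of_type(ebooklib.ITEM_DOCUMENT)` appears to return documents in package-manifest order, which the EPUB format does not require to match reading order (the spine defines that). Below is a minimal standard-library sketch of the same idea, swapped in for illustration rather than taken from the original: it opens the EPUB as a ZIP archive, follows `META-INF/container.xml` to the OPF package document, walks the spine in reading order, and collects `<p>` text with `html.parser`. The names `read_paragraphs` and `ParagraphCollector` are hypothetical, and it assumes UTF-8 content documents and no nested `<p>` elements.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import unquote

# XML namespaces used by the EPUB container file and the OPF package document.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collect the text inside each <p> element.

    Assumption: <p> elements are not nested inside one another.
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = None  # text chunks while inside a <p>, otherwise None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            self.paragraphs.append("".join(self._buf))
            self._buf = None

    def handle_data(self, data):
        # Text outside any <p> (headings, divs, etc.) is ignored.
        if self._buf is not None:
            self._buf.append(data)


def read_paragraphs(epub_path):
    """Return {document path inside the EPUB: paragraph text} in spine order."""
    texts = {}
    with zipfile.ZipFile(epub_path) as zf:
        # container.xml points at the OPF package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        opf_dir = posixpath.dirname(opf_path)

        # The manifest maps item ids to hrefs relative to the OPF file.
        manifest = {
            item.get("id"): item.get("href")
            for item in opf.find("opf:manifest", NS)
        }

        # The spine lists manifest ids in reading order
        # (non-linear items are included too).
        for itemref in opf.find("opf:spine", NS):
            href = unquote(manifest[itemref.get("idref")])
            path = posixpath.join(opf_dir, href)
            parser = ParagraphCollector()
            parser.feed(zf.read(path).decode("utf-8"))  # assumes UTF-8
            parser.close()
            texts[path] = " ".join(parser.paragraphs)
    return texts
```

Usage mirrors the original loop: `texts = read_paragraphs(file_name)` gives the same kind of name-to-text dictionary, but with keys ordered as a reader would encounter the chapters.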