# WHAT IS THIS? - This script converts Markdown files to HTML and inserts each file's title and body content into a pre-existing HTML template. Output files are .html files named after the first H1 heading in each source file. It's useful for programmatically expanding content on HTML sites, or for edge-case mass-content generation and formatting.
# INSTRUCTIONS - To configure and run the script, first make sure Python is installed and on your PATH, and that the mistune and tqdm packages are installed (pip install mistune tqdm). pathlib is part of the Python 3 standard library, so it does not need to be installed separately. Next, set the input_dir variable to the directory containing your Markdown .txt files, and set output_dir to where you want the HTML files written. Set the error_log path to where you'd like error logs to be saved. The script creates output_dir if it is missing, but input_dir must already exist. Then customize the html_template string, inside the triple quotes, to match the HTML structure and styles you want for your output files. The {title_placeholder} and {content_placeholder} fields mark where the title and converted content are inserted, so keep both in the right places. Finally, save the script as a .py file (on your desktop, for example) and run it by name from the command line. A progress bar in the terminal tracks progress through very large batches.
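# EXAMPLE (PATH CHECK) - A minimal sketch of the "check that these directories exist" step, assuming you want the run to stop early when the input folder is missing. The check_paths helper and the temporary folders are hypothetical and are not part of the script; swap in your own input_dir, output_dir, and error_log values.

```python
import tempfile
from pathlib import Path


def check_paths(input_dir, output_dir, error_log):
    """Fail early on a missing input folder; create the output locations."""
    input_path = Path(input_dir)
    if not input_path.is_dir():
        raise FileNotFoundError(f'input_dir does not exist: {input_path}')
    # Create the output folder and the folder that will hold the error log
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    Path(error_log).parent.mkdir(parents=True, exist_ok=True)
    # Count the Markdown .txt files that would be converted
    return len(list(input_path.glob('*.txt')))


# Demonstration in a throwaway folder so nothing on disk is touched
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / 'MARKDOWN1').mkdir()
    (base / 'MARKDOWN1' / 'page.txt').write_text('# Page\n', encoding='utf-8')
    found = check_paths(base / 'MARKDOWN1', base / 'HTML2', base / 'logs' / 'error-log.txt')
    output_created = (base / 'HTML2').is_dir()

print(found, output_created)
```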
# WARNINGS - The template is filled in with Python's str.format, which treats single curly brackets as placeholders. If your HTML template contains literal curly brackets (inline CSS or JavaScript, for example), you have to double them on both sides ({{ and }}), or Python will raise an error. Curly brackets inside the Markdown content itself do not need doubling, because inserted values are not parsed again. Similarly, backslashes in your input/output paths have to be doubled, unless you use raw strings (r'...') or forward slashes instead. Markdown tables work, but there can be spacing oddities. The script as written never writes to error_log: to capture errors in that file, wrap the part of the code that could fail in a try-except block and write any exceptions to the log. It's easier to just read and handle them from the command line.
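# EXAMPLE (CURLY BRACKETS) - A small sketch of the bracket rule, using a made-up template with inline CSS. The demo_template and rendered names are for illustration only. Doubled brackets come out as single literal brackets, while the two named placeholders are filled in.

```python
# Hypothetical template: the CSS braces are doubled, the placeholders are not
demo_template = """<style>
body {{ margin: 0; }}
</style>
<title>{title_placeholder}</title>
{content_placeholder}"""

rendered = demo_template.format(
    title_placeholder="Demo",
    # Braces inside an inserted value are left exactly as they are
    content_placeholder="<p>Braces in content such as {this} are left alone.</p>",
)
print(rendered)

# The same CSS with single braces makes format() fail
format_failed = False
try:
    "<style>body { margin: 0; }</style>{title_placeholder}".format(title_placeholder="Demo")
except (KeyError, ValueError, IndexError) as exc:
    format_failed = True
    print(f"format failed: {type(exc).__name__}")
```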

import os
import re
import mistune
from pathlib import Path
from tqdm import tqdm

# Input and output directories. In a regular string literal each backslash
# must be doubled; raw strings (r'...') or forward slashes also work.
input_dir = 'C:\\Users\\EXAMPLE\\Desktop\\FOLDER\\MARKDOWN1'
output_dir = 'C:\\Users\\EXAMPLE\\Desktop\\FOLDER\\HTML2'
error_log = 'C:\\Users\\EXAMPLE\\Desktop\\FOLDER\\error-log.txt'

# Your HTML template designed to accommodate title and content appropriately
html_template = """
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>{title_placeholder}</title>
    <!-- other head elements -->
</head>
<body>
{content_placeholder}
</body>
</html>
"""

# Function to convert Markdown to HTML using Mistune with table support
def markdown_to_html(md):
    # Create a markdown instance and enable the table plugin
    markdown = mistune.create_markdown(plugins=['table'])
    return markdown(md)
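
# Function to extract the title (from the first H1 heading in the Markdown)
def extract_title(md_content):
    lines = md_content.split('\n')
    title = None
    for index, line in enumerate(lines):
        if line.startswith('# '):
            # Drop only the leading '# ' marker; lstrip('# ') would also eat
            # any '#' or space characters at the start of the title itself
            title = line[2:].strip()
            # Keep the H1 line and everything after it; text before it is discarded
            md_content = '\n'.join(lines[index:])
            break
    return title, md_content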

# Ensure output directory exists
Path(output_dir).mkdir(parents=True, exist_ok=True)

# Process each markdown file in the input directory
for md_filename in tqdm(os.listdir(input_dir)):
    if md_filename.endswith('.txt'):
        with open(os.path.join(input_dir, md_filename), 'r', encoding='utf-8') as md_file:
            markdown_content = md_file.read()

        # Extract the title from the Markdown content
        title, markdown_content = extract_title(markdown_content)
        if not title:
            # No H1 found: fall back to the file name, so that untitled files
            # do not all overwrite the same 'untitled.html'
            title = Path(md_filename).stem

        # Convert Markdown content to HTML
        html_content = markdown_to_html(markdown_content)

        # Replace placeholders with actual content
        final_html = html_template.format(
            title_placeholder=title,
            content_placeholder=html_content
        )

        # Format the filename based on the title
        filename_title = re.sub(r'\s+', '-', title.lower())
        filename_title = re.sub(r'[^\w-]', '', filename_title)

        # Output the final HTML to a file
        output_file_path = os.path.join(output_dir, f'{filename_title}.html')
        with open(output_file_path, 'w', encoding='utf-8') as html_file:
            html_file.write(final_html)

# Print completion message (nothing is written to error_log unless you add
# your own try-except around the per-file steps)
print("All files processed.")
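
# EXAMPLE (ERROR LOG) - The WARNINGS note says that logging errors to the static text file needs a try-except around the part that can fail. The sketch below shows one possible shape for that, under the assumption that a failed file should be logged and skipped rather than stopping the batch. The run_with_error_log and fake_convert names are hypothetical; in the real script the per-file read, convert, and write steps would take the place of fake_convert.

```python
import tempfile
import traceback
from pathlib import Path


def run_with_error_log(filenames, process_one, log_path):
    """Call process_one(filename) for each name; log failures instead of stopping."""
    failed = []
    with open(log_path, 'a', encoding='utf-8') as log_file:
        for filename in filenames:
            try:
                process_one(filename)
            except Exception:
                # Record which file failed and the full traceback, then carry on
                log_file.write(f'{filename}\n{traceback.format_exc()}\n')
                failed.append(filename)
    return failed


# Stand-in for the per-file conversion step, for demonstration only
def fake_convert(filename):
    if filename.startswith('bad'):
        raise ValueError(f'cannot convert {filename}')


with tempfile.TemporaryDirectory() as tmp:
    demo_log = Path(tmp) / 'error-log.txt'
    failed_files = run_with_error_log(['a.txt', 'bad.txt', 'c.txt'], fake_convert, demo_log)
    log_text = demo_log.read_text(encoding='utf-8')

print(failed_files)
```

Opening the log in append mode keeps entries from earlier runs; use 'w' instead if each run should start with an empty log.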