
get_duplicate_md5

May 11th, 2024
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Filename: get_duplicate_md5.py
# Version: 1.0.0
# Author: Jeoi Reqi

  7. """
  8. Description:
  9.    This script finds duplicate files within the current working directory based on their MD5 hashes.
  10.  
  11. Expected Output:
  12. -------------------------------------------------------------------------------
  13.    Duplicate files with MD5 hash: 098f6bcd4621d373cade4e832627b4f6
  14.  
  15.    C:\\Users\\pytho\\OneDrive\\Desktop\\MyScripts\\New PY Scripts\\test.txt
  16.    C:\\Users\\pytho\\OneDrive\\Desktop\\MyScripts\\New PY Scripts\\test_new_name.txt
  17.  
  18.    You can safely delete the duplicate file(s).
  19.  
  20.    Program has completed without errors.    GoodBye!
  21. -------------------------------------------------------------------------------
  22.  
  23. Usage:
  24.    - The script prints the paths of duplicate files along with their MD5 hash.
  25.    - To use this script, simply run it. It will search for duplicate files within the current working directory.
  26.  
  27. Additional Notes:
  28.    - This script only searches for duplicate files within the current working directory.
  29.    - It calculates the MD5 hash of each file to identify duplicates.
  30. """

import os
import hashlib
from collections import defaultdict

# Calculate the MD5 hash of a file, reading in 4 KB chunks so large
# files are never loaded into memory all at once
def get_md5_hex(file_path):
    with open(file_path, "rb") as f:
        md5 = hashlib.md5()
        for chunk in iter(lambda: f.read(4096), b""):
            md5.update(chunk)
    return md5.hexdigest()

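# A minimal alternative sketch (not part of the original script): on Python 3.11
# and newer, hashlib.file_digest performs the same chunked read internally, so
# the helper above shrinks to a single call. The name get_md5_hex_311 is
# illustrative only.
def get_md5_hex_311(file_path):
    with open(file_path, "rb") as f:
        return hashlib.file_digest(f, "md5").hexdigest()
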
# Find duplicate files in the current working directory by grouping
# file paths under their MD5 hashes
def find_duplicate_files():
    md5_hashes_seen = defaultdict(list)

    cwd = os.getcwd()  # Get the current working directory
    for root, _, files in os.walk(cwd):
        for file in files:
            file_path = os.path.join(root, file)
            try:
                md5_hash = get_md5_hex(file_path)
            except OSError:
                continue  # Skip files that cannot be opened or read
            md5_hashes_seen[md5_hash].append(file_path)

    # Report each group of two or more files that share a hash
    for md5_hash, files in md5_hashes_seen.items():
        if len(files) > 1:
            print("\nDuplicate files with MD5 hash:", md5_hash, "\n")
            for file_path in files:
                print(file_path)
            print("\n\tYou can safely delete the duplicate file(s).")

    # Print the completion message once, after all groups have been reported
    print("\nProgram has completed without errors.\tGoodBye!\n")

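# Optional optimization, sketched here under the assumption that most files in a
# tree have unique sizes: group paths by file size first and hash only files
# whose sizes collide, which avoids reading most files entirely. The function
# name find_duplicate_files_size_first is illustrative only.
def find_duplicate_files_size_first():
    sizes_seen = defaultdict(list)
    for root, _, files in os.walk(os.getcwd()):
        for file in files:
            file_path = os.path.join(root, file)
            try:
                sizes_seen[os.path.getsize(file_path)].append(file_path)
            except OSError:
                continue  # Skip files whose size cannot be read

    # Only files that share a size can possibly be byte-for-byte duplicates
    md5_hashes_seen = defaultdict(list)
    for paths in sizes_seen.values():
        if len(paths) > 1:
            for file_path in paths:
                try:
                    md5_hashes_seen[get_md5_hex(file_path)].append(file_path)
                except OSError:
                    continue  # Skip files that cannot be opened or read

    # Report groups exactly as find_duplicate_files does
    for md5_hash, files in md5_hashes_seen.items():
        if len(files) > 1:
            print("\nDuplicate files with MD5 hash:", md5_hash, "\n")
            for file_path in files:
                print(file_path)
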
# Main entry point
def main():
    find_duplicate_files()

if __name__ == "__main__":
    main()