The DocxComparator is a Python program that compares, aligns, and improves the text of two .docx files. It uses spaCy for natural language processing and, based on the alignment results, generates an updated .docx file containing the improved text.
- Compare and compute the similarity score between two .docx files.
- Align the text from both files to identify similar and dissimilar sentences.
- Analyze the aligned text and suggest improved sentences.
- Generate a new .docx file containing the improved text.
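The README does not document how sentences are matched, so the following is only a minimal sketch of one way to align sentences and derive a similarity score, assuming the text has already been split into sentences. Each sentence from the first file is paired with its most similar sentence from the second, and pairs scoring below a threshold count as dissimilar. The function names, the best-match strategy, and the 0.6 threshold are illustrative assumptions, and `difflib.SequenceMatcher` is swapped in for spaCy so the example runs with the standard library alone.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Character-based similarity in [0, 1]; a stand-in for spaCy's Doc.similarity."""
    return SequenceMatcher(None, a, b).ratio()


def align_sentences(sents1, sents2, threshold=0.6, similarity=text_similarity):
    """Pair each sentence in sents1 with its best match in sents2.

    Returns a list of (sentence1, match_or_None, score) tuples. Each sentence
    picks its best match independently, so two sentences may share a partner.
    A best score below `threshold` marks the sentence as dissimilar (None).
    """
    aligned = []
    for s1 in sents1:
        best, best_score = None, 0.0
        for s2 in sents2:
            score = similarity(s1, s2)
            if score > best_score:
                best, best_score = s2, score
        if best_score < threshold:
            best = None
        aligned.append((s1, best, best_score))
    return aligned


def overall_similarity(aligned):
    """Average best-match score, measured from the first file's point of view."""
    if not aligned:
        return 0.0
    return sum(score for _, _, score in aligned) / len(aligned)
```

To use spaCy instead, pass something like `similarity=lambda a, b: nlp(a).similarity(nlp(b))` after `nlp = spacy.load("en_core_web_sm")`. The small English model ships without static word vectors, so spaCy warns that its similarity scores may not be useful; a model with vectors, such as `en_core_web_md`, is better suited to this.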
- Python 3.x
- `python-docx` library
- `spacy` library with the English language model (`en_core_web_sm`)
- Clone the repository or download the zip file.
- Install the required libraries by running `pip install python-docx spacy`, then download the English model with `python -m spacy download en_core_web_sm`.
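Once the installation steps are done, a short script can confirm that everything is importable. This is a sketch, not part of the project: `missing_requirements` is a hypothetical helper. It relies on python-docx being imported as `docx` and on `python -m spacy download en_core_web_sm` installing the model as a package named `en_core_web_sm`. It only checks that the modules can be found, not their versions.

```python
import importlib.util

# Import names of the dependencies: python-docx is imported as "docx", and the
# spaCy model is installed as a package named after the model itself.
REQUIRED_MODULES = ("docx", "spacy", "en_core_web_sm")


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found on this interpreter."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All requirements are installed.")
```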
- Create an instance of the `DocxComparator` class with the file paths of the two .docx files to be compared.
- Run the `run()` method to perform the comparison, alignment, and text improvement.
- The program prints the similarity score, the aligned text, and the updated text, and saves the improved text in a new .docx file.
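If you prefer to pass the file paths on the command line instead of editing them in code, the documented steps (construct a `DocxComparator` with two paths, then call `run()`) can be wrapped in a small `argparse` entry point. This is a hypothetical sketch: `parse_args` and `main` are illustrative names, and it assumes the `DocxComparator` module is importable exactly as in the project's usage example.

```python
import argparse


def parse_args(argv=None):
    """Parse the two .docx paths to compare (hypothetical CLI, not part of the project)."""
    parser = argparse.ArgumentParser(
        description="Compare two .docx files and write an improved version."
    )
    parser.add_argument("file1", help="path to the first .docx file")
    parser.add_argument("file2", help="path to the second .docx file")
    return parser.parse_args(argv)


def main(argv=None):
    """Build a DocxComparator from the parsed paths and run the full pipeline."""
    args = parse_args(argv)
    # Imported lazily so that argument parsing (and --help) work on their own;
    # assumes the project's DocxComparator module is on the import path.
    from DocxComparator import DocxComparator

    comparator = DocxComparator(args.file1, args.file2)
    comparator.run()
```

Saved as, say, `compare_docx.py` (a hypothetical file name) and ended with the standard `if __name__ == "__main__": main()` guard, it would be run as `python compare_docx.py path/to/file1.docx path/to/file2.docx`.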
Example:

```python
from DocxComparator import DocxComparator

# Paths to the two .docx files to compare
file1_path = 'path/to/file1.docx'
file2_path = 'path/to/file2.docx'

comparator = DocxComparator(file1_path, file2_path)
comparator.run()
```

This project is licensed under the MIT License - see the LICENSE file for details.
- spaCy - An open-source natural language processing library.
- python-docx - A library to work with .docx files in Python.
- Inspiration - This project was inspired by a similar document text comparison repository.