Description: Two methods are implemented to translate a corpus of any size from one language to another.
The first method uses Chrome WebDriver to simulate a user translating text on the Google Translate website.
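A browser-driven approach first needs a page to load. The sketch below only builds a Google Translate URL with the text pre-filled. It assumes the website accepts the `sl`, `tl`, `text` and `op` query parameters, which is observed web-UI behaviour rather than a documented API, and it does not claim to be how this repository drives the browser. `build_translate_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed entry point of the Google Translate web UI.
GOOGLE_TRANSLATE_URL = "https://translate.google.com/"


def build_translate_url(text: str, target: str, source: str = "auto") -> str:
    """Return a Google Translate page URL pre-filled with `text`.

    Assumption: the page reads sl (source), tl (target), text and op from
    the query string. This is web-UI behaviour, not a stable API.
    """
    query = urlencode({"sl": source, "tl": target, "text": text, "op": "translate"})
    return f"{GOOGLE_TRANSLATE_URL}?{query}"


# A WebDriver session would then load the page, e.g.
# driver.get(build_translate_url(chunk, "ar")), wait for the result element
# and read its text (selectors omitted here because the page markup changes).
```

Because the text travels in the URL, percent-encoding makes it longer than the raw input; typing into the page's input box avoids that, at the cost of more browser interaction.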
The second method uses a translator API and sends the data samples to be translated in batches.
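The batching step could look roughly like the sketch below. It is a hypothetical illustration, not the repository's code: it assumes the corpus is a list of text samples and that each request may carry at most `max_character_limit` characters in total. The helper names `split_long_text` and `make_batches` are made up for this example.

```python
from typing import Iterable, List


def split_long_text(text: str, max_chars: int) -> List[str]:
    """Split text into pieces of at most max_chars, breaking at a space when possible."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    pieces: List[str] = []
    while len(text) > max_chars:
        # Last space at or before index max_chars, so the piece fits the limit.
        cut = text.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars  # no usable space: fall back to a hard cut
        pieces.append(text[:cut])
        text = text[cut:].lstrip(" ")
    if text:
        pieces.append(text)
    return pieces


def make_batches(samples: Iterable[str], max_chars: int = 5000) -> List[List[str]]:
    """Greedily pack samples into batches whose total length stays <= max_chars."""
    batches: List[List[str]] = []
    current: List[str] = []
    size = 0
    for sample in samples:
        # Oversized samples are split first so every piece fits in one request.
        for piece in split_long_text(sample, max_chars):
            if current and size + len(piece) > max_chars:
                batches.append(current)
                current, size = [], 0
            current.append(piece)
            size += len(piece)
    if current:
        batches.append(current)
    return batches
```

Greedy packing fills each request close to the limit, which keeps the number of API calls low for a large corpus. Whether a given translation API counts the limit per request or per item is an assumption to check against its documentation.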
All dependencies, pinned to their exact versions, are listed in the requirements.txt file.
To install them, run the following command from the root of the project directory:
pip install -r requirements.txt -v
Configure the following variables before running:
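translate_from = r'Path/to/source/Corpus/'  # directory of the source corpus
translate_to = r'Path/to/new/translated/Corpus'  # directory for the translated corpus
lang_code = 'Target Language'  # code of the target language, e.g. 'ar' for Arabic
max_character_limit = 5000  # max characters per translation (Google Translate's limit per translation is 5000)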
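To show how `translate_from` and `translate_to` might relate, here is a minimal sketch that pairs each source file with the path of its translation. It assumes the corpus is a directory tree of `.txt` files whose layout should be mirrored in the output directory; `plan_outputs` and its `skip_existing` option are hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List, Tuple


def plan_outputs(
    translate_from: str,
    translate_to: str,
    pattern: str = "*.txt",
    skip_existing: bool = True,
) -> List[Tuple[Path, Path]]:
    """Pair each source file with the path its translation would be written to.

    Assumes the corpus is a directory tree of text files; the source layout
    is mirrored under translate_to.
    """
    src_root = Path(translate_from)
    dst_root = Path(translate_to)
    pairs: List[Tuple[Path, Path]] = []
    for src in sorted(src_root.rglob(pattern)):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        if skip_existing and dst.exists():
            continue  # already translated on a previous run
        pairs.append((src, dst))
    return pairs
```

Skipping files that already exist lets an interrupted run over a large corpus resume where it stopped. A caller would create `dst.parent` with `mkdir(parents=True, exist_ok=True)` before writing each translated file.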
After configuration, run one of the following scripts:
python translation_api.py
python translation_task.py
Note: the configuration should be passed as command-line arguments instead of static variables... maybe a contributor can fix it if I'm too lazy to do so :)
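One possible way to do this is sketched below with Python's standard `argparse` module. The flag names simply mirror the variables above and are hypothetical; the current scripts do not accept any arguments.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Read the settings from the command line instead of hard-coded variables."""
    parser = argparse.ArgumentParser(description="Translate a text corpus.")
    parser.add_argument("--translate-from", required=True,
                        help="path to the source corpus")
    parser.add_argument("--translate-to", required=True,
                        help="path for the translated corpus")
    parser.add_argument("--lang-code", required=True,
                        help="target language code, e.g. 'ar'")
    parser.add_argument("--max-character-limit", type=int, default=5000,
                        help="max characters per translation (default: 5000)")
    args = parser.parse_args(argv)
    # Google Translate accepts at most 5000 characters per translation.
    if not 0 < args.max_character_limit <= 5000:
        parser.error("--max-character-limit must be between 1 and 5000")
    return args
```

A script that called `parse_args()` at startup could then be run as `python translation_api.py --translate-from path/to/corpus --translate-to path/to/output --lang-code ar`; argparse maps `--translate-from` to `args.translate_from`, so the rest of the code can keep the same variable names.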
To be added (spoiler: it's open source :D)
- This project was inspired by my thesis experiment, for which I needed to translate an English corpus of 200K articles into Arabic.