API reference: FileIO V0.1a
FileIO is located in src/fileIO.py. It contains functions used to handle file operations in the Aligner.
The description here is of module version 0.1a.
Parameters:
- result: Alignment, detailed description of this format
- fileName: str, the file to export to
Parameters:
- file1: str, the first file to read
- file2: str, the second file to read
- linesToLoad: int, the number of lines to read
Return:
- Bitext, detail of this format.
Parameters:
- file1: str, the first file to read
- file2: str, the second file to read
- file3: str, the third file to read
- linesToLoad: int, the number of lines to read
Return:
- Tritext, detail of this format.
Parameters:
- fileName: str, the Alignment file to read
- linesToLoad: int, the number of lines to read
Return:
- GoldAlignment, detail of this format.
UTF-8 text files. Each line contains one sentence, segmented so that words are separated by spaces. Each file contains one language.
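As a rough illustration of this file format, the sketch below reads two such files in parallel and splits each line on whitespace. The function name `read_parallel_files`, the `lines_to_load` default, and the returned list of token-list pairs are assumptions made for this example; they are not the module's actual API or its Bitext format.

```python
from itertools import islice
from typing import List, Optional, Tuple

# Hypothetical return shape: one (source tokens, target tokens) pair per line.
SentencePair = Tuple[List[str], List[str]]


def read_parallel_files(
    file1: str, file2: str, lines_to_load: Optional[int] = None
) -> List[SentencePair]:
    """Read two sentence-per-line UTF-8 files into pairs of token lists."""
    pairs: List[SentencePair] = []
    with open(file1, encoding="utf-8") as f1, open(file2, encoding="utf-8") as f2:
        # zip stops at the shorter file; islice caps the number of lines read
        # (None means "read everything").
        for line1, line2 in islice(zip(f1, f2), lines_to_load):
            # Words are already segmented, so splitting on whitespace is enough.
            pairs.append((line1.split(), line2.split()))
    return pairs
```

Pairing lines with `zip` assumes that line N of one file corresponds to line N of the other, which is what a sentence-per-line parallel corpus implies.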
Alignment files are UTF-8 text files. Each line contains one sentence. The word alignments within a sentence are separated by spaces. Each alignment is represented in one of the following formats:
- "NN-MM", where NN and MM are integers, means that there is a certain alignment between the NNth word of the source sentence and the MMth word of the target sentence. In addition, MM could be of the format "M1,M2,M3,...", which means that there are certain alignments between the NNth word of the source sentence and each of the Mi-th words of the target sentence.
- "NN?MM", where NN and MM are integers, means that there is a probable alignment between the NNth word of the source sentence and the MMth word of the target sentence. In addition, MM could be of the format "M1,M2,M3,...", which means that there are probable alignments between the NNth word of the source sentence and each of the Mi-th words of the target sentence.
- "NN-MM-TT", where NN and MM are integers and TT is a str representing the tag of the word. It means that there is a certain alignment between the NNth word of the source sentence and the MMth word of the target sentence, both of which are of TT type.