23 commits
4b7e7fd  Added .gitignore for sanity. (icedwater, Sep 12, 2024)
998a550  Updated raw pose processing for local machine. (icedwater, Sep 12, 2024)
c5bf04f  Updated README.md. (icedwater, Sep 18, 2024)
2429976  Added .mp4 to .gitignore. (icedwater, Sep 18, 2024)
29abc73  Cleaned output/metadata of motion_rep notebook. (icedwater, Nov 6, 2024)
33fccc2  Removed unused last cell. (icedwater, Nov 6, 2024)
aa16a3b  Added abs position version of motion_rep code (icedwater, Nov 6, 2024)
7bc5f6e  Cleaned output for all base notebooks (icedwater, Nov 6, 2024)
c876c25  Updated mean and variance calculation notebook. (icedwater, Nov 6, 2024)
0ed3831  Added base working version of build_vector.py. (icedwater, Nov 20, 2024)
0cca915  Updated the import structure (icedwater, Nov 20, 2024)
2a9cead  Updated build_vector script. (icedwater, Nov 21, 2024)
f260b57  Fixed custom_paramUtil. (icedwater, Nov 21, 2024)
a9a6eb1  Added updated and documented annotate_texts.py. (icedwater, Nov 21, 2024)
73e69be  Ignored content from unzipping downloads where instructed. (icedwater, Nov 21, 2024)
d1efe48  Added documented version of mean/variance calculations. (icedwater, Nov 21, 2024)
7faf1d5  Updated docstrings and rearranged imports. (icedwater, Nov 21, 2024)
d44dcff  Added changes needed for custom rig. (icedwater, Nov 21, 2024)
ab10ede  Upgraded annotate_texts. (icedwater, Nov 21, 2024)
5600415  Added names to arguments in cal_mean_variance for clarity. (icedwater, Dec 19, 2024)
d5b04b9  Added processed texts to .gitignore. (icedwater, Dec 19, 2024)
2591e48  Added precalculate script for convenience. (icedwater, Mar 7, 2025)
74c9ba2  Added requirements.txt and relevant line in README. (icedwater, Mar 7, 2025)
29 changes: 29 additions & 0 deletions .gitignore
@@ -0,0 +1,29 @@
# vim swaps
.*.sw?

# python binaries
*.py[oc]

# numpy arrays
*.np[yz]

# HumanML3D texts, when unzipped
HumanML3D/texts/*.txt

# Custom texts, once processed
Custom/texts

# amass or body model data, when unzipped
amass_data/
body_models/

# zip files
*.zip
*.bz2
*.tar
*.gz
*.tar.gz
*.tgz

# animations
*.mp4
15 changes: 9 additions & 6 deletions README.md
@@ -26,15 +26,16 @@ We double the size of HumanML3D dataset by mirroring all motions and properly re
[KIT Motion-Language Dataset](https://motion-annotation.humanoids.kit.edu/dataset/) (KIT-ML) is also a related dataset that contains 3,911 motions and 6,278 descriptions. We processed the KIT-ML dataset following the same procedures as the HumanML3D dataset, and provide access in this repository. However, if you would like to use the KIT-ML dataset, please remember to cite the original paper.
</details>

If this dataset is usefule in your projects, we will apprecite your star on this codebase. 😆😆
## Checkout Our Works on HumanML3D
If this dataset is useful in your projects, we will appreciate your star on this codebase. 😆😆

## Checkout Our Work on HumanML3D
:ok_woman: [T2M](https://ericguo5513.github.io/text-to-motion) - The first work on HumanML3D that learns to generate 3D motion from textual descriptions, with *temporal VAE*.
:running: [TM2T](https://ericguo5513.github.io/TM2T) - Learns the mutual mapping between texts and motions through the discrete motion token.
:dancer: [TM2D](https://garfield-kh.github.io/TM2D/) - Generates dance motions with text instruction.
:honeybee: [MoMask](https://ericguo5513.github.io/momask/) - New-level text2motion generation using residual VQ and generative masked modeling.

## How to Obtain the Data
For KIT-ML dataset, you could directly download [[Here]](https://drive.google.com/drive/folders/1D3bf2G2o4Hv-Ale26YW18r1Wrh7oIAwK?usp=sharing). Due to the distribution policy of AMASS dataset, we are not allowed to distribute the data directly. We provide a series of script that could reproduce our HumanML3D dataset from AMASS dataset.
For KIT-ML dataset, you could directly download [[Here]](https://drive.google.com/drive/folders/1D3bf2G2o4Hv-Ale26YW18r1Wrh7oIAwK?usp=sharing). Due to the distribution policy of AMASS dataset, we are not allowed to distribute the data directly. We provide a series of scripts that could reproduce our HumanML3D dataset from AMASS dataset.

You need to clone this repository and set up the virtual environment.

@@ -49,6 +50,8 @@ conda env create -f environment.yaml
conda activate torch_render
```

Alternatively, install `requirements.txt` into the virtual environment using the workflow of your choice.

In the case of installation failure, you could alternatively install the following:
```sh
- Python==3.7.10
@@ -102,7 +105,7 @@ After all, the data under folder "./HumanML3D" is what you finally need.
```
HumanML3D data follows the SMPL skeleton structure with 22 joints. KIT-ML has 21 skeletal joints. Refer to paraUtils for detailed kinematic chains.
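As a minimal sketch of the data layout implied above (the array here is a zero-filled stand-in, not real data), a recovered joint file holds per-frame positions for the 22-joint skeleton:

```python
import numpy as np

# Illustrative stand-in for np.load("HumanML3D/new_joints/000000.npy"):
# joint positions are stored as (n_frames, n_joints, xyz).
joints = np.zeros((40, 22, 3))

n_frames, n_joints, dims = joints.shape
assert (n_joints, dims) == (22, 3)  # HumanML3D uses a 22-joint SMPL skeleton
```

A KIT-ML array would instead have 21 along the joint axis.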

The file named in "MXXXXXX.\*" (e.g., 'M000000.npy') is mirrored from file with correspinding name "XXXXXX.\*" (e.g., '000000.npy'). Text files and motion files follow the same naming protocols, meaning texts in "./texts/XXXXXX.txt"(e.g., '000000.txt') exactly describe the human motions in "./new_joints(or new_joint_vecs)/XXXXXX.npy" (e.g., '000000.npy')
The file named in "MXXXXXX.\*" (e.g., 'M000000.npy') is mirrored from file with corresponding name "XXXXXX.\*" (e.g., '000000.npy'). Text files and motion files follow the same naming protocols, meaning texts in "./texts/XXXXXX.txt"(e.g., '000000.txt') exactly describe the human motions in "./new_joints(or new_joint_vecs)/XXXXXX.npy" (e.g., '000000.npy')
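As a quick sketch of this naming protocol (the helper and paths below are illustrative, not part of the repo):

```python
from os.path import join as pjoin

def paired_paths(stem, root="HumanML3D"):
    """Return the (text, motion) file pair for a stem such as '000000' or 'M000000'."""
    return (pjoin(root, "texts", stem + ".txt"),
            pjoin(root, "new_joint_vecs", stem + ".npy"))

# A motion and its mirrored counterpart share the same stem, prefixed with 'M'.
text_path, motion_path = paired_paths("000000")
mirror_text, mirror_motion = paired_paths("M000000")
```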

Each text file looks like the following:
```sh
@@ -111,11 +114,11 @@ the standing person kicks with their left foot before going back to their origin
a man kicks with something or someone with his left leg.#a/DET man/NOUN kick/VERB with/ADP something/PRON or/CCONJ someone/PRON with/ADP his/DET left/ADJ leg/NOUN#0.0#0.0
he is flying kick with his left leg#he/PRON is/AUX fly/VERB kick/NOUN with/ADP his/DET left/ADJ leg/NOUN#0.0#0.0
```
with each line a distint textual annotation, composed of four parts: *original description (lower case)*, *processed sentence*, *start time(s)*, *end time(s)*, that are seperated by *#*.
with each line a distinct textual annotation, composed of four parts: *original description (lower case)*, *processed sentence*, *start time(s)*, *end time(s)*, that are separated by *#*.

Since some motions are too complicated to be described, we allow the annotators to describe a sub-part of a given motion if required. In these cases, *start time(s)* and *end time(s)* denote the motion segment that is annotated. Nonetheless, we observe these only occupy a small proportion of HumanML3D. *start time(s)* and *end time(s)* are set to 0 by default, which means the text is captioning the entire sequence of the corresponding motion.
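A line in this format can be split back into its four fields with a plain `#` split, as in this sketch (the parser below is illustrative, not part of the repo):

```python
def parse_annotation(line):
    """Split one annotation line into (caption, tagged, start, end).
    Captions contain no '#', so a plain split yields exactly four fields."""
    caption, tagged, start, end = line.strip().split("#")
    return caption, tagged, float(start), float(end)

line = ("he is flying kick with his left leg"
        "#he/PRON is/AUX fly/VERB kick/NOUN with/ADP his/DET left/ADJ leg/NOUN"
        "#0.0#0.0")
caption, tagged, start, end = parse_annotation(line)
# start == end == 0.0 means the caption describes the whole motion clip.
```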

If you are not able to install ffmpeg, you could animate videos in '.gif' instead of '.mp4'. However, generating GIFs usually takes longer time and memory occupation.
If you are not able to install ffmpeg, you could animate videos in '.gif' instead of '.mp4'. However, generating GIFs usually takes longer time and uses more memory.

## Citation

35 changes: 10 additions & 25 deletions animation.ipynb
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -14,7 +14,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -112,7 +112,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -122,7 +122,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -132,7 +132,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -157,17 +157,9 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 10/10 [00:14<00:00, 1.43s/it]\n"
]
}
],
"outputs": [],
"source": [
"for npy_file in tqdm(npy_files):\n",
" data = np.load(pjoin(src_dir, npy_file))\n",
@@ -177,20 +169,13 @@
"# You may set the title on your own.\n",
" plot_3d_motion(save_path, kinematic_chain, data, title=\"None\", fps=20, radius=4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:torch_render]",
"display_name": "hml3d",
"language": "python",
"name": "conda-env-torch_render-py"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -202,7 +187,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
"version": "3.9.19"
}
},
"nbformat": 4,
114 changes: 114 additions & 0 deletions annotate_texts.py
@@ -0,0 +1,114 @@
"""
Given a text file with raw descriptions of actions, tag each description with
parts-of-speech tags, then write them in the training format to a new file.
"""

import spacy
from tqdm import tqdm

nlp = spacy.load('en_core_web_sm')

def process_text(sentence: str) -> tuple[list[str], list[str]]:
    """
    Return lists of words and their parts of speech (POS) tags
    for a given sentence.

    :param sentence: string to be tagged
    :return word_list: list of tokens found in the sentence
    :return pos_list: list of part-of-speech tags by token
    """
    sentence = sentence.replace('-', '')
    doc = nlp(sentence)
    word_list = []
    pos_list = []
    for token in doc:
        word = token.text
        if not word.isalpha():
            continue
        if (token.pos_ in ("NOUN", "VERB")) and (word != 'left'):
            word_list.append(token.lemma_)
        else:
            word_list.append(word)
        pos_list.append(token.pos_)
    return (word_list, pos_list)


def read_text_from_file(input_file: str) -> list[str]:
    """
    Read the text from a file of action descriptions for parsing.

    :param input_file: string path to the input file
    :return result: list of strings read from the input file
    """
    with open(input_file, 'r', encoding="utf-8") as infile:
        raw_lines = infile.readlines()
    result = [line.strip() for line in raw_lines]

    return result


def prepare_combined_line(sentence: str, start_time: float=0.0, end_time: float=0.0) -> str:
    """
    Given each sentence, parse it and attach tags to each token.
    Then include the description start and end time to be edited if needed.
    By default, these are 0.0 if we are describing the full sequence.

    :param sentence: string containing an input sentence
    :param start_time: float representing start time of the description
    :param end_time: float representing end time of the description
    :return combined_line: string containing sentence#tagged_sentence#start_time#end_time
    """
    (words, tags) = process_text(sentence)
    tagged_sentence = ' '.join([f"{n[0]}/{n[1]}" for n in zip(words, tags)])
    combined_line = f"{sentence}#{tagged_sentence}#{start_time}#{end_time}\n"

    return combined_line


def write_output_file(output_list: list[str], output_file: str):
    """
    Write a specified output list to a specified file.

    :param output_list: list of strings to write
    :param output_file: string path to output file location
    """
    with open(output_file, 'w', encoding="utf-8") as outfile:
        outfile.writelines(output_list)


def tag_one_file(input_file: str, output_file: str):
    """
    Do the tagging for one input file and write one output file.

    :param input_file: string path to file with untagged descriptions
    :param output_file: string path to file for storing results
    """
    output = []
    strings = read_text_from_file(input_file=input_file)

    for input_line in strings:
        output_line = prepare_combined_line(input_line)
        output.append(output_line)

    write_output_file(output_list=output, output_file=output_file)


def main():
    """
    Tag every raw description file under the data directory and write
    the results to the save directory.
    """
    from os import listdir
    from os.path import join as pjoin

    data_dir = "Custom/texts/raw"
    save_dir = "Custom/texts"

    for text_file in tqdm(listdir(data_dir)):
        print(f"Processing {text_file}...")
        tag_one_file(input_file=pjoin(data_dir, text_file), output_file=pjoin(save_dir, text_file))


if __name__ == "__main__":
    main()