Description
Dear Author:
Thank you for this amazing work. I have several questions that may need your kind help:
Could you please provide details about the datasets referenced in the NLU code files? Specifically, I would like to know:
What is the format of the datasets used in the code?
Where can these datasets be downloaded from?
I noticed that while Hugging Face provides the data in Parquet format, the official source doesn't separate the data into train, test, and validation sets. (For example, I downloaded the movie-review dataset mentioned in mr.py from this link: https://huggingface.co/datasets/cornell-movie-review-data/rotten_tomatoes/tree/main, but failed to run the code after modifying it; a rough sketch of what I tried follows the questions below.) Could you clarify:
What format is the dataset that is loaded directly from disk in the code?
What is the exact source for downloading these datasets?
Or could you help me look into the problem?
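
For reference, here is a rough sketch of my workaround attempt. It pulls the Parquet-backed copy of the dataset from the Hugging Face Hub and writes each split to disk; the output directory and the one-example-per-line `label\ttext` format are my own guesses, since I am not sure what mr.py actually expects to read:

```python
# Rough sketch of my attempt (not the repo's actual pipeline).
# Pull the Parquet-backed copy from the Hugging Face Hub and dump
# each split to plain text. The target directory and the
# "label<TAB>text" line format are guesses about what mr.py expects.
from pathlib import Path
from datasets import load_dataset

# load_dataset resolves the Parquet files hosted on the Hub directly
ds = load_dataset("cornell-movie-review-data/rotten_tomatoes")
print(ds)  # DatasetDict with train / validation / test splits

out_dir = Path("data/mr")  # hypothetical target directory
out_dir.mkdir(parents=True, exist_ok=True)

for split in ("train", "validation", "test"):
    with open(out_dir / f"{split}.tsv", "w", encoding="utf-8") as f:
        for example in ds[split]:
            f.write(f"{example['label']}\t{example['text']}\n")
```

This runs without errors on my side, but I cannot tell whether the resulting files match the format your disk loader expects.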
This information would help me better understand the data pipeline implementation in the code.

