Forward dictionary of Korean Proverbs
Reverse dictionary of Korean Proverbs (using BERT model)
Install virtualenv (skip this step if it is already installed):
pip3 install virtualenv
Clone this project and set up a virtualenv:
git clone https://github.com/ArtemisDicoTiar/storyteller
cd storyteller
virtualenv storytellerEnv
source storytellerEnv/bin/activate # activate the virtualenv
pip3 install -r ./requirements.txt # install the required libraries into the virtualenv
After installing all packages:
If you want to download proverbs from opendict (우리말샘), you MUST have your own API token.
Put it in a .env file at the project root. The .env file should have the following format:
opendict_api="{your api token}"
db_username="{Database username}"
db_host="{Database internal address}"
db_password="{Database password}"
db_port="{Database port}"
storyteller_schema="{storyteller's schema name}"
After you add the .env file, the project automatically reads your API token from it through "secrets.py".
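As a rough illustration of how values in this format could be read, the sketch below parses KEY="value" lines using only the standard library. The helper name `load_env` is hypothetical, and the project's own secrets.py may load the file differently (for example through a dotenv library).

```python
from pathlib import Path


def load_env(path):
    """Parse KEY="value" lines from a .env-style file into a dict.

    Hypothetical helper for illustration; not taken from secrets.py.
    """
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```

With a .env file in the format above, `load_env(".env")["opendict_api"]` would return the token string without its quotes.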
The project structure should now look like the following.
.
├── README.md
├── .env
├── requirements.txt
└── storyteller
    ├── collect
    │   ├── modifiers
    │   │   └── exampleOrganiser.py
    │   ├── parsers
    │   │   ├── definitions
    │   │   │   ├── namuwikiParser.py
    │   │   │   ├── opendictParser.py
    │   │   │   └── wikiquoteParser.py
    │   │   └── examples
    │   │       ├── daumDictCrawl.py
    │   │       ├── koreaUniversityCrawl.py
    │   │       └── naverDictCrawl.py
    │   └── utils
    │       ├── morphAnalysis.py
    │       └── proverbUtils.py
    ├── examples
    │   └── collect
    │       ├── explore_parsers_definitions.py
    │       ├── explore_parsers_examples.py
    │       ├── explore_utils_morphAnalysis.py
    │       └── explore_utils_proverbUtils.py
    ├── main
    │   └── dl_data.py
    ├── paths.py
    ├── secrets.py
    └── tests
The raw data for storyteller is distributed with dvc, so you must install dvc. This repository uses AWS S3 as the remote storage for dvc, so you must also add AWS credential information to your system.
pip install awscli
aws configure
# Request the credential information from the repository owner.
# Default region name [None]: ap-northeast-2
# Default output format [None]: json
pip install 'dvc[s3]'
dvc pull
Then you will be able to see the data in ./data.
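To confirm that the pull populated the directory, a short check along the following lines could be used. This is a minimal sketch: the helper name `list_data_files` is hypothetical, and it assumes only that the pulled files land under ./data.

```python
from pathlib import Path


def list_data_files(data_dir="./data"):
    """Return sorted relative paths of all files under data_dir.

    Hypothetical helper; returns an empty list if the directory is missing.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    files = list_data_files()
    print(f"{len(files)} file(s) found under ./data")
    for name in files:
        print(" ", name)
```

An empty result usually means the pull did not complete, which may point to missing or incorrect AWS credentials.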