
Recommend

LeeYejun1324 edited this page May 22, 2023 · 40 revisions

KeyBERT

  • Uses Maximal Marginal Relevance (MMR) to minimize redundancy and maximize diversity of results in text summarization tasks.
  • First selects the keyword/keyphrase most similar to the document. Then repeatedly selects new candidates that are similar to the document but not similar to the keywords/keyphrases already selected.

Starting Guide

Install the dependencies listed in requirements.txt

pip install -r requirements.txt

Main Function

def key_bert(user_id, database, model_ST, model_W2V, category):

Sub Function

  • diversity : The higher the value, the more varied the extracted keywords
  • top_n : Number of keywords to extract
def mmr(doc_embedding, candidate_embeddings, words, top_n, diversity):
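
To illustrate how top_n and diversity interact, here is a minimal pure-Python sketch of MMR selection. It is not the project's actual mmr implementation: the name mmr_sketch, the cosine helper, and the toy 2-D embeddings are assumptions made for illustration, and real embeddings would come from model_ST.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mmr_sketch(doc_embedding, candidate_embeddings, words, top_n, diversity):
    """Hypothetical MMR sketch: balance relevance to the document against redundancy."""
    if not words or top_n <= 0:
        return []
    # Relevance of every candidate to the document.
    doc_sim = [cosine(c, doc_embedding) for c in candidate_embeddings]
    # Start with the candidate most similar to the document.
    selected = [max(range(len(words)), key=lambda i: doc_sim[i])]
    remaining = [i for i in range(len(words)) if i != selected[0]]

    def score(i):
        # Redundancy: similarity to the closest keyword already selected.
        redundancy = max(
            cosine(candidate_embeddings[i], candidate_embeddings[j]) for j in selected
        )
        return (1 - diversity) * doc_sim[i] - diversity * redundancy

    while remaining and len(selected) < top_n:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [words[i] for i in selected]

# Toy data (assumed): "b" is nearly a duplicate of "a", "c" points elsewhere.
doc = [1.0, 0.0]
embeddings = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8]]
words = ["a", "b", "c"]
print(mmr_sketch(doc, embeddings, words, top_n=2, diversity=0.7))
print(mmr_sketch(doc, embeddings, words, top_n=2, diversity=0.0))
```

With a high diversity the near-duplicate "b" is skipped in favor of "c"; with diversity=0.0 the selection falls back to pure similarity to the document.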

Word2Vec

We use a Word2Vec (W2V) model to calculate the similarity between the keywords extracted from the user's chat and the categories. It returns max_ctg, the category with the highest similarity, provided that similarity is above 0.8.

Starting Guide

Install konlpy and soynlp

pip3 install konlpy
pip3 install soynlp

category_connect calls key_bert

def category_connect(...):
   bert_keyword = key_bert(uid, db, model_ST, model_W2V, category)

key_bert obtains keyword_mmr from mmr.

def key_bert(user_id, database, model_ST, model_W2V, category):
...
    keyword_mmr = mmr(doc_embedding, candidate_embeddings, candidates, top_n=5, diversity=0.7)

How It Works

First. Split the keywords returned by KeyBERT on spaces.

Second. Check whether each word is in the trained model's vocabulary.

Third. Initialize max_score = 0.79999 so that only a similarity of about 0.8 or higher can replace it. A higher score means greater similarity.

Fourth. Calculate the similarity between each word and each category using model_W2V.

Finally. Return max_ctg, the category with the highest similarity to the user's keywords.
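
The steps above can be sketched as follows. This is a hedged illustration, not the project's code: best_category and TinyVectors are hypothetical names, the sketch assumes model_W2V supports `word in model` and `similarity(w1, w2)` (as gensim's KeyedVectors does, although the wiki does not say which library is used), and returning None when nothing passes the threshold is also an assumption.

```python
import math

class TinyVectors:
    """Hypothetical stand-in for model_W2V: supports `in` and similarity()."""

    def __init__(self, vectors):
        self.vectors = vectors

    def __contains__(self, word):
        return word in self.vectors

    def similarity(self, w1, w2):
        # Cosine similarity between the two word vectors.
        u, v = self.vectors[w1], self.vectors[w2]
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

def best_category(keywords, categories, w2v):
    """Return the category most similar to any keyword word, or None (assumed)."""
    max_score = 0.79999  # only scores of about 0.8 or higher can replace this
    max_ctg = None
    for phrase in keywords:
        for word in phrase.split():      # First: split keyphrases on spaces
            if word not in w2v:          # Second: skip out-of-vocabulary words
                continue
            for ctg in categories:
                if ctg not in w2v:
                    continue
                score = w2v.similarity(word, ctg)  # Fourth: compute similarity
                if score > max_score:
                    max_score, max_ctg = score, ctg
    return max_ctg

# Toy vocabulary (assumed); "game" is deliberately missing from it.
toy_model = TinyVectors({
    "soccer": [1.0, 0.0],
    "sports": [0.9, 0.1],
    "music": [0.0, 1.0],
    "cooking": [0.7, 0.7],
})
print(best_category(["soccer game"], ["sports", "music"], toy_model))
print(best_category(["cooking"], ["sports", "music"], toy_model))
```

In the toy data, "soccer" is close enough to "sports" to pass the threshold, while "cooking" is not close enough to either category, so the second call returns None.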

Dataset

Topic-based everyday text conversation data (주제별 텍스트 일상 대화 데이터)


Adding Category

Store the max_ctg returned by key_bert in bert_keyword.

def category_connect(...):
   bert_keyword = key_bert(uid, db, model_ST, model_W2V, category)

Add the user to the users list of the matching category document (bert_keyword) in the Firebase fav collection.

def KoGPT(...):
...
   db.collection("fav").document(bert_keyword).update({"users": firestore.ArrayUnion([email])})

The user can now enter the new category board.
