diff --git "a/Week12_\353\263\265\354\212\265\352\263\274\354\240\234_\355\225\250\354\230\210\353\246\260.ipynb" "b/Week12_\353\263\265\354\212\265\352\263\274\354\240\234_\355\225\250\354\230\210\353\246\260.ipynb" new file mode 100644 index 0000000..03f0f3b --- /dev/null +++ "b/Week12_\353\263\265\354\212\265\352\263\274\354\240\234_\355\225\250\354\230\210\353\246\260.ipynb" @@ -0,0 +1,580 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "## **Text generation Experiment**\n", + "\n", + "- This review assignment covers text generation with the GPT-2 model. 🙂\n", + "- GPT-2 was trained on roughly 40GB of internet text with a next word prediction objective.\n", + "- We will experiment with several decoding methods, such as Beam Search, Top-k sampling, and Top-p sampling." + ], + "metadata": { + "id": "8Gxy65cu8irm" + } + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "_M2apVV-8cyb" + }, + "outputs": [], + "source": [ + "#Run this cell for reproducibility\n", + "SEED = 34\n", + "#max number of tokens in output text\n", + "MAX_LEN = 70" + ] + }, + { + "cell_type": "code", + "source": [ + "# The sentence used in the experiments.\n", + "input_sequence = \"I don't know about you, but there's only one thing I want to do after a long day of work\"" + ], + "metadata": { + "id": "Kd6ZRQmG8gWL" + }, + "execution_count": 2, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Import what we need from transformers\n", + "from transformers import TFGPT2LMHeadModel, GPT2Tokenizer\n", + "\n", + "# Load the pretrained large GPT-2 tokenizer and GPT-2 model.\n", + "tokenizer = GPT2Tokenizer.from_pretrained(\"gpt2-large\")\n", + "GPT2 = TFGPT2LMHeadModel.from_pretrained(\"gpt2-large\", pad_token_id=tokenizer.eos_token_id)\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "pEjO6IVs8gS0", 
+ "outputId": "44d41d65-ff9a-406e-e7a7-a2642cfb8cba" + }, + "execution_count": 3, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: \n", + "The secret `HF_TOKEN` does not exist in your Colab secrets.\n", + "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n", + "You will be able to reuse this secret in all of your notebooks.\n", + "Please note that authentication is recommended but still optional to access public models or datasets.\n", + " warnings.warn(\n", + "All PyTorch model weights were used when initializing TFGPT2LMHeadModel.\n", + "\n", + "All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.\n", + "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "# model parameters을 확인해 봅시다.\n", + "GPT2.summary()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "QI2_C2gw8gPq", + "outputId": "0ba9b49e-dae3-488e-d374-b38f998f007b" + }, + "execution_count": 4, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Model: \"tfgpt2lm_head_model\"\n", + "_________________________________________________________________\n", + " Layer (type) Output Shape Param # \n", + "=================================================================\n", + " transformer (TFGPT2MainLay multiple 774030080 \n", + " er) \n", + " \n", + "=================================================================\n", + "Total params: 774030080 (2.88 GB)\n", + "Trainable params: 774030080 (2.88 GB)\n", + "Non-trainable params: 0 (0.00 Byte)\n", + 
"_________________________________________________________________\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "#아래 코드를 실행해주세요-\n", + "import tensorflow as tf\n", + "tf.random.set_seed(SEED)" + ], + "metadata": { + "id": "cPFXwobg8gMJ" + }, + "execution_count": 5, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### **Greedy-Search**\n", + "- Greedy Search 에서는 각 시점마다 가장 확률이 높은 단어를 다음 단어로 선택합니다.\n", + "- 즉, 다음 단어는 아래와 같이 업데이트됩니다:\n", + "\n", + "$ wt=argmax wP(w|w1:t−1) $ \n", + "\n", + "> 즉, 각 타임스텝 𝑡마다 조건부 확률이 가장 높은 단어를 선택하는 것!\n", + "\n", + "\n", + "- 이 단순한 접근방식이 어떤 성능 차이를 보이는지 살펴봅시다." + ], + "metadata": { + "id": "zsX-xn93-tUP" + } + }, + { + "cell_type": "code", + "source": [ + "# context를 encoder해주세요\n", + "input_ids = tokenizer.encode(input_sequence, return_tensors=\"tf\")\n", + "\n", + "# 텍스트 생성하기, 이때 output length가 (context length 포함) 50이 될 때까지\n", + "greedy_output = GPT2.generate(input_ids, max_length=50, do_sample=False)\n", + "\n", + "# output sequences 출력하기\n", + "print(\"Output:\\n\" + 100 * '-')\n", + "print(tokenizer.decode(greedy_output[0], skip_special_tokens = True))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ig-oWtIA8gIq", + "outputId": "59fa11ae-1475-4c78-fb20-239847e23cf2" + }, + "execution_count": 6, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Output:\n", + "----------------------------------------------------------------------------------------------------\n", + "I don't know about you, but there's only one thing I want to do after a long day of work: go to the gym.\n", + "\n", + "I'm not talking about the gym that's right next to my house. 
I'm talking about\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "💡**Based on the Greedy Search formula and the code output above, explain the main problems you see by editing this cell.**\n", + "\n", + "\n", + "---\n", + "\n", + "\n", + "- It is hard to generate diverse, creative text, so the output can become repetitive.\n", + "- Only tokens up to max_length are emitted, so the output can stop before the sentence is finished and read awkwardly." + ], + "metadata": { + "id": "gVj1neC__f2N" + } + }, + { + "cell_type": "markdown", + "source": [ + "### **Beam Search + N-Gram Penalty**\n", + "- Beam Search is similar to Greedy Search, except that the model tracks several candidate paths (num_beams) at each step.\n", + " > In other words, the model can compare multiple alternatives while generating text!\n", + "\n", + "\n", + "- An n-gram penalty can also be applied to prevent repetition. For example, setting `no_repeat_ngram_size = 2`\n", + "ensures that no 2-gram appears twice.\n", + "\n", + "- Setting `num_return_sequences = 5`\n", + "returns all 5 beams so they can be compared." + ], + "metadata": { + "id": "3EC0shCGAAQq" + } + }, + { + "cell_type": "code", + "source": [ + "# To use Beam Search, just change a few parameters of the generate function.\n", + "# Set num_beams to run beam search decoding\n", + "beam_outputs = GPT2.generate(\n", + "    input_ids,\n", + "    max_length=50,\n", + "    num_beams=5,\n", + "    num_return_sequences=5,\n", + "    no_repeat_ngram_size=2,\n", + "    early_stopping=True\n", + ")\n", + "\n", + "print('')\n", + "print(\"Output:\\n\" + 100 * '-')\n", + "\n", + "# Print the output sequences\n", + "for i, beam_output in enumerate(beam_outputs):\n", + "    print(\"{}: {}\".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "l6OrEzA684Np", + "outputId": "c0b4561a-3504-445a-aa6b-4356174d8a18" + }, + "execution_count": 7, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "Output:\n", + "----------------------------------------------------------------------------------------------------\n", + "0: I don't know about you, but there's only one thing I want to do after a long day of 
work, and that's to sit down and watch a movie.\"\n", + "\n", + "\"I know, I know,\" you say. \"But I\n", + "1: I don't know about you, but there's only one thing I want to do after a long day of work, and that's to sit down and watch a movie.\"\n", + "\n", + "\"I know, I know,\" you say. \"I'm\n", + "2: I don't know about you, but there's only one thing I want to do after a long day of work, and that's to sit down and watch a movie.\"\n", + "\n", + "\"I know, I know,\" you say. \"But you\n", + "3: I don't know about you, but there's only one thing I want to do after a long day of work, and that's to sit down and watch a movie.\"\n", + "\n", + "\"I know, I know,\" you say, \"but I\n", + "4: I don't know about you, but there's only one thing I want to do after a long day of work, and that's to sit down and watch a movie.\"\n", + "\n", + "\"I know, I know,\" you say. \"I just\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "💡**The graph below shows the gap between Beam Search output and how humans actually speak. Based on the Beam Search output above and the graph below, explain the main problems you see by editing this cell. (Hints can be found in the cited paper.)**\n", + "\n", + "\n", + "---\n", + "\n", + "\n", + "- Human language contains irregularity, but Beam Search only explores high-probability paths, so its output lacks creativity.\n", + "- Beam Search tends to produce similar outputs for the same input, which limits the diversity of its results.\n", + "- Beam Search tends to pick high-probability tokens and produce repetitive sentences, which makes its output predictable." 
+ ], + "metadata": { + "id": "_VhLZdJlBVZk" + } + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()\n", + "\n", + "[Source] The Curious Case of Neural Text Degeneration, arXiv:1904.09751 (cs)\n", + "https://arxiv.org/abs/1904.09751" + ], + "metadata": { + "id": "aOBGUk2aAwQ-" + } + }, + { + "cell_type": "markdown", + "source": [ + "### **Basic Sampling**\n", + "- Rather than insisting on the most probable sequence, this method picks the next word at random according to the conditional probability distribution at each step.\n", + "\n", + "$w_t \\sim P(w \\mid w_{1:t-1})$\n", + "- However, the added randomness can make the generated text less coherent and more confusing.\n", + "- To control the randomness, we can introduce a temperature parameter. Setting it below 1 raises the chance of picking high-probability words and lowers the chance of picking low-probability ones." + ], + "metadata": { + "id": "BcDagIp1BvFA" + } + }, + { + "cell_type": "code", + "source": [ + "# To enable sampling, just set do_sample = True.\n", + "# Set the temperature.\n", + "# Also set top_k = 0 here.\n", + "sample_output = GPT2.generate(\n", + "    input_ids,\n", + "    do_sample=True,\n", + "    max_length=50,\n", + "    temperature=1.0,\n", + "    top_k=0\n", + ")\n", + "# Print the output sequence\n", + "print(\"Output:\\n\" + 100 * '-')\n", + "print(tokenizer.decode(sample_output[0], skip_special_tokens = True))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "z6pXSH2RBuz8", + "outputId": "94c95232-cdc6-4aff-a9c2-59f43a52caea" + }, + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Output:\n", + "----------------------------------------------------------------------------------------------------\n", + "I don't know about you, but there's only one thing I want to do after a long day of work – play games. Believe it or not, it's not that difficult. 
Today I decided to take a break, hope that I was\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "💡**Explain the mechanism by which the temperature parameter controls randomness by editing this cell.**\n", + "\n", + "\n", + "---\n", + "\n", + "- The model uses a softmax function to decide which word to generate; with temperature applied, each candidate word's logit is divided by the temperature before being converted to a probability. With a temperature below 1, the text is more consistent and conservative; above 1, sentences are more diverse and creative, but words that do not fit the context may appear. At exactly 1, the model's original probability distribution is used as is." + ], + "metadata": { + "id": "8g2RrY7PFmjJ" + } + }, + { + "cell_type": "markdown", + "source": [ + "### **Top-k Sampling**\n", + "- Top-K sampling keeps only the K most probable candidates for the next word,\n", + "and redistributes the entire probability mass among those K words.\n", + "\n", + "> In other words, instead of raising the probability of likely words and lowering that of unlikely ones, it removes the low-probability words altogether!" + ], + "metadata": { + "id": "RzmrRsA8CmYs" + } + }, + { + "cell_type": "code", + "source": [ + "# Set top_k to specify how many top words (K) of the conditional probability distribution to consider!\n", + "sample_output = GPT2.generate(\n", + "    input_ids,\n", + "    do_sample=True,\n", + "    max_length=50,\n", + "    top_k=50,\n", + ")\n", + "\n", + "# Print the output sequence\n", + "print(\"Output:\\n\" + 100 * '-')\n", + "print(tokenizer.decode(sample_output[0], skip_special_tokens = True), '...')" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "WA-og6IeD1BZ", + "outputId": "9c3561ad-ffab-4d72-81c9-ac0310f2a35b" + }, + "execution_count": 9, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Output:\n", + "----------------------------------------------------------------------------------------------------\n", + "I don't know about you, but there's only one thing I want to do after a long day of work - the one thing I get to do when I leave home. And when I step into the kitchen, I want to use that one ...\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "### **Top-P Sampling(Nucleus Sampling)**\n", + "- Top-K sampling appears to produce more coherent text than the earlier random sampling. 
However, an arguably better method is Top-p sampling.\n", + "- Top-P sampling is similar to Top-K, but instead of picking the K most probable words, it selects the smallest set of words whose cumulative probability reaches at least P. The entire probability mass is then redistributed over this set.\n" + ], + "metadata": { + "id": "2CgUegJOAw6h" + } + }, + { + "cell_type": "code", + "source": [ + "# Use the top_p parameter to sample only from the most likely words that make up 80% of the probability mass.\n", + "sample_output = GPT2.generate(\n", + "    input_ids,\n", + "    do_sample=True,\n", + "    max_length=50,\n", + "    top_p=0.8,\n", + ")\n", + "# Print the output sequence\n", + "print(\"Output:\\n\" + 100 * '-')\n", + "print(tokenizer.decode(sample_output[0], skip_special_tokens = True), '...')" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "GEhy8PgbAr2f", + "outputId": "36fa68cf-144e-4587-ca60-70c087ff3bdf" + }, + "execution_count": 10, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Output:\n", + "----------------------------------------------------------------------------------------------------\n", + "I don't know about you, but there's only one thing I want to do after a long day of work: take a bath. I'd even go as far as to say that it's my most important pre-work ritual, even if ...\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "### **Top-K + Top-P sampling**\n", + "- Using both together reduces the chance of very low-probability (odd) words appearing, while keeping the size of the candidate set flexible." 
+ ], + "metadata": { + "id": "heGKePrAE46H" + } + }, + { + "cell_type": "code", + "source": [ + "# Just set values for top_k and top_p; the temperature parameter can be used with them as well.\n", + "# Complete the code below.\n", + "# Set max_length = 2*MAX_LEN here\n", + "sample_outputs = GPT2.generate(\n", + "    input_ids,\n", + "    do_sample=True,\n", + "    max_length=2*MAX_LEN,\n", + "    top_k=50,\n", + "    top_p=0.8,\n", + "    temperature=0.8,\n", + "    num_return_sequences=5\n", + ")\n", + "# Print the output sequences\n", + "print(\"Output:\\n\" + 100 * '-')\n", + "for i, sample_output in enumerate(sample_outputs):\n", + "    print(\"{}: {}...\".format(i, tokenizer.decode(sample_output, skip_special_tokens = True)))\n", + "    print('')" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Q8-CnW76E3FI", + "outputId": "f6a98ace-fdb5-45eb-e387-5021d06e7665" + }, + "execution_count": 11, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Output:\n", + "----------------------------------------------------------------------------------------------------\n", + "0: I don't know about you, but there's only one thing I want to do after a long day of work: smoke a cigarette. But I don't want to smoke in my office. That would be rude. So I've made a rule: I don't smoke anywhere.\n", + "\n", + "I've also made a rule: I don't smoke anywhere in the building.\n", + "\n", + "The first rule is easy to remember. The second is more difficult.\n", + "\n", + "I'm going to tell you what my rule is: I don't smoke anywhere.\n", + "\n", + "I've been told that's a rule. 
I've been told that's a law.\n", + "\n", + "I've been told that's a rule...\n", + "\n", + "1: I don't know about you, but there's only one thing I want to do after a long day of work: go home, have a few drinks, and watch some TV.\n", + "\n", + "The problem with that is that it's not that easy.\n", + "\n", + "It's not that easy to get a drink at a bar or to get a beer at a bar.\n", + "\n", + "I'm going to be honest with you. I'm not a big drinker. I don't drink a lot of beer, but I do drink a lot of wine.\n", + "\n", + "And I'm not going to lie to you. I've had some bad experiences with getting a beer at a bar.\n", + "\n", + "Here...\n", + "\n", + "2: I don't know about you, but there's only one thing I want to do after a long day of work: sleep. I've been doing that for over three years now, and I'm still not happy.\n", + "\n", + "So I decided to give it a shot. I found a couple of websites that provide tips on how to get a good night's sleep. One of them is called SleepChart, and the other is Sleep-O-Meter. Both of them are free.\n", + "\n", + "I downloaded the SleepChart app and started sleeping on it. It's a very simple app, but it's very easy to use. It's like a clock, but it has a graph that...\n", + "\n", + "3: I don't know about you, but there's only one thing I want to do after a long day of work: watch a movie.\n", + "\n", + "The first time I saw a movie was at a party, when I was a kid. I was the only one there, and I couldn't find anything to watch. My parents were both very old-school, so they took me to see The Godfather.\n", + "\n", + "I watched it over and over again, and I still remember the feeling of being mesmerized by the action.\n", + "\n", + "It wasn't until a few years later, when I was in college, that I started to think about what it was that I wanted to do after college...\n", + "\n", + "4: I don't know about you, but there's only one thing I want to do after a long day of work, and that's to go home and relax. 
If you think I'm going to miss the show, you're wrong.\n", + "\n", + "The show will be back on the air on Wednesday, September 13th at 10pm ET.\n", + "\n", + "For more information about the show, check out our website, and follow us on Twitter @nbcuniverses....\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "💡**Explain the difference between the Top-k and Top-p methods.**\n", + "\n", + "\n", + "---\n", + "- Top-k keeps a fixed number of candidates, the k highest-probability words, and samples the next word at random from them. It is simple to compute, but because it considers the same number of words regardless of the distribution, unlikely words can slip in.\n", + "- Top-p keeps the smallest set of highest-probability words whose cumulative probability reaches p. The number of words varies with the distribution, which allows more natural text, but it is more expensive to compute." + ], + "metadata": { + "id": "s_TeJ5zXF6Ra" + } + } + ] +} \ No newline at end of file diff --git "a/Week12_\354\230\210\354\212\265\352\263\274\354\240\234_\355\225\250\354\230\210\353\246\260.ipynb" "b/Week12_\354\230\210\354\212\265\352\263\274\354\240\234_\355\225\250\354\230\210\353\246\260.ipynb" new file mode 100644 index 0000000..5a84e89 --- /dev/null +++ "b/Week12_\354\230\210\354\212\265\352\263\274\354\240\234_\355\225\250\354\230\210\353\246\260.ipynb" @@ -0,0 +1,1898 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "gpuType": "T4" + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "**Transformer**" + ], + "metadata": { + "id": "RFxCMlpjc5hk" + } + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 455 + }, + "id": "pxdmDJ5tWpc3", + "outputId": "2d397dfc-c133-4156-c41a-c784db8bb0c6" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "#위치 인코딩\n", + "import math\n", + "import torch\n", + "from torch import nn\n", + "from matplotlib import pyplot as plt\n", + "\n", + "class PositionalEncoding(nn.Module):\n", + " def __init__(self, d_model, max_len, dropout=0.1):\n", + " super().__init__()\n", + " self.dropout=nn.Dropout(p=dropout)\n", + "\n", + " position=torch.arange(max_len).unsqueeze(1)\n", + " div_term=torch.exp(\n", + " torch.arange(0, d_model, 2) * (-math.log(10000.0)/d_model)\n", + " )\n", + "\n", + " pe=torch.zeros(max_len, 1, d_model)\n", + " pe[:, 0, 0::2]=torch.sin(position*div_term)\n", + " pe[:, 0, 1::2]=torch.cos(position*div_term)\n", + " self.register_buffer(\"pe\", pe)\n", + "\n", + " def forward(self, x):\n", + " x=x+self.pe[: x.size(0)]\n", + " return self.dropout(x)\n", + "\n", + "encoding=PositionalEncoding(d_model=128, max_len=50)\n", + "\n", + "plt.pcolormesh(encoding.pe.numpy().squeeze(), cmap=\"RdBu\")\n", + "plt.xlabel(\"Embedding Dimension\")\n", + "plt.xlim((0,128))\n", + "plt.ylabel(\"Position\")\n", + "plt.colorbar()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "source": [ + "!pip install torchdata torchtext portalocker" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "4EYtOVVm3xmJ", + "outputId": "9aa7f204-009f-422d-fe20-5f890b968479" + }, + "execution_count": 2, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Requirement already satisfied: torchdata in /usr/local/lib/python3.11/dist-packages (0.11.0)\n", + "Collecting torchtext\n", + " Downloading torchtext-0.18.0-cp311-cp311-manylinux1_x86_64.whl.metadata (7.9 kB)\n", + "Collecting portalocker\n", + " Downloading portalocker-3.1.1-py3-none-any.whl.metadata (8.6 kB)\n", + "Requirement already satisfied: urllib3>=1.25 in /usr/local/lib/python3.11/dist-packages (from torchdata) (2.4.0)\n", + "Requirement already satisfied: requests in 
/usr/local/lib/python3.11/dist-packages (from torchdata) (2.32.3)\n", + "Requirement already satisfied: torch>=2 in /usr/local/lib/python3.11/dist-packages (from torchdata) (2.6.0+cu124)\n", + "Requirement already satisfied: tqdm in /usr/local/lib/python3.11/dist-packages (from torchtext) (4.67.1)\n", + "Requirement already satisfied: numpy in /usr/local/lib/python3.11/dist-packages (from torchtext) (2.0.2)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from torch>=2->torchdata) (3.18.0)\n", + "Requirement already satisfied: typing-extensions>=4.10.0 in /usr/local/lib/python3.11/dist-packages (from torch>=2->torchdata) (4.13.2)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.11/dist-packages (from torch>=2->torchdata) (3.4.2)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.11/dist-packages (from torch>=2->torchdata) (3.1.6)\n", + "Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from torch>=2->torchdata) (2025.3.2)\n", + "Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=2->torchdata)\n", + " Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n", + "Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=2->torchdata)\n", + " Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n", + "Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=2->torchdata)\n", + " Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)\n", + "Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=2->torchdata)\n", + " Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)\n", + "Collecting nvidia-cublas-cu12==12.4.5.8 (from torch>=2->torchdata)\n", + " Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n", + "Collecting nvidia-cufft-cu12==11.2.1.3 
(from torch>=2->torchdata)\n", + " Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n", + "Collecting nvidia-curand-cu12==10.3.5.147 (from torch>=2->torchdata)\n", + " Downloading nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n", + "Collecting nvidia-cusolver-cu12==11.6.1.9 (from torch>=2->torchdata)\n", + " Downloading nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)\n", + "Collecting nvidia-cusparse-cu12==12.3.1.170 (from torch>=2->torchdata)\n", + " Downloading nvidia_cusparse_cu12-12.3.1.170-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.6.2 in /usr/local/lib/python3.11/dist-packages (from torch>=2->torchdata) (0.6.2)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /usr/local/lib/python3.11/dist-packages (from torch>=2->torchdata) (2.21.5)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2->torchdata) (12.4.127)\n", + "Collecting nvidia-nvjitlink-cu12==12.4.127 (from torch>=2->torchdata)\n", + " Downloading nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n", + "Requirement already satisfied: triton==3.2.0 in /usr/local/lib/python3.11/dist-packages (from torch>=2->torchdata) (3.2.0)\n", + "Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.11/dist-packages (from torch>=2->torchdata) (1.13.1)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy==1.13.1->torch>=2->torchdata) (1.3.0)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->torchdata) (3.4.2)\n", + "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests->torchdata) (3.10)\n", + "Requirement already satisfied: 
certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests->torchdata) (2025.4.26)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2->torch>=2->torchdata) (3.0.2)\n", + "Downloading torchtext-0.18.0-cp311-cp311-manylinux1_x86_64.whl (2.0 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.0/2.0 MB\u001b[0m \u001b[31m20.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading portalocker-3.1.1-py3-none-any.whl (19 kB)\n", + "Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl (363.4 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m363.4/363.4 MB\u001b[0m \u001b[31m1.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (13.8 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.8/13.8 MB\u001b[0m \u001b[31m99.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (24.6 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m24.6/24.6 MB\u001b[0m \u001b[31m81.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (883 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m883.7/883.7 kB\u001b[0m \u001b[31m43.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m664.8/664.8 MB\u001b[0m \u001b[31m2.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading 
nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl (211.5 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m211.5/211.5 MB\u001b[0m \u001b[31m5.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl (56.3 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m56.3/56.3 MB\u001b[0m \u001b[31m10.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl (127.9 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m127.9/127.9 MB\u001b[0m \u001b[31m7.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cusparse_cu12-12.3.1.170-py3-none-manylinux2014_x86_64.whl (207.5 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m207.5/207.5 MB\u001b[0m \u001b[31m4.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (21.1 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.1/21.1 MB\u001b[0m \u001b[31m81.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected packages: portalocker, nvidia-nvjitlink-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, nvidia-cusparse-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, torchtext\n", + " Attempting uninstall: nvidia-nvjitlink-cu12\n", + " Found existing installation: nvidia-nvjitlink-cu12 12.5.82\n", + " Uninstalling nvidia-nvjitlink-cu12-12.5.82:\n", + " Successfully uninstalled nvidia-nvjitlink-cu12-12.5.82\n", + " Attempting uninstall: nvidia-curand-cu12\n", + " Found existing installation: nvidia-curand-cu12 10.3.6.82\n", + " 
Uninstalling nvidia-curand-cu12-10.3.6.82:\n", + " Successfully uninstalled nvidia-curand-cu12-10.3.6.82\n", + " Attempting uninstall: nvidia-cufft-cu12\n", + " Found existing installation: nvidia-cufft-cu12 11.2.3.61\n", + " Uninstalling nvidia-cufft-cu12-11.2.3.61:\n", + " Successfully uninstalled nvidia-cufft-cu12-11.2.3.61\n", + " Attempting uninstall: nvidia-cuda-runtime-cu12\n", + " Found existing installation: nvidia-cuda-runtime-cu12 12.5.82\n", + " Uninstalling nvidia-cuda-runtime-cu12-12.5.82:\n", + " Successfully uninstalled nvidia-cuda-runtime-cu12-12.5.82\n", + " Attempting uninstall: nvidia-cuda-nvrtc-cu12\n", + " Found existing installation: nvidia-cuda-nvrtc-cu12 12.5.82\n", + " Uninstalling nvidia-cuda-nvrtc-cu12-12.5.82:\n", + " Successfully uninstalled nvidia-cuda-nvrtc-cu12-12.5.82\n", + " Attempting uninstall: nvidia-cuda-cupti-cu12\n", + " Found existing installation: nvidia-cuda-cupti-cu12 12.5.82\n", + " Uninstalling nvidia-cuda-cupti-cu12-12.5.82:\n", + " Successfully uninstalled nvidia-cuda-cupti-cu12-12.5.82\n", + " Attempting uninstall: nvidia-cublas-cu12\n", + " Found existing installation: nvidia-cublas-cu12 12.5.3.2\n", + " Uninstalling nvidia-cublas-cu12-12.5.3.2:\n", + " Successfully uninstalled nvidia-cublas-cu12-12.5.3.2\n", + " Attempting uninstall: nvidia-cusparse-cu12\n", + " Found existing installation: nvidia-cusparse-cu12 12.5.1.3\n", + " Uninstalling nvidia-cusparse-cu12-12.5.1.3:\n", + " Successfully uninstalled nvidia-cusparse-cu12-12.5.1.3\n", + " Attempting uninstall: nvidia-cudnn-cu12\n", + " Found existing installation: nvidia-cudnn-cu12 9.3.0.75\n", + " Uninstalling nvidia-cudnn-cu12-9.3.0.75:\n", + " Successfully uninstalled nvidia-cudnn-cu12-9.3.0.75\n", + " Attempting uninstall: nvidia-cusolver-cu12\n", + " Found existing installation: nvidia-cusolver-cu12 11.6.3.83\n", + " Uninstalling nvidia-cusolver-cu12-11.6.3.83:\n", + " Successfully uninstalled nvidia-cusolver-cu12-11.6.3.83\n", + "Successfully installed 
nvidia-cublas-cu12-12.4.5.8 nvidia-cuda-cupti-cu12-12.4.127 nvidia-cuda-nvrtc-cu12-12.4.127 nvidia-cuda-runtime-cu12-12.4.127 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.2.1.3 nvidia-curand-cu12-10.3.5.147 nvidia-cusolver-cu12-11.6.1.9 nvidia-cusparse-cu12-12.3.1.170 nvidia-nvjitlink-cu12-12.4.127 portalocker-3.1.1 torchtext-0.18.0\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "!pip uninstall torchtext -y\n", + "!pip install torchtext==0.17.0" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "fU3U7Lv3eGEF", + "outputId": "7b60bce7-4ec0-48fe-930b-972a7396e6cf" + }, + "execution_count": 3, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Found existing installation: torchtext 0.18.0\n", + "Uninstalling torchtext-0.18.0:\n", + " Successfully uninstalled torchtext-0.18.0\n", + "Collecting torchtext==0.17.0\n", + " Downloading torchtext-0.17.0-cp311-cp311-manylinux1_x86_64.whl.metadata (7.6 kB)\n", + "Requirement already satisfied: tqdm in /usr/local/lib/python3.11/dist-packages (from torchtext==0.17.0) (4.67.1)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from torchtext==0.17.0) (2.32.3)\n", + "Collecting torch==2.2.0 (from torchtext==0.17.0)\n", + " Downloading torch-2.2.0-cp311-cp311-manylinux1_x86_64.whl.metadata (25 kB)\n", + "Requirement already satisfied: numpy in /usr/local/lib/python3.11/dist-packages (from torchtext==0.17.0) (2.0.2)\n", + "Collecting torchdata==0.7.1 (from torchtext==0.17.0)\n", + " Downloading torchdata-0.7.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from torch==2.2.0->torchtext==0.17.0) (3.18.0)\n", + "Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.11/dist-packages (from torch==2.2.0->torchtext==0.17.0) (4.13.2)\n", + "Requirement 
already satisfied: sympy in /usr/local/lib/python3.11/dist-packages (from torch==2.2.0->torchtext==0.17.0) (1.13.1)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.11/dist-packages (from torch==2.2.0->torchtext==0.17.0) (3.4.2)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.11/dist-packages (from torch==2.2.0->torchtext==0.17.0) (3.1.6)\n", + "Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from torch==2.2.0->torchtext==0.17.0) (2025.3.2)\n", + "Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.2.0->torchtext==0.17.0)\n", + " Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)\n", + "Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.2.0->torchtext==0.17.0)\n", + " Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)\n", + "Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.2.0->torchtext==0.17.0)\n", + " Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)\n", + "Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.2.0->torchtext==0.17.0)\n", + " Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)\n", + "Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.2.0->torchtext==0.17.0)\n", + " Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)\n", + "Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.2.0->torchtext==0.17.0)\n", + " Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)\n", + "Collecting nvidia-curand-cu12==10.3.2.106 (from torch==2.2.0->torchtext==0.17.0)\n", + " Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)\n", + "Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch==2.2.0->torchtext==0.17.0)\n", + " Downloading 
nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)\n", + "Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch==2.2.0->torchtext==0.17.0)\n", + " Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)\n", + "Collecting nvidia-nccl-cu12==2.19.3 (from torch==2.2.0->torchtext==0.17.0)\n", + " Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl.metadata (1.8 kB)\n", + "Collecting nvidia-nvtx-cu12==12.1.105 (from torch==2.2.0->torchtext==0.17.0)\n", + " Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.7 kB)\n", + "Collecting triton==2.2.0 (from torch==2.2.0->torchtext==0.17.0)\n", + " Downloading triton-2.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)\n", + "Requirement already satisfied: urllib3>=1.25 in /usr/local/lib/python3.11/dist-packages (from torchdata==0.7.1->torchtext==0.17.0) (2.4.0)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.11/dist-packages (from nvidia-cusolver-cu12==11.4.5.107->torch==2.2.0->torchtext==0.17.0) (12.4.127)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->torchtext==0.17.0) (3.4.2)\n", + "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests->torchtext==0.17.0) (3.10)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests->torchtext==0.17.0) (2025.4.26)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2->torch==2.2.0->torchtext==0.17.0) (3.0.2)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy->torch==2.2.0->torchtext==0.17.0) (1.3.0)\n", + "Downloading torchtext-0.17.0-cp311-cp311-manylinux1_x86_64.whl (2.0 MB)\n", + "\u001b[2K 
\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.0/2.0 MB\u001b[0m \u001b[31m24.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading torch-2.2.0-cp311-cp311-manylinux1_x86_64.whl (755.5 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m755.5/755.5 MB\u001b[0m \u001b[31m2.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading torchdata-0.7.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.7 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m4.7/4.7 MB\u001b[0m \u001b[31m79.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m410.6/410.6 MB\u001b[0m \u001b[31m4.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.1/14.1 MB\u001b[0m \u001b[31m91.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m23.7/23.7 MB\u001b[0m \u001b[31m83.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m823.6/823.6 kB\u001b[0m \u001b[31m32.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m731.7/731.7 MB\u001b[0m 
\u001b[31m2.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m121.6/121.6 MB\u001b[0m \u001b[31m7.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m56.5/56.5 MB\u001b[0m \u001b[31m12.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m124.2/124.2 MB\u001b[0m \u001b[31m7.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m196.0/196.0 MB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m166.0/166.0 MB\u001b[0m \u001b[31m6.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m99.1/99.1 kB\u001b[0m \u001b[31m6.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading triton-2.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (167.9 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m167.9/167.9 MB\u001b[0m \u001b[31m6.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected 
packages: triton, nvidia-nvtx-cu12, nvidia-nccl-cu12, nvidia-cusparse-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, nvidia-cusolver-cu12, nvidia-cudnn-cu12, torch, torchdata, torchtext\n", + " Attempting uninstall: triton\n", + " Found existing installation: triton 3.2.0\n", + " Uninstalling triton-3.2.0:\n", + " Successfully uninstalled triton-3.2.0\n", + " Attempting uninstall: nvidia-nvtx-cu12\n", + " Found existing installation: nvidia-nvtx-cu12 12.4.127\n", + " Uninstalling nvidia-nvtx-cu12-12.4.127:\n", + " Successfully uninstalled nvidia-nvtx-cu12-12.4.127\n", + " Attempting uninstall: nvidia-nccl-cu12\n", + " Found existing installation: nvidia-nccl-cu12 2.21.5\n", + " Uninstalling nvidia-nccl-cu12-2.21.5:\n", + " Successfully uninstalled nvidia-nccl-cu12-2.21.5\n", + " Attempting uninstall: nvidia-cusparse-cu12\n", + " Found existing installation: nvidia-cusparse-cu12 12.3.1.170\n", + " Uninstalling nvidia-cusparse-cu12-12.3.1.170:\n", + " Successfully uninstalled nvidia-cusparse-cu12-12.3.1.170\n", + " Attempting uninstall: nvidia-curand-cu12\n", + " Found existing installation: nvidia-curand-cu12 10.3.5.147\n", + " Uninstalling nvidia-curand-cu12-10.3.5.147:\n", + " Successfully uninstalled nvidia-curand-cu12-10.3.5.147\n", + " Attempting uninstall: nvidia-cufft-cu12\n", + " Found existing installation: nvidia-cufft-cu12 11.2.1.3\n", + " Uninstalling nvidia-cufft-cu12-11.2.1.3:\n", + " Successfully uninstalled nvidia-cufft-cu12-11.2.1.3\n", + " Attempting uninstall: nvidia-cuda-runtime-cu12\n", + " Found existing installation: nvidia-cuda-runtime-cu12 12.4.127\n", + " Uninstalling nvidia-cuda-runtime-cu12-12.4.127:\n", + " Successfully uninstalled nvidia-cuda-runtime-cu12-12.4.127\n", + " Attempting uninstall: nvidia-cuda-nvrtc-cu12\n", + " Found existing installation: nvidia-cuda-nvrtc-cu12 12.4.127\n", + " Uninstalling nvidia-cuda-nvrtc-cu12-12.4.127:\n", + " 
Successfully uninstalled nvidia-cuda-nvrtc-cu12-12.4.127\n", + " Attempting uninstall: nvidia-cuda-cupti-cu12\n", + " Found existing installation: nvidia-cuda-cupti-cu12 12.4.127\n", + " Uninstalling nvidia-cuda-cupti-cu12-12.4.127:\n", + " Successfully uninstalled nvidia-cuda-cupti-cu12-12.4.127\n", + " Attempting uninstall: nvidia-cublas-cu12\n", + " Found existing installation: nvidia-cublas-cu12 12.4.5.8\n", + " Uninstalling nvidia-cublas-cu12-12.4.5.8:\n", + " Successfully uninstalled nvidia-cublas-cu12-12.4.5.8\n", + " Attempting uninstall: nvidia-cusolver-cu12\n", + " Found existing installation: nvidia-cusolver-cu12 11.6.1.9\n", + " Uninstalling nvidia-cusolver-cu12-11.6.1.9:\n", + " Successfully uninstalled nvidia-cusolver-cu12-11.6.1.9\n", + " Attempting uninstall: nvidia-cudnn-cu12\n", + " Found existing installation: nvidia-cudnn-cu12 9.1.0.70\n", + " Uninstalling nvidia-cudnn-cu12-9.1.0.70:\n", + " Successfully uninstalled nvidia-cudnn-cu12-9.1.0.70\n", + " Attempting uninstall: torch\n", + " Found existing installation: torch 2.6.0+cu124\n", + " Uninstalling torch-2.6.0+cu124:\n", + " Successfully uninstalled torch-2.6.0+cu124\n", + " Attempting uninstall: torchdata\n", + " Found existing installation: torchdata 0.11.0\n", + " Uninstalling torchdata-0.11.0:\n", + " Successfully uninstalled torchdata-0.11.0\n", + "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", + "torchvision 0.21.0+cu124 requires torch==2.6.0, but you have torch 2.2.0 which is incompatible.\n", + "torchtune 0.6.1 requires torchdata==0.11.0, but you have torchdata 0.7.1 which is incompatible.\n", + "torchaudio 2.6.0+cu124 requires torch==2.6.0, but you have torch 2.2.0 which is incompatible.\u001b[0m\u001b[31m\n", + "\u001b[0mSuccessfully installed nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.19.3 nvidia-nvtx-cu12-12.1.105 torch-2.2.0 torchdata-0.7.1 torchtext-0.17.0 triton-2.2.0\n" + ] + }, + { + "output_type": "display_data", + "data": { + "application/vnd.colab-display-data+json": { + "pip_warning": { + "packages": [ + "torch", + "torchgen" + ] + }, + "id": "f3713ab3badf4fa78efd311e79fdaa0e" + } + }, + "metadata": {} + } + ] + }, + { + "cell_type": "code", + "source": [ + "!python -m spacy download de_core_news_sm\n", + "!python -m spacy download de_core_news_sm" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hFVN4IdI1_SR", + "outputId": "b536fede-6d1f-4162-d65e-607c865c8dd9" + }, + "execution_count": 4, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "A module that was compiled using NumPy 1.x cannot be run in\n", + "NumPy 2.0.2 as it may crash. To support both 1.x and 2.x\n", + "versions of NumPy, modules must be compiled with NumPy 2.0.\n", + "Some module may need to rebuild instead e.g. 
with 'pybind11>=2.12'.\n", + "\n", + "If you are a user of the module, the easiest solution will be to\n", + "downgrade to 'numpy<2' or try to upgrade the affected module.\n", + "We expect that some modules will need time to support NumPy 2.\n", + "\n", + "Traceback (most recent call last): File \"\", line 189, in _run_module_as_main\n", + " File \"\", line 148, in _get_module_details\n", + " File \"\", line 112, in _get_module_details\n", + " File \"/usr/local/lib/python3.11/dist-packages/spacy/__init__.py\", line 6, in \n", + " from .errors import setup_default_warnings\n", + " File \"/usr/local/lib/python3.11/dist-packages/spacy/errors.py\", line 3, in \n", + " from .compat import Literal\n", + " File \"/usr/local/lib/python3.11/dist-packages/spacy/compat.py\", line 4, in \n", + " from thinc.util import copy_array\n", + " File \"/usr/local/lib/python3.11/dist-packages/thinc/__init__.py\", line 5, in \n", + " from .config import registry\n", + " File \"/usr/local/lib/python3.11/dist-packages/thinc/config.py\", line 5, in \n", + " from .types import Decorator\n", + " File \"/usr/local/lib/python3.11/dist-packages/thinc/types.py\", line 27, in \n", + " from .compat import cupy, has_cupy\n", + " File \"/usr/local/lib/python3.11/dist-packages/thinc/compat.py\", line 35, in \n", + " import torch\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/__init__.py\", line 1471, in \n", + " from .functional import * # noqa: F403\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/functional.py\", line 9, in \n", + " import torch.nn.functional as F\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/nn/__init__.py\", line 1, in \n", + " from .modules import * # noqa: F403\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/nn/modules/__init__.py\", line 35, in \n", + " from .transformer import TransformerEncoder, TransformerDecoder, \\\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py\", line 20, in \n", 
+ " device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),\n", + "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)\n", + " device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),\n", + "Collecting de-core-news-sm==3.8.0\n", + " Downloading https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-3.8.0/de_core_news_sm-3.8.0-py3-none-any.whl (14.6 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.6/14.6 MB\u001b[0m \u001b[31m55.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected packages: de-core-news-sm\n", + "Successfully installed de-core-news-sm-3.8.0\n", + "\u001b[38;5;2m✔ Download and installation successful\u001b[0m\n", + "You can now load the package via spacy.load('de_core_news_sm')\n", + "\u001b[38;5;3m⚠ Restart to reload dependencies\u001b[0m\n", + "If you are in a Jupyter or Colab notebook, you may need to restart Python in\n", + "order to load all the package's dependencies. You can do this by selecting the\n", + "'Restart kernel' or 'Restart runtime' option.\n", + "\n", + "A module that was compiled using NumPy 1.x cannot be run in\n", + "NumPy 2.0.2 as it may crash. To support both 1.x and 2.x\n", + "versions of NumPy, modules must be compiled with NumPy 2.0.\n", + "Some module may need to rebuild instead e.g. 
with 'pybind11>=2.12'.\n", + "\n", + "If you are a user of the module, the easiest solution will be to\n", + "downgrade to 'numpy<2' or try to upgrade the affected module.\n", + "We expect that some modules will need time to support NumPy 2.\n", + "\n", + "Traceback (most recent call last): File \"\", line 189, in _run_module_as_main\n", + " File \"\", line 148, in _get_module_details\n", + " File \"\", line 112, in _get_module_details\n", + " File \"/usr/local/lib/python3.11/dist-packages/spacy/__init__.py\", line 6, in \n", + " from .errors import setup_default_warnings\n", + " File \"/usr/local/lib/python3.11/dist-packages/spacy/errors.py\", line 3, in \n", + " from .compat import Literal\n", + " File \"/usr/local/lib/python3.11/dist-packages/spacy/compat.py\", line 4, in \n", + " from thinc.util import copy_array\n", + " File \"/usr/local/lib/python3.11/dist-packages/thinc/__init__.py\", line 5, in \n", + " from .config import registry\n", + " File \"/usr/local/lib/python3.11/dist-packages/thinc/config.py\", line 5, in \n", + " from .types import Decorator\n", + " File \"/usr/local/lib/python3.11/dist-packages/thinc/types.py\", line 27, in \n", + " from .compat import cupy, has_cupy\n", + " File \"/usr/local/lib/python3.11/dist-packages/thinc/compat.py\", line 35, in \n", + " import torch\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/__init__.py\", line 1471, in \n", + " from .functional import * # noqa: F403\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/functional.py\", line 9, in \n", + " import torch.nn.functional as F\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/nn/__init__.py\", line 1, in \n", + " from .modules import * # noqa: F403\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/nn/modules/__init__.py\", line 35, in \n", + " from .transformer import TransformerEncoder, TransformerDecoder, \\\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py\", line 20, in \n", 
+ " device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),\n", + "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)\n", + " device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),\n", + "Collecting de-core-news-sm==3.8.0\n", + " Using cached https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-3.8.0/de_core_news_sm-3.8.0-py3-none-any.whl (14.6 MB)\n", + "\u001b[38;5;2m✔ Download and installation successful\u001b[0m\n", + "You can now load the package via spacy.load('de_core_news_sm')\n", + "\u001b[38;5;3m⚠ Restart to reload dependencies\u001b[0m\n", + "If you are in a Jupyter or Colab notebook, you may need to restart Python in\n", + "order to load all the package's dependencies. You can do this by selecting the\n", + "'Restart kernel' or 'Restart runtime' option.\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "# Download and preprocess the dataset\n", + "from torchtext.datasets import Multi30k\n", + "from torchtext.data.utils import get_tokenizer\n", + "from torchtext.vocab import build_vocab_from_iterator\n", + "\n", + "def generate_tokens(text_iter, language):\n", + " language_index={SRC_LANGUAGE: 0, TGT_LANGUAGE: 1}\n", + "\n", + " for text in text_iter:\n", + " yield token_transform[language](text[language_index[language]])\n", + "\n", + "SRC_LANGUAGE=\"de\"\n", + "TGT_LANGUAGE=\"en\"\n", + "UNK_IDX, PAD_IDX, BOS_IDX, EOS_IDX=0, 1, 2, 3\n", + "special_symbols=[\"<unk>\", \"<pad>\", \"<bos>\", \"<eos>\"]\n", + "\n", + "token_transform={\n", + " SRC_LANGUAGE: get_tokenizer(\"spacy\", language=\"de_core_news_sm\"),\n", + " TGT_LANGUAGE: get_tokenizer(\"spacy\", language=\"en_core_web_sm\"),\n", + "}\n", + "print(\"Token Transform:\")\n", + "print(token_transform)\n", + "\n", + "vocab_transform={}\n", + "for language in [SRC_LANGUAGE, 
TGT_LANGUAGE]:\n", + " train_iter=Multi30k(split=\"train\", language_pair=(SRC_LANGUAGE, TGT_LANGUAGE))\n", + " vocab_transform[language]=build_vocab_from_iterator(\n", + " generate_tokens(train_iter, language),\n", + " min_freq=1,\n", + " specials=special_symbols,\n", + " special_first=True,\n", + " )\n", + "\n", + "for language in [SRC_LANGUAGE, TGT_LANGUAGE]:\n", + " vocab_transform[language].set_default_index(UNK_IDX)\n", + "\n", + "print(\"Vocab Transform:\")\n", + "print(vocab_transform)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "0LGQ4J3a4XPt", + "outputId": "bc361e58-ebb1-4648-c515-01b78df77fbd" + }, + "execution_count": 1, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "\n", + "A module that was compiled using NumPy 1.x cannot be run in\n", + "NumPy 2.0.2 as it may crash. To support both 1.x and 2.x\n", + "versions of NumPy, modules must be compiled with NumPy 2.0.\n", + "Some module may need to rebuild instead e.g. 
with 'pybind11>=2.12'.\n", + "\n", + "If you are a user of the module, the easiest solution will be to\n", + "downgrade to 'numpy<2' or try to upgrade the affected module.\n", + "We expect that some modules will need time to support NumPy 2.\n", + "\n", + "Traceback (most recent call last): File \"\", line 198, in _run_module_as_main\n", + " File \"\", line 88, in _run_code\n", + " File \"/usr/local/lib/python3.11/dist-packages/colab_kernel_launcher.py\", line 37, in \n", + " ColabKernelApp.launch_instance()\n", + " File \"/usr/local/lib/python3.11/dist-packages/traitlets/config/application.py\", line 992, in launch_instance\n", + " app.start()\n", + " File \"/usr/local/lib/python3.11/dist-packages/ipykernel/kernelapp.py\", line 712, in start\n", + " self.io_loop.start()\n", + " File \"/usr/local/lib/python3.11/dist-packages/tornado/platform/asyncio.py\", line 205, in start\n", + " self.asyncio_loop.run_forever()\n", + " File \"/usr/lib/python3.11/asyncio/base_events.py\", line 608, in run_forever\n", + " self._run_once()\n", + " File \"/usr/lib/python3.11/asyncio/base_events.py\", line 1936, in _run_once\n", + " handle._run()\n", + " File \"/usr/lib/python3.11/asyncio/events.py\", line 84, in _run\n", + " self._context.run(self._callback, *self._args)\n", + " File \"/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py\", line 510, in dispatch_queue\n", + " await self.process_one()\n", + " File \"/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py\", line 499, in process_one\n", + " await dispatch(*args)\n", + " File \"/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py\", line 406, in dispatch_shell\n", + " await result\n", + " File \"/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py\", line 730, in execute_request\n", + " reply_content = await reply_content\n", + " File \"/usr/local/lib/python3.11/dist-packages/ipykernel/ipkernel.py\", line 383, in do_execute\n", + " res = shell.run_cell(\n", + " File 
\"/usr/local/lib/python3.11/dist-packages/ipykernel/zmqshell.py\", line 528, in run_cell\n", + " return super().run_cell(*args, **kwargs)\n", + " File \"/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py\", line 2975, in run_cell\n", + " result = self._run_cell(\n", + " File \"/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py\", line 3030, in _run_cell\n", + " return runner(coro)\n", + " File \"/usr/local/lib/python3.11/dist-packages/IPython/core/async_helpers.py\", line 78, in _pseudo_sync_runner\n", + " coro.send(None)\n", + " File \"/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py\", line 3257, in run_cell_async\n", + " has_raised = await self.run_ast_nodes(code_ast.body, cell_name,\n", + " File \"/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py\", line 3473, in run_ast_nodes\n", + " if (await self.run_code(code, result, async_=asy)):\n", + " File \"/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py\", line 3553, in run_code\n", + " exec(code_obj, self.user_global_ns, self.user_ns)\n", + " File \"\", line 2, in \n", + " from torchtext.datasets import Multi30k\n", + " File \"/usr/local/lib/python3.11/dist-packages/torchtext/__init__.py\", line 3, in \n", + " from torch.hub import _get_torch_home\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/__init__.py\", line 1471, in \n", + " from .functional import * # noqa: F403\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/functional.py\", line 9, in \n", + " import torch.nn.functional as F\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/nn/__init__.py\", line 1, in \n", + " from .modules import * # noqa: F403\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/nn/modules/__init__.py\", line 35, in \n", + " from .transformer import TransformerEncoder, TransformerDecoder, \\\n", + " File 
\"/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py\", line 20, in \n", + " device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),\n", + "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)\n", + " device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Token Transform:\n", + "{'de': functools.partial(, spacy=), 'en': functools.partial(, spacy=)}\n", + "Vocab Transform:\n", + "{'de': Vocab(), 'en': Vocab()}\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "# Build the Transformer model\n", + "import math\n", + "import torch\n", + "from torch import nn\n", + "\n", + "class PositionalEncoding(nn.Module):\n", + " def __init__(self, d_model, max_len, dropout=0.1):\n", + " super().__init__()\n", + " self.dropout=nn.Dropout(p=dropout)\n", + "\n", + " position=torch.arange(max_len).unsqueeze(1)\n", + " div_term=torch.exp(\n", + " torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)\n", + " )\n", + "\n", + " pe=torch.zeros(max_len, 1, d_model)\n", + " pe[:, 0, 0::2]=torch.sin(position * div_term)\n", + " pe[:, 0, 1::2]=torch.cos(position * div_term)\n", + " self.register_buffer(\"pe\", pe)\n", + "\n", + " def forward(self, x):\n", + " x=x + self.pe[: x.size(0)]\n", + " return self.dropout(x)\n", + "\n", + "class TokenEmbedding(nn.Module):\n", + " def __init__(self, vocab_size, emb_size):\n", + " super().__init__()\n", + " self.embedding=nn.Embedding(vocab_size, emb_size)\n", + " self.emb_size=emb_size\n", + "\n", + " def forward(self, tokens):\n", + " return self.embedding(tokens.long()) * math.sqrt(self.emb_size)\n", + "\n", + "class Seq2SeqTransformer(nn.Module):\n", + " def __init__(\n", + " self,\n", + " num_encoder_layers,\n", + " 
num_decoder_layers,\n", + " emb_size,\n", + " max_len,\n", + " nhead,\n", + " src_vocab_size,\n", + " tgt_vocab_size,\n", + " dim_feedforward,\n", + " dropout=0.1,\n", + " ):\n", + " super().__init__()\n", + " self.src_tok_emb=TokenEmbedding(src_vocab_size, emb_size)\n", + " self.tgt_tok_emb=TokenEmbedding(tgt_vocab_size, emb_size)\n", + " self.positional_encoding=PositionalEncoding(\n", + " d_model=emb_size, max_len=max_len, dropout=dropout\n", + " )\n", + " self.transformer=nn.Transformer(\n", + " d_model=emb_size,\n", + " nhead=nhead,\n", + " num_encoder_layers=num_encoder_layers,\n", + " num_decoder_layers=num_decoder_layers,\n", + " dim_feedforward=dim_feedforward,\n", + " dropout=dropout,\n", + " )\n", + " self.generator=nn.Linear(emb_size, tgt_vocab_size)\n", + "\n", + " def forward(\n", + " self,\n", + " src,\n", + " trg,\n", + " src_mask,\n", + " tgt_mask,\n", + " src_padding_mask,\n", + " tgt_padding_mask,\n", + " memory_key_padding_mask,\n", + " ):\n", + " src_emb=self.positional_encoding(self.src_tok_emb(src))\n", + " tgt_emb=self.positional_encoding(self.tgt_tok_emb(trg))\n", + " outs=self.transformer(\n", + " src=src_emb,\n", + " tgt=tgt_emb,\n", + " src_mask=src_mask,\n", + " tgt_mask=tgt_mask,\n", + " memory_mask=None,\n", + " src_key_padding_mask=src_padding_mask,\n", + " tgt_key_padding_mask=tgt_padding_mask,\n", + " memory_key_padding_mask=memory_key_padding_mask,\n", + " )\n", + " return self.generator(outs)\n", + "\n", + " def encode(self, src, src_mask):\n", + " return self.transformer.encoder(\n", + " self.positional_encoding(self.src_tok_emb(src)), src_mask\n", + " )\n", + "\n", + " def decode(self, tgt, memory, tgt_mask):\n", + " return self.transformer.decoder(\n", + " self.positional_encoding(self.tgt_tok_emb(tgt)), memory, tgt_mask\n", + " )" + ], + "metadata": { + "id": "o_kLlHYAJZD5" + }, + "execution_count": 2, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Transformer class\n", + "#transformer=torch.nn.Transformer(\n", + 
"# d_model=512,\n", + "# nhead=8,\n", + "# num_encoder_layers=6,\n", + "# num_decoder_layers=6,\n", + "# dim_feedforward=2048,\n", + "# dropout=0.1,\n", + "# activation=torch.nn.functional.relu,\n", + "# layer_norm_eps=1e-05,\n", + "#)" + ], + "metadata": { + "id": "lQ67VUNlRlLx" + }, + "execution_count": 3, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Transformer forward method\n", + "#output=transformer.forward(\n", + "# src,\n", + "# tgt,\n", + "# src_mask=None,\n", + "# tgt_mask=None,\n", + "# memory_mask=None,\n", + "# src_key_padding_mask=None,\n", + "# tgt_key_padding_mask=None,\n", + "# memory_key_padding_mask=None,\n", + "#)" + ], + "metadata": { + "id": "DAFmeLBOSkMe" + }, + "execution_count": 4, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Transformer model structure\n", + "from torch import optim\n", + "\n", + "BATCH_SIZE=128\n", + "DEVICE=\"cuda\" if torch.cuda.is_available() else \"cpu\"\n", + "\n", + "model=Seq2SeqTransformer(\n", + " num_encoder_layers=3,\n", + " num_decoder_layers=3,\n", + " emb_size=512,\n", + " max_len=512,\n", + " nhead=8,\n", + " src_vocab_size=len(vocab_transform[SRC_LANGUAGE]),\n", + " tgt_vocab_size=len(vocab_transform[TGT_LANGUAGE]),\n", + " dim_feedforward=512,\n", + ").to(DEVICE)\n", + "criterion=nn.CrossEntropyLoss(ignore_index=PAD_IDX).to(DEVICE)\n", + "optimizer=optim.Adam(model.parameters())\n", + "\n", + "for main_name, main_module in model.named_children():\n", + " print(main_name)\n", + " for sub_name, sub_module in main_module.named_children():\n", + " print(\"L\", sub_name)\n", + " for ssub_name, ssub_module in sub_module.named_children():\n", + " print(\"| L\", ssub_name)\n", + " for sssub_name, sssub_module in ssub_module.named_children():\n", + " print(\"| | L\", sssub_name)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "omnRjLHJTmq4", + "outputId": "f2ae5e44-e89a-40f0-8630-8378c32b8bac" + }, + "execution_count": 5, + "outputs": [ + { + "output_type": 
"stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:286: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)\n", + " warnings.warn(f\"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}\")\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "src_tok_emb\n", + "L embedding\n", + "tgt_tok_emb\n", + "L embedding\n", + "positional_encoding\n", + "L dropout\n", + "transformer\n", + "L encoder\n", + "| L layers\n", + "| | L 0\n", + "| | L 1\n", + "| | L 2\n", + "| L norm\n", + "L decoder\n", + "| L layers\n", + "| | L 0\n", + "| | L 1\n", + "| | L 2\n", + "| L norm\n", + "generator\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "# Create batch data\n", + "from torch.utils.data import DataLoader\n", + "from torch.nn.utils.rnn import pad_sequence\n", + "\n", + "def sequential_transforms(*transforms):\n", + " def func(txt_input):\n", + " for transform in transforms:\n", + " txt_input=transform(txt_input)\n", + " return txt_input\n", + " return func\n", + "\n", + "def input_transform(token_ids):\n", + " return torch.cat(\n", + " (torch.tensor([BOS_IDX]), torch.tensor(token_ids), torch.tensor([EOS_IDX]))\n", + " )\n", + "\n", + "def collator(batch):\n", + " src_batch, tgt_batch=[], []\n", + " for src_sample, tgt_sample in batch:\n", + " src_batch.append(text_transform[SRC_LANGUAGE](src_sample.rstrip(\"\\n\")))\n", + " tgt_batch.append(text_transform[TGT_LANGUAGE](tgt_sample.rstrip(\"\\n\")))\n", + "\n", + " src_batch=pad_sequence(src_batch, padding_value=PAD_IDX)\n", + " tgt_batch=pad_sequence(tgt_batch, padding_value=PAD_IDX)\n", + " return src_batch, tgt_batch\n", + "\n", + "text_transform={}\n", + "for language in [SRC_LANGUAGE, TGT_LANGUAGE]:\n", + " 
text_transform[language]=sequential_transforms(\n", + " token_transform[language], vocab_transform[language], input_transform\n", + " )\n", + "\n", + "data_iter=Multi30k(split=\"valid\", language_pair=(SRC_LANGUAGE, TGT_LANGUAGE))\n", + "dataloader=DataLoader(data_iter, batch_size=BATCH_SIZE, collate_fn=collator)\n", + "source_tensor, target_tensor=next(iter(dataloader))\n", + "\n", + "print(\"(source, target):\")\n", + "print(next(iter(data_iter)))\n", + "\n", + "print(\"source_batch:\", source_tensor.shape)\n", + "print(source_tensor)\n", + "\n", + "print(\"target_batch:\", target_tensor.shape)\n", + "print(target_tensor)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "YWxod3OkUXsc", + "outputId": "50dd3bd8-4f3d-48b8-cbd8-be94550f8dfc" + }, + "execution_count": 6, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(source, target):\n", + "('Eine Gruppe von Männern lädt Baumwolle auf einen Lastwagen', 'A group of men are loading cotton onto a truck')\n", + "source_batch: torch.Size([35, 128])\n", + "tensor([[ 2, 2, 2, ..., 2, 2, 2],\n", + " [ 14, 5, 5, ..., 5, 21, 5],\n", + " [ 38, 12, 35, ..., 12, 1750, 69],\n", + " ...,\n", + " [ 1, 1, 1, ..., 1, 1, 1],\n", + " [ 1, 1, 1, ..., 1, 1, 1],\n", + " [ 1, 1, 1, ..., 1, 1, 1]])\n", + "target_batch: torch.Size([30, 128])\n", + "tensor([[ 2, 2, 2, ..., 2, 2, 2],\n", + " [ 6, 6, 6, ..., 250, 19, 6],\n", + " [ 39, 12, 35, ..., 12, 3254, 61],\n", + " ...,\n", + " [ 1, 1, 1, ..., 1, 1, 1],\n", + " [ 1, 1, 1, ..., 1, 1, 1],\n", + " [ 1, 1, 1, ..., 1, 1, 1]])\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.11/dist-packages/torch/utils/data/datapipes/iter/combining.py:337: UserWarning: Some child DataPipes are not exhausted when __iter__ is called. 
We are resetting the buffer and each child DataPipe will read from the start again.\n", + " warnings.warn(\"Some child DataPipes are not exhausted when __iter__ is called. We are resetting \"\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "#Create attention masks (causal target mask and padding masks)\n", + "def generate_square_subsequent_mask(s):\n", + " mask=(torch.triu(torch.ones((s, s), device=DEVICE))==1).transpose(0, 1)\n", + " mask=(\n", + " mask.float()\n", + " .masked_fill(mask==0, float(\"-inf\"))\n", + " .masked_fill(mask==1, float(0.0))\n", + " )\n", + " return mask\n", + "\n", + "def create_mask(src, tgt):\n", + " src_seq_len=src.shape[0]\n", + " tgt_seq_len=tgt.shape[0]\n", + "\n", + " tgt_mask=generate_square_subsequent_mask(tgt_seq_len)\n", + " src_mask=torch.zeros((src_seq_len, src_seq_len), device=DEVICE).type(torch.bool)\n", + "\n", + " src_padding_mask=(src==PAD_IDX).transpose(0, 1)\n", + " tgt_padding_mask=(tgt==PAD_IDX).transpose(0, 1)\n", + " return src_mask, tgt_mask, src_padding_mask, tgt_padding_mask\n", + "\n", + "target_input=target_tensor[:-1, :]\n", + "target_out=target_tensor[1:, :]\n", + "\n", + "source_mask, target_mask, source_padding_mask, target_padding_mask=create_mask(\n", + " source_tensor, target_input\n", + ")\n", + "\n", + "print(\"source_mask:\", source_mask.shape)\n", + "print(source_mask)\n", + "print(\"target_mask:\", target_mask.shape)\n", + "print(target_mask)\n", + "print(\"source_padding_mask:\", source_padding_mask.shape)\n", + "print(source_padding_mask)\n", + "print(\"target_padding_mask:\", target_padding_mask.shape)\n", + "print(target_padding_mask)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "92n_LBnvWzKs", + "outputId": "286a28b2-dd2d-4ca3-fa26-260bd8674922" + }, + "execution_count": 7, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "source_mask: torch.Size([35, 35])\n", + "tensor([[False, False, False, ..., False, False, False],\n", + " [False, False, False, ..., 
False, False, False],\n", + " [False, False, False, ..., False, False, False],\n", + " ...,\n", + " [False, False, False, ..., False, False, False],\n", + " [False, False, False, ..., False, False, False],\n", + " [False, False, False, ..., False, False, False]])\n", + "target_mask: torch.Size([29, 29])\n", + "tensor([[0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 
-inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf,\n", + " 
-inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -inf,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " -inf, -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., -inf, -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., -inf, -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., -inf, -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., -inf],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0.]])\n", + "source_padding_mask: torch.Size([128, 35])\n", + "tensor([[False, False, False, ..., True, True, True],\n", + " [False, False, False, ..., True, True, True],\n", + " [False, False, False, ..., True, True, True],\n", + " ...,\n", + " [False, False, False, ..., True, True, True],\n", + " [False, False, False, ..., True, True, True],\n", + " [False, False, False, ..., True, True, True]])\n", + "target_padding_mask: torch.Size([128, 29])\n", + "tensor([[False, False, False, ..., True, True, True],\n", + " [False, False, False, ..., True, True, True],\n", + " [False, False, False, ..., True, True, True],\n", + " ...,\n", + " [False, False, False, ..., True, True, True],\n", + " [False, False, False, ..., True, True, True],\n", + " [False, False, False, ..., True, True, True]])\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "#Model training and evaluation\n", + "'''\n", + "def run(model, optimizer, criterion, split):\n", + " model.train() if split==\"train\" else 
model.eval()\n", + " data_iter=Multi30k(split=split, language_pair=(SRC_LANGUAGE, TGT_LANGUAGE))\n", + " dataloader=DataLoader(data_iter, batch_size=BATCH_SIZE, collate_fn=collator)\n", + "\n", + " losses=0\n", + " for source_batch, target_batch in dataloader:\n", + " source_batch=source_batch.to(DEVICE)\n", + " target_batch=target_batch.to(DEVICE)\n", + "\n", + " target_input=target_batch[:-1, :]\n", + " target_output=target_batch[1:, :]\n", + "\n", + " src_mask, tgt_mask, src_padding_mask, tgt_padding_mask=create_mask(\n", + " source_batch, target_input\n", + " )\n", + "\n", + " logits=model(\n", + " src=source_batch,\n", + " trg=target_input,\n", + " src_mask=src_mask,\n", + " tgt_mask=tgt_mask,\n", + " src_padding_mask=src_padding_mask,\n", + " tgt_padding_mask=tgt_padding_mask,\n", + " memory_key_padding_mask=src_padding_mask,\n", + " )\n", + "\n", + " optimizer.zero_grad()\n", + " loss=criterion(logits.reshape(-1, logits.shape[-1]), target_output.reshape(-1))\n", + " if split==\"train\":\n", + " loss.backward()\n", + " optimizer.step()\n", + " losses += loss.item()\n", + "\n", + " return losses / len(list(dataloader))\n", + "\n", + "for epoch in range(5):\n", + " train_loss=run(model, optimizer, criterion, \"train\")\n", + " valid_loss=run(model, optimizer, criterion, \"valid\")\n", + " print(f\"Epoch: {epoch+1}, Train loss: {train_loss:.3f}, Valid loss: {valid_loss:.3f}\")\n", + "'''" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 123 + }, + "id": "C7_ssU4bYycr", + "outputId": "ba4936f0-461e-4f95-e5cd-2ee26d116cf0" + }, + "execution_count": 9, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'\\ndef run(model, optimizer, criterion, split):\\n model.train() if split==\"train\" else model.eval()\\n data_iter=Multi30k(split=split, language_pair=(SRC_LANGUAGE, TGT_LANGUAGE))\\n dataloader=DataLoader(data_iter, batch_size=BATCH_SIZE, collate_fn=collator)\\n\\n losses=0\\n for 
source_batch, target_batch in dataloader:\\n source_batch=source_batch.to(DEVICE)\\n target_batch=target_batch.to(DEVICE)\\n\\n target_input=target_batch[:-1, :]\\n target_output=target_batch[1:, :]\\n\\n src_mask, tgt_mask, src_padding_mask, tgt_padding_mask=create_mask(\\n source_batch, target_input\\n )\\n\\n logits=model(\\n src=source_batch,\\n trg=target_input,\\n src_mask=src_mask,\\n tgt_mask=tgt_mask,\\n src_padding_mask=src_padding_mask,\\n tgt_padding_mask=tgt_padding_mask,\\n memory_key_padding_mask=src_padding_mask,\\n )\\n\\n optimizer.zero_grad()\\n loss=criterion(logits.reshape(-1, logits.shape[-1]), target_output.reshape(-1))\\n if split==\"train\":\\n loss.backward()\\n optimizer.step()\\n losses += loss.item()\\n\\n return losses / len(list(dataloader))\\n\\nfor epoch in range(5):\\n train_loss=run(model, optimizer, criterion, \"train\")\\n valid_loss=run(model, optimizer, criterion, \"valid\")\\n print(f\"Epoch: {epoch+1}, Train loss: {train_loss:.3f}, Valid loss: {valid_loss:.3f}\")\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 9 + } + ] + }, + { + "cell_type": "code", + "source": [ + "#Translation results of the Transformer model (greedy decoding)\n", + "'''\n", + "def greedy_decode(model, source_tensor, source_mask, max_len, start_symbol):\n", + " source_tensor=source_tensor.to(DEVICE)\n", + " source_mask=source_mask.to(DEVICE)\n", + "\n", + " memory=model.encode(source_tensor, source_mask)\n", + " ys=torch.ones(1, 1).fill_(start_symbol).type(torch.long).to(DEVICE)\n", + " for i in range(max_len - 1):\n", + " memory=memory.to(DEVICE)\n", + " target_mask=generate_square_subsequent_mask(ys.size(0))\n", + " target_mask=target_mask.type(torch.bool).to(DEVICE)\n", + "\n", + " out=model.decode(ys, memory, target_mask)\n", + " out=out.transpose(0, 1)\n", + " prob=model.generator(out[:, -1])\n", + " _, next_word=torch.max(prob, dim=1)\n", + " next_word=next_word.item()\n", + "\n", + " ys=torch.cat(\n", + " 
[ys, torch.ones(1, 1).type_as(source_tensor.data).fill_(next_word)], dim=0\n", + " )\n", + " if next_word==EOS_IDX:\n", + " break\n", + " return ys\n", + "\n", + "def translate(model, source_sentence):\n", + " model.eval()\n", + " source_tensor=text_transform[SRC_LANGUAGE](source_sentence).view(-1, 1)\n", + " num_tokens=source_tensor.shape[0]\n", + " src_mask=(torch.zeros(num_tokens, num_tokens)).type(torch.bool)\n", + " tgt_tokens=greedy_decode(\n", + " model, source_tensor, src_mask, max_len=num_tokens + 5, start_symbol=BOS_IDX\n", + " ).flatten()\n", + " output=vocab_transform[TGT_LANGUAGE].lookup_tokens(list(tgt_tokens.cpu().numpy()))[1:-1]\n", + " return \" \".join(output)\n", + "\n", + "output_oov=translate(model, \"Eine Gruppe von Menschen steht vor einem Iglu .\")\n", + "output=translate(model, \"Eine Gruppe von Menschen steht vor einem Gebäude .\")\n", + "print(output_oov)\n", + "print(output)\n", + "'''" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 123 + }, + "id": "IBEFEM3Ka-IV", + "outputId": "ca87f515-d44d-41c5-c6d9-7715c3d86151" + }, + "execution_count": 10, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'\\ndef greedy_decode(model, source_tensor, source_mask, max_len, start_symbol):\\n source_tensor=source_tensor.to(DEVICE)\\n source_mask=source_mask.to(DEVICE)\\n\\n memory=model.encode(source_tensor, source_mask)\\n ys=torch.ones(1, 1).fill_(start_symbol).type(torch.long).to(DEVICE)\\n for i in range(max_len - 1):\\n memory=memory.to(DEVICE)\\n target_mask=generate_square_subsequent_mask(ys.size(0))\\n target_mask=target_mask.type(torch.bool).to(DEVICE)\\n \\n out=model.decode(ys, memory, target_mask)\\n out=out.transpose(0, 1)\\n prob=model.generator(out[:, -1])\\n _, next_word=torch.max(prob, dim=1)\\n next_word=next_word.item()\\n\\n ys=torch.cat(\\n [ys, torch.ones(1, 1).type_as(source_tensor.data).fill_(next_word)], dim=0\\n )\\n if next_word==EOS_IDX:\\n break\\n 
return ys\\n\\ndef translate(model, source_sentence):\\n model.eval()\\n source_tensor=text_transform[SRC_LANGUAGE](source_sentence).view(-1, 1)\\n num_tokens=source_tensor.shape[0]\\n src_mask=(torch.zeros(num_tokens, num_tokens)).type(torch.bool)\\n tgt_tokens=greedy_decode(\\n model, source_tensor, src_mask, max_len=num_tokens + 5, start_symbol=BOS_IDX\\n ).flatten()\\n output=vocab_transform[TGT_LANGUAGE].lookup_tokens(list(tgt_tokens.cpu().numpy()))[1:-1]\\n return \" \".join(output)\\n\\noutput_oov=translate(model, \"Eine Gruppe von Menschen steht vor einem Iglu .\")\\noutput=translate(model, \"Eine Gruppe von Menschen steht vor einem Gebäude .\")\\nprint(output_oov)\\nprint(output)\\n'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 10 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "**GPT**" + ], + "metadata": { + "id": "CPVbpmVZb5oG" + } + }, + { + "cell_type": "code", + "source": [ + "!pip uninstall transformers -y\n", + "!pip uninstall torchao -y\n", + "!pip uninstall accelerate -y\n", + "!pip install transformers==4.38.2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 791 + }, + "id": "ECNs9EyciqfQ", + "outputId": "20752d40-0f48-471b-b2fd-068dde63ac62" + }, + "execution_count": 2, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Found existing installation: transformers 4.52.3\n", + "Uninstalling transformers-4.52.3:\n", + " Successfully uninstalled transformers-4.52.3\n", + "\u001b[33mWARNING: Skipping torchao as it is not installed.\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Skipping accelerate as it is not installed.\u001b[0m\u001b[33m\n", + "\u001b[0mCollecting transformers==4.38.2\n", + " Downloading transformers-4.38.2-py3-none-any.whl.metadata (130 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m130.7/130.7 kB\u001b[0m 
\u001b[31m3.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from transformers==4.38.2) (3.18.0)\n", + "Requirement already satisfied: huggingface-hub<1.0,>=0.19.3 in /usr/local/lib/python3.11/dist-packages (from transformers==4.38.2) (0.31.2)\n", + "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.11/dist-packages (from transformers==4.38.2) (2.0.2)\n", + "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.11/dist-packages (from transformers==4.38.2) (24.2)\n", + "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.11/dist-packages (from transformers==4.38.2) (6.0.2)\n", + "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.11/dist-packages (from transformers==4.38.2) (2024.11.6)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from transformers==4.38.2) (2.32.3)\n", + "Collecting tokenizers<0.19,>=0.14 (from transformers==4.38.2)\n", + " Downloading tokenizers-0.15.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)\n", + "Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib/python3.11/dist-packages (from transformers==4.38.2) (0.5.3)\n", + "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.11/dist-packages (from transformers==4.38.2) (4.67.1)\n", + "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.11/dist-packages (from huggingface-hub<1.0,>=0.19.3->transformers==4.38.2) (2025.3.2)\n", + "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.11/dist-packages (from huggingface-hub<1.0,>=0.19.3->transformers==4.38.2) (4.13.2)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->transformers==4.38.2) (3.4.2)\n", + "Requirement already satisfied: idna<4,>=2.5 in 
/usr/local/lib/python3.11/dist-packages (from requests->transformers==4.38.2) (3.10)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests->transformers==4.38.2) (2.4.0)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests->transformers==4.38.2) (2025.4.26)\n", + "Downloading transformers-4.38.2-py3-none-any.whl (8.5 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m8.5/8.5 MB\u001b[0m \u001b[31m63.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading tokenizers-0.15.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.6/3.6 MB\u001b[0m \u001b[31m85.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected packages: tokenizers, transformers\n", + " Attempting uninstall: tokenizers\n", + " Found existing installation: tokenizers 0.21.1\n", + " Uninstalling tokenizers-0.21.1:\n", + " Successfully uninstalled tokenizers-0.21.1\n", + "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", + "peft 0.15.2 requires accelerate>=0.21.0, which is not installed.\n", + "sentence-transformers 4.1.0 requires transformers<5.0.0,>=4.41.0, but you have transformers 4.38.2 which is incompatible.\n", + "torchtune 0.6.1 requires torchdata==0.11.0, but you have torchdata 0.7.1 which is incompatible.\u001b[0m\u001b[31m\n", + "\u001b[0mSuccessfully installed tokenizers-0.15.2 transformers-4.38.2\n" + ] + }, + { + "output_type": "display_data", + "data": { + "application/vnd.colab-display-data+json": { + "pip_warning": { + "packages": [ + "transformers" + ] + }, + "id": "f0e8f4b8126d467a890d0f8a4c9d3f55" + } + }, + "metadata": {} + } + ] + }, + { + "cell_type": "code", + "source": [ + "#Structure of the GPT-2 model used for text generation\n", + "from transformers import GPT2LMHeadModel\n", + "\n", + "model=GPT2LMHeadModel.from_pretrained(pretrained_model_name_or_path=\"gpt2\")\n", + "\n", + "for main_name, main_module in model.named_children():\n", + " print(main_name)\n", + " for sub_name, sub_module in main_module.named_children():\n", + " print(\"L\", sub_name)\n", + " for ssub_name, ssub_module in sub_module.named_children():\n", + " print(\"| L\", ssub_name)\n", + " for sssub_name, sssub_module in ssub_module.named_children():\n", + " print(\"| | L\", sssub_name)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000, + "referenced_widgets": [ + "e737566dd38d46ab9a9ef1536f5d05aa", + "1a0e342718814cca94c1b012a913b5cb", + "92030f5fc71e411c88dbe59fde8a8943", + "27ff19d5432d406d95570656be644317", + "b5daf02a1cd94b0aa36e90619011c242", + "32d954eaa6f64d26876acff25f3cad7e", + "02da3c36eea648209ebf3a3ff3f5b663", + "37fbc1a3e5d344359599dac23c66c100", + "197bb669140543638089a414068c26f8", + "0c01df79d1fc4fdc949f895a02f51b51", + "6a8796241b4e4e95ac1cc4c3c37eef42", + "fa374ff8b9654558aaac536e6b1080b6", + "2796fa85c834469ca893bf1cc6c9267f", + "3adea502b9ce4440a4d1368c30b723a8", + 
"782dcc37e6cf4a40852955dba44cce86", + "0c1e4d3b2633458f87f6c89f8d8241c3", + "ef4503c9475047bfb41215cfbb9e58d8", + "5b63f473766a4f34ac4a9c433b89b00f", + "24f21c8c117840cf8b4771f03d4bfe4e", + "595ab61151fa4ad490d27c1cf80b76c8", + "86203ca640604c45896e54528289953f", + "6766bdc5fa6c4741a6ba4d8e958c625b", + "b59c28e374f8440cb92b6be3f700daaa", + "ce63c96a73ce4f43aa5ac20f3897ed6a", + "c4cc797911f94208b6be8caed9479755", + "4e6f54a6942946fcb2ef20a34a162d63", + "633a97c65c374a49b0aefe326a00c32f", + "991c897bdd4e47d98469121f14acce5a", + "b5238fc40bbc4b6392ce14bda84e1fa6", + "3f97c2dac5894369be88de9a7f22ddc1", + "04455aa7edb64e3093fc2c029b17af6c", + "d9678ea535104a10be6abcb5c1e17dcd", + "4be5388615a0484caa89d191779cb0d5" + ] + }, + "id": "4TXAf-XIeeUy", + "outputId": "21b17e58-b90c-4159-9e52-472f50022f61" + }, + "execution_count": 1, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "\n", + "A module that was compiled using NumPy 1.x cannot be run in\n", + "NumPy 2.0.2 as it may crash. To support both 1.x and 2.x\n", + "versions of NumPy, modules must be compiled with NumPy 2.0.\n", + "Some module may need to rebuild instead e.g. 
with 'pybind11>=2.12'.\n", + "\n", + "If you are a user of the module, the easiest solution will be to\n", + "downgrade to 'numpy<2' or try to upgrade the affected module.\n", + "We expect that some modules will need time to support NumPy 2.\n", + "\n", + "Traceback (most recent call last): File \"\", line 198, in _run_module_as_main\n", + " File \"\", line 88, in _run_code\n", + " File \"/usr/local/lib/python3.11/dist-packages/colab_kernel_launcher.py\", line 37, in \n", + " ColabKernelApp.launch_instance()\n", + " File \"/usr/local/lib/python3.11/dist-packages/traitlets/config/application.py\", line 992, in launch_instance\n", + " app.start()\n", + " File \"/usr/local/lib/python3.11/dist-packages/ipykernel/kernelapp.py\", line 712, in start\n", + " self.io_loop.start()\n", + " File \"/usr/local/lib/python3.11/dist-packages/tornado/platform/asyncio.py\", line 205, in start\n", + " self.asyncio_loop.run_forever()\n", + " File \"/usr/lib/python3.11/asyncio/base_events.py\", line 608, in run_forever\n", + " self._run_once()\n", + " File \"/usr/lib/python3.11/asyncio/base_events.py\", line 1936, in _run_once\n", + " handle._run()\n", + " File \"/usr/lib/python3.11/asyncio/events.py\", line 84, in _run\n", + " self._context.run(self._callback, *self._args)\n", + " File \"/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py\", line 510, in dispatch_queue\n", + " await self.process_one()\n", + " File \"/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py\", line 499, in process_one\n", + " await dispatch(*args)\n", + " File \"/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py\", line 406, in dispatch_shell\n", + " await result\n", + " File \"/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py\", line 730, in execute_request\n", + " reply_content = await reply_content\n", + " File \"/usr/local/lib/python3.11/dist-packages/ipykernel/ipkernel.py\", line 383, in do_execute\n", + " res = shell.run_cell(\n", + " File 
\"/usr/local/lib/python3.11/dist-packages/ipykernel/zmqshell.py\", line 528, in run_cell\n", + " return super().run_cell(*args, **kwargs)\n", + " File \"/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py\", line 2975, in run_cell\n", + " result = self._run_cell(\n", + " File \"/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py\", line 3030, in _run_cell\n", + " return runner(coro)\n", + " File \"/usr/local/lib/python3.11/dist-packages/IPython/core/async_helpers.py\", line 78, in _pseudo_sync_runner\n", + " coro.send(None)\n", + " File \"/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py\", line 3257, in run_cell_async\n", + " has_raised = await self.run_ast_nodes(code_ast.body, cell_name,\n", + " File \"/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py\", line 3473, in run_ast_nodes\n", + " if (await self.run_code(code, result, async_=asy)):\n", + " File \"/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py\", line 3553, in run_code\n", + " exec(code_obj, self.user_global_ns, self.user_ns)\n", + " File \"\", line 2, in \n", + " from transformers import GPT2LMHeadModel\n", + " File \"/usr/local/lib/python3.11/dist-packages/transformers/__init__.py\", line 26, in \n", + " from . 
import dependency_versions_check\n", + " File \"/usr/local/lib/python3.11/dist-packages/transformers/dependency_versions_check.py\", line 16, in \n", + " from .utils.versions import require_version, require_version_core\n", + " File \"/usr/local/lib/python3.11/dist-packages/transformers/utils/__init__.py\", line 33, in \n", + " from .generic import (\n", + " File \"/usr/local/lib/python3.11/dist-packages/transformers/utils/generic.py\", line 442, in \n", + " import torch.utils._pytree as _torch_pytree\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/__init__.py\", line 1471, in \n", + " from .functional import * # noqa: F403\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/functional.py\", line 9, in \n", + " import torch.nn.functional as F\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/nn/__init__.py\", line 1, in \n", + " from .modules import * # noqa: F403\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/nn/modules/__init__.py\", line 35, in \n", + " from .transformer import TransformerEncoder, TransformerDecoder, \\\n", + " File \"/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py\", line 20, in \n", + " device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),\n", + "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)\n", + " device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),\n", + "/usr/local/lib/python3.11/dist-packages/huggingface_hub/file_download.py:943: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. 
If you want to force a new download, use `force_download=True`.\n", + " warnings.warn(\n", + "/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: \n", + "The secret `HF_TOKEN` does not exist in your Colab secrets.\n", + "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n", + "You will be able to reuse this secret in all of your notebooks.\n", + "Please note that authentication is recommended but still optional to access public models or datasets.\n", + " warnings.warn(\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "config.json: 0%| | 0.00/665 [00:00