
Feature/encoder #7

Merged: 22 commits merged into master from feature/encoder on Oct 17, 2020
Conversation

inmoonlight (Member) commented Sep 18, 2020

Pull Request

Thank you for contributing to this repository.

Before submitting this PR, please confirm the following:

  • Does the code build without any errors or warnings?
  • Have you run sufficient tests?
    • NOTE: Test code for the model will be uploaded once the full model and the training code are done.

1. What does this PR contain?

Transformer encoder

2. Is there an issue related to this PR?

#1
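
As a rough orientation for reviewers, here is a minimal sketch of the kind of encoder stack this PR introduces; the module and parameter names below are illustrative, not the PR's actual API:

import torch
import torch.nn as nn

class SketchTransformerEncoder(nn.Module):
    """Illustrative only: token embedding followed by a stack of encoder layers."""

    def __init__(self, vocab_size: int, embedding_dim: int, num_heads: int, num_layers: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        encoder_layer = nn.TransformerEncoderLayer(d_model=embedding_dim, nhead=num_heads)
        self.layers = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, source_tokens: torch.Tensor) -> torch.Tensor:
        # source_tokens: (seq_len, batch) tensor of token ids
        return self.layers(self.embedding(source_tokens))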

codecov bot commented Sep 18, 2020

Codecov Report

Merging #7 into master will decrease coverage by 3.87%.
The diff coverage is 95.10%.


@@             Coverage Diff             @@
##            master       #7      +/-   ##
===========================================
- Coverage   100.00%   96.12%   -3.88%     
===========================================
  Files            5        7       +2     
  Lines          183      232      +49     
===========================================
+ Hits           183      223      +40     
- Misses           0        9       +9     
Flag     Coverage Δ
#tests   96.12% <95.10%> (-3.88%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files           Coverage Δ
src/dataloader.py        86.95% <86.95%> (ø)
src/data/data_utils.py   94.73% <94.73%> (ø)
src/utils.py             98.00% <97.22%> (-2.00%) ⬇️
src/data/dataset.py      97.95% <97.95%> (ø)
tests/test_dataset.py    100.00% <100.00%> (ø)
tests/test_utils.py      100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e4ed23f...0680e26.

@inmoonlight inmoonlight requested review from seopbo and ssaru September 18, 2020 08:37
@inmoonlight inmoonlight self-assigned this Sep 18, 2020
src/dataset.py: review thread resolved (outdated)
src/transformer.py: review thread resolved (outdated)
ssaru (Member) left a comment

@inmoonlight
I've finished leaving comments!

src/dataset.py: two review threads resolved (outdated)
ssaru (Member) commented Oct 2, 2020

@inmoonlight
Jihyung! If you've finished addressing the review, could you mention me? That makes it much easier for me to realize "ah, I should go look at the PR!" haha
Or feel free to nudge me over Slack DM, that works too! lol
(I'm worried I'll become the bottleneck...)

Comment on lines +27 to +28

if len(source_tokens) > max_length or len(target_tokens) > max_length:
    continue

ssaru (Member) commented Oct 2, 2020

So Codecov demands coverage even for a spot like this! Impressive.

src/data/data_utils.py#L28
Added line #L28 was not covered by tests
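
For reference, the uncovered branch could be exercised with a test along these lines; filter_pairs here is a hypothetical stand-in for the filtering loop in src/data/data_utils.py, since the actual function name is not shown in this excerpt:

def filter_pairs(pairs, max_length):
    # Hypothetical stand-in: drop any pair whose source or target exceeds max_length.
    kept = []
    for source_tokens, target_tokens in pairs:
        if len(source_tokens) > max_length or len(target_tokens) > max_length:
            continue
        kept.append((source_tokens, target_tokens))
    return kept

def test_overlong_pair_is_skipped():
    pairs = [(["a"] * 3, ["b"] * 3), (["a"] * 10, ["b"] * 2)]
    assert filter_pairs(pairs, max_length=5) == [(["a"] * 3, ["b"] * 3)]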

    self.source_lines = read_lines(root_dir / self.data_config.path.source_test)
    self.target_lines = read_lines(root_dir / self.data_config.path.target_test)
else:
    raise ValueError(

Member comment:

Codecov caught this one too!! Haha

src/data/dataset.py#L40
Added line #L40 was not covered by tests

indices_batch = torch.arange(start_idx, end_idx)
indices_batches.append(indices_batch)
start_idx = end_idx
source_sample_lens, target_sample_lens = [source_sample_lens[-1]], [

ssaru (Member) commented Oct 2, 2020

It's curious that this part wasn't covered either. I would have expected it to run...
Ah, I see: the tests must never have hit this if branch.

Added lines #L74 - L77 were not covered by tests

Comment on lines +81 to +83

elif end_idx == len(dataset):
    indices_batch = torch.arange(start_idx, end_idx)
    indices_batches.append(indices_batch)

Member comment:

It looks like the tests don't exercise this condition either! haha
src/utils.py: review thread resolved (outdated)
ssaru (Member) left a comment

@inmoonlight

I've finished leaving comments.
Wow... your test code is incredibly thorough!

I'll take this as a model to learn from.

inmoonlight (Member, Author) commented

@ssaru @aisolab I've addressed the review except for writing test code! hahaha As for the test code... its priority has slipped a bit, so as long as coverage stays above 90 I'll move on for now and bring it up to 100 at the end!

source_emb = self.embedding(source_tokens)
for i in range(self.num_layers):
    source_emb, source_mask = self.encoder_layers[i](source_emb, source_mask)
return EncoderOut(
seopbo commented Oct 11, 2020

I have a question about this part. When nn.Module's forward returns an interface data object defined as a custom class or, as you did here, a NamedTuple, is it right that we would also need to define a new collate_fn for the DataLoader?
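
For context, a minimal sketch of the pattern the question refers to: a NamedTuple used as an interface object, and the kind of custom collate_fn a DataLoader would need if its dataset yielded such objects. The field names below are assumptions, not the PR's actual EncoderOut definition.

from typing import List, NamedTuple

import torch

class EncoderOut(NamedTuple):
    # Assumed fields for illustration; the PR's EncoderOut may differ.
    encoder_out: torch.Tensor
    encoder_mask: torch.Tensor

def collate_encoder_out(batch: List[EncoderOut]) -> EncoderOut:
    # Stack each field across the batch and keep the NamedTuple type on the way out.
    return EncoderOut(
        encoder_out=torch.stack([item.encoder_out for item in batch]),
        encoder_mask=torch.stack([item.encoder_mask for item in batch]),
    )

# e.g. torch.utils.data.DataLoader(dataset, batch_size=32, collate_fn=collate_encoder_out)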

Member comment:

Interesting concept. I'm bookmarking this too! haha

inmoonlight (Member, Author) commented

@aisolab The encoder is just something to be pulled in later when the whole transformer gets assembled, so the shape of the data in/out will be decided in transformer.py. (I'm not sure I understood the question correctly.)

positional_encoding = positional_encoding.unsqueeze(0).transpose(
    0, 1
)  # (max_len, 1, embedding_dim)
self.register_buffer(

Comment:

Is this using a hook? Could you explain it? (Jihyung, you're well-versed in PyTorch!)


Comment:

@ssaru Oh, so it goes right inside the module like that. Understood!

inmoonlight (Member, Author) commented

@aisolab I think @ssaru has already explained it well. Normally, when you build a model in PyTorch, only the learnable parts are saved with the module. Positional encoding isn't learned, but it's an important part of the transformer architecture, so I registered it as a buffer.
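
A minimal sketch of the register_buffer pattern described above, assuming a standard sinusoidal positional encoding (the class and tensor names here are illustrative):

import math

import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, embedding_dim: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, embedding_dim, 2, dtype=torch.float)
            * (-math.log(10000.0) / embedding_dim)
        )
        positional_encoding = torch.zeros(max_len, embedding_dim)
        positional_encoding[:, 0::2] = torch.sin(position * div_term)
        positional_encoding[:, 1::2] = torch.cos(position * div_term)
        positional_encoding = positional_encoding.unsqueeze(0).transpose(
            0, 1
        )  # (max_len, 1, embedding_dim)
        # register_buffer: the tensor is saved in state_dict() and moved by .to(device),
        # but it is not a Parameter, so the optimizer never updates it.
        self.register_buffer("positional_encoding", positional_encoding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, batch, embedding_dim)
        return x + self.positional_encoding[: x.size(0)]

Registering the tensor as a buffer means it travels with checkpoints and device moves without ever showing up in model.parameters().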

seopbo left a comment

I went through the whole thing and left comments. LGTM!

src/utils.py: review thread resolved
ssaru (Member) left a comment

@inmoonlight

LGTM
I enjoyed reading this.
It's clear you put a lot of thought into the code.

inmoonlight (Member, Author) commented

Issue filed so we don't forget to write the test code later: #10

@inmoonlight inmoonlight merged commit dea856a into master Oct 17, 2020
@inmoonlight inmoonlight deleted the feature/encoder branch October 17, 2020 07:30