
Feature/encoder #7

Merged: 22 commits merged into master from feature/encoder on Oct 17, 2020
Conversation

inmoonlight (Member) commented Sep 18, 2020

Pull Request

Thank you for contributing to this repository.

Before submitting this PR, please confirm the following:

  • Does the code build without any errors or warnings?
  • Have you run sufficient tests?
    • NOTE: Test code for the model will be uploaded once the full model and the training code are done.

1. What does this PR contain?

Transformer encoder

2. Is there an issue related to this PR?

#1
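
As a rough orientation for reviewers, here is a minimal sketch of the kind of encoder stack this PR introduces; the module and parameter names below are illustrative, not the PR's actual API:

import torch
import torch.nn as nn

class SketchTransformerEncoder(nn.Module):
    """Illustrative only: token embedding followed by a stack of encoder layers."""

    def __init__(self, vocab_size: int, embedding_dim: int, num_heads: int, num_layers: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        encoder_layer = nn.TransformerEncoderLayer(d_model=embedding_dim, nhead=num_heads)
        self.layers = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, source_tokens: torch.Tensor) -> torch.Tensor:
        # source_tokens: (seq_len, batch) tensor of token ids
        return self.layers(self.embedding(source_tokens))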

codecov bot commented Sep 18, 2020

Codecov Report

Merging #7 into master will decrease coverage by 3.87%.
The diff coverage is 95.10%.


@@             Coverage Diff             @@
##            master       #7      +/-   ##
===========================================
- Coverage   100.00%   96.12%   -3.88%     
===========================================
  Files            5        7       +2     
  Lines          183      232      +49     
===========================================
+ Hits           183      223      +40     
- Misses           0        9       +9     
Flag     Coverage Δ
#tests   96.12% <95.10%> (-3.88%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files           Coverage Δ
src/dataloader.py        86.95% <86.95%> (ø)
src/data/data_utils.py   94.73% <94.73%> (ø)
src/utils.py             98.00% <97.22%> (-2.00%) ⬇️
src/data/dataset.py      97.95% <97.95%> (ø)
tests/test_dataset.py    100.00% <100.00%> (ø)
tests/test_utils.py      100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e4ed23f...0680e26.

@inmoonlight inmoonlight requested review from seopbo and ssaru September 18, 2020 08:37
@inmoonlight inmoonlight self-assigned this Sep 18, 2020
src/dataset.py: review thread resolved (outdated)
src/transformer.py: review thread resolved (outdated)
ssaru (Member) left a comment

@inmoonlight
I've finished leaving comments!

src/dataset.py: two review threads resolved (outdated)
ssaru (Member) commented Oct 2, 2020

@inmoonlight
Jihyung! If you've finished addressing the review, could you mention me? That makes it much easier for me to realize "ah, I should go look at the PR!" haha
Or feel free to nudge me over Slack DM, that works too! lol
(I'm worried I'll become the bottleneck...)

Comment on lines +27 to +28

if len(source_tokens) > max_length or len(target_tokens) > max_length:
    continue

ssaru (Member) commented Oct 2, 2020

So Codecov demands coverage even for a spot like this! Impressive.

src/data/data_utils.py#L28
Added line #L28 was not covered by tests
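
For reference, the uncovered branch could be exercised with a test along these lines; filter_pairs here is a hypothetical stand-in for the filtering loop in src/data/data_utils.py, since the actual function name is not shown in this excerpt:

def filter_pairs(pairs, max_length):
    # Hypothetical stand-in: drop any pair whose source or target exceeds max_length.
    kept = []
    for source_tokens, target_tokens in pairs:
        if len(source_tokens) > max_length or len(target_tokens) > max_length:
            continue
        kept.append((source_tokens, target_tokens))
    return kept

def test_overlong_pair_is_skipped():
    pairs = [(["a"] * 3, ["b"] * 3), (["a"] * 10, ["b"] * 2)]
    assert filter_pairs(pairs, max_length=5) == [(["a"] * 3, ["b"] * 3)]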

    self.source_lines = read_lines(root_dir / self.data_config.path.source_test)
    self.target_lines = read_lines(root_dir / self.data_config.path.target_test)
else:
    raise ValueError(

Member comment:

Codecov caught this one too!! Haha

src/data/dataset.py#L40
Added line #L40 was not covered by tests

indices_batch = torch.arange(start_idx, end_idx)
indices_batches.append(indices_batch)
start_idx = end_idx
source_sample_lens, target_sample_lens = [source_sample_lens[-1]], [

ssaru (Member) commented Oct 2, 2020

It's curious that this part wasn't covered either. I would have expected it to run...
Ah, I see: the tests must never have hit this if branch.

Added lines #L74 - L77 were not covered by tests

Comment on lines +81 to +83

elif end_idx == len(dataset):
    indices_batch = torch.arange(start_idx, end_idx)
    indices_batches.append(indices_batch)

Member comment:

It looks like the tests don't exercise this condition either! haha
src/utils.py: review thread resolved (outdated)
ssaru (Member) left a comment

@inmoonlight

I've finished leaving comments.
Wow... your test code is incredibly thorough!

I'll take this as a model to learn from.

inmoonlight (Member, Author) commented

@ssaru @aisolab I've addressed the review except for writing test code! hahaha As for the test code... its priority has slipped a bit, so as long as coverage stays above 90 I'll move on for now and bring it up to 100 at the end!

source_emb = self.embedding(source_tokens)
for i in range(self.num_layers):
    source_emb, source_mask = self.encoder_layers[i](source_emb, source_mask)
return EncoderOut(
seopbo commented Oct 11, 2020

I have a question about this part. When nn.Module's forward returns an interface data object defined as a custom class or, as you did here, a NamedTuple, is it right that we would also need to define a new collate_fn for the DataLoader?
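
For context, a minimal sketch of the pattern the question refers to: a NamedTuple used as an interface object, and the kind of custom collate_fn a DataLoader would need if its dataset yielded such objects. The field names below are assumptions, not the PR's actual EncoderOut definition.

from typing import List, NamedTuple

import torch

class EncoderOut(NamedTuple):
    # Assumed fields for illustration; the PR's EncoderOut may differ.
    encoder_out: torch.Tensor
    encoder_mask: torch.Tensor

def collate_encoder_out(batch: List[EncoderOut]) -> EncoderOut:
    # Stack each field across the batch and keep the NamedTuple type on the way out.
    return EncoderOut(
        encoder_out=torch.stack([item.encoder_out for item in batch]),
        encoder_mask=torch.stack([item.encoder_mask for item in batch]),
    )

# e.g. torch.utils.data.DataLoader(dataset, batch_size=32, collate_fn=collate_encoder_out)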

Member comment:

Interesting concept. I'm bookmarking this too! haha

inmoonlight (Member, Author) commented

@aisolab The encoder is just something to be pulled in later when the whole transformer gets assembled, so the shape of the data in/out will be decided in transformer.py. (I'm not sure I understood the question correctly.)

positional_encoding = positional_encoding.unsqueeze(0).transpose(
    0, 1
)  # (max_len, 1, embedding_dim)
self.register_buffer(

Comment:

Is this using a hook? Could you explain it? (Jihyung, you're well-versed in PyTorch!)


Comment:

@ssaru Oh, so it goes right inside the module like that. Understood!

inmoonlight (Member, Author) commented

@aisolab I think @ssaru has already explained it well. Normally, when you build a model in PyTorch, only the learnable parts are saved with the module. Positional encoding isn't learned, but it's an important part of the transformer architecture, so I registered it as a buffer.
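
A minimal sketch of the register_buffer pattern described above, assuming a standard sinusoidal positional encoding (the class and tensor names here are illustrative):

import math

import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, embedding_dim: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, embedding_dim, 2, dtype=torch.float)
            * (-math.log(10000.0) / embedding_dim)
        )
        positional_encoding = torch.zeros(max_len, embedding_dim)
        positional_encoding[:, 0::2] = torch.sin(position * div_term)
        positional_encoding[:, 1::2] = torch.cos(position * div_term)
        positional_encoding = positional_encoding.unsqueeze(0).transpose(
            0, 1
        )  # (max_len, 1, embedding_dim)
        # register_buffer: the tensor is saved in state_dict() and moved by .to(device),
        # but it is not a Parameter, so the optimizer never updates it.
        self.register_buffer("positional_encoding", positional_encoding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, batch, embedding_dim)
        return x + self.positional_encoding[: x.size(0)]

Registering the tensor as a buffer means it travels with checkpoints and device moves without ever showing up in model.parameters().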

seopbo left a comment

I went through the whole thing and left comments. LGTM!

src/utils.py: review thread resolved
ssaru (Member) left a comment

@inmoonlight

LGTM
I enjoyed reading this.
It's clear you put a lot of thought into the code.

inmoonlight (Member, Author) commented

Issue filed so we don't forget to write the test code later: #10

@inmoonlight inmoonlight merged commit dea856a into master Oct 17, 2020
@inmoonlight inmoonlight deleted the feature/encoder branch October 17, 2020 07:30