Feature/encoder #7
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master        #7      +/-  ##
===========================================
- Coverage   100.00%   96.12%    -3.88%
===========================================
  Files            5        7       +2
  Lines          183      232      +49
===========================================
+ Hits           183      223      +40
- Misses           0        9       +9
Flags with carried forward coverage won't be shown.
@inmoonlight
I've finished the comments!
@inmoonlight
if len(source_tokens) > max_length or len(target_tokens) > max_length:
    continue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov demands coverage even for parts like this! Wow
src/data/data_utils.py#L28
Added line #L28 was not covered by tests
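For reference, a minimal sketch of a test that would cover the skipped line, assuming a hypothetical filter_pairs helper that mirrors the snippet above (the real function in src/data/data_utils.py may be shaped differently):

def filter_pairs(token_pairs, max_length):
    # Hypothetical stand-in for the filtering logic shown above:
    # drop any pair whose source or target exceeds max_length.
    for source_tokens, target_tokens in token_pairs:
        if len(source_tokens) > max_length or len(target_tokens) > max_length:
            continue
        yield source_tokens, target_tokens


def test_filter_pairs_skips_over_length_pair():
    pairs = [(["a"] * 3, ["b"] * 3), (["a"] * 10, ["b"] * 2)]
    kept = list(filter_pairs(pairs, max_length=5))
    assert kept == [(["a"] * 3, ["b"] * 3)]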
    self.source_lines = read_lines(root_dir / self.data_config.path.source_test)
    self.target_lines = read_lines(root_dir / self.data_config.path.target_test)
else:
    raise ValueError(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov flags this one too!! Haha
src/data/dataset.py#L40
Added line #L40 was not covered by tests
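Same idea here: a hedged sketch of how the else branch could be exercised with pytest.raises, using a made-up helper in place of the real dataset constructor:

import pytest


def load_lines_for_mode(mode):
    # Made-up stand-in for the dataset's mode dispatch shown above.
    if mode in ("train", "valid", "test"):
        return f"{mode} lines"
    raise ValueError(f"unsupported mode: {mode}")


def test_unsupported_mode_raises_value_error():
    with pytest.raises(ValueError):
        load_lines_for_mode("unknown")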
indices_batch = torch.arange(start_idx, end_idx)
indices_batches.append(indices_batch)
start_idx = end_idx
source_sample_lens, target_sample_lens = [source_sample_lens[-1]], [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's surprising that this part isn't covered either.
I would have expected it to run...
Ah, I guess the if was just never hit inside the tests.
Added lines #L74 - L77 were not covered by tests
elif end_idx == len(dataset):
    indices_batch = torch.arange(start_idx, end_idx)
    indices_batches.append(indices_batch)
It looks like the test code doesn't have a case for this condition either! Haha
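To make the two branches in this discussion concrete, here is a simplified, self-contained sketch of the length-based index batching (variable names follow the snippets; the real loop in the PR may differ). The end-of-dataset branch only runs when the trailing samples fit inside the token budget, which is exactly the case a test can miss:

import torch


def batch_indices_by_token_count(sample_lens, max_tokens):
    # sample_lens: list of (source_len, target_len) tuples, one per dataset sample.
    # Returns a list of 1-D index tensors, each a batch whose summed source
    # (and target) length stays within max_tokens.
    indices_batches = []
    start_idx = 0
    source_sum, target_sum = 0, 0
    for end_idx, (src_len, tgt_len) in enumerate(sample_lens):
        if source_sum + src_len > max_tokens or target_sum + tgt_len > max_tokens:
            # Budget overflow: close the batch before the current sample and
            # start a new one at it (the branch that resets start_idx and
            # carries the last sample's lengths in the PR code).
            indices_batches.append(torch.arange(start_idx, end_idx))
            start_idx = end_idx
            source_sum, target_sum = 0, 0
        source_sum += src_len
        target_sum += tgt_len
    if start_idx < len(sample_lens):
        # Remaining samples never overflowed; flush them as the final batch.
        # This corresponds to the `elif end_idx == len(dataset)` branch above.
        indices_batches.append(torch.arange(start_idx, len(sample_lens)))
    return indices_batches


# With a budget of 5 tokens, the first two samples form one batch and the
# last sample is flushed by the end-of-dataset handling:
print(batch_indices_by_token_count([(3, 3), (2, 2), (2, 2)], max_tokens=5))
# -> [tensor([0, 1]), tensor([2])]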
@ssaru @aisolab I've reflected all of the review comments except for writing the test code! Hahaha. The test code has dropped a bit in priority, so for now I'll move on as long as coverage stays above 90 and bring it up to 100 at the end!
source_emb = self.embedding(source_tokens)
for i in range(self.num_layers):
    source_emb, source_mask = self.encoder_layers[i](source_emb, source_mask)
return EncoderOut( |
I have a question about this part. When the object that an nn.Module's forward returns is a data object defined as an interface, whether a custom class or a NamedTuple like the one Jihyung used here, then we'd also need to define a new collate_fn for the DataLoader, right?
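One note for context: collate_fn applies to the samples a Dataset yields, not to what a module's forward returns. If the dataset items themselves were NamedTuples, a custom collate might look like the sketch below (SamplePair and its fields are made-up names):

from typing import List, NamedTuple

import torch
from torch.nn.utils.rnn import pad_sequence


class SamplePair(NamedTuple):
    # Made-up per-sample structure for illustration only.
    source_tokens: torch.Tensor
    target_tokens: torch.Tensor


def collate_pairs(samples: List[SamplePair]) -> SamplePair:
    # Pad variable-length token tensors and stack them into one batched SamplePair.
    sources = pad_sequence([s.source_tokens for s in samples], batch_first=True)
    targets = pad_sequence([s.target_tokens for s in samples], batch_first=True)
    return SamplePair(source_tokens=sources, target_tokens=targets)


# loader = torch.utils.data.DataLoader(dataset, batch_size=8, collate_fn=collate_pairs)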
Interesting concept. I'll keep an eye on this thread too! Haha
@aisolab the encoder is something we pull in later when assembling the whole transformer, haha. The shape of the data in/out will probably be decided in transformer.py (though I'm not sure I understood the question correctly).
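A rough sketch of the pattern the thread is describing: forward returns a NamedTuple, and transformer.py later unpacks it by field name (the actual EncoderOut fields aren't visible in this diff, so the names below are guesses):

from typing import NamedTuple

import torch


class EncoderOut(NamedTuple):
    # Guessed fields, based on what the encoder snippet has in scope.
    encoder_out: torch.Tensor   # encoded source representations
    source_mask: torch.Tensor   # padding mask carried along for the decoder


# Hypothetical use when assembling the full model in transformer.py:
# encoded = self.encoder(source_tokens, source_mask)
# decoder_out = self.decoder(target_tokens, encoded.encoder_out, encoded.source_mask)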
positional_encoding = positional_encoding.unsqueeze(0).transpose(
    0, 1
)  # (max_len, 1, embedding_dim)
self.register_buffer( |
Is this using a hook? Could you explain it? (Jihyung, the one with deep PyTorch expertise!)
I believe it's a way of storing a variable inside the module.
https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_buffer
@ssaru Oh, so you can put it directly inside the module like that. Got it.
@aisolab I think @ssaru has already explained it well. Normally when you build a model in PyTorch, only the learnable parts are saved with the module. Positional encoding isn't learned, but it plays an important role in the transformer architecture, so I put it in as a buffer.
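A self-contained sketch of the idea, following the snippet above and the standard sinusoidal formulation (the buffer name and surrounding details in the PR's encoder.py may differ):

import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    def __init__(self, embedding_dim: int, max_len: int = 5000):
        super().__init__()
        # Precompute the sinusoidal table; assumes an even embedding_dim.
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div_term = torch.exp(
            torch.arange(0, embedding_dim, 2) * (-math.log(10000.0) / embedding_dim)
        )
        positional_encoding = torch.zeros(max_len, embedding_dim)
        positional_encoding[:, 0::2] = torch.sin(position * div_term)
        positional_encoding[:, 1::2] = torch.cos(position * div_term)
        positional_encoding = positional_encoding.unsqueeze(0).transpose(
            0, 1
        )  # (max_len, 1, embedding_dim)
        # Not a learnable parameter, so the optimizer ignores it, but it still
        # moves with .to(device) and is saved in the module's state_dict.
        self.register_buffer("positional_encoding", positional_encoding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, batch, embedding_dim)
        return x + self.positional_encoding[: x.size(0)]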
I went through the whole thing and left comments. LGTM!
LGTM
I enjoyed reading it.
It looks like you put a lot of thought into the code.
Issue so we don't forget to write the test code later: #10
Pull Request
Thank you for contributing to the repository.
Before submitting this PR, please make sure the items below are complete:
1. What is this PR about?
Transformer encoder
2. Is there an issue related to this PR?
#1