attention mechanism class addressing issue #5 (#30)
base: main
Conversation
…-metrics-2 Summary metrics for AUC, FNR, TNR.
This is super cool :D Do you have any results you might add to this description on its AUC/quality compared to the attentionless models?
Yep, will push up soon.
Thanks for submitting this PR! We're excited to have you thinking about this.
I think the build_dense_attention_layer function looks good.
Now I'm trying to wrap my head around the build_model function. Because this model was built as a CNN, attention is a bit of a weird concept here. In this model, if I'm reading it correctly, the CNN component reduces the sentence to a vector (default 128), we then add a dense layer (of size max_num_tokens), and then use another dense layer (of size max_num_tokens) to compute "attention weights" and take an elementwise weighted product of these last two vectors? Is there some significance to the attention weights in this context?
My only familiarity is with attention in the RNN context, and there, I can see your build_dense_attention_layer being quite useful.
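For readers following the thread, the dense-attention pattern under discussion looks roughly like the sketch below (assuming 2017-era Keras; the input_dim parameter and layer names are assumptions for illustration, not the PR's exact code):

    from keras.layers import Dense, Multiply

    def build_dense_attention_layer(input_tensor, input_dim, name='attention'):
        # Learn a softmax distribution over the components of the input vector.
        attention_probs = Dense(input_dim, activation='softmax',
                                name=name + '_probs')(input_tensor)
        # Reweight the input elementwise by the learned probabilities.
        attention_mul = Multiply()([input_tensor, attention_probs])
        return {'attention_probs': attention_probs, 'attention_preds': attention_mul}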
src/model_tool.py (Outdated)
    x = Flatten()(x)
    x = Dropout(self.hparams['dropout_rate'], name="Dropout")(x)
    x = Dense(250, activation='relu', name="Dense_RELU")(x)
The hardcoded 250 is not quite right. I think this should be self.hparams['max_sequence_length'] for the attention layer to work for a general set of hyperparameters.
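Concretely, the suggested change to the last line of the diff above would look something like this (a sketch of the review suggestion, not the PR's final code):

    # Size the dense layer from the hyperparameters instead of hardcoding 250,
    # so the attention layer works for any max_sequence_length.
    x = Dense(self.hparams['max_sequence_length'], activation='relu', name="Dense_RELU")(x)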
Now fixed at line 444
src/model_tool.py (Outdated)
    attention_mul = Multiply()([input_tensor, attention_probs])
    return {'attention_probs': attention_probs, 'attention_preds': attention_mul}

    def build_probs(self):
I'm not sure what the purpose of the build_probs function is.
That's obsolete, you're right. This was removed in the latest push.
src/model_tool.py (Outdated)
    preds = attention_dict['attention_preds']
    preds = Dense(2, name="preds_dense", activation='softmax')(preds)
    rmsprop = RMSprop(lr=self.hparams['learning_rate'])
    self.model = Model(sequence_input, preds)
It would be nice to expose the attention weights as well as the predictions so we can visualize what the model is paying attention to.
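One way to do that (a sketch using the names from the diff above; attention_model is a hypothetical attribute, not part of the existing ToxModel API) is to build a second Model that shares the same layers but outputs the attention probabilities:

    # Predictions model, as in the diff above.
    self.model = Model(sequence_input, preds)
    # Companion model sharing the same layers; calling its predict() on a batch
    # returns the per-position attention weights for visualization.
    self.attention_model = Model(sequence_input, attention_dict['attention_probs'])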
I now save this here.
Thanks for continuing to push changes! Let us know when you're ready for us to have another look.
Thanks for checking in! Yes, I will let you know soon; I want to clean it up and catch up with your changes!
@rubinovitz: You might be interested in this: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge :)
Also, could you remove the compiled Python files from the pull request? Thanks!
Just pushed a bunch of updates for the 1DConv, but I still need to push up the LSTM, so stay tuned; I will let you know when I'm done.
Thanks for the update! We'll stay tuned.
Created a subclass of the ToxModel that includes an attention mechanism.
Analysis forthcoming.
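Pulling the diff fragments in this thread together, the overall wiring looks roughly like the self-contained sketch below. The embedding size, vocabulary size, filter settings, and hyperparameter values are illustrative assumptions, not the PR's actual defaults:

    from keras.layers import Input, Embedding, Conv1D, Flatten, Dropout, Dense, Multiply
    from keras.models import Model
    from keras.optimizers import RMSprop

    hparams = {'max_sequence_length': 250, 'dropout_rate': 0.3,
               'learning_rate': 1e-3}  # illustrative values only

    sequence_input = Input(shape=(hparams['max_sequence_length'],), dtype='int32')
    x = Embedding(10000, 128)(sequence_input)  # vocab/embedding sizes assumed
    x = Conv1D(128, 5, activation='relu')(x)
    x = Flatten()(x)
    x = Dropout(hparams['dropout_rate'], name="Dropout")(x)
    # Sized from hparams (per the review), not hardcoded to 250.
    x = Dense(hparams['max_sequence_length'], activation='relu', name="Dense_RELU")(x)

    # Dense attention over the sentence vector, as in build_dense_attention_layer.
    attention_probs = Dense(hparams['max_sequence_length'], activation='softmax',
                            name='attention_probs')(x)
    attention_preds = Multiply()([x, attention_probs])

    preds = Dense(2, name="preds_dense", activation='softmax')(attention_preds)
    model = Model(sequence_input, preds)
    model.compile(loss='categorical_crossentropy',
                  optimizer=RMSprop(lr=hparams['learning_rate']),
                  metrics=['acc'])

This mirrors the reviewer's reading: attention here is applied over the components of a dense sentence vector, rather than over token positions as in the more familiar RNN setting.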