
attention mechanism class addressing issue #5 (#30)

Open

rubinovitz wants to merge 21 commits into main

Conversation

rubinovitz (Contributor)

Created a subclass of the ToxModel that includes an attention mechanism.
Analysis forthcoming.

@nthain nthain self-requested a review November 6, 2017 17:15
@iislucas (Contributor) commented Nov 7, 2017

This is super cool :D Do you have any results on its AUC/quality compared to the attention-less models that you could add to this description?

@rubinovitz (Contributor, Author) commented Nov 7, 2017 via email

@nthain (Contributor) left a comment


Thanks for submitting this PR! We're excited to have you thinking about this.

I think the build_dense_attention_layer function looks good.

Now I'm trying to wrap my head around the build_model function. Because this model was built as a CNN, attention is a bit of a weird concept. In this model, if I'm reading this correctly, the CNN component reduces the sentence to a vector (default 128), we then add a dense layer (of size max_num_tokens), use another dense layer (of size max_num_tokens) to compute "attention weights", and take a weighted sum of these last two vectors? Is there some significance to the attention weights in this context?

My only familiarity is with attention in the RNN context, and there, I can see your build_dense_attention_layer being quite useful.
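
For concreteness, here is a minimal sketch of the dense attention layer being discussed, assuming the Keras functional API; the function and key names (build_dense_attention_layer, attention_probs, attention_preds) mirror the snippets quoted below, and the 128-dim input is just a hypothetical stand-in for the CNN output.

from keras.layers import Input, Dense, Multiply

def build_dense_attention_layer(input_tensor, dim):
    # Learn a softmax distribution over the dim positions of the input vector.
    attention_probs = Dense(dim, activation='softmax', name='attention_probs')(input_tensor)
    # Re-weight the input element-wise by that distribution.
    attention_mul = Multiply()([input_tensor, attention_probs])
    return {'attention_probs': attention_probs, 'attention_preds': attention_mul}

# Hypothetical usage on a 128-dim CNN output:
x = Input(shape=(128,))
attention_dict = build_dense_attention_layer(x, 128)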


x = Flatten()(x)
x = Dropout(self.hparams['dropout_rate'], name="Dropout")(x)
x = Dense(250, activation='relu', name="Dense_RELU")(x)
Contributor

The hardcoded 250 is not quite right. I think this should be self.hparams['max_sequence_length'] for the attention layer to work for a general set of hyperparameters.
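
A small sketch of what that suggested change might look like in isolation; the hparams value and the 128-dim input here are hypothetical, just to make the snippet self-contained.

from keras.layers import Input, Dense

hparams = {'max_sequence_length': 250}  # hypothetical example value
x = Input(shape=(128,))                 # stand-in for the flattened CNN output
# Use the hyperparameter instead of the hardcoded 250:
x = Dense(hparams['max_sequence_length'], activation='relu', name='Dense_RELU')(x)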

Contributor Author

Now fixed at line 444

attention_mul = Multiply()([input_tensor, attention_probs])
return {'attention_probs': attention_probs, 'attention_preds': attention_mul}

def build_probs(self):
Contributor

I'm not sure what the purpose of the build_probs function is.

Contributor Author

That's obsolete, you're right. This was removed in the latest push.

preds = attention_dict['attention_preds']
preds = Dense(2, name="preds_dense", activation='softmax')(preds)
rmsprop = RMSprop(lr=self.hparams['learning_rate'])
self.model = Model(sequence_input, preds)
Contributor

It would be nice to expose the attention weights as well as the predictions so we can visualize what the model is paying attention to.
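
One way that could look, as a hedged sketch in the Keras functional API; the layer names mirror the snippets above, and the 250-dim shapes are hypothetical.

from keras.layers import Input, Dense, Multiply
from keras.models import Model

sequence_input = Input(shape=(250,), name='sequence_input')
attention_probs = Dense(250, activation='softmax', name='attention_probs')(sequence_input)
attention_preds = Multiply()([sequence_input, attention_probs])
preds = Dense(2, activation='softmax', name='preds_dense')(attention_preds)

# Expose both the predictions and the attention weights, so the weights can
# be fetched at inference time for visualization.
model = Model(inputs=sequence_input, outputs=[preds, attention_probs])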

Contributor Author

I now save this here.

@nthain (Contributor) commented Dec 4, 2017

Thanks for continuing to push changes! Let us know when you're ready for us to have another look.

@rubinovitz (Contributor, Author) commented Dec 4, 2017 via email

@iislucas (Contributor)

@rubinovitz : You might be interested in this: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge :)

@iislucas (Contributor)

Also, could you remove the compiled Python files from the pull request? Thanks!

@rubinovitz (Contributor, Author)

Just pushed a bunch of updates for the 1DConv, but I still need to push up the LSTM, so stay tuned; I'll let you know when I'm done.

@nthain (Contributor) commented Mar 7, 2018

Thanks for the update! We'll stay tuned.

Base automatically changed from master to main March 25, 2021 19:54