
Commit 13d8649

Add references to paper (#15)
1 parent 1bf8aa3 commit 13d8649

File tree: 1 file changed (+18, -3 lines)

README.md: 18 additions & 3 deletions
@@ -18,6 +18,7 @@ Tensorflow Hub URLs will be enough.
 * [ConveRT finetuned on Ubuntu](#convert-finetuned-on-ubuntu)
 * [Keras layers](#keras-layers)
 * [Encoder client](#encoder-client)
+* [Citations](#citations)
 * [Development](#development)

@@ -31,7 +32,7 @@ Using these models requires [Tensorflow Hub](https://www.tensorflow.org/hub) and
 ## ConveRT

 This is the ConveRT dual-encoder model, using subword representations and lighter-weight, more efficient transformer-style
-blocks to encode text, as described in TODO.
+blocks to encode text, as described in [the ConveRT paper](https://arxiv.org/abs/1911.03688).
 It provides powerful representations for conversational data, and can also be used as a response ranker.
 The model costs under $100 to train from scratch, can be quantized to under 60MB, and is competitive with larger Transformer networks on conversational tasks.
 We share an unquantized version of the model, facilitating fine-tuning. Please [get in touch](https://www.polyai.com/contact/) if you are interested in using the quantized ConveRT model. The Tensorflow Hub url is:
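The URL itself falls outside this hunk. As a minimal sketch of loading and using the module, assuming the single-context URL follows the same `models.poly-ai.com` naming pattern as the multi-context module below, and that the module's default signature maps sentences to encodings:

```python
import tensorflow as tf
import tensorflow_hub as tfhub

# Hypothetical URL, inferred from the multi-context URL pattern below.
module = tfhub.Module("http://models.poly-ai.com/convert/v1/model.tar.gz")

sentences = tf.placeholder(dtype=tf.string, shape=[None])
encodings = module(sentences)  # assumed default signature: one vector per sentence

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())
    vectors = session.run(encodings, feed_dict={sentences: ["hello world"]})
```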
@@ -98,7 +99,7 @@ tokens = module(

 ## Multi-Context ConveRT

-This is the multi-context ConveRT model from TODO, that uses extra contexts from the conversational history to refine the context representations. This is an unquantized version of the model. The Tensorflow Hub url is:
+This is the multi-context ConveRT model from [the ConveRT paper](https://arxiv.org/abs/1911.03688), which uses extra contexts from the conversational history to refine the context representations. This is an unquantized version of the model. The Tensorflow Hub url is:

 ```python
 module = tfhub.Module("http://models.poly-ai.com/multi_context_convert/v1/model.tar.gz")
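A hedged sketch of computing context encodings with this module; the `encode_context` signature and the `context` / `extra_context` feature names are assumptions about the module interface, not confirmed by this diff:

```python
import tensorflow as tf
import tensorflow_hub as tfhub

module = tfhub.Module(
    "http://models.poly-ai.com/multi_context_convert/v1/model.tar.gz")

# The most recent message goes in `context`; earlier turns are passed as
# `extra_context` (feature names assumed).
contexts = tf.placeholder(dtype=tf.string, shape=[None])
extra_contexts = tf.placeholder(dtype=tf.string, shape=[None])
context_encodings = module(
    {"context": contexts, "extra_context": extra_contexts},
    signature="encode_context",
)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())
    vectors = session.run(
        context_encodings,
        feed_dict={
            contexts: ["how do I mount a drive?"],
            extra_contexts: ["hi, I need some help"],
        },
    )
```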
@@ -136,7 +137,7 @@ See [`encoder_client.py`](encoder_client.py) for code that computes these featur
136137

137138
This is the multi-context ConveRT model, fine-tuned to the DSTC7 Ubuntu response ranking task. It has the exact same signatures as the extra context model, and has TFHub uri `http://models.poly-ai.com/ubuntu_convert/v1/model.tar.gz`. Note that this model requires prefixing the extra context features with `"0: "`, `"1: "`, `"2: "` etc.
138139

139-
The [`dstc7/evaluate_encoder.py`](dstc7/evaluate_encoder.py) script demonstrates using this encoder to reproduce the results from TODO.
140+
The [`dstc7/evaluate_encoder.py`](dstc7/evaluate_encoder.py) script demonstrates using this encoder to reproduce the results from [the ConveRT paper](https://arxiv.org/abs/1911.03688).
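As a small sketch of the prefixing requirement mentioned above; the helper name and the newest-first numbering convention are assumptions for illustration, not taken from the repository:

```python
def prefix_extra_contexts(history):
    """Prefix previous turns with "0: ", "1: ", ... as the Ubuntu model expects.

    `history` is ordered oldest first; numbering the most recent turn "0: "
    is an assumed convention for illustration.
    """
    return [f"{i}: {turn}" for i, turn in enumerate(reversed(history))]


# ["0: it says permission denied", "1: hi, how do I mount a drive?"]
print(prefix_extra_contexts(
    ["hi, how do I mount a drive?", "it says permission denied"]
))
```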

 # Keras layers

@@ -169,6 +170,20 @@ print(f"Best response: {candidate_responses[top_idx]}, score: {scores[top_idx]:.

 Internally it implements caching, deduplication, and batching to speed up encoding. Because it batches internally, you can pass very large lists of sentences to encode without running out of memory.
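A hedged sketch of exercising that behaviour; `EncoderClient` and `encode_sentences` mirror the `encoder_client.py` module referenced earlier in this diff, but the exact constructor arguments and method names are assumptions here:

```python
import encoder_client  # the repository module referenced above

# Constructor argument assumed to be the TFHub URL of the model to load.
client = encoder_client.EncoderClient(
    "http://models.poly-ai.com/multi_context_convert/v1/model.tar.gz"
)

# Duplicate sentences are encoded only once (deduplication), and the list is
# split into batches internally, so a very large input does not exhaust memory.
sentences = ["hello", "hello", "how are you?"] * 10000
encodings = client.encode_sentences(sentences)
print(len(encodings))
```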

+# Citations
+
+* [ConveRT: Efficient and Accurate Conversational Representations from Transformers](https://arxiv.org/abs/1911.03688)
+```bibtex
+@article{Henderson2019convert,
+  title={{ConveRT}: Efficient and Accurate Conversational Representations from Transformers},
+  author={Matthew Henderson and I{\~{n}}igo Casanueva and Nikola Mrk\v{s}i\'{c} and Pei-Hao Su and Tsung-Hsien Wen and Ivan Vuli\'{c}},
+  journal={CoRR},
+  volume={abs/1911.03688},
+  year={2019},
+  url={http://arxiv.org/abs/1911.03688},
+}
+```
+
 # Development

 Setting up an environment for development:
