
long-doc-summarization-gpt

Improving and extending existing applications in long document summarization by using GPT-3 for zero-shot summarization

Long document summarization is a critical but underdeveloped task in natural language processing. Generating accurate, concise summaries of input documents of varying length supports efficient comprehension of key information, sparing users the time and effort of reading extensive documents in full, a need driven by the growing volume and availability of long-form text.

There are two main types of summarization techniques: extractive and abstractive. Extractive summarization identifies and excerpts important sections of a document verbatim. For example, Google search results return a short snippet taken directly from the page so users get a quick rundown of its content before deciding to click through. In contrast, abstractive summarization integrates and aggregates key information into a summary composed mostly of novel phrases, emulating the summarization a human would perform.
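To make the distinction concrete, here is a minimal, illustrative sketch of extractive summarization using a simple word-frequency heuristic. This is a toy example, not a method used in this work; the function name and scoring rule are assumptions for illustration only.

```python
# Toy extractive summarizer: score sentences by average word frequency
# and return the top-scoring sentences verbatim, in document order.
from collections import Counter
import re

def extractive_summary(text: str, num_sentences: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    return " ".join(s for s in sentences if s in top)
```

An abstractive summarizer, by contrast, would generate new sentences rather than copying the selected ones verbatim.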

Pretrained Transformer models have shown remarkable success in document summarization, with recent developments for handling long text sequences. However, they are computationally expensive: training models to handle such sequences incurs high computation and memory costs, and sufficient long-sequence data is difficult to obtain and evaluate on. Existing techniques also usually require significant amounts of pre-training tailored specifically to document summarization before they become effective. For example, PEGASUS relies on gap-sentence pre-training to build its model.
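For reference, a fine-tuned PEGASUS baseline can be run in a few lines with Hugging Face transformers. This sketch assumes the publicly available google/pegasus-pubmed checkpoint and illustrative decoding parameters; it shows the kind of fine-tuned pipeline being contrasted with, not necessarily the exact configuration used here.

```python
# Summarize a document with a PEGASUS checkpoint fine-tuned on PubMed.
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-pubmed"  # assumed checkpoint; others exist
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

document = "Long input document text goes here..."
inputs = tokenizer(document, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```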

More recently, advances in large language models such as OpenAI’s GPT-3, which exhibits remarkable zero-shot and few-shot capabilities, suggest the possibility of using such models to tackle summarization. Large language models like GPT-3 have been applied to many NLP tasks without any task-specific training, a setting referred to as zero-shot inference. This allows them to extend to out-of-distribution data and remain robust to changes in the input data.
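A zero-shot summarization call to GPT-3 can look like the following sketch, written against the legacy OpenAI completions API of the GPT-3 era. The prompt wording, engine name, and decoding parameters are illustrative assumptions, not the exact setup of this project.

```python
# Zero-shot abstractive summarization with GPT-3: no training, just a prompt.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def summarize_zero_shot(document: str) -> str:
    prompt = f"Summarize the following article:\n\n{document}\n\nSummary:"
    response = openai.Completion.create(
        engine="text-davinci-003",  # assumed GPT-3 engine
        prompt=prompt,
        max_tokens=256,
        temperature=0.3,
    )
    return response["choices"][0]["text"].strip()
```

Note that documents exceeding GPT-3’s context window must be truncated or split into chunks that are summarized separately, a practical constraint in exactly the long-document setting studied here.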

In this paper, we study whether GPT-3 can effectively summarize long documents abstractively, avoiding the costly pre-training and fine-tuning that the existing state of the art requires, and we evaluate its performance and potential against traditional methods. We compare long document summarization on the PubMed and arXiv datasets. We evaluate with ROUGE, the standard metric for document summarization, while also examining its suitability for judging abstractive summaries, and compare the results against BERTScore.
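The evaluation step can be sketched with the rouge-score and bert-score Python packages; these are common implementations of the two metrics, though not necessarily the exact tooling used in this repository.

```python
# Compare a generated summary against the reference abstract with
# ROUGE (n-gram overlap) and BERTScore (contextual-embedding similarity).
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Ground-truth abstract of the paper ..."
candidate = "Model-generated summary ..."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)
print({name: round(s.fmeasure, 4) for name, s in rouge.items()})

# BERTScore rewards semantic similarity even when wording differs,
# which matters for abstractive summaries that paraphrase the source.
P, R, F1 = bert_score([candidate], [reference], lang="en")
print("BERTScore F1:", F1.mean().item())
```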
