
Updated: June 20, 2025
Generative Retrieval For Ranking Answers
The Microsoft researchers later conclude:
The research paper (Generative Retrieval for Conversational Question Answering) was published on GitHub by one of the research scientists.
What is proposed is a new way to rank passages from content using a method called Generative Retrieval for Conversational Question Answering, abbreviated GCoQA.
“Benefiting from fine-grained cross-interactions in the decoder module, GCoQA could attend to the conversation context more effectively.
Increasing the beam size didn’t help matters either, as it slowed the model down.
GCoQA heavily relies on the semantic relationship between the question and the passage identifiers for retrieving relevant passages.
An autoregressive language model predicts the next word or phrase based on the words that came before it.
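To make that idea concrete, here is a toy illustration of autoregressive prediction. A simple bigram counter (far simpler than the neural models discussed in the paper, and purely for illustration) predicts each next word from the one before it, feeding each prediction back in:

```python
from collections import Counter, defaultdict

# Toy corpus; a real autoregressive LM is a neural network trained on vastly more text.
corpus = "the cat sat on the mat the cat ate the food".split()

# Count which word follows each word (a bigram table).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent next word given the previous one."""
    return following[token].most_common(1)[0][0]

def generate(start, length):
    """Generate text one word at a time, autoregressively."""
    out = [start]
    for _ in range(length):
        out.append(predict_next(out[-1]))
    return " ".join(out)
```

Calling `generate("the", 2)` extends the prompt word by word, each step conditioned on the output so far, which is the core autoregressive loop.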
The research paper then goes on to say that the process could be seen as a “hierarchical search.”
As sometimes happens, research papers have a way of disappearing behind a paywall, so there’s no guarantee that it will still be available in the future.
The researchers write that the next direction to take is exploring how to use it for general web search.
Generative Retrieval For Conversational Question Answering
Once those passages are retrieved, another autoregressive model generates the answer based on the retrieved passages.
Comparison With Other Methods
Visit that GitHub page to find the link to the PDF.
For example, it uses a tenth of the memory of current models, which is a huge leap in efficiency, and it's faster as well.
They are used to identify the topic of a document and the topic of the passages contained in a section of the document.
The experiment was carried out on Wikipedia data, where the page titles and section titles can be relied upon to be descriptive.
Microsoft announced a new conversational question answering model that outperforms other methods, answering questions faster and more accurately while using significantly fewer resources.
While GCoQA has been evaluated using three academic datasets, its effectiveness in real-world scenarios, where questions are often ambiguous and challenging to match with the identifiers, remains uncertain and requires further investigation.”
GCoQA Is A Promising New Technology
However, there are several limitations that need solving before this model can be applied.
Another limitation is that while Wikipedia is reliable about using headings in a meaningful way, much of the rest of the web is not.
The “identifiers” are a way to encode all of that knowledge as a representation, which is mapped to the passages on the webpage and the titles.
The research paper concludes that there are two promising areas to continue studying:
Each generated identifier is assigned a language model score, enabling us to obtain a ranking list of generated identifiers based on these scores.
Generative Retrieval for Conversational Question Answering
Hierarchical, in this scenario, means ordering the results first by page topic and then by the passages within the page (using the section headings).
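A simplified sketch of that hierarchical ordering is below. The two-page corpus and the crude word-overlap score are invented for illustration and stand in for the model's learned scoring; the point is the two-stage narrowing from page to section:

```python
# Hypothetical two-page corpus: page title -> {section heading: passage}.
corpus = {
    "Python programming language": {
        "History": "Python was created by Guido van Rossum...",
        "Syntax": "Python uses indentation to delimit blocks...",
    },
    "Python snake": {
        "Habitat": "Pythons live in Africa, Asia, and Australia...",
    },
}

def overlap(a, b):
    """Crude relevance score: number of lowercase words shared by two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def hierarchical_search(query):
    """Stage 1: pick the page whose title best matches the query.
    Stage 2: pick the best-matching section within that page."""
    best_page = max(corpus, key=lambda title: overlap(query, title))
    sections = corpus[best_page]
    best_section = max(sections, key=lambda h: overlap(query, h + " " + sections[h]))
    return best_page, best_section
```

A query like "python programming syntax" first selects the programming page by its title, then drills down to the most relevant section within it.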
Ultimately, the researchers stated that the performance gains are a strong win, and that the limitations are something that needs to be worked through.
In this implementation, they use the page title (to identify what the page is about) and section titles (to identify what a passage of the text is about).
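As a rough sketch of that idea, identifiers can be built by joining each page title with a section heading and mapping them back to passages. The page structure and the "title > heading" separator below are hypothetical, not taken from the paper:

```python
# Hypothetical page structure; identifiers are derived from titles like these.
page = {
    "title": "Climate change",
    "sections": [
        {"heading": "Causes", "passage": "Greenhouse gases trap heat..."},
        {"heading": "Effects", "passage": "Sea levels rise..."},
    ],
}

def make_identifiers(page):
    """Combine the page title and each section heading into one identifier
    string per passage, and map each identifier back to its passage text."""
    return {
        f"{page['title']} > {sec['heading']}": sec["passage"]
        for sec in page["sections"]
    }
```

Each identifier then acts as a compact, human-readable handle for a passage, which is what a generative model can learn to produce directly.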
Featured image by Shutterstock/Sundry Photography
In many ways, this new model promises to bring a profound change to conversational question answering.
This model uses autoregressive models that rely on “identifier strings,” which, in plain English, are representations of passages in a document.
The researchers found that GCoQA outperformed many other commonly used methods that they compared it against.
“The generalizability of GCoQA is a legitimate concern.
But using it on webpages outside of Wikipedia could cause the model to run into a stumbling block.
“…we utilize beam search… a commonly-used technique, to generate multiple identifiers instead of just one.
It was useful for overcoming limitations (bottlenecks) in other methods.
The researchers write:
The passages that are retrieved are later put into another autoregressive model in order to generate the answers to questions.
Generative Retrieval
“(1) investigating the use of generative retrieval in more general Web search scenarios where identifiers are not directly available from titles; and (2) examining the integration of passage retrieval and answer prediction within a single, generative model in order to better understand their internal relationships.”
Value Of GCoQA
They found that GCoQA had limitations due to the use of the “beam search” technique, which limited the ability of GCoQA to recall “large-scale passages.”
The ranking identifiers could naturally correspond to a ranking list of passages.”
The researchers write:
“…it becomes more convenient and efficient to apply our method in practice.”
The value of GCoQA is that it shows how researchers are working to discover ways to use generative models to transform web search as we know it today.
For the retrieval part, the research paper says the model uses a technique called “beam search” to generate identifiers (representations of passages from the webpage) that are then ranked in order of the likelihood of being the answer.
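To illustrate how beam search produces a ranked list of identifiers, here is a minimal sketch. A hand-written table of log-probabilities stands in for the trained decoder, and the identifiers and numbers are invented for illustration only:

```python
import math

# Toy next-token log-probabilities keyed by the tokens generated so far.
# In the actual system, a trained autoregressive decoder supplies these.
LOGPROBS = {
    (): {"Python": math.log(0.6), "Java": math.log(0.4)},
    ("Python",): {"Syntax": math.log(0.7), "History": math.log(0.3)},
    ("Java",): {"Syntax": math.log(0.5), "History": math.log(0.5)},
}

def beam_search(beam_size, length):
    """Keep the `beam_size` highest-scoring partial identifiers at each step,
    then return the finished identifiers ranked by total log-probability."""
    beams = [((), 0.0)]
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for token, lp in LOGPROBS[seq].items():
                candidates.append((seq + (token,), score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return [(" > ".join(seq), score) for seq, score in beams]
```

Because each finished identifier carries a cumulative score, the output is already a ranked list, which is how the generated identifiers can correspond to a ranking of passages.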
GCoQA may not be coming soon to a search engine.
Additionally, GCoQA has lower memory consumption and higher inference efficiency in practice.”
Limitations Of GCoQA
This could be a preview of what the search engines of the relatively near future may look like.
Read the announcement and research paper abstract:
In practice, this would be like using the title element to learn what a webpage is about and the headings to understand what each section of the page is about.
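That reading of the title element and headings can be sketched with Python's standard-library HTML parser. The class name and sample markup below are illustrative, not from the paper:

```python
from html.parser import HTMLParser

class TitleHeadingExtractor(HTMLParser):
    """Collect the <title> text and all heading (<h1>-<h6>) text from a page,
    mirroring how a title and headings could serve as page/passage identifiers."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self._capture = None  # "title", "heading", or None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._capture = "title"
        elif tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self._capture = "heading"
            self.headings.append("")

    def handle_endtag(self, tag):
        self._capture = None

    def handle_data(self, data):
        if self._capture == "title":
            self.title += data
        elif self._capture == "heading":
            self.headings[-1] += data

sample = ("<html><head><title>Solar Energy</title></head>"
          "<body><h2>How Panels Work</h2><p>...</p></body></html>")
parser = TitleHeadingExtractor()
parser.feed(sample)
```

On well-structured pages this recovers meaningful labels; on pages with sloppy headings, the extracted labels would be poor identifiers, which is exactly the limitation the researchers flag.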
Many webpages on the Internet do a poor job of using their section headings to accurately denote what a passage is about (which is what SEOs and publishers are supposed to be doing).
The research paper observes: