The past decade has seen a significant increase in both the use of the internet and the amount of information available on it, creating a need for automated answering systems that can extract useful information from redundant and transitional knowledge sources. These systems rely heavily on the field of Question-Answering (QA) to provide users with relevant answers using Natural Language Understanding (NLU). QA involves several steps, including mapping user questions to pertinent queries, retrieving relevant information, and finding the most suitable answer in the retrieved data. Recent deep learning models have delivered compelling performance gains on all of these tasks. This review analyzes research directions in QA by question type, answer type, source of evidence for the answer, and modeling approach. It also highlights open challenges such as automatic question generation, similarity detection, and low resource availability for language processing. The authors survey the available datasets and evaluation measures, and compare state-of-the-art models on QA datasets using performance metrics such as the F1 score and the exact match (EM) score. They discuss approaches used by researchers, such as pre-training BERT or GPT models without feature extraction, or using Bidirectional Long Short Term Memory (BLSTM) networks with embedding layers for end-to-end training. Hybrid Question Answering is another area, in which multiple semantic clues are required to constrain the answer set for complex questions. Here researchers have proposed methods such as generating multiple query graphs for a given question, or dividing the model into two parts: question interpretation and answer inference. Based on the storage in which the answer is sought, QA can be further divided into Raw Text-Based QA and Knowledge-Based QA (KBQA).
While humans can easily detect the answer paragraph or sentence in a given passage, machines struggle with this task because of reasoning disparities between humans and machines. Overall, deep learning techniques are gaining popularity in QA research for resource-rich languages, but they remain far off for low-resource languages, where rule-based or machine learning approaches still prevail.
- The internet has led to an increase in available information, requiring automated answering systems
- Question-Answering (QA) is used to provide relevant answers using Natural Language Understanding (NLU)
- QA involves mapping user questions, retrieving relevant information, and finding the best answer
- Deep learning models have shown significant improvements in QA tasks
- Open challenges include automatic question generation, similarity detection, and low resource availability for language processing
- State-of-the-art models on QA datasets are evaluated based on performance metrics such as F1 score and EM score
- Approaches used by researchers include pre-training BERT or GPT models without feature extraction or using Bidirectional Long Short Term Memory (BLSTM) networks with embedding layers for end-to-end training
- Hybrid Question Answering requires multiple semantic clues to constrain the answer set for complex questions
- QA can be divided into Raw Text-Based QA and Knowledge-Based QA (KBQA)
- Deep learning techniques are gaining popularity for resource-rich languages in QA research but still far off for low-resource languages where rule-based or machine learning approaches prevail.
The internet has lots of information, so computers help answer questions. This is called Question-Answering (QA). QA helps find the best answer by looking at what people ask and finding relevant information. People are using deep learning to make QA even better. There are still some challenges like making questions and understanding different languages. Researchers use different methods to improve QA, like training models or using clues to help answer hard questions. There are two types of QA: Raw Text-Based QA and Knowledge-Based QA (KBQA). Some languages have more resources for deep learning than others.
Definitions
- Internet: a network that connects computers all over the world
- Automated answering systems: computer programs that can answer questions without human help
- Question-Answering (QA): a way for computers to find answers to questions people ask
- Natural Language Understanding (NLU): the ability of computers to understand human language
- Deep learning models: computer programs that learn patterns from many examples and get better as they see more of them
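- F1 score and EM score: ways to measure how well a computer program can answer questions; EM (exact match) checks whether the answer matches the expected one exactly, and F1 gives partial credit for overlapping words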
- Bidirectional Long Short Term Memory (BLSTM) networks: a type of deep learning model used in QA research
- Hybrid Question Answering: using multiple clues to help find answers to hard questions
- Resource-rich languages: languages with lots of information available for deep learning
- Low-resource languages: languages with less information available for deep learning
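The F1 and EM scores named in the list above can be made concrete with a small sketch. The review does not spell out the formulas, so this assumes the token-level definitions commonly used for extractive QA: answers are lowercased, punctuation and English articles are dropped, EM checks for identical token sequences, and F1 is the harmonic mean of token precision and recall. The function names are illustrative, not taken from the paper.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Assumed convention: two empty answers agree, otherwise score zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(f1_score("in Paris France", "Paris"))
```

EM is strict and rewards only a perfect answer, while F1 still credits a prediction that contains the reference answer along with extra words, which is why the two are usually reported together.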
Question Answering: A Comprehensive Review of Recent Research
The past decade has seen a rapid increase in the amount of information available on the internet, creating a need for automated answering systems that can extract useful information from redundant and transitional knowledge sources. These systems rely heavily on Question-Answering (QA) technology to provide users with relevant answers using Natural Language Understanding (NLU). QA involves several steps, including mapping user questions to pertinent queries, retrieving relevant information, and finding the most suitable answer in the retrieved data. In recent years, deep learning models have shown promising performance improvements in all of these tasks. This review paper analyzes research directions in QA by question type, answer type, source of evidence for the answer, and modeling approach. The authors also highlight open challenges such as automatic question generation, similarity detection, and low resource availability for language processing.
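The three steps described above (mapping a question to a query, retrieving relevant information, and selecting the most suitable answer) can be illustrated with a deliberately simple sketch. It is an assumption-laden toy, not a method from the review: word overlap stands in for the learned models the paper surveys, and the stopword list, function names, and sample passages are all made up for illustration.

```python
import re

# Hypothetical, minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"what", "who", "where", "when", "how", "is", "are", "does",
             "the", "a", "an", "of", "in", "to", "on", "it"}


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def to_query(question: str) -> set[str]:
    """Step 1: map the question to a bag of content words."""
    return tokens(question) - STOPWORDS


def retrieve(query: set[str], passages: list[str], k: int = 2) -> list[str]:
    """Step 2: keep the k passages that share the most words with the query."""
    ranked = sorted(passages, key=lambda p: len(query & tokens(p)), reverse=True)
    return ranked[:k]


def select_answer(query: set[str], candidates: list[str]) -> str:
    """Step 3: return the candidate sentence with the highest word overlap."""
    sentences = [s.strip()
                 for passage in candidates
                 for s in re.split(r"(?<=[.!?])\s+", passage)
                 if s.strip()]
    return max(sentences, key=lambda s: len(query & tokens(s)))


if __name__ == "__main__":
    passages = [
        "Paris is the capital of France. It lies on the Seine.",
        "Berlin is the capital of Germany.",
        "The Seine is a river in northern France.",
    ]
    query = to_query("What is the capital of France?")
    print(select_answer(query, retrieve(query, passages)))
```

Each function corresponds to one pipeline step, so any of them could be swapped for a learned component without changing the others.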
Question Types
Questions can be divided into two main categories: factoid questions, which require a single word or phrase as an answer, and non-factoid questions, which require more detailed answers such as explanations or descriptions. Factoid questions are further divided into closed-ended questions, which have only one correct answer, and open-ended questions, which may have multiple possible answers depending on context or interpretation. Non-factoid questions can be further classified as descriptive (e.g., “What is a black hole?”), comparative (e.g., “How does a black hole differ from a neutron star?”), analytical (e.g., “What factors contribute to gravitational lensing around black holes?”), or subjective (e.g., “Do you think black holes are fascinating?”).
Answer Types
Answers can be divided into three main categories: factual answers, which contain facts about entities; procedural answers, which explain how something works; and opinionated answers, which express personal opinions or beliefs about the query topic. Factual answers are further subdivided into entity-specific facts (ESF) such as names or dates, attribute-specific facts (ASF) such as colors or sizes, relation-specific facts (RSF) such as relationships between entities, definition-specific facts (DSF), comparison-specific facts (CSF), and numerical values. Procedural answers include instructions for completing tasks, such as cooking recipes. Opinionated answers include sentiment analysis results on the topics under discussion.
Source of Evidence - Answer
The source of evidence a QA system uses depends on both the question type and the answer type the user is seeking. For example, users looking for factual responses would typically turn to structured databases such as Freebase, DBpedia, or Wikidata. Users seeking procedural responses might turn to unstructured text sources such as Wikipedia articles or blogs. Similarly, for opinionated responses, social media platforms could serve as potential sources of evidence.
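The pairing of answer types with evidence sources described above can be written down as a small lookup table. This is only a sketch of the mapping as stated in the text: the enum and function names are hypothetical, and the source lists repeat the examples given in the paragraph rather than an exhaustive catalogue.

```python
from enum import Enum


class AnswerType(Enum):
    FACTUAL = "factual"
    PROCEDURAL = "procedural"
    OPINIONATED = "opinionated"


# Example sources copied from the paragraph above; not an exhaustive list.
EVIDENCE_SOURCES = {
    AnswerType.FACTUAL: ["Freebase", "DBpedia", "Wikidata"],
    AnswerType.PROCEDURAL: ["Wikipedia articles", "blogs"],
    AnswerType.OPINIONATED: ["social media platforms"],
}


def candidate_sources(answer_type: AnswerType) -> list[str]:
    """Return the example evidence sources for the requested answer type."""
    # Copy the list so callers cannot mutate the shared table.
    return list(EVIDENCE_SOURCES[answer_type])


if __name__ == "__main__":
    for answer_type in AnswerType:
        print(answer_type.value, "->", candidate_sources(answer_type))
```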
Modeling Approach
Recent advances in deep learning have enabled researchers to develop end-to-end models that combine embedding layers with Bidirectional Long Short Term Memory (BLSTM) networks and require no feature extraction beforehand. Some researchers have also explored pre-training BERT/GPT models because of their ability to capture contextual features in text. Hybrid Question Answering is another area, in which multiple semantic clues must be taken into account to constrain the answer set for complex queries; proposed methods include generating multiple query graphs for a given question and dividing the model into two parts: question interpretation and answer inference.
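The idea that several semantic clues jointly constrain the answer set can be shown with a toy knowledge base. This is not the query-graph method of the surveyed papers; it is a simplified stand-in that assumes the question has already been interpreted into (relation, object) clues and that answer inference reduces to intersecting the matching subjects. The triples, relation names, and function names are illustrative.

```python
# A tiny illustrative knowledge base of (subject, relation, object) triples.
KB = [
    ("Marie Curie", "field", "physics"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("Albert Einstein", "field", "physics"),
    ("Albert Einstein", "born_in", "Ulm"),
    ("Frederic Chopin", "field", "music"),
]


def subjects_matching(kb, relation: str, obj: str) -> set[str]:
    """All subjects that have the given relation to the given object."""
    return {s for s, r, o in kb if r == relation and o == obj}


def answer_with_clues(kb, clues: list[tuple[str, str]]) -> set[str]:
    """Intersect the candidate sets produced by each clue.

    Every additional clue can only shrink the answer set, which is how
    multiple clues constrain the answers to a complex question.
    """
    candidates = None
    for relation, obj in clues:
        matches = subjects_matching(kb, relation, obj)
        candidates = matches if candidates is None else candidates & matches
    return candidates if candidates is not None else set()


if __name__ == "__main__":
    # "Which physicist was born in Warsaw?" interpreted as two clues.
    print(answer_with_clues(KB, [("field", "physics")]))
    print(answer_with_clues(KB, [("field", "physics"), ("born_in", "Warsaw")]))
```

With one clue the sketch returns both physicists; adding the second clue narrows the set to a single entity, mirroring the split between question interpretation (producing the clues) and answer inference (applying them).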
Raw Text Based vs Knowledge Based QA
Depending on the storage in which we look for an answer, QA can be further divided into Raw Text-Based QA and Knowledge-Based QA (KBQA). While humans find it easy to detect the paragraph or sentence in a given passage that contains the desired response, machines struggle with this task because of reasoning disparities between humans and machines. Among current state-of-the-art methods, deep learning techniques remain the popular choice for researchers working on resource-rich languages, but rule-based and machine learning approaches still prevail for low-resource languages because sufficient datasets are not yet available.
Conclusion
In conclusion, this review paper provides a comprehensive overview of developments in Question Answering over the past few years, highlighting the main research directions along with the datasets and evaluation measures researchers use to compare their proposed solutions against those already in the literature. Deep learning techniques continue to gain popularity among researchers working on resource-rich languages, whereas rule-based and machine learning approaches still dominate for low-resource languages, largely because the datasets currently available are insufficient to develop accurate solutions with deep learning methods alone.