This paper presents our system for the LIC-2021 multi-format Information Extraction (IE) task. The task evaluates information extraction along several dimensions, including multi-slot relation extraction and event extraction at both the sentence level and the document level. We use a different method for each subtask. For the relation extraction subtask, we handle the multiple-O-values schema with a schema disintegration method, which converts the subtask into a traditional triple extraction task. We also design a voting-based method that makes the most of existing models. For the sentence-level event extraction subtask, we recast the problem as a Named Entity Recognition (NER) task and use a pointer-labeling-based approach for efficient event extraction. Because annotated trigger information can aid event extraction, we also develop an auxiliary trigger recognition model and integrate trigger features into the event extraction model through a multi-task learning mechanism. For the document-level event extraction subtask, we propose an Encoder-Decoder-based method with a Transformer-like decoder architecture. Our system ranks No. 4 on the test set leaderboard of this multi-format IE task, with F1 scores of 79.887% for relation extraction, 85.179% for sentence-level event extraction, and 70.828% for document-level event extraction.
However, there is still room for improvement in our system. Many triples are not annotated, which negatively impacts performance; processing long text remains challenging in the document-level event extraction subtask; and correctly extracting two arguments of one event when they are far apart in a sentence or a document requires further study. In conclusion, our system is effective in addressing the challenges posed by the LIC-2021 multi-format IE task and achieves competitive performance, while leaving opportunities for further exploration and improvement in future research. This work is supported by the National Key R&D Program of China (No. 2018YFC0830701), the National Natural Science Foundation of China (No. 61572120), the Fundamental Research Funds for the Central Universities (No. N181602013 and No. N171602003), the Ten Thousand Talent Program (No. ZX20200035), and Liaoning Distinguished Professor (No. XLYC1902057).
- System for the LIC-2021 multi-format Information Extraction (IE) task
- Evaluation of information extraction from multiple dimensions
- Multi-slot relation extraction
- Event extraction at the sentence level and the document level
- Methods employed to address challenges in the competition
- Schema disintegration method for the relation extraction subtask
- Voting-based method for maximizing model utilization
- Conversion of sentence-level event extraction into a Named Entity Recognition (NER) task
- Pointer-labeling-based approach for efficient event extraction
- Auxiliary trigger recognition model for aiding event extraction
- Integration of trigger features using a multi-task learning mechanism
- Encoder-Decoder-based method with a Transformer-like decoder architecture for the document-level event extraction subtask
- Results on the test set leaderboard (overall rank: No. 4):
  - Relation extraction: F1 score of 79.887%
  - Sentence-level event extraction: F1 score of 85.179%
  - Document-level event extraction: F1 score of 70.828%
- Room for improvement in the system:
  - Unannotated triples negatively impact performance in relation extraction
  - Processing long text remains challenging in the document-level event extraction subtask
  - Correctly extracting two arguments of one event when they are far apart in a sentence or document requires further study
- Funding support from various sources
The key points are about a competition called LIC-2021 where people tried to extract information from different types of text. They used different methods to solve the challenges in the competition, like breaking down relationships between things and using voting to make decisions. They also found ways to recognize important words and events in sentences and documents. The results showed how well their system worked, but there is still room for improvement, especially when dealing with long texts. The project was supported by funding from different sources.
Definitions
- System: A way of doing things or a set of rules or tools that help accomplish a task.
- Evaluation: The process of judging or assessing something.
- Extraction: Taking out or getting information from something.
- Dimensions: Different aspects or parts of something.
- Methods: Ways or techniques used to do something.
- Challenges: Difficulties or problems that need to be overcome.
- Schema: A plan or structure for organizing information.
- Subtask: A smaller part of a bigger task.
- Voting-based method: Making decisions by counting votes from different options.
- Conversion: Changing one thing into another thing.
- Named Entity Recognition (NER): Identifying and classifying specific words in text, like names of people or places.
- Pointer labeling based approach: Marking where a piece of important text starts and where it ends, instead of labeling every word.
- Auxiliary trigger recognition model: A helper model that finds trigger words, the words that signal an event in text.
- Multi-task learning mechanism: A way of training one model on several related tasks at the same time so that they can help each other.
Exploring the Challenges of Multi-Format Information Extraction with a System for LIC-2021
Information extraction (IE) is an important task in natural language processing (NLP). It involves extracting structured information from unstructured text. The 2021 Language Intelligence Challenge (LIC) introduced a multi-format IE task to evaluate information extraction along several dimensions, including multi-slot relation extraction and event extraction at both the sentence level and the document level. In this article, we discuss the system we developed to address these challenges and its results on the test set leaderboard of this multi-format IE task.
Relation Extraction Subtask
The relation extraction subtask requires extracting relations between entities in the form of triples. To handle the multiple-O-values schema, in which one relation can carry several object values, we employed a schema disintegration method that converts the subtask into a traditional triple extraction task. Additionally, we designed a voting-based method that makes the most of existing models.
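The two ideas in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the slot names `@value` and `inArea`, the `<predicate>_<slot>` naming scheme, and the `min_votes` threshold are all assumptions made for illustration. The first function splits a record with several object slots into ordinary triples, and the second keeps only the triples that enough models agree on.

```python
from collections import Counter


def disintegrate_record(record):
    """Split one multi-slot record into plain (subject, predicate, object) triples.

    Each object slot becomes its own predicate named "<predicate>_<slot>".
    This naming scheme is an assumption made for illustration.
    """
    return [
        (record["subject"], f'{record["predicate"]}_{slot}', value)
        for slot, value in record["object"].items()
    ]


def vote(model_outputs, min_votes):
    """Keep every triple that at least `min_votes` models predicted."""
    counts = Counter()
    for triples in model_outputs:
        counts.update(set(triples))  # one vote per model, even if it repeats a triple
    return {triple for triple, n in counts.items() if n >= min_votes}


# Hypothetical record with two object slots.
record = {
    "subject": "Film A",
    "predicate": "box_office",
    "object": {"@value": "1 billion", "inArea": "China"},
}
triples = disintegrate_record(record)

# Hypothetical outputs of three models for the same sentence.
outputs = [
    triples,
    triples[:1],
    triples[:1] + [("Film A", "director_@value", "Someone")],
]
kept = vote(outputs, min_votes=2)
```

In this sketch, raising `min_votes` trades recall for precision; how the actual system weighs or combines its models is not stated in the text.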
Sentence Level Event Extraction Subtask
We converted this subtask into a Named Entity Recognition (NER) task and used a pointer-labeling-based approach for efficient event extraction. Because annotated trigger information can aid event extraction, we also developed an auxiliary trigger recognition model and integrated its trigger features into the event extraction model through a multi-task learning mechanism.
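As a rough illustration of pointer labeling, the sketch below encodes each labeled span as a pair of start and end markers (one row per label, one column per token) and decodes spans by pairing every start with the nearest end at or after it. The label names, the inclusive end index, and the nearest-end pairing rule are assumptions for this example, not details taken from the system described here.

```python
def encode_spans(num_tokens, spans, labels):
    """Build start and end pointer tables: one row per label, one column per token."""
    row_of = {label: i for i, label in enumerate(labels)}
    starts = [[0] * num_tokens for _ in labels]
    ends = [[0] * num_tokens for _ in labels]
    for label, begin, end in spans:  # `end` is inclusive
        starts[row_of[label]][begin] = 1
        ends[row_of[label]][end] = 1
    return starts, ends


def decode_spans(starts, ends, labels):
    """Pair each start marker with the nearest end marker at or after it."""
    spans = []
    for row, label in enumerate(labels):
        for begin, is_start in enumerate(starts[row]):
            if not is_start:
                continue
            for end in range(begin, len(ends[row])):
                if ends[row][end]:
                    spans.append((label, begin, end))
                    break
    return spans


# Hypothetical labels of the form "<event type>:<argument role>".
labels = ["Attack:Attacker", "Attack:Target"]
gold = [("Attack:Attacker", 0, 1), ("Attack:Target", 4, 4)]

starts, ends = encode_spans(6, gold, labels)
decoded = decode_spans(starts, ends, labels)
```

Because every label has its own pair of rows, spans with different labels may overlap, which a single tag-per-token scheme cannot express directly. In a trained model the 0/1 tables would be replaced by thresholded probabilities.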
Document Level Event Extraction Subtask
To handle the document-level event extraction subtask, we proposed an Encoder-Decoder-based method with a Transformer-like decoder architecture. The aim is to recognize events whose information is spread across different sentences of a document.
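The text does not describe the decoder in detail. As general background only, the sketch below shows the scaled dot-product cross-attention step that Transformer-style decoders use to read encoder states: one decoder query is compared with every encoded position, and the resulting weights mix the encoder values into a context vector. The function name and the toy vectors are illustrative assumptions; this is not the authors' architecture.

```python
import math


def cross_attention(query, keys, values):
    """Scaled dot-product attention of one decoder query over encoder states."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights


# Two toy encoder positions, e.g. two sentences of a document.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]

# A neutral query attends to both positions equally.
context_even, weights_even = cross_attention([0.0, 0.0], keys, values)

# A query aligned with the first key focuses almost entirely on it.
context_focus, weights_focus = cross_attention([10.0, 0.0], keys, values)
```

Since every decoding step can attend to any encoded position, this mechanism lets a decoder draw on information from distant sentences, although long inputs remain costly to process.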
Results and Conclusion
Our system ranked No. 4 on the test set leaderboard of this multi-format IE task, with F1 scores of 79.887% for relation extraction, 85.179% for sentence-level event extraction, and 70.828% for document-level event extraction. However, there is still room for improvement. Many triples are not annotated, which negatively impacts performance; processing long text remains challenging in document-level event extraction; and correctly extracting two arguments of one event when they are far apart within a sentence or a document requires further study.
In conclusion, our system is effective in addressing the challenges posed by the LIC-2021 multi-format IE task and achieves competitive performance, while leaving opportunities for further exploration and improvement in future research.
This work was supported by the National Key R&D Program of China (No. 2018YFC0830701), the National Natural Science Foundation of China (No. 61572120), the Fundamental Research Funds for the Central Universities (No. N181602013 and No. N171602003), the Ten Thousand Talent Program (No. ZX20200035), and Liaoning Distinguished Professor (No. XLYC1902057).