In an effort to address the growing impact of climate change on coastal areas, particularly in active but fragile regions, collaboration among diverse stakeholders and disciplines is essential to formulate effective environmental protection policies. To aid in this endeavor, a specialized corpus comprising 2,491 sentences from 410 scientific abstracts related to coastal areas was introduced for the purpose of Automatic Term Extraction (ATE) and Classification (ATC) tasks. Drawing inspiration from the ARDI framework, which focuses on identifying Actors, Resources, Dynamics, and Interactions within a system, domain terms and their roles in coastal systems were automatically extracted using monolingual and multilingual transformer models. The annotation process involved two students specializing in Earth Sciences, a domain expert, and a PhD student who all simultaneously annotated papers using the INCEpTION tool. Over two months, 215 abstracts were annotated with a moderate agreement level of 43%, indicating the difficulty of manual annotation. Additionally, three domain-relevant knowledge bases were utilized to pre-annotate terms in a subset of abstracts, leading to refined term boundaries. Two datasets were created for studying coastal areas by homogenizing manually annotated subsets with KB-recommended annotations. The datasets underwent manual adaptation for terminology extraction by removing relations and pronouns indicating coreferences. The CoastTerm corpus for term extraction was developed from a larger collection of papers from Scopus containing terms such as "coastal areas" or "littoral" in their titles or abstracts. The annotation process involved undergraduate Master's students specialized in Earth Sciences conducting joint entity and relation extraction tasks based on guidelines provided by PhD students and domain experts. 
Only sentences providing information on the functioning of coastal zones were annotated using labels such as "Actor" for stakeholders and "Resource" for goods utilized within the system. Overall, this study represents an important step towards developing a specialized Knowledge Base dedicated to coastal areas by leveraging automated term extraction methods and incorporating insights from manual annotation processes.
- Collaboration among diverse stakeholders and disciplines is essential for formulating effective environmental protection policies in coastal areas impacted by climate change.
- A specialized corpus of 2,491 sentences from 410 scientific abstracts related to coastal areas was introduced for Automatic Term Extraction (ATE) and Classification (ATC) tasks.
- Inspired by the ARDI framework, domain terms and their roles in coastal systems were automatically extracted using monolingual and multilingual transformer models.
- Two students specializing in Earth Sciences, a domain expert, and a PhD student annotated 215 abstracts over two months, reaching a moderate agreement level of 43%.
- Three domain-relevant knowledge bases were utilized to pre-annotate terms in a subset of abstracts, leading to refined term boundaries and the creation of datasets for studying coastal areas.
Summary
- People from different backgrounds and fields need to work together to make good rules for protecting the environment near the ocean from climate change.
- Scientists used a special group of short summaries about coastal areas to find important words automatically.
- Important words about coastal systems, and the roles they play, were found using computer language models, following a framework called ARDI (Actors, Resources, Dynamics, Interactions).
- Students studying Earth Sciences, an expert in the field, and a PhD student worked together to mark important words in the summaries with some agreement.
- Three big collections of knowledge were used to help find important words in some summaries, making it easier to study coastal areas.
Definitions
- Collaboration: Working together with others
- Stakeholders: People or groups who are involved or have an interest in something
- Disciplines: Different fields of study or expertise
- Formulating: Creating or coming up with something
- Policies: Rules or guidelines that are set by organizations
- Environmental protection: Taking care of nature and keeping it safe
- Coastal areas: Land near the ocean or sea
- Climate change: The long-term change in Earth's weather patterns
- Specialized corpus: A specific collection of written information
- Automatic Term Extraction (ATE): Finding important words automatically
- Automatic Term Classification (ATC): Sorting the extracted terms into categories based on their role
- Domain terms: Words related to a specific subject area
- Monolingual and multilingual transformer models: Neural language models built on the transformer architecture and trained on text in one language or in many languages
Annotation
Introduction
Climate change has become an increasingly pressing issue, particularly in coastal areas where the impacts are most visible. In order to effectively address this problem, collaboration among diverse stakeholders and disciplines is essential to formulate effective environmental protection policies. However, one of the challenges in this process is the lack of a specialized corpus that can aid in Automatic Term Extraction (ATE) and Classification (ATC) tasks related to coastal areas.
In response to this need, a team of researchers developed a specialized corpus comprising 2,491 sentences from 410 scientific abstracts related to coastal areas. This corpus was created with the aim of identifying key terms and their roles within coastal systems using automated methods. The research paper titled "Automatic Term Extraction and Classification for Coastal Areas: A Corpus-based Approach" outlines the methodology used for creating this corpus and its potential applications.
The ARDI Framework
The development of this specialized corpus was inspired by the ARDI framework, which focuses on identifying Actors, Resources, Dynamics, and Interactions within a system. This framework provides a structured way to describe complex systems such as coastal areas.
In order to apply this framework to coastal systems, domain-specific terms were automatically extracted using monolingual and multilingual transformer models. These models were trained on large datasets containing information about various domains including Earth Sciences.
The Annotation Process
To ensure the accuracy and reliability of the annotations in the corpus, two students specializing in Earth Sciences, a domain expert, and a PhD student annotated the abstracts manually using the INCEpTION tool. Over two months, 215 abstracts were annotated with a moderate agreement level of 43%, which indicates how difficult manual annotation is for complex systems like coastal areas.
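The text does not say how the 43% agreement figure was computed. As a minimal sketch, assuming agreement is measured as the mean pairwise F1 over exactly matching (start, end, label) spans, the calculation could look like the following. The function names and the example annotations are hypothetical and are not taken from the study.

```python
from itertools import combinations

def pairwise_f1(spans_a, spans_b):
    """F1 between two annotators' span sets, counting exact (start, end, label) matches."""
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0  # both annotators marked nothing: treat as full agreement
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)

def mean_pairwise_f1(annotations):
    """Average the pairwise F1 over every pair of annotators."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(pairwise_f1(x, y) for x, y in pairs) / len(pairs)

# Hypothetical annotations of one sentence by three annotators (character offsets)
example = {
    "annotator_1": [(0, 9, "Actor"), (24, 36, "Resource")],
    "annotator_2": [(0, 9, "Actor"), (24, 30, "Resource")],  # different right boundary
    "annotator_3": [(0, 9, "Actor"), (24, 36, "Resource")],
}
score = mean_pairwise_f1(example)
```

Exact matching is strict: a single disagreement on a term boundary counts as a full miss, which is one reason span-level agreement on terminology tends to be low.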
Additionally, three domain-relevant knowledge bases were used to pre-annotate terms in a subset of abstracts, which led to refined term boundaries. This helped improve the overall quality of the annotations in the corpus.
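The pre-annotation procedure is not described in detail here. One plausible way to do it, shown purely as an illustrative sketch, is a greedy longest-match lookup of knowledge-base terms over the tokens of each sentence. The function `pre_annotate`, the `max_len` parameter, and the sample terms are assumptions made for this example and do not come from the three knowledge bases used in the study.

```python
def pre_annotate(tokens, kb_terms, max_len=5):
    """Return (start, end, term) token spans found by greedy longest-match lookup.

    `end` is exclusive. `kb_terms` holds lowercase terms whose tokens are
    joined by single spaces.
    """
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "coastal erosion" wins over "erosion"
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in kb_terms:
                match = (i, i + n, candidate)
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched term
        else:
            i += 1
    return spans

# Hypothetical knowledge-base entries and sentence
kb_terms = {"coastal erosion", "erosion", "salt marsh", "fishermen"}
tokens = "Coastal erosion threatens the salt marsh used by fishermen".split()
spans = pre_annotate(tokens, kb_terms)
```

Preferring the longest match is what gives the pre-annotations consistent term boundaries, which annotators can then accept or correct.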
The CoastTerm Corpus
The resulting corpus, named "CoastTerm", was developed from a larger collection of papers from Scopus containing terms such as "coastal areas" or "littoral" in their titles or abstracts. This ensured that the corpus was focused on coastal systems and contained relevant information for further analysis.
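The selection step can be pictured as a simple keyword filter over titles and abstracts. The sketch below assumes the records are already available as dictionaries with `title` and `abstract` fields; it does not use the Scopus API, and the sample records are invented for illustration.

```python
KEYWORDS = ("coastal areas", "littoral")  # the two example terms quoted in the text

def mentions_coast(record, keywords=KEYWORDS):
    """True if any keyword occurs in the record's title or abstract (case-insensitive)."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    return any(keyword in text for keyword in keywords)

# Hypothetical bibliographic records
records = [
    {"title": "Sediment dynamics in coastal areas", "abstract": "We study dune retreat."},
    {"title": "Alpine glacier mass balance", "abstract": "Measurements from 2010 to 2020."},
    {"title": "Managing tourism pressure", "abstract": "A survey of littoral municipalities."},
]
selected = [record for record in records if mentions_coast(record)]
```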
The annotation process involved undergraduate Master's students specialized in Earth Sciences conducting joint entity and relation extraction tasks based on guidelines provided by PhD students and domain experts. Only sentences providing information on the functioning of coastal zones were annotated using labels such as "Actor" for stakeholders and "Resource" for goods utilized within the system.
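For term extraction with transformer models, span annotations like these are commonly converted into per-token tags. The source does not state which tagging scheme was used, so the sketch below assumes the widely used BIO scheme; the function `spans_to_bio` and the example sentence are hypothetical.

```python
def spans_to_bio(num_tokens, spans):
    """Convert (start, end, label) token spans (end exclusive) into BIO tags."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the term
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # remaining tokens of the term
    return tags

# Hypothetical annotated sentence using two of the ARDI labels
tokens = ["Local", "fishermen", "harvest", "shellfish", "in", "the", "estuary"]
spans = [(0, 2, "Actor"), (3, 4, "Resource")]
tags = spans_to_bio(len(tokens), spans)
```

With this encoding, term extraction and term classification can be handled by one token-classification model: the B/I/O prefix marks the term boundaries and the suffix carries the role.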
Applications of CoastTerm Corpus
The development of this specialized corpus has several potential applications in understanding coastal systems and formulating effective policies to mitigate climate change impacts. The CoastTerm corpus can be used for automatic term extraction, classification, and knowledge base creation related to coastal areas. It can also serve as a valuable resource for researchers studying these regions.
Furthermore, by leveraging automated term extraction methods and incorporating insights from manual annotation processes, this study represents an important step towards developing a specialized Knowledge Base dedicated to coastal areas. Such a knowledge base would aid in better understanding the dynamics of these complex systems and inform decision-making processes related to environmental protection policies.
Conclusion
In conclusion, the research paper titled "Automatic Term Extraction and Classification for Coastal Areas: A Corpus-based Approach" presents a detailed methodology for creating a specialized corpus focused on coastal systems. Combining automated term extraction methods with manual annotation has produced a dataset that can support the study of these fragile regions.
This study highlights the importance of collaboration among diverse stakeholders and disciplines when addressing complex issues like climate change impacts on coastal areas. With further development, the CoastTerm corpus could contribute to effective environmental protection policies and better management of these vulnerable regions.