CoastTerm: a Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature

AI-generated keywords: Climate Change

AI-generated Key Points

  • Collaboration among diverse stakeholders and disciplines is essential for formulating effective environmental protection policies in coastal areas impacted by climate change.
  • A specialized corpus of scientific abstracts related to coastal areas was used for Automatic Term Extraction (ATE) and Classification (ATC) tasks.
  • Domain terms and their roles in coastal systems were automatically extracted using monolingual and multilingual transformer models inspired by the ARDI framework.
  • Students specializing in Earth Sciences, a domain expert, and a PhD student annotated abstracts over two months, reaching a moderate agreement level of 43%.
  • Three domain-relevant knowledge bases were utilized to pre-annotate terms in a subset of abstracts, leading to refined term boundaries and the creation of datasets for studying coastal areas.

Authors: Julien Delaunay, Hanh Thi Hong Tran, Carlos-Emiliano González-Gallardo, Georgeta Bordea, Mathilde Ducos, Nicolas Sidere, Antoine Doucet, Senja Pollak, Olivier De Viron

License: CC BY 4.0

Abstract: The growing impact of climate change on coastal areas, particularly active but fragile regions, necessitates collaboration among diverse stakeholders and disciplines to formulate effective environmental protection policies. We introduce a novel specialized corpus comprising 2,491 sentences from 410 scientific abstracts concerning coastal areas, for the Automatic Term Extraction (ATE) and Classification (ATC) tasks. Inspired by the ARDI framework, focused on the identification of Actors, Resources, Dynamics and Interactions, we automatically extract domain terms and their distinct roles in the functioning of coastal systems by leveraging monolingual and multilingual transformer models. The evaluation demonstrates consistent results, achieving an F1 score of approximately 80% for automated term extraction and F1 of 70% for extracting terms and their labels. These findings are promising and signify an initial step towards the development of a specialized Knowledge Base dedicated to coastal areas.
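
The abstract reports two F1 figures: one for extracting terms alone and one for extracting terms together with their labels. The sketch below illustrates how those two scores can differ on the same predictions. It assumes exact-match scoring over token spans, which the abstract does not confirm; the `prf` and `strip_labels` helpers and the toy spans are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# A span is (start_token, end_token_exclusive, label). Exact-match scoring
# is an assumption made for illustration, not the paper's stated protocol.
Span = Tuple[int, int, str]


def prf(gold: Set, pred: Set) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for two sets of items."""
    tp = len(gold & pred)  # items present in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def strip_labels(spans: Set[Span]) -> Set[Tuple[int, int]]:
    """Drop the label so only term boundaries are compared."""
    return {(start, end) for start, end, _ in spans}


# Toy data: the second predicted span has the right boundaries, wrong label.
gold = {(0, 2, "Resource"), (4, 5, "Actor"), (7, 9, "Resource")}
pred = {(0, 2, "Resource"), (4, 5, "Resource"), (10, 11, "Actor")}

unlabeled = prf(strip_labels(gold), strip_labels(pred))  # terms only
labeled = prf(gold, pred)                                # terms + labels

print(f"term extraction F1:  {unlabeled[2]:.3f}")
print(f"terms + labels F1:   {labeled[2]:.3f}")
```

In this toy case the labeled score is lower because a span with correct boundaries but the wrong label counts as an error, which is consistent with the direction of the gap reported in the abstract.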

Submitted to arXiv on 13 Jun. 2024

Results of the summarizing process for the arXiv paper: 2406.09128v1

To address the growing impact of climate change on coastal areas, particularly active but fragile regions, collaboration among diverse stakeholders and disciplines is essential for formulating effective environmental protection policies. To support this effort, the paper introduces a specialized corpus of 2,491 sentences from 410 scientific abstracts related to coastal areas, intended for the Automatic Term Extraction (ATE) and Classification (ATC) tasks. Drawing on the ARDI framework, which focuses on identifying Actors, Resources, Dynamics, and Interactions within a system, domain terms and their roles in coastal systems were automatically extracted using monolingual and multilingual transformer models.

The annotation process involved two students specializing in Earth Sciences, a domain expert, and a PhD student, who all annotated papers simultaneously using the INCEpTION tool. Over two months, 215 abstracts were annotated with a moderate agreement level of 43%, which indicates how difficult manual annotation is. In addition, three domain-relevant knowledge bases (KBs) were used to pre-annotate terms in a subset of abstracts, leading to refined term boundaries. Two datasets for studying coastal areas were created by homogenizing the manually annotated subsets with the KB-recommended annotations. The datasets were then manually adapted for terminology extraction by removing relations and pronouns indicating coreferences.

The CoastTerm corpus for term extraction was developed from a larger collection of papers from Scopus containing terms such as "coastal areas" or "littoral" in their titles or abstracts. The annotation process involved undergraduate Master's students specialized in Earth Sciences conducting joint entity and relation extraction tasks based on guidelines provided by PhD students and domain experts.
Only sentences providing information on the functioning of coastal zones were annotated using labels such as "Actor" for stakeholders and "Resource" for goods utilized within the system. Overall, this study represents an important step towards developing a specialized Knowledge Base dedicated to coastal areas by leveraging automated term extraction methods and incorporating insights from manual annotation processes.
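
Term extraction with transformer models is commonly framed as token classification, where each token receives a tag that marks whether it begins, continues, or lies outside a term. The sketch below shows one way annotated spans carrying labels such as "Actor" and "Resource" could be converted into such tags. The BIO scheme, the `spans_to_bio` helper, and the example sentence are assumptions for illustration; the summary does not state which encoding the authors used.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label); the labels follow the
# "Actor" / "Resource" examples mentioned in the summary.
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping labeled spans into one BIO tag per token."""
    tags = ["O"] * n_tokens  # default: token is outside any term
    for start, end, label in sorted(spans):
        tags[start] = f"B-{label}"        # first token of the term
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"        # remaining tokens of the term
    return tags


# Hypothetical sentence, not drawn from the corpus.
tokens = ["Fishermen", "depend", "on", "coastal", "wetlands", "."]
spans = [(0, 1, "Actor"), (3, 5, "Resource")]

tags = spans_to_bio(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A token-classification model would then be trained to predict one tag per token, and predicted tags can be decoded back into labeled spans for evaluation.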
Created on 05 Sep. 2024
