KLUE: Korean Language Understanding Evaluation

AI-generated keywords: KLUE benchmark, NLU tasks, NER task, pretrained language models, annotation protocols

AI-generated Key Points

The paper introduces the Korean Language Understanding Evaluation (KLUE) benchmark, consisting of 8 NLU tasks in Korean.
The tasks include Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition (NER), Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking.
The authors built these tasks from scratch using diverse source corpora while respecting copyrights.
Annotation protocols were designed with ethical considerations in mind.
Suitable evaluation metrics and fine-tuning recipes for pretrained language models are provided for each task.
Two pretrained language models (KLUE-BERT and KLUE-RoBERTa) are released to reproduce baseline models on KLUE and facilitate future research.
Preliminary experiments show that KLUE-RoBERTa-large outperforms other baselines and existing open-source Korean PLMs.
Performance is minimally affected when personally identifiable information is replaced from the pretraining corpus, suggesting privacy and NLU capability are not at odds with each other.
BPE tokenization combined with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection, and generation.
Comprehensive documentation on creating KLUE is provided to accelerate Korean NLP research and facilitate similar resources for other languages in the future.
Section 2 discusses source corpora selection criteria; Section 3 presents detailed information about each task; Section 4 focuses on the Named Entity Recognition (NER) task.

Authors: Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, Jangwon Park, Chisung Song, Junseong Kim, Yongsook Song, Taehwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, Inkwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seonghyun Kim, Lucy Park, Alice Oh, Jungwoo Ha, Kyunghyun Cho

arXiv: 2105.09680v1 (cs.CL)

76 pages, 10 figures, 36 tables

License: CC BY-SA 4.0

Abstract: We introduce Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks, including Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We build all of the tasks from scratch from diverse source corpora while respecting copyrights, to ensure accessibility for anyone without any restrictions. With ethical considerations in mind, we carefully design annotation protocols. Along with the benchmark tasks and data, we provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. We furthermore release the pretrained language models (PLM), KLUE-BERT and KLUE-RoBERTa, to help reproduce baseline models on KLUE and thereby facilitate future research. We make a few interesting observations from the preliminary experiments using the proposed KLUE benchmark suite, already demonstrating the usefulness of this new benchmark suite. First, we find KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. Second, we see minimal degradation in performance even when we replace personally identifiable information from the pretraining corpus, suggesting that privacy and NLU capability are not at odds with each other. Lastly, we find that using BPE tokenization in combination with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection and generation. In addition to accelerating Korean NLP research, our comprehensive documentation on creating KLUE will facilitate creating similar resources for other languages in the future. KLUE is available at https://klue-benchmark.com/.

Submitted to arXiv on 20 May 2021

Results of the summarizing process for the arXiv paper: 2105.09680v1

Comprehensive Summary

The paper introduces the Korean Language Understanding Evaluation (KLUE) benchmark, which consists of 8 natural language understanding (NLU) tasks in Korean: Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition (NER), Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. The authors built these tasks from scratch using diverse source corpora while respecting copyrights, so that the benchmark is accessible to anyone without restrictions. They also designed annotation protocols with ethical considerations in mind. Alongside the benchmark tasks and data, the authors provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. They also release two pretrained language models (PLMs), KLUE-BERT and KLUE-RoBERTa, to help reproduce baseline models on KLUE and facilitate future research. Preliminary experiments with the KLUE benchmark suite yield three observations. First, KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. Second, performance degrades only minimally when personally identifiable information in the pretraining corpus is replaced, suggesting that privacy and NLU capability are not at odds with each other. Lastly, using BPE tokenization in combination with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection, and generation. The paper also provides comprehensive documentation on creating KLUE, both to accelerate Korean NLP research and to facilitate the creation of similar resources for other languages in the future.
Section 2 discusses the source corpora selection criteria and describes the selected corpora; Section 3 presents detailed information about each task in the KLUE benchmark suite; Section 4 focuses on the Named Entity Recognition (NER) task, including dataset construction, evaluation metrics, related work, and conclusions. For the NER task, the authors use two corpora, WIKITREE and NSMC, to cover both formal and informal writing styles. WIKITREE is a news article corpus suited to NER because its formal sentences contain many entity types, while NSMC consists of colloquial reviews of movies or TV shows, providing a noisy dataset that broadens the range of applications for NER models.

The paper talks about a test called KLUE that helps us understand the Korean language better. It has 8 different tasks to do, like figuring out what a text is about or finding similar sentences. The people who made the test made sure to follow the rules and not copy anything without permission. They also thought about being fair and respectful when deciding how to mark the answers. They even made special computer programs to help with the tasks. The paper also tells us about some experiments they did, and how they found ways to protect people's privacy while still doing a good job understanding the language.

Introduction to the Korean Language Understanding Evaluation (KLUE) Benchmark

The Korean Language Understanding Evaluation (KLUE) benchmark is a new natural language understanding (NLU) resource for the Korean language. It consists of 8 tasks, including Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition (NER), Relation Extraction, Dependency Parsing, Machine Reading Comprehension and Dialogue State Tracking. The authors designed these tasks from scratch using diverse source corpora while respecting copyrights to ensure accessibility for anyone without restrictions. They also created annotation protocols with ethical considerations in mind. In addition to the benchmark tasks and data, the authors provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. They also release two pretrained language models (PLMs), KLUE-BERT and KLUE-RoBERTa, to help reproduce baseline models on KLUE and facilitate future research. Preliminary experiments using the proposed KLUE benchmark suite have yielded interesting observations:

- KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs.
- Performance degrades only minimally when personally identifiable information in the pretraining corpus is replaced, suggesting that privacy and NLU capability are not at odds with each other.
- Using BPE tokenization in combination with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection, and generation.
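
The third observation can be made concrete with a toy sketch. It rests on stated assumptions: the `MORPHEMES` table is a hypothetical, hand-written segmentation (a real pipeline would call a Korean morphological analyzer), and `MERGES` is a tiny hand-picked merge list rather than merges learned from a corpus. The only point illustrated is that, with morpheme-level pre-tokenization, BPE merges are applied inside each morpheme, so subword units never cross a morpheme boundary.

```python
from typing import Dict, List, Tuple

# Hypothetical morpheme segmentation, for illustration only. A real system
# would obtain this from a Korean morphological analyzer.
MORPHEMES: Dict[str, List[str]] = {
    "학교에서": ["학교", "에서"],
    "공부했다": ["공부", "했", "다"],
}

# Toy, hand-picked BPE merges in priority order (not learned from data).
# The first merge would cross the boundary between "학교" and "에서".
MERGES: List[Tuple[str, str]] = [("교", "에"), ("학", "교"), ("에", "서"), ("공", "부")]


def apply_bpe(unit: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Apply each merge, in priority order, to the characters of one unit."""
    symbols = list(unit)
    for left, right in merges:
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def tokenize(sentence: str) -> List[str]:
    """Split on whitespace, then into morphemes, then apply BPE per morpheme."""
    tokens: List[str] = []
    for word in sentence.split():
        for morpheme in MORPHEMES.get(word, [word]):
            tokens.extend(apply_bpe(morpheme, MERGES))
    return tokens


# BPE on the raw word lets a merge straddle the morpheme boundary.
print(apply_bpe("학교에서", MERGES))
# BPE after morpheme pre-tokenization keeps subwords inside morphemes.
print(tokenize("학교에서 공부했다"))
```

In this toy setup, BPE on the raw word produces a unit that straddles the noun and its particle, while the pre-tokenized variant keeps the two apart, which is the property that plausibly helps morpheme-level tagging tasks.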

Source Corpora Selection Criteria

The authors used several criteria to select source corpora for building the NLU tasks: copyright compliance, availability of multiple domains, diversity of text styles, a balance between formal and informal writing styles, high-quality annotations, a sufficiently large training set, and low-cost or free access. These criteria served as guidelines during selection, ensuring that every corpus met them before being included in the KLUE dataset.

Task Descriptions

The paper provides detailed information about each task in the KLUE benchmark suite:

Topic Classification

Semantic Textual Similarity

Natural Language Inference

This task involves determining the relationship between a premise sentence and a hypothesis sentence by assigning the pair an entailment, contradiction, or neutral label.

Named Entity Recognition

This task requires identifying named entities within text, such as people, places, and organizations, along with their corresponding types.
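
As an illustration of how entities and their types are commonly encoded, the sketch below decodes a BIO tag sequence into typed spans and computes an exact-match, entity-level F1 for a single sentence. The English tokens and the `PER`/`ORG` type names are assumptions made for readability; the summary does not specify KLUE's tag set, tagging unit, or official metric, so this should not be read as the benchmark's scoring script.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a BIO tag sequence into typed entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    current: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing entity
        continues = tag.startswith("I-") and current == tag[2:]
        if not continues and current is not None:
            spans.append((start, i, current))
            start, current = None, None
        # A "B-" tag opens an entity; a stray "I-" is treated leniently as an opener.
        if tag.startswith("B-") or (tag.startswith("I-") and current is None):
            start, current = i, tag[2:]
    return spans


def entity_f1(gold_tags: List[str], pred_tags: List[str]) -> float:
    """Exact-match entity-level F1: a span counts only if boundaries and type match."""
    gold: Set[Span] = set(bio_to_spans(gold_tags))
    pred: Set[Span] = set(bio_to_spans(pred_tags))
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


tokens = ["John", "Smith", "joined", "Google"]
gold_tags = ["B-PER", "I-PER", "O", "B-ORG"]
pred_tags = ["B-PER", "O", "O", "B-ORG"]  # the person span is cut short

print(bio_to_spans(gold_tags))
print(entity_f1(gold_tags, pred_tags))
```

Because matching is exact, the truncated person span earns no credit even though it overlaps the gold span, which is why boundary errors are costly under entity-level scoring.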

Relation Extraction

This task involves extracting relationships between entities mentioned within text, such as "John works at Google", where John is an entity of type person and Google is an entity of type organization.
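
The example sentence can be written down as a small record. In the sketch below, the field names, the `PER`/`ORG` type strings, the marker format, and the relation label `per:employee_of` are all assumptions chosen for illustration, not values taken from the paper. The helper wraps each entity in a typed marker, one common way to present a sentence to a relation classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    start: int        # character offset, inclusive
    end: int          # character offset, exclusive
    entity_type: str  # hypothetical type name such as "PER" or "ORG"


@dataclass(frozen=True)
class RelationExample:
    sentence: str
    subj: Entity
    obj: Entity
    label: str  # hypothetical relation label


def mark_entities(example: RelationExample) -> str:
    """Wrap the subject and object in typed markers (S = subject, O = object)."""
    pairs = sorted(
        [(example.subj, "S"), (example.obj, "O")],
        key=lambda pair: pair[0].start,
        reverse=True,
    )
    text = example.sentence
    # Insert right to left so that earlier character offsets stay valid.
    for entity, role in pairs:
        tag = f"{role}:{entity.entity_type}"
        text = f"{text[:entity.start]}<{tag}>{entity.text}</{tag}>{text[entity.end:]}"
    return text


sentence = "John works at Google"
example = RelationExample(
    sentence=sentence,
    subj=Entity("John", 0, 4, "PER"),
    obj=Entity("Google", 14, 20, "ORG"),
    label="per:employee_of",
)
print(mark_entities(example))
```

A classifier would then map the marked sentence to a relation label such as the one stored in `example.label`.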

Dependency Parsing

This task requires analyzing the grammatical structure of sentences by identifying relationships among words, such as subject, verb, and object.
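
A dependency parse is often stored as one head index and one relation label per word. The sketch below uses an English toy sentence with hypothetical labels and computes unlabeled and labeled attachment scores (UAS and LAS), two standard parsing metrics. The summary does not state which metrics or label inventory KLUE uses, so both are assumptions here.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head position, 1-based with 0 meaning root; relation label)


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one sentence.

    UAS counts words whose predicted head is correct.
    LAS additionally requires the relation label to be correct.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    arc_hits = sum(g == p for g, p in zip(gold, pred))
    return head_hits / len(gold), arc_hits / len(gold)


words = ["She", "reads", "books"]
# "reads" is the root; "She" and "books" both attach to word 2 ("reads").
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]  # right head, wrong label

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

Here every head is predicted correctly, so UAS is perfect, while the mislabeled object lowers LAS to two out of three.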

Created on 13 Sep. 2023
