KLUE: Korean Language Understanding Evaluation
AI-generated keywords:
KLUE benchmark
NLU tasks
NER task
pretrained language models
annotation protocols
- The paper introduces the Korean Language Understanding Evaluation (KLUE) benchmark, consisting of 8 NLU tasks in Korean.
- The tasks include Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition (NER), Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking.
- The authors built these tasks from scratch using diverse source corpora while respecting copyrights.
- Annotation protocols were designed with ethical considerations in mind.
- Suitable evaluation metrics and fine-tuning recipes for pretrained language models are provided for each task.
- Two pretrained language models (KLUE-BERT and KLUE-RoBERTa) are released to reproduce baseline models on KLUE and facilitate future research.
- Preliminary experiments show that KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs.
- Performance is minimally affected when personally identifiable information in the pretraining corpus is replaced, suggesting privacy and NLU capability are not at odds with each other.
- BPE tokenization combined with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection, and generation.
- Comprehensive documentation on creating KLUE is provided to accelerate Korean NLP research and facilitate similar resources for other languages in the future.
- Section 2 discusses source corpora selection criteria; Section 3 presents detailed information about each task; Section 4 focuses on the Named Entity Recognition (NER) task.
Authors:
Sungjoon Park,
Jihyung Moon,
Sungdong Kim,
Won Ik Cho,
Jiyoon Han,
Jangwon Park,
Chisung Song,
Junseong Kim,
Yongsook Song,
Taehwan Oh,
Joohong Lee,
Juhyun Oh,
Sungwon Lyu,
Younghoon Jeong,
Inkwon Lee,
Sangwoo Seo,
Dongjun Lee,
Hyunwoo Kim,
Myeonghwa Lee,
Seongbo Jang,
Seungwon Do,
Sunkyoung Kim,
Kyungtae Lim,
Jongwon Lee,
Kyumin Park,
Jamin Shin,
Seonghyun Kim,
Lucy Park,
Alice Oh,
Jungwoo Ha,
Kyunghyun Cho
76 pages, 10 figures, 36 tables
Abstract: We introduce Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks, including Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We build all of the tasks from scratch from diverse source corpora while respecting copyrights, to ensure accessibility for anyone without any restrictions. With ethical considerations in mind, we carefully design annotation protocols. Along with the benchmark tasks and data, we provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. We furthermore release the pretrained language models (PLM), KLUE-BERT and KLUE-RoBERTa, to help reproduce baseline models on KLUE and thereby facilitate future research. We make a few interesting observations from the preliminary experiments using the proposed KLUE benchmark suite, already demonstrating the usefulness of this new benchmark suite. First, we find KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. Second, we see minimal degradation in performance even when we replace personally identifiable information from the pretraining corpus, suggesting that privacy and NLU capability are not at odds with each other. Lastly, we find that using BPE tokenization in combination with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection and generation. In addition to accelerating Korean NLP research, our comprehensive documentation on creating KLUE will facilitate creating similar resources for other languages in the future. KLUE is available at https://klue-benchmark.com/.
Submitted to arXiv on 20 May 2021
The paper introduces the Korean Language Understanding Evaluation (KLUE) benchmark, which consists of 8 natural language understanding (NLU) tasks in Korean. These tasks are Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition (NER), Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. The authors built these tasks from scratch using diverse source corpora while respecting copyrights, to ensure accessibility for anyone without restrictions. They also designed annotation protocols with ethical considerations in mind. In addition to the benchmark tasks and data, the authors provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. They also release two pretrained language models (PLMs), KLUE-BERT and KLUE-RoBERTa, to help reproduce baseline models on KLUE and facilitate future research. Preliminary experiments using the KLUE benchmark suite yielded three observations. First, KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. Second, performance degrades only minimally when personally identifiable information in the pretraining corpus is replaced, suggesting that privacy and NLU capability are not at odds with each other. Lastly, using BPE tokenization in combination with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection, and generation. The paper also provides comprehensive documentation on creating KLUE to accelerate Korean NLP research and facilitate the creation of similar resources for other languages in the future.
Section 2 discusses source corpora selection criteria and provides details about the selected corpora; Section 3 presents detailed information about each task in the KLUE benchmark suite; Section 4 focuses on the Named Entity Recognition (NER) task, including dataset construction, evaluation metrics, related work, and conclusions. The authors use two corpora, WIKITREE and NSMC, to incorporate both formal and informal writing styles in the NER task. WIKITREE is a news article corpus suitable for NER because its formal sentences contain many entity types, while NSMC consists of colloquial reviews of movies and TV shows, providing a noisy dataset that broadens the application field of NER models.
The paper talks about a test called KLUE that helps us understand the Korean language better. It has 8 different tasks to do, like figuring out what a text is about or finding similar sentences. The people who made the test made sure to follow the rules and not copy anything without permission. They also thought about being fair and respectful when deciding how to mark the answers. They even made special computer programs to help with the tasks. The paper also tells us about some experiments they did, and how they found ways to protect people's privacy while still doing a good job understanding the language.
Introduction to the Korean Language Understanding Evaluation (KLUE) Benchmark
The Korean Language Understanding Evaluation (KLUE) benchmark is a new natural language understanding (NLU) resource for the Korean language. It consists of 8 tasks, including Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition (NER), Relation Extraction, Dependency Parsing, Machine Reading Comprehension and Dialogue State Tracking. The authors designed these tasks from scratch using diverse source corpora while respecting copyrights to ensure accessibility for anyone without restrictions. They also created annotation protocols with ethical considerations in mind.
In addition to the benchmark tasks and data, the authors provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. They also release two pretrained language models (PLMs), KLUE-BERT and KLUE-RoBERTa, to help reproduce baseline models on KLUE and facilitate future research. Preliminary experiments using the proposed KLUE benchmark suite have yielded interesting observations:
- KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs.
- There is minimal degradation in performance even when personally identifiable information in the pretraining corpus is replaced, suggesting that privacy and NLU capability are not at odds with each other.
- Using BPE tokenization in combination with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection, and generation.
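To make the second observation more concrete, here is a minimal sketch of what replacing personally identifiable information in raw text could look like. This summary does not describe the authors' actual procedure, so the regular expressions, the placeholder tokens `<EMAIL>` and `<PHONE>`, and the function name `pseudonymize` are all assumptions chosen for illustration.

```python
import re

# Hypothetical patterns and placeholder tokens, for illustration only.
# They are not taken from the KLUE paper.
PII_PATTERNS = [
    # Simple e-mail address pattern.
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "<EMAIL>"),
    # Mobile-style phone number such as 010-1234-5678 (hyphens optional).
    (re.compile(r"\b01[016789]-?\d{3,4}-?\d{4}\b"), "<PHONE>"),
]


def pseudonymize(text: str) -> str:
    """Replace every matched PII span with its placeholder token."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


sample = "Contact: hong@example.com, 010-1234-5678"
cleaned = pseudonymize(sample)
print(cleaned)
```

A real pipeline would need far broader coverage (names, addresses, ID numbers), which simple patterns like these cannot provide.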
Source Corpora Selection Criteria
The authors used several criteria to select source corpora for building their NLU tasks, including copyright compliance, availability of multiple domains, diversity of text styles, balance between formal and informal writing styles, high-quality annotations, sufficient training set size, and low-cost or free access. These criteria served as guidelines during the selection process, ensuring that every selected corpus met certain standards before being included in the KLUE dataset.
Task Descriptions
The paper provides detailed information about each task in the KLUE benchmark suite:
Topic Classification
This task involves classifying a given document into one of several predefined topics such as sports or politics based on its content.
Semantic Textual Similarity
This task requires measuring semantic similarity between two sentences by assigning them a score ranging from 0 to 5 depending on how similar they are.
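Systems that predict similarity scores on a 0 to 5 scale are commonly compared against human scores with a correlation coefficient. The sketch below computes Pearson's r in plain Python; the choice of metric, the function name `pearson_r`, and the sample scores are assumptions made for illustration, not details taken from this summary.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Made-up gold and predicted similarity scores on the 0-5 scale.
gold = [0.0, 1.5, 3.0, 4.2, 5.0]
pred = [0.4, 1.1, 2.8, 4.5, 4.9]
r = pearson_r(gold, pred)
print(f"Pearson r = {r:.3f}")
```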
Natural Language Inference
This task involves determining the relationship between a premise sentence and a hypothesis sentence by assigning an entailment, contradiction, or neutral label.
Named Entity Recognition
This task requires identifying named entities within text, such as people, places, and organizations, along with their corresponding types.
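NER output is often encoded as one tag per token (or per character) in a BIO scheme, where `B-` opens an entity, `I-` continues it, and `O` marks everything else. The following sketch converts such a tag sequence into typed spans. The tag names `PER` and `ORG`, the English example, and the function name `bio_to_spans` are assumptions for illustration; the benchmark's own tag set may differ.

```python
def bio_to_spans(tags):
    """Convert BIO tags into (start, end, type) spans; end is exclusive.

    An I- tag that does not continue an open entity of the same type
    is ignored rather than treated as the start of a new entity.
    """
    spans = []
    start, ent_type = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and ent_type == tag[2:]:
            continue  # still inside the currently open entity
        if ent_type is not None:
            spans.append((start, i, ent_type))  # close the open entity
            start, ent_type = None, None
        if tag.startswith("B-"):
            start, ent_type = i, tag[2:]  # open a new entity
    if ent_type is not None:
        spans.append((start, len(tags), ent_type))
    return spans


tokens = ["John", "works", "at", "Google", "Korea"]
tags = ["B-PER", "O", "O", "B-ORG", "I-ORG"]
spans = bio_to_spans(tags)
for start, end, ent_type in spans:
    print(ent_type, " ".join(tokens[start:end]))
```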
Relation Extraction
This task involves extracting relationships between entities mentioned within text. For example, in “John works at Google”, John is an entity of type person and Google is an entity of type organization.
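One way to picture a relation extraction example is as a sentence plus two marked entity mentions and a relation label. The sketch below models the "John works at Google" example with dataclasses. The class names, field names, type strings, and the label `works_at` are hypothetical and are not taken from the KLUE data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityMention:
    text: str
    start: int        # character offset, inclusive
    end: int          # character offset, exclusive
    entity_type: str


@dataclass(frozen=True)
class RelationExample:
    sentence: str
    subject_entity: EntityMention
    object_entity: EntityMention
    relation: str     # hypothetical label name


def mention(sentence: str, text: str, entity_type: str) -> EntityMention:
    """Locate the first occurrence of `text` and wrap it as a mention."""
    start = sentence.index(text)
    return EntityMention(text, start, start + len(text), entity_type)


sentence = "John works at Google"
example = RelationExample(
    sentence=sentence,
    subject_entity=mention(sentence, "John", "PERSON"),
    object_entity=mention(sentence, "Google", "ORGANIZATION"),
    relation="works_at",
)
print(example)
```

Storing character offsets alongside the surface text keeps the mentions unambiguous when the same string occurs more than once in a sentence.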
Dependency Parsing
This task requires analyzing the grammatical structure of sentences by identifying relationships among words, such as subject, verb, and object.
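A dependency parse can be stored as one (head, label) pair per word, and parsers are commonly scored with unlabeled and labeled attachment scores (UAS and LAS). The sketch below computes both. The English sentence, the Universal Dependencies style labels, and the function name `attachment_scores` are assumptions for illustration; they are not the benchmark's label set.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for two parses of the same sentence.

    Each parse is a list of (head_index, label) pairs, one per word.
    Heads are 1-based word positions and 0 denotes the root.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("parses must be non-empty and equally long")
    n = len(gold)
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    full_hits = sum(g == p for g, p in zip(gold, pred))
    return head_hits / n, full_hits / n


# "John works at Google" with illustrative labels.
gold = [(2, "nsubj"), (0, "root"), (4, "case"), (2, "obl")]
# The prediction gets every head right but mislabels the last arc.
pred = [(2, "nsubj"), (0, "root"), (4, "case"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```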