On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex

AI-generated keywords: Semantic Parsing, Adversarial Training, Few-Shot Language Models, CODEX, Lexical Diversity

AI-generated Key Points

  • Semantic parsing is a technique for creating structured representations of natural-language questions.
  • Few-shot language models trained on code outperform traditional unimodal language models in generating these representations.
  • Existing fine-tuned neural semantic parsers are vulnerable to adversarial attacks on natural-language inputs.
  • Adversarial training enhances the robustness of smaller semantic parsers but is not feasible for large language models due to resource constraints.
  • The paper presents an empirical study on the adversarial robustness of CODEX, a large prompt-based language model.
  • State-of-the-art code-language models like CODEX are susceptible to carefully crafted adversarial examples.
  • The authors propose methods for improving robustness without requiring significant amounts of labeled data or heavy computational resources.
  • Different sampling strategies and linguistic complexity affect CODEX's performance; higher lexical diversity in the few-shot examples improves both robustness and standard accuracy.
  • Future research is needed on adversarial training strategies for prompt-based semantic parsers and on robustness across different semantic parsing datasets.

Authors: Terry Yue Zhuo, Zhuang Li, Yujin Huang, Yuan-Fang Li, Weiqing Wang, Gholamreza Haffari, Fatemeh Shiri

Accepted at EACL 2023 (main conference)
License: CC BY 4.0

Abstract: Semantic parsing is a technique aimed at constructing a structured representation of the meaning of a natural-language question. Recent advancements in few-shot language models trained on code have demonstrated superior performance in generating these representations compared to traditional unimodal language models, which are trained on downstream tasks. Despite these advancements, existing fine-tuned neural semantic parsers are susceptible to adversarial attacks on natural-language inputs. While it has been established that the robustness of smaller semantic parsers can be enhanced through adversarial training, this approach is not feasible for large language models in real-world scenarios, as it requires both substantial computational resources and expensive human annotation on in-domain semantic parsing data. This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex. Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples. To address this challenge, we propose methods for improving robustness without the need for significant amounts of labeled data or heavy computational resources.
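One common way to quantify the vulnerability described above is to compare a parser's accuracy on the original questions with its accuracy on perturbed versions of the same questions. The sketch below illustrates only that comparison, under stated assumptions: `toy_parser` is a hypothetical, deliberately brittle stand-in for a prompt-based parser such as Codex (no model is called), `SYNONYMS` is a made-up one-entry word-substitution table rather than a real adversarial attack, and exact match is used as the metric.

```python
from typing import Callable, Dict, List, Tuple

# (natural-language question, gold logical form) pairs
Example = Tuple[str, str]

# Hypothetical one-entry synonym table used for a word-substitution perturbation.
SYNONYMS: Dict[str, str] = {"show": "display"}


def perturb(question: str, synonyms: Dict[str, str]) -> str:
    """Swap every whitespace-separated word that has an entry in `synonyms`."""
    return " ".join(synonyms.get(word, word) for word in question.split())


def exact_match_accuracy(parser: Callable[[str], str], data: List[Example]) -> float:
    """Fraction of questions whose predicted logical form equals the gold one exactly."""
    if not data:
        return 0.0
    correct = sum(parser(question) == gold for question, gold in data)
    return correct / len(data)


def robustness_report(parser: Callable[[str], str], data: List[Example],
                      synonyms: Dict[str, str]) -> Dict[str, float]:
    """Compare accuracy on the clean questions with accuracy on their perturbed versions."""
    clean = exact_match_accuracy(parser, data)
    perturbed = exact_match_accuracy(
        parser, [(perturb(question, synonyms), gold) for question, gold in data]
    )
    return {"clean": clean, "perturbed": perturbed, "drop": clean - perturbed}


def toy_parser(question: str) -> str:
    """Hypothetical brittle stand-in for a prompt-based parser (keyword matching only)."""
    words = question.split()
    if "show" in words and "flights" in words:
        return "SELECT * FROM flights"
    return "UNKNOWN"


if __name__ == "__main__":
    data = [("show all flights", "SELECT * FROM flights"),
            ("show flights", "SELECT * FROM flights")]
    print(robustness_report(toy_parser, data, SYNONYMS))
```

On these two toy questions the parser is correct before perturbation and wrong once `show` becomes `display`, so the report shows accuracy falling from 1.0 to 0.0. A real evaluation would replace `toy_parser` with the model call and `SYNONYMS` with a proper set of adversarial perturbations.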

Submitted to arXiv on 30 Jan. 2023


Summary of arXiv paper 2301.12868v1

Semantic parsing is a technique used to create a structured representation of the meaning of a natural-language question. Recent advancements in few-shot language models trained on code have shown better performance in generating these representations compared to traditional unimodal language models. However, existing fine-tuned neural semantic parsers are vulnerable to adversarial attacks on natural-language inputs. Adversarial training has been shown to enhance the robustness of smaller semantic parsers but is not feasible for large language models due to resource constraints. This paper presents an empirical study on the adversarial robustness of CODEX, a large prompt-based language model of code. The results demonstrate that state-of-the-art code-language models are susceptible to carefully crafted adversarial examples. To address this challenge, the authors propose methods for improving robustness without requiring significant amounts of labeled data or heavy computational resources. The study also investigates how different sampling strategies and linguistic complexity affect CODEX's performance, finding that higher lexical diversity in the sampled few-shot examples improves both its robustness and its standard accuracy. Finally, the paper calls for future research on adversarial training strategies for prompt-based semantic parsers and on extending the robustness analysis to other semantic parsing datasets.
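The finding that lexically diverse few-shot examples help can be made concrete with a small sketch. The paper's own sampling strategies and diversity measures are not detailed on this page, so everything below is an assumption: type-token ratio stands in as the lexical-diversity measure, `select_diverse` is a greedy vocabulary-coverage heuristic (not the authors' method), and `build_prompt` uses a hypothetical SQL-comment layout rather than the prompt format actually used with Codex.

```python
from typing import List, Set, Tuple

Example = Tuple[str, str]  # (natural-language question, logical form)


def tokens(text: str) -> List[str]:
    """Lower-cased whitespace tokenization (a deliberately simple choice)."""
    return text.lower().split()


def type_token_ratio(examples: List[Example]) -> float:
    """Distinct question words divided by total question words: a crude lexical-diversity proxy."""
    words = [word for question, _ in examples for word in tokens(question)]
    return len(set(words)) / len(words) if words else 0.0


def select_diverse(pool: List[Example], k: int) -> List[Example]:
    """Greedily pick up to k examples, each adding the most word types not yet covered."""
    selected: List[Example] = []
    seen: Set[str] = set()
    remaining = list(pool)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda ex: len(set(tokens(ex[0])) - seen))
        selected.append(best)
        seen.update(tokens(best[0]))
        remaining.remove(best)
    return selected


def build_prompt(shots: List[Example], query: str) -> str:
    """Hypothetical prompt layout: each shot is a commented question followed by its SQL."""
    blocks = [f"-- Question: {question}\n{logical_form}" for question, logical_form in shots]
    blocks.append(f"-- Question: {query}\n")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Toy pool with a hypothetical schema; the first two questions overlap heavily.
    pool = [
        ("show all flights", "SELECT * FROM flights"),
        ("show all flights please", "SELECT * FROM flights"),
        ("which airlines serve denver today", "SELECT name FROM airlines WHERE city = 'denver'"),
    ]
    shots = select_diverse(pool, 2)
    print(type_token_ratio(pool[:2]), type_token_ratio(shots))
    print(build_prompt(shots, "show flights to boston"))
```

Taking the first two pool entries gives a type-token ratio of 4/7, while the greedy selection picks two questions with no repeated words (ratio 1.0). Greedy coverage is used only because it is short and deterministic; random sampling or similarity-based retrieval would slot into the same `select_diverse` signature.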
Created on 20 Sep. 2023
