Linear Matching of JavaScript Regular Expressions

AI-generated keywords: JavaScript

AI-generated Key Points

  • The paper discusses complexities and vulnerabilities in modern regex languages, particularly in JavaScript applications.
  • It highlights the evolution of regex languages, leading to exponential complexity blowups and denial-of-service vulnerabilities.
  • The study explores differences in regex semantics across languages and their impact on algorithmic design and worst-case matching complexity.
  • The authors identify a larger-than-previously-understood subset of JavaScript's regex language that can be matched with linear time guarantees.
  • New algorithms restore correctness and linear complexity in cases where state-of-the-art algorithms were semantically incorrect, superlinear, or overly restrictive.
  • The first nonbacktracking algorithms for matching lookarounds in linear time are described: one supports captureless lookbehinds in any regex language, and another leverages a JavaScript property to support unrestricted lookaheads and lookbehinds.
  • New time and space complexity tradeoffs for regex engines are presented with practical solutions validated through a prototype implementation.
  • Some algorithms have been integrated into the V8 JavaScript implementation used in Chrome and Node.js.

Authors: Aurèle Barrière (EPFL), Clément Pit-Claudel (EPFL)

License: CC BY 4.0

Abstract: Modern regex languages have strayed far from well-understood traditional regular expressions: they include features that fundamentally transform the matching problem. In exchange for these features, modern regex engines at times suffer from exponential complexity blowups, a frequent source of denial-of-service vulnerabilities in JavaScript applications. Worse, regex semantics differ across languages, and the impact of these divergences on algorithmic design and worst-case matching complexity has seldom been investigated. This paper provides a novel perspective on JavaScript's regex semantics by identifying a larger-than-previously-understood subset of the language that can be matched with linear time guarantees. In the process, we discover several cases where state-of-the-art algorithms were either wrong (semantically incorrect), inefficient (suffering from superlinear complexity) or excessively restrictive (assuming certain features could not be matched linearly). We introduce novel algorithms to restore correctness and linear complexity. We further advance the state-of-the-art in linear regex matching by presenting the first nonbacktracking algorithms for matching lookarounds in linear time: one supporting captureless lookbehinds in any regex language, and another leveraging a JavaScript property to support unrestricted lookaheads and lookbehinds. Finally, we describe new time and space complexity tradeoffs for regex engines. All of our algorithms are practical: we validated them in a prototype implementation, and some have also been merged in the V8 JavaScript implementation used in Chrome and Node.js.
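To make the contrast between exponential backtracking and linear-time matching concrete, here is a minimal sketch. It is not the paper's algorithm or V8's implementation: the tiny bytecode format, the hand-compiled program for the anchored pattern `^(a|a)*b$`, and the names `Inst`, `prog`, `backtrack`, and `simulate` are all assumptions made for illustration. Both matchers run the same program and count their steps; one explores paths one at a time, the other tracks a set of program counters per input position in the style of a Thompson/Pike simulation.

```typescript
// Tiny regex bytecode (illustrative only; not taken from the paper or V8).
type Inst =
  | { op: "char"; c: string }
  | { op: "split"; x: number; y: number } // try x first, then y
  | { op: "jmp"; to: number }
  | { op: "match" };

// Hand-compiled program for the anchored pattern ^(a|a)*b$.
const prog: Inst[] = [
  { op: "split", x: 1, y: 6 }, // 0: enter the loop body or leave the loop
  { op: "split", x: 2, y: 4 }, // 1: first or second alternative
  { op: "char", c: "a" },      // 2
  { op: "jmp", to: 0 },        // 3
  { op: "char", c: "a" },      // 4
  { op: "jmp", to: 0 },        // 5
  { op: "char", c: "b" },      // 6
  { op: "match" },             // 7
];

// Backtracking: explores one path at a time and retries on failure.
function backtrack(p: Inst[], input: string): { matched: boolean; steps: number } {
  let steps = 0;
  const run = (pc: number, pos: number): boolean => {
    steps++;
    const inst = p[pc];
    switch (inst.op) {
      case "char":
        return pos < input.length && input[pos] === inst.c && run(pc + 1, pos + 1);
      case "split":
        return run(inst.x, pos) || run(inst.y, pos);
      case "jmp":
        return run(inst.to, pos);
      case "match":
        return pos === input.length;
    }
  };
  const matched = run(0, 0);
  return { matched, steps };
}

// State-set simulation: each program counter is added at most once per position,
// so the work is bounded by (program size) x (input length + 1).
function simulate(p: Inst[], input: string): { matched: boolean; steps: number } {
  let steps = 0;
  const addThread = (list: number[], seen: boolean[], pc: number): void => {
    if (seen[pc]) return;
    seen[pc] = true;
    list.push(pc);
    steps++;
    const inst = p[pc];
    if (inst.op === "jmp") addThread(list, seen, inst.to);
    else if (inst.op === "split") {
      addThread(list, seen, inst.x);
      addThread(list, seen, inst.y);
    }
  };
  let current: number[] = [];
  addThread(current, [], 0);
  for (let pos = 0; pos < input.length; pos++) {
    const next: number[] = [];
    const nextSeen: boolean[] = [];
    for (const pc of current) {
      const inst = p[pc];
      if (inst.op === "char" && inst.c === input[pos]) addThread(next, nextSeen, pc + 1);
    }
    current = next;
  }
  const matched = current.some((pc) => p[pc].op === "match");
  return { matched, steps };
}

// Worst case for the backtracker: many "a"s and no final "b".
const worstCase = Array(17).join("a"); // 16 characters
const slow = backtrack(prog, worstCase);
const fast = simulate(prog, worstCase);
console.log(slow.matched, slow.steps, fast.matched, fast.steps);
```

In this sketch, the backtracker's step count roughly doubles with every extra `a`, while the simulation's step count grows in proportion to the input length. Real engines add captures, priorities, and many more features on top of this picture, which is where the semantic subtleties discussed in the abstract arise.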

Submitted to arXiv on 29 Nov. 2023


Results of the summarizing process for the arXiv paper: 2311.17620v1

The paper "Linear Matching of JavaScript Regular Expressions" examines the complexities and vulnerabilities of modern regex languages, particularly in JavaScript applications. It highlights how these languages have moved away from traditional regular expressions by adding features that can cause exponential complexity blowups and denial-of-service vulnerabilities. The study explores how regex semantics differ across languages and how those differences affect algorithmic design and worst-case matching complexity. The authors offer a new perspective on JavaScript's regex semantics by identifying a larger-than-previously-understood subset of the language that can be matched with linear time guarantees. They uncover cases where state-of-the-art algorithms are semantically incorrect, superlinear, or overly restrictive, and introduce novel algorithms that restore correctness and linear complexity. The paper also presents the first nonbacktracking algorithms for matching lookarounds in linear time: one supports captureless lookbehinds in any regex language, and another leverages a JavaScript property to support unrestricted lookaheads and lookbehinds. Finally, it describes new time and space complexity tradeoffs for regex engines. The algorithms were validated in a prototype implementation, and some have been merged into the V8 JavaScript implementation used in Chrome and Node.js.
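One way to picture why a captureless lookbehind need not require backtracking is sketched below. This is an illustration under stated assumptions, not the algorithm from the paper: the bytecode format, the hand-compiled body for the sub-regex `ab*`, and the names `Op`, `lookbehindBody`, `closure`, and `lookbehindTable` are hypothetical. The idea shown is to run the lookbehind body as an unanchored state-set simulation in a single left-to-right pass, recording for each position whether some substring ending there matches the body. A main matcher could then consult that table in constant time per position.

```typescript
// Minimal bytecode for the lookbehind body (all names here are hypothetical).
type Op =
  | { op: "char"; c: string }
  | { op: "split"; x: number; y: number }
  | { op: "jmp"; to: number }
  | { op: "match" };

// Hand-compiled body of the lookbehind (?<=ab*), i.e. the sub-regex ab*.
const lookbehindBody: Op[] = [
  { op: "char", c: "a" },      // 0
  { op: "split", x: 2, y: 4 }, // 1: another "b" or stop
  { op: "char", c: "b" },      // 2
  { op: "jmp", to: 1 },        // 3
  { op: "match" },             // 4
];

// Add pc and everything reachable through jmp/split, each pc at most once.
function closure(body: Op[], list: number[], seen: boolean[], pc: number): void {
  if (seen[pc]) return;
  seen[pc] = true;
  list.push(pc);
  const inst = body[pc];
  if (inst.op === "jmp") closure(body, list, seen, inst.to);
  else if (inst.op === "split") {
    closure(body, list, seen, inst.x);
    closure(body, list, seen, inst.y);
  }
}

// table[i] is true when some substring of input ending at position i matches body.
function lookbehindTable(body: Op[], input: string): boolean[] {
  const table: boolean[] = [];
  let current: number[] = [];
  let seen: boolean[] = [];
  for (let pos = 0; pos <= input.length; pos++) {
    // A match of the body may start at any position (implicit ".*" prefix).
    closure(body, current, seen, 0);
    table.push(current.some((pc) => body[pc].op === "match"));
    const next: number[] = [];
    const nextSeen: boolean[] = [];
    for (const pc of current) {
      const inst = body[pc];
      if (inst.op === "char" && pos < input.length && inst.c === input[pos]) {
        closure(body, next, nextSeen, pc + 1);
      }
    }
    current = next;
    seen = nextSeen;
  }
  return table;
}

// Usage: positions where (?<=ab*)c would match, i.e. the lookbehind holds
// at position i and the character at i is "c".
const text = "xabbc";
const table = lookbehindTable(lookbehindBody, text);
const hits: number[] = [];
for (let i = 0; i < text.length; i++) {
  if (table[i] && text[i] === "c") hits.push(i);
}
console.log(table, hits);
```

Because the body is simulated once over the whole input, the cost of this sketch is bounded by the body size times the input length. It deliberately ignores capture groups inside the lookbehind, which is the restriction the summary refers to as "captureless".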
Created on 14 Jun. 2024
