Linear Matching of JavaScript Regular Expressions
AI-generated Key Points
- Modern regex languages include features that fundamentally transform the matching problem, moving them far from traditional regular expressions.
- In exchange for these features, modern regex engines can suffer exponential complexity blowups, a frequent source of denial-of-service vulnerabilities in JavaScript applications.
- Regex semantics differ across languages, and the impact of these differences on algorithmic design and worst-case matching complexity has seldom been investigated.
- The authors identify a larger-than-previously-understood subset of JavaScript's regex language that can be matched with linear time guarantees.
- They find cases where state-of-the-art algorithms were semantically incorrect, superlinear, or excessively restrictive, and introduce new algorithms that restore correctness and linear complexity.
- They present the first nonbacktracking algorithms for matching lookarounds in linear time: one supports captureless lookbehinds in any regex language, and another leverages a JavaScript property to support unrestricted lookaheads and lookbehinds.
- New time and space complexity tradeoffs for regex engines are described, and all algorithms were validated in a prototype implementation.
- Some algorithms have been integrated into the V8 JavaScript implementation used in Chrome and Node.js.
Authors: Aurèle Barrière (EPFL), Clément Pit-Claudel (EPFL)
Abstract: Modern regex languages have strayed far from well-understood traditional regular expressions: they include features that fundamentally transform the matching problem. In exchange for these features, modern regex engines at times suffer from exponential complexity blowups, a frequent source of denial-of-service vulnerabilities in JavaScript applications. Worse, regex semantics differ across languages, and the impact of these divergences on algorithmic design and worst-case matching complexity has seldom been investigated. This paper provides a novel perspective on JavaScript's regex semantics by identifying a larger-than-previously-understood subset of the language that can be matched with linear time guarantees. In the process, we discover several cases where state-of-the-art algorithms were either wrong (semantically incorrect), inefficient (suffering from superlinear complexity) or excessively restrictive (assuming certain features could not be matched linearly). We introduce novel algorithms to restore correctness and linear complexity. We further advance the state-of-the-art in linear regex matching by presenting the first nonbacktracking algorithms for matching lookarounds in linear time: one supporting captureless lookbehinds in any regex language, and another leveraging a JavaScript property to support unrestricted lookaheads and lookbehinds. Finally, we describe new time and space complexity tradeoffs for regex engines. All of our algorithms are practical: we validated them in a prototype implementation, and some have also been merged in the V8 JavaScript implementation used in Chrome and Node.js.
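To make the contrast between backtracking and linear-time matching concrete, here is a minimal sketch. It is not one of the paper's algorithms: the toy automaton for `(a|a)*b`, the function names, and the step counters are all assumptions introduced for illustration. It counts the work done by a naive backtracking search and by a classic state-set simulation on the same input.

```typescript
// Hypothetical toy NFA for the regex (a|a)*b, matched against the whole input.
// State 0 loops on 'a' in two redundant ways; 'b' leads to accepting state 1.
type Transition = { from: number; ch: string; to: number };

const TRANSITIONS: Transition[] = [
  { from: 0, ch: "a", to: 0 },
  { from: 0, ch: "a", to: 0 }, // duplicate alternative, as in (a|a)
  { from: 0, ch: "b", to: 1 },
];
const ACCEPT = 1;

type Result = { matched: boolean; steps: number };

// Backtracking: try each alternative in turn and retry the next one on failure.
// `steps` counts how many (state, position) configurations are visited.
function matchBacktracking(input: string): Result {
  let steps = 0;
  function go(state: number, pos: number): boolean {
    steps++;
    if (pos === input.length) return state === ACCEPT;
    for (const t of TRANSITIONS) {
      if (t.from === state && t.ch === input[pos] && go(t.to, pos + 1)) {
        return true;
      }
    }
    return false;
  }
  const matched = go(0, 0);
  return { matched, steps };
}

// Nonbacktracking: advance the set of reachable states one character at a time.
// `steps` counts how many transitions are examined; this is at most
// (number of transitions) * (input length).
function matchStateSet(input: string): Result {
  let steps = 0;
  let current: boolean[] = [true, false]; // only state 0 is active initially
  for (let pos = 0; pos < input.length; pos++) {
    const next: boolean[] = [false, false];
    for (const t of TRANSITIONS) {
      steps++;
      if (current[t.from] && t.ch === input[pos]) next[t.to] = true;
    }
    current = next;
  }
  return { matched: current[ACCEPT] === true, steps };
}

// Ten 'a' characters and no final 'b': neither strategy finds a match,
// but the amount of work differs sharply.
const hard = "aaaaaaaaaa";
console.log(matchBacktracking(hard)); // steps grow exponentially with input length
console.log(matchStateSet(hard)); // steps grow linearly with input length
```

On this input the backtracking search doubles its work with each additional `a`, while the state-set simulation examines a fixed number of transitions per character. Production engines are far more involved; the sketch only shows why merging equivalent search paths bounds the work.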