Regular Expression Tester
Regular expressions (regex) are patterns for matching text. This tool analyzes regex syntax, explains pattern components, and provides a test area for trying matches against sample text. It supports JavaScript, PCRE, and Python regex formats.
Specifications
Common Use Cases
- Debug why a regex isn't matching expected text
- Learn regex syntax by seeing explanations
- Test patterns before adding to code
- Validate input patterns for form fields
- Extract data from structured text (logs, CSV, HTML)
Features
- Detect regex format (JavaScript /pattern/flags, PCRE #pattern#, Python r"pattern")
- Explain pattern components in plain English
- Test against sample text with match highlighting
- Single match or find all matches mode
- Display capture groups and their contents
- Named capture group display
- Show match positions and indices
- Regex execution time display
- Pattern feature badges (anchors, quantifiers, etc.)
- Pattern warnings
- Library of common patterns (email, URL, phone, etc.)
Tips
- Use online testers to debug complex patterns before production.
- Escape special characters (. ^ $ * + ? { } [ ] \ | ( )) with a backslash to match them literally.
- Named capture groups (?<name>...) make matches more readable.
- In JavaScript, the g flag finds all matches; without it, only the first match is returned.
- Greedy quantifiers (* +) match as much text as possible; use the non-greedy forms (*? +?) when you want the shortest possible match.
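The tips on escaping, the g flag, and named groups can be sketched in JavaScript as follows. This is a minimal illustration, not part of the tool itself: escapeRegExp is a hypothetical helper name, and the sample strings are invented for the example.

```javascript
// Hypothetical helper: prefix every regex metacharacter with a backslash
// so that user-supplied text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const literal = new RegExp(escapeRegExp("$9.99"));
console.log(literal.source);               // \$9\.99
console.log(literal.test("cost: $9.99"));  // true
console.log(literal.test("cost: $9x99"));  // false, the dot is no longer a wildcard

// Without g, match() returns only the first match; with g, it returns all of them.
const sample = "a1 b22 c333";
const firstOnly = sample.match(/\d+/);
const allMatches = sample.match(/\d+/g);
console.log(firstOnly[0]);  // 1
console.log(allMatches);    // [ '1', '22', '333' ]

// Named capture groups are read by name instead of by position.
const date = "2024-03-15".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(date.groups.year, date.groups.month, date.groups.day); // 2024 03 15
```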
Understanding Regular Expressions
Regular expressions (regex) are a pattern matching language used across virtually every programming language, text editor, and command-line tool. A regex defines a search pattern using a combination of literal characters and metacharacters that match classes of strings. They are essential for input validation, text extraction, search-and-replace operations, and log analysis.
The core syntax includes character classes ([a-z], \d, \w, \s), quantifiers (*, +, ?, {n,m}), anchors (^, $, \b), alternation (|), and grouping with parentheses. Capture groups extract matched substrings and enable backreferences. Lookaheads (?=...) and lookbehinds (?<=...) match positions without consuming characters, enabling complex assertions about surrounding context.
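A short JavaScript sketch of the core syntax described above. The variable names and sample strings are illustrative assumptions, chosen only to exercise one feature at a time.

```javascript
// Character class + quantifier + anchors: the whole string must be 3 to 5 lowercase letters.
const shortWord = /^[a-z]{3,5}$/;
console.log(shortWord.test("regex"));    // true
console.log(shortWord.test("regexes"));  // false, 7 letters

// Shorthand classes \w, \s, \d and the \b word-boundary anchor.
const wordThenNumber = /\b\w+\s\d+\b/;
console.log("order 66 executed".match(wordThenNumber)[0]); // order 66

// Alternation inside a capture group; ? makes the "e" optional.
const imageExt = /\.(png|jpe?g|gif)$/;
console.log("photo.jpeg".match(imageExt)[1]); // jpeg
console.log(imageExt.test("notes.txt"));      // false

// Backreference: \1 must repeat whatever group 1 captured (finds a doubled word).
const doubledWord = /\b(\w+) \1\b/;
console.log("it was was late".match(doubledWord)[1]); // was
```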
JavaScript's regex flavor has its own set of flags: g (global), i (case-insensitive), m (multiline), s (dotAll), u (Unicode), and y (sticky). The u flag is increasingly important for handling Unicode text correctly: without it, patterns operate on UTF-16 code units, so characters outside the Basic Multilingual Plane (such as emoji) are treated as two separate units. Named capture groups (?<name>...) improve readability over numbered groups.
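The effect of each flag can be seen in a small JavaScript sketch. The sample text is invented for the example, and a reasonably modern engine (one that supports the s and u flags and Unicode property escapes) is assumed.

```javascript
const logText = "Error: disk full\nerror: retry";

// g + i: all matches, ignoring case.
const anyCase = logText.match(/error/gi);
console.log(anyCase); // [ 'Error', 'error' ]

// m: ^ matches at the start of every line, not only at the start of the string.
const withM = logText.match(/^\w+/gm);
const withoutM = logText.match(/^\w+/g);
console.log(withM.length, withoutM.length); // 2 1

// s: the dot also matches line terminators.
console.log(/full.error/.test(logText));   // false
console.log(/full.error/s.test(logText));  // true

// y: the match must begin exactly at lastIndex.
const sticky = /disk/y;
sticky.lastIndex = 7;
const atSeven = sticky.test(logText);  // true, "disk" starts at index 7
sticky.lastIndex = 0;
const atZero = sticky.test(logText);   // false, index 0 holds "Error"
console.log(atSeven, atZero);

// u: a character outside the BMP is one character, not two code units.
const emoji = "\u{1F600}";
console.log(emoji.length);         // 2 (UTF-16 code units)
console.log(/^.$/.test(emoji));    // false
console.log(/^.$/u.test(emoji));   // true
console.log(/^\p{Letter}+$/u.test("na\u00EFve")); // true
```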
Common regex tasks include email validation, URL parsing, extracting data from structured text (log files, CSV), and replacing patterns during code refactoring. However, regex has limits: it cannot reliably parse arbitrarily nested structures like HTML or JSON (use a proper parser), and overly complex patterns become unreadable and unmaintainable. The joke "now you have two problems" reflects real maintenance challenges with complex regex.
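As one illustration of extracting data from structured text, the sketch below pulls fields out of a log line with named capture groups. The log format ("date LEVEL message"), the LOG_LINE pattern, and the parseLogLine helper are all assumptions made for this example.

```javascript
// Assumed log format: "<YYYY-MM-DD> <LEVEL> <message>"
const LOG_LINE = /^(?<date>\d{4}-\d{2}-\d{2}) (?<level>[A-Z]+) (?<message>.*)$/;

// Hypothetical helper: returns the named fields, or null when the line does not fit.
function parseLogLine(line) {
  const match = LOG_LINE.exec(line);
  return match ? { ...match.groups } : null;
}

const entry = parseLogLine("2024-03-15 WARN disk usage at 91%");
console.log(entry.date);    // 2024-03-15
console.log(entry.level);   // WARN
console.log(entry.message); // disk usage at 91%

console.log(parseLogLine("not a log line")); // null
```

Returning null for non-matching lines keeps the caller in charge of how malformed input is handled, which tends to be easier to maintain than a more permissive pattern.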
HTML allows elements to nest to arbitrary depth, which classical regular expressions fundamentally cannot track. A regex cannot correctly pair opening and closing tags at arbitrary nesting depth. Simple cases may seem to work but break on edge cases involving nested tags, attributes containing >, or comments. For HTML processing, use a DOM parser such as DOMParser in the browser, cheerio for Node.js, or BeautifulSoup for Python.
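The failure mode is easy to reproduce. The JavaScript sketch below uses invented markup and a hypothetical countTags helper to show that neither a lazy nor a greedy pattern pairs tags correctly once elements nest or repeat.

```javascript
// Hypothetical helper: count opening and closing <div> tags in a fragment.
function countTags(fragment) {
  const opens = (fragment.match(/<div>/g) || []).length;
  const closes = (fragment.match(/<\/div>/g) || []).length;
  return { opens, closes };
}

// Nested elements: the lazy pattern stops at the first closing tag.
const nested = "<div>outer <div>inner</div> tail</div>";
const lazyMatch = nested.match(/<div>.*?<\/div>/)[0];
console.log(lazyMatch);            // <div>outer <div>inner</div>
console.log(countTags(lazyMatch)); // { opens: 2, closes: 1 }, unbalanced

// Sibling elements: the greedy pattern merges two elements into one match.
const siblings = "<div>a</div><div>b</div>";
const greedyMatch = siblings.match(/<div>.*<\/div>/)[0];
console.log(greedyMatch === siblings); // true
```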
The difference between .* (greedy) and .*? (lazy) quantifiers is significant for matching behavior. Greedy .* matches as many characters as possible, then backtracks if needed. Lazy .*? matches as few characters as possible. In a string like "<b>one</b><b>two</b>", the greedy pattern <b>.*</b> matches the entire string, while the lazy pattern <b>.*?</b> matches only the first "<b>one</b>". Choosing the wrong mode is one of the most common sources of unexpected regex behavior.
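The example string above can be checked directly in JavaScript. The last pattern, which uses a negated character class instead of a lazy quantifier, is an additional variant shown for comparison and is not taken from the text above.

```javascript
const boldHtml = "<b>one</b><b>two</b>";

// Greedy: .* runs to the end of the string, then backtracks to the last </b>.
const greedyResult = boldHtml.match(/<b>.*<\/b>/)[0];
console.log(greedyResult); // <b>one</b><b>two</b>

// Lazy: .*? stops at the first </b> it can reach.
const lazyResult = boldHtml.match(/<b>.*?<\/b>/)[0];
console.log(lazyResult); // <b>one</b>

// Lazy plus the g flag yields each element separately.
const eachBold = boldHtml.match(/<b>.*?<\/b>/g);
console.log(eachBold); // [ '<b>one</b>', '<b>two</b>' ]

// Variant: a negated class gives the same result here without relying on laziness.
const negatedClass = boldHtml.match(/<b>[^<]*<\/b>/g);
console.log(negatedClass); // [ '<b>one</b>', '<b>two</b>' ]
```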
In JavaScript, the u flag (/pattern/u) enables Unicode-aware matching. Without it, a pattern operates on UTF-16 code units, so the . metacharacter matches only half of a character outside the Basic Multilingual Plane (such as an emoji). The u flag makes ., quantifiers, and character classes treat a surrogate pair as a single character and enables Unicode property escapes like \p{Letter}. Note that \w remains essentially ASCII-only even with the flag; use a property escape such as \p{Letter} to match letters in other scripts.
Lookaheads (?=...) and lookbehinds (?<=...) are zero-width assertions that check for patterns at a position without including them in the match. Positive lookahead (?=...) asserts what must follow, while negative lookahead (?!...) asserts what must not follow. For example, \d+(?= dollars) matches numbers that are followed by " dollars", but the match contains only the number itself. Lookbehinds work the same way but check the preceding text.
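A JavaScript sketch of the lookaround assertions described above, using invented sample strings. It assumes an engine with lookbehind support, which older runtimes may lack. The \b anchors in the negative examples are a deliberate addition: without them the engine can backtrack and match part of a number.

```javascript
const payment = "paid 40 dollars and 15 euros";

// Positive lookahead: digits followed by " dollars"; the suffix is not part of the match.
const dollars = payment.match(/\d+(?= dollars)/)[0];
console.log(dollars); // 40

// Negative lookahead: whole numbers NOT followed by " dollars".
const notDollars = payment.match(/\b\d+\b(?! dollars)/g);
console.log(notDollars); // [ '15' ]

const order = "3 items at $25";

// Positive lookbehind: digits preceded by a dollar sign.
const priced = order.match(/(?<=\$)\d+/)[0];
console.log(priced); // 25

// Negative lookbehind: whole numbers NOT preceded by a dollar sign.
const unpriced = order.match(/(?<!\$)\b\d+/g);
console.log(unpriced); // [ '3' ]
```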