regex.pcre Module Documentation

The regex.pcre module is a high-performance regular expression engine for V, built on a virtual machine (VM).

Key Features

Supported Syntax

| Feature | Syntax | Description |
| :--- | :--- | :--- |
| Literals | abc | Matches exact characters (UTF-8 supported). |
| Wildcard | . | Matches any character (excluding \n unless (?s) flag is used). |
| Alternation | \| | Matches the left OR right expression (e.g., cat\|dog). |
| Quantifiers | *, +, ? | Matches 0+, 1+, or 0-1 times. |
| Lazy | *?, +?, ?? | Non-greedy versions of the above. |
| Repetition | {m,n} | Matches between m and n times. {m,} for m or more. |
| Groups | (...) | Capturing group. |
| | (?:...) | Non-capturing group. |
| | (?P<name>...) | Named capturing group. |
| Anchors | ^, $ | Start/End of string (or line with (?m)). |
| | \b, \B | Word boundary and Non-word boundary. |
| Classes | [abc], [^abc] | Character set and Negated character set. |
| | [a-z] | Range of characters. |
| | \w, \W | Word / Non-word ([a-zA-Z0-9_]). |
| | \d, \D | Digit / Non-digit. |
| | \s, \S | Whitespace / Non-whitespace ( \t\n\r\v\f). |
| | \a, \A | Lowercase / Uppercase ASCII character class. |
| Flags | (?i) | Case-insensitive matching. |
| | (?m) | Multiline mode (^ and $ match start/end of lines). |
| | (?s) | Dot-all mode (. matches newlines). |
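
A few rows of the table can be tried out with a short sketch. It assumes that compile and find behave as documented under Core Functions in this README; first_match is a hypothetical helper introduced only for illustration, and the sample strings are made up.

```v
import regex.pcre

// first_match is a hypothetical helper: it returns the text of the first
// match of pattern in text, or an empty string when the pattern does not
// compile or nothing matches.
fn first_match(pattern string, text string) string {
    r := pcre.compile(pattern) or { return '' }
    m := r.find(text) or { return '' }
    return m.text
}

fn main() {
    // (?i): the literal should match regardless of case.
    println(first_match(r'(?i)hello', 'Say HELLO'))
    // Greedy vs. lazy: .+ takes as much as it can, .+? as little as it can.
    println(first_match(r'<.+>', '<a><b>'))
    println(first_match(r'<.+?>', '<a><b>'))
    // Alternation: either side may match.
    println(first_match(r'cat|dog', 'hot dog'))
}
```

Raw strings (r'...') keep the backslashes and other special characters in a pattern from being interpreted by V itself.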


Structs

Regex

The compiled regular expression object.

pub struct Regex {
pub:
    pattern      string         // The original pattern
    prog         []Inst         // Compiled VM bytecode
    total_groups int            // Number of capture groups
    group_map    map[string]int // Map for named groups
}

Match

Represents the result of a successful search.

pub struct Match {
pub:
    text   string   // The full substring that matched
    start  int      // Byte index where match starts
    end    int      // Byte index where match ends
    groups []string // Text captured by each group
}
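
As a minimal sketch of how the fields of Match might be read, the example below formats a match using only the public fields listed above. The helper describe and the sample input are hypothetical. The struct comments do not say whether end is exclusive or how groups is indexed, so the code makes no assumption about either.

```v
import regex.pcre

// describe is a hypothetical helper that formats a Match using only its
// documented public fields.
fn describe(m pcre.Match) string {
    return '"${m.text}" at bytes ${m.start}..${m.end}, ${m.groups.len} group(s)'
}

fn main() {
    r := pcre.compile(r'(\w+)@(\w+)')!
    m := r.find('mail: user@example') or { return }
    println(describe(m))
    // Print every entry of groups together with its index rather than
    // assuming which index holds the first capture.
    for i, g in m.groups {
        println('groups[${i}] = ${g}')
    }
}
```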

Core Functions

compile

Compiles a pattern into a Regex object.

fn compile(pattern string) !Regex
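
Because compile returns a result type (!Regex), callers can either propagate the error with ! or handle it in an or block. The sketch below shows the second option; try_compile is a hypothetical helper, and the assumption that an unbalanced group is rejected is not confirmed by this README.

```v
import regex.pcre

// try_compile is a hypothetical helper: it reports whether a pattern
// compiles and prints the error when it does not.
fn try_compile(pattern string) bool {
    r := pcre.compile(pattern) or {
        eprintln('cannot compile "${pattern}": ${err}')
        return false
    }
    println('compiled "${r.pattern}"')
    return true
}

fn main() {
    println(try_compile(r'(\d+)-(\d+)'))
    // Assumption: an unbalanced group makes compile return an error.
    println(try_compile(r'(\d+'))
}
```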

find

Finds the first match in the text. Returns none if no match is found.

fn (r Regex) find(text string) ?Match
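
Since find returns an option (?Match), the no-match case has to be handled explicitly. One way to do that, sketched here with a hypothetical helper contains_match, is V's if-guard form:

```v
import regex.pcre

// contains_match is a hypothetical helper: it reports whether r matches
// anywhere in text, printing where the first match starts.
fn contains_match(r pcre.Regex, text string) bool {
    if m := r.find(text) {
        println('found "${m.text}" at byte ${m.start}')
        return true
    }
    // find returned none: nothing in text matched.
    return false
}

fn main() {
    r := pcre.compile(r'\d+')!
    println(contains_match(r, 'order 42'))
    println(contains_match(r, 'no digits here'))
}
```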

find_all

Returns all non-overlapping matches in a string.

fn (r Regex) find_all(text string) []Match
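
A minimal sketch of iterating over the result, assuming find_all returns an empty array when nothing matches; all_texts is a hypothetical helper and the input is made up.

```v
import regex.pcre

// all_texts is a hypothetical helper that collects the matched text of
// every non-overlapping match.
fn all_texts(r pcre.Regex, text string) []string {
    mut out := []string{}
    for m in r.find_all(text) {
        out << m.text
    }
    return out
}

fn main() {
    r := pcre.compile(r'\d+')!
    println(all_texts(r, 'a1 b22 c333'))
}
```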

replace

Replaces the first match in text with repl. Supports backreferences like $1, $2.

fn (r Regex) replace(text string, repl string) string
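
The following sketch swaps the two sides of the first key=value pair using backreferences. swap_first_pair is a hypothetical helper. The replacement is written as a raw string so that the $ signs reach replace unchanged.

```v
import regex.pcre

// swap_first_pair is a hypothetical helper: it swaps the two sides of the
// first key=value pair, returning text unchanged if the pattern fails to
// compile.
fn swap_first_pair(text string) string {
    r := pcre.compile(r'(\w+)=(\w+)') or { return text }
    // $1 and $2 refer to the first and second capture groups.
    return r.replace(text, r'$2=$1')
}

fn main() {
    // Only the first match is replaced; later pairs are left alone.
    println(swap_first_pair('a=1, b=2'))
}
```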

change_stack_depth

Updates the maximum backtracking depth for the VM. The default is 1024. Increase it if a very complex pattern causes find to return none prematurely.

fn (mut r Regex) change_stack_depth(depth int)
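
Because the receiver is mutable, the Regex must be declared with mut before the depth can be changed. A minimal sketch follows; find_with_depth is a hypothetical helper, and 4096 is an arbitrary illustrative value rather than a recommendation.

```v
import regex.pcre

// find_with_depth is a hypothetical helper: it compiles pattern, sets the
// backtracking depth, and returns the text of the first match.
fn find_with_depth(pattern string, text string, depth int) ?string {
    // mut is required because change_stack_depth has a mutable receiver.
    mut r := pcre.compile(pattern) or { return none }
    r.change_stack_depth(depth)
    m := r.find(text)?
    return m.text
}

fn main() {
    result := find_with_depth(r'(a|b)*c', 'ababc', 4096) or { 'no match' }
    println(result)
}
```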

Named Groups Example

import regex.pcre

fn main() {
    r := pcre.compile(r'(?P<year>\d{4})-(?P<month>\d{2})')!
    m := r.find('Date: 2026-02') or { return }

    year := r.group_by_name(m, 'year')
    month := r.group_by_name(m, 'month')
    println('Year: ${year}, Month: ${month}') // Year: 2026, Month: 02
}

PCRE Compatibility Layer

To facilitate easier migration from other engines, a compatibility layer is provided:

| Function | Equivalent to / behavior |
| :--- | :--- |
| new_regex(pattern, flags) | compile(pattern) |
| r.match_str(text, start, flags) | r.find_from(text, start) |
| m.get(idx) | Retrieves match text (0) or capture group (1+). |
| m.get_all() | Returns [full_match, group1, group2, ...] |

Example:

import regex.pcre

r := pcre.new_regex(r'(\w+) (\w+)', 0)!
if m := r.match_str('hello world', 0, 0) {
    println(m.get(0)?) // "hello world"
    println(m.get(1)?) // "hello"
    println(m.get(2)?) // "world"
}
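
get_all can be sketched in the same way. The example assumes it returns an array of strings in the order given in the table (full match first, then each group); the exact return type is not stated in this README.

```v
import regex.pcre

fn main() {
    r := pcre.new_regex(r'(\w+) (\w+)', 0)!
    m := r.match_str('hello world', 0, 0) or { return }
    // Assumption: get_all returns []string, full match first.
    for i, part in m.get_all() {
        println('${i}: ${part}')
    }
}
```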

Performance Note

The following optimizations are implemented in the code: