Regular Expressions

Regular expressions are an important tool for defining and testing patterns, making them useful in a range of policy use cases. Regular expressions enable specifying and enforcing rules on text data, such as validating input formats or extracting relevant substrings for further processing.

Rego's regular expression functions use the RE2 syntax [1], which is designed for safety and predictable performance. RE2 guarantees matching in time linear in the size of the input by omitting features such as backreferences and lookaround, which makes it well suited to performance-sensitive environments like policy evaluation.

Here is a simple rule based on a regular expression:

email_valid := regex.match(`^[^@]+@[^@]+\.[^@]+$`, "name@example.com")

In this example, email_valid is true because the email matches the pattern. Note that the pattern is written as a raw string (delimited by backticks), a common practice that avoids the need to escape backslashes and other special characters [2].
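
To make the raw string point concrete, the sketch below writes the same pattern twice, once as a raw string and once as a regular double-quoted string. The rule names and the sample value are illustrative only.

```rego
package play

import rego.v1

# Raw string: backslashes reach the regex engine unchanged.
raw_pattern := regex.match(`^\d+\.\d+$`, "3.14") # true

# Regular string: every backslash must itself be escaped.
escaped_pattern := regex.match("^\\d+\\.\\d+$", "3.14") # true
```

Both rules evaluate the same regular expression; the raw form is simply easier to read and harder to get wrong.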

While regular expressions are useful in many policies, it's important to consider performance and readability. For simple string operations, such as checking for a substring or performing exact matches, Rego's built-in string matching functions can be faster and easier to read by non-developers.
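
As a rough illustration of that trade-off, the following sketch compares a regex prefix check with the startswith, endswith and contains built-ins. The path value and rule names are made up for the example.

```rego
package play

import rego.v1

path := "/news/2024/launch.html"

# Regex version of a prefix check.
is_news_regex := regex.match(`^/news/`, path) # true

# The same check with a plain string built-in.
is_news_prefix := startswith(path, "/news/") # true

# Suffix and substring checks need no regex either.
is_html := endswith(path, ".html") # true

mentions_launch := contains(path, "launch") # true
```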

tip

Check out regex101.com and use the RE2 syntax to test your Rego patterns in a visual way.

Function | Description | Meta
regex.find_all_string_submatch_n

output := regex.find_all_string_submatch_n(pattern, value, number)

Returns all successive matches of the expression.

Arguments:
pattern (string)

regular expression

value (string)

string to match

number (number)

number of matches to return; -1 means all matches

Returns:
output (array[array[string]])

array of all matches

Wasm
regex.find_n

output := regex.find_n(pattern, value, number)

Returns the specified number of matches when matching the input against the pattern.

Arguments:
pattern (string)

regular expression

value (string)

string to match

number (number)

number of matches to return, if -1, returns all matches

Returns:
output (array[string])

collected matches

SDK-dependent
regex.globs_match

result := regex.globs_match(glob1, glob2)

Checks if the intersection of two glob-style regular expressions matches a non-empty set of non-empty strings. The set of regex symbols is limited for this builtin: only ., *, +, [, -, ] and \ are treated as special symbols.

Arguments:
glob1 (string)

first glob-style regular expression

glob2 (string)

second glob-style regular expression

Returns:
result (boolean)

true if the intersection of glob1 and glob2 matches a non-empty set of non-empty strings

SDK-dependent
regex.is_valid

result := regex.is_valid(pattern)

Checks if a string is a valid regular expression: the detailed syntax for patterns is defined by https://github.com/google/re2/wiki/Syntax.

Arguments:
pattern (string)

regular expression

Returns:
result (boolean)

true if pattern is a valid regular expression

v0.23.0 Wasm
regex.match

result := regex.match(pattern, value)

Matches a string against a regular expression.

Arguments:
pattern (string)

regular expression

value (string)

value to match against pattern

Returns:
result (boolean)

true if value matches pattern

v0.23.0 Wasm
regex.replace

output := regex.replace(s, pattern, value)

Finds and replaces the text matching the regular expression pattern.

Arguments:
s (string)

string being processed

pattern (string)

regex pattern to be applied

value (string)

replacement string

Returns:
output (string)

string with replaced substrings

v0.45.0 SDK-dependent
regex.split

output := regex.split(pattern, value)

Splits the input string by the occurrences of the given pattern.

Arguments:
pattern (string)

regular expression

value (string)

string to match

Returns:
output (array[string])

the parts obtained by splitting value

SDK-dependent
regex.template_match

result := regex.template_match(template, value, delimiter_start, delimiter_end)

Matches a string against a pattern, where the pattern may be glob-like.

Arguments:
template (string)

template expression containing 0..n regular expressions

value (string)

string to match

delimiter_start (string)

start delimiter of the regular expression in template

delimiter_end (string)

end delimiter of the regular expression in template

Returns:
result (boolean)

true if value matches the template

SDK-dependent
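
Several functions in the table above have no dedicated example further down. The sketch below exercises regex.find_n, regex.split, regex.replace and regex.is_valid on small made-up inputs; the rule names and sample strings are assumptions for illustration, not taken from the original examples.

```rego
package play

import rego.v1

# regex.find_n returns at most n matches; -1 returns all of them.
first_two_numbers := regex.find_n(`\d+`, "a1 b22 c333", 2) # ["1", "22"]

all_numbers := regex.find_n(`\d+`, "a1 b22 c333", -1) # ["1", "22", "333"]

# regex.split cuts the value wherever the pattern matches.
parts := regex.split(`\s*,\s*`, "read, write ,delete") # ["read", "write", "delete"]

# regex.replace takes the string first, then the pattern, then the replacement.
collapsed := regex.replace("a  b \t c", `\s+`, " ") # "a b c"

# regex.is_valid reports whether a pattern compiles under RE2.
valid_pattern := regex.is_valid(`[a-z]+`) # true

unclosed_class := regex.is_valid(`[a-z`) # false

# RE2 has no backreferences, so this pattern is rejected.
backreference := regex.is_valid(`(a)\1`) # false
```

Note the argument order of regex.replace, which differs from the other functions: the string being processed comes before the pattern.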

Examples

match

regex.match() is a commonly used built-in function that checks if a string matches a given regular expression pattern. The function returns true if the string matches the pattern and false otherwise.

Some examples of policy use cases where regex.match() might be used include:

  • Validating formats, such as ensuring an email address follows a specific pattern or checking if a credit card number matches common formats.
  • Matching HTTP paths to specific patterns for routing or access control purposes.

Pattern email validation

Validating emails with regular expressions is a common policy task. Full email validation involves more than checking that an address matches a pattern, but since a Rego policy is often the first point of contact, a pattern-based check is still worthwhile: it surfaces mistakes to users early.

regex.match is the best way to validate emails in Rego.

policy.rego
package play

import rego.v1

example_email_1 := "foo [at] example.com"

example_email_2 := "foo@example.com"

match_1 := regex.match(`^[^@]+@[^@]+\.[^@]+$`, example_email_1)

match_2 := regex.match(`^[^@]+@[^@]+\.[^@]+$`, example_email_2)

match_3 := regex.match(`^[^@]+@[^@]+\.[^@]+$`, input.email)
input.json
{
    "email": "hello at example.com"
}
data.json
{}

Path-based access

Managing access control in web applications is crucial for security. The following example uses Rego's regex.match to define role-based access to different URL paths. By associating URL patterns with user roles like "intern" and "admin," it ensures that users only access authorized paths.

policy.rego
package play

import rego.v1

news_pattern := `^/news/.*`

admin_pattern := `^/admin/.*`

path_patterns := {
    "intern": {news_pattern},
    "admin": {news_pattern, admin_pattern},
}

default allow := false

allow if {
    some pattern in path_patterns[input.role]
    regex.match(pattern, input.path)
}
input.json
{
    "role": "intern",
    "path": "/admin/staff/123/salary"
}
data.json
{}

Validating user text input

Text provided by users is often unstructured and untrusted. To ensure that the data is both safe to use and error-free, regex.match() can be used to validate the data against a simple pattern.

data.json
{}
input.json
{}
policy.rego
package play

import rego.v1

name_pattern := `^(\p{L}+\s?)+\p{L}+$`

valid_name1 := regex.match(name_pattern, "Juan Pérez")

valid_name2 := regex.match(name_pattern, "张伟")

invalid_name1 := regex.match(name_pattern, "Juan ")

invalid_name2 := regex.match(name_pattern, "- 张伟")

Case insensitive matching

Sometimes data can be supplied in a variety of cases, and matches need to be the same regardless of case. One example of this is matching GitHub usernames.

This is where the (?i) modifier comes in. In the following example we can see how repos with different cases are matched.

policy.rego
package play

import rego.v1

matching_repos contains repo if {
    some repo, url in input.repos

    regex.match(`(?i)^github\.com/styrainc/`, url)
}
input.json
{
    "repos": {
        "regal": "github.com/styrainc/regal",
        "demos": "github.com/StyraInc/opa-sdk-demos",
        "enterprise-opa": "github.com/styrainc/enterprise-opa",
        "opa": "github.com/open-policy-agent/opa"
    }
}
data.json
{}

tip

Here are the common modifiers for regular expressions:

Flag | Description
i    | case-insensitive (default false)
m    | multi-line mode: ^ and $ match begin/end line in addition to begin/end text
s    | let . match \n (default false)

Read more here on the RE2 Wiki.
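
The effect of the m and s flags is easiest to see side by side. The following is a small sketch on a made-up two-line string; the rule names are illustrative.

```rego
package play

import rego.v1

text := "allow\ndeny"

# Without (?m), ^ and $ only match at the start and end of the whole text.
whole_text_only := regex.match(`^deny$`, text) # false

# With (?m), ^ and $ also match at the start and end of each line.
per_line := regex.match(`(?m)^deny$`, text) # true

# By default, . does not match a newline.
dot_default := regex.match(`^allow.deny$`, text) # false

# With (?s), . matches a newline as well.
dot_newline := regex.match(`(?s)^allow.deny$`, text) # true

# Flags can be combined in a single group.
combined := regex.match(`(?is)^ALLOW.DENY$`, text) # true
```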

template_match

regex.template_match() is an advanced function for matching inputs against complex patterns. Sometimes, an input string needs to be validated as a series of distinct components. This function allows you to offer patterns to validate specific parts of the string separately.

warning

Before continuing, make sure your use case is not solved by the simpler regex.match() or glob.match functions.

These functions are easier to use and thus less error-prone for simpler use cases.

Advanced path pattern matching

In the example that follows, we have a complex path that contains an AWS ARN owned by a project with a UUID v4 identifier. The path is validated in two parts using two separate patterns, each confined to a particular segment of the path.

data.json
{}
input.json
{}
policy.rego
package play

import rego.v1

uuid_v4_pattern := `[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}`

aws_arn_pattern := `arn:(aws[a-zA-Z-]*):([a-zA-Z0-9-]+):([a-zA-Z0-9-]*):([0-9]*):([a-zA-Z0-9-:/]+)`

path := "/projects/10ceef56-2b18-4cf7-895f-14d2dc45cc66/arn:aws:ec2:us-west-2:123456789012:instance/i-1234567890abcdef0"

path_pattern_template := sprintf("/projects/{%s}/{%s}", [
    uuid_v4_pattern,
    aws_arn_pattern,
])

matches := regex.template_match(path_pattern_template, path, "{", "}")
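
For a smaller picture of how the template and delimiters interact, here is a minimal sketch with shorter templates. The templates, values and rule names are hypothetical; the second pair uses < and > as delimiters, which can be easier to read when the embedded patterns themselves contain braces such as {8}.

```rego
package play

import rego.v1

# Text outside the delimiters is literal; text inside is a regular expression.
urn_match := regex.template_match("urn:foo:{.*}", "urn:foo:bar:baz", "{", "}") # true

# The delimiters are ordinary arguments, so other characters can be used.
numeric_id := regex.template_match("/users/<[0-9]+>/profile", "/users/42/profile", "<", ">") # true

non_numeric_id := regex.template_match("/users/<[0-9]+>/profile", "/users/abc/profile", "<", ">") # false
```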

find_all_string_submatch_n

regex.find_all_string_submatch_n() is an advanced function for matching inputs against patterns with capture groups. This function returns a list of matches, where each match is itself a list of strings containing the full match followed by each of the submatches.

warning

Before continuing, make sure your use case is not solved by the simpler regex.match() function.

This function is easier to use and thus less error prone for simpler use cases.

Controlling Plus Addressing in Emails

In the example that follows, we show a policy that uses the regex.find_all_string_submatch_n built-in to extract the 'plus suffix', if present, from an email address.

This policy ensures that plus addresses are only permitted for use by internal users to avoid potential abuse.

policy.rego
package play

import rego.v1

internal_domain := "example.com"

allow if count(deny) == 0

deny contains "plus addressing not allowed unless internal" if {
    email_matches[1] != ""
    email_matches[2] != internal_domain
}

email_matches := regex.find_all_string_submatch_n(`^[^+@]+(\+[^@]*)?@([^@]+)$`, input.email, 1)[0]
input.json
{
    "email": "foo+test@example.com"
}
data.json
{}
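
To see what the policy above is indexing into, the sketch below calls regex.find_all_string_submatch_n directly on three sample addresses. The last rule is a hypothetical addition, not part of the original policy: when the pattern does not match at all, the result is an empty array, so the `[0]` lookup in the policy is undefined and its deny rule never fires. A separate check along these lines is one way to catch malformed addresses.

```rego
package play

import rego.v1

pattern := `^[^+@]+(\+[^@]*)?@([^@]+)$`

# [["foo+test@example.com", "+test", "example.com"]]
with_plus := regex.find_all_string_submatch_n(pattern, "foo+test@example.com", 1)

# The optional group did not take part in the match, so it is reported as "":
# [["foo@example.com", "", "example.com"]]
without_plus := regex.find_all_string_submatch_n(pattern, "foo@example.com", 1)

# No match at all yields an empty array: []
no_match := regex.find_all_string_submatch_n(pattern, "not-an-email", 1)

# Hypothetical extra rule (an assumption, not in the original policy):
# reject input that does not match the pattern in the first place.
deny contains "email address is malformed" if {
    count(regex.find_all_string_submatch_n(pattern, input.email, 1)) == 0
}
```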

Parsing of scopes

Here we see how regex.find_all_string_submatch_n can be used to create structured data from unstructured text. In this example, we parse a list of scopes from a string and use that to create an object we can use in policies to look up permissions.

policy.rego
package play

import rego.v1

scope_pattern := `(\w+):(\w+)`

scope_map[scope[2]] := scope[1] if {
    some scope in regex.find_all_string_submatch_n(scope_pattern, input.token.payload.scopes, -1)
}

resource := split(input.path, "/")[1]

default allow := false

allow if {
    input.method == "GET"
    scope_map[resource] in {"read", "write"}
}
input.json
{
    "path": "/users/1234567890",
    "method": "GET",
    "token": {
        "header": {
            "alg": "HS256",
            "typ": "JWT"
        },
        "payload": {
            "sub": "1234567890",
            "name": "John Doe",
            "iat": 1516239022,
            "scopes": "read:users write:posts delete:comments"
        },
        "signature": "SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
    }
}
data.json
{}
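
One thing to watch with a map from resource to a single action: if a token grants two different actions on the same resource, the object rule would receive conflicting values for one key. The sketch below is a possible variant, under the assumption that such tokens can occur, which collects the actions for each resource into a set instead. The scopes string and rule names are made up for illustration.

```rego
package play

import rego.v1

scopes := "read:users write:users read:posts"

# Each match is [full match, action, resource], for example
# ["read:users", "read", "users"].
matches := regex.find_all_string_submatch_n(`(\w+):(\w+)`, scopes, -1)

# Map each resource to the set of actions granted on it.
# Every match for the same resource produces the same set,
# so the keys never conflict.
scope_actions[resource] := actions if {
    some m in matches
    resource := m[2]
    actions := {n[1] | some n in matches; n[2] == resource}
}

# "users" maps to the set {"read", "write"}.
can_write_users if "write" in scope_actions.users
```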

globs_match

regex.globs_match() is a less commonly used built-in function that checks if two patterns overlap. This can be useful when using patterns to define permissions or access control rules. The function returns true if the two patterns overlap and false otherwise.

Pattern based access

This example demonstrates using regex.globs_match in Rego to ensure actions are allowed only if the user's permissions overlap with the required permissions for the action. The user's permissions are defined by patterns, as are the permissions required by any given action.

policy.rego
package play

import rego.v1

user_roles := data.user_roles[input.user_id]

action_requirements := data.action_requirements[input.action]

permission_patterns contains pattern if {
    some role in user_roles
    some pattern in data.role_permissions[role]
}

default allow := false

allow if {
    every requirement in action_requirements {
        some pattern in permission_patterns
        regex.globs_match(pattern, requirement)
    }
}
input.json
{
    "user_id": "c2655539-8422-476d-9430-a26a4efa51b2",
    "action": "tenant.create",
    "props": {
        "name": "my-new-tenant"
    }
}
data.json
{
    "user_roles": {
        "c2655539-8422-476d-9430-a26a4efa51b2": [
            "developer"
        ]
    },
    "role_permissions": {
        "developer": [
            "dns.*",
            "compute.*"
        ]
    },
    "action_requirements": {
        "tenant.create": [
            "dns.records.create",
            "compute.containers.create",
            "compute.containers.scale.*"
        ]
    }
}
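
The individual comparisons behind the policy above can be checked in isolation. The sketch below uses a few pairs modelled on the data in this example, with illustrative rule names. Keep in mind that . and * keep their regular expression meaning here, so "dns.*" means "dns followed by any characters", not "dns, a literal dot, then anything".

```rego
package play

import rego.v1

# A pattern and a plain string that the pattern can match.
overlap := regex.globs_match("dns.*", "dns.records.create") # true

# No string can match both of these, so they do not overlap.
disjoint := regex.globs_match("dns.*", "tenant.create") # false

# Both arguments may be patterns.
both_patterns := regex.globs_match("compute.*", "compute.containers.scale.*") # true

# The dot is a wildcard, not a separator, so this overlaps too.
no_separator := regex.globs_match("dns.*", "dnsmasq") # true
```

If the dot is meant as a literal separator between permission segments, escaping it in the stored patterns (for example `dns\..*`) is one way to tighten the match.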

Footnotes

  1. Read more about the RE2 syntax: https://github.com/google/re2/wiki/Syntax

  2. See non-raw Regex Regal rule.