Regular Expressions
Regular expressions are an important tool for defining and testing patterns, making them useful in a range of policy use cases. Regular expressions enable specifying and enforcing rules on text data, such as validating input formats or extracting relevant substrings for further processing.
Rego's regular expression functions use the RE2 syntax [1], which is designed for safety and predictable performance. RE2 guarantees matching time that is linear in the size of the input by omitting features such as backreferences and lookaround, which makes it well suited to performance-sensitive environments like policy evaluation.
Here is a simple rule based on a regular expression:
email_valid := regex.match(`^[^@]+@[^@]+\.[^@]+$`, "name@example.com")
In this example, email_valid will be true because the email matches the pattern.
Also note that the pattern is defined as a raw string (enclosed in backticks). This is common practice, as it avoids the need to escape backslashes and other special characters a second time for the string literal.
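To illustrate the point about raw strings, here is a minimal sketch comparing a raw string pattern with the same pattern written as a regular double-quoted string. The rule names raw_match and escaped_match are illustrative only.

```rego
package play

import rego.v1

# Raw string: backslashes reach the regex engine unchanged.
raw_match := regex.match(`^\d+\.\d+$`, "1.5")

# Double-quoted string: every backslash has to be escaped once more.
escaped_match := regex.match("^\\d+\\.\\d+$", "1.5")
```

Both rules evaluate the same pattern; the raw form is simply easier to read and to copy into a regex tester.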
While regular expressions are useful in many policies, it's important to consider performance and readability. For simple string operations, such as checking for a substring or performing exact matches, Rego's built-in string matching functions can be faster and easier to read by non-developers.
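As a rough sketch of that trade-off, the following compares a regular expression with the plain string built-ins startswith, contains and endswith. The path value and rule names are made up for illustration.

```rego
package play

import rego.v1

path := "/news/2024/launch"

# Regex version: anchored prefix match.
is_news_regex := regex.match(`^/news/`, path)

# Same check with a plain string built-in.
is_news_prefix := startswith(path, "/news/")

# Substring and suffix checks need no regex either.
mentions_launch := contains(path, "launch")

is_html := endswith(path, ".html")
```

For this input the first three rules are true and is_html is false. The string built-ins state the intent directly, which can make a policy easier to review.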
Check out regex101.com and use the RE2 syntax to test your Rego patterns in a visual way.
Function | Description | Meta |
---|---|---|
regex.find_all_string_submatch_n | Returns all successive matches of the expression. Arguments: pattern (string): regular expression; value (string): string to match; number (number): number of matches to return, where -1 means all matches. Returns: output (array[array[string]]): array of all matches. | Wasm |
regex.find_n | Returns the specified number of matches when matching the input against the pattern. Arguments: pattern (string): regular expression; value (string): string to match; number (number): number of matches to return, where -1 returns all matches. Returns: output (array[string]): collected matches. | SDK-dependent |
regex.globs_match | Checks if the intersection of two glob-style regular expressions matches a non-empty set of non-empty strings. The set of regex symbols is limited for this builtin: only `.`, `*`, `+`, `[`, `-`, `]` and `\` are treated as special symbols. Arguments: glob1 (string): first glob-style regular expression; glob2 (string): second glob-style regular expression. Returns: result (boolean): true if the intersection of glob1 and glob2 matches a non-empty set of non-empty strings. | SDK-dependent |
regex.is_valid | Checks if a string is a valid regular expression: the detailed syntax for patterns is defined by https://github.com/google/re2/wiki/Syntax. Arguments: pattern (string): regular expression. Returns: result (boolean): true if pattern is a valid regular expression. | v0.23.0, Wasm |
regex.match | Matches a string against a regular expression. Arguments: pattern (string): regular expression; value (string): value to match against. Returns: result (boolean): true if value matches pattern. | v0.23.0, Wasm |
regex.replace | Finds and replaces text using the regular expression pattern. Arguments: s (string): string being processed; pattern (string): regex pattern to be applied; value (string): replacement value. Returns: output (string): string with replaced substrings. | v0.45.0, SDK-dependent |
regex.split | Splits the input string by the occurrences of the given pattern. Arguments: pattern (string): regular expression; value (string): string to match. Returns: output (array[string]): the parts obtained by splitting value. | SDK-dependent |
regex.template_match | Matches a string against a pattern, where the pattern may be glob-like. Arguments: template (string): template expression containing regular expressions enclosed in the delimiters; value (string): string to match; delimiter_start (string): start delimiter of the regular expression in template; delimiter_end (string): end delimiter of the regular expression in template. Returns: result (boolean): true if value matches the template. | SDK-dependent |
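Several functions in the table have no worked example further down this page. The sketch below exercises regex.is_valid, regex.split, regex.replace and regex.find_n with made-up inputs; the rule names are illustrative, and the argument order follows the table above.

```rego
package play

import rego.v1

# regex.is_valid: check a pattern before relying on it.
good_pattern := regex.is_valid(`^[a-z]+$`)

# The character class is never closed, so this pattern is invalid.
bad_pattern := regex.is_valid(`^[a-z+$`)

# regex.split: split on runs of commas and whitespace.
parts := regex.split(`[,\s]+`, "a, b,c  d")

# regex.replace: mask every run of digits (arguments: s, pattern, value).
masked := regex.replace("card 1234 5678", `\d+`, "****")

# regex.find_n: return at most two matches; -1 would return all of them.
first_two := regex.find_n(`\d+`, "10 20 30", 2)
```

With these inputs, parts is ["a", "b", "c", "d"], masked is "card **** ****" and first_two is ["10", "20"].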
Examples
match
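regex.match() is a commonly used built-in function that checks if a string matches a given regular expression pattern. The function returns true if the string matches the pattern and false otherwise. The pattern is not implicitly anchored: it matches if it is found anywhere in the string, so use ^ and $ when the whole string must match.
Some examples of policy use cases where regex.match() might be used include: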
- Validating formats, such as ensuring an email address follows a specific pattern or checking if a credit card number matches common formats.
- Matching HTTP paths to specific patterns for routing or access control purposes.
Pattern email validation
Validating emails with regular expressions is a common policy task. Full email validation involves more than checking that an address matches a pattern. However, a Rego policy is often the first point of contact, so a pattern-based test is still worthwhile: it surfaces mistakes to users early.
regex.match is the best way to validate emails in Rego.
package play
import rego.v1
example_email_1 := "foo [at] example.com"
example_email_2 := "foo@example.com"
match_1 := regex.match(`^[^@]+@[^@]+\.[^@]+$`, example_email_1)
match_2 := regex.match(`^[^@]+@[^@]+\.[^@]+$`, example_email_2)
match_3 := regex.match(`^[^@]+@[^@]+\.[^@]+$`, input.email)
{
"email": "hello at example.com"
}
{}
Path-based access
Managing access control in web applications is crucial for security. The
following example uses Rego's regex.match
to define role-based access to
different URL paths. By associating URL patterns with user roles like "intern"
and "admin," it ensures that users only access authorized paths.
package play
import rego.v1
news_pattern := `^/news/.*`
admin_pattern := `^/admin/.*`
path_patterns := {
"intern": {news_pattern},
"admin": {news_pattern, admin_pattern},
}
default allow := false
allow if {
some pattern in path_patterns[input.role]
regex.match(pattern, input.path)
}
{
"role": "intern",
"path": "/admin/staff/123/salary"
}
{}
Validating user text input
Text provided by users is often unstructured and untrusted.
To ensure that the data is both safe to use and error-free, regex.match()
can be used to validate the data against a simple pattern.
{}
{}
package play
import rego.v1
name_pattern := `^(\p{L}+\s?)+\p{L}+$`
valid_name1 := regex.match(name_pattern, "Juan Pérez")
valid_name2 := regex.match(name_pattern, "张伟")
invalid_name1 := regex.match(name_pattern, "Juan ")
invalid_name2 := regex.match(name_pattern, "- 张伟")
Case insensitive matching
Sometimes data can be supplied in a variety of cases, and matches need to be the same regardless of case. One example of this is when matching GitHub usernames.
This is where the (?i) modifier comes in. In the following example we can see how repos with different cases are matched.
package play
import rego.v1
matching_repos contains repo if {
some repo, url in input.repos
regex.match(`(?i)^github\.com/styrainc/`, url)
}
{
"repos": {
"regal": "github.com/styrainc/regal",
"demos": "github.com/StyraInc/opa-sdk-demos",
"enterprise-opa": "github.com/styrainc/enterprise-opa",
"opa": "github.com/open-policy-agent/opa"
}
}
{}
Here are the common modifiers for regular expressions:
Flag | Description |
---|---|
i | case-insensitive (default false) |
m | multi-line mode: ^ and $ match begin/end line in addition to begin/end text |
s | let . match \n (default false) |
Read more on the RE2 Wiki: https://github.com/google/re2/wiki/Syntax
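The m and s flags from the table can be sketched in the same way as the (?i) example. The text value and rule names below are invented for illustration.

```rego
package play

import rego.v1

text := "first line\nsecond line"

# Without flags, ^ only matches at the start of the whole text.
no_flag := regex.match(`^second`, text)

# (?m): ^ and $ also match at the start and end of each line.
multi_line := regex.match(`(?m)^second`, text)

# Without (?s), . does not match the newline between the two lines.
dot_default := regex.match(`first.*second`, text)

# (?s): . matches \n as well.
dot_all := regex.match(`(?s)first.*second`, text)

# Flags can be combined in a single group.
combined := regex.match(`(?is)FIRST.*SECOND`, text)
```

Here no_flag and dot_default are false, while multi_line, dot_all and combined are true.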
template_match
regex.template_match()
is an advanced function for matching inputs against
complex patterns. Sometimes, an input string needs to be validated as a series
of distinct components. This function allows you to offer patterns to validate
specific parts of the string separately.
Before continuing, make sure your use case is not solved by the simpler regex.match() or glob.match functions.
These functions are easier to use and thus less error prone for simpler use cases.
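Before the larger example, here is a small sketch of how the template is interpreted. It assumes only the parts between the delimiters are treated as regular expressions, while the surrounding text is matched literally. The URN and API paths and the rule names are hypothetical.

```rego
package play

import rego.v1

# Only the part between { and } is a regular expression.
urn_ok := regex.template_match("urn:foo:{.*}", "urn:foo:bar:baz", "{", "}")

# The literal prefix "urn:foo:" does not match here.
urn_bad := regex.template_match("urn:foo:{.*}", "urn:bar:baz", "{", "}")

# Other delimiters can be chosen. The "." in the literal "v1.0" is a plain dot.
version_ok := regex.template_match("/api/v1.0/<[a-z]+>", "/api/v1.0/users", "<", ">")

version_bad := regex.template_match("/api/v1.0/<[a-z]+>", "/api/v1x0/users", "<", ">")
```

Choosing delimiters that do not occur in the embedded patterns, such as < and > here, can keep templates easier to read.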
Advanced path pattern matching
In the example that follows, we have a complex path which represents an AWS ARN owned by a project with a UUID v4 identifier. The path is validated in two parts using two separate patterns, each confined to its own segment of the path.
{}
{}
package play
import rego.v1
uuid_v4_pattern := `[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}`
aws_arn_pattern := `arn:(aws[a-zA-Z-]*):([a-zA-Z0-9-]+):([a-zA-Z0-9-]*):([0-9]*):([a-zA-Z0-9-:/]+)`
path := "/projects/10ceef56-2b18-4cf7-895f-14d2dc45cc66/arn:aws:ec2:us-west-2:123456789012:instance/i-1234567890abcdef0"
path_pattern_template := sprintf("/projects/{%s}/{%s}", [
uuid_v4_pattern,
aws_arn_pattern,
])
matches := regex.template_match(path_pattern_template, path, "{", "}")
find_all_string_submatch_n
regex.find_all_string_submatch_n() is an advanced function for matching inputs against patterns with capture groups. This function returns a list of matches, where each match is itself a list of strings containing the full match followed by each of the submatches.
Before continuing, make sure your use case is not solved by the simpler
regex.match()
function.
This function is easier to use and thus less error prone for simpler use cases.
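To make the shape of the return value concrete, here is a minimal sketch with two capture groups. The key=value input and rule names are made up for illustration.

```rego
package play

import rego.v1

# Two capture groups: a key and a value.
pair_pattern := `(\w+)=(\w+)`

# -1 returns every match; each entry is [full match, group 1, group 2].
all_pairs := regex.find_all_string_submatch_n(pair_pattern, "a=1 b=2", -1)

# A positive number limits how many matches are returned.
first_pair := regex.find_all_string_submatch_n(pair_pattern, "a=1 b=2", 1)
```

With this input, all_pairs is [["a=1", "a", "1"], ["b=2", "b", "2"]] and first_pair contains only the first of those entries.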
Controlling Plus Addressing in Emails
In the example that follows, we show a policy that uses the
regex.find_all_string_submatch_n
built-in to extract the 'plus suffix', if
present, from an email address.
This policy ensures that plus addresses are only permitted for use by internal users to avoid potential abuse.
package play
import rego.v1
internal_domain := "example.com"
allow if count(deny) == 0
deny contains "plus addressing not allowed unless internal" if {
email_matches[1] != ""
email_matches[2] != internal_domain
}
email_matches := regex.find_all_string_submatch_n(`^[^+@]+(\+[^@]*)?@([^@]+)$`, input.email, 1)[0]
{
"email": "foo+test@example.com"
}
{}
Parsing of scopes
Here we see how regex.find_all_string_submatch_n
can be used to create
structured data from unstructured text. In this example, we parse a list of
scopes from a string and use that to create an object we can use in policies to
look up permissions.
package play
import rego.v1
scope_pattern := `(\w+):(\w+)`
scope_map[scope[2]] := scope[1] if {
some scope in regex.find_all_string_submatch_n(scope_pattern, input.token.payload.scopes, -1)
}
resource := split(input.path, "/")[1]
default allow := false
allow if {
input.method == "GET"
scope_map[resource] in {"read", "write"}
}
{
"path": "/users/1234567890",
"method": "GET",
"token": {
"header": {
"alg": "HS256",
"typ": "JWT"
},
"payload": {
"sub": "1234567890",
"name": "John Doe",
"iat": 1516239022,
"scopes": "read:users write:posts delete:comments"
},
"signature": "SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
}
}
{}
globs_match
regex.globs_match() is a less commonly used built-in function that checks if two patterns overlap, that is, whether at least one non-empty string is matched by both. This can be useful when using patterns to define permissions or access control rules. The function returns true if the two patterns overlap and false otherwise.
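As a small sketch of what "overlap" means here, the following compares a few permission-style patterns directly. The pattern values and rule names are hypothetical.

```rego
package play

import rego.v1

# "dns.records.create" is matched by both patterns, so they overlap.
overlap := regex.globs_match("dns.*", "dns.records.create")

# No string can start with both "c" and "d", so there is no overlap.
no_overlap := regex.globs_match("compute.*", "dns.records.create")

# Both arguments may contain wildcards.
both_patterns := regex.globs_match("compute.*", "compute.containers.scale.*")
```

Note that an unescaped . in these patterns stands for any character, not only a literal dot.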
Pattern based access
This example demonstrates using regex.globs_match
in Rego to ensure actions are
allowed only if the user's permissions overlap with the required permissions for
the action. The user's permissions are defined by patterns, as are the
permissions required by any given action.
package play
import rego.v1
user_roles := data.user_roles[input.user_id]
action_requirements := data.action_requirements[input.action]
permission_patterns contains pattern if {
some role in user_roles
some pattern in data.role_permissions[role]
}
default allow := false
allow if {
every requirement in action_requirements {
some pattern in permission_patterns
regex.globs_match(pattern, requirement)
}
}
{
"user_id": "c2655539-8422-476d-9430-a26a4efa51b2",
"action": "tenant.create",
"props": {
"name": "my-new-tenant"
}
}
{
"user_roles": {
"c2655539-8422-476d-9430-a26a4efa51b2": [
"developer"
]
},
"role_permissions": {
"developer": [
"dns.*",
"compute.*"
]
},
"action_requirements": {
"tenant.create": [
"dns.records.create",
"compute.containers.create",
"compute.containers.scale.*"
]
}
}
Footnotes
1. Read more about the RE2 syntax: https://github.com/google/re2/wiki/Syntax