Regular Expressions

Regular expressions are an important tool for defining and testing patterns, making them useful in a range of policy use cases. Regular expressions enable specifying and enforcing rules on text data, such as validating input formats or extracting relevant substrings for further processing.

Rego's regular expression functions use the RE2¹ syntax, known for its safety and performance. RE2 guarantees that matching time grows linearly with the size of the input, avoiding the catastrophic backtracking that affects some other engines, which makes it well suited to performance-sensitive environments like policy evaluation.

Here is a simple rule based on a regular expression:

email_valid := regex.match(`^[^@]+@[^@]+\.[^@]+$`, "name@example.com")

In this example, email_valid will be true because the email matches the pattern. Note also that the pattern is defined as a raw string, a common practice that avoids the need to escape special characters².
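To make the raw-string point concrete, here is a minimal sketch comparing the two string forms. The rule names and the phone-number-like pattern are illustrative assumptions, not part of the original examples.

```rego
package play

import rego.v1

# Raw string: backslashes are passed to the regex engine unchanged.
raw_match := regex.match(`^\d{3}-\d{4}$`, "555-0199")

# Regular string: every backslash must be doubled to survive string escaping.
escaped_match := regex.match("^\\d{3}-\\d{4}$", "555-0199")

# Both forms describe the same pattern, so the results agree.
same_result if raw_match == escaped_match
```

The raw form stays closer to how the pattern would be written in a regex tester, which is one reason it tends to be preferred.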

While regular expressions are useful in many policies, it's important to consider performance and readability. For simple string operations, such as checking for a substring or performing exact matches, Rego's built-in string matching functions can be faster and easier for non-developers to read.
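As a rough illustration of that trade-off, the sketch below places a few string built-ins (startswith, contains, endswith) next to an equivalent regular expression. The path value and rule names are assumptions made up for this example.

```rego
package play

import rego.v1

path := "/news/2024/launch"

# Prefix check: built-in versus the equivalent anchored pattern.
is_news_builtin := startswith(path, "/news/")

is_news_regex := regex.match(`^/news/`, path)

# Substring check.
mentions_launch := contains(path, "launch")

# Suffix check.
is_pdf := endswith(path, ".pdf")
```

Both prefix checks give the same answer here; the built-in simply states the intent without requiring the reader to parse a pattern.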

tip

Check out regex101.com and use the RE2 syntax to test your Rego patterns in a visual way.

Function	Description	Meta
`regex.find_all_string_submatch_n`	`output := regex.find_all_string_submatch_n(pattern, value, number)` Returns all successive matches of the expression. Arguments: `pattern` (string) regular expression `value` (string) string to match `number` (number) number of matches to return; `-1` means all matches Returns: `output` (array[array[string]]) array of all matches	Wasm
`regex.find_n`	`output := regex.find_n(pattern, value, number)` Returns the specified number of matches when matching the input against the pattern. Arguments: `pattern` (string) regular expression `value` (string) string to match `number` (number) number of matches to return, if `-1`, returns all matches Returns: `output` (array[string]) collected matches	SDK-dependent
`regex.globs_match`	`result := regex.globs_match(glob1, glob2)` Checks if the intersection of two glob-style regular expressions matches a non-empty set of non-empty strings. The set of regex symbols is limited for this builtin: only `.`, `*`, `+`, `[`, `-`, `]` and `\` are treated as special symbols. Arguments: `glob1` (string) first glob-style regular expression `glob2` (string) second glob-style regular expression Returns: `result` (boolean) true if the intersection of `glob1` and `glob2` matches a non-empty set of non-empty strings	SDK-dependent
`regex.is_valid`	`result := regex.is_valid(pattern)` Checks if a string is a valid regular expression: the detailed syntax for patterns is defined by https://github.com/google/re2/wiki/Syntax. Arguments: `pattern` (string) regular expression Returns: `result` (boolean) true if `pattern` is a valid regular expression	v0.23.0 Wasm
`regex.match`	`result := regex.match(pattern, value)` Matches a string against a regular expression. Arguments: `pattern` (string) regular expression `value` (string) value to match against `pattern` Returns: `result` (boolean) true if `value` matches `pattern`	v0.23.0 Wasm
`regex.replace`	`output := regex.replace(s, pattern, value)` Finds and replaces text using the regular expression pattern. Arguments: `s` (string) string being processed `pattern` (string) regex pattern to be applied `value` (string) regex value Returns: `output` (string) string with replaced substrings	v0.45.0 SDK-dependent
`regex.split`	`output := regex.split(pattern, value)` Splits the input string by the occurrences of the given pattern. Arguments: `pattern` (string) regular expression `value` (string) string to match Returns: `output` (array[string]) the parts obtained by splitting `value`	SDK-dependent
`regex.template_match`	`result := regex.template_match(template, value, delimiter_start, delimiter_end)` Matches a string against a pattern, where the pattern may be glob-like. Arguments: `template` (string) template expression containing `0..n` regular expressions `value` (string) string to match `delimiter_start` (string) start delimiter of the regular expression in `template` `delimiter_end` (string) end delimiter of the regular expression in `template` Returns: `result` (boolean) true if `value` matches the `template`	SDK-dependent
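Several functions in the table (regex.find_n, regex.split, regex.replace and regex.is_valid) are not covered by the longer examples on this page. The following is a minimal sketch of each, using made-up input strings and rule names; it is meant as orientation rather than a complete reference.

```rego
package play

import rego.v1

# regex.find_n: collect matches; -1 returns all of them.
all_numbers := regex.find_n(`[0-9]+`, "a1 b22 c333", -1)

first_two_numbers := regex.find_n(`[0-9]+`, "a1 b22 c333", 2)

# regex.split: split on commas with optional surrounding whitespace.
parts := regex.split(`\s*,\s*`, "read, write ,delete")

# regex.replace: note that the string being processed comes first.
slashed_date := regex.replace("2024-01-15", `-`, "/")

# regex.is_valid: check a pattern before using it.
valid_pattern := regex.is_valid(`[a-z]+`)

invalid_pattern := regex.is_valid(`[a-z`)
```

Note the argument order: regex.replace takes the string first and the pattern second, whereas the other functions take the pattern first.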

Examples

`match`

regex.match() is a commonly used built-in function that checks if a string matches a given regular expression pattern. The function returns true if the string matches the pattern and false otherwise.

Some examples of policy use cases where regex.match() might be used include:

Validating formats, such as ensuring an email address follows a specific pattern or checking if a credit card number matches common formats.
Matching HTTP paths to specific patterns for routing or access control purposes.

Pattern email validation

Validating emails with regular expressions is a common policy task. Full email validation involves more than checking that an address matches a pattern, but because a Rego policy is often the first point of contact, a pattern-based test is still worthwhile: it surfaces mistakes to users early.

regex.match is the best way to validate emails in Rego.

policy.rego
package play

import rego.v1

example_email_1 := "foo [at] example.com"

example_email_2 := "foo@example.com"

match_1 := regex.match(`^[^@]+@[^@]+\.[^@]+$`, example_email_1)

match_2 := regex.match(`^[^@]+@[^@]+\.[^@]+$`, example_email_2)

match_3 := regex.match(`^[^@]+@[^@]+\.[^@]+$`, input.email)

input.json
{
  "email": "hello at example.com"
}

data.json
{}

Path-based access

Managing access control in web applications is crucial for security. The following example uses Rego's regex.match to define role-based access to different URL paths. By associating URL patterns with user roles like "intern" and "admin," it ensures that users only access authorized paths.

policy.rego
package play

import rego.v1

news_pattern := `^/news/.*`

admin_pattern := `^/admin/.*`

path_patterns := {
	"intern": {news_pattern},
	"admin": {news_pattern, admin_pattern},
}

default allow := false

allow if {
	some pattern in path_patterns[input.role]
	regex.match(pattern, input.path)
}

input.json
{
  "role": "intern",
  "path": "/admin/staff/123/salary"
}

data.json
{}

Validating user text input

Text provided by users is often unstructured and untrusted. To ensure that the data is both safe to use and error-free, regex.match() can be used to validate the data against a simple pattern.

data.json
{}

input.json
{}

policy.rego
package play

import rego.v1

name_pattern := `^(\p{L}+\s?)+\p{L}+$`

valid_name1 := regex.match(name_pattern, "Juan Pérez")

valid_name2 := regex.match(name_pattern, "张伟")

invalid_name1 := regex.match(name_pattern, "Juan ")

invalid_name2 := regex.match(name_pattern, "- 张伟")

Case insensitive matching

Sometimes data can be supplied in a variety of cases, and matches need to be the same regardless of case. One example of this is matching GitHub usernames.

This is where the (?i) modifier comes in. In the following example we can see how repos with different cases are matched.

policy.rego
package play

import rego.v1

matching_repos contains repo if {
	some repo, url in input.repos

	regex.match(`(?i)^github\.com/styrainc/`, url)
}

input.json
{
  "repos": {
    "regal": "github.com/styrainc/regal",
    "demos": "github.com/StyraInc/opa-sdk-demos",
    "enterprise-opa": "github.com/styrainc/enterprise-opa",
    "opa": "github.com/open-policy-agent/opa"
  }
}

data.json
{}

tip

Here are the common modifiers for regular expressions:

Flag	Description
`i`	case-insensitive (default false)
`m`	multi-line mode: `^` and `$` match begin/end line in addition to begin/end text
`s`	let `.` match `\n` (default false)
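The effect of the `m` and `s` flags is easiest to see on a string containing a newline. The sketch below uses an invented two-line text value and illustrative rule names to compare each pattern with and without its flag.

```rego
package play

import rego.v1

text := "foo\nbar"

# Without flags, ^ and $ only match at the start and end of the whole text.
plain_line_match := regex.match(`^bar$`, text)

# (?m) lets ^ and $ also match at line boundaries.
multiline_match := regex.match(`(?m)^bar$`, text)

# Without (?s), `.` does not match the newline.
plain_dot_match := regex.match(`^foo.bar$`, text)

# (?s) lets `.` match the newline as well.
dotall_match := regex.match(`(?s)^foo.bar$`, text)

# Flags can be combined in a single group.
combined_match := regex.match(`(?is)^FOO.BAR$`, text)
```

Only the flagged variants match here, which is a quick way to confirm that a flag is doing what you expect.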

`template_match`

regex.template_match() is an advanced function for matching inputs against complex patterns. Sometimes, an input string needs to be validated as a series of distinct components. This function lets you supply separate patterns, each validating a specific part of the string.

warning

Before continuing, make sure your use case is not solved by the simpler regex.match() or glob.match functions.

These functions are easier to use and thus less error-prone for simpler use cases.
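For comparison, here is a small sketch of what the simpler glob.match alternative might look like for path-style input. The paths and rule names are hypothetical; the call shape is glob.match(pattern, delimiters, match).

```rego
package play

import rego.v1

# With "/" as the delimiter, `*` matches within a single path segment.
news_article := glob.match("/news/*", ["/"], "/news/today")

# `*` does not cross the delimiter, so a deeper path is not matched.
nested_article := glob.match("/news/*", ["/"], "/news/2024/today")

# `**` matches across delimiters.
any_news := glob.match("/news/**", ["/"], "/news/2024/today")
```

If a check like this covers the requirement, there may be no need for a template at all.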

Advanced path pattern matching

In the example that follows, we have a complex path that contains an AWS ARN owned by a project with a UUID v4 identifier. The path is validated in two parts using two separate patterns, each confined to a particular segment of the path.

data.json
{}

input.json
{}

policy.rego
package play

import rego.v1

uuid_v4_pattern := `[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}`

aws_arn_pattern := `arn:(aws[a-zA-Z-]*):([a-zA-Z0-9-]+):([a-zA-Z0-9-]*):([0-9]*):([a-zA-Z0-9-:/]+)`

path := "/projects/10ceef56-2b18-4cf7-895f-14d2dc45cc66/arn:aws:ec2:us-west-2:123456789012:instance/i-1234567890abcdef0"

path_pattern_template := sprintf("/projects/{%s}/{%s}", [
	uuid_v4_pattern,
	aws_arn_pattern,
])

matches := regex.template_match(path_pattern_template, path, "{", "}")
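A smaller sketch may help show how the delimiters work before reading a long template like the one above. The templates, paths and rule names below are assumptions chosen for illustration; the second template uses `<` and `>` as delimiters so that they are visually distinct from the `{1,6}` quantifier inside the pattern.

```rego
package play

import rego.v1

# Text outside the delimiters is matched literally; the text inside is a regex.
urn_match := regex.template_match("urn:foo:{.*}", "urn:foo:bar:baz", "{", "}")

# Using < and > as delimiters keeps them distinct from `{n}` quantifiers.
id_template := "/users/<[0-9]{1,6}>/profile"

numeric_id := regex.template_match(id_template, "/users/42/profile", "<", ">")

non_numeric_id := regex.template_match(id_template, "/users/abc/profile", "<", ">")
```

Choosing delimiters that do not appear in the embedded patterns tends to make templates easier to read and review.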

`find_all_string_submatch_n`

regex.find_all_string_submatch_n() is an advanced function for matching inputs against patterns with capture groups. This function returns a list of matches, where each match is itself a list of strings containing the full match followed by each of the submatches.

warning

Before continuing, make sure your use case is not solved by the simpler regex.match() function.

This function is easier to use and thus less error prone for simpler use cases.
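To make the shape of the return value concrete, here is a minimal sketch using an invented key=value string. The pattern and rule names are illustrative only.

```rego
package play

import rego.v1

pair_pattern := `(\w+)=(\w+)`

# Each inner array is [full match, first group, second group].
all_pairs := regex.find_all_string_submatch_n(pair_pattern, "a=1 b=2", -1)

# Limiting the number of matches to 1 returns only the first inner array.
first_pair := regex.find_all_string_submatch_n(pair_pattern, "a=1 b=2", 1)
```

Here all_pairs evaluates to `[["a=1", "a", "1"], ["b=2", "b", "2"]]`, so index 0 of each inner array is the whole match and the capture groups start at index 1.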

Controlling Plus Addressing in Emails

In the example that follows, we show a policy that uses the regex.find_all_string_submatch_n built-in to extract the 'plus suffix', if present, from an email address.

This policy ensures that plus addresses are only permitted for use by internal users to avoid potential abuse.

policy.rego
package play

import rego.v1

internal_domain := "example.com"

allow if count(deny) == 0

deny contains "plus addressing not allowed unless internal" if {
	email_matches[1] != ""
	email_matches[2] != internal_domain
}

email_matches := regex.find_all_string_submatch_n(`^[^+@]+(\+[^@]*)?@([^@]+)$`, input.email, 1)[0]

input.json
{
  "email": "foo+test@example.com"
}

data.json
{}

Parsing of scopes

Here we see how regex.find_all_string_submatch_n can be used to create structured data from unstructured text. In this example, we parse a list of scopes from a string and use that to create an object we can use in policies to look up permissions.

policy.rego
package play

import rego.v1

scope_pattern := `(\w+):(\w+)`

scope_map[scope[2]] := scope[1] if {
	some scope in regex.find_all_string_submatch_n(scope_pattern, input.token.payload.scopes, -1)
}

resource := split(input.path, "/")[1]

default allow := false

allow if {
	input.method == "GET"
	scope_map[resource] in {"read", "write"}
}

input.json
{
  "path": "/users/1234567890",
  "method": "GET",
  "token": {
    "header": {
      "alg": "HS256",
      "typ": "JWT"
    },
    "payload": {
      "sub": "1234567890",
      "name": "John Doe",
      "iat": 1516239022,
      "scopes": "read:users write:posts delete:comments"
    },
    "signature": "SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
  }
}

data.json
{}

`globs_match`

regex.globs_match() is a less commonly used built-in function that checks if two patterns overlap. This can be useful when using patterns to define permissions or access control rules. The function returns true if the two patterns overlap and false otherwise.
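As a quick sketch of what "overlap" means here, the two calls below compare a permission-style pattern with a required permission. The permission strings and rule names are assumptions for illustration; note that `.` is one of the special symbols for this function, so it stands for any character.

```rego
package play

import rego.v1

# "compute.*" can match "compute.containers.create", so the patterns overlap.
overlapping := regex.globs_match("compute.*", "compute.containers.create")

# No string starts with both "dns" and "compute", so there is no overlap.
disjoint := regex.globs_match("dns.*", "compute.containers.create")
```

The function asks whether at least one non-empty string satisfies both patterns, which is why either argument may itself contain wildcards.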

Pattern based access

This example demonstrates using regex.globs_match in Rego to ensure actions are allowed only if the user's permissions overlap with the required permissions for the action. The user's permissions are defined by patterns, as are the permissions required by any given action.

policy.rego
package play

import rego.v1

user_roles := data.user_roles[input.user_id]

action_requirements := data.action_requirements[input.action]

permission_patterns contains pattern if {
	some role in user_roles
	some pattern in data.role_permissions[role]
}

default allow := false

allow if {
	every requirement in action_requirements {
		some pattern in permission_patterns
		regex.globs_match(pattern, requirement)
	}
}

input.json
{
  "user_id": "c2655539-8422-476d-9430-a26a4efa51b2",
  "action": "tenant.create",
  "props": {
    "name": "my-new-tenant"
  }
}

data.json
{
  "user_roles": {
    "c2655539-8422-476d-9430-a26a4efa51b2": [
      "developer"
    ]
  },
  "role_permissions": {
    "developer": [
      "dns.*",
      "compute.*"
    ]
  },
  "action_requirements": {
    "tenant.create": [
      "dns.records.create",
      "compute.containers.create",
      "compute.containers.scale.*"
    ]
  }
}

¹ Read more about the RE2 syntax: https://github.com/google/re2/wiki/Syntax
² See the non-raw regex Regal rule.
