1. URL patterns
1.1. Introduction
A URL pattern consists of several components, each of which represents a pattern which could be matched against the corresponding component of a URL.
It can be constructed using a string for each component, or from a shorthand string. It can optionally be resolved relative to a base URL.
The shorthand "https://example.com/:category/*" corresponds to the following components:
- protocol: "https"
- username: "*"
- password: "*"
- hostname: "example.com"
- port: ""
- pathname: "/:category/*"
- search: "*"
- hash: "*"
It matches the following URLs:
- https://example.com/products/
- https://example.com/blog/our-greatest-product-ever
It does not match the following URLs:
- https://example.com/
- http://example.com/products/
- https://example.com:8443/blog/our-greatest-product-ever
This is a fairly simple pattern which requires most components to either match an exact string or allow any string ("*"). The pathname component matches any path with at least two /-separated path components, the first of which is captured as "category".
The shorthand "http{s}?://{:subdomain.}?shop.example/products/:id([0-9]+)#reviews" corresponds to the following components:
- protocol: "http{s}?"
- username: "*"
- password: "*"
- hostname: "{:subdomain.}?shop.example"
- port: ""
- pathname: "/products/:id([0-9]+)"
- search: ""
- hash: "reviews"
It matches the following URLs:
- https://shop.example/products/74205#reviews
- https://kathryn@voyager.shop.example/products/74656#reviews
- http://insecure.shop.example/products/1701#reviews
It does not match the following URLs:
- https://shop.example/products/2000
- http://shop.example:8080/products/0#reviews
- https://nx.shop.example/products/01?speed=5#reviews
- https://shop.example/products/chair#reviews
This is a more complicated pattern which includes:
- optional parts marked with ? (braces are needed to make it unambiguous exactly what is optional), and
- a regexp part named "id" which uses a regular expression to define what sorts of substrings match (the parentheses are required to mark it as a regular expression, and are not part of the regexp itself).
The shorthand "../admin/*" with the base URL "https://discussion.example/forum/?page=2" corresponds to the following components:
- protocol: "https"
- username: "*"
- password: "*"
- hostname: "discussion.example"
- port: ""
- pathname: "/admin/*"
- search: "*"
- hash: "*"
It matches the following URLs:
- https://discussion.example/admin/
- https://edd:librarian@discussion.example/admin/update?id=1
It does not match the following URLs:
- https://discussion.example/forum/admin/
- http://discussion.example:8080/admin/update?id=1
This pattern demonstrates how pathnames are resolved relative to a base URL, in a similar way to relative URLs.
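The three shorthands above can be approximated by hand to see why particular URLs match or fail. The sketch below is illustrative only: the regular expressions are written by hand rather than generated by the algorithms in this specification, and the URL class is used purely as an analogy for how a relative pathname resolves against a base URL. All constant names are hypothetical.

```javascript
// Hand-written approximations, not the output of the specification's algorithms.

// First example: pathname "/:category/*".
// One non-empty segment is captured as "category"; the rest is the wildcard.
const categoryPath = /^(?:\/([^\/]+?))(?:\/(.*))$/u;

// Second example: hostname "{:subdomain.}?shop.example" and
// pathname "/products/:id([0-9]+)".
const shopHost = /^(?:([^.]+?)\.)?shop\.example$/u;
const productPath = /^\/products\/([0-9]+)$/u;

// Third example: "../admin/*" against the base URL. Relative URL resolution
// is used here only to illustrate the analogous pathname resolution.
const resolved = new URL("../admin/*", "https://discussion.example/forum/?page=2");

console.log(categoryPath.exec("/products/")[1]);        // "products"
console.log(categoryPath.test("/"));                     // false: needs two components
console.log(shopHost.exec("voyager.shop.example")[1]);  // "voyager"
console.log(shopHost.test("shop.example"));              // true: the subdomain is optional
console.log(productPath.test("/products/chair"));        // false: id must be digits
console.log(resolved.pathname);                          // "/admin/*"
```

A full match additionally requires every other component to match, which is why a URL with port 8443 or a non-empty search string can fail even when its pathname matches.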
1.2. The URLPattern class
    typedef (USVString or URLPatternInit) URLPatternInput;

    [Exposed=(Window,Worker)]
    interface URLPattern {
      constructor(URLPatternInput input, USVString baseURL, optional URLPatternOptions options = {});
      constructor(optional URLPatternInput input = {}, optional URLPatternOptions options = {});

      boolean test(optional URLPatternInput input = {}, optional USVString baseURL);

      URLPatternResult? exec(optional URLPatternInput input = {}, optional USVString baseURL);

      readonly attribute USVString protocol;
      readonly attribute USVString username;
      readonly attribute USVString password;
      readonly attribute USVString hostname;
      readonly attribute USVString port;
      readonly attribute USVString pathname;
      readonly attribute USVString search;
      readonly attribute USVString hash;

      readonly attribute boolean hasRegExpGroups;
    };

    dictionary URLPatternInit {
      USVString protocol;
      USVString username;
      USVString password;
      USVString hostname;
      USVString port;
      USVString pathname;
      USVString search;
      USVString hash;
      USVString baseURL;
    };

    dictionary URLPatternOptions {
      boolean ignoreCase = false;
    };

    dictionary URLPatternResult {
      sequence<URLPatternInput> inputs;

      URLPatternComponentResult protocol;
      URLPatternComponentResult username;
      URLPatternComponentResult password;
      URLPatternComponentResult hostname;
      URLPatternComponentResult port;
      URLPatternComponentResult pathname;
      URLPatternComponentResult search;
      URLPatternComponentResult hash;
    };

    dictionary URLPatternComponentResult {
      USVString input;
      record<USVString, (USVString or undefined)> groups;
    };
Each URLPattern has an associated URL pattern, a URL pattern.
urlPattern = new URLPattern(input)
  Constructs a new URLPattern object. The input is an object containing separate patterns for each URL component; e.g. hostname, pathname, etc. Missing components will default to a wildcard pattern. In addition, input can contain a baseURL property that provides static text patterns for any missing components.
urlPattern = new URLPattern(patternString, baseURL)
  Constructs a new URLPattern object. patternString is a URL string containing pattern syntax for one or more components. If baseURL is provided, then patternString can be relative. This constructor will always set at least an empty string value and does not default any components to wildcard patterns.
urlPattern = new URLPattern(input, options)
  Constructs a new URLPattern object. The options argument is an object containing additional configuration options that can affect how the components are matched. Currently it has only one property, ignoreCase, which can be set to true to enable case-insensitive matching.
  Note that by default, that is in the absence of the options argument, matching is always case-sensitive.
urlPattern = new URLPattern(patternString, baseURL, options)
  Constructs a new URLPattern object. This overload supports a URLPatternOptions object when constructing a pattern from a patternString, which describes the patterns for individual components, and a base URL.
matches = urlPattern.test(input)
  Tests if urlPattern matches the given arguments. The input is an object containing strings representing each URL component; e.g. hostname, pathname, etc. Missing components are treated as empty strings. In addition, input can contain a baseURL property that provides values for any missing components. If urlPattern matches the input on a component-by-component basis then true is returned. Otherwise, false is returned.
matches = urlPattern.test(url, baseURL)
  Tests if urlPattern matches the given arguments. url is a URL string. If baseURL is provided, then url can be relative.
  If urlPattern matches the input on a component-by-component basis then true is returned. Otherwise, false is returned.
result = urlPattern.exec(input)
  Executes the urlPattern against the given arguments. The input is an object containing strings representing each URL component; e.g. hostname, pathname, etc. Missing components are treated as empty strings. In addition, input can contain a baseURL property that provides values for any missing components.
  If urlPattern matches the input on a component-by-component basis then an object is returned containing the results. Matched group values are contained in per-component group objects within the result object; e.g. result.pathname.groups.id. If urlPattern does not match the input, then result is null.
result = urlPattern.exec(url, baseURL)
  Executes the urlPattern against the given arguments. url is a URL string. If baseURL is provided, then url can be relative.
  If urlPattern matches the input on a component-by-component basis then an object is returned containing the results. Matched group values are contained in per-component group objects within the result object; e.g. result.pathname.groups.id. If urlPattern does not match the input, then result is null.
urlPattern.protocol
  Returns urlPattern’s normalized protocol pattern string.
urlPattern.username
  Returns urlPattern’s normalized username pattern string.
urlPattern.password
  Returns urlPattern’s normalized password pattern string.
urlPattern.hostname
  Returns urlPattern’s normalized hostname pattern string.
urlPattern.port
  Returns urlPattern’s normalized port pattern string.
urlPattern.pathname
  Returns urlPattern’s normalized pathname pattern string.
urlPattern.search
  Returns urlPattern’s normalized search pattern string.
urlPattern.hash
  Returns urlPattern’s normalized hash pattern string.
urlPattern.hasRegExpGroups
  Returns whether urlPattern contains one or more groups that use regular expression matching.
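As a rough usage illustration of the members described above, the following sketch exercises the constructor, test, exec, and two attributes. It assumes a runtime that exposes a global URLPattern; where none exists, the hypothetical demo function is simply not called and demoResult stays null. The pattern and URL are made up for the example.

```javascript
// `demo` takes the constructor as a parameter so that the snippet still loads
// in runtimes without a global URLPattern (an assumption of this sketch).
function demo(URLPatternCtor) {
  // A URLPatternInit with only a pathname: other components default to wildcards.
  const pattern = new URLPatternCtor(
    { pathname: "/products/:id([0-9]+)" },
    { ignoreCase: true }
  );
  const url = "https://shop.example/PRODUCTS/42";
  const result = pattern.exec(url);
  return {
    matches: pattern.test(url),
    // Matched group values live in per-component group objects.
    id: result === null ? null : result.pathname.groups.id,
    pathname: pattern.pathname,
    hasRegExpGroups: pattern.hasRegExpGroups,
  };
}

const demoResult = typeof URLPattern === "function" ? demo(URLPattern) : null;
console.log(demoResult);
```

With ignoreCase set to true the upper-case pathname is expected to match; without the options argument the same call would be case-sensitive.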
The new URLPattern(input, baseURL, options) constructor steps are:
- Run initialize given this, input, baseURL, and options.
The new URLPattern(input, options) constructor steps are:
- Run initialize given this, input, null, and options.
To initialize a URLPattern given a URLPattern this, URLPatternInput input, string or null baseURL, and URLPatternOptions options:
- Set this’s associated URL pattern to the result of create given input, baseURL, and options.
The protocol getter steps are:
- Return this's associated URL pattern's protocol component's pattern string.
The username getter steps are:
- Return this's associated URL pattern's username component's pattern string.
The password getter steps are:
- Return this's associated URL pattern's password component's pattern string.
The hostname getter steps are:
- Return this's associated URL pattern's hostname component's pattern string.
The port getter steps are:
- Return this's associated URL pattern's port component's pattern string.
The pathname getter steps are:
- Return this's associated URL pattern's pathname component's pattern string.
The search getter steps are:
- Return this's associated URL pattern's search component's pattern string.
The hash getter steps are:
- Return this's associated URL pattern's hash component's pattern string.
The hasRegExpGroups getter steps are:
- If this's associated URL pattern has regexp groups, then return true.
- Return false.
The test(input, baseURL) method steps are:
- Let result be the result of match given this's associated URL pattern, input, and baseURL if given.
- If result is null, return false.
- Return true.
The exec(input, baseURL) method steps are:
- Return the result of match given this's associated URL pattern, input, and baseURL if given.
1.3. The URL pattern struct
A URL pattern is a struct with the following items:
- protocol component, a component
- username component, a component
- password component, a component
- hostname component, a component
- port component, a component
- pathname component, a component
- search component, a component
- hash component, a component
A component is a struct with the following items:
- pattern string, a well formed pattern string
- regular expression, a RegExp
- group name list, a list of strings
- has regexp groups, a boolean
1.4. High-level operations
To create a URL pattern given a URLPatternInput input, string or null baseURL, and URLPatternOptions options:
- Let init be null.
- If input is a scalar value string then:
- Otherwise:
  - Assert: input is a URLPatternInit.
  - If baseURL is not null, then throw a TypeError.
  - Set init to input.
- Let processedInit be the result of process a URLPatternInit given init, "pattern", null, null, null, null, null, null, null, and null.
- For each componentName of « "protocol", "username", "password", "hostname", "port", "pathname", "search", "hash" »:
- If processedInit["protocol"] is a special scheme and processedInit["port"] is a string which represents its corresponding default port in radix-10 using ASCII digits then set processedInit["port"] to the empty string.
- Let urlPattern be a new URL pattern.
- Set urlPattern’s protocol component to the result of compiling a component given processedInit["protocol"], canonicalize a protocol, and default options.
- Set urlPattern’s username component to the result of compiling a component given processedInit["username"], canonicalize a username, and default options.
- Set urlPattern’s password component to the result of compiling a component given processedInit["password"], canonicalize a password, and default options.
- If the result of running hostname pattern is an IPv6 address given processedInit["hostname"] is true, then set urlPattern’s hostname component to the result of compiling a component given processedInit["hostname"], canonicalize an IPv6 hostname, and hostname options.
- Otherwise, set urlPattern’s hostname component to the result of compiling a component given processedInit["hostname"], canonicalize a hostname, and hostname options.
- Set urlPattern’s port component to the result of compiling a component given processedInit["port"], canonicalize a port, and default options.
- Let compileOptions be a copy of the default options with the ignore case property set to options["ignoreCase"].
- If the result of running protocol component matches a special scheme given urlPattern’s protocol component is true, then:
  - Let pathCompileOptions be a copy of the pathname options with the ignore case property set to options["ignoreCase"].
  - Set urlPattern’s pathname component to the result of compiling a component given processedInit["pathname"], canonicalize a pathname, and pathCompileOptions.
- Otherwise set urlPattern’s pathname component to the result of compiling a component given processedInit["pathname"], canonicalize an opaque pathname, and compileOptions.
- Set urlPattern’s search component to the result of compiling a component given processedInit["search"], canonicalize a search, and compileOptions.
- Set urlPattern’s hash component to the result of compiling a component given processedInit["hash"], canonicalize a hash, and compileOptions.
- Return urlPattern.
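The default-port step above can be sketched in isolation. This is a minimal illustration, not the specification's algorithm: dropDefaultPort and the plain-object shape of processedInit are hypothetical, and the table lists the special schemes and default ports as defined by the URL Standard.

```javascript
// Special schemes and their default ports (file has none).
const SPECIAL_SCHEME_DEFAULT_PORTS = new Map([
  ["ftp", 21],
  ["file", null],
  ["http", 80],
  ["https", 443],
  ["ws", 80],
  ["wss", 443],
]);

// Hypothetical helper: returns a copy of processedInit whose port is the empty
// string when it equals the default port of a special scheme.
function dropDefaultPort(processedInit) {
  const defaultPort = SPECIAL_SCHEME_DEFAULT_PORTS.get(processedInit.protocol);
  // `!= null` rejects both "not a special scheme" and "no default port".
  if (defaultPort != null && processedInit.port === String(defaultPort)) {
    return { ...processedInit, port: "" };
  }
  return processedInit;
}

console.log(dropDefaultPort({ protocol: "https", port: "443" }).port);  // ""
console.log(dropDefaultPort({ protocol: "https", port: "8443" }).port); // "8443"
```

Clearing the default port keeps the pattern consistent with URLs, which do not carry an explicit default port in their port component.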
To perform a match given a URL pattern urlPattern, a URLPatternInput or URL input, and an optional string baseURLString:
- Let protocol be the empty string.
- Let username be the empty string.
- Let password be the empty string.
- Let hostname be the empty string.
- Let port be the empty string.
- Let pathname be the empty string.
- Let search be the empty string.
- Let hash be the empty string.
- Let inputs be an empty list.
- Append input to inputs.
- If input is a URLPatternInit then:
  - If baseURLString was given, throw a TypeError.
  - Let applyResult be the result of process a URLPatternInit given input, "url", protocol, username, password, hostname, port, pathname, search, and hash. If this throws an exception, catch it, and return null.
  - Set protocol to applyResult["protocol"].
  - Set username to applyResult["username"].
  - Set password to applyResult["password"].
  - Set hostname to applyResult["hostname"].
  - Set port to applyResult["port"].
  - Set pathname to applyResult["pathname"].
  - Set search to applyResult["search"].
  - Set hash to applyResult["hash"].
- Otherwise:
  - Let url be input.
  - If input is a USVString:
  - Set protocol to url’s scheme.
  - Set username to url’s username.
  - Set password to url’s password.
  - Set hostname to url’s host, serialized, or the empty string if the value is null.
  - Set port to url’s port, serialized, or the empty string if the value is null.
  - Set pathname to the result of URL path serializing url.
  - Set search to url’s query or the empty string if the value is null.
  - Set hash to url’s fragment or the empty string if the value is null.
- Let protocolExecResult be RegExpBuiltinExec(urlPattern’s protocol component's regular expression, protocol).
- Let usernameExecResult be RegExpBuiltinExec(urlPattern’s username component's regular expression, username).
- Let passwordExecResult be RegExpBuiltinExec(urlPattern’s password component's regular expression, password).
- Let hostnameExecResult be RegExpBuiltinExec(urlPattern’s hostname component's regular expression, hostname).
- Let portExecResult be RegExpBuiltinExec(urlPattern’s port component's regular expression, port).
- Let pathnameExecResult be RegExpBuiltinExec(urlPattern’s pathname component's regular expression, pathname).
- Let searchExecResult be RegExpBuiltinExec(urlPattern’s search component's regular expression, search).
- Let hashExecResult be RegExpBuiltinExec(urlPattern’s hash component's regular expression, hash).
- If protocolExecResult, usernameExecResult, passwordExecResult, hostnameExecResult, portExecResult, pathnameExecResult, searchExecResult, or hashExecResult are null then return null.
- Let result be a new URLPatternResult.
- Set result["inputs"] to inputs.
- Set result["protocol"] to the result of creating a component match result given urlPattern’s protocol component, protocol, and protocolExecResult.
- Set result["username"] to the result of creating a component match result given urlPattern’s username component, username, and usernameExecResult.
- Set result["password"] to the result of creating a component match result given urlPattern’s password component, password, and passwordExecResult.
- Set result["hostname"] to the result of creating a component match result given urlPattern’s hostname component, hostname, and hostnameExecResult.
- Set result["port"] to the result of creating a component match result given urlPattern’s port component, port, and portExecResult.
- Set result["pathname"] to the result of creating a component match result given urlPattern’s pathname component, pathname, and pathnameExecResult.
- Set result["search"] to the result of creating a component match result given urlPattern’s search component, search, and searchExecResult.
- Set result["hash"] to the result of creating a component match result given urlPattern’s hash component, hash, and hashExecResult.
- Return result.
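The component-by-component nature of the match steps can be sketched as follows. This is a simplified illustration under stated assumptions: matchComponents is a hypothetical helper, the regular expressions are hand-written stand-ins for compiled components, and the loop returns early on the first failing component, whereas the steps above run all eight executions before checking for null. The outcome is the same.

```javascript
const COMPONENT_NAMES = [
  "protocol", "username", "password", "hostname",
  "port", "pathname", "search", "hash",
];

// Hypothetical helper: `regexps` and `values` are plain objects keyed by
// component name. Returns the per-component exec results, or null.
function matchComponents(regexps, values) {
  const execResults = {};
  for (const name of COMPONENT_NAMES) {
    const execResult = regexps[name].exec(values[name]);
    if (execResult === null) return null; // a single failing component rejects the input
    execResults[name] = execResult;
  }
  return execResults;
}

// Hand-written stand-ins for the pattern "https://example.com/:category/*".
const anyString = /^(.*)$/u;
const regexps = {
  protocol: /^https$/u, username: anyString, password: anyString,
  hostname: /^example\.com$/u, port: /^$/u,
  pathname: /^(?:\/([^\/]+?))(?:\/(.*))$/u, search: anyString, hash: anyString,
};
const values = {
  protocol: "https", username: "", password: "", hostname: "example.com",
  port: "", pathname: "/products/", search: "", hash: "",
};

console.log(matchComponents(regexps, values) !== null);                        // true
console.log(matchComponents(regexps, { ...values, protocol: "http" }));        // null
console.log(matchComponents(regexps, { ...values, port: "8443" }));            // null
```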
A URL pattern urlPattern has regexp groups if the following steps return true:
- If urlPattern’s protocol component has regexp groups is true, then return true.
- If urlPattern’s username component has regexp groups is true, then return true.
- If urlPattern’s password component has regexp groups is true, then return true.
- If urlPattern’s hostname component has regexp groups is true, then return true.
- If urlPattern’s port component has regexp groups is true, then return true.
- If urlPattern’s pathname component has regexp groups is true, then return true.
- If urlPattern’s search component has regexp groups is true, then return true.
- If urlPattern’s hash component has regexp groups is true, then return true.
- Return false.
1.5. Internals
To compile a component given a string input, an encoding callback, and options:
- Let part list be the result of running parse a pattern string given input, options, and encoding callback.
- Let (regular expression string, name list) be the result of running generate a regular expression and name list given part list and options.
- Let flags be an empty string.
- If options’s ignore case is true then set flags to "vi".
- Otherwise set flags to "v".
- Let regular expression be RegExpCreate(regular expression string, flags). If this throws an exception, catch it, and throw a TypeError.
  The specification uses regular expressions to perform all matching, but this is not mandated. Implementations are free to perform matching directly against the part list when possible; e.g. when there are no custom regexp matching groups. If there are custom regular expressions, however, it is important that they be immediately evaluated in the compile a component algorithm so an error can be thrown if they are invalid.
- Let pattern string be the result of running generate a pattern string given part list and options.
- Let has regexp groups be false.
- For each part of part list:
- Return a new component whose pattern string is pattern string, regular expression is regular expression, group name list is name list, and has regexp groups is has regexp groups.
To create a component match result given a component component, a string input, and an array representing the output of RegExpBuiltinExec execResult:
- Let result be a new URLPatternComponentResult.
- Set result["input"] to input.
- Let groups be a record<USVString, (USVString or undefined)>.
- Let index be 1.
- While index is less than Get(execResult, "length"):
  - Let name be component’s group name list[index − 1].
  - Set groups[name] to value.
  - Increment index by 1.
- Set result["groups"] to groups.
- Return result.
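A minimal sketch of the loop above follows. It makes one assumption that the steps as shown leave implicit: the value stored for each name is the capture at the same index of the exec result. The function name and the plain-object result shape are hypothetical.

```javascript
// Hypothetical helper: pairs each capture group with its name.
function createComponentMatchResult(groupNameList, input, execResult) {
  const groups = {};
  // Index 0 holds the whole match, so capture groups start at index 1.
  for (let index = 1; index < execResult.length; index++) {
    const name = groupNameList[index - 1];
    groups[name] = execResult[index]; // assumption: value is the capture at this index
  }
  return { input, groups };
}

const pathResult = createComponentMatchResult(
  ["id"],
  "/products/74205",
  /^\/products\/([0-9]+)$/u.exec("/products/74205")
);
console.log(pathResult.groups.id); // "74205"

// An optional group that did not participate yields undefined.
const hostResult = createComponentMatchResult(
  ["subdomain"],
  "shop.example",
  /^(?:([^.]+?)\.)?shop\.example$/u.exec("shop.example")
);
console.log(hostResult.groups.subdomain); // undefined
```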
The default options is an options struct with delimiter code point set to the empty string and prefix code point set to the empty string.
The hostname options is an options struct with delimiter code point set to "." and prefix code point set to the empty string.
The pathname options is an options struct with delimiter code point set to "/" and prefix code point set to "/".
To determine if a protocol component matches a special scheme given a component protocol component:
- Let special scheme list be a list populated with all of the special schemes.
- For each scheme of special scheme list:
  - Let test result be RegExpBuiltinExec(protocol component’s regular expression, scheme).
  - If test result is not null, then return true.
- Return false.
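The check above amounts to running the protocol component's regular expression against each special scheme. A minimal sketch, with hand-written regular expressions standing in for compiled protocol components and a hypothetical function name:

```javascript
// The special schemes, as listed by the URL Standard.
const SPECIAL_SCHEMES = ["ftp", "file", "http", "https", "ws", "wss"];

// Hypothetical helper: true if the protocol regular expression accepts at
// least one special scheme.
function protocolMatchesSpecialScheme(protocolRegExp) {
  return SPECIAL_SCHEMES.some((scheme) => protocolRegExp.exec(scheme) !== null);
}

console.log(protocolMatchesSpecialScheme(/^http(?:s)?$/u)); // true
console.log(protocolMatchesSpecialScheme(/^mailto$/u));     // false
console.log(protocolMatchesSpecialScheme(/^(.*)$/u));       // true: a wildcard accepts them all
```

Because the test runs the pattern rather than comparing strings, a protocol pattern such as "http{s}?" or a wildcard is also recognized as matching a special scheme.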
To determine if a hostname pattern is an IPv6 address given a pattern string input:
- If input’s code point length is less than 2, then return false.
- Let input code points be input interpreted as a list of code points.
- If input code points[0] is U+005B ([), then return true.
- If input code points[0] is U+007B ({) and input code points[1] is U+005B ([), then return true.
- If input code points[0] is U+005C (\) and input code points[1] is U+005B ([), then return true.
- Return false.
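The steps above translate almost directly into code. The following sketch uses a hypothetical function name and treats the input as a list of code points via Array.from:

```javascript
// Hypothetical helper: detects hostname patterns that start an IPv6 literal,
// either directly, inside a "{" grouping, or after a backslash escape.
function hostnamePatternIsIPv6Address(input) {
  const codePoints = Array.from(input); // iterate by code point, not code unit
  if (codePoints.length < 2) return false;
  if (codePoints[0] === "[") return true;
  if (codePoints[0] === "{" && codePoints[1] === "[") return true;
  if (codePoints[0] === "\\" && codePoints[1] === "[") return true;
  return false;
}

console.log(hostnamePatternIsIPv6Address("[::1]"));       // true
console.log(hostnamePatternIsIPv6Address("{[::1]}"));     // true
console.log(hostnamePatternIsIPv6Address("\\[::1\\]"));   // true
console.log(hostnamePatternIsIPv6Address("example.com")); // false
```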
1.6. Constructor string parsing
A constructor string parser is a struct.
A constructor string parser has an associated input, a string, which must be set upon creation.
A constructor string parser has an associated token list, a token list, which must be set upon creation.
A constructor string parser has an associated result, a URLPatternInit, initially set to a new URLPatternInit.
A constructor string parser has an associated component start, a number, initially set to 0.
A constructor string parser has an associated token index, a number, initially set to 0.
A constructor string parser has an associated token increment, a number, initially set to 1.
A constructor string parser has an associated group depth, a number, initially set to 0.
A constructor string parser has an associated hostname IPv6 bracket depth, a number, initially set to 0.
A constructor string parser has an associated protocol matches a special scheme flag, a boolean, initially set to false.
A constructor string parser has an associated state, a string, initially set to "init". It must be one of the following:
- "init"
- "protocol"
- "authority"
- "username"
- "password"
- "hostname"
- "port"
- "pathname"
- "search"
- "hash"
- "done"
The URLPattern constructor string algorithm is very similar to the basic URL parser algorithm, but some differences prevent us from using that algorithm directly.
First, the URLPattern constructor string parser operates on tokens generated using the "lenient" tokenize policy. In contrast, the basic URL parser operates on code points. Operating on tokens allows the URLPattern constructor string parser to more easily distinguish between code points that are significant pattern syntax and code points that might be a URL component separator. For example, it makes it trivial to handle named groups like ":hmm" in "https://a.c:hmm.example.com:8080" without getting confused with the port number.
Second, the URLPattern constructor string parser needs to avoid applying URL canonicalization to all code points like the basic URL parser does. Instead we perform canonicalization on only parts of the pattern string we know are safe later when compiling each component pattern string.
Finally, the URLPattern constructor string parser does not handle some parts of the basic URL parser state machine. For example, it does not treat backslashes specially as they would all be treated as pattern characters and would require excessive escaping. In addition, this parser might not handle some more esoteric parts of the URL parsing algorithm like file URLs with a hostname. The goal with this parser was to handle the most common URLs while allowing any niche case to be handled instead via the URLPatternInit constructor.
In the constructor string algorithm, the pathname, search, and hash are wildcarded if earlier components are specified but later ones are not. For example, "https://example.com/foo" matches any search and any hash. Similarly, "https://example.com" matches any URL on that origin. This is analogous to the notion of a more specific component in the notes about process a URLPatternInit (e.g., a search is more specific than a pathname), but the constructor syntax only has a few cases where it is possible to specify a more specific component without also specifying the less specific components.
The username and password components are always wildcard unless they are explicitly specified.
If a hostname is specified and the port is not, the port is assumed to be the default port. If authors want to match any port, they have to write :* explicitly. For example, "https://*" is any HTTPS origin on port 443, and "https://*:*" is any HTTPS origin on any port.
To parse a constructor string given a string input:
- Let parser be a new constructor string parser whose input is input and token list is the result of running tokenize given input and "lenient".
- While parser’s token index is less than parser’s token list size:
  - Set parser’s token increment to 1.
    On every iteration of the parse loop the parser’s token index will be incremented by its token increment value. Typically this means incrementing by 1, but at certain times it is set to zero. The token increment is then always reset back to 1 at the top of the loop.
  - If parser’s token list[parser’s token index]'s type is "end" then:
    - If parser’s state is "init":
      If we reached the end of the string in the "init" state, then we failed to find a protocol terminator and this has to be a relative URLPattern constructor string.
      - Run rewind given parser.
        We next determine at which component the relative pattern begins. Relative pathnames are most common, but URLs and URLPattern constructor strings can begin with the search or hash components as well.
      - If the result of running is a hash prefix given parser is true, then run change state given parser, "hash" and 1.
      - Otherwise if the result of running is a search prefix given parser is true:
        - Run change state given parser, "search" and 1.
      - Otherwise:
        - Run change state given parser, "pathname" and 0.
      - Increment parser’s token index by parser’s token increment.
    - If parser’s state is "authority":
      If we reached the end of the string in the "authority" state, then we failed to find an "@". Therefore there is no username or password.
      - Run rewind and set state given parser, and "hostname".
      - Increment parser’s token index by parser’s token increment.
    - Run change state given parser, "done" and 0.
  - If the result of running is a group open given parser is true:
    We ignore all code points within "{ ... }" pattern groupings. It would not make sense to allow a URL component boundary to lie within a grouping; e.g. "https://example.c{om/fo}o". While not supported within well formed pattern strings, we handle nested groupings here to avoid parser confusion.
    It is not necessary to perform this logic for regexp or named groups since those values are collapsed into individual tokens by the tokenize algorithm.
    - Increment parser’s group depth by 1.
    - Increment parser’s token index by parser’s token increment.
  - If parser’s group depth is greater than 0:
    - If the result of running is a group close given parser is true, then decrement parser’s group depth by 1.
    - Otherwise:
      - Increment parser’s token index by parser’s token increment.
  - Switch on parser’s state and run the associated steps:
    "init"
      - If the result of running is a protocol suffix given parser is true:
        - Run rewind and set state given parser and "protocol".
    "protocol"
      - If the result of running is a protocol suffix given parser is true:
        - Run compute protocol matches a special scheme flag given parser.
          We need to eagerly compile the protocol component to determine if it matches any special schemes. If it does then certain special rules apply. It determines if the pathname defaults to a "/" and also whether we will look for the username, password, hostname, and port components. Authority slashes can also cause us to look for these components as well. Otherwise we treat this as an "opaque path URL" and go straight to the pathname component.
        - Let next state be "pathname".
        - Let skip be 1.
        - If the result of running next is authority slashes given parser is true:
          - Set next state to "authority".
          - Set skip to 3.
        - Otherwise if parser’s protocol matches a special scheme flag is true, then set next state to "authority".
        - Run change state given parser, next state, and skip.
    "authority"
      - If the result of running is an identity terminator given parser is true, then run rewind and set state given parser and "username".
      - Otherwise if any of the following are true:
        - the result of running is a pathname start given parser;
        - the result of running is a search prefix given parser; or
        - the result of running is a hash prefix given parser,
        then run rewind and set state given parser and "hostname".
    "username"
      - If the result of running is a password prefix given parser is true, then run change state given parser, "password", and 1.
      - Otherwise if the result of running is an identity terminator given parser is true, then run change state given parser, "hostname", and 1.
    "password"
      - If the result of running is an identity terminator given parser is true, then run change state given parser, "hostname", and 1.
    "hostname"
      - If the result of running is an IPv6 open given parser is true, then increment parser’s hostname IPv6 bracket depth by 1.
      - Otherwise if the result of running is an IPv6 close given parser is true, then decrement parser’s hostname IPv6 bracket depth by 1.
      - Otherwise if the result of running is a port prefix given parser is true and parser’s hostname IPv6 bracket depth is zero, then run change state given parser, "port", and 1.
      - Otherwise if the result of running is a pathname start given parser is true, then run change state given parser, "pathname", and 0.
      - Otherwise if the result of running is a search prefix given parser is true, then run change state given parser, "search", and 1.
      - Otherwise if the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.
    "port"
      - If the result of running is a pathname start given parser is true, then run change state given parser, "pathname", and 0.
      - Otherwise if the result of running is a search prefix given parser is true, then run change state given parser, "search", and 1.
      - Otherwise if the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.
    "pathname"
      - If the result of running is a search prefix given parser is true, then run change state given parser, "search", and 1.
      - Otherwise if the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.
    "search"
      - If the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.
    "hash"
      - Do nothing.
    "done"
      - Assert: This step is never reached.
  - Increment parser’s token index by parser’s token increment.
- If parser’s result contains "hostname" and not "port", then set parser’s result["port"] to the empty string.
  This is special-cased because when an author does not specify a port, they usually intend the default port. If any port is acceptable, the author can specify it as a wildcard explicitly. For example, "https://example.com/*" does not match URLs beginning with "https://example.com:8443/", which is a different origin.
- Return parser’s result.
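The final port adjustment can be sketched on its own. The helper name applyImplicitPort is hypothetical, and the parser's result is modeled as a plain object in which an absent key means the component was not specified:

```javascript
// Hypothetical helper: when a hostname was specified but no port was, the port
// becomes the empty string (the default port) instead of staying unspecified.
function applyImplicitPort(result) {
  if ("hostname" in result && !("port" in result)) {
    return { ...result, port: "" };
  }
  return result;
}

// "https://*": hostname given, port omitted, so the port is pinned to the default.
console.log(applyImplicitPort({ protocol: "https", hostname: "*" }).port); // ""

// "https://*:*": the explicit wildcard port is kept.
console.log(applyImplicitPort({ protocol: "https", hostname: "*", port: "*" }).port); // "*"

// A relative pattern with no hostname leaves the port unspecified.
console.log("port" in applyImplicitPort({ pathname: "/admin/*" })); // false
```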
-
If parser’s state is not "
init
", not "authority
", and not "done
", then set parser’s result[parser’s state] to the result of running make a component string given parser. -
If parser’s state is not "
init
" and new state is not "done
", then:-
If parser’s state is "
protocol
", "authority
", "username
", or "password
"; new state is "port
", "pathname
", "search
", or "hash
"; and parser’s result["hostname
"] does not exist, then set parser’s result["hostname
"] to the empty string. -
If parser’s state is "
protocol
", "authority
", "username
", "password
", "hostname
", or "port
"; new state is "search
" or "hash
"; and parser’s result["pathname
"] does not exist, then:-
If parser’s protocol matches a special scheme flag is true, then set parser’s result["
pathname
"] to "/
". -
Otherwise, set parser’s result["
pathname
"] to the empty string.
-
-
If parser’s state is "
protocol
", "authority
", "username
", "password
", "hostname
", "port
", or "pathname
"; new state is "hash
"; and parser’s result["search
"] does not exist, then set parser’s result["search
"] to the empty string.
-
-
Set parser’s state to new state.
-
Increment parser’s token index by skip.
-
Set parser’s component start to parser’s token index.
-
Set parser’s token increment to 0.
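The defaulting rules in the change state steps above can be sketched as follows. This is a simplified model, not the normative algorithm: `ParserSketch` is a hypothetical stand-in for the constructor string parser, and the `component_string` argument replaces the call to make a component string.

```python
from dataclasses import dataclass, field

BEFORE_HOST = ("protocol", "authority", "username", "password")
BEFORE_PATH = BEFORE_HOST + ("hostname", "port")
BEFORE_SEARCH = BEFORE_PATH + ("pathname",)


@dataclass
class ParserSketch:
    state: str = "init"
    result: dict = field(default_factory=dict)
    special_scheme: bool = False  # "protocol matches a special scheme flag"
    token_index: int = 0
    component_start: int = 0
    token_increment: int = 1


def change_state(parser, new_state, skip, component_string=""):
    # Store the component that just ended, unless the state carries no component.
    if parser.state not in ("init", "authority", "done"):
        parser.result[parser.state] = component_string
    if parser.state != "init" and new_state != "done":
        # Fill in components that were skipped over entirely.
        if (parser.state in BEFORE_HOST
                and new_state in ("port", "pathname", "search", "hash")
                and "hostname" not in parser.result):
            parser.result["hostname"] = ""
        if (parser.state in BEFORE_PATH
                and new_state in ("search", "hash")
                and "pathname" not in parser.result):
            parser.result["pathname"] = "/" if parser.special_scheme else ""
        if (parser.state in BEFORE_SEARCH
                and new_state == "hash"
                and "search" not in parser.result):
            parser.result["search"] = ""
    parser.state = new_state
    parser.token_index += skip
    parser.component_start = parser.token_index
    parser.token_increment = 0


# Jumping from the hostname straight to the hash, as in "https://example.com#x".
p = ParserSketch(state="hostname", special_scheme=True)
change_state(p, "hash", 1, "example.com")
print(p.result)  # {'hostname': 'example.com', 'pathname': '/', 'search': ''}
```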
-
Set parser’s token index to parser’s component start.
-
Set parser’s token increment to 0.
-
If index is less than parser’s token list's size, then return parser’s token list[index].
-
Assert: parser’s token list's size is greater than or equal to 1.
-
Let last index be parser’s token list's size − 1.
-
Let token be parser’s token list[last index].
-
Return token.
-
Let token be the result of running get a safe token given parser and index.
-
If token’s value is not value, then return false.
-
If any of the following are true:
- token’s type is "char";
- token’s type is "escaped-char"; or
- token’s type is "invalid-char",
then return true.
-
Return false.
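The two helpers above (get a safe token and is a non-special pattern char) are small enough to sketch directly. The `Token` tuple and the function names are illustrative assumptions; the token list is passed in explicitly instead of living on a parser struct.

```python
from typing import NamedTuple


class Token(NamedTuple):
    type: str
    index: int
    value: str


def get_safe_token(tokens, index):
    """Return tokens[index], or the final token when index is past the end."""
    if index < len(tokens):
        return tokens[index]
    assert len(tokens) >= 1
    return tokens[-1]


def is_non_special_pattern_char(tokens, index, value):
    token = get_safe_token(tokens, index)
    if token.value != value:
        return False
    return token.type in ("char", "escaped-char", "invalid-char")


# A hand-built token list for "a:/", where ":" has no valid name after it.
tokens = [
    Token("char", 0, "a"),
    Token("invalid-char", 1, ":"),
    Token("char", 2, "/"),
    Token("end", 3, ""),
]
print(is_non_special_pattern_char(tokens, 1, ":"))  # True
print(get_safe_token(tokens, 99).type)              # end
```

Reading past the end yields the final token instead of failing, so lookahead checks such as "token index + 2" need no bounds checks of their own.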
-
Return the result of running is a non-special pattern char given parser, parser’s token index, and "
:
".
-
If the result of running is a non-special pattern char given parser, parser’s token index + 1, and "
/
" is false, then return false. -
If the result of running is a non-special pattern char given parser, parser’s token index + 2, and "
/
" is false, then return false. -
Return true.
-
Return the result of running is a non-special pattern char given parser, parser’s token index, and "
@
".
-
Return the result of running is a non-special pattern char given parser, parser’s token index, and "
:
".
-
Return the result of running is a non-special pattern char given parser, parser’s token index, and "
:
".
-
Return the result of running is a non-special pattern char given parser, parser’s token index, and "
/
".
-
If the result of running is a non-special pattern char given parser, parser’s token index, and "?" is true, then return true.
-
If parser’s token list[parser’s token index]'s value is not "
?
", then return false. -
Let previous index be parser’s token index − 1.
-
If previous index is less than 0, then return true.
-
Let previous token be the result of running get a safe token given parser and previous index.
-
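If any of the following are true, then return false:
- previous token’s type is "name";
- previous token’s type is "regexp";
- previous token’s type is "close"; or
- previous token’s type is "asterisk".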
-
Return true.
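The search prefix check has to tell a "?" that starts the query apart from a "?" used as an optional modifier. The sketch below assumes the disqualifying previous-token types are "name", "regexp", "close", and "asterisk", as in the published URL Pattern Standard; the `Token` tuple and the function name are illustrative.

```python
from typing import NamedTuple


class Token(NamedTuple):
    type: str
    index: int
    value: str


def is_search_prefix(tokens, token_index):
    token = tokens[token_index]
    # A literal, escaped, or invalid "?" always starts the search component.
    if token.value == "?" and token.type in ("char", "escaped-char", "invalid-char"):
        return True
    if token.value != "?":
        return False
    previous_index = token_index - 1
    if previous_index < 0:
        return True
    previous = tokens[previous_index]
    # After a group, "?" is an optional modifier, not a search prefix.
    if previous.type in ("name", "regexp", "close", "asterisk"):
        return False
    return True


# "/a?q": the "?" follows plain text, so it starts the search component.
after_text = [Token("char", 0, "/"), Token("char", 1, "a"),
              Token("other-modifier", 2, "?"), Token("char", 3, "q"),
              Token("end", 4, "")]
# ":id?": the "?" follows a named group, so it is a modifier.
after_name = [Token("name", 0, "id"), Token("other-modifier", 3, "?"),
              Token("end", 4, "")]
print(is_search_prefix(after_text, 2))  # True
print(is_search_prefix(after_name, 1))  # False
```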
-
Return the result of running is a non-special pattern char given parser, parser’s token index and "
#
".
-
If parser’s token list[parser’s token index]'s type is "
open
", then return true. -
Otherwise return false.
-
If parser’s token list[parser’s token index]'s type is "
close
", then return true. -
Otherwise return false.
-
Return the result of running is a non-special pattern char given parser, parser’s token index, and "
[
".
-
Return the result of running is a non-special pattern char given parser, parser’s token index, and "
]
".
-
Assert: parser’s token index is less than parser’s token list's size.
-
Let token be parser’s token list[parser’s token index].
-
Let component start token be the result of running get a safe token given parser and parser’s component start.
-
Let component start input index be component start token’s index.
-
Let end index be token’s index.
-
Return the code point substring from component start input index to end index within parser’s input.
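As a rough illustration of make a component string, the sketch below slices the input between the token at the component start and the current token. For simplicity the example builds one token per code point, which is not what a real tokenizer would emit for every input; the names are hypothetical.

```python
from typing import NamedTuple


class Token(NamedTuple):
    type: str
    index: int
    value: str


def make_component_string(pattern_input, tokens, token_index, component_start):
    assert token_index < len(tokens)
    token = tokens[token_index]
    # "Get a safe token": fall back to the last token when out of range.
    start_token = tokens[min(component_start, len(tokens) - 1)]
    # Python strings index by code point, matching "code point substring".
    return pattern_input[start_token.index:token.index]


text = "https://example.com"
tokens = [Token("char", i, ch) for i, ch in enumerate(text)]
tokens.append(Token("end", len(text), ""))

print(make_component_string(text, tokens, 5, 0))                # https
print(make_component_string(text, tokens, len(tokens) - 1, 8))  # example.com
```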
-
Let protocol string be the result of running make a component string given parser.
-
Let protocol component be the result of compiling a component given protocol string, canonicalize a protocol, and default options.
-
If the result of running protocol component matches a special scheme given protocol component is true, then set parser’s protocol matches a special scheme flag to true.
2. Pattern strings
A pattern string is a string that is written to match a set of target strings. A well-formed pattern string conforms to a particular pattern syntax. This pattern syntax is directly based on the syntax used by the popular path-to-regexp JavaScript library.
It can be parsed to produce a part list which describes, in order, what must appear in a component string for the pattern string to match.
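Pattern strings can contain capture groups, which by default match the shortest possible string, up to a component-specific separator (/ in the pathname, . in the hostname). For example, the pathname pattern "/blog/:title" will match "/blog/hello-world" but not "/blog/2012/02".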
A regular expression enclosed in parentheses can also be used instead, so the pathname pattern "/blog/:year(\\d+)/:month(\\d+)
" will match "/blog/2012/02
".
A group can also be made optional, or repeated, by using a modifier. For example, the pathname pattern "/products/:id?" will match both "/products" and "/products/2" (but not "/products/"). In the pathname specifically, groups automatically require a leading /; to avoid this, the group can be explicitly delimited, as in the pathname pattern "/products/{:id}?".
A full wildcard *
can also be used to match as much as possible, as in the pathname pattern "/products/*
".
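The examples in this introduction can be checked against hand-written regular expressions. The expressions below are approximations written for Python's `re` module, assuming pathname rules ("/" as both separator and automatic prefix); they are not output of any implementation, and the `EXAMPLES` table and `matches` helper are illustrative only.

```python
import re

# Pathname pattern -> hand-derived regular expression.
EXAMPLES = {
    "/blog/:title": r"^/blog(?:/([^/]+?))$",
    "/blog/:year(\\d+)/:month(\\d+)": r"^/blog(?:/(\d+))(?:/(\d+))$",
    "/products/:id?": r"^/products(?:/([^/]+?))?$",
    "/products/{:id}?": r"^/products/([^/]+?)?$",
    "/products/*": r"^/products(?:/(.*))$",
}


def matches(pattern: str, pathname: str) -> bool:
    return re.match(EXAMPLES[pattern], pathname) is not None


print(matches("/blog/:title", "/blog/hello-world"))  # True
print(matches("/blog/:title", "/blog/2012/02"))      # False
print(matches("/products/:id?", "/products"))        # True
print(matches("/products/:id?", "/products/"))       # False
print(matches("/products/{:id}?", "/products/"))     # True
```

With ":id?" the leading "/" belongs to the optional group, so it disappears together with the group. With "{:id}?" only the name is optional, so the "/" stays mandatory.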
2.1. Parsing pattern strings
2.1.1. Tokens
A token list is a list containing zero or more token structs.
A token is a struct representing a single lexical token within a pattern string.
A token has an associated type, a string, initially "invalid-char
". It must be one of the following:
- "
open
" - The token represents a U+007B (
{
) code point. - "
close
" - The token represents a U+007D (
}
) code point. - "
regexp
" - The token represents a string of the form "
(<regular expression>)
". The regular expression is required to consist of only ASCII code points. - "
name
" - The token represents a string of the form "
:<name>
". The name value is restricted to code points that are consistent with JavaScript identifiers. - "
char
" - The token represents a valid pattern code point without any special syntactical meaning.
- "
escaped-char
" - The token represents a code point escaped using a backslash like "
\<char>
". - "
other-modifier
" - The token represents a matching group modifier that is either the U+003F (
?
) or U+002B (+
) code points. - "
asterisk
" - The token represents a U+002A (
*
) code point that can be either a wildcard matching group or a matching group modifier. - "
end
" - The token represents the end of the pattern string.
- "
invalid-char
" - The token represents a code point that is invalid in the pattern. This could be because of the code point value itself or due to its location within the pattern relative to other syntactic elements.
A token has an associated index, a number, initially 0. It is the position of the first code point in the pattern string represented by the token.
A token has an associated value, a string, initially the empty string. It contains the code points from the pattern string represented by the token.
2.1.2. Tokenizing
A tokenize policy is a string that must be either "strict
" or "lenient
".
A tokenizer is a struct.
A tokenizer has an associated input, a pattern string, initially the empty string.
A tokenizer has an associated policy, a tokenize policy, initially "strict
".
A tokenizer has an associated token list, a token list, initially an empty list.
A tokenizer has an associated index, a number, initially 0.
A tokenizer has an associated next index, a number, initially 0.
A tokenizer has an associated code point, a Unicode code point, initially null.
-
Let tokenizer be a new tokenizer.
-
Set tokenizer’s input to input.
-
Set tokenizer’s policy to policy.
-
While tokenizer’s index is less than tokenizer’s input's code point length:
-
Run seek and get the next code point given tokenizer and tokenizer’s index.
-
If tokenizer’s code point is U+002A (
*
):-
Run add a token with default position and length given tokenizer and "
asterisk
".
-
-
If tokenizer’s code point is U+002B (
+
) or U+003F (?
):-
Run add a token with default position and length given tokenizer and "
other-modifier
".
-
-
If tokenizer’s code point is U+005C (
\
):-
If tokenizer’s index is equal to tokenizer’s input's code point length − 1:
-
Run process a tokenizing error given tokenizer, tokenizer’s next index, and tokenizer’s index.
-
-
Let escaped index be tokenizer’s next index.
-
Run get the next code point given tokenizer.
-
Run add a token with default length given tokenizer, "
escaped-char
", tokenizer’s next index, and escaped index.
-
-
If tokenizer’s code point is U+007B (
{
):-
Run add a token with default position and length given tokenizer and "
open
".
-
-
If tokenizer’s code point is U+007D (
}
):-
Run add a token with default position and length given tokenizer and "
close
".
-
-
If tokenizer’s code point is U+003A (
:
):-
Let name position be tokenizer’s next index.
-
Let name start be name position.
-
While name position is less than tokenizer’s input's code point length:
-
Run seek and get the next code point given tokenizer and name position.
-
Let first code point be true if name position equals name start and false otherwise.
-
Let valid code point be the result of running is a valid name code point given tokenizer’s code point and first code point.
-
If valid code point is false, then break.
-
Set name position to tokenizer’s next index.
-
-
If name position is less than or equal to name start:
-
Run process a tokenizing error given tokenizer, name start, and tokenizer’s index.
-
-
Run add a token with default length given tokenizer, "
name
", name position, and name start.
-
-
If tokenizer’s code point is U+0028 (
(
):-
Let depth be 1.
-
Let regexp position be tokenizer’s next index.
-
Let regexp start be regexp position.
-
Let error be false.
-
While regexp position is less than tokenizer’s input's code point length:
-
Run seek and get the next code point given tokenizer and regexp position.
-
If the result of running is ASCII given tokenizer’s code point is false:
-
Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
-
Set error to true.
-
-
If regexp position equals regexp start and tokenizer’s code point is U+003F (
?
):-
Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
-
Set error to true.
-
-
If tokenizer’s code point is U+005C (
\
):-
If regexp position equals tokenizer’s input's code point length − 1:
-
Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
-
Set error to true.
-
-
Run get the next code point given tokenizer.
-
If the result of running is ASCII given tokenizer’s code point is false:
-
Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
-
Set error to true.
-
-
Set regexp position to tokenizer’s next index.
-
-
If tokenizer’s code point is U+0029 (
)
):-
Decrement depth by 1.
-
If depth is 0:
-
Set regexp position to tokenizer’s next index.
-
-
-
Otherwise if tokenizer’s code point is U+0028 (
(
):-
Increment depth by 1.
-
If regexp position equals tokenizer’s input's code point length − 1:
-
Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
-
Set error to true.
-
-
Let temporary position be tokenizer’s next index.
-
Run get the next code point given tokenizer.
-
If tokenizer’s code point is not U+003F (
?
):-
Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
-
Set error to true.
-
-
Set tokenizer’s next index to temporary position.
-
-
Set regexp position to tokenizer’s next index.
-
-
If error is true, then continue.
-
If depth is not zero:
-
Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
-
-
Let regexp length be regexp position − regexp start − 1.
-
If regexp length is zero:
-
Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
-
-
Run add a token given tokenizer, "
regexp
", regexp position, regexp start, and regexp length.
-
-
Run add a token with default position and length given tokenizer and "
char
".
-
-
Run add a token with default length given tokenizer, "
end
", tokenizer’s index, and tokenizer’s index. -
Return tokenizer’s token list.
-
Set tokenizer’s code point to the Unicode code point in tokenizer’s input at the position indicated by tokenizer’s next index.
-
Increment tokenizer’s next index by 1.
-
Set tokenizer’s next index to index.
-
Run get the next code point given tokenizer.
-
Let token be a new token.
-
Set token’s type to type.
-
Set token’s value to the code point substring from value position with length value length within tokenizer’s input.
-
Append token to the back of tokenizer’s token list.
-
Set tokenizer’s index to next position.
-
Let computed length be next position − value position.
-
Run add a token given tokenizer, type, next position, value position, and computed length.
-
Run add a token with default length given tokenizer, type, tokenizer’s next index, and tokenizer’s index.
-
Run add a token with default length given tokenizer, "
invalid-char
", next position, and value position.
-
If first is true, then return the result of checking if code point is contained in the IdentifierStart set of code points.
-
Otherwise return the result of checking if code point is contained in the IdentifierPart set of code points.
-
If code point is between U+0000 and U+007F inclusive, then return true.
-
Otherwise return false.
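The tokenizing steps above can be condensed into a compact sketch. Several things here are assumptions: the tokenizer state is kept in local variables instead of a struct, a tokenizing error raises `TypeError` under the "strict" policy, and Python's identifier rules (plus "$") stand in for the IdentifierStart and IdentifierPart sets.

```python
from typing import NamedTuple


class Token(NamedTuple):
    type: str
    index: int
    value: str


def is_ascii(cp: str) -> bool:
    return ord(cp) <= 0x7F


def is_valid_name_code_point(cp: str, first: bool) -> bool:
    # Approximation of IdentifierStart / IdentifierPart.
    if cp == "$":
        return True
    return cp.isidentifier() if first else ("a" + cp).isidentifier()


def tokenize(pattern: str, policy: str = "strict") -> list:
    tokens, index, n = [], 0, len(pattern)

    def add(type_, next_pos, value_pos, length=None):
        nonlocal index
        if length is None:
            length = next_pos - value_pos
        tokens.append(Token(type_, index, pattern[value_pos:value_pos + length]))
        index = next_pos

    def error(next_pos, value_pos):
        if policy == "strict":
            raise TypeError(f"invalid pattern near offset {value_pos}")
        add("invalid-char", next_pos, value_pos)

    while index < n:
        cp = pattern[index]
        if cp == "*":
            add("asterisk", index + 1, index)
        elif cp in "+?":
            add("other-modifier", index + 1, index)
        elif cp == "\\":
            if index == n - 1:
                error(index + 1, index)
            else:
                add("escaped-char", index + 2, index + 1)
        elif cp == "{":
            add("open", index + 1, index)
        elif cp == "}":
            add("close", index + 1, index)
        elif cp == ":":
            start = pos = index + 1
            while pos < n and is_valid_name_code_point(pattern[pos], pos == start):
                pos += 1
            if pos <= start:
                error(start, index)
            else:
                add("name", pos, start)
        elif cp == "(":
            depth, start = 1, index + 1
            pos, failed = start, False
            while pos < n:
                c = pattern[pos]
                if not is_ascii(c) or (pos == start and c == "?"):
                    failed = True
                    break
                if c == "\\":
                    if pos == n - 1 or not is_ascii(pattern[pos + 1]):
                        failed = True
                        break
                    pos += 2
                    continue
                if c == ")":
                    depth -= 1
                    if depth == 0:
                        pos += 1
                        break
                elif c == "(":
                    depth += 1
                    # Nested groups must be non-capturing, i.e. start with "(?".
                    if pos == n - 1 or pattern[pos + 1] != "?":
                        failed = True
                        break
                pos += 1
            if failed or depth != 0 or pos - start - 1 == 0:
                error(start, index)
            else:
                add("regexp", pos, start, pos - start - 1)
        else:
            add("char", index + 1, index)
    add("end", index, index)
    return tokens


print([(t.type, t.value) for t in tokenize("/:id(\\d+)?")])
```

Under the "lenient" policy the same input never raises; a ":" that is not followed by a valid name, as in "https://x", becomes an "invalid-char" token instead.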
2.1.3. Parts
A part list is a list of zero or more parts.
A part is a struct representing one piece of a parsed pattern string. It can contain at most one matching group, a fixed text prefix, a fixed text suffix, and a modifier. It can contain as little as a single fixed text string or a single matching group.
A part has an associated type, a string, which must be set upon creation. It must be one of the following:
- "
fixed-text
" - The part represents a simple fixed text string.
- "
regexp
" - The part represents a matching group with a custom regular expression.
- "
segment-wildcard
" - The part represents a matching group that matches code points up to the next separator code point. This is typically used for a named group like "
:foo
" that does not have a custom regular expression. - "
full-wildcard
" - The part represents a matching group that greedily matches all code points. This is typically used for the "
*
" wildcard matching group.
A part has an associated value, a string, which must be set upon creation.
A part has an associated modifier, a string, which must be set upon creation. It must be one of the following:
- "
none
" - The part does not have a modifier.
- "
optional
" - The part has an optional modifier indicated by the U+003F (
?
) code point. - "
zero-or-more
" - The part has a "zero or more" modifier indicated by the U+002A (
*
) code point. - "
one-or-more
" - The part has a "one or more" modifier indicated by the U+002B (
+
) code point.
A part has an associated name, a string, initially the empty string.
A part has an associated prefix, a string, initially the empty string.
A part has an associated suffix, a string, initially the empty string.
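One possible in-memory representation of a part is sketched below. The class layout and validation are assumptions for illustration; the example part list is what the parsing rules in this section would plausibly yield for the pathname pattern "/products/:id?" when "/" is the prefix code point.

```python
from dataclasses import dataclass

PART_TYPES = ("fixed-text", "regexp", "segment-wildcard", "full-wildcard")
MODIFIERS = ("none", "optional", "zero-or-more", "one-or-more")


@dataclass(frozen=True)
class Part:
    type: str       # must be set upon creation
    value: str      # must be set upon creation
    modifier: str   # must be set upon creation
    name: str = ""
    prefix: str = ""
    suffix: str = ""

    def __post_init__(self):
        if self.type not in PART_TYPES:
            raise ValueError(f"unknown part type: {self.type!r}")
        if self.modifier not in MODIFIERS:
            raise ValueError(f"unknown modifier: {self.modifier!r}")


# "/products/:id?" -> fixed text, then an optional named group that
# carries the "/" before it as its prefix.
part_list = [
    Part("fixed-text", "/products", "none"),
    Part("segment-wildcard", "", "optional", name="id", prefix="/"),
]
print(part_list[1])
```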
2.1.4. Options
An options struct contains different settings that control how a pattern string behaves. These options originally come from path-to-regexp. We only include the options that are modified within the URLPattern specification and exclude the other options. For the purposes of comparison, this specification acts like path-to-regexp where strict, start, and end are always set to false.
An options has an associated delimiter code point, a string, which must be set upon creation. It must contain one ASCII code point or the empty string. This code point is treated as a segment separator and is used for determining how far a :foo
named group should match by default. For example, if the delimiter code point is "/
" then "/:foo
" will match "/bar
", but not "/bar/baz
". If the delimiter code point is the empty string then the example pattern would match both strings.
An options has an associated prefix code point, a string, which must be set upon creation. It must contain one ASCII code point or the empty string. The code point is treated as an automatic prefix if found immediately preceding a match group. This matters when a match group is modified to be optional or repeating. For example, if prefix code point is "/
" then "/foo/:bar?/baz
" will treat the "/
" before ":bar
" as a prefix that becomes optional along with the named group. So in this example the pattern would match "/foo/baz
".
An options has an associated ignore case, a boolean, which must be set upon creation. It defaults to false. When it is true, matching is case-insensitive; when it is false, matching is case-sensitive. For the purpose of comparison, this can be thought of as the negated sensitive option in path-to-regexp.
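The effect of these options can be sketched with Python's `re` module. Everything below is illustrative: the `Options` class, the helper names, and the hand-written expression for "/:foo" are assumptions. An empty delimiter is spelled `[\s\S]` here because the ECMAScript form `[^]` is not valid in Python.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Options:
    delimiter_code_point: str
    prefix_code_point: str
    ignore_case: bool = False

    def __post_init__(self):
        for cp in (self.delimiter_code_point, self.prefix_code_point):
            if len(cp) > 1 or not cp.isascii():
                raise ValueError("expected one ASCII code point or the empty string")


def segment_wildcard(options: Options) -> str:
    d = options.delimiter_code_point
    return "[^" + re.escape(d) + "]+?" if d else r"[\s\S]+?"


def foo_source(options: Options) -> str:
    """Hand-written regular expression for the pattern "/:foo"."""
    return "^/(" + segment_wildcard(options) + ")$"


def matches(source: str, target: str, options: Options) -> bool:
    flags = re.IGNORECASE if options.ignore_case else 0
    return re.match(source, target, flags) is not None


pathname_like = Options("/", "/")
no_delimiter = Options("", "")

print(matches(foo_source(pathname_like), "/bar/baz", pathname_like))  # False
print(matches(foo_source(no_delimiter), "/bar/baz", no_delimiter))    # True
print(matches("^/blog$", "/BLOG", Options("/", "/", ignore_case=True)))  # True
```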
2.1.5. Parsing
A pattern parser is a struct.
A pattern parser has an associated token list, a token list, initially an empty list.
A pattern parser has an associated encoding callback, an encoding callback, that must be set upon creation.
A pattern parser has an associated segment wildcard regexp, a string, that must be set upon creation.
A pattern parser has an associated part list, a part list, initially an empty list.
A pattern parser has an associated pending fixed value, a string, initially the empty string.
A pattern parser has an associated index, a number, initially 0.
A pattern parser has an associated next numeric name, a number, initially 0.
-
Let parser be a new pattern parser whose encoding callback is encoding callback and segment wildcard regexp is the result of running generate a segment wildcard regexp given options.
-
Set parser’s token list to the result of running tokenize given input and "
strict
". -
While parser’s index is less than parser’s token list's size:
This first section is looking for the sequence: <prefix char><name><regexp><modifier>. Any number of these tokens, from none to all four, can be present.
-
Let char token be the result of running try to consume a token given parser and "
char
". -
Let name token be the result of running try to consume a token given parser and "
name
". -
Let regexp or wildcard token be the result of running try to consume a regexp or wildcard token given parser and name token.
-
If name token is not null or regexp or wildcard token is not null:
If there is a matching group, we need to add the part immediately.
-
Let prefix be the empty string.
-
If char token is not null then set prefix to char token’s value.
-
If prefix is not the empty string and not options’s prefix code point:
-
Append prefix to the end of parser’s pending fixed value.
-
Set prefix to the empty string.
-
-
Run maybe add a part from the pending fixed value given parser.
-
Let modifier token be the result of running try to consume a modifier token given parser.
-
Run add a part given parser, prefix, name token, regexp or wildcard token, the empty string, and modifier token.
-
-
Let fixed token be char token.
If there was no matching group, then we need to buffer any fixed text. We want to collect as much text as possible before adding it as a "
fixed-text
" part. -
If fixed token is null, then set fixed token to the result of running try to consume a token given parser and "
escaped-char
". -
If fixed token is not null:
-
Append fixed token’s value to parser’s pending fixed value.
-
-
Let open token be the result of running try to consume a token given parser and "
open
". -
If open token is not null:
-
Set prefix to the result of running consume text given parser.
-
Set name token to the result of running try to consume a token given parser and "
name
". -
Set regexp or wildcard token to the result of running try to consume a regexp or wildcard token given parser and name token.
-
Let suffix be the result of running consume text given parser.
-
Run consume a required token given parser and "
close
". -
Set modifier token to the result of running try to consume a modifier token given parser.
-
Run add a part given parser, prefix, name token, regexp or wildcard token, suffix, and modifier token.
-
-
Run maybe add a part from the pending fixed value given parser.
-
Run consume a required token given parser and "
end
".
-
-
Return parser’s part list.
The full wildcard regexp value is the string ".*
".
-
Let result be "
[^
". -
Append the result of running escape a regexp string given options’s delimiter code point to the end of result.
-
Append "
]+?
" to the end of result. -
Return result.
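A minimal sketch of the two regexp values follows. The set of escaped characters is taken from the escape a regexp string steps in section 2.2; the constant and function names are illustrative. The returned strings use ECMAScript regular expression syntax, so the empty-delimiter result "[^]+?" is not meant to be compiled by Python's `re`.

```python
# Characters that "escape a regexp string" prefixes with a backslash.
REGEXP_SPECIAL = set(".+*?^${}()[]|/\\")

FULL_WILDCARD_REGEXP_VALUE = ".*"


def generate_segment_wildcard_regexp(delimiter: str) -> str:
    """delimiter is one ASCII code point or the empty string."""
    escaped = "\\" + delimiter if delimiter in REGEXP_SPECIAL else delimiter
    return "[^" + escaped + "]+?"


print(generate_segment_wildcard_regexp("/"))  # [^\/]+?
print(generate_segment_wildcard_regexp("."))  # [^\.]+?
print(generate_segment_wildcard_regexp(""))   # [^]+?
```

The trailing "+?" makes the group lazy and non-empty: it matches at least one code point and stops at the first delimiter.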
-
Assert: parser’s index is less than parser’s token list's size.
-
Let next token be parser’s token list[parser’s index].
-
If next token’s type is not type, then return null.
-
Increment parser’s index by 1.
-
Return next token.
-
Let token be the result of running try to consume a token given parser and "
other-modifier
". -
If token is not null, then return token.
-
Set token to the result of running try to consume a token given parser and "
asterisk
". -
Return token.
-
Let token be the result of running try to consume a token given parser and "
regexp
". -
If name token is null and token is null, then set token to the result of running try to consume a token given parser and "
asterisk
". -
Return token.
-
Let result be the result of running try to consume a token given parser and type.
-
If result is null, then throw a
TypeError
. -
Return result.
-
Let result be the empty string.
-
While true:
-
Let token be the result of running try to consume a token given parser and "
char
". -
If token is null, then set token to the result of running try to consume a token given parser and "
escaped-char
". -
If token is null, then break.
-
Append token’s value to the end of result.
-
-
Return result.
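The token-consuming helpers above fit naturally on a small cursor object. The sketch below is illustrative: `TokenCursor` and its method names are hypothetical, tokens carry only a type and a value, and the token lists are built by hand.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    type: str
    value: str


class TokenCursor:
    def __init__(self, tokens):
        self.tokens = tokens
        self.index = 0

    def try_consume(self, type_: str) -> Optional[Token]:
        assert self.index < len(self.tokens)
        token = self.tokens[self.index]
        if token.type != type_:
            return None
        self.index += 1
        return token

    def try_consume_modifier(self) -> Optional[Token]:
        token = self.try_consume("other-modifier")
        if token is not None:
            return token
        return self.try_consume("asterisk")

    def try_consume_regexp_or_wildcard(self, name_token) -> Optional[Token]:
        token = self.try_consume("regexp")
        # "*" is only a wildcard group when it does not follow a name.
        if name_token is None and token is None:
            token = self.try_consume("asterisk")
        return token

    def consume_required(self, type_: str) -> Token:
        token = self.try_consume(type_)
        if token is None:
            raise TypeError(f"expected a {type_!r} token")
        return token

    def consume_text(self) -> str:
        result = ""
        while True:
            token = self.try_consume("char")
            if token is None:
                token = self.try_consume("escaped-char")
            if token is None:
                break
            result += token.value
        return result


# Tokens for ":id*": the asterisk after a name is a modifier, not a wildcard.
cursor = TokenCursor([Token("name", "id"), Token("asterisk", "*"), Token("end", "")])
name = cursor.try_consume("name")
print(cursor.try_consume_regexp_or_wildcard(name))  # None
print(cursor.try_consume_modifier().value)          # *
```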
-
If parser’s pending fixed value is the empty string, then return.
-
Let encoded value be the result of running parser’s encoding callback given parser’s pending fixed value.
-
Set parser’s pending fixed value to the empty string.
-
Let part be a new part whose type is "
fixed-text
", value is encoded value, and modifier is "none
".
-
Let modifier be "
none
". -
If modifier token is not null:
-
If modifier token’s value is "
?
" then set modifier to "optional
". -
Otherwise if modifier token’s value is "
*
" then set modifier to "zero-or-more
". -
Otherwise if modifier token’s value is "
+
" then set modifier to "one-or-more
".
-
-
If name token is null and regexp or wildcard token is null and modifier is "none":
This was a "{foo}" grouping. We add this to the pending fixed value so that it will be combined with any previous or subsequent text.
-
Append prefix to the end of parser’s pending fixed value.
-
Return.
-
-
Run maybe add a part from the pending fixed value given parser.
-
If name token is null and regexp or wildcard token is null:
This was a "{foo}?" grouping. The modifier means we cannot combine it with other text. Therefore we add it as a part immediately.
-
Assert: suffix is the empty string.
-
If prefix is the empty string, then return.
-
Let encoded value be the result of running parser’s encoding callback given prefix.
-
Let part be a new part whose type is "
fixed-text
", value is encoded value, and modifier is modifier. -
Return.
-
-
Let regexp value be the empty string.
Next, we convert the regexp or wildcard token into a regular expression.
-
If regexp or wildcard token is null, then set regexp value to parser’s segment wildcard regexp.
-
Otherwise if regexp or wildcard token’s type is "
asterisk
", then set regexp value to the full wildcard regexp value. -
Otherwise set regexp value to regexp or wildcard token’s value.
-
Let type be "regexp".
Next, we convert regexp value into a part type. We make sure to go to a regular expression first so that an equivalent "regexp" token will be treated the same as a "name" or "asterisk" token.
-
If regexp value is parser’s segment wildcard regexp:
-
Set type to "
segment-wildcard
". -
Set regexp value to the empty string.
-
-
Otherwise if regexp value is the full wildcard regexp value:
-
Set type to "
full-wildcard
". -
Set regexp value to the empty string.
-
-
Let name be the empty string.
Next, we determine the part name. This can be explicitly provided by a "
name
" token or be automatically assigned. -
If name token is not null, then set name to name token’s value.
-
Otherwise if regexp or wildcard token is not null:
-
Set name to parser’s next numeric name, serialized.
-
Increment parser’s next numeric name by 1.
-
-
If the result of running is a duplicate name given parser and name is true, then throw a
TypeError
. -
Let encoded prefix be the result of running parser’s encoding callback given prefix.
Finally, we encode the fixed text values and create the part.
-
Let encoded suffix be the result of running parser’s encoding callback given suffix.
-
Let part be a new part whose type is type, value is regexp value, modifier is modifier, name is name, prefix is encoded prefix, and suffix is encoded suffix.
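The way add a part normalizes a group through its regular expression before choosing a part type can be sketched as a small function. The function name and the convention of passing `None` for a missing token are assumptions made for this illustration.

```python
FULL_WILDCARD_REGEXP_VALUE = ".*"


def classify_group(token_type, token_value, segment_wildcard_regexp):
    """Return (part type, part value) for a matching group.

    token_type is None when there is no regexp or wildcard token,
    which happens for a bare named group such as ":foo".
    """
    if token_type is None:
        regexp_value = segment_wildcard_regexp
    elif token_type == "asterisk":
        regexp_value = FULL_WILDCARD_REGEXP_VALUE
    else:
        regexp_value = token_value

    # Equivalent spellings collapse to the same part type.
    if regexp_value == segment_wildcard_regexp:
        return "segment-wildcard", ""
    if regexp_value == FULL_WILDCARD_REGEXP_VALUE:
        return "full-wildcard", ""
    return "regexp", regexp_value


segment = "[^\\/]+?"
print(classify_group(None, None, segment))        # ('segment-wildcard', '')
print(classify_group("regexp", ".*", segment))    # ('full-wildcard', '')
print(classify_group("regexp", "\\d+", segment))  # ('regexp', '\\d+')
```

Because the comparison happens on the regular expression text, writing "(.*)" by hand yields the same part as "*".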
2.2. Converting part lists to regular expressions
-
Let result be "
^
". -
Let name list be a new list.
-
For each part of part list:
-
If part’s type is "
fixed-text
":-
If part’s modifier is "
none
", then append the result of running escape a regexp string given part’s value to the end of result. -
Otherwise:
A "fixed-text" part with a modifier uses a non-capturing group. It uses the following form:
(?:<fixed text>)<modifier>
-
Append "
(?:
" to the end of result. -
Append the result of running escape a regexp string given part’s value to the end of result.
-
Append "
)
" to the end of result. -
Append the result of running convert a modifier to a string given part’s modifier to the end of result.
-
-
-
Append part’s name to name list.
We collect the list of matching group names in a parallel list. This is largely done for legacy reasons to match path-to-regexp. We could attempt to convert this to use regular expression named capture groups, but given the complexity of this algorithm there is a real risk of introducing unintended bugs. In addition, if we ever end up exposing the generated regular expressions to the web we would like to maintain compatibility with path-to-regexp, which has indicated it is unlikely to switch to using named capture groups.
-
Let regexp value be part’s value.
-
If part’s type is "
segment-wildcard
", then set regexp value to the result of running generate a segment wildcard regexp given options. -
Otherwise if part’s type is "
full-wildcard
", then set regexp value to full wildcard regexp value. -
If part’s prefix is the empty string and part’s suffix is the empty string:
If there is no prefix or suffix then generation depends on the modifier. If there is no modifier or just the optional modifier, it uses the following simple form:
(<regexp value>)<modifier>
If there is a repeating modifier, however, we will use the more complex form:
((?:<regexp value>)<modifier>)
-
If part’s modifier is "
none
" or "optional
", then:-
Append "
(
" to the end of result. -
Append regexp value to the end of result.
-
Append "
)
" to the end of result. -
Append the result of running convert a modifier to a string given part’s modifier to the end of result.
-
-
Otherwise:
-
Append "
((?:
" to the end of result. -
Append regexp value to the end of result.
-
Append "
)
" to the end of result. -
Append the result of running convert a modifier to a string given part’s modifier to the end of result.
-
Append "
)
" to the end of result.
-
-
-
If part’s modifier is "none" or "optional":
This section handles non-repeating parts with a prefix or suffix. There is an inner capturing group that contains the primary regexp value. The inner group is then combined with the prefix or suffix in an outer non-capturing group. Finally the modifier is applied. The resulting form is as follows.
(?:<prefix>(<regexp value>)<suffix>)<modifier>
-
Append "
(?:
" to the end of result. -
Append the result of running escape a regexp string given part’s prefix to the end of result.
-
Append "
(
" to the end of result. -
Append regexp value to the end of result.
-
Append "
)
" to the end of result. -
Append the result of running escape a regexp string given part’s suffix to the end of result.
-
Append "
)
" to the end of result. -
Append the result of running convert a modifier to a string given part’s modifier to the end of result.
-
-
Assert: part’s modifier is "
zero-or-more
" or "one-or-more
". -
Assert: part’s prefix is not the empty string or part’s suffix is not the empty string.
Repeating parts with a prefix or suffix are dramatically more complicated. We want to exclude the initial prefix and the final suffix, but include them between any repeated elements. To achieve this we provide a separate initial expression that excludes the prefix. Then the expression is duplicated with the prefix/suffix values included in an optional repeating element. If zero values are permitted then a final optional modifier can be appended. The resulting form is as follows.
(?:<prefix>((?:<regexp value>)(?:<suffix><prefix>(?:<regexp value>))*)<suffix>)?
-
Append "
(?:
" to the end of result. -
Append the result of running escape a regexp string given part’s prefix to the end of result.
-
Append "
((?:
" to the end of result. -
Append regexp value to the end of result.
-
Append "
)(?:
" to the end of result. -
Append the result of running escape a regexp string given part’s suffix to the end of result.
-
Append the result of running escape a regexp string given part’s prefix to the end of result.
-
Append "
(?:
" to the end of result. -
Append regexp value to the end of result.
-
Append "
))*)
" to the end of result. -
Append the result of running escape a regexp string given part’s suffix to the end of result.
-
Append "
)
" to the end of result. -
If part’s modifier is "
zero-or-more
" then append "?
" to the end of result.
-
-
Append "
$
" to the end of result. -
Return (result, name list).
-
Assert: input is an ASCII string.
-
Let result be the empty string.
-
Let index be 0.
-
While index is less than input’s length:
-
Let c be input[index].
-
Increment index by 1.
-
If c is one of:
- U+002E (.);
- U+002B (+);
- U+002A (*);
- U+003F (?);
- U+005E (^);
- U+0024 ($);
- U+007B ({);
- U+007D (});
- U+0028 (();
- U+0029 ());
- U+005B ([);
- U+005D (]);
- U+007C (|);
- U+002F (/); or
- U+005C (\),
then append "\" to the end of result.
-
Append c to the end of result.
-
-
Return result.
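The two algorithms in this section can be sketched together. This is an illustrative reading, not a reference implementation: the `Part` tuple, the function names, and passing the delimiter directly instead of an options struct are assumptions, and the modifier-to-string mapping is inferred from the modifier definitions in section 2.1.3.

```python
from typing import NamedTuple


class Part(NamedTuple):
    type: str
    value: str
    modifier: str
    name: str = ""
    prefix: str = ""
    suffix: str = ""


FULL_WILDCARD = ".*"
MODIFIER_STRINGS = {"none": "", "optional": "?", "zero-or-more": "*", "one-or-more": "+"}


def escape_regexp_string(text: str) -> str:
    assert text.isascii()
    return "".join("\\" + c if c in ".+*?^${}()[]|/\\" else c for c in text)


def generate_regexp(part_list, delimiter: str):
    """Return (regular expression source, list of group names)."""
    result, names = "^", []
    segment = "[^" + escape_regexp_string(delimiter) + "]+?"
    for part in part_list:
        mod = MODIFIER_STRINGS[part.modifier]
        if part.type == "fixed-text":
            text = escape_regexp_string(part.value)
            result += text if part.modifier == "none" else f"(?:{text}){mod}"
            continue
        names.append(part.name)
        value = part.value
        if part.type == "segment-wildcard":
            value = segment
        elif part.type == "full-wildcard":
            value = FULL_WILDCARD
        prefix = escape_regexp_string(part.prefix)
        suffix = escape_regexp_string(part.suffix)
        if not part.prefix and not part.suffix:
            if part.modifier in ("none", "optional"):
                result += f"({value}){mod}"
            else:
                result += f"((?:{value}){mod})"
        elif part.modifier in ("none", "optional"):
            result += f"(?:{prefix}({value}){suffix}){mod}"
        else:
            # Repeating group: prefix and suffix appear between repetitions
            # but are excluded from the start and end of the capture.
            result += f"(?:{prefix}((?:{value})(?:{suffix}{prefix}(?:{value}))*){suffix})"
            if part.modifier == "zero-or-more":
                result += "?"
    return result + "$", names


# Part list assumed for the pathname pattern "/products/:id?".
parts = [
    Part("fixed-text", "/products", "none"),
    Part("segment-wildcard", "", "optional", name="id", prefix="/"),
]
source, names = generate_regexp(parts, "/")
print(source)  # ^\/products(?:\/([^\/]+?))?$
print(names)   # ['id']
```

The generated source targets ECMAScript regular expressions. For a non-empty delimiter such as "/" it also happens to compile under Python's `re`, which makes it easy to experiment with.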
2.3. Converting part lists to pattern strings
-
Let result be the empty string.
-
Let index list be the result of getting the indices for part list.
-
For each index of index list:
-
Let part be part list[index].
-
Let previous part be part list[index - 1] if index is greater than 0, otherwise let it be null.
-
Let next part be part list[index + 1] if index is less than index list’s size - 1, otherwise let it be null.
-
If part’s type is "
fixed-text
" then:-
If part’s modifier is "
none
" then:-
Append the result of running escape a pattern string given part’s value to the end of result.
-
-
Append "
{
" to the end of result. -
Append the result of running escape a pattern string given part’s value to the end of result.
-
Append "
}
" to the end of result. -
Append the result of running convert a modifier to a string given part’s modifier to the end of result.
-
-
Let custom name be true if part’s name[0] is not an ASCII digit; otherwise false.
-
Let needs grouping be true if at least one of the following are true, otherwise let it be false:
- part’s suffix is not the empty string.
- part’s prefix is not the empty string and is not options’s prefix code point.
-
If all of the following are true:
- needs grouping is false; and
- custom name is true; and
- part’s type is "segment-wildcard"; and
- part’s modifier is "none"; and
- next part is not null; and
- next part’s prefix is the empty string; and
- next part’s suffix is the empty string
then:
-
If next part’s type is "
fixed-text
":-
Set needs grouping to true if the result of running is a valid name code point given next part’s value's first code point and the boolean false is true.
-
-
Otherwise:
-
Set needs grouping to true if next part’s name[0] is an ASCII digit.
-
-
If all of the following are true:
- needs grouping is false; and
- part’s prefix is the empty string; and
- previous part is not null; and
- previous part’s type is "fixed-text"; and
- previous part’s value's last code point is options’s prefix code point
then set needs grouping to true.
-
If needs grouping is true, then append "
{
" to the end of result. -
Append the result of running escape a pattern string given part’s prefix to the end of result.
-
If custom name is true:
-
Append "
:
" to the end of result. -
Append part’s name to the end of result.
-
-
If part’s type is "
regexp
" then:-
Append "
(
" to the end of result. -
Append part’s value to the end of result.
-
Append "
)
" to the end of result.
-
-
Otherwise if part’s type is "
segment-wildcard
" and custom name is false:-
Append "
(
" to the end of result. -
Append the result of running generate a segment wildcard regexp given options to the end of result.
-
Append "
)
" to the end of result.
-
-
Otherwise if part’s type is "
full-wildcard
":-
If custom name is false and one of the following is true:
- previous part is null; or
- previous part’s type is "fixed-text"; or
- previous part’s modifier is not "none"; or
- needs grouping is true; or
- part’s prefix is not the empty string
then append "*" to the end of result.
-
Otherwise:
-
Append "
(
" to the end of result. -
Append full wildcard regexp value to the end of result.
-
Append "
)
" to the end of result.
-
-
-
If all of the following are true:
- part’s type is "segment-wildcard"; and
- custom name is true; and
- part’s suffix is not the empty string; and
- the result of running is a valid name code point given part’s suffix's first code point and the boolean false is true
then append U+005C (\) to the end of result.
-
Append the result of running escape a pattern string given part’s suffix to the end of result.
-
If needs grouping is true, then append "
}
" to the end of result. -
Append the result of running convert a modifier to a string given part’s modifier to the end of result.
-
-
Return result.
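The conversion above can be sketched in Python as follows. This is a non-normative illustration under stated assumptions: the `Part` dataclass, the helper names, and the `delimiter` and `prefix_cp` parameters are hypothetical; `is_name_part` approximates "is a valid name code point" with Python identifier rules; the segment wildcard regexp is assumed to be `[^` + escaped delimiter + `]+?` and the full wildcard regexp value is assumed to be `.*`.

```python
from dataclasses import dataclass


@dataclass
class Part:
    type: str            # "fixed-text", "regexp", "segment-wildcard" or "full-wildcard"
    value: str = ""
    modifier: str = "none"
    name: str = ""
    prefix: str = ""
    suffix: str = ""


MODIFIERS = {"zero-or-more": "*", "optional": "?", "one-or-more": "+"}
DIGITS = "0123456789"


def escape_pattern(text):
    return "".join("\\" + c if c in "+*?:{}()\\" else c for c in text)


def escape_regexp(text):
    return "".join("\\" + c if c in ".+*?^${}()[]|/\\" else c for c in text)


def is_name_part(cp):
    # Approximation (assumption): Python identifier rules stand in for the
    # JavaScript IdentifierPart set.
    return cp == "$" or ("a" + cp).isidentifier()


def generate_pattern_string(parts, delimiter="/", prefix_cp="/"):
    out = []
    for i, part in enumerate(parts):
        prev = parts[i - 1] if i > 0 else None
        nxt = parts[i + 1] if i < len(parts) - 1 else None
        mod = MODIFIERS.get(part.modifier, "")
        if part.type == "fixed-text":
            text = escape_pattern(part.value)
            out.append(text if part.modifier == "none" else "{" + text + "}" + mod)
            continue
        custom_name = part.name[0] not in DIGITS
        needs_grouping = part.suffix != "" or (
            part.prefix != "" and part.prefix != prefix_cp)
        if (not needs_grouping and custom_name and part.type == "segment-wildcard"
                and part.modifier == "none" and nxt is not None
                and nxt.prefix == "" and nxt.suffix == ""):
            # A following name-like code point would otherwise extend the name.
            if nxt.type == "fixed-text":
                needs_grouping = is_name_part(nxt.value[0])
            else:
                needs_grouping = nxt.name[0] in DIGITS
        if (not needs_grouping and part.prefix == "" and prev is not None
                and prev.type == "fixed-text" and prev.value[-1] == prefix_cp):
            # Assumed consequence: group so the preceding fixed text is not
            # re-parsed as this part's prefix.
            needs_grouping = True
        if needs_grouping:
            out.append("{")
        out.append(escape_pattern(part.prefix))
        if custom_name:
            out.append(":" + part.name)
        if part.type == "regexp":
            out.append("(" + part.value + ")")
        elif part.type == "segment-wildcard" and not custom_name:
            out.append("([^" + escape_regexp(delimiter) + "]+?)")
        elif part.type == "full-wildcard":
            if not custom_name and (prev is None or prev.type == "fixed-text"
                                    or prev.modifier != "none" or needs_grouping
                                    or part.prefix != ""):
                out.append("*")
            else:
                out.append("(.*)")
        if (part.type == "segment-wildcard" and custom_name
                and part.suffix != "" and is_name_part(part.suffix[0])):
            out.append("\\")
        out.append(escape_pattern(part.suffix))
        if needs_grouping:
            out.append("}")
        out.append(mod)
    return "".join(out)
```

With these assumptions, a named segment followed directly by the fixed text `abc` is emitted as `{/:id}abc`, so that the name is not re-read as `idabc`.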
To escape a pattern string given a string input:
-
Assert: input is an ASCII string.
-
Let result be the empty string.
-
Let index be 0.
-
While index is less than input’s length:
-
Let c be input[index].
-
Increment index by 1.
-
If c is one of:
- U+002B (+);
- U+002A (*);
- U+003F (?);
- U+003A (:);
- U+007B ({);
- U+007D (});
- U+0028 (();
- U+0029 ()); or
- U+005C (\),
then append U+005C (\) to the end of result.
-
Append c to the end of result.
-
-
Return result.
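A minimal Python sketch of the pattern-string escaping steps above; the names `PATTERN_SPECIAL` and `escape_pattern_string` are illustrative only.

```python
# Code points with special meaning in pattern string syntax.
PATTERN_SPECIAL = set("+*?:{}()\\")


def escape_pattern_string(value: str) -> str:
    """Backslash-escape pattern syntax characters in an ASCII string."""
    assert value.isascii()
    result = []
    for c in value:
        if c in PATTERN_SPECIAL:
            result.append("\\")
        result.append(c)
    return "".join(result)
```

Unlike regexp escaping, this leaves `.` and `/` untouched but escapes `:`, which would otherwise start a named group.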
To convert a modifier to a string given a modifier modifier:
-
If modifier is "
zero-or-more
", then return "*
". -
If modifier is "
optional
", then return "?
". -
If modifier is "
one-or-more
", then return "+
". -
Return the empty string.
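The modifier conversion above amounts to a small lookup table. A Python sketch, with illustrative names:

```python
MODIFIER_STRINGS = {
    "zero-or-more": "*",
    "optional": "?",
    "one-or-more": "+",
}


def convert_modifier_to_string(modifier: str) -> str:
    """Return the pattern syntax suffix for a modifier ("" for "none")."""
    return MODIFIER_STRINGS.get(modifier, "")
```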
3. Canonicalization
3.1. Encoding callbacks
To canonicalize a protocol given a string value:
-
If value is the empty string, return value.
-
Let dummyURL be a new URL record.
-
Let parseResult be the result of running the basic URL parser given value followed by "://dummy.test", with dummyURL as url.
Note: state override is not used here because it enforces restrictions that are only appropriate for the protocol setter. Instead we use the protocol to parse a dummy URL using the normal parsing entry point.
-
If parseResult is failure, then throw a
TypeError
. -
Return dummyURL’s scheme.
To canonicalize a username given a string value:
-
If value is the empty string, return value.
-
Let dummyURL be a new URL record.
-
Set the username given dummyURL and value.
-
Return dummyURL’s username.
To canonicalize a password given a string value:
-
If value is the empty string, return value.
-
Let dummyURL be a new URL record.
-
Set the password given dummyURL and value.
-
Return dummyURL’s password.
To canonicalize a hostname given a string value:
-
If value is the empty string, return value.
-
Let dummyURL be a new URL record.
-
Let parseResult be the result of running the basic URL parser given value with dummyURL as url and hostname state as state override.
-
If parseResult is failure, then throw a
TypeError
. -
Return dummyURL’s host, serialized, or empty string if it is null.
To canonicalize an IPv6 hostname given a string value:
-
Let result be the empty string.
-
For each code point in value interpreted as a list of code points:
-
If all of the following are true:
- code point is not an ASCII hex digit;
- code point is not U+005B ([);
- code point is not U+005D (]); and
- code point is not U+003A (:),
then throw a TypeError.
-
Append the result of running ASCII lowercase given code point to the end of result.
-
-
Return result.
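The code point check above can be sketched in Python as follows. The function name is illustrative, and Python's built-in `TypeError` stands in for the JavaScript exception.

```python
import string


def canonicalize_ipv6_hostname(value: str) -> str:
    """Lowercase an IPv6 hostname, rejecting any code point that is not an
    ASCII hex digit, "[", "]" or ":"."""
    result = []
    for cp in value:
        if cp not in string.hexdigits and cp not in "[]:":
            raise TypeError(f"invalid IPv6 hostname code point: {cp!r}")
        result.append(cp.lower())
    return "".join(result)
```

Note that this only validates and lowercases individual code points; it does not check that the result is a well-formed IPv6 address.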
To canonicalize a port given a string portValue and optionally a string protocolValue:
-
If portValue is the empty string, return portValue.
-
Let dummyURL be a new URL record.
-
If protocolValue was given, then set dummyURL’s scheme to protocolValue.
Note, we set the URL record's scheme in order for the basic URL parser to recognize and normalize default port values.
-
Let parseResult be the result of running basic URL parser given portValue with dummyURL as url and port state as state override.
-
If parseResult is failure, then throw a
TypeError
. -
Return dummyURL’s port, serialized, or empty string if it is null.
To canonicalize a pathname given a string value:
-
If value is the empty string, then return value.
-
Let leading slash be true if the first code point in value is U+002F (
/
) and otherwise false. -
Let modified value be "/-" if leading slash is false and otherwise the empty string.
The URL parser will automatically prepend a leading slash to the canonicalized pathname. Unfortunately, this does not work here: when used as an encoding callback, this algorithm is called for pieces of the pathname instead of the entire pathname. Therefore we disable the prepending of the slash by inserting our own. An additional character is also inserted in order to avoid inadvertently collapsing a leading dot due to the fake leading slash being interpreted as a "/." sequence. These inserted characters are then removed from the result below.
Note: implementations are free to simply disable slash prepending in their URL parsing code instead of paying the performance penalty of inserting and removing characters in this algorithm.
-
Append value to the end of modified value.
-
Let dummyURL be a new URL record.
-
Let parseResult be the result of running basic URL parser given modified value with dummyURL as url and path start state as state override.
-
If parseResult is failure, then throw a
TypeError
. -
Let result be the result of URL path serializing dummyURL.
-
If leading slash is false, then set result to the code point substring from 2 to the end of the string within result.
-
Return result.
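The insert-and-strip technique used by the pathname steps above can be illustrated in Python. In this sketch `toy_path_parser` is a hypothetical stand-in that only prepends a slash and drops "." segments; it is not the basic URL parser, and a real implementation would call an actual URL parser instead.

```python
def toy_path_parser(path: str) -> str:
    """Hypothetical stand-in for URL path parsing: always emits a leading
    slash and drops "." segments. NOT the real URL parser."""
    segments = [s for s in path.split("/") if s != "."]
    if segments and segments[0] == "":
        segments = segments[1:]
    return "/" + "/".join(segments)


def canonicalize_pathname(value: str, parse=toy_path_parser) -> str:
    if value == "":
        return value
    leading_slash = value[0] == "/"
    # Insert "/-" so the parser neither prepends its own slash nor treats a
    # leading "." as a dot segment.
    modified = ("" if leading_slash else "/-") + value
    result = parse(modified)
    if not leading_slash:
        result = result[2:]  # remove the inserted "/-"
    return result
```

With the stand-in parser, `canonicalize_pathname("./foo")` keeps the leading dot, whereas parsing `"/./foo"` directly would collapse it to `"/foo"`.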
To canonicalize an opaque pathname given a string value:
-
If value is the empty string, return value.
-
Let dummyURL be a new URL record.
-
Set dummyURL’s path to the empty string.
-
Let parseResult be the result of running URL parsing given value with dummyURL as url and opaque path state as state override.
-
If parseResult is failure, then throw a
TypeError
. -
Return the result of URL path serializing dummyURL.
To canonicalize a search given a string value:
-
If value is the empty string, return value.
-
Let dummyURL be a new URL record.
-
Set dummyURL’s query to the empty string.
-
Let parseResult be the result of running basic URL parser given value with dummyURL as url and query state as state override.
-
If parseResult is failure, then throw a
TypeError
. -
Return dummyURL’s query.
To canonicalize a hash given a string value:
-
If value is the empty string, return value.
-
Let dummyURL be a new URL record.
-
Set dummyURL’s fragment to the empty string.
-
Let parseResult be the result of running basic URL parser given value with dummyURL as url and fragment state as state override.
-
If parseResult is failure, then throw a
TypeError
. -
Return dummyURL’s fragment.
3.2. URLPatternInit processing
To process a URLPatternInit given a URLPatternInit init, a string type, a string or null protocol, a string or null username, a string or null password, a string or null hostname, a string or null port, a string or null pathname, a string or null search, and a string or null hash:
-
Let result be the result of creating a new
URLPatternInit
. -
If protocol is not null, set result["protocol"] to protocol.
-
If username is not null, set result["username"] to username.
-
If password is not null, set result["password"] to password.
-
If hostname is not null, set result["hostname"] to hostname.
-
If port is not null, set result["port"] to port.
-
If pathname is not null, set result["pathname"] to pathname.
-
If search is not null, set result["search"] to search.
-
If hash is not null, set result["hash"] to hash.
-
Let baseURL be null.
-
If init["baseURL"] exists:
The base URL can be used to supply additional context, but for each component, if init includes a component which is at least as specific as one in the base URL, none is inherited.
A component is more specific if it appears later in one of the following two lists (which are very similar to the order they appear in the URL syntax):
-
protocol, hostname, port, pathname, search, hash
-
protocol, hostname, port, username, password
Username and password are also never inherited from a base URL when constructing a URLPattern. (They are, however, inherited from the base URL when parsing a URL supplied as an argument to test() or exec().)
-
Set baseURL to the result of running the basic URL parser on init["baseURL"].
-
If baseURL is failure, then throw a
TypeError
. -
If init["
protocol
"] does not exist, then set result["protocol
"] to the result of processing a base URL string given baseURL’s scheme and type. -
If type is not "
pattern
" and init contains none of "protocol
", "hostname
", "port
" and "username
", then set result["username
"] to the result of processing a base URL string given baseURL’s username and type. -
If type is not "
pattern
" and init contains none of "protocol
", "hostname
", "port
", "username
" and "password
", then set result["password
"] to the result of processing a base URL string given baseURL’s password and type. -
If init contains neither "
protocol
" nor "hostname
", then:-
Let baseHost be baseURL’s host.
-
If baseHost is null, then set baseHost to the empty string.
-
Set result["
hostname
"] to the result of processing a base URL string given baseHost and type.
-
-
If init contains none of "
protocol
", "hostname
", and "port
", then:-
If baseURL’s port is null, then set result["
port
"] to the empty string. -
Otherwise, set result["
port
"] to baseURL’s port, serialized.
-
-
If init contains none of "
protocol
", "hostname
", "port
", and "pathname
", then set result["pathname
"] to the result of processing a base URL string given the result of URL path serializing baseURL and type. -
If init contains none of "
protocol
", "hostname
", "port
", "pathname
", and "search
", then:-
Let baseQuery be baseURL’s query.
-
If baseQuery is null, then set baseQuery to the empty string.
-
Set result["
search
"] to the result of processing a base URL string given baseQuery and type.
-
-
If init contains none of "
protocol
", "hostname
", "port
", "pathname
", "search
", and "hash
", then:-
Let baseFragment be baseURL’s fragment.
-
If baseFragment is null, then set baseFragment to the empty string.
-
Set result["
hash
"] to the result of processing a base URL string given baseFragment and type.
-
-
-
If init["
protocol
"] exists, then set result["protocol
"] to the result of process protocol for init given init["protocol
"] and type. -
If init["
username
"] exists, then set result["username
"] to the result of process username for init given init["username
"] and type. -
If init["
password
"] exists, then set result["password
"] to the result of process password for init given init["password
"] and type. -
If init["
hostname
"] exists, then set result["hostname
"] to the result of process hostname for init given init["hostname
"] and type. -
If init["
port
"] exists, then set result["port
"] to the result of process port for init given init["port
"], result["protocol
"], and type. -
If init["pathname"] exists:
-
Set result["pathname"] to init["pathname"].
-
If the following are all true:
- baseURL is not null;
- baseURL has an opaque path; and
- the result of running is an absolute pathname given result["
pathname
"] and type is false,
then:
-
Let baseURLPath be the result of running process a base URL string given the result of URL path serializing baseURL and type.
-
Let slash index be the index of the last U+002F (
/
) code point found in baseURLPath, interpreted as a sequence of code points, or null if there are no instances of the code point. -
If slash index is not null:
-
Let new pathname be the code point substring from 0 to slash index + 1 within baseURLPath.
-
Append result["
pathname
"] to the end of new pathname. -
Set result["
pathname
"] to new pathname.
-
-
Set result["
pathname
"] to the result of process pathname for init given result["pathname
"], result["protocol
"], and type.
-
If init["
search
"] exists then set result["search
"] to the result of process search for init given init["search
"] and type. -
If init["
hash
"] exists then set result["hash
"] to the result of process hash for init given init["hash
"] and type. -
Return result.
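The inheritance rules in the steps above can be summarised with a small Python sketch. It assumes init carries a base URL and only reports which components would be copied from it; the function and parameter names are illustrative.

```python
def components_inherited_from_base(init_keys, type_):
    """Return the components taken from the base URL, given the component
    names present in init and the type ("pattern" or "url")."""
    present = set(init_keys)

    def none_of(*names):
        return present.isdisjoint(names)

    inherited = []
    if "protocol" not in present:
        inherited.append("protocol")
    # Username and password are never inherited when building a pattern.
    if type_ != "pattern" and none_of("protocol", "hostname", "port", "username"):
        inherited.append("username")
    if type_ != "pattern" and none_of("protocol", "hostname", "port",
                                      "username", "password"):
        inherited.append("password")
    if none_of("protocol", "hostname"):
        inherited.append("hostname")
    if none_of("protocol", "hostname", "port"):
        inherited.append("port")
    if none_of("protocol", "hostname", "port", "pathname"):
        inherited.append("pathname")
    if none_of("protocol", "hostname", "port", "pathname", "search"):
        inherited.append("search")
    if none_of("protocol", "hostname", "port", "pathname", "search", "hash"):
        inherited.append("hash")
    return inherited
```

For instance, an init that specifies only "pathname" inherits the protocol, hostname and port from the base URL, but not its search or hash.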
To process a base URL string given a scalar value string input and a string type:
-
Assert: input is not null.
-
If type is not "
pattern
" return input. -
Return the result of escaping a pattern string given input.
To determine whether input is an absolute pathname given a pattern string input and a string type:
-
If input is the empty string, then return false.
-
If input[0] is U+002F (
/
), then return true. -
If type is "
url
", then return false. -
If input’s code point length is less than 2, then return false.
-
If input[0] is U+005C (
\
) and input[1] is U+002F (/
), then return true. -
If input[0] is U+007B (
{
) and input[1] is U+002F (/
), then return true. -
Return false.
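The absolute-pathname check above translates directly into Python. A sketch, with an illustrative function name:

```python
def is_absolute_pathname(value: str, type_: str) -> bool:
    """Return True if value starts with "/", or, for patterns only, with an
    escaped or grouped slash ("\\/" or "{/")."""
    if value == "":
        return False
    if value[0] == "/":
        return True
    if type_ == "url":
        return False
    if len(value) < 2:
        return False
    if value[0] == "\\" and value[1] == "/":
        return True
    if value[0] == "{" and value[1] == "/":
        return True
    return False
```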
To process protocol for init given a string value and a string type:
-
Let strippedValue be the given value with a single trailing U+003A (
:
) removed, if any. -
If type is "
pattern
" then return strippedValue. -
Return the result of running canonicalize a protocol given strippedValue.
To process username for init given a string value and a string type:
-
If type is "
pattern
" then return value. -
Return the result of running canonicalize a username given value.
To process password for init given a string value and a string type:
-
If type is "
pattern
" then return value. -
Return the result of running canonicalize a password given value.
To process hostname for init given a string value and a string type:
-
If type is "
pattern
" then return value. -
Return the result of running canonicalize a hostname given value.
To process port for init given a string portValue, a string protocolValue, and a string type:
-
If type is "
pattern
" then return portValue. -
Return the result of running canonicalize a port given portValue and protocolValue.
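To process pathname for init given a string pathnameValue, a string protocolValue, and a string type:
-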
If type is "
pattern
" then return pathnameValue. -
If protocolValue is a special scheme or the empty string, then return the result of running canonicalize a pathname given pathnameValue.
If the protocolValue is the empty string then no value was provided for
protocol
in the constructor dictionary. Normally we do not special case empty string dictionary values, but in this case we treat it as a special scheme in order to default to the most common pathname canonicalization. -
Return the result of running canonicalize an opaque pathname given pathnameValue.
To process search for init given a string value and a string type:
-
Let strippedValue be the given value with a single leading U+003F (
?
) removed, if any. -
If type is "
pattern
" then return strippedValue. -
Return the result of running canonicalize a search given strippedValue.
To process hash for init given a string value and a string type:
-
Let strippedValue be the given value with a single leading U+0023 (
#
) removed, if any. -
If type is "
pattern
" then return strippedValue. -
Return the result of running canonicalize a hash given strippedValue.
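The delimiter stripping performed for the protocol, search and hash components above can be sketched in Python. The canonicalization algorithms depend on a URL parser, so in this sketch they are passed in as a caller-supplied `canonicalize` callable; that parameter and all function names are assumptions of the example.

```python
def strip_one_trailing(value: str, ch: str) -> str:
    return value[:-1] if value.endswith(ch) else value


def strip_one_leading(value: str, ch: str) -> str:
    return value[1:] if value.startswith(ch) else value


def process_protocol_for_init(value, type_, canonicalize):
    stripped = strip_one_trailing(value, ":")
    # Patterns are returned as written; only URL inputs are canonicalized.
    return stripped if type_ == "pattern" else canonicalize(stripped)


def process_search_for_init(value, type_, canonicalize):
    stripped = strip_one_leading(value, "?")
    return stripped if type_ == "pattern" else canonicalize(stripped)


def process_hash_for_init(value, type_, canonicalize):
    stripped = strip_one_leading(value, "#")
    return stripped if type_ == "pattern" else canonicalize(stripped)
```

Only a single delimiter is removed, so a hash value of `##a` becomes `#a`.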
4. Using URL patterns in other specifications
To promote consistency on the web platform, other documents integrating with this specification should adhere to the following guidelines, unless there is good reason to diverge.
-
Accept shorthands. Most author patterns will be simple and straightforward. Accordingly, APIs should accept shorthands for those common cases and avoid the need for authors to take additional steps to transform these into complete
URLPattern
objects. -
Respect the base URL. Just as URLs are generally parsed relative to a base URL for their environment (most commonly, a document base URL), URL patterns should respect this as well. The
URLPattern
constructor itself is an exception because it directly exposes the concept itself, similar to how the URL constructor does not respect the base URL even though the rest of the platform does. -
Be clear about regexp groups. Some APIs may benefit from only allowing URL patterns which do not have regexp groups, for example, because user agents are likely to implement them in a different thread or process from those executing author script, and because of security or performance concerns, a JavaScript engine would not ordinarily run there. If so, this should be clearly documented (with reference to has regexp groups) and the operation should report an error as soon as possible (e.g., by throwing a JavaScript exception). If possible, this should be feature-detectable to allow for the possibility of this constraint being lifted in the future. Avoid creating different subsets of URL patterns without consulting the editors of this specification.
-
Be clear about what URLs will be matched. For instance, algorithms during fetching are likely to operate on URLs with no fragment. If so, the specification should be clear that this is the case, and may advise showing a developer warning if a pattern which cannot match (e.g., because it requires a non-empty fragment) is used.
4.1. Integrating with JavaScript APIs
typedef (USVString or URLPatternInit or URLPattern) URLPatternCompatible;
JavaScript APIs should accept all of:
-
a
URLPattern
object -
a dictionary-like object which specifies the components required to construct a pattern
-
a string (in the constructor string syntax)
To accomplish this, specifications should accept URLPatternCompatible
as an argument to an operation or dictionary member, and process it using the following algorithm, using the appropriate environment settings object's API base URL or equivalent.
To build a URLPattern object from a Web IDL value URLPatternCompatible input given URL baseURL and realm realm, perform the following steps:
-
If the specific type of input is
URLPattern
:-
Return input.
-
-
Otherwise:
-
Let pattern be a new
URLPattern
with realm. -
Set pattern’s associated URL pattern to the result of building a URL pattern from a Web IDL value given input and baseURL.
-
Return pattern.
-
To build a URL pattern from a Web IDL value URLPatternCompatible input given URL baseURL, perform the following steps:
-
If the specific type of input is
URLPattern
:-
Return input’s associated URL pattern.
-
-
Otherwise, if the specific type of input is
URLPatternInit
: -
Otherwise:
-
Assert: The specific type of input is
USVString
. -
Return the result of creating a URL pattern given input, the serialization of baseURL, and an empty map.
-
This allows authors to concisely specify most patterns, and use the constructor to access uncommon options if necessary. The implicit use of the base URL is similar to, and consistent with, HTML’s parse a URL algorithm. [HTML]
4.2. Integrating with JSON data formats
JSON data formats which include URL patterns should mirror the behavior of JavaScript APIs and accept both:
-
an object which specifies the components required to construct a pattern
-
a string (in the constructor string syntax)
If a specification has an Infra value (e.g., after using parse a JSON string to an Infra value), use the following algorithm, using the appropriate base URL (by default, the URL of the JSON resource). [INFRA]
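To build a URL pattern from an Infra value rawPattern given URL baseURL, perform the following steps:
-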
Let serializedBaseURL be the serialization of baseURL.
-
If rawPattern is a string, then:
-
Otherwise, if rawPattern is a map, then:
-
Let init be «[ "baseURL" → serializedBaseURL ]», representing a dictionary of type URLPatternInit.
-
For each key → value of rawPattern:
-
If key is not the identifier of a dictionary member of URLPatternInit or one of its inherited dictionaries, value is not a string, or the member’s type is not declared to be USVString, then return null.
This will need to be updated if URLPatternInit gains members of other types.
A future version of this specification might also have a less strict mode, if that proves useful to other specifications.
-
Set init[key] to value.
-
-
Return the result of creating a URL pattern given init, null, and an empty map.
It might become necessary in the future to plumb non-empty options here.
-
-
Otherwise, return null.
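The validation of rawPattern described above can be sketched in Python, with `dict` and `str` standing in for Infra maps and strings. The function returns the first two arguments that would be passed to "create a URL pattern", or `None`. The function name is illustrative, the member list is assumed to be the full set of URLPatternInit members, and the string branch is an assumption modelled on the HTTP header field algorithm in section 4.3.

```python
URL_PATTERN_INIT_MEMBERS = {
    "protocol", "username", "password", "hostname", "port",
    "pathname", "search", "hash", "baseURL",
}


def pattern_args_from_infra_value(raw_pattern, serialized_base_url):
    """Return (input, base URL string or None), or None if raw_pattern is
    not an acceptable pattern description."""
    if isinstance(raw_pattern, str):
        # Assumption: strings are resolved against the serialized base URL.
        return (raw_pattern, serialized_base_url)
    if isinstance(raw_pattern, dict):
        init = {"baseURL": serialized_base_url}
        for key, value in raw_pattern.items():
            if key not in URL_PATTERN_INIT_MEMBERS or not isinstance(value, str):
                return None
            init[key] = value
        # The base URL travels inside init, so none is passed separately.
        return (init, None)
    return None
```

A "baseURL" key in rawPattern overrides the default, because every accepted key is copied into init after the default has been set.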
Specifications may wish to leave room in their formats to accept options for URLPatternOptions
, override the base URL, or similar, since it is not possible to construct a URLPattern
object directly in this case, unlike in a JavaScript API. For example, Speculation Rules accepts a "relative_to
" key which can be used to switch to using the document base URL instead of the JSON resource’s URL. [SPECULATION-RULES]
4.3. Integrating with HTTP header fields
HTTP headers which include URL patterns should accept a string in the constructor string syntax, likely as part of a structured field [RFC8941].
Specifications for HTTP headers should operate on URL patterns (e.g., using the match algorithm) rather than URLPattern
objects (which imply the existence of a JavaScript realm).
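To build a URL pattern from an HTTP structured field value rawPattern given URL baseURL, perform the following steps:
-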
Let serializedBaseURL be the serialization of baseURL.
-
Return the result of creating a URL pattern given rawPattern, serializedBaseURL, and an empty map.
Acknowledgments
The editors would like to thank Alex Russell, Anne van Kesteren, Asa Kusuma, Blake Embrey, Cyrus Kasaaian, Daniel Murphy, Darwin Huang, Devlin Cronin, Domenic Denicola, Dominick Ng, Jake Archibald, Jeffrey Posnick, Jeremy Roman, Jimmy Shen, Joe Gregorio, Joshua Bell, Kenichi Ishibashi, Kenji Baheux, Kenneth Rohde Christiansen, Kingsley Ngan, Kinuko Yasuda, L. David Baron, Luca Casonato, Łukasz Anforowicz, Makoto Shimazu, Marijn Kruisselbrink, Matt Falkenhagen, Matt Giuca, Michael Landry, R. Samuel Klatchko, Rajesh Jagannathan, Ralph Chelala, Sangwhan Moon, Sayan Pal, Victor Costan, Yoshisato Yanagisawa, and Youenn Fablet for their contributions to this specification.
Special thanks to Blake Embrey and the other pillarjs/path-to-regexp contributors for building an excellent open source library that so many have found useful.
Also, special thanks to Kenneth Rohde Christiansen for his work on the polyfill. He put in extensive work to adapt to the changing URLPattern
API.
This standard is written by Ben Kelly (Google, wanderview@chromium.org), Jeremy Roman (Google, jbroman@chromium.org), and 宍戸俊哉 (Shunya Shishido, Google, sisidovski@chromium.org).
Intellectual property rights
Copyright © WHATWG (Apple, Google, Mozilla, Microsoft). This work is licensed under a Creative Commons Attribution 4.0 International License. To the extent portions of it are incorporated into source code, such portions in the source code are licensed under the BSD 3-Clause License instead.
This is the Living Standard. Those interested in the patent-review version should view the Living Standard Review Draft.