One thing that has confused me for no reason whatsoever is regular expressions. So hopefully this tutorial will help you learn some basics about them.
What even is RegExp (Regular Expressions)? Well, they are a 'language' of their own, and they allow you to match parts of a string. For example, if you wanted to match the color in each of these sentences:
Bob has a blue shirt. Joe has a green shirt. Amy has an orange shirt. Coder100 has a brown shirt.
A suitable regular expression would be:
/\w+ has an? (\w+) shirt\./
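As a quick sketch of how that expression could be used in JavaScript (the names `text` and `colors` are just for illustration, and the `g` flag and `matchAll` are covered later in this tutorial):

```javascript
const text =
  "Bob has a blue shirt. Joe has a green shirt. " +
  "Amy has an orange shirt. Coder100 has a brown shirt.";

// The parentheses capture the color; the final dot is escaped so it only matches a literal '.'
const colors = [...text.matchAll(/\w+ has an? (\w+) shirt\./g)].map((m) => m[1]);

console.log(colors); // [ 'blue', 'green', 'orange', 'brown' ]
```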
Creating regular expressions is simple. There are two ways you can approach this. The first way is to use a literal:
/... your regexp .../... your flags ...
or, to call a constructor:
new RegExp("...your regexp...", "... your flags ...");
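Here is a small sketch comparing the two approaches. The patterns and variable names are made up for illustration. One thing worth noticing is that a backslash has to be doubled when the expression is written as a string:

```javascript
// Literal and constructor forms of the same expression
const literal = /hello/i;
const constructed = new RegExp("hello", "i");

console.log(literal.test("Hello"));     // true
console.log(constructed.test("HELLO")); // true

// Inside a string, "\\d" is needed to hand \d to the regexp engine
const digitsLiteral = /\d+/;
const digitsConstructed = new RegExp("\\d+");

console.log(digitsLiteral.source === digitsConstructed.source); // true
```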
Flags are like 'configuration options'. The most common flag is the `g` flag. This means you will capture all the matches in one go. These are the flags that you will most likely ever use:
| Flag | Meaning |
|---|---|
| `g` | Captures all the matches in one go. |
| `i` | The match is case insensitive. |
What does the `m` flag mean? With the `m` (multiline) flag, `^` and `$` match at the start and end of each line, instead of only at the start (index 0) and the end (last index) of the whole string.
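To see the flags in action, here is a minimal sketch on a made-up three-line string (`lines` and the other names are just for illustration). `^` and `$` anchor the match to the start and the end:

```javascript
const lines = "Cat\ncat\nDog";

// g: collect every match; i: ignore case
const sensitive = lines.match(/cat/g);
const insensitive = lines.match(/cat/gi);
console.log(sensitive);   // [ 'cat' ]
console.log(insensitive); // [ 'Cat', 'cat' ]

// Without m, ^ and $ only match at the very start and end of the whole string
const wholeString = lines.match(/^\w+$/g);
console.log(wholeString); // null

// With m, ^ and $ match at the start and end of every line
const perLine = lines.match(/^\w+$/gm);
console.log(perLine); // [ 'Cat', 'cat', 'Dog' ]
```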
Regular expressions can almost be treated as strings, but more 'relaxed': besides literal characters, they contain 'wildcards', and you can specify what each wildcard should match.
| Wildcard | Matches |
|---|---|
| `\w` | Any word character: a letter (either case), a digit, or an underscore |
| `\W` | Any non-word character |
| `\s` | Any whitespace (newlines included) |
| `\S` | Any non-whitespace character |
| `\b` | A word boundary |
| `\B` | A non-word boundary |
| `.` | Any character except a newline |
The first four should be very straightforward, but what is a word boundary? A word boundary is the position between a word character and a non-word character, so it lets you match a whole word. For example, if you wanted to match `op`, you will also match the `op` in `operator` if you don't add `\b`. Conversely, if you want to match only the `op` in `operator`, you can add a `\B`:
op\B // matches the 'op' in 'operator'
\bop\b // matches only 'op'
Most commonly, you will only use the latter (the two `\b`s).
Finally, the dot operator allows you to match any character except for a newline.
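The wildcards above can be tried out with a few one-liners. This is only a sketch, and the sample strings are invented for illustration:

```javascript
const sample = "op operator";

const anyOp = sample.match(/op/g);
const wholeWord = sample.match(/\bop\b/g);
// Record where op\B matches: only the 'op' that is followed by more word characters
const insideWord = [...sample.matchAll(/op\B/g)].map((m) => m.index);

console.log(anyOp);      // [ 'op', 'op' ]
console.log(wholeWord);  // [ 'op' ]
console.log(insideWord); // [ 3 ]

const wordChars = "a1 _b".match(/\w/g);
const spaces = "a b\nc".match(/\s/g);
const dots = "a\nb".match(/./g);

console.log(wordChars); // [ 'a', '1', '_', 'b' ]
console.log(spaces);    // [ ' ', '\n' ]
console.log(dots);      // [ 'a', 'b' ]
```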
What if you wanted to only match certain characters, or only match specific numbers? Enter sets and ranges! The syntax is a list of characters inside square brackets.
For example, if you wanted to match `a`, `b`, or `c`, you would write `[abc]`.
We call this a set.
You can also shorten that to `[a-c]`.
We call this a range.
This applies to Unicode characters too!
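As a minimal sketch, assuming the characters you care about are `a` through `c`, a set and a range behave the same way. The digit and Unicode examples are extra illustrations and not taken from the original text:

```javascript
const fromSet = "abcdef".match(/[abc]/g);
const fromRange = "abcdef".match(/[a-c]/g);
console.log(fromSet);   // [ 'a', 'b', 'c' ]
console.log(fromRange); // [ 'a', 'b', 'c' ]

// Ranges work for digits too
const digits = "x7y42".match(/[0-9]/g);
console.log(digits); // [ '7', '4', '2' ]

// ...and for Unicode characters (here U+00E0 to U+00FF, written as escapes)
const accented = "a\u00f1b".match(/[\u00e0-\u00ff]/g);
console.log(accented); // [ 'ñ' ]
```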
If you wanted to match anything but `a`, `b`, or `c`, you add a `^` right after the opening bracket. Outside of regular expressions, `^` is the XOR operator in JS, but at the start of a set it simply means 'not'.
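A short sketch of negation, again assuming the set is `a` through `c`. The last lines show that `^` is only special at the start of a set, and that it is a different thing from the XOR operator:

```javascript
const notSet = "abcdef".match(/[^abc]/g);
const notRange = "abcdef".match(/[^a-c]/g);
console.log(notSet);   // [ 'd', 'e', 'f' ]
console.log(notRange); // [ 'd', 'e', 'f' ]

// Anywhere else in the set, ^ is just a literal caret
const literalCaret = "a^b".match(/[b^]/g);
console.log(literalCaret); // [ '^', 'b' ]

// The JS operator is unrelated to the regexp meaning
const xor = 5 ^ 3;
console.log(xor); // 6
```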
An alternative to sets and ranges is the `|` operator, which matches either the expression on its left or the one on its right. The set example could also be written as `a|b|c`.
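A minimal sketch of alternation, assuming the same `a` to `c` example. Unlike a set, each side of `|` can be longer than one character:

```javascript
const singleChars = "abcdef".match(/a|b|c/g);
console.log(singleChars); // [ 'a', 'b', 'c' ]

// Each alternative can be a whole word
const animals = "cat dog bird".match(/cat|dog/g);
console.log(animals); // [ 'cat', 'dog' ]
```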
Finally, we have grouping. Before we begin, one definition you should know is a match. In our first example with the colors, the match is the color. The expression is the description of what should be matched.
Grouping is easily done with parentheses, like `(?:cd)`.
This will match `cd` as a whole, but not its characters individually.
Now, if you wanted to take that group and return it as a match, you just remove the `?:` non-capturing group delimiter, leaving `(cd)`.
Now it will return `cd` as a match. This applies to sets as well, and you can also put groups inside groups!
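Here is a sketch of the difference between the two kinds of groups. The string `"abcd"` and the variable names are assumptions chosen for illustration:

```javascript
// Non-capturing group: only the whole match is returned
const nonCapturing = "abcd".match(/ab(?:cd)/);
console.log(nonCapturing.length); // 1
console.log(nonCapturing[0]);     // abcd

// Capturing group: the group is returned after the whole match
const capturing = "abcd".match(/ab(cd)/);
console.log(capturing[1]); // cd

// Groups inside groups are numbered by their opening parenthesis
const nested = "abcd".match(/a(b(cd))/);
console.log(nested[1]); // bcd
console.log(nested[2]); // cd
```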
More than one
What if you wanted to match at least one `a` in a row? Simple! We have quantifiers to do this.
| Quantifier | Meaning |
|---|---|
| `+` | Match one or more of the preceding token. |
| `*` | Match zero or more of the preceding token. |
| `?` | Match zero or one of the preceding token. |
| `{n,m}` | Match between n and m of the preceding token. |
The first three should be pretty straightforward, but what about the last one?
The last one allows you to match a custom number of repetitions. For example, `{1,}` implements `+`: `m` can be omitted if you want no upper limit on the number of matches.
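The quantifiers can be sketched like this. The sample strings are invented for illustration:

```javascript
const oneOrMore = "caaat".match(/a+/)[0];
console.log(oneOrMore); // aaa

// * also accepts zero occurrences
const zeroOrMore = "ct".match(/ca*t/)[0];
console.log(zeroOrMore); // ct

// ? makes the preceding token optional
const optional = ["color", "colour"].map((w) => /^colou?r$/.test(w));
console.log(optional); // [ true, true ]

// {n,m}: between 2 and 3 in each match
const bounded = "aaaaa".match(/a{2,3}/g);
console.log(bounded); // [ 'aaa', 'aa' ]

// {n,}: at least 2, with no upper limit
const atLeastTwo = "aaaaa".match(/a{2,}/)[0];
console.log(atLeastTwo); // aaaaa
```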
Now, with this, we can now match function arguments! Given:
a, b, c, d
We parse it as:
Note: A capturing group that repeats only keeps its last match. Given the previous example, our matches would be:
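One way this could look in practice is sketched below. The pattern is a hypothetical one written for this illustration, not necessarily the expression the tutorial had in mind: it repeats a "name, " group zero or more times and then captures the final name.

```javascript
const args = "a, b, c, d";

// Hypothetical pattern: repeated "name, " pieces, followed by one last name
const parsed = args.match(/(?:(\w+),\s*)*(\w+)/);

console.log(parsed[0]); // a, b, c, d
console.log(parsed[1]); // c  (the repeated group only keeps its last match)
console.log(parsed[2]); // d

// To collect every argument, match one name at a time with the g flag instead
const allArgs = args.match(/\w+/g);
console.log(allArgs); // [ 'a', 'b', 'c', 'd' ]
```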
One thing you must note is that when trying to make a match like this:
you will actually also match all of this:
hello world world
which might not be what you want. This is because regular expressions are greedy by default: they try to match as much as they can. To make a quantifier match as few characters as it can, just add a `?` after it.
This will match:
**hello world** world
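A minimal sketch of greedy versus lazy matching, assuming an expression of the form `hello.*world` (the exact pattern is an assumption made for this example):

```javascript
const phrase = "hello world world";

// Greedy: .* grabs as much as possible before the last 'world'
const greedy = phrase.match(/hello.*world/)[0];
console.log(greedy); // hello world world

// Lazy: .*? stops at the first 'world' it can reach
const lazy = phrase.match(/hello.*?world/)[0];
console.log(lazy); // hello world
```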
What's the point of learning anything if you aren't going to use it? Here, I will introduce you to many common operations that you will often encounter.
`test` checks whether your regular expression matches a given string.
/hello/.test("hello") // true
/hello/.test("hi") // false
`match` is the most common of all operations. You can think of its result as an array, but it does give you a bit more information, such as the index of the match and the input string.
"she's broken".match(/s(he's) br(ok)en/) // [ "she's broken", "he's", "ok" ]
Index 0 holds the whole match, and the following indexes hold the capture groups in order.
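The extra information can be inspected directly. This sketch reuses the example above; the variable names are just for illustration:

```javascript
const result = "she's broken".match(/s(he's) br(ok)en/);

console.log(result.length); // 3
console.log(result.index);  // 0  (where the match starts)
console.log(result.input);  // she's broken

// No match gives null, so check before indexing
const miss = "no match here".match(/xyz/);
console.log(miss); // null

// With the g flag, match returns only the whole matches, without the groups
const wholeOnly = "she's brokenshe's broken".match(/s(he's) br(ok)en/g);
console.log(wholeOnly); // [ "she's broken", "she's broken" ]
```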
`replace` returns a new string with your replacements. In the replacement string, `$1`, `$2`, and so on stand for the capture groups.
"she's broken".replace(/s(he's) br(ok)en/, "$1 $2"); // he's ok
You can read more about replace here
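A few more `replace` calls, as a sketch with made-up strings. Note how the `g` flag decides whether one or all matches get replaced:

```javascript
// Without g, only the first match is replaced
const first = "a-b-c".replace(/-/, "+");
console.log(first); // a+b-c

// With g, every match is replaced
const every = "a-b-c".replace(/-/g, "+");
console.log(every); // a+b+c

// Capture groups can be reordered in the replacement
const swapped = "John Smith".replace(/(\w+) (\w+)/, "$2, $1");
console.log(swapped); // Smith, John
```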
With the `g` flag enabled, `matchAll` will capture all the matches and return an iterator over them, which you can spread into an array of the matches.
[..."she's brokenshe's broken".matchAll(/s(he's) br(ok)en/g)] // [ [ "she's broken", "he's", "ok" ], [ "she's broken", "he's", "ok" ] ]
Testing your regex and debugging it is hard without a tool. The tool I use the most is RegExr, which allows you to test out expressions and also explains them. You can also save expressions, and avoid reinventing the wheel by seeing what others have done.
Hopefully by the end of this tutorial, you are able to match just about anything!