Regular Expressions: Describe Anything!
Coder100 (12821)

One thing that has confused me for no reason whatsoever is regular expressions, so hopefully this tutorial will help you learn some basics about them.

Matching

What even is a RegExp (regular expression)? Well, regular expressions are a 'language' of their own, and they let you match parts of a string. For example, suppose you wanted to match the color in each of these sentences:

Bob has a blue shirt.
Joe has a green shirt.
Amy has an orange shirt.
Coder100 has a brown shirt.

A suitable regular expression would be:

/\w+ has an? (\w+) shirt\./
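To see the expression in action, here is a minimal JavaScript sketch (the array and variable names are just for illustration). It escapes the final dot as `\.` so that it matches only a literal period, and it reads the color from index 1 of each match, which holds the text captured by `(\w+)`.

```javascript
// The colors example, run with String.prototype.match
const colorRe = /\w+ has an? (\w+) shirt\./;
const sentences = [
  "Bob has a blue shirt.",
  "Joe has a green shirt.",
  "Amy has an orange shirt.",
  "Coder100 has a brown shirt.",
];
// Index 1 of each match is the text captured by (\w+), i.e. the color
const colors = sentences.map((s) => s.match(colorRe)[1]);
console.log(colors); // [ 'blue', 'green', 'orange', 'brown' ]
```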

Creating a regular expression

Creating regular expressions is simple. There are two ways to do it. The first is to use a literal:

/... your regexp .../... your flags ...

or to call the RegExp constructor:

new RegExp("...your regexp...", "... your flags ...");
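A quick sketch comparing the two forms; the pattern `hello` and the variable names are only illustrations. Both produce the same `source` and `flags`. The constructor takes a string, which makes it handy for building a pattern at runtime, but backslashes must then be doubled inside the string literal.

```javascript
// Both forms build the same expression: pattern "hello", flags g and i
const literal = /hello/gi;
const constructed = new RegExp("hello", "gi");
console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true

// Building a pattern from a variable; "\\b" in a string becomes \b in the regex
const word = "shirt";
const dynamic = new RegExp("\\b" + word + "\\b");
console.log(dynamic.source);                 // \bshirt\b
console.log(dynamic.test("a blue shirt.")); // true
```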

Flags are like 'configuration options'. The most common is the g (global) flag, which finds all the matches instead of stopping at the first one. Here are the flags you will most likely ever use:

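| flag | description |
| --- | --- |
| g | Global: find all the matches, not just the first one. |
| i | The match is case-insensitive. |
| m | Multiline: ^ and $ match at the start and end of each line. |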
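A small sketch of how g and i change what `match` returns (the sample string is made up for illustration). Without g you get only the first match; with g you get every match; adding i also ignores case.

```javascript
const animals = "Cat cat CAT";
const firstOnly = animals.match(/cat/)[0]; // no flags: first match, case-sensitive
const allExact = animals.match(/cat/g);    // g: every match, still case-sensitive
const allAnyCase = animals.match(/cat/gi); // g + i: every match, ignoring case
console.log(firstOnly);  // cat
console.log(allExact);   // [ 'cat' ]
console.log(allAnyCase); // [ 'Cat', 'cat', 'CAT' ]
```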

What does the m flag do? Normally, ^ matches only at the start of the string (index 0) and $ only at its end (the last index). With the m flag, they also match at the start and end of every line, that is, right after or right before a newline.
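A minimal sketch of the m flag, using ^ (the "start" anchor) on a made-up two-line string. Without m, ^ only matches at the very beginning, so only the first line's word is found; with m, it also matches right after the newline.

```javascript
const twoLines = "first line\nsecond line";
// Without m, ^ only matches at index 0
const withoutM = twoLines.match(/^\w+/g);
// With m, ^ also matches right after each newline
const withM = twoLines.match(/^\w+/gm);
console.log(withoutM); // [ 'first' ]
console.log(withM);    // [ 'first', 'second' ]
```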

Syntax

In their simplest form, regular expressions can be treated almost like strings: ordinary characters just match themselves. For example:

/hello, world!/

will match hello, world!.
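A short sketch of plain-text matching (the test strings are illustrative). The pattern can match anywhere inside a longer string and is case-sensitive by default. As an aside not covered above, characters with special meaning, such as the dot, need a backslash to be matched literally.

```javascript
const greeting = /hello, world!/;
console.log(greeting.test("I said hello, world! twice")); // true: matches anywhere in the string
console.log(greeting.test("Hello, World!"));             // false: case-sensitive without the i flag

// An unescaped . matches any character; \. matches only a literal period
console.log(/1.5/.test("1x5"));  // true
console.log(/1\.5/.test("1x5")); // false
console.log(/1\.5/.test("1.5")); // true
```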

Escapes

Regular expressions also have 'wildcards': escape sequences that match a whole class of characters instead of one specific character.

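| expression | description |
| --- | --- |
| \w | Any word character: a letter (either case), a digit, or an underscore |
| \W | Any character that is not a word character |
| \s | Any whitespace character (newlines included) |
| \S | Any non-whitespace character |
| \b | A word boundary |
| \B | A position that is not a word boundary |
| \n | A newline |
| . | Any character except a newline |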

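The first four should be very straightforward, but what is a word boundary? It is the position between a word character and a non-word character (or the start or end of the string); it matches a position, not a character, and it lets you match whole words. For example, if you search for op without \b, you will also match the op in operator. Conversely, \B matches only where there is no word boundary, so to match the op inside a longer word such as operator, you can add a \B: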

op\B // matches the 'op' in 'operator', but not a standalone 'op'
\bop\b // matches 'op' only as a whole word

Most commonly, you will only use the second form (\b on both sides).
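A minimal sketch on a made-up string, using `search` (which returns the index of the first match, or -1) to show where each pattern lands.

```javascript
const code = "op operator shop";
// Without \b, every "op" matches, even inside other words
const plainCount = code.match(/op/g).length;
// \bop\b only matches "op" as a whole word
const wholeWordAt = code.search(/\bop\b/);
// op\B only matches "op" followed by another word character, as in "operator"
const insideWordAt = code.search(/op\B/);
console.log(plainCount);   // 3
console.log(wholeWordAt);  // 0
console.log(insideWordAt); // 3
```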

Finally, the dot operator allows you to match any character except for a newline.
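A few quick checks of the dot and two of the escapes from the table (all sample strings are illustrative). Note that \w treats the underscore as a word character, and that \s covers tabs and newlines as well as spaces.

```javascript
console.log(/a.c/.test("abc"));  // true
console.log(/a.c/.test("a-c"));  // true
console.log(/a.c/.test("a\nc")); // false: . does not match a newline

// \w counts letters, digits and the underscore as word characters
const words = "snake_case-name 42".match(/\w+/g);
console.log(words); // [ 'snake_case', 'name', '42' ]

// \s matches spaces, tabs and newlines alike
const parts = "one two\tthree\nfour".split(/\s/);
console.log(parts); // [ 'one', 'two', 'three', 'four' ]
```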

Set

What if you wanted to match only certain characters, such as specific letters or digits? Enter sets and ranges! The syntax is like this:

[...chars...]

For example, if you wanted to match a or b or c, you would do:

[abc]

We call this a set.
You can also shorten that to:

[a-c]

We call this a range.
This works for Unicode characters too, since a range follows character code order.
If you want to match anything except a, b, and c, add a ^ right after the opening bracket. A ^ in that position negates the set (it has nothing to do with the XOR operator in JS):

[^a-c]
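A small sketch comparing a set, a range, and a negated range on the same illustrative string. The last line combines a range with the + quantifier (covered in "More than one") to grab runs of digits.

```javascript
const letters = "abcdef";
const fromSet = letters.match(/[abc]/g);      // set
const fromRange = letters.match(/[a-c]/g);    // equivalent range
const negated = letters.match(/[^a-c]/g);     // anything but a, b, c
const numbers = "2024-01-15".match(/[0-9]+/g); // runs of digits
console.log(fromSet);   // [ 'a', 'b', 'c' ]
console.log(fromRange); // [ 'a', 'b', 'c' ]
console.log(negated);   // [ 'd', 'e', 'f' ]
console.log(numbers);   // [ '2024', '01', '15' ]
```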

| operator

An alternative to sets is the | (alternation) operator, which matches either the expression on its left or the one on its right. The set [abc] could also be written as:

a|b|c
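A quick sketch showing that `a|b|c` finds the same characters as `[abc]`, and that, unlike a set, each alternative can be longer than one character (the strings are illustrative).

```javascript
const singleChars = "abcdef".match(/a|b|c/g);
// Unlike a set, each alternative may be a whole word
const wholeWords = "cat dog bird".match(/cat|bird/g);
console.log(singleChars); // [ 'a', 'b', 'c' ]
console.log(wholeWords);  // [ 'cat', 'bird' ]
```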

Grouping

Next, we have grouping. Before we begin, you should know what a match is. In our first example, the match is the color we were looking for; the expression is the description of what should be matched.

Grouping is done with parentheses:

(?:ab)|(?:cd)

This matches ab or cd as a unit, but not the pieces individually (like a or d).

If you also want each group's text returned as part of the match (a capturing group), just remove the ?: that marks it as non-capturing:

(ab)|(cd)

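Now the matched ab or cd is also returned as a captured group. You can also put groups inside groups!

(?:(ab)|(cd))

Groups do not work inside a set, though: in [(ab)(cd)], the parentheses are ordinary characters, so it matches a single character that is one of (, a, b, ), c, or d.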
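A minimal sketch contrasting non-capturing and capturing groups on an illustrative string. Spreading the match into a plain array shows its numbered slots: a capturing group that did not take part in the match shows up as `undefined`.

```javascript
// Non-capturing: only the whole match (index 0) is returned
const plain = [..."xxcd".match(/(?:ab)|(?:cd)/)];
// Capturing: each group gets its own slot
const captured = [..."xxcd".match(/(ab)|(cd)/)];
console.log(plain);    // [ 'cd' ]
console.log(captured); // [ 'cd', undefined, 'cd' ]

// Inside [ ], parentheses are ordinary characters
console.log(/[(ab)(cd)]/.test("(")); // true
```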

More than one

What if you wanted to match one or more a characters in a row? Simple! Quantifiers do exactly this.

| symbol | description |
| --- | --- |
| + | Match one or more of the preceding token. |
| * | Match zero or more of the preceding token. |
| ? | Match zero or one of the preceding token. |
| {n,m} | Match between n and m (inclusive) of the preceding token. |

The first three should be pretty straightforward, but what about the last one?
The last one lets you match a custom number of repetitions. For example, this is equivalent to *:

{0,}

This is equivalent to +:

{1,}

And this is equivalent to ?:

{0,1}

If you leave out m but keep the comma (as in {0,} and {1,}), there is no upper limit.
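A few illustrative checks of each quantifier. Note that `a{1,2}` on a run of three a's is greedy and takes two of them.

```javascript
const word = "baaad";
const oneOrMore = word.match(/a+/)[0];    // as many a's as possible
const upToTwo = word.match(/a{1,2}/)[0];  // at most two a's
console.log(oneOrMore);              // aaa
console.log(upToTwo);                // aa
console.log(/ba*d/.test("bd"));      // true: * allows zero a's
console.log(/ba+d/.test("bd"));      // false: + needs at least one a
console.log(/colou?r/.test("color"));  // true: ? makes the u optional
console.log(/colou?r/.test("colour")); // true
```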

With this, we can now match function arguments! Given:

a, b, c, d

we can match it with:

(\w+)(?:, (\w+))*

Note: when a capturing group is repeated, only its last repetition is kept. Given the previous example, the captured groups would be a and d; b and c are lost.
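A small sketch confirming that only the last repetition of the group survives, plus one possible workaround (matching a single argument repeatedly with the g flag); the variable names are illustrative.

```javascript
const argList = "a, b, c, d";
const argMatch = [...argList.match(/(\w+)(?:, (\w+))*/)];
console.log(argMatch); // [ 'a, b, c, d', 'a', 'd' ]

// To get every argument, match one argument at a time with the g flag
const allArgs = argList.match(/\w+/g);
console.log(allArgs); // [ 'a', 'b', 'c', 'd' ]
```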

Laziness

One thing you must note is that when trying to make a match like this:

.+ world

you will actually also match all of this:

hello world world

which might not be what you want. This happens because quantifiers are greedy by default: they try to match as much as they can. To make a quantifier match as few characters as possible (lazy), add a ? right after it:

.+? world

this will match:

**hello world** world
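A minimal sketch of the greedy and lazy versions side by side on the same string.

```javascript
const phrase = "hello world world";
const greedy = phrase.match(/.+ world/)[0]; // .+ grabs as much as possible
const lazy = phrase.match(/.+? world/)[0];  // .+? stops at the first " world"
console.log(greedy); // hello world world
console.log(lazy);   // hello world
```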

JavaScript operations

What's the point of learning anything if you aren't going to use it? Here are the common operations you will encounter most often.

.test

This tests whether your regular expression matches anywhere in a given string, returning true or false.

/hello/.test("hello") // true
/hello/.test("hi") // false
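One caveat worth knowing, sketched below: with the g flag, `test` starts searching from the regex's `lastIndex` property and updates it, so calling it twice on the same string can give different answers. Without g, each call is independent.

```javascript
// Without the g flag, test has no memory between calls
const plainHello = /hello/;
const plainFirst = plainHello.test("hello");  // true
const plainSecond = plainHello.test("hello"); // true

// With the g flag, test starts at lastIndex and updates it
const globalHello = /hello/g;
const globalFirst = globalHello.test("hello");  // true, lastIndex becomes 5
const globalSecond = globalHello.test("hello"); // false, search started at index 5
console.log(plainFirst, plainSecond);   // true true
console.log(globalFirst, globalSecond); // true false
console.log(globalHello.lastIndex);     // 0, reset after the failed search
```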

.match

This is the most common operation of all. You can think of it as returning an array, but that array also carries a bit more information, such as the index where the match starts. If nothing matches, it returns null.

"she's broken".match(/s(he's) br(ok)en/) // [ "she's broken", "he's", "ok" ]

Index 0 holds the whole match, and the following indices hold the capturing groups, in order.
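A short sketch of the extra details on the result of `match` (the second and third strings are made up for illustration). Note that with the g flag, `match` returns only the full matches and drops the capturing groups.

```javascript
const brokenRe = /s(he's) br(ok)en/;
const result = "she's broken".match(brokenRe);
console.log(result[0]);    // she's broken
console.log(result[1]);    // he's
console.log(result.index); // 0, where the match starts

// If nothing matches, match returns null rather than an empty array
const noMatch = "no match here".match(brokenRe);
console.log(noMatch); // null

// With the g flag, match returns every full match but no groups
const rhymes = "cat bat rat".match(/(\w)at/g);
console.log(rhymes); // [ 'cat', 'bat', 'rat' ]
```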

.replace

This returns a new string with your replacements.

"she's broken".replace(/s(he's) br(ok)en/, "$1 $2"); // he's ok

You can read more about replace here
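A minimal sketch of a few common `replace` patterns, with illustrative strings. Without g only the first match is replaced; `$1`, `$2`, … insert capturing groups; and a function can compute each replacement from the matched text.

```javascript
const shirts = "Bob has a blue shirt. Joe has a green shirt.";
const once = shirts.replace(/blue|green/, "red"); // first match only
const all = shirts.replace(/blue|green/g, "red"); // every match
// $1, $2, $3 refer to the capturing groups, so parts can be rearranged
const swapped = "2024-01-15".replace(/([0-9]+)-([0-9]+)-([0-9]+)/, "$3/$2/$1");
// A replacer function receives the matched text and returns its replacement
const upper = "a b c".replace(/\w/g, (ch) => ch.toUpperCase());
console.log(once);    // Bob has a red shirt. Joe has a green shirt.
console.log(all);     // Bob has a red shirt. Joe has a red shirt.
console.log(swapped); // 15/01/2024
console.log(upper);   // A B C
```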

.matchAll

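matchAll needs the g flag (without it, it throws a TypeError). It returns an iterator over all the matches, each one including its capturing groups; spreading it into an array gives you all of them at once: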

[..."she's brokenshe's broken".matchAll(/s(he's) br(ok)en/g)] // [ [ "she's broken", "he's", "ok" ], [ "she's broken", "he's", "ok" ] ]
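Instead of spreading, you can also loop over the iterator directly. Here is a sketch reusing the shirt sentences from the start (the output format is just an illustration); each match exposes its groups and its `index`.

```javascript
const shirtText = "Bob has a blue shirt. Amy has an orange shirt.";
const lines = [];
for (const m of shirtText.matchAll(/(\w+) has an? (\w+) shirt\./g)) {
  // m[1] is the name, m[2] is the color, m.index is where the match starts
  lines.push(`${m[1]}: ${m[2]} (at index ${m.index})`);
}
console.log(lines.join("\n"));
// Bob: blue (at index 0)
// Amy: orange (at index 22)

// Without the g flag, matchAll throws a TypeError
let threwTypeError = false;
try {
  shirtText.matchAll(/shirt/);
} catch (e) {
  threwTypeError = e instanceof TypeError;
}
console.log(threwTypeError); // true
```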

regexr

Testing and debugging a regex is hard without a tool. The one I use most is regexr, which lets you test out expressions and also explains them. You can also save expressions, and avoid reinventing the wheel by seeing what others have done.
https://regexr.com/
Happy debugging!

Conclusion

Hopefully by the end of this tutorial, you are able to match just about anything!
