Regular Expressions: Describe Anything!
Regular expressions have always confused me for no good reason, so hopefully this tutorial will help you learn some of the basics.
Matching
What even is RegExp (short for Regular Expressions)? Regular expressions are a 'language' of their own, and they allow you to match parts of a string. For example, if you wanted to match the color in each of these sentences:
Bob has a blue shirt.
Joe has a green shirt.
Amy has an orange shirt.
Coder100 has a brown shirt.
A suitable regular expression would be:
/\w+ has an? (\w+) shirt\./
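To see that pattern in action, here is a small sketch that runs it over the four sentences. The names sentences, shirtPattern and colors are made up for this example, and the final period is written as \. so that it only matches a literal dot.

```javascript
const sentences = [
  "Bob has a blue shirt.",
  "Joe has a green shirt.",
  "Amy has an orange shirt.",
  "Coder100 has a brown shirt.",
];

// The parentheses capture the color; "an?" accepts both "a" and "an".
const shirtPattern = /\w+ has an? (\w+) shirt\./;

const colors = sentences.map((sentence) => {
  const result = sentence.match(shirtPattern);
  // result[1] holds whatever the first pair of parentheses matched.
  return result ? result[1] : null;
});

console.log(colors); // [ 'blue', 'green', 'orange', 'brown' ]
```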
Syntax
Creating regular expressions is simple. There are two ways you can approach this. The first way is to use a literal:
/... your regexp .../... your flags ...
or, to call a constructor:
new RegExp("...your regexp...", "... your flags ...");
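As a rough sketch of the two forms side by side (the pattern and the color variable are just placeholders I picked for illustration): note that the constructor takes a string, so every backslash has to be doubled.

```javascript
// Literal form: pattern between slashes, flags after the closing slash.
const literal = /\w+ shirt/gi;

// Constructor form: the pattern is a string, so each backslash is doubled.
const constructed = new RegExp("\\w+ shirt", "gi");

console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true

// The constructor is handy when part of the pattern is only known at run time.
const color = "blue"; // hypothetical input
const dynamic = new RegExp(color + " shirt", "i");
console.log(dynamic.test("Bob has a BLUE shirt.")); // true
```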
Flags are like 'configuration options'. The most common flag is the g flag, which finds every match in the string instead of stopping at the first one. This is a table of all the flags that you will probably ever use:
flag | desc |
---|---|
g | Finds all the matches instead of stopping at the first one. |
i | The match is case insensitive. |
m | Multiline: ^ and $ match at the start and end of each line. |
What does the m flag mean? With the m flag, the start anchor ^ matches at the start of each line instead of only at index 0, and the end anchor $ matches just before each newline instead of only at the last index.
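Here is a quick sketch of what each flag changes, using a made-up three-line string. The variable names are only for illustration.

```javascript
const lines = "one\ntwo\nthree";

// g: collect every match instead of only the first one.
const firstOnly = lines.match(/\w+/);   // stops at "one"
const everyWord = lines.match(/\w+/g);  // "one", "two", "three"

// i: ignore case.
const ignoresCase = /ONE/i.test(lines); // true

// m: ^ matches at the start of each line, not only at index 0.
const withoutM = lines.match(/^\w+/g);  // only "one"
const withM = lines.match(/^\w+/gm);    // "one", "two", "three"

console.log(firstOnly[0], everyWord, ignoresCase, withoutM, withM);
```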
Syntax
Regular Expressions can almost be treated as strings, but more 'relaxed'. For example:
/hello, world!/
will match the text hello, world! anywhere in a string.
Escapes
Regular expressions contain 'wildcards': escapes that stand for a whole class of characters. These are the common ones:
expression | description |
---|---|
\w | Any word character: a letter, a digit, or an underscore |
\W | Any non-word character |
\s | Any whitespace (newlines included) |
\S | Any non-whitespace character |
\b | A word boundary |
\B | Non word boundary |
\n | newline |
. | Match any character except newline. |
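The first four should be very straightforward, but what is a word boundary? A word boundary is the position between a word character and a non-word character (or the start or end of the string), so it lets you match a whole word. For example, if you wanted to match op, you will also match the op in operator if you don't add \b. Conversely, if you only want to match an op that is followed by more word characters, like the op in operator, you can add a \B: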
op\B // matches the 'op' in 'operator', but not the word 'op' on its own
\bop\b // matches only the whole word 'op'
Most commonly, you will only use the latter (the two \b).
Finally, the dot operator allows you to match any character except for a newline.
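To make the escapes a bit more concrete, here is a small sketch. The sample strings are made up, and I use .replace to mark where each pattern matched.

```javascript
const sample = "op operator stop";

// \bop\b only matches "op" as a whole word.
const wholeWord = sample.replace(/\bop\b/g, "[op]");
// op\B only matches "op" when another word character follows it.
const insideWord = sample.replace(/op\B/g, "[op]");

console.log(wholeWord);  // "[op] operator stop"
console.log(insideWord); // "op [op]erator stop"

// \w, \s and . each stand for a class of characters.
const wordChars = "a_1 !".match(/\w/g); // "a", "_", "1"
const spaces = "a b\nc".match(/\s/g);   // " ", "\n"
const dots = "a\nb".match(/./g);        // "a", "b" (the newline is skipped)

console.log(wordChars, spaces, dots);
```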
Set
What if you wanted to only match certain letters, or only match specific digits? Enter sets and ranges! The syntax is like this:
[...chars...]
For example, if you wanted to match a or b or c, you would do:
[abc]
We call this a set.
You can also shorten that to:
[a-c]
We call this a range.
This applies to unicode characters too!
If you wanted to match anything but a, b, and c, you add a ^ right after the opening bracket. Inside a set, a leading ^ negates the set, so it matches any character that is not listed. (This is unrelated to the ^ XOR operator in JS.)
[^a-c]
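A short sketch of sets, ranges and negated sets on a made-up word (the variable names are just for this example):

```javascript
const word = "cabbage";

// A set and the equivalent range match the same characters.
const fromSet = word.match(/[abc]/g).join("");
const fromRange = word.match(/[a-c]/g).join("");

// A leading ^ inside the brackets matches everything NOT in the set.
const fromNegated = word.match(/[^a-c]/g).join("");

console.log(fromSet);     // "cabba"
console.log(fromRange);   // "cabba"
console.log(fromNegated); // "ge"
```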
| operator
An alternative to sets is the | operator, which means 'or'. The earlier [abc] set could also be written as:
a|b|c
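For single characters a set is usually shorter, but | also works with longer alternatives. A minimal sketch, with sample strings I made up:

```javascript
// For single characters, a|b|c and [abc] find the same matches.
const viaPipe = "abcd".match(/a|b|c/g);
const viaSet = "abcd".match(/[abc]/g);
console.log(viaPipe); // [ 'a', 'b', 'c' ]
console.log(viaSet);  // [ 'a', 'b', 'c' ]

// Unlike a set, | can choose between whole words.
const pets = "I have a cat, a dog and a bird.".match(/cat|dog/g);
console.log(pets); // [ 'cat', 'dog' ]
```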
Grouping
Finally, we have grouping. Before we begin, one term you should know is a match. In our first example of colors, the match is the color, and the expression is the description of what should be matched.
Grouping is easily done like:
(?:ab)|(?:cd)
Now, this will match ab or cd, but not the individual characters (like a or d).
Now, if you wanted to take that group and return it as a match, you just remove the ?: non-capturing group marker.
(ab)|(cd)
Now it will return ab or cd as a match. You can also put groups inside groups:
(?:(ab)|(cd))
Groups do not work inside sets, though. There, parentheses are ordinary characters, so this matches a single (, a, b, ), c, or d:
[(ab)(cd)]
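Here is a small sketch of the difference between capturing and non-capturing groups, and of what parentheses do inside a set. The input string is made up for the example.

```javascript
const input = "xxcdxx";

// Capturing groups: each group gets its own slot in the result.
const captured = input.match(/(ab)|(cd)/);
console.log(captured[0]); // "cd"
console.log(captured[1]); // undefined, because (ab) did not take part
console.log(captured[2]); // "cd"

// Non-capturing groups: only the whole match is returned.
const uncaptured = input.match(/(?:ab)|(?:cd)/);
console.log(uncaptured.length); // 1

// Inside a set, parentheses are literal characters, not groups.
const literalParen = "(".match(/[(ab)(cd)]/);
console.log(literalParen[0]); // "("
```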
More than one
What if you wanted to match at least one a in a row? Simple! We have quantifiers to do this.
symbol | desc |
---|---|
+ | match one or more of the preceding token. |
* | match zero or more of the preceding token. |
? | match zero or one of the preceding token. |
{n,m} | match between n and m of the preceding token. |
The first three should be pretty straightforward, but what about the last one?
The last one allows you to match a custom amount of them. For example, each of the other quantifiers can be written with it:
* is the same as {0,}
+ is the same as {1,}
? is the same as {0,1}
m can be omitted if you want no upper limit on the number of matches.
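A few quick checks of each quantifier, as a sketch. The test strings are ones I picked for illustration.

```javascript
// + needs at least one "a"; * is happy with none.
const plusMatch = "caaat".match(/a+/)[0]; // "aaa"
const starOnEmpty = /ca*t/.test("ct");    // true, zero a's is fine
const plusOnEmpty = /ca+t/.test("ct");    // false, + needs at least one

// ? makes the preceding token optional.
const american = /colou?r/.test("color");  // true
const british = /colou?r/.test("colour");  // true

// {n,m} sets explicit limits; leaving out m removes the upper limit.
const bounded = "aaaa".match(/a{2,3}/)[0];  // "aaa"
const unbounded = "aaaa".match(/a{2,}/)[0]; // "aaaa"

console.log(plusMatch, starOnEmpty, plusOnEmpty, american, british, bounded, unbounded);
```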
With this, we can now match function arguments! Given:
a, b, c, d
We parse it as:
(\w+)(?:, (\w+))*
Note: A capturing group inside a repeated group only keeps its last match. Given the previous example, our captures would be a and d.
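The sketch below shows that behavior, plus one possible way to get every argument when you need all of them: a global match on \w+. That second approach is my own suggestion, not the only option.

```javascript
const args = "a, b, c, d";

// The repeated group overwrites its capture on every repetition.
const repeated = args.match(/(\w+)(?:, (\w+))*/);
console.log(repeated[0]); // "a, b, c, d"
console.log(repeated[1]); // "a"
console.log(repeated[2]); // "d"

// One way to collect every argument: match each word with the g flag.
const allArgs = args.match(/\w+/g);
console.log(allArgs); // [ 'a', 'b', 'c', 'd' ]
```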
Laziness
One thing you must note is that when trying to make a match like this:
.+ world
you will actually also match all of this:
hello world world
which might not be what you want. This is because regular expressions are greedy by default. This means they will try to match as much as they can. To make it match as few characters as it can, just add a ? after the quantifier.
.+? world
This will match only the bold part:
**hello world** world
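Side by side, as a quick sketch (variable names are made up):

```javascript
const phrase = "hello world world";

// Greedy: .+ grabs as much as it can while still leaving " world" at the end.
const greedy = phrase.match(/.+ world/)[0];
// Lazy: .+? stops at the first place where " world" can follow.
const lazy = phrase.match(/.+? world/)[0];

console.log(greedy); // "hello world world"
console.log(lazy);   // "hello world"
```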
JavaScript operations
What's the point of learning anything if you aren't going to use it? Here, I will introduce you to the common operations that you will often encounter.
.test
This will test if your regular expression matches a given string.
/hello/.test("hello") // true
/hello/.test("hi") // false
.match
This is the most common of all operations. You can think of its result as an array, but it also carries some extra information, such as the index where the match starts.
"she's broken".match(/s(he's) br(ok)en/) // [ "she's broken", "he's", "ok" ]
Index zero holds the whole match, and the remaining entries are the capturing groups in numbered order.
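A small sketch of what .match gives back in a few situations. The second and third input strings are made up for the example.

```javascript
// Without the g flag: whole match, then the groups, plus extra properties.
const single = "she's broken".match(/s(he's) br(ok)en/);
console.log(single[0]);    // "she's broken"
console.log(single[1]);    // "he's"
console.log(single.index); // 0

// With the g flag, .match returns only the whole matches, without groups.
const every = "she's broken, she's broken".match(/s(he's) br(ok)en/g);
console.log(every.length); // 2

// When nothing matches, the result is null rather than an empty array.
const nothing = "all fine".match(/s(he's) br(ok)en/);
console.log(nothing); // null
```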
.replace
This returns a new string with your replacements.
"she's broken".replace(/s(he's) br(ok)en/, "$1 $2"); // he's ok
You can read more about replace here
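Two more things about .replace that are worth a quick sketch: without the g flag only the first match is replaced, and the replacement can also be a function that receives the whole match and the groups. The strings and names here are made up.

```javascript
const dashed = "a-b-c";

// Without g, only the first match is replaced.
const firstDash = dashed.replace(/-/, "+");
// With g, every match is replaced.
const everyDash = dashed.replace(/-/g, "+");

console.log(firstDash); // "a+b-c"
console.log(everyDash); // "a+b+c"

// A function replacement gets the whole match first, then each group.
const shouted = "she's broken".replace(
  /s(he's) br(ok)en/,
  (whole, first, second) => `${first.toUpperCase()} ${second}`
);
console.log(shouted); // "HE'S ok"
```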
.matchAll
With the g flag enabled (it is required here), it will capture all the matches, groups included, and return an iterator over them, which you can spread into an array.
[..."she's brokenshe's broken".matchAll(/s(he's) br(ok)en/g)] // [ [ "she's broken", "he's", "ok" ], [ "she's broken", "he's", "ok" ] ]
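Since the result is an iterator, you can also loop over it directly. A minimal sketch, with variable names I made up; the last part shows that leaving off the g flag throws a TypeError.

```javascript
const doubled = "she's brokenshe's broken";

// Loop over the matches one at a time; each one has groups and an index.
const positions = [];
for (const hit of doubled.matchAll(/s(he's) br(ok)en/g)) {
  console.log(hit.index, hit[1], hit[2]);
  positions.push(hit.index);
}
console.log(positions); // [ 0, 12 ]

// Without the g flag, matchAll refuses to run.
let threw = false;
try {
  "abc".matchAll(/b/);
} catch (error) {
  threw = error instanceof TypeError;
}
console.log(threw); // true
```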
regexr
Testing your regex and debugging it is hard without a tool. The tool I use the most is regexr, which allows you to test out expressions and also explains them. You can also save expressions, and avoid reinventing the wheel by seeing what others have done.
https://regexr.com/
Happy debugging!
Conclusion
Hopefully by the end of this tutorial, you are able to match just about anything!
Wow!
I'm still confused as heck as to how you use regex!
Nice job putting together this tutorial tho :)
You might also want to mention capture groups, which allow you to capture the matched values, and allow you to make things like your Python interpreter:
var evaluate = new Function(`try{var doc = document.getElementById("repl").contentDocument||document.getElementById("repl").contentWindow.document;doc.body.innerHTML="Python interpreter by @Coder100<br><br>";doc.body.style.color="#fff";doc.body.style.fontFamily="'Roboto Mono',monospace";function flog(){console.log(Array.from(arguments).join(" "));doc.body.innerHTML+=Array.from(arguments).join(" ")+"<br>";}${boilerplate
.replace(/"/g,"'")
.replace(/print/g, "flog")
.replace(/#/g,"//")
.replace(/input/g,"prompt")
.replace(/def +(.+)\((.*)\): *((?:\n +(?:.+))+)/g,"function $1($2){$3}")
.replace(/lambda +(.+) +: +(.+)$/gm, 'new Function("function($1){return $2}")')
.replace(/("""|''')/g,"`")
.replace(/pass/g, "return;")
.replace(/</g,"<")
}}catch(e){console.log(e)}`);
evaluate();
I think I did add capturing groups, maybe I forgot? lol @programmeruser
the fact is they should be working @realTronsi
symbol | desc |
---|---|
+ | match one or more of the preceding token. |
@Coder100 wow they suddenly work now wtf
lol idk @realTronsi
Very cool! Seems like you're making lots of tutorials, Nice!
yeah, ig i am, thanks! @JBYT27
coder100 | coderbot100 |
---|---|
is not a bot | is a bot |
@Coder100 how tF
wait nvm thats an image
(I hope)
i dont think it is. im pretty sure its a table made with characters on keyboard... LOL @TsunamiOrSumth
Yes |
---|
t o t a l l y |
lol what i was testing out tables for some reason they aren't working @JBYT27
oh, ok. lol @Coder100
@TsunamiOrSumth https://www.tablesgenerator.com/markdown_tables
Operator | Name | Description | Example |
---|---|---|---|
+ | Addition | Adds together two values | x + y |
- | Subtraction | Subtracts one value from another | x - y |
* | Multiplication | Multiplies two values | x * y |
/ | Division | Divides one value by another | x / y |
% | Modulus | Returns the division remainder | x % y |
++ | Increment | Increases the value of a variable by 1 | ++x |
-- | Decrement | Decreases the value of a variable by 1 | --x |
Also, why are all of your comments being upvoted?
@programmeruser idk but its not me, probably just some rando
oh no coderbot100's goal is to get as many cycles for himself as possible, I don't use him other than to mark him as correct answer @programmeruser
Nice job! XOR is not != tho
equivalent as in:
1 ^ 0 == 1
1 != 0 == 1
@CursorsDev
true != 1 tho /s
@CursorsDev hmm
fixed