In this lesson, we'll look at character classes.
A character class is a special notation that matches any single character from a given set.
Let's look at a simple example of how character classes work. Suppose we only need to find letters of the alphabet. A character class is written in square brackets; for the lowercase English alphabet, that's [a-z]. All the lowercase letters in the string are found:
/[a-z]/
java 11_34-1938 tab
new line
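Here is a minimal sketch for trying this outside the interactive tester. It uses Python's `re` module, which is our choice of language because the lesson doesn't prescribe one. It assumes the demo text is `java 11_34-1938 tab`, then a line break, then `new line`. `re.findall` returns every match, much like the tester's highlighting.

```python
import re

# Assumed demo text: a space before "tab" and a line break before "new line".
text = "java 11_34-1938 tab\nnew line"

# [a-z] matches a single lowercase Latin letter; findall collects every match.
letters = re.findall(r"[a-z]", text)
print(letters)
# ['j', 'a', 'v', 'a', 't', 'a', 'b', 'n', 'e', 'w', 'l', 'i', 'n', 'e']

# Ranges are case-sensitive: the capital J is not in [a-z].
print(re.findall(r"[a-z]", "Java"))  # ['a', 'v', 'a']
```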
You can search for digits from 0 to 9 in the same way:
/[0-9]/
java 11_34-1938 tab
new line
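The same search can be sketched in Python, assuming the demo text is `java 11_34-1938 tab`, a line break, then `new line`. A class matches one character at a time, so 1938 produces four separate matches.

```python
import re

# Assumed demo text, reconstructed from the example above.
text = "java 11_34-1938 tab\nnew line"

# [0-9] matches exactly one digit per match.
digits = re.findall(r"[0-9]", text)
print(digits)           # ['1', '1', '3', '4', '1', '9', '3', '8']
print("".join(digits))  # 11341938
```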
In this example, we list just two characters, a and j, and every occurrence of either one is found:
/[aj]/
java 11_34-1938 tab
new line
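Here is a Python sketch of the same search. It again assumes the demo text is `java 11_34-1938 tab`, a line break, then `new line`. It uses `re.finditer` so that each match also reports its position in the string.

```python
import re

# Assumed demo text, reconstructed from the example above.
text = "java 11_34-1938 tab\nnew line"

# [aj] matches a single character that is either "a" or "j".
found = [(m.start(), m.group()) for m in re.finditer(r"[aj]", text)]
print(found)  # [(0, 'j'), (1, 'a'), (3, 'a'), (17, 'a')]
```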
Character classes also support negation. If we put ^ right after the opening square bracket, the search is inverted: every character is found except those listed after ^:
/[^aj]/
java 11_34-1938 tab
new line
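Here is a Python sketch of negation, assuming the demo text is `java 11_34-1938 tab`, a line break, then `new line`. Joining the matches shows what remains once a and j are excluded, and the line break is matched as well. The last line is an extra check, not from the lesson. It shows that ^ only negates when it comes first in the class; anywhere else it is an ordinary character.

```python
import re

# Assumed demo text, reconstructed from the example above.
text = "java 11_34-1938 tab\nnew line"

# [^aj] matches any single character except "a" and "j".
others = re.findall(r"[^aj]", text)
print(repr("".join(others)))  # 'v 11_34-1938 tb\nnew line'
print("\n" in others)         # True: the line break is matched too

# A ^ that is not the first character in the class is just a literal ^.
print(re.findall(r"[a^]", "a^b"))  # ['a', '^']
```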
If you need to find a hyphen as well as letters, put it at the very beginning or the very end of the class. There it isn't treated as a special character, that is, as the range operator used in [a-z]:
/[aj-]/
java 11_34-1938 tab
new line
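Here is a Python sketch that contrasts where the hyphen goes, assuming the demo text is `java 11_34-1938 tab`, a line break, then `new line`. At either edge of the class the hyphen is literal. Between two characters it forms a range, so [a-j] means "any letter from a to j" and never matches the hyphen itself.

```python
import re

# Assumed demo text, reconstructed from the example above.
text = "java 11_34-1938 tab\nnew line"

# Hyphen at the end or at the start: a literal "-".
print(re.findall(r"[aj-]", text))  # ['j', 'a', 'a', '-', 'a']
print(re.findall(r"[-aj]", text))  # ['j', 'a', 'a', '-', 'a']

# Hyphen in the middle: a range from "a" to "j".
print(re.findall(r"[a-j]", text))  # ['j', 'a', 'a', 'a', 'b', 'e', 'i', 'e']
```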
Regular expressions often use special predefined character classes. Each one is written as a backslash \ followed by a letter. In the last lesson we used \ as an escape character; here it is part of the class notation instead. Let's find all the digits in the text using \d:
/\d/
java 11_34-1938 tab
new line
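Here is a Python sketch of \d, assuming the demo text is `java 11_34-1938 tab`, a line break, then `new line`. On this text \d finds exactly what [0-9] finds. The last two lines show a Python-specific caveat. For str patterns, Python's \d also accepts other Unicode decimal digits, and the `re.ASCII` flag restricts it to 0-9.

```python
import re

# Assumed demo text, reconstructed from the example above.
text = "java 11_34-1938 tab\nnew line"

digits = re.findall(r"\d", text)
print(digits)  # ['1', '1', '3', '4', '1', '9', '3', '8']

# On this ASCII text, \d and [0-9] give the same matches.
print(digits == re.findall(r"[0-9]", text))  # True

# Python caveat: U+0663 (ARABIC-INDIC DIGIT THREE) counts as \d by default.
print(len(re.findall(r"\d", "\u0663")))            # 1
print(len(re.findall(r"\d", "\u0663", re.ASCII)))  # 0
```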
If we use an uppercase \D instead, the search finds every character that is not a digit, including whitespace such as spaces, tabs and line breaks:
/\D/
java 11_34-1938 tab
new line
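Here is a Python sketch of \D, assuming the demo text is `java 11_34-1938 tab`, a line break, then `new line`. Every character is either a digit or not a digit. So the \d and \D matches together account for the whole string.

```python
import re

# Assumed demo text, reconstructed from the example above.
text = "java 11_34-1938 tab\nnew line"

non_digits = re.findall(r"\D", text)
print(repr("".join(non_digits)))  # 'java _- tab\nnew line'

# Together, \d and \D cover every character of the text exactly once.
digits = re.findall(r"\d", text)
print(len(digits) + len(non_digits) == len(text))  # True
```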
There are also the \s class, which matches whitespace characters, and \S, which, in turn, matches all non-whitespace characters. As you can see, the principle behind these classes is simple: a lowercase letter denotes a class, and the same letter in uppercase denotes everything that doesn't belong to it.
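The lesson has no demo for \s and \S, so here is a small Python sketch. It assumes the demo text is `java 11_34-1938 tab`, a line break, then `new line`, and it adds one extra string containing a tab. The `re.split` call is just one possible use: it breaks the text apart at each whitespace character.

```python
import re

# Assumed demo text, reconstructed from the earlier examples.
text = "java 11_34-1938 tab\nnew line"

# \s: spaces, tabs, line breaks and other whitespace characters.
print(re.findall(r"\s", text))    # [' ', ' ', '\n', ' ']
print(re.findall(r"\s", "a\tb"))  # ['\t']

# \S: every character that is not whitespace.
print("".join(re.findall(r"\S", text)))  # java11_34-1938tabnewline

# Splitting at every whitespace character.
print(re.split(r"\s", text))  # ['java', '11_34-1938', 'tab', 'new', 'line']
```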
Another popular class is \w, which includes all letters of the English alphabet in both cases, all digits and the underscore. Whitespace characters and the hyphen - don't belong to it:
/\w/
java 11_34-1938 tab
new line
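Here is a Python sketch of \w, assuming the demo text is `java 11_34-1938 tab`, a line break, then `new line`. The second part lists the characters \w does not match, which confirms that whitespace and the hyphen fall outside the class.

```python
import re

# Assumed demo text, reconstructed from the example above.
text = "java 11_34-1938 tab\nnew line"

# \w matches one letter, digit or underscore at a time.
word_chars = re.findall(r"\w", text)
print("".join(word_chars))  # java11_341938tabnewline

# The characters \w skips: spaces, the hyphen and the line break.
skipped = [c for c in text if not re.fullmatch(r"\w", c)]
print(skipped)  # [' ', '-', ' ', '\n', ' ']
```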
When only ASCII characters are considered, \w is equivalent to [0-9a-zA-Z_]. Note that character ranges are case-sensitive, which is why this notation lists both a-z and A-Z.
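The equivalence can be checked with a short Python sketch that compares the two notations on all 128 ASCII characters. It uses Python's `re.ASCII` flag to keep \w to ASCII. The last two lines show a Python-specific difference: outside ASCII, Python's default \w also accepts letters such as é, while the explicit class does not.

```python
import re

ascii_chars = [chr(code) for code in range(128)]

# ASCII characters matched by \w when Python restricts it to ASCII...
by_w = {c for c in ascii_chars if re.fullmatch(r"\w", c, re.ASCII)}
# ...and ASCII characters matched by the explicit class.
by_class = {c for c in ascii_chars if re.fullmatch(r"[0-9a-zA-Z_]", c)}

print(by_w == by_class)  # True
print(len(by_w))         # 63: 10 digits, 26 + 26 letters, the underscore

# Beyond ASCII: "\u00e9" is "é", a word character for Python's default \w.
print(bool(re.fullmatch(r"\w", "\u00e9")))            # True
print(bool(re.fullmatch(r"[0-9a-zA-Z_]", "\u00e9")))  # False
```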
Accordingly, \W finds the opposite of \w. Here, the hyphen and the whitespace characters are found:
/\W/
java 11_34-1938 tab
new line
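Here is a Python sketch of \W, assuming the demo text is `java 11_34-1938 tab`, a line break, then `new line`. The `re.sub` call is a hypothetical use case, not part of the lesson. It replaces every non-word character with an underscore.

```python
import re

# Assumed demo text, reconstructed from the example above.
text = "java 11_34-1938 tab\nnew line"

non_word = re.findall(r"\W", text)
print(non_word)  # [' ', '-', ' ', '\n', ' ']

# Hypothetical use: turn the text into one identifier-like string.
print(re.sub(r"\W", "_", text))  # java_11_34_1938_tab_new_line
```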