In this lesson, we will discuss character classes.
A character class is a notation that matches any single character from a specified set.
Let us look at a simple example of how character classes work. Suppose we only need to find letters. To do this, we describe a character class in square brackets. For example, [a-z] matches any lowercase letter of the English alphabet.
With this pattern, every lowercase letter in the string is matched, one character at a time, while the digits, underscore, hyphen, and whitespace are skipped:
/[a-z]/
java 11_34-1938 tab
new line
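The lesson does not tie its patterns to a programming language, so the following is only a minimal sketch of the same search, assuming Python and its standard re module. The variable text is an assumption too: it is the sample string reconstructed from the example above.

```python
import re

# Assumed sample text, reconstructed from the lesson's example string.
text = "java 11_34-1938 tab\nnew line"

# [a-z] matches exactly one lowercase ASCII letter per match.
letters = re.findall(r"[a-z]", text)

print(len(letters))      # 14
print("".join(letters))  # javatabnewline
```

Each element of letters is a single character, because a character class on its own always matches exactly one character.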
You can search for the digits from zero to nine in the same way. Here each digit in 11, 34, and 1938 is matched:
/[0-9]/
java 11_34-1938 tab
new line
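To see where in the string the digit class matches, here is a small sketch, assuming Python's re module and the same reconstructed sample string (both are assumptions, not part of the lesson). It collects each matched digit together with its index.

```python
import re

# Assumed sample text, reconstructed from the lesson's example string.
text = "java 11_34-1938 tab\nnew line"

# Each match of [0-9] is a single digit; m.start() is its index in text.
matches = list(re.finditer(r"[0-9]", text))
digits = [m.group() for m in matches]
positions = [m.start() for m in matches]

print(digits)     # ['1', '1', '3', '4', '1', '9', '3', '8']
print(positions)  # [5, 6, 8, 9, 11, 12, 13, 14]
```

The gaps in positions (indexes 7 and 10) are the underscore and the hyphen, which the class does not include.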
In this example, the class lists just two characters, a and j, and every occurrence of either one is matched:
/[aj]/
java 11_34-1938 tab
new line
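As a rough stand-in for the highlighting, the sketch below uppercases every character matched by [aj]. It assumes Python's re module and the reconstructed sample string. The uppercasing is only an illustration device, not something the lesson prescribes.

```python
import re

# Assumed sample text, reconstructed from the lesson's example string.
text = "java 11_34-1938 tab\nnew line"

# The class lists two characters; either one counts as a match.
found = re.findall(r"[aj]", text)

# Uppercase every match so the hits stand out, similar to highlighting.
marked = re.sub(r"[aj]", lambda m: m.group().upper(), text)

print(found)         # ['j', 'a', 'a', 'a']
print(repr(marked))  # 'JAvA 11_34-1938 tAb\nnew line'
```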
Character classes support a mechanism called negation, which inverts the search.
To negate a class, put the character ^ immediately after the opening square bracket. The class then matches every character except those listed after ^:
/[^aj]/
java 11_34-1938 tab
new line
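A minimal sketch of negation, assuming Python's re module and the reconstructed sample string: the negated class picks up everything the plain class [aj] leaves behind, including the line break.

```python
import re

# Assumed sample text, reconstructed from the lesson's example string.
text = "java 11_34-1938 tab\nnew line"

listed = re.findall(r"[aj]", text)   # characters named in the class
rest = re.findall(r"[^aj]", text)    # every other character

print(repr("".join(rest)))               # 'v 11_34-1938 tb\nnew line'
print(len(listed) + len(rest) == len(text))  # True
```

Together the two classes cover every character of the string exactly once, which is what inverting the search means here.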
If we need to match a literal hyphen along with other characters, we place the hyphen at the beginning or end of the class. That way, it is not interpreted as the special character that defines a range:
/[aj-]/
java 11_34-1938 tab
new line
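The position of the hyphen changes the meaning of the class, which the following sketch tries to make visible. It assumes Python's re module and the reconstructed sample string, and compares a trailing hyphen, a leading hyphen, and a hyphen between two letters.

```python
import re

# Assumed sample text, reconstructed from the lesson's example string.
text = "java 11_34-1938 tab\nnew line"

# Hyphen at the end or the start: a literal hyphen.
trailing = re.findall(r"[aj-]", text)
leading = re.findall(r"[-aj]", text)

# Hyphen between two characters: the range from a through j.
as_range = re.findall(r"[a-j]", text)

print(trailing)             # ['j', 'a', 'a', '-', 'a']
print(trailing == leading)  # True
print(as_range)             # ['j', 'a', 'a', 'a', 'b', 'e', 'i', 'e']
```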
Regular expressions often use special predefined character classes. They are written with the \ character followed by a letter, and each has its own designation in the regular expression language.
In the previous lesson, we used \ as an escape character. Here it also serves as part of the class notation.
Let us find all the digits in the text using \d:
/\d/
java 11_34-1938 tab
new line
If we use an uppercase D instead, \D matches every character that is not a digit, including spaces and tabs:
/\D/
java 11_34-1938 tab
new line
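The two classes \d and \D can be compared side by side in a short sketch, assuming Python's re module and the reconstructed sample string. The re.ASCII flag is an added assumption: without it, Python's \d also accepts digits from other scripts, which goes beyond the 0-9 range discussed in the lesson.

```python
import re

# Assumed sample text, reconstructed from the lesson's example string.
text = "java 11_34-1938 tab\nnew line"

# re.ASCII restricts \d to 0-9 and \D to everything else.
digits = re.findall(r"\d", text, flags=re.ASCII)
non_digits = re.findall(r"\D", text, flags=re.ASCII)

print("".join(digits))            # 11341938
print(repr("".join(non_digits)))  # 'java _- tab\nnew line'
print(len(digits) + len(non_digits) == len(text))  # True
```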
There are also:
- The class \s, which helps search for whitespace characters
- The class \S, representing all non-whitespace characters
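A quick sketch of both whitespace classes, assuming Python's re module and the reconstructed sample string. The second string, "a\tb", is a hypothetical input added only to show that a tab also counts as whitespace.

```python
import re

# Assumed sample text, reconstructed from the lesson's example string.
text = "java 11_34-1938 tab\nnew line"

spaces = re.findall(r"\s", text)      # spaces and the line break
non_spaces = re.findall(r"\S", text)  # everything else

print(spaces)               # [' ', ' ', '\n', ' ']
print("".join(non_spaces))  # java11_34-1938tabnewline

# Hypothetical extra input: a tab character is whitespace as well.
print(re.findall(r"\s", "a\tb"))  # ['\t']
```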
As we can see, the principle is simple. A lowercase letter denotes a class, and the corresponding uppercase letter denotes everything that does not belong to that class.
There is another popular class, \w. It includes all letters of the alphabet, all digits, and the underscore. Whitespace characters and the hyphen do not belong to this class, so they are not matched:
/\w/
java 11_34-1938 tab
new line
The class \w is equivalent to this notation: [0-9a-zA-Z_]. Note that character ranges are case-sensitive, which is why the notation includes both a-z and A-Z.
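The equivalence can be checked with a short sketch, assuming Python's re module and the reconstructed sample string. One caveat that comes from Python rather than from the lesson: by default its \w also matches non-ASCII letters, so the sketch passes re.ASCII to get the behaviour described here.

```python
import re

# Assumed sample text, reconstructed from the lesson's example string.
text = "java 11_34-1938 tab\nnew line"

# With re.ASCII, \w is limited to letters, digits, and the underscore.
word_chars = re.findall(r"\w", text, flags=re.ASCII)
explicit = re.findall(r"[0-9a-zA-Z_]", text)

print(word_chars == explicit)  # True
print("".join(word_chars))     # java11_341938tabnewline

# Python-specific difference: without re.ASCII, \w is Unicode-aware.
print(re.findall(r"\w", "\u00e9"))                  # ['é']
print(re.findall(r"\w", "\u00e9", flags=re.ASCII))  # []
```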
Accordingly, \W searches for the opposite of \w. So we can find hyphens and whitespace characters:
/\W/
java 11_34-1938 tab
new line
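As a final sketch, assuming Python's re module, the re.ASCII flag, and the reconstructed sample string, \W can be compared with a negated version of the explicit class for \w. Both should return the same characters.

```python
import re

# Assumed sample text, reconstructed from the lesson's example string.
text = "java 11_34-1938 tab\nnew line"

non_word = re.findall(r"\W", text, flags=re.ASCII)
negated = re.findall(r"[^0-9a-zA-Z_]", text)

print(non_word)             # [' ', '-', ' ', '\n', ' ']
print(non_word == negated)  # True
```

The underscore is absent from the result because it belongs to \w.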