
Character classes | Regular Expressions (Regexp)

In this lesson, we will discuss character classes.

A character class is a special notation that matches any single character from a particular set.

Let us look at a simple example of how character classes work. Suppose we only need to find letters of the alphabet. To do this, we list the allowed characters inside square brackets, where a hyphen between two characters denotes a range. For example, [a-z] covers the lowercase English alphabet.

Every lowercase letter in the test string below matches:


/[a-z]/

java 11_34-1938 tab
new line
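To check which characters [a-z] picks out, here is a small sketch using Python's re module. The lesson does not name an engine, so Python is an assumption here, and the test string is assumed to be exactly the two lines shown above.

```python
import re

# Assumed test string: the two lines shown above, joined by a newline.
TEXT = "java 11_34-1938 tab\nnew line"

# [a-z] matches one lowercase ASCII letter at a time.
letters = re.findall(r"[a-z]", TEXT)
print("".join(letters))  # javatabnewline
```

Digits, the underscore, the hyphen, spaces, and the newline are all skipped.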


You can search for digits from 0 to 9 in the same way:


/[0-9]/

java 11_34-1938 tab
new line
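A minimal sketch of the [0-9] search in Python's re module (an assumed engine), using the test string shown above:

```python
import re

# Assumed test string: the two lines shown above, joined by a newline.
TEXT = "java 11_34-1938 tab\nnew line"

# [0-9] is a range of the ten ASCII digit characters.
digits = re.findall(r"[0-9]", TEXT)
print("".join(digits))  # 11341938
```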


In this example, we list just two characters, a and j. Every occurrence of either one matches:


/[aj]/

java 11_34-1938 tab
new line
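The same two-character class in Python's re module (an assumed engine), run against the test string shown above. A class without a range simply lists the allowed characters:

```python
import re

# Assumed test string: the two lines shown above, joined by a newline.
TEXT = "java 11_34-1938 tab\nnew line"

# [aj] matches either "a" or "j", one character per match.
found = re.findall(r"[aj]", TEXT)
print(found)  # ['j', 'a', 'a', 'a']
```

Three of the matches come from "java" and one from "tab"; the second line contains neither character.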


Character classes also support negation, which inverts the search.

If ^ is the first character inside the square brackets, the class matches every character except those listed after ^:


/[^aj]/

java 11_34-1938 tab
new line
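A sketch of negation in Python's re module (an assumed engine), using the test string shown above. Note that a negated class also matches spaces and the newline, because they are not in the excluded set:

```python
import re

# Assumed test string: the two lines shown above, joined by a newline.
TEXT = "java 11_34-1938 tab\nnew line"

# [^aj] matches any single character except "a" and "j".
rest = re.findall(r"[^aj]", TEXT)
print(repr("".join(rest)))  # 'v 11_34-1938 tb\nnew line'
```

Only the four characters matched by [aj] are missing from the result.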


To match a hyphen together with other characters, put it at the beginning or end of the class. In that position, the hyphen is not treated as a special range character:


/[aj-]/

java 11_34-1938 tab
new line
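To see why the hyphen's position matters, this sketch (Python's re module, an assumed engine) compares [aj-], where the hyphen is literal, with [a-j], where it forms a range from a to j. The test string is the one shown above:

```python
import re

# Assumed test string: the two lines shown above, joined by a newline.
TEXT = "java 11_34-1938 tab\nnew line"

# Hyphen at the end of the class: a literal "-" alongside "a" and "j".
literal = re.findall(r"[aj-]", TEXT)
print(literal)  # ['j', 'a', 'a', '-', 'a']

# Hyphen between two characters: the range a, b, c, ..., j.
ranged = re.findall(r"[a-j]", TEXT)
print("".join(ranged))  # jaaabeie
```

The range version picks up "b", "e", and "i" but never the hyphen itself.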


Regular expressions often use predefined character classes. Each one is written as a backslash \ followed by a letter.

In the previous lesson, we used \ as an escape character. Here, it is part of the class notation itself.

Let us find all the digits in the text using \d:


/\d/

java 11_34-1938 tab
new line
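A sketch of \d in Python's re module (an assumed engine) with the test string shown above. One caveat specific to Python: for str patterns, \d also matches non-ASCII decimal digits such as the Arabic-Indic digit three (U+0663), unless the re.ASCII flag is passed.

```python
import re

# Assumed test string: the two lines shown above, joined by a newline.
TEXT = "java 11_34-1938 tab\nnew line"

digits = re.findall(r"\d", TEXT)
print("".join(digits))  # 11341938

# On this ASCII text, \d finds the same characters as [0-9].
print(digits == re.findall(r"[0-9]", TEXT))  # True

# Python-specific: \d is Unicode-aware by default; re.ASCII restricts it.
print(re.findall(r"\d", "\u0663"))            # ['٣']
print(re.findall(r"\d", "\u0663", re.ASCII))  # []
```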


With an uppercase D, \D matches every character that is not a digit, including spaces and tabs:


/\D/

java 11_34-1938 tab
new line
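A sketch of \D in Python's re module (an assumed engine) on the same test string. Every character that \d skipped is returned, including spaces and the newline:

```python
import re

# Assumed test string: the two lines shown above, joined by a newline.
TEXT = "java 11_34-1938 tab\nnew line"

non_digits = re.findall(r"\D", TEXT)
print(repr("".join(non_digits)))  # 'java _- tab\nnew line'

# \d and \D together cover every character exactly once.
print(len(non_digits) + len(re.findall(r"\d", TEXT)) == len(TEXT))  # True
```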


There are also:

  • The class \s, which matches whitespace characters such as spaces, tabs, and line breaks
  • The class \S, which matches any character that is not whitespace
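Both classes can be tried on the earlier test string with Python's re module (an assumed engine; the string is assumed to be the two lines java 11_34-1938 tab and new line):

```python
import re

# Assumed test string: the two example lines joined by a newline.
TEXT = "java 11_34-1938 tab\nnew line"

# \s finds the three spaces and the line break.
spaces = re.findall(r"\s", TEXT)
print(spaces)  # [' ', ' ', '\n', ' ']

# \S finds everything else.
visible = re.findall(r"\S", TEXT)
print("".join(visible))  # java11_34-1938tabnewline
```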

The principle is simple: a lowercase letter denotes a class, and the same letter in uppercase matches everything that does not belong to that class.

Another popular class is \w. It matches any letter of the alphabet, any digit, and the underscore. Whitespace characters and the hyphen - do not belong to this class:


/\w/

java 11_34-1938 tab
new line
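A sketch of \w in Python's re module (an assumed engine) on the test string shown above. The underscore in 11_34 is kept, while the hyphen, the spaces, and the newline are dropped:

```python
import re

# Assumed test string: the two lines shown above, joined by a newline.
TEXT = "java 11_34-1938 tab\nnew line"

word_chars = re.findall(r"\w", TEXT)
print("".join(word_chars))  # java11_341938tabnewline
```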


For ASCII text, the class \w is equivalent to [0-9a-zA-Z_]. Character ranges are case-sensitive, so a-z has to be followed by A-Z to cover uppercase letters as well.
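The equivalence can be checked over all 128 ASCII characters with Python's re module (an assumed engine). Python's \w is Unicode-aware by default, so the sketch passes re.ASCII to compare like with like. It also shows the case-sensitivity point: [a-z] alone misses the capital J.

```python
import re

ascii_chars = [chr(i) for i in range(128)]

# Characters matched by \w in ASCII mode versus the explicit class.
w_ascii = [c for c in ascii_chars if re.fullmatch(r"\w", c, re.ASCII)]
explicit = [c for c in ascii_chars if re.fullmatch(r"[0-9a-zA-Z_]", c)]
print(w_ascii == explicit)  # True
print(len(explicit))        # 63 (10 digits + 26 + 26 letters + "_")

# Ranges are case-sensitive: a-z does not include uppercase letters.
print(re.findall(r"[a-z]", "Java"))     # ['a', 'v', 'a']
print(re.findall(r"[a-zA-Z]", "Java"))  # ['J', 'a', 'v', 'a']
```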

Accordingly, \W matches the opposite of \w, so it finds hyphens and whitespace characters:


/\W/

java 11_34-1938 tab
new line
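A sketch of \W in Python's re module (an assumed engine) on the test string shown above. Together, \w and \W split the text into two non-overlapping sets:

```python
import re

# Assumed test string: the two lines shown above, joined by a newline.
TEXT = "java 11_34-1938 tab\nnew line"

others = re.findall(r"\W", TEXT)
print(others)  # [' ', '-', ' ', '\n', ' ']

# Every character falls into exactly one of \w or \W.
print(len(others) + len(re.findall(r"\w", TEXT)) == len(TEXT))  # True
```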


