
Grouping and Backreferences in Regular Expressions (Regexp)

In this lesson, we will look at additional features and different grouping types.

Backreferences

We have a group of symbols from which we choose either ta or tu:


/(ta|tu)/

ta-tu ta-ta tu-tu


Suppose we want to find only those substrings in which the left and right parts match: ta-ta and tu-tu.

Let us try repeating the group on both sides of the hyphen. This does not give us what we want, because each group chooses between ta and tu independently, so ta-tu matches as well:


/(ta|tu)-(ta|tu)/

ta-tu ta-ta tu-tu


This is where backreferences help. The special notation \1 stands for the exact text that the first group matched, so the same characters must appear again at that position.

Thus, we will find substrings with the same left and right parts:


/(ta|tu)-\1/

ta-tu ta-ta tu-tu
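
The difference between the last two expressions can be checked in code. The following is a minimal sketch in Python, chosen here only for illustration (the lesson itself is not tied to a language). It uses the standard re module, and the variable names are made up for this example:

```python
import re

text = "ta-tu ta-ta tu-tu"

# Two independent groups: each side chooses ta or tu on its own.
independent = [m.group(0) for m in re.finditer(r"(ta|tu)-(ta|tu)", text)]

# Backreference: \1 must repeat exactly what group 1 matched.
same_sides = [m.group(0) for m in re.finditer(r"(ta|tu)-\1", text)]

print(independent)  # ['ta-tu', 'ta-ta', 'tu-tu']
print(same_sides)   # ['ta-ta', 'tu-tu']
```

The sketch uses re.finditer and m.group(0) because re.findall returns the contents of the groups rather than the whole match when the pattern contains groups.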


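By default, every group is a capturing group: the text it matched is written to a dedicated memory area, and the groups are numbered from left to right so that we can refer to them as \1 to \9.

A quantifier on the group does not change the result in this example, because each group matches only once here. The quantifier itself is not part of the backreference: \1 stands only for the text the group captured, and if the group repeats, most engines keep only its last repetition: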


/(ta|tu)+-\1/

ta-tu ta-ta tu-tu
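
To see what a quantified group actually stores, it helps to use a string in which the group repeats. This is a small sketch in Python's re module; the test strings tatu-tu and tatu-ta are made up for this example and do not come from the lesson:

```python
import re

pattern = re.compile(r"(ta|tu)+-\1")

# The group repeats twice (ta, then tu); \1 refers to the last repetition.
m = pattern.search("tatu-tu")
print(m.group(0))  # tatu-tu
print(m.group(1))  # tu

# Here the right part equals the first repetition, not the last one.
print(pattern.search("tatu-ta"))  # None
```

In Python, the group keeps the text of its final repetition, so the right part has to repeat tu rather than ta.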


Named groups

With several groups, remembering them by number is inconvenient. It is much easier to use names. To name a group, add ?<name> right after its opening parenthesis, and refer back to it with \k<name>:


/(?<group1>ta|tu)-\k<group1>/

ta-tu ta-ta tu-tu


Now you can refer to the group by the name group1, both inside the expression and when working with the match in your code.
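
The exact spelling of named groups depends on the engine. As an illustration, Python's re module writes the group as (?P<name>...) and the backreference as (?P=name) instead of the \k<name> form shown above. A minimal sketch, with variable names invented for the example:

```python
import re

text = "ta-tu ta-ta tu-tu"

# Python's re spelling: (?P<group1>...) defines, (?P=group1) refers back.
pattern = re.compile(r"(?P<group1>ta|tu)-(?P=group1)")

# Each match exposes the group by name as well as by number.
matches = [(m.group(0), m.group("group1")) for m in pattern.finditer(text)]
print(matches)  # [('ta-ta', 'ta'), ('tu-tu', 'tu')]
```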

Disabling backreferencing

We can turn off capturing for a group, and with it backreferencing, by putting ?: right after the opening parenthesis:


/(?:ta|tu)-\1/

ta-tu ta-ta tu-tu


After that, the group's match is not saved in memory. Referring to it with \1 then fails: depending on the engine, the pattern is rejected with an error or \1 simply does not behave as a backreference.
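
How a reference to a missing group fails depends on the engine. The sketch below shows the behavior of Python's re module only, which rejects such a pattern when it is compiled. The names mixed and compiled are made up for the example:

```python
import re

# Non-capturing groups are not counted: only the second group is stored.
mixed = re.compile(r"(?:ta|tu)-(ta|tu)")
print(mixed.groups)                    # 1
print(mixed.search("ta-tu").group(1))  # tu

# With no capturing group at all, \1 has nothing to refer to.
try:
    re.compile(r"(?:ta|tu)-\1")
    compiled = True
except re.error:
    compiled = False

print(compiled)  # False
```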

Non-capturing groups make the regular expression a little harder to read, but the engine has less to store, so it may run slightly faster. They are worth using when:

  • You have many groups and do not need their contents
  • You want to save memory and keep the numbering of the remaining groups from shifting

Atomic grouping

Another interesting kind of grouping without backreferencing is atomic grouping.

JavaScript does not support atomic grouping, and Python's built-in re module supports it only from version 3.11. Where it is missing, it can be emulated with other constructs, for example a lookahead combined with a backreference.

For atomic grouping, we use ?> instead of ?: after the opening parenthesis:


/a(?>bc|b|x)cc/

abccaxcc


If we remove ?>, the regex will find two substrings — abcc and axcc:


/a(bc|b|x)cc/

abccaxcc


With the atomic group (?>bc|b|x), the following happens on the first substring: the engine matches a, then bc from the group, and then needs cc, but only one c is left before the next a. With an ordinary group, the engine would backtrack into the group and try the next alternative, b, because the alternation | offers other options. After b, the remaining cc matches, and abcc is found.

An atomic group disables that backtracking. Once the group has matched with one alternative, the engine will not go back into it to try the others. The attempt starting at the first a therefore fails as a whole, and the search moves on through the string. Starting from the second a, the alternatives bc and b fail, x matches, and cc follows, so axcc is found.

In other words, after the first successful alternative of (?>bc|b|x), the remaining alternatives are discarded. If the rest of the pattern then fails, the engine gives up on that starting position and tries the whole regular expression again from the next character of the string.

With atomic grouping, the first substring matches only if we add another c to it, so that cc still follows bc:


/a(?>bc|b|x)cc/

abcccaxcc
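
The behavior described in this section can be reproduced even in an engine without (?>...). The sketch below is an emulation, not native atomic grouping: it relies on the common trick of capturing inside a lookahead and then consuming the captured text with \1, which works because the engine does not backtrack into a lookahead once it has succeeded. It is written for Python's re module, and the helper find_all is a made-up name for this example:

```python
import re

# Ordinary group: the engine may backtrack and try the next alternative.
plain = r"a(bc|b|x)cc"

# Emulated atomic group: the lookahead commits to the first alternative
# that matches, and \1 then consumes exactly that text.
atomic_like = r"a(?=(bc|b|x))\1cc"


def find_all(pattern, text):
    """Return every full match of pattern in text, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]


print(find_all(plain, "abccaxcc"))         # ['abcc', 'axcc']
print(find_all(atomic_like, "abccaxcc"))   # ['axcc']
print(find_all(atomic_like, "abcccaxcc"))  # ['abccc', 'axcc']
```

One difference from a real atomic group is that the emulation adds a capturing group, so the numbering of any later groups shifts by one.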

