
Regular Expressions (Regexp): Grouping. Backreferences

Let's look at some additional features and different types of grouping.

Backreferences

We have a group of symbols from which we choose either ta or tu:


/(ta|tu)/

ta-tu ta-ta tu-tu


Suppose we want to find only those substrings in which the left and right parts are the same: ta-ta and tu-tu. If we simply add a second alternation group, we don't get what we want, because each group chooses independently and ta-tu is matched as well:


/(ta|tu)-(ta|tu)/

ta-tu ta-ta tu-tu


This is where a backreference helps. The special notation \1 matches the same text that the first group captured (here we have only one group). Thus, only substrings with identical left and right parts are found:


/(ta|tu)-\1/

ta-tu ta-ta tu-tu
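
The difference between two independent groups and a backreference can be checked in code. Below is a minimal sketch in Python's re module (the choice of Python and the variable names are assumptions made for illustration; the lesson itself is language-neutral):

```python
import re

text = "ta-tu ta-ta tu-tu"

# Two independent groups: each side may be "ta" or "tu".
independent = [m.group(0) for m in re.finditer(r"(ta|tu)-(ta|tu)", text)]

# Backreference: \1 must repeat exactly what group 1 captured.
same_sides = [m.group(0) for m in re.finditer(r"(ta|tu)-\1", text)]

print(independent)  # ['ta-tu', 'ta-ta', 'tu-tu']
print(same_sides)   # ['ta-ta', 'tu-tu']
```

re.finditer is used instead of re.findall because findall would return the contents of the groups rather than the whole matches.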


By default, every capturing group that we create is written to a special memory area and numbered in order; the classic notation covers \1 to \9. If we add a quantifier to the group, the backreference still works, but in most engines the group keeps only the text of its last repetition, so \1 refers to that repetition and not to everything the quantifier consumed. On our test string the result is the same as before:


/(ta|tu)+-\1/

ta-tu ta-ta tu-tu
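
What a quantified group remembers is easy to check by experiment. A small Python sketch (the extra inputs tatu-tu and tatu-ta are assumptions added for illustration): in Python's re module a repeated group holds the text of its last repetition, and \1 refers to exactly that text.

```python
import re

pattern = re.compile(r"(ta|tu)+-\1")

# On the lesson's test string the result does not change.
found = [m.group(0) for m in pattern.finditer("ta-tu ta-ta tu-tu")]

# The group repeats twice ("ta", then "tu") and keeps the last repetition.
repeated = pattern.search("tatu-tu")

# Here the last repetition is "tu", but "ta" follows the hyphen: no match.
mismatch = pattern.search("tatu-ta")

print(found)              # ['ta-ta', 'tu-tu']
print(repeated.group(0))  # tatu-tu
print(repeated.group(1))  # tu
print(mismatch)           # None
```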


Named groups

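If you use multiple groups, it's not very convenient to remember them by number. It's much easier to use names. To do this, add ?<name> right after the opening parenthesis and refer back to the group with \k<name>. The exact syntax depends on the engine: Python, for example, writes (?P<name>...) for the group and (?P=name) for the backreference.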


/(?<group1>ta|tu)-\k<group1>/

ta-tu ta-ta tu-tu


Now you can refer to the group by the name group1, both inside the expression and when you read the match results in your code.
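
As a rough illustration of using the name from code, here is the same expression in Python. Note the assumption: Python's re module spells the group (?P<group1>...) and the backreference (?P=group1) instead of the (?<group1>...) and \k<group1> forms shown above.

```python
import re

# Python spelling of /(?<group1>ta|tu)-\k<group1>/
pattern = re.compile(r"(?P<group1>ta|tu)-(?P=group1)")

# Collect each whole match together with the text captured under the name.
results = [
    (m.group(0), m.group("group1"))
    for m in pattern.finditer("ta-tu ta-ta tu-tu")
]

print(results)  # [('ta-ta', 'ta'), ('tu-tu', 'tu')]
```

A named group still receives a number, so m.group(1) would return the same text as m.group("group1") here.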

Disabling backreferencing

We can turn off capturing, and with it backreferencing, by putting ?: right after the opening parenthesis of the group:


/(?:ta|tu)-\1/

ta-tu ta-ta tu-tu


After that, the group is no longer saved to the special memory area, so referring to it with \1 fails: in most engines this is an error, because group 1 doesn't exist. A non-capturing group makes the regular expression a little harder to read, but it can work slightly faster, since the engine has nothing to store. This approach is useful when you have a lot of groups that exist only for grouping: they don't take up memory and don't shift the numbering of the groups you actually need.
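
Both effects can be seen in a short Python sketch (Python is an assumption here; other engines may report the missing group differently). The second pattern, which mixes a non-capturing and a capturing group, is added only for illustration.

```python
import re

# A non-capturing group saves nothing, so there is no group 1 to refer to.
try:
    re.compile(r"(?:ta|tu)-\1")
    error_message = None
except re.error as exc:
    error_message = str(exc)

# Non-capturing groups are skipped when groups are numbered.
mixed = re.compile(r"(?:ta|tu)-(ta|tu)")
match = mixed.search("ta-tu")

print(error_message is not None)  # True: the pattern was rejected
print(mixed.groups)               # 1
print(match.group(1))             # tu
```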

Atomic grouping

Another interesting kind of grouping without backreferencing is called atomic grouping. NB! Atomic grouping isn't supported everywhere: JavaScript doesn't have it, and Python's re module supports it only since version 3.11. Where it is missing, it can be emulated with existing constructs.

For atomic grouping, we use > instead of :, that is, ?> after the opening parenthesis:


/a(?>bc|b|x)cc/

abccaxcc


Let's have a look at how it works. If we remove ?>, the regex finds two substrings, abcc and axcc:


/a(bc|b|x)cc/

abccaxcc


When we add the atomic grouping characters ?>, the following happens. First a is found, then the group matches bc, and then the expression needs cc, but only one c is left before the next a. An ordinary group would backtrack at this point: the engine would return to the group, try the next alternative after |, match b, and then find cc, so abcc would match.

With atomic grouping, this return into the group is disabled. Once the atomic group (?>bc|b|x) has matched with one of its alternatives, the other alternatives are discarded and are not tried, even if the rest of the expression fails. The attempt that started at the first a therefore fails as a whole.

The engine then moves on through the string and starts again from the first character of the regular expression. At the second a, the alternatives bc and b don't match, x does, and after x we find cc. So only axcc is found.

For the atomic version to match the first part of the string as well, we would need to add another c to it, so that cc still follows bc:


/a(?>bc|b|x)cc/

abcccaxcc
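
For engines without atomic groups, one common emulation is a lookahead that captures the alternative, followed by a backreference to it: lookaheads are not re-entered on backtracking, so the captured choice is fixed. The Python sketch below relies on that assumption about the engine; the helper find_all and the variable names are hypothetical. The native (?>...) form is tried only on Python 3.11 or newer.

```python
import re
import sys


def find_all(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


ordinary = r"a(bc|b|x)cc"
# Emulation of a(?>bc|b|x)cc: capture inside a lookahead, then consume via \1.
emulated_atomic = r"a(?=(bc|b|x))\1cc"

print(find_all(ordinary, "abccaxcc"))          # ['abcc', 'axcc']
print(find_all(emulated_atomic, "abccaxcc"))   # ['axcc']
print(find_all(emulated_atomic, "abcccaxcc"))  # ['abccc', 'axcc']

# Native atomic groups are accepted by the re module from Python 3.11 on.
if sys.version_info >= (3, 11):
    print(find_all(r"a(?>bc|b|x)cc", "abccaxcc"))  # ['axcc']
```

Unlike a real atomic group, the emulation adds a capturing group, so the numbering of any later groups shifts by one.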

