Regular Expressions (Regexp)

Theory: Lookahead and lookbehind

Most implementations of regular expressions support “lookahead” and “lookbehind” (also called “look forward” and “look back”). Let's see what they're for.

We have the following regular expression, which matches in two of the substrings below (LudovicXVI and LudovicXVIII):


/LudovicXVI/

LudovicXV, LudovicXVI, LudovicXVIII, LudovicLXVII, LudovicXXL


Suppose we don't want the Roman numeral XVI to be included in the search results. To do this, we can wrap it in (?=...):


/Ludovic(?=XVI)/

LudovicXV, LudovicXVI, LudovicXVIII, LudovicLXVII, LudovicXXL


As we can see, the matching conditions set by the original expression haven't changed: the pattern matched in the same two substrings as in the previous example. However, the characters XVI weren't included in the final search result. This behavior is called positive lookahead.

The logic of positive lookahead can be described as follows. The regular expression a(?=b) finds matches where a is followed by b, but without making b part of the match.
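The difference between the plain pattern and the lookahead version can be checked with a short script. This is a minimal sketch that assumes Python's built-in re module, which uses the same (?=...) syntax; the variable names are only illustrative.

```python
import re

text = "LudovicXV, LudovicXVI, LudovicXVIII, LudovicLXVII, LudovicXXL"

# Plain pattern: the numeral XVI is consumed and appears in every match.
plain = re.findall(r"LudovicXVI", text)

# Positive lookahead: XVI must follow "Ludovic" but is not consumed.
lookahead = re.findall(r"Ludovic(?=XVI)", text)

# The text right after each lookahead match is still XVI.
following = [
    text[m.end():m.end() + 3]
    for m in re.finditer(r"Ludovic(?=XVI)", text)
]

print(plain)      # ['LudovicXVI', 'LudovicXVI']
print(lookahead)  # ['Ludovic', 'Ludovic']
print(following)  # ['XVI', 'XVI']
```

Both patterns match at the same two places; only the length of the matched text differs.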

Lookahead can also be negative. In that case, the pattern matches only where the text specified in parentheses does not follow. In our case, that's still XVI. To turn a positive lookahead into a negative one, replace = with !. Now Ludovic is matched in the other three substrings (LudovicXV, LudovicLXVII, and LudovicXXL):


/Ludovic(?!XVI)/

LudovicXV, LudovicXVI, LudovicXVIII, LudovicLXVII, LudovicXXL
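As a rough illustration, negative lookahead can be used as a filter. The sketch below assumes Python's re module and a hypothetical list called names holding the same five strings; it keeps only the names in which Ludovic is not followed by XVI.

```python
import re

names = ["LudovicXV", "LudovicXVI", "LudovicXVIII", "LudovicLXVII", "LudovicXXL"]

# Negative lookahead: "Ludovic" matches only if XVI does NOT come next.
pattern = re.compile(r"Ludovic(?!XVI)")

accepted = [name for name in names if pattern.match(name)]

print(accepted)  # ['LudovicXV', 'LudovicLXVII', 'LudovicXXL']
```

LudovicXVIII is rejected as well, because the three characters right after Ludovic are XVI.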


In addition to lookahead, there's also lookbehind. It works in a similar way, but it checks the text before the current position: the pattern matches only where the parenthesized part comes immediately before it, and that part isn't included in the match.

In other words, the regular expression (?<=b)a finds matches of a that have b directly in front of them, but without making b part of the match.

Lookbehind adds the < sign after the question mark, so positive lookbehind is written (?<=...). In this example, we match the substring Two only where it is preceded by One and a space:


/(?<=One )Two/

One Two, Three Two
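To see which of the two occurrences is matched, here is a small sketch, again assuming Python's re module. It records each match together with its start index in the string.

```python
import re

text = "One Two, Three Two"

# Positive lookbehind: "Two" matches only when "One " comes right before it.
found = [(m.group(), m.start()) for m in re.finditer(r"(?<=One )Two", text)]

print(found)  # [('Two', 4)]
```

Only the first Two (at index 4) is found, and the matched text does not include "One ".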


To turn the positive lookbehind into a negative one, we replace = with !, just as with lookahead. Now only the Two that is not preceded by One is matched:


/(?<!One )Two/

One Two, Three Two
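One possible use of negative lookbehind is a selective replacement. The sketch below assumes Python's re module, where the lookbehind part must have a fixed width (as "One " does here); the replacement string "2" is chosen only for illustration.

```python
import re

text = "One Two, Three Two"

# Negative lookbehind: replace "Two" only where "One " does NOT precede it.
replaced = re.sub(r"(?<!One )Two", "2", text)

print(replaced)  # One Two, Three 2
```

The first Two is left untouched because it is preceded by "One ".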

