Regular expressions are a powerful tool for web developers and programmers. In this blog post, I discuss how to use them in WordPress with the Redirection plugin. The plugin has many useful functions that can be used for SEO purposes or redirecting traffic from one domain to another.

Redirecting URLs with an .htaccess file is the most basic yet powerful way to redirect links, but the process becomes cumbersome after a few thousand redirects. This is why I enjoy using the WordPress redirect plugin Redirection, which lets you add redirects in WordPress quickly and effectively. After you’ve set up your 301 redirects, you will want to fix any redirecting URLs on your website with a search and replace plugin. This link optimization speeds up search engine crawls of the website and makes link management more streamlined.

The easy way to replace content in WordPress using the Search Regex Plugin

  1. Download and install

    Download and install the Search Regex Plugin

  2. Search & Replace

    Go to Tools >> Search Regex >> Search and Replace 

  3. Click on Regular Expression

    Under Search Flags (on the right-hand side), click on Regular Expression. If you skip this step, the search will not be performed as a regex.

  4. Use Regular Expressions

    Search and replace content and URLs using Regular Expressions (Regex)

    In the search pattern, (.*) captures the content, and $1 in the replacement adds it back.

    In the example image above, the search and replace removes the target and rel attributes from all internal links and leaves any external links within the content untouched.
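
To make step 4 concrete, here is a minimal sketch of that kind of search and replace, written in Python. It is not the pattern from the screenshot: the domain example.com, the sample HTML, and the exact attribute values are all assumptions chosen for illustration, and Python writes the captured group as \1 where Search Regex uses $1.

```python
import re

# Hypothetical content: example.com stands in for your own (internal) domain.
content = (
    '<a href="https://example.com/about/" target="_blank" rel="noopener">About</a> '
    '<a href="https://other.org/page/" target="_blank" rel="noopener">Other</a>'
)

# (.*?) captures the internal URL; the attributes after it are matched but not captured.
search = r'<a href="(https://example\.com/.*?)" target="_blank" rel="noopener">'
# \1 is Python's spelling of $1: it puts the captured URL back.
replace = r'<a href="\1">'

cleaned = re.sub(search, replace, content)
print(cleaned)
```

Only the first link is rewritten, because the second one does not point at the assumed internal domain. Real content is rarely this uniform, so test a pattern like this on a copy of your data first.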

What are Regular Expressions?

A regular expression (or regex) is a group of characters used to find a pattern within a piece of text.

In the context of the Redirection plugin, a plain URL match will match exactly one URL. A regex URL can match many URLs.

In addition to matching many URLs, a regular expression can extract information from the source URL and copy it to the target URL.

A few examples may be helpful. A redirect that has the source URL /my-url will only ever match requests for /my-url.

A redirect that has the source URL /my-url/.* will match requests for:

  • /my-url/this
  • /my-url/that

And so on.

The important part of /my-url/.* is .*. This is the regular expression part of the URL and is equivalent to saying “match /my-url/ followed by any sequence of characters.”

To use a regular expression match in Redirection, make sure you have enabled the ‘regex’ option.

Regular expressions are not specific to the Redirection plugin; they are used in many programming languages and developer tools.
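
The difference between a plain match and a regex match can be sketched in a few lines of Python. This is a simplified model and not the plugin's own code: the two helper functions are hypothetical, and Python's re module stands in for the regex engine.

```python
import re

def plain_match(source, requested):
    # Simplified model: a plain redirect matches only the exact same URL.
    return source == requested

def regex_match(source, requested):
    # re.search looks for the pattern anywhere in the requested URL.
    return re.search(source, requested) is not None

requests = ["/my-url", "/my-url/this", "/my-url/that", "/other"]
plain_hits = [u for u in requests if plain_match("/my-url", u)]
regex_hits = [u for u in requests if regex_match("/my-url/.*", u)]

print(plain_hits)
print(regex_hits)
```

Note that /my-url on its own is not matched by /my-url/.* because the pattern requires the slash after my-url.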

Regular Expression Syntax

So, regular expressions such as .* seem really useful. But what does .* actually mean?

In this instance, the . means ‘any character’, and the * means ‘any number (including zero) of the previous expression’. Together, they mean any number of any characters.
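
A few quick checks may help make the two symbols less abstract. The sketch below uses Python's re module and made-up strings; the helper name full is an assumption of mine, not part of any plugin.

```python
import re

def full(pattern, text):
    # True only when the pattern matches the entire text.
    return re.fullmatch(pattern, text) is not None

one_char = full(r"a.c", "abc")           # "." stands for the single character "b"
no_char = full(r"a.c", "ac")             # "." needs exactly one character, so this fails
zero_repeats = full(r"ab*c", "ac")       # "*" allows zero copies of "b"
many_repeats = full(r"ab*c", "abbbc")    # ...or several copies
anything = full(r"a.*c", "a-anything-c") # ".*" is any number of any characters

print(one_char, no_char, zero_repeats, many_repeats, anything)
```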

But wait, it gets much, much more complicated!

Regular expressions allow for very detailed and complex patterns that are beyond the scope of this page. Search for ‘regular expressions’ and settle in for a long reading session if you want more details.

Extracting Source Information

Not only can a regular expression match many URLs, but it can also extract information from the source URL and copy it to the target URL.

Why would you want to do that? Let’s look at another example. Say you have a site where some pages are under the /oldpage/ directory and you’ve moved these to /newpage/.

  • /oldpage/bananas/
  • /oldpage/best-post-in-the-world/

And you want to move these to:

  • /newpage/bananas/
  • /newpage/best-post-in-the-world/

That is, you want to change /oldpage/ to /newpage/, but keep bananas and best-post-in-the-world.

To do this, you can create a regular expression such as /oldpage/(.*).

Note that the .* is surrounded by parentheses. This tells Redirection to ‘capture’ the data. The target URL would then be /newpage/$1.

Here the $1 is replaced by the contents of the captured (.*). And so:

/oldpage/bananas => /oldpage/(bananas) => /newpage/$1 => /newpage/bananas
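
The same capture-and-copy step can be tried out in Python as a rough sketch. Python's re.sub stands in for the plugin here, and the replacement uses \1 because that is how Python spells $1.

```python
import re

source = r"/oldpage/(.*)"   # the parentheses capture everything after /oldpage/
target = r"/newpage/\1"     # \1 is Python's equivalent of $1

old_urls = ["/oldpage/bananas/", "/oldpage/best-post-in-the-world/"]
new_urls = [re.sub(source, target, url) for url in old_urls]

for old, new in zip(old_urls, new_urls):
    print(old, "=>", new)
```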

Infinite Loops

A frequent problem with regular expressions is an infinite redirect. You create a regex that redirects to a URL that is itself matched by the same regular expression. The new URL is then redirected again, and again, and again until the browser stops with an ERR_TOO_MANY_REDIRECTS message (or equivalent).

For example, say we have this redirect:

/index.php/(.*) => /portal/index.php/$1

If you access /index.php/banana, it will be redirected to /portal/index.php/banana. But wait! The URL /portal/index.php/banana itself matches the original regular expression, because it contains /index.php/banana, which matches /index.php/(.*).

If we use the caret ^ character, we can fix the match to the start of the URL:

^/index.php/(.*)

Here the ^ tells the regular expression that it only applies when matched at the start of the URL. This prevents it from matching elsewhere in the URL and stops the infinite redirect.
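
The loop can be simulated with a small Python sketch that applies one rule over and over, the way a browser would keep following redirects. The follow function and the cap of five hops are assumptions for illustration; a real browser gives up after its own limit.

```python
import re

def follow(source, target, url, max_hops=5):
    """Apply one redirect rule repeatedly; return (final_url, hops)."""
    hops = 0
    while hops < max_hops and re.search(source, url):
        url = re.sub(source, target, url, count=1)
        hops += 1
    return url, hops

# Without the caret, the redirected URL still matches, so we hit the hop limit.
unanchored = follow(r"/index.php/(.*)", r"/portal/index.php/\1", "/index.php/banana")
# With the caret, the rule only applies at the start of the URL and stops after one hop.
anchored = follow(r"^/index.php/(.*)", r"/portal/index.php/\1", "/index.php/banana")

print(unanchored)
print(anchored)
```

The unanchored rule keeps prepending /portal until the cap is reached, while the anchored rule redirects once and then no longer matches.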

Testing Regular Expressions

There are many resources available for regular expressions, and some of the most useful are regular expression testers. With these, you can experiment with a pattern and tune it to match exactly what you want.

Note that Redirection uses PHP’s regular expressions. These are commonly known as PCRE and may not be the same as other regular expression libraries.

A good resource to understand regular expressions can be found here.

If you want to match a character that is part of the regular expression syntax, you must escape it with a backslash. For example, to match a literal ? you need to write \? because ? on its own has a special meaning.
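
Here is a short Python sketch of why the escape matters, using a made-up URL. In both Python and PCRE, an unescaped ? makes the preceding item optional, so the pattern quietly means something else.

```python
import re

url = "/article/?amp=12"

# Unescaped: "?" makes the "/" before it optional, so no literal "?" is matched.
unescaped = re.search(r"/article/?amp", url)
# Escaped: "\?" matches a literal question mark.
escaped = re.search(r"/article/\?amp", url)

print(unescaped is None)
print(escaped.group(0))

# Python can also escape special characters for you.
print(re.escape("?"))
```

The unescaped pattern would instead match a URL such as /articleamp, which is almost never what you want.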

Matching the whole URL

Redirection performs a regular expression replace – preg_replace. The redirect source is the match pattern, the target is the replacement pattern, and the requested URL is the subject.

This means that only matched content will be replaced. If you do not match the whole URL, then only the part you do match will be replaced, and the rest of the URL will be kept as it is. This is expected behavior and is how regular expressions work.

Note that the term ‘whole URL’ refers to everything except the protocol and domain in this context.
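
The effect of a partial match is easy to see in a sketch. Python's re.sub is used here as a stand-in for preg_replace, and the URLs and patterns are invented for the example.

```python
import re

requested = "/old/page/extra"

# Partial match: only "/old" is replaced, "/page/extra" is left in place.
partial = re.sub(r"/old", "/new", requested)
# Whole-URL match: ^ and $ with .* cover everything, so the whole URL is replaced.
whole = re.sub(r"^/old/.*$", "/new", requested)

print(partial)
print(whole)
```

If a redirect sends visitors to an unexpected URL with leftover path segments, an incomplete match like the first one is a likely cause.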

Common Regular Expressions

The following regular expressions are commonly used for WordPress. As mentioned above, you will need to enable the regex option.

Note that regular expressions cannot generate new information. For example, you cannot generate a category from the URL if it does not already exist there.

Redirect date and name permalink
/2020/01/02/post-name/ ⇒ /post-name/
You may find it easier to use a Permalink Migration.

Source
^/\d{4}/\d{2}/\d{2}/(.*)

Target
/$1

Redirect date, name, and category permalink
/2020/01/02/category/post-name/ ⇒ /post-name/
You may find it easier to use a Permalink Migration.

Source
^/\d{4}/\d{2}/\d{2}/.*?/(.*)

Target
/$1
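
Both date patterns can be checked with a quick Python sketch before you add them to the plugin. The sample paths are my own, and \1 replaces $1 because of Python's replacement syntax. Here \d{4} means exactly four digits, and the lazy .*?/ skips a single path segment (the category).

```python
import re

date_name = r"^/\d{4}/\d{2}/\d{2}/(.*)"
date_category_name = r"^/\d{4}/\d{2}/\d{2}/.*?/(.*)"

# Date and name permalink: the date is dropped, the post name is kept.
a = re.sub(date_name, r"/\1", "/2020/01/02/post-name/")
# Date, category, and name permalink: the category segment is dropped too.
b = re.sub(date_category_name, r"/\1", "/2020/01/02/category/post-name/")
# The digit counts must match exactly, so this path is left untouched.
c = re.sub(date_name, r"/\1", "/20/1/2/post-name/")

print(a, b, c)
```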

Redirect all URLs to /blog/ except for ones that start with /blog/
/old-post-url/ ⇒ /blog/old-post-url/

Source
^/(?!blog)(.*)

Target
/blog/$1
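
The (?!blog) part is a negative lookahead: it lets the match continue only when the text at that point is not blog. A small Python sketch with invented paths shows what that means in practice, again with \1 standing in for $1.

```python
import re

source = r"^/(?!blog)(.*)"
target = r"/blog/\1"

moved = re.sub(source, target, "/old-post-url/")        # does not start with /blog
kept = re.sub(source, target, "/blog/old-post-url/")    # already under /blog/
also_kept = re.sub(source, target, "/blogging-tips/")   # starts with "blog" as well

print(moved, kept, also_kept)
```

As the last case shows, the pattern skips every path that begins with /blog, not only those under /blog/, so a page such as /blogging-tips/ would not be redirected.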

Redirect every page on the old site to the new site
http://oldsite.com/your-thing/ ⇒ https://newsite.com/your-thing/
You may find it easier to use a Site Relocate.

Source
/(.*)

Target
https://newsite.com/$1

Redirect .html pages in a directory
http://oldsite.com/blog/post-name.html ⇒ https://newsite.com/post-name/

Source
/blog/(.*?)\.html

Target
/$1/

Remove index.php from a URL
http://site.com/index.php/post-name ⇒ http://site.com/post-name/

Source
/index\.php/(.*)

Target
/$1/

Remove .html from Blogger pages
http://site.com/2019/01/02/something.html ⇒ http://site.com/2019/01/02/something/

Source
^/(.*?)\.html$

Target
/$1/

Redirect AMP pages
http://site.com/article/amp/ ⇒ http://site.com/article/

Source
^/(.*?)/amp/$

Target
/$1/

Redirect ?amp pages
http://site.com/article/?amp=12 ⇒ http://site.com/article/

Source
^/(.*?)/\?amp=.*

Target
/$1/
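
Before adding patterns like these to a live site, it can help to run them against a few sample paths. The sketch below does that in Python for the last five patterns. The sample paths and expected results are my own assumptions, only the path part of each URL is tested, and Python's re module is not identical to PHP's PCRE, so treat it as a sanity check and not a guarantee.

```python
import re

# (source, target with \1 in place of $1, requested path, expected result)
cases = [
    (r"/blog/(.*?)\.html", r"/\1/", "/blog/post-name.html", "/post-name/"),
    (r"/index\.php/(.*)", r"/\1/", "/index.php/post-name", "/post-name/"),
    (r"^/(.*?)\.html$", r"/\1/", "/2019/01/02/something.html", "/2019/01/02/something/"),
    (r"^/(.*?)/amp/$", r"/\1/", "/article/amp/", "/article/"),
    (r"^/(.*?)/\?amp=.*", r"/\1/", "/article/?amp=12", "/article/"),
]

results = [re.sub(source, target, path) for source, target, path, _ in cases]

for (source, _, path, expected), actual in zip(cases, results):
    status = "ok" if actual == expected else "MISMATCH"
    print(f"{status}: {path} => {actual}")
```

Adding your own real URLs to the cases list is a cheap way to catch a pattern that matches too much or too little.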

I hope you found this post useful and informative. Please email me with any questions you might have about regular expressions.