Ian M Sutherland

PHP, replace, and regex

In an effort to update this site's code, I realized it would be a good opportunity to re-create the codebase using object-oriented programming, rather than the function-based (procedural) style it currently uses.

In the meantime, I plan to update some of the features of the code that manages the display of these posts. In a previous site meant for posting quotes from an IRC channel (à la bash.org), I wrote a function that took the various tags and formats users would submit and, using regular expressions, picked out and color-coded each line of text (or even chat action) accordingly. It then tried to identify usernames and provide a way to automatically search for other posts by that user. In most IRC clients, usernames are surrounded by parentheses. Some users have elevated access to the channel (such as admins, or 'voice' users who can chat while other users are muted), which is designated by either a @, +, or %.

The fun part of regular expressions is getting them to finally work. It's honestly funny looking back on previously written code; it's almost like trying to decipher Egyptian hieroglyphics. As an example:



For regular expressions, each grouping of parentheses captures a set of characters that can be used later as a variable. I've numbered these to make it easier to follow.
The first slash signifies the start of the entire regular expression. You see the same slash at the end of the regular expression as well.
The first group identifies the opening parenthesis. Take note of that first \, as I'll come back to it shortly.
The second group looks for either a @, +, or % symbol, but the question mark at the end means that matching one of these symbols is optional.
Group 3 is any letter of the alphabet, both lower and uppercase, any digit, or an underscore. The plus at the end here means to match one or more of any of these characters.
Finally, group 4 is where we match the closing parenthesis.
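
To make the four groups concrete, here is a minimal sketch of a pattern rebuilt purely from the description above. It is an assumption about what the expression looks like, not the site's original code, and the sample line is made up for illustration.

```php
<?php
// Pattern reconstructed from the prose description (an assumption, not the original).
// Group 1: literal "("   Group 2: optional @, + or %
// Group 3: one or more letters, digits or underscores   Group 4: literal ")"
$pattern = '/(\()([@+%]?)([a-zA-Z0-9_]+)(\))/';

// Hypothetical sample line.
$line = '(@ian_s) hello everyone';

if (preg_match($pattern, $line, $m)) {
    // $m[0] holds the whole match; $m[1] through $m[4] hold the four groups.
    printf("open=%s prefix=%s name=%s close=%s\n", $m[1], $m[2], $m[3], $m[4]);
}
```

With this sample line, group 2 captures the @ and group 3 captures ian_s. If the line had no prefix symbol, group 2 would simply be an empty string.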

This regex is the first parameter of PHP's preg_replace function. Whenever a piece of text matches all of these criteria, PHP replaces the entire matched text with the text in the next parameter, substituting each instance of $1, $2, $3 and $4 with the corresponding captured text. In this case, that separates the username from the rest of the text and creates a link that searches the database for other posts by that username. The \'s that you see here are escape characters: they keep the quotes from being read as the end of the string parameter, and keep the literal parentheses from being read as regex grouping syntax.
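
The replacement step might look roughly like the sketch below. The function name linkUsernames, the search.php URL and its user query parameter are all hypothetical stand-ins, since the original replacement string is not shown here.

```php
<?php
// Hypothetical helper: wraps the username in each "(nick)" tag in a search link.
function linkUsernames(string $line): string
{
    // Pattern reconstructed from the description above (an assumption).
    $pattern = '/(\()([@+%]?)([a-zA-Z0-9_]+)(\))/';

    // $1..$4 refer to the captured groups. The \' sequences are escaped quotes,
    // so PHP does not treat them as the end of the single-quoted string.
    // search.php and its "user" parameter are placeholders.
    $replacement = '$1$2<a href=\'search.php?user=$3\'>$3</a>$4';

    return preg_replace($pattern, $replacement, $line);
}

echo linkUsernames('(+nick_42) anyone around?'), "\n";
```

Only the matched "(+nick_42)" portion is rewritten; the rest of the line passes through untouched, and a line with no match comes back unchanged.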

A similar script is used a few times to match usernames as they appear in various IRC interfaces, such as a username followed by a colon.
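
As an illustration of the colon variant, one possible pattern is sketched below. This is a guess at the shape of such a script, not the site's code: the function name linkColonUsernames, the anchoring to the start of each line, and the search.php URL are all assumptions.

```php
<?php
// Hypothetical helper for lines formatted like "nick: message".
function linkColonUsernames(string $text): string
{
    // ^ with the m flag anchors the match to the start of every line.
    // Group 1: optional @, + or %   Group 2: the username   Group 3: the colon
    $pattern = '/^([@+%]?)([a-zA-Z0-9_]+)(:)/m';

    // search.php and its "user" parameter are placeholders.
    $replacement = '$1<a href=\'search.php?user=$2\'>$2</a>$3';

    return preg_replace($pattern, $replacement, $text);
}

echo linkColonUsernames("@ian_s: hello\nbob: hi there"), "\n";
```

A pattern this loose would also match a line that begins with something like "http:", so a real version would likely need extra checks.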

You can see the current state of this site as I update it at ianmsutherland.com/suprbash