



Lesson 17 - Regular Expressions
Written by Administrator   
Thursday, 15 February 2007 17:52

Regular expressions are one of the trickiest things to learn. They have a lot of components, but at the same time they are very powerful. A regular expression is a pattern that lets you match an arbitrary string, dissect it, and check it for validity. It uses a set of special characters to describe the strings it should match. If you have used DOS or the Unix shell, "dir *.txt" (in DOS) and "ls *.txt" (in Unix) use a related but much simpler idea, wildcard patterns: they ask the dir/ls commands to return only names that end with ".txt", with anything at all before it. The regular expression equivalent would be something like ^.*\.txt$.

Why would you want to use regular expressions in your scripts? The biggest reason is to validate what a user types into the fields of an HTML form and submits to your PHP script. I won't go into all the risks of unchecked input here, but as an example, if you had an HTML field called "age", you would expect the user to enter only a number. If the user enters anything other than digits, you don't want that information to go into your database. You can use a regular expression to validate the "age" field, and if the user types in something bad, you can warn them.
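Here is a minimal sketch of the "age" check, assuming an age is one to three digits and nothing else. The function name isValidAge() is hypothetical (it is not part of PHP), and in a real script the value would come from the submitted form, for example $_POST['age'].

```php
<?php
// Hypothetical helper: true only if the whole input is 1 to 3 digits.
function isValidAge(string $input): bool
{
    // ^ and $ anchor the pattern, so nothing may come before or after the digits.
    // preg_match() returns 1 on a match, so compare strictly with 1.
    return preg_match('/^\d{1,3}$/', $input) === 1;
}

var_dump(isValidAge('24'));    // bool(true)
var_dump(isValidAge('24abc')); // bool(false)
var_dump(isValidAge(''));      // bool(false)
```

If the check fails, the script can show the form again with a warning instead of writing the value to the database.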

The six basic patterns used in regular expressions are:

Pattern: a*
Matches: '', 'a', 'aa', ...
Explanation: match "a" zero or more times

Pattern: b+
Matches: 'b', 'bb', ...
Explanation: match "b" one or more times

Pattern: ab?c
Matches: 'ac', 'abc'
Explanation: match "a" followed by "b" optionally and then "c"

Pattern: [abc]
Matches: 'a' or 'b' or 'c'
Explanation: match "a" or "b" or "c" once

Pattern: [a-c]
Matches: 'a' or 'b' or 'c'
Explanation: Abbreviation for the above

Pattern: [abc]*
Matches: '', 'accb', ...
Explanation: Combination of "one from a set" and "zero or more"; match "a", "b", or "c", in any order, zero or more times
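To try the six patterns above yourself, the sketch below runs each one against a few sample strings. Each pattern is wrapped in "^" and "$" (explained in the next paragraph) so that the whole string has to match; the extra non-matching samples such as 'abbc' are illustrative additions.

```php
<?php
// Each pattern from the list, anchored so the WHOLE string must match.
$cases = [
    '/^a*$/'     => ['', 'a', 'aa', 'b'],
    '/^b+$/'     => ['', 'b', 'bb'],
    '/^ab?c$/'   => ['ac', 'abc', 'abbc'],
    '/^[abc]$/'  => ['a', 'c', 'ab'],
    '/^[a-c]$/'  => ['b', 'd'],
    '/^[abc]*$/' => ['', 'accb', 'abcd'],
];

foreach ($cases as $pattern => $subjects) {
    foreach ($subjects as $subject) {
        // preg_match() returns 1 for a match and 0 for no match.
        printf("%-11s %-6s %d\n", $pattern, "'$subject'", preg_match($pattern, $subject));
    }
}
```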

The "^" character checks whether a match starts at the beginning of the string, and the "$" character checks whether it finishes at the end of the string. The "|" character is the "or" separator. Unlike square brackets, which choose between single characters, the "|" character separates whole regular expressions, NOT characters. Parentheses (round brackets) are used to group regular expressions. Curly brackets match the preceding item a set number of times (or a minimum/maximum number of times). I know this is a lot to take in, but plenty of examples of each of these characters follow below.

There are also a few escape sequences that stand for common characters or sets of characters. Those are:

\t -> Tab
\n -> Newline
\r -> Carriage Return
\* -> Asterisk
\\ -> Backslash
\d -> Digits [0-9]
\w -> Word [a-zA-Z0-9_] (letters, numbers, and the underscore)
\s -> Whitespace (a space, a tab, a carriage return, a newline, or another whitespace character such as a form feed)
. -> Anything except end-of-line [^\n] (literally any character that isn't a newline)
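A short sketch of these escape sequences in use. The patterns are written in single-quoted PHP strings so the backslashes reach preg_match() untouched, while the subject uses double quotes so that "\t" becomes a real tab. Note that "\s" matches both the tab and the ordinary space in the subject. The sample strings are made up for illustration.

```php
<?php
$subject = "Order\t42 items"; // double quotes: \t here is a real tab character

var_dump(preg_match('/\t/', $subject));              // int(1): the subject contains a tab
var_dump(preg_match('/\d\d/', $subject));            // int(1): "42" is two digits in a row
var_dump(preg_match('/^\w+\s\d+\s\w+$/', $subject)); // int(1): word, tab, digits, space, word
var_dump(preg_match('/\*/', $subject));              // int(0): there is no literal asterisk
var_dump(preg_match('/^.+$/', "line1\nline2"));      // int(0): "." does not match the newline
```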

The PHP function for matching a string against a regular expression is preg_match(). It uses the PCRE (Perl Compatible Regular Expressions) library, so its patterns follow Perl's regular expression syntax. In its simplest form, the function takes the following two parameters:

int preg_match (string pattern, string subject)

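The "pattern" must be enclosed in delimiters. The "/" character is the usual choice, but any non-alphanumeric, non-backslash, non-whitespace character (such as "#") also works, and optional modifier letters may follow the closing delimiter. The slashes come from Perl, where a regular expression is written directly between slashes, as in m/pattern/ or simply /pattern/, instead of being passed to a function. The function returns 1 if the "pattern" matched something in the "subject", 0 if it did not, and false if an error occurred (for example, an invalid pattern).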
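The sketch below shows the delimiter rules in practice: an escaped slash inside a "/"-delimited pattern, the same pattern with "#" as the delimiter, and a modifier ("i", case-insensitive) after the closing delimiter. It compares the result with === 1 so that an error (false) is never mistaken for a successful match. The paths and strings are illustrative.

```php
<?php
// With "/" as the delimiter, slashes inside the pattern must be escaped.
var_dump(preg_match('/^\/home\//', '/home/user')); // int(1)

// With "#" as the delimiter, the slashes need no escaping.
var_dump(preg_match('#^/home/#', '/home/user'));   // int(1)

// Modifiers go after the closing delimiter: "i" ignores letter case.
var_dump(preg_match('/^hello/i', 'HELLO world'));  // int(1)

// Strict comparison: preg_match() returns false (not 0) on an error.
if (preg_match('/^\d+$/', '2007') === 1) {
    echo "all digits\n";
}
```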

Here are a few simple examples:

Code:
echo preg_match("/a/", "a"); //matches "a"
echo preg_match("/b/", "a"); //doesn't match, needs a "b"
echo preg_match("/a+/",""); //doesn't match, needs to have at least 1 "a"
echo preg_match("/a+/","a"); //matches, at least one "a"
echo preg_match("/a+/","aaaaaa"); //matches, at least one "a"
echo preg_match("/a*/",""); //matches, 0 or more "a"'s
echo preg_match("/a*/","aaaaaaaaaa"); //matches, 0 or more "a"'s
echo preg_match("/[xyz]/","x"); //matches, there is an "x"
echo preg_match("/[xyz]/","y"); //matches, there is a "y"
echo preg_match("/[xyz]/","z"); //matches, there is a "z"
echo preg_match("/[xyz]/","a"); //doesn't match, there is no "x", "y", or "z"
echo preg_match("/[a-z]/","q"); //matches, "q" is in the range from "a" to "z"
echo preg_match("/[0-9]/","5"); //matches, "5" is in the range from "0" to "9"
echo preg_match("/[0-9]/","s"); //doesn't match, "s" is not in the range from "0" to "9"
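One thing worth noticing in these examples: without anchors, preg_match() reports a match if the pattern matches anywhere in the subject, not only when the whole subject fits. That matters for form validation, as this sketch with made-up subjects shows.

```php
<?php
// Unanchored: a single digit anywhere in the subject is enough.
var_dump(preg_match('/[0-9]/', 'room 5'));    // int(1)
var_dump(preg_match('/[0-9]/', 'no digits')); // int(0)

// Anchored at both ends: the whole subject must be digits.
var_dump(preg_match('/^[0-9]+$/', 'room 5')); // int(0)
var_dump(preg_match('/^[0-9]+$/', '5'));      // int(1)
```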

examples of the "|" character:

Code:
//note that | does not apply only to the single character before or after it;
//it splits the whole expression into everything before it and everything
//after it, unless you group the alternatives with parentheses

//not grouped
echo preg_match("/ab|cd/","ab"); //matches
echo preg_match("/ab|cd/","cd"); //matches

//grouped
echo preg_match("/a(b|c)d/","abd"); //matches
echo preg_match("/a(b|c)d/","acd"); //matches
echo preg_match("/a(b|c)d/","ad"); //doesn't match
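Because "|" splits the entire expression, anchors do not automatically apply to both sides of it. In this sketch '/^ab|cd$/' means "starts with ab, or ends with cd", while the grouped version requires the whole string to be exactly "ab" or "cd". The subjects are illustrative.

```php
<?php
// Ungrouped: read as (^ab) OR (cd$).
var_dump(preg_match('/^ab|cd$/', 'abzzz'));   // int(1): starts with "ab"
var_dump(preg_match('/^ab|cd$/', 'zzzcd'));   // int(1): ends with "cd"

// Grouped: the anchors apply to both alternatives.
var_dump(preg_match('/^(ab|cd)$/', 'abzzz')); // int(0)
var_dump(preg_match('/^(ab|cd)$/', 'cd'));    // int(1)
```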

examples of the "*" character:

Code:
echo preg_match("/ab*/","abbbb"); //matches
echo preg_match("/ab*/","bbbbb"); //fails

examples of the "+" character:

Code:
echo preg_match("/a+b/","aaaab"); //matches
echo preg_match("/a+b/","b"); //fails
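The difference between "*" and "+" is easiest to see with anchored patterns, where the whole string has to match. This sketch also shows that a quantifier applies only to the item directly before it, so repeating "ab" as a unit needs parentheses.

```php
<?php
// "*" allows zero repetitions, "+" needs at least one.
var_dump(preg_match('/^ab*$/', 'a'));        // int(1): "a" followed by zero "b"s
var_dump(preg_match('/^ab+$/', 'a'));        // int(0): needs at least one "b"
var_dump(preg_match('/^ab+$/', 'abbb'));     // int(1)

// The quantifier repeats only "b"; a group repeats "ab" as a whole.
var_dump(preg_match('/^ab+$/', 'ababab'));   // int(0)
var_dump(preg_match('/^(ab)+$/', 'ababab')); // int(1)
```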

examples with "\w" character:

Code:
echo preg_match("/\w+/","abc"); //matches
echo preg_match("/\w+/","a_b_c"); //matches
echo preg_match("/\w+/","0123456789"); //matches
echo preg_match("/\w+/","-"); //fails, "-" is not a part of \w
echo preg_match("/\w+/"," "); //fails, space is not a part of \w
echo preg_match("/\w+/",""); //fails, has to have at least one \w

examples with "?" character:

Code:
echo preg_match("/a?b?c?/","a"); //matches
echo preg_match("/a?b?c?/","b"); //matches
echo preg_match("/a?b?c?/","c"); //matches
echo preg_match("/a?b?c?/","abc"); //matches
echo preg_match("/a?b?c?/","ab"); //matches
echo preg_match("/a?b?c?/","bc"); //matches
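Be careful with the "?" examples above: every part of "/a?b?c?/" is optional, so the pattern can match an empty string and will report a match for any subject, even one with no "a", "b", or "c" at all. Anchoring it shows what the pattern really accepts (the extra subjects are illustrative):

```php
<?php
// Unanchored, the empty match at position 0 counts as a match.
var_dump(preg_match('/a?b?c?/', 'xyz'));   // int(1)

// Anchored, the optional letters must make up the whole string, in order.
var_dump(preg_match('/^a?b?c?$/', 'ac'));  // int(1)
var_dump(preg_match('/^a?b?c?$/', 'ca'));  // int(0): wrong order
var_dump(preg_match('/^a?b?c?$/', 'xyz')); // int(0)
var_dump(preg_match('/^a?b?c?$/', ''));    // int(1): all three skipped
```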

examples with "^" and "$" characters:

Code:
echo preg_match("/^im/","image"); //matches
echo preg_match("/^im/","imagine"); //matches
echo preg_match("/^im/","embrace"); //doesn't match
echo preg_match("/er$/","programmer"); //matches
echo preg_match("/er$/","designer"); //matches
echo preg_match("/er$/","designing"); //doesn't match
echo preg_match("/^(ab|cd)$/","ab"); //matches
echo preg_match("/^(ab|cd)$/","cd"); //matches
echo preg_match("/^(ab|cd)$/","abcd"); //doesn't match
echo preg_match("/^(ab|cd)$/","xy"); //doesn't match
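One subtlety with "$" when validating input: by default PCRE lets "$" match just before a newline at the very end of the subject, so a value like "42" followed by a newline passes '/^\d+$/'. This sketch shows the D modifier and the \z escape, both of which only match at the absolute end of the subject.

```php
<?php
// Default: "$" also matches right before a final newline.
var_dump(preg_match('/^\d+$/', "42\n"));  // int(1)

// The D modifier makes "$" match only at the very end.
var_dump(preg_match('/^\d+$/D', "42\n")); // int(0)

// \z always means the absolute end of the subject.
var_dump(preg_match('/^\d+\z/', "42\n")); // int(0)
var_dump(preg_match('/^\d+\z/', "42"));   // int(1)
```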

examples with curly brackets:

Code:
echo preg_match("/a{2}/","aaa"); //matches, found "aa" somewhere
echo preg_match("/^a{2}$/","aa"); //matches, entire string is "aa"
echo preg_match("/^a{2}$/","aaa"); //doesn't match, entire string isn't "aa"
echo preg_match("/a{2,4}/","aaa"); //matches, minimum "aa", maximum "aaaa"
echo preg_match("/a{2,4}/","aaaa"); //matches
echo preg_match("/a{2,4}/","a"); //doesn't match
echo preg_match("/^a{2,4}$/","aabaa"); //doesn't match
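Curly brackets come in three forms: {n} for exactly n times, {n,m} for between n and m times, and {n,} for at least n times (the "minimum" case mentioned earlier). A quick sketch of each, with a hypothetical five-digit code as the last example:

```php
<?php
var_dump(preg_match('/^a{3}$/', 'aaa'));     // int(1): exactly three
var_dump(preg_match('/^a{2,}$/', 'aaaaaa')); // int(1): six is "at least two"
var_dump(preg_match('/^a{2,}$/', 'a'));      // int(0)

// Braces also work on escapes such as \d (hypothetical 5-digit code).
var_dump(preg_match('/^\d{5}$/', '90210'));  // int(1)
var_dump(preg_match('/^\d{5}$/', '9021'));   // int(0)
```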

a few common regular expressions (these are by no means secure or complete... JUST simple):

Code:
echo preg_match("/^[-.\w]+\@[-.\w]+$/","user@example.com"); //email addresses (placeholder address)
echo preg_match("/^\d{2}$/","24"); //ages (exactly two digits)
echo preg_match("/^(19|20)\d\d$/","1983"); //years (1900 to 2099)
echo preg_match("/^([\w\s]+)$/","hello there"); //a simple string
echo preg_match("/^(http:\/\/www\.|http:\/\/|www\.)([\w\.\/\=\?\&\-]+)$/","http://www.google.com"); //urls
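To see why these patterns are "JUST simple", this sketch runs three of them against inputs they handle badly. The array keys are illustrative labels; the patterns are copied from the list above.

```php
<?php
$patterns = [
    'email' => '/^[-.\w]+\@[-.\w]+$/',
    'age'   => '/^\d{2}$/',
    'year'  => '/^(19|20)\d\d$/',
];

var_dump(preg_match($patterns['email'], '-@.')); // int(1): accepted, though not a real address
var_dump(preg_match($patterns['age'], '7'));     // int(0): a 7-year-old is rejected
var_dump(preg_match($patterns['year'], '2100')); // int(0): only 1900-2099 pass
```

Before relying on any of them, decide which inputs you must accept and which you must reject, and test the pattern against both.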
