Email Address Validation
Email addresses have a local-part and a domain separated by an (unquoted) “@” symbol. The local-part must be either a dot-atom or a quoted string, and the domain must be either a domain name or a domain literal.
A dot-atom can only contain letters, numbers, dots, and the following characters: ! # $ % & ' * + - / = ? ^ _ ` { | } ~. However, neither the first nor the last character can be a dot, and two or more consecutive dots are not allowed. The maximum length of a dot-atom local-part is 64 characters. A regular expression to match a dot-atom local-part would be as follows:
// Dot-atom
/^(?!.{65,})([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*$/iD
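To see the dot-atom expression at work, it can be wrapped in a small PHP helper. This is a minimal sketch: isDotAtom() is a hypothetical name. The only change to the pattern is escaping the apostrophe so it fits in a single-quoted PHP string.

```php
<?php
// Hypothetical helper wrapping the dot-atom pattern above.
// Inside single quotes, \' yields a literal apostrophe in the pattern.
function isDotAtom(string $localPart): bool
{
    // (?!.{65,}) caps the length at 64 characters; (?1) re-applies the atom group after each dot.
    return preg_match('/^(?!.{65,})([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*$/iD', $localPart) === 1;
}

var_dump(isDotAtom('john.smith'));        // bool(true)
var_dump(isDotAtom('.john'));             // bool(false): leading dot
var_dump(isDotAtom('john..smith'));       // bool(false): consecutive dots
var_dump(isDotAtom(str_repeat('a', 65))); // bool(false): longer than 64 characters
```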
A quoted string can only contain printable US-ASCII characters or the space character, all enclosed in double quotes. Double quotes and backslashes are allowed only as part of a quoted-pair (that is, escaped with a backslash). A quoted string may be empty. The maximum length of a quoted string is 64 characters, not counting the enclosing double quotes or the escaping backslash of a quoted-pair. A regular expression to match a quoted-string local-part would be as follows:
// Quoted string
/^"(?>[ !#-\[\]-~]|\\[ -~]){0,64}"$/iD
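The quoted-string expression can be tested with a similar sketch. isQuotedString() is a hypothetical helper. Each quoted-pair such as \" counts as one repetition of the group, and this is how the 64-character limit leaves out the escaping backslashes.

```php
<?php
// Hypothetical helper for the quoted-string pattern; {0,64} allows an empty string
// and at most 64 characters, with each quoted-pair counted once.
function isQuotedString(string $localPart): bool
{
    // In single quotes, '\\\\' becomes \\ in the pattern, i.e. one literal backslash.
    return preg_match('/^"(?>[ !#-\[\]-~]|\\\\[ -~]){0,64}"$/iD', $localPart) === 1;
}

var_dump(isQuotedString('""'));           // bool(true): empty quoted string
var_dump(isQuotedString('"John Smith"')); // bool(true)
var_dump(isQuotedString('"a\"b"'));       // bool(true): escaped double quote
var_dump(isQuotedString('"a"b"'));        // bool(false): unescaped double quote
```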
A domain name consists of 1 to 127 labels (the 128th label being the empty root domain), separated by dots, each containing any combination of letters, numbers, or hyphens. However, neither the first nor the last character of a label can be a hyphen. The maximum lengths of a domain name and of a label are 253 and 63 characters respectively. A regular expression to match a domain name would be as follows:
// Domain name
/^(?!.{254,})(?!.*[^.]{64,})([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?1)){0,126}$/iD
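The domain-name expression can be tried in the same way. isDomainName() is a hypothetical helper. The two lookaheads enforce the 253-character and 63-character limits before the label structure is checked.

```php
<?php
// Hypothetical helper for the domain-name pattern above.
function isDomainName(string $domain): bool
{
    // First lookahead: at most 253 characters; second: no label longer than 63 characters.
    return preg_match('/^(?!.{254,})(?!.*[^.]{64,})([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?1)){0,126}$/iD', $domain) === 1;
}

var_dump(isDomainName('mail.Example.com')); // bool(true)
var_dump(isDomainName('localhost'));        // bool(true): a single label is accepted
var_dump(isDomainName('example-.com'));     // bool(false): label ends with a hyphen
```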
A domain literal is one of an IPv4 address, an IPv6 address, or an IPv4-mapped IPv6 address.
An IPv4 address consists of four groups, separated by dots, each containing a decimal value between 0 and 255. A regular expression to match an IPv4 address would be as follows:
// IPv4 Address
/^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?1)){3}$/D
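The IPv4 expression reuses its octet group through the subroutine call (?1). In the following sketch, isIPv4() is a hypothetical name. A value with a leading zero, such as 01, is rejected because each octet must match one of the four alternatives exactly.

```php
<?php
// Hypothetical helper for the IPv4 pattern; (?1) re-applies the octet group three times.
function isIPv4(string $address): bool
{
    return preg_match('/^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?1)){3}$/D', $address) === 1;
}

var_dump(isIPv4('192.168.0.1')); // bool(true)
var_dump(isIPv4('256.0.0.1'));   // bool(false): octet out of range
var_dump(isIPv4('01.2.3.4'));    // bool(false): leading zero
```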
An IPv6 address consists of eight groups, separated by colons, each containing a hexadecimal value between 0 and FFFF. One or more consecutive groups with a value of 0 can be replaced by a double colon, but only once per address. A regular expression to match an IPv6 address would be as follows:
// IPv6 Address
/^(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)$/iD
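In the IPv6 expression, the first alternative handles the full eight-group form. The second handles the compressed form, and its lookahead limits how many groups may appear. The sketch below uses a hypothetical helper named isIPv6():

```php
<?php
// Hypothetical helper for the IPv6 pattern: full form first, then the "::" compressed form.
function isIPv6(string $address): bool
{
    return preg_match('/^(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)$/iD', $address) === 1;
}

var_dump(isIPv6('2001:db8::1'));       // bool(true)
var_dump(isIPv6('::'));                // bool(true): all groups compressed
var_dump(isIPv6('1::2::3'));           // bool(false): "::" used twice
var_dump(isIPv6('1:2:3:4:5:6:7:8:9')); // bool(false): nine groups
```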
An IPv4-mapped IPv6 address is an IPv6 address whose final two groups are written as an IPv4 address. Again, one or more consecutive groups with a value of 0 can be replaced by a double colon, but only once. A regular expression to match an IPv4-mapped IPv6 address would be as follows:
// IPv4-mapped IPv6 Address
/^(?>([a-f0-9]{1,4})(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?2)?::(?>((?1)(?>:(?1)){0,4}):)?)(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?3)){3}$/iD
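The following sketch tests IPv4-mapped addresses with a hypothetical helper named isIPv4MappedIPv6(). In this version, a parenthesis closes the two IPv6 prefix alternatives before the IPv4 part begins. As a result, both the six-group prefix and the compressed prefix must be followed by a dotted-quad address.

```php
<?php
// Hypothetical helper: a six-group prefix or a "::" prefix, in both cases followed by an IPv4 address.
function isIPv4MappedIPv6(string $address): bool
{
    return preg_match('/^(?>([a-f0-9]{1,4})(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?2)?::(?>((?1)(?>:(?1)){0,4}):)?)(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?3)){3}$/iD', $address) === 1;
}

var_dump(isIPv4MappedIPv6('::ffff:192.0.2.128'));    // bool(true)
var_dump(isIPv4MappedIPv6('1:2:3:4:5:6:1.2.3.4'));   // bool(true): full six-group prefix
var_dump(isIPv4MappedIPv6('1:2:3:4:5:6:7:1.2.3.4')); // bool(false): too many groups
```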
When used as a domain literal in an email address, the IP address must be enclosed in square brackets, and IPv6 or IPv4-mapped IPv6 addresses must be preceded by (unquoted) “IPv6:”. A regular expression to match a domain literal would be as follows:
// Domain literal
/^\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?1)(?>:(?1)){0,6})?::(?2)?))|(?>(?>IPv6:(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))\]$/iD
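The domain-literal expression can be exercised with bracketed addresses. isDomainLiteral() is a hypothetical name. An IPv6 literal without the IPv6: prefix is rejected, as is an address without brackets.

```php
<?php
// Hypothetical helper for the domain-literal pattern above (pattern copied unchanged).
function isDomainLiteral(string $literal): bool
{
    return preg_match('/^\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?1)(?>:(?1)){0,6})?::(?2)?))|(?>(?>IPv6:(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))\]$/iD', $literal) === 1;
}

var_dump(isDomainLiteral('[192.168.0.1]'));      // bool(true)
var_dump(isDomainLiteral('[IPv6:2001:db8::1]')); // bool(true)
var_dump(isDomainLiteral('[2001:db8::1]'));      // bool(false): missing "IPv6:" prefix
```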
By combining these regexes, separating the local-part from the domain with an (unquoted) “@” symbol, and limiting the entire email address to 254 characters, we are left with the following expression, which matches every valid RFC 5321 email address:
// Email address
/^(?!(?>"?(?>[^"\\]|\\[ -~])"?){255,})(?!"?(?>[^"\\]|\\[ -~]){65,}"?@)(?>([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD
We can then create a function that returns the result of a (case-insensitive) preg_match on the above regular expression, that is, 1 for a match and 0 otherwise:
function isValid5321($emailAddress)
{
return preg_match('/^(?!(?>"?(?>[^"\\\]|\\\[ -~])"?){255,})(?!"?(?>[^"\\\]|\\\[ -~]){65,}"?@)(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD', $emailAddress);
}
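The following is a short usage sketch for isValid5321(). The function is repeated unchanged so that the snippet runs on its own, and the sample addresses are only illustrative. preg_match() returns 1 for a match and 0 otherwise, and it returns false only on an internal error.

```php
<?php
// isValid5321() exactly as defined above, repeated so this snippet is self-contained.
function isValid5321($emailAddress)
{
    return preg_match('/^(?!(?>"?(?>[^"\\\]|\\\[ -~])"?){255,})(?!"?(?>[^"\\\]|\\\[ -~]){65,}"?@)(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD', $emailAddress);
}

var_dump(isValid5321('john.smith@example.com'));   // int(1)
var_dump(isValid5321('"john smith"@example.com')); // int(1): quoted-string local-part
var_dump(isValid5321('user@[IPv6:2001:db8::1]'));  // int(1): domain literal
var_dump(isValid5321('john..smith@example.com'));  // int(0): consecutive dots
```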
An obsolete version of the local-part is also possible, allowing for a mixture of atoms and quoted strings, separated by dots. An obsolete quoted string allows any US-ASCII character when part of a quoted-pair, and any US-ASCII character except the null, horizontal tab, new line, carriage return, backslash, and double quote characters when not. An obsolete local-part may only be empty if it is a single quoted string. The maximum length of an obsolete local-part, not including the double quotes enclosing a quoted string or the escaping backslash of a quoted-pair, is 64 characters. A regular expression to match an obsolete local-part would be as follows:
// Obsolete local-part
/^(?!"?(?>[^"\\]|\\[ -~]){65,}"?@)([!#-'*+\/-9=?^-~-]+|"(?>(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\[\x00-\x7F]))*")(?>\.(?1))*$/iD
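The obsolete local-part structure can be checked in isolation. The following sketch uses a hypothetical helper named isObsoleteLocalPart(). It keeps only the structural part of the expression above and leaves out the 64-character lookahead, because that lookahead only takes effect when an "@" follows.

```php
<?php
// Hypothetical helper: the structural part of the obsolete local-part pattern only.
function isObsoleteLocalPart(string $localPart): bool
{
    $word = '[!#-\'*+\/-9=?^-~-]+'                                       // an atom
          . '|"(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\\[\x00-\x7F])*"'; // or an obsolete quoted string
    // Group 1 is one word; (?1) re-applies it after each dot.
    return preg_match('/^(' . $word . ')(?>\.(?1))*$/iD', $localPart) === 1;
}

var_dump(isObsoleteLocalPart('"john".smith'));     // bool(true): quoted string mixed with an atom
var_dump(isObsoleteLocalPart('john."smith".doe')); // bool(true)
var_dump(isObsoleteLocalPart('john..smith'));      // bool(false)
```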
Comments and folding white spaces are also allowed in an email address: before and/or after the local-part, before and/or after the domain, and before and/or after any dot in the local-part and/or domain. Folding white spaces may also appear inside a quoted string and/or a comment, and comments may be nested.
A comment is almost identical to a quoted string. The differences are that a comment is opened with a left parenthesis and closed with a right parenthesis, that parentheses are only allowed as part of a quoted-pair (or as a nested comment), and that double quotes may appear freely.
A folding white space is one or more space and/or horizontal tab characters, optionally preceded by zero or more spaces and/or horizontal tabs followed by a carriage return and line feed pair. An obsolete form of folding white space is also allowed. It consists of one or more space and/or horizontal tab characters, followed, optionally and any number of times, by a carriage return and line feed pair and further space and/or horizontal tab characters. Folding white spaces, where allowed, are optional and may occur repeatedly. A regular expression to match comments and folding white spaces would be as follows:
// Comments and folding white spaces
/^((?>(?>(?>((?>[\t ]+(?>\x0D\x0A[\t ]+)*|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-'*-\[\]-\x7F]|\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)$/iD
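The comments and folding white space expression can also be tested on its own. isCFWS() is a hypothetical helper. Its white-space classes are written as [\t ] so that horizontal tabs are accepted, as the description requires. Group 3 calls itself to handle nested comments.

```php
<?php
// Hypothetical helper: group 2 is folding white space, group 3 is a (possibly nested) comment.
function isCFWS(string $text): bool
{
    return preg_match('/^((?>(?>(?>((?>[\t ]+(?>\x0D\x0A[\t ]+)*|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)$/D', $text) === 1;
}

var_dump(isCFWS('(comment)'));             // bool(true)
var_dump(isCFWS(' (nested (comment)) '));  // bool(true)
var_dump(isCFWS("\r\n (folded)"));         // bool(true): CRLF followed by a space
var_dump(isCFWS('(unclosed'));             // bool(false)
```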
We can now include these at the appropriate points in the earlier function, which gives us the following function for matching RFC 5322 email addresses:
function isValid5322($emailAddress)
{
return preg_match('/^(?!(?>(?>(?1)"?(?>[^"\\\]|\\\[ -~]))"?(?1)){255,})(?!(?>(?1)"?(?>[^"\\\]|\\\[ -~])"?(?1)){65,}@)((?>(?>(?>((?>[\t ]+(?>\x0D\x0A[\t ]+)*|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?!(?1)[a-z0-9-]{64,})(?1)(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>(?1)\.(?!(?1)[a-z0-9-]{64,})(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f0-9]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?9)){3}))\])(?1)$/isD', $emailAddress);
}
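The following is a usage sketch for isValid5322(). The function is repeated so that the snippet is self-contained. Its white-space classes are written as [\t ] to accept horizontal tabs, and otherwise the expression is the one above. The sample addresses are only illustrative.

```php
<?php
// isValid5322() as above; group 1 is CFWS, group 4 a local-part word, group 5 a domain label.
function isValid5322($emailAddress)
{
    return preg_match('/^(?!(?>(?>(?1)"?(?>[^"\\\]|\\\[ -~]))"?(?1)){255,})(?!(?>(?1)"?(?>[^"\\\]|\\\[ -~])"?(?1)){65,}@)((?>(?>(?>((?>[\t ]+(?>\x0D\x0A[\t ]+)*|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?!(?1)[a-z0-9-]{64,})(?1)(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>(?1)\.(?!(?1)[a-z0-9-]{64,})(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f0-9]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?9)){3}))\])(?1)$/isD', $emailAddress);
}

var_dump(isValid5322('john.smith(comment)@example.com')); // int(1): comment before the "@"
var_dump(isValid5322('"john".smith@example.com'));        // int(1): obsolete local-part
var_dump(isValid5322('john..smith@example.com'));         // int(0)
```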
For a class that allows greater control over which type(s) of email address to validate, see below:
<?php
/**
* Squiloople Framework
*
* LICENSE: Feel free to use and redistribute this code.
*
* @author Michael Rushton <michael@squiloople.com>
* @link http://squiloople.com/
* @category Squiloople
* @package Models
* @subpackage Validators
* @version 1.0
* @copyright © 2009-2010 Michael Rushton
*/
// Define the namespace
namespace Models\Validators;
/**
* Email Address Validator
*
* Validate email addresses according to RFC 5321 or RFC 5322
*/
final class EmailAddressValidator
{
/**
* The email address to validate
*
* @access private
* @var string $_emailAddress
*/
private $_emailAddress;
/**
* A quoted string local part is either allowed (true) or not (false)
*
* @access private
* @var bool $_quotedString
*/
private $_quotedString = false;
/**
* An obsolete local part is either allowed (true) or not (false)
*
* @access private
* @var bool $_obsolete
*/
private $_obsolete = false;
/**
* A domain literal domain is either allowed (true) or not (false)
*
* @access private
* @var bool $_domainLiteral
*/
private $_domainLiteral = false;
/**
* Comments and folding white spaces are either allowed (true) or not (false)
*
* @access private
* @var bool $_cfws
*/
private $_cfws = false;
/**
* Set the email address and turn on the relevant standard if required
*
* @access public
* @param string $emailAddress
* @param null|integer $standard
*/
public function __construct($emailAddress, $standard = null)
{
// Set the email address
$this->_emailAddress = $emailAddress;
// Turn on the RFC 5321 standard if requested
if ($standard == 5321)
{
$this->setStandard5321();
}
// Otherwise turn on the RFC 5322 standard if requested
elseif ($standard == 5322)
{
$this->setStandard5322();
}
}
/**
* Call the constructor fluently
*
* @access public
* @static
* @param string $emailAddress
* @param null|integer $standard
* @return \Models\Validators\EmailAddressValidator
*/
public static function setEmailAddress($emailAddress, $standard = null)
{
return new self($emailAddress, $standard);
}
/**
* Validate the email address according to RFC 5321 and return itself
*
* @access public
* @param bool $allow
* @return \Models\Validators\EmailAddressValidator
*/
public function setStandard5321($allow = true)
{
// A quoted string local part is either allowed (true) or not (false)
$this->_quotedString = $allow;
// A domain literal domain is either allowed (true) or not (false)
$this->_domainLiteral = $allow;
// Return itself
return $this;
}
/**
* Validate the email address according to RFC 5322 and return itself
*
* @access public
* @param bool $allow
* @return \Models\Validators\EmailAddressValidator
*/
public function setStandard5322($allow = true)
{
// An obsolete local part is either allowed (true) or not (false)
$this->_obsolete = $allow;
// A domain literal domain is either allowed (true) or not (false)
$this->_domainLiteral = $allow;
// Comments and folding white spaces are either allowed (true) or not (false)
$this->_cfws = $allow;
// Return itself
return $this;
}
/**
* Either allow (true) or disallow (false) a quoted string local part and return itself
*
* @access public
* @param bool $allow
* @return \Models\Validators\EmailAddressValidator
*/
public function setQuotedString($allow = true)
{
// Either allow (true) or disallow (false) a quoted string local part
$this->_quotedString = $allow;
// Return itself
return $this;
}
/**
* Either allow (true) or disallow (false) an obsolete local part and return itself
*
* @access public
* @param bool $allow
* @return \Models\Validators\EmailAddressValidator
*/
public function setObsolete($allow = true)
{
// Either allow (true) or disallow (false) an obsolete local part
$this->_obsolete = $allow;
// Return itself
return $this;
}
/**
* Either allow (true) or disallow (false) a domain literal domain and return itself
*
* @access public
* @param bool $allow
* @return \Models\Validators\EmailAddressValidator
*/
public function setDomainLiteral($allow = true)
{
// Either allow (true) or disallow (false) a domain literal domain
$this->_domainLiteral = $allow;
// Return itself
return $this;
}
/**
* Either allow (true) or disallow (false) comments and folding white spaces and return itself
*
* @access public
* @param bool $allow
* @return \Models\Validators\EmailAddressValidator
*/
public function setCFWS($allow = true)
{
// Either allow (true) or disallow (false) comments and folding white spaces
$this->_cfws = $allow;
// Return itself
return $this;
}
/**
* Return the regular expression for a dot atom local part
*
* @access private
* @return string
*/
private function _getDotAtom()
{
return "([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*";
}
/**
* Return the regular expression for a quoted string local part
*
* @access private
* @return string
*/
private function _getQuotedString()
{
return '"(?>[ !#-\[\]-~]|\\\[ -~])*"';
}
/**
* Return the regular expression for an obsolete local part
*
* @access private
* @return string
*/
private function _getObsolete()
{
return '([!#-\'*+\/-9=?^-~-]+|"(?>'
. $this->_getFWS()
. '(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*'
. $this->_getFWS()
. '")(?>'
. $this->_getCFWS()
. '\.'
. $this->_getCFWS()
. '(?1))*';
}
/**
* Return the regular expression for a domain name domain
*
* @access private
* @return string
*/
private function _getDomainName()
{
return '([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>'
. $this->_getCFWS()
. '\.'
. $this->_getCFWS()
. '(?2)){0,126}';
}
/**
* Return the regular expression for an IPv6 address
*
* @access private
* @return string
*/
private function _getIPv6()
{
return '([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?';
}
/**
* Return the regular expression for an IPv4-mapped IPv6 address
*
* @access private
* @return string
*/
private function _getIPv6v4()
{
return '(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?';
}
/**
* Return the regular expression for an IPv4 address
*
* @access private
* @return string
*/
private function _getIPv4()
{
return '(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}';
}
/**
* Return the regular expression for a domain literal domain
*
* @access private
* @return string
*/
private function _getDomainLiteral()
{
return '\[(?:(?>IPv6:(?>'
. $this->_getIPv6()
. '))|(?>(?>IPv6:(?>'
. $this->_getIPv6v4()
. '))?'
. $this->_getIPv4()
. '))\]';
}
/**
* Return either the regular expression for folding white spaces or a subroutine call to it, if allowed
*
* @access private
* @param bool $define
* @return string
*/
private function _getFWS($define = false)
{
// Return a subroutine call if $define is false; otherwise return the regular expression that defines the named group
if ($this->_cfws)
{
return !$define ? '(?P>fws)' : '(?<fws>(?>[\t ]+(?>\x0D\x0A[\t ]+)*|(?>[\t ]*\x0D\x0A)?[\t ]+)?)';
}
}
/**
* Return the regular expression for comments
*
* @access private
* @return string
*/
private function _getComments()
{
return '(?<comment>\((?>'
. $this->_getFWS()
. '(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?P>comment)))*'
. $this->_getFWS()
. '\))';
}
/**
* Return either the regular expression for comments and folding white spaces or a subroutine call to it, if allowed
*
* @access private
* @param bool $define
* @return string
*/
private function _getCFWS($define = false)
{
// Return a subroutine call if $define is false
if ($this->_cfws && !$define)
{
return '(?P>cfws)';
}
// Otherwise return the regular expression
if ($this->_cfws)
{
return '(?<cfws>(?>(?>(?>'
. $this->_getFWS(true)
. $this->_getComments()
. ')+'
. $this->_getFWS()
. ')|'
. $this->_getFWS()
. ')?)';
}
}
/**
* Establish, and return, the valid format for the local part
*
* @access private
* @return string
*/
private function _getLocalPart()
{
// The local part may be obsolete if allowed
if ($this->_obsolete)
{
return $this->_getObsolete();
}
// Or the local part may be either a dot atom or a quoted string if the latter is allowed
if ($this->_quotedString)
{
return '(?>' . $this->_getDotAtom() . '|' . $this->_getQuotedString() . ')';
}
// Otherwise the local part may only be a dot atom
return $this->_getDotAtom();
}
/**
* Establish, and return, the valid format for the domain
*
* @access private
* @return string
*/
private function _getDomain()
{
// The domain may be either a domain name or a domain literal if the latter is allowed
if ($this->_domainLiteral)
{
return '(?>' . $this->_getDomainName() . '|' . $this->_getDomainLiteral() . ')';
}
// Otherwise the domain must be a domain name
return $this->_getDomainName();
}
/**
* Check to see if the domain can be resolved to MX RRs
*
* @access private
* @param array $domain
* @return bool|int
*/
private function _verifyDomain($domain)
{
// Return 0 if the domain cannot be resolved to MX RRs
if (!checkdnsrr(end($domain), 'MX'))
{
return 0;
}
// Otherwise return true
return true;
}
/**
* Perform the validation check on the email address's syntax and, if required, call _verifyDomain()
*
* @access public
* @param bool $verify
* @return bool|integer
*/
public function isValid($verify = false)
{
// Return false if the email address has an incorrect syntax
if (!preg_match(
'/^'
. $this->_getCFWS()
. $this->_getLocalPart()
. $this->_getCFWS()
. '@'
. $this->_getCFWS()
. $this->_getDomain()
. $this->_getCFWS(true)
. '$/isD'
, $this->_emailAddress
))
{
return false;
}
// Check to see if the domain can be resolved to MX RRs if required
if ($verify)
{
return $this->_verifyDomain(explode('@', $this->_emailAddress));
}
// Otherwise return 1
return 1;
}
}
When creating the object, using either \Models\Validators\EmailAddressValidator::setEmailAddress($emailAddress) or new \Models\Validators\EmailAddressValidator($emailAddress), the default settings allow only dot-atom@domain-name email addresses. If the second (optional) parameter is set to 5321, a quoted string local-part and a domain literal domain are also allowed. If it is set to 5322, an obsolete local-part, a domain literal domain, and comments and folding white spaces are allowed. To allow a format, call its associated method with no parameter or with true. To disallow it, call the method with false. To run the validation check, call the isValid() method, which returns 1 for a valid address and false for an invalid one. The available settings are:
setStandard5321() // setQuotedString() and setDomainLiteral()
setStandard5322() // setObsolete(), setDomainLiteral(), and setCFWS()
setQuotedString() // A quoted string local-part is allowed
setObsolete() // An obsolete local-part is allowed
setDomainLiteral() // A domain literal domain is allowed
setCFWS() // Comments and folding white spaces are allowed
If you pass true to the isValid() method, it will also call the _verifyDomain() method to check whether the domain can be resolved to MX RRs, but only if the email address is syntactically valid. If the verification succeeds, isValid() returns true; if it fails, isValid() returns 0.
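_verifyDomain() only checks MX records. The RFC 5321 excerpt quoted in the comments below notes that names with address (A or AAAA) records are also permitted. A lookup that falls back to those record types might look like the following sketch. domainCanReceiveMail() and its injectable $lookup parameter are hypothetical and not part of the class. The sketch does not handle domain literals or comments in the domain.

```php
<?php
// Hypothetical helper, not part of EmailAddressValidator.
// $lookup defaults to PHP's checkdnsrr(); a callable can be injected for testing.
function domainCanReceiveMail(string $emailAddress, ?callable $lookup = null): bool
{
    $lookup = $lookup ?? 'checkdnsrr';
    // Use the text after the last "@", as _verifyDomain() does via end(explode(...)).
    $at = strrpos($emailAddress, '@');
    if ($at === false || $at === strlen($emailAddress) - 1) {
        return false;
    }
    $domain = substr($emailAddress, $at + 1);
    // Try MX first, then fall back to address records.
    foreach (['MX', 'A', 'AAAA'] as $type) {
        if ($lookup($domain, $type)) {
            return true;
        }
    }
    return false;
}

// With real DNS (requires network access):
// var_dump(domainCanReceiveMail('someone@example.com'));
```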
Alternatively, if you just want a simple regular expression that allows the majority of email addresses in use, try the following:
function isValidEmailAddress($emailAddress)
{
return preg_match("/^([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*@(?>[a-z0-9](?>[a-z0-9-]*[a-z0-9])?\.){1,2}[a-z]{2,6}$/iD", $emailAddress);
}
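The simple expression gives up some coverage in exchange for brevity. It allows at most two labels before a final label of two to six letters. The following sketch repeats the function unchanged and shows where that limit applies:

```php
<?php
// The simple validator from above, repeated so this snippet runs on its own.
function isValidEmailAddress($emailAddress)
{
    return preg_match("/^([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*@(?>[a-z0-9](?>[a-z0-9-]*[a-z0-9])?\.){1,2}[a-z]{2,6}$/iD", $emailAddress);
}

var_dump(isValidEmailAddress('user@example.co.uk'));       // int(1)
var_dump(isValidEmailAddress('user@mail.example.co.uk'));  // int(0): three labels before the TLD
var_dump(isValidEmailAddress('user@example.photography')); // int(0): TLD longer than six letters
```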
For the official documentation on email addresses, please see RFC 5321 and RFC 5322.
I found a code snippet you posted on linuxjournal.com for email validation.
return preg_match("/^(?=.{5,254})(?:(?:\"[^\"]{1,62}\")|(?:(?!\.)(?!.*\.[.@])[a-z0-9!#$%&'*+\/=?^_`{|}~^.-]{1,64}))@(?:(?:\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\])|(?:(?!-)(?!.*-\$)(?!.*-\.)(?!.*\.-)(?!.*[^n]--)(?!.*[^x]n--)(?!n--)(?!.*[^.]xn--)(?:[a-z0-9-]{1,63}\.){1,127}(?:[a-z0-9-]{1,63})))$/i", $email);
Would this be equivalent to this class you have on this site?
No, that snippet lacks support for quoted string local parts, obsolete local parts, and IPv6 domain literal domains. Plus, it isn’t very efficient. The new regular expression is:
function is_email_address($email_address)
{
return preg_match('/^(?!(?>\x22?(?>\x22\x40|\x5C?[\x00-\x7F])\x22?){255,})(?!(?>\x22?\x5C?[\x00-\x7F]\x22?){65,}@)(?>[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+|(?>\x22(?>[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|\x5C[\x00-\x7F])*\x22))(?>\.(?>[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+|(?>\x22(?>[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|\x5C[\x00-\x7F])*\x22)))*@(?>(?>(?!.*[^.]{64,})(?>(?>xn--)?[a-z0-9]+(?>-[a-z0-9]+)*\.){0,126}(?>xn--)?[a-z0-9]+(?>-[a-z0-9]+)*)|(?:\[(?>(?>IPv6:(?>(?>[a-f0-9]{1,4}(?>:[a-f0-9]{1,4}){7})|(?>(?!(?:.*[a-f0-9][:\]]){8,})(?>[a-f0-9]{1,4}(?>:[a-f0-9]{1,4}){0,6})?::(?>[a-f0-9]{1,4}(?>:[a-f0-9]{1,4}){0,6})?)))|(?>(?>IPv6:(?>(?>[a-f0-9]{1,4}(?>:[a-f0-9]{1,4}){5}:)|(?>(?!(?:.*[a-f0-9]:){6,})(?>[a-f0-9]{1,4}(?>:[a-f0-9]{1,4}){0,4})?::(?>[a-f0-9]{1,4}(?>:[a-f0-9]{1,4}){0,4}:)?)))?(?>25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?>25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}))\]))$/isD', $email_address);
}
This seems to miss the trivial case of a@foo which according to the RFCs and Dominic Sayers test cases should be invalid – http://www.dominicsayers.com/isemail/results.php
Rasmus says: “This seems to miss the trivial case of a@foo which according to the RFCs and Dominic Sayers test cases should be invalid”.
This is not the case. RFC 5321 says that “a domain name (or often just a “domain”) consists of one or more components, separated by dots if more than one appears. In the case of a top-level domain used by itself in an email address, a single string is used without any dots.”
And a real-life example:
checkdnsrr('ai', 'MX') // Returns true
getmxrr('ai', $array) // Returns true
Yes, but read further in that RFC you quoted. Section 2.3.5 says:
Only resolvable, fully-qualified domain names (FQDNs) are permitted
when domain names are used in SMTP. In other words, names that can
be resolved to MX RRs or address (i.e., A or AAAA) RRs (as discussed
in Section 5) are permitted, as are CNAME RRs whose targets can be
resolved, in turn, to MX or address RRs. Local nicknames or
unqualified names MUST NOT be used.
Both Cal Henderson’s and Dominic Sayers’ validators label addresses of this form invalid.
I guess the key here is the SMTP part. Are you checking for valid email addresses that are routable on the public Internet or not? If you are, then this test has to fail, as you can’t deliver to a TLD, and if it isn’t a TLD then it must be a local nickname, which is specifically disallowed by RFC 5321.
So, here is my slightly modified version that enforces public Internet addresses:
'/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/i'
Some single-label domains can be resolved to MX RRs:
checkdnsrr('ua', 'MX') // Returns true
getmxrr('ua', $array) // Returns true
And some examples of single-label domain email addresses:
vince@ai
paul@io
root@km
joost@tk
admin@tt
hostmaster@ua
Source: Tony Finch – TLDs with MXs
Right, I should have said “cannot reliably be delivered to”, because of potential local hostname conflicts. There are actually quite a few RFC-defying addresses that work. Gmail will happily deliver to foo.@example.com, for example, but that is also obviously an invalid address according to the RFCs. Of course, the HTML5 guys are allowing bogus addresses like that now. Check out their ABNF at: http://bit.ly/9eOdUm
That doesn’t change the fact that they are syntactically valid (and that they are also in use). My code is intended to allow every RFC 5321/5322-valid email address (excluding semantically invisible, unnecessary content such as folding white space and comments) and to deny every invalid email address.
If I were also to allow invalid email addresses (such as foo.@example.com, as you suggest), then the entire purpose of an email address validator would be moot.
However, I accept that the RFCs allow impractical addresses like "My name".is."Michael"@[IPv6:FFFF::255.255.255.255], which is why I have also made a class which allows the developer to allow or disallow different types of local-parts and domains. Taking into account your replies, I have now also included a method to turn on and off single-label domain names (SetStrictDomain()).
This code won’t run: there’s no SetStrictAtom() function defined. When you call the constructor, PHP throws an error. This happens when you set strict = false.
$x = EmailAddressValidator::SetEmailAddress($contact_email,false);
Thanks for bringing that to my attention, James. I’m not sure how I missed that, but I’ve put it in.
Hello,
I don't understand this part.
“Additionally, the top level domain must not begin with a number.”
As far as I know, 911.com starts with a number and is a valid (and working) TLD.
hahaha … fuck … forget it…
mixed up domain and top level domain …
Sorry. :)
It turns out this regex is now the basis for PHP’s built-in filter_var() function. See source code.
Michael, very interesting, but by the end I’m slightly confused. Is there some PHP code that I could use to trap, say, 99% of false emails? It doesn’t have to be perfect, just easy to implement. I just want to ‘trim’ an email list before sending. Thanks in advance.
If you have an array ($emails) of email addresses, try the following code:
foreach ($emails as $key => $email) {
    if (EmailAddressValidator::SetEmailAddress($email, false)->Validate()) {
        mail($email, $subject, $message, $headers);
    } else {
        unset($emails[$key]);
    }
}
That will send an email to the valid addresses and remove the invalid addresses from the array.
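PHP's built-in filter_var() is mentioned above, so here is a minimal alternative sketch that trims a list with FILTER_VALIDATE_EMAIL instead of the class. filterValidEmails() is a hypothetical name. FILTER_VALIDATE_EMAIL applies PHP's own rules, so its results may differ from the class's results in edge cases.

```php
<?php
// Hypothetical helper: keep only the addresses that FILTER_VALIDATE_EMAIL accepts.
function filterValidEmails(array $emails): array
{
    $valid = array_filter($emails, function ($email) {
        // filter_var() returns the address on success and false on failure.
        return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    });
    // array_filter() preserves keys; array_values() reindexes the result.
    return array_values($valid);
}

print_r(filterValidEmails(['user@example.com', 'not-an-address', 'first.last@example.org']));
```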
[...] So a fairly complex regex is applied to the email address. The link given in the comment also leads to a somewhat more detailed explanation: Email Address Validator. [...]