Email Address Validation

Email addresses have a local-part and a domain separated by an (unquoted) “@” symbol. The local-part must be either a dot-atom or a quoted string, and the domain must be either a domain name or a domain literal.

A dot-atom can only contain letters, numbers, dots, and the following characters: ! # $ % & ' * + - / = ? ^ _ ` { | } ~. However, neither the first nor the last character can be a dot, and two or more consecutive dots are not allowed. The maximum length of a dot-atom is 64 characters. A regular expression to match for a dot-atom local-part would be as follows:

// Dot-atom

"/^(?!.{65,})([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*$/iD"

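As a minimal sketch of how the dot-atom expression might be used, it can be wrapped in a small helper. The name is_dot_atom() is an assumption made for illustration and does not appear in the original code:

```php
<?php

  // Hypothetical helper: the pattern is the dot-atom expression from the text
  function is_dot_atom($local_part)
  {
    return preg_match("/^(?!.{65,})([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*$/iD", $local_part) === 1;
  }

  var_dump(is_dot_atom('first.last'));         // bool(true)
  var_dump(is_dot_atom('.first'));             // bool(false): leading dot
  var_dump(is_dot_atom('first..last'));        // bool(false): consecutive dots
  var_dump(is_dot_atom(str_repeat('a', 65)));  // bool(false): longer than 64 characters
```

The (?1) call reuses the first subpattern (one run of allowed characters) after each dot, which is what rules out leading, trailing, and doubled dots.
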
A quoted string can only contain printable US-ASCII characters or the space character, all contained within double quotes. Double quotes and backslashes are allowed only if part of a quoted-pair (escaped with a backslash). A quoted string may be empty. The maximum length of a quoted string is 64 characters, not including the enclosing double-quotes or the escaping backslash of a quoted-pair. A regular expression to match for a quoted string local-part would be as follows:

// Quoted string

'/^"(?>[ !#-\[\]-~]|\\\[ -~]){0,64}"$/iD'
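
A short sketch of the quoted string expression in use follows. The helper name is_quoted_string() is hypothetical, and the test strings are written in single quotes so that PHP passes the backslash through unchanged:

```php
<?php

  // Hypothetical helper: the pattern is the quoted string expression from the text
  function is_quoted_string($local_part)
  {
    return preg_match('/^"(?>[ !#-\[\]-~]|\\\[ -~]){0,64}"$/iD', $local_part) === 1;
  }

  var_dump(is_quoted_string('"john doe"'));  // bool(true)
  var_dump(is_quoted_string('""'));          // bool(true): an empty quoted string is allowed
  var_dump(is_quoted_string('"a\"b"'));      // bool(true): the inner double quote is a quoted-pair
  var_dump(is_quoted_string('"a"b"'));       // bool(false): unescaped inner double quote
  var_dump(is_quoted_string('john doe'));    // bool(false): no enclosing double quotes
```

Because a quoted-pair is matched as a single unit, the {0,64} quantifier counts it once, in line with the length rule described above.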

A domain name consists of 1 to 127 labels (the 128th label being the (empty) root domain), separated by dots, each containing any combination of letters, numbers, or hyphens. However, neither the first nor the last character of a label can be a hyphen. The maximum length of a domain name is 253 characters and the maximum length of a label is 63 characters. A regular expression to match for a domain name would be as follows:

// Domain name

'/^(?!.{254,})(?!.*[^.]{64,})([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?1)){0,126}$/iD'
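
To illustrate the label rules, here is a small sketch around the domain name expression. The helper name is_domain_name() and the sample domains are assumptions made for the example:

```php
<?php

  // Hypothetical helper: the pattern is the domain name expression from the text
  function is_domain_name($domain)
  {
    return preg_match('/^(?!.{254,})(?!.*[^.]{64,})([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?1)){0,126}$/iD', $domain) === 1;
  }

  var_dump(is_domain_name('example.com'));                  // bool(true)
  var_dump(is_domain_name('xn--bcher-kva.example'));        // bool(true): hyphens inside a label
  var_dump(is_domain_name('-example.com'));                 // bool(false): label starts with a hyphen
  var_dump(is_domain_name('example-.com'));                 // bool(false): label ends with a hyphen
  var_dump(is_domain_name(str_repeat('a', 64) . '.com'));   // bool(false): label longer than 63 characters
```

The two lookaheads at the start handle the length limits, so the rest of the expression only has to describe the shape of a label.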

A domain literal is one of an IPv4 address, an IPv6 address, or an IPv4-mapped IPv6 address.

An IPv4 address consists of four groups, separated by dots, each containing a decimal value between 0 and 255. A regular expression to match for an IPv4 address would be as follows:

// IPv4 Address

'/^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?1)){3}$/D'
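
As a quick sketch, the IPv4 expression can be tried against a few addresses. The helper name is_ipv4() is hypothetical:

```php
<?php

  // Hypothetical helper: the pattern is the IPv4 expression from the text
  function is_ipv4($address)
  {
    return preg_match('/^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?1)){3}$/D', $address) === 1;
  }

  var_dump(is_ipv4('192.0.2.1'));  // bool(true)
  var_dump(is_ipv4('256.0.0.1'));  // bool(false): 256 is out of range
  var_dump(is_ipv4('192.0.2'));    // bool(false): only three groups
  var_dump(is_ipv4('01.2.3.4'));   // bool(false): leading zeros are not matched
```

Note that the expression as written does not accept leading zeros, since the only two-digit alternative requires a first digit between 1 and 9.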

An IPv6 address consists of eight groups, separated by colons, each containing a hexadecimal value between 0 and FFFF. One or more consecutive groups of 0 value can be represented as a double colon; however, this can only occur once. A regular expression to match for an IPv6 address would be as follows:

// IPv6 Address

'/^(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)$/iD'
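
The following sketch exercises both branches of the IPv6 expression (the full eight-group form and the form with a double colon). The helper name is_ipv6() is an assumption for illustration:

```php
<?php

  // Hypothetical helper: the pattern is the IPv6 expression from the text
  function is_ipv6($address)
  {
    return preg_match('/^(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)$/iD', $address) === 1;
  }

  var_dump(is_ipv6('2001:db8:0:0:0:0:0:1'));  // bool(true): eight groups
  var_dump(is_ipv6('2001:db8::1'));           // bool(true): zero groups compressed
  var_dump(is_ipv6('::'));                    // bool(true): all groups compressed
  var_dump(is_ipv6('2001:db8::1::2'));        // bool(false): double colon used twice
  var_dump(is_ipv6('2001:db8:0:0:1'));        // bool(false): five groups and no double colon
```

The negative lookahead in the second branch counts the groups present, so a double colon is only accepted when it stands in for at least one missing group.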

An IPv4-mapped IPv6 address is an IPv6 address with the final two groups represented as an IPv4 address. One or more consecutive groups of 0 value can be represented as a double colon; however, this can only occur once. A regular expression to match for an IPv4-mapped IPv6 address would be as follows:

// IPv4-mapped IPv6 Address

'/^(?>([a-f0-9]{1,4})(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?2)?::(?>((?1)(?>:(?1)){0,4}):)?)(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?3)){3}$/iD'
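
A minimal sketch for the IPv4-mapped form follows. It groups the pattern so that the alternation between the full and the compressed IPv6 prefix ends before the dotted IPv4 part, which is the same grouping the domain literal expression further down uses. The helper name is_ipv4_mapped_ipv6() is hypothetical:

```php
<?php

  // Hypothetical helper: six-group (or compressed) IPv6 prefix followed by an IPv4 address
  function is_ipv4_mapped_ipv6($address)
  {
    return preg_match('/^(?>([a-f0-9]{1,4})(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?2)?::(?>((?1)(?>:(?1)){0,4}):)?)(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?3)){3}$/iD', $address) === 1;
  }

  var_dump(is_ipv4_mapped_ipv6('::ffff:192.0.2.1'));          // bool(true): compressed prefix
  var_dump(is_ipv4_mapped_ipv6('0:0:0:0:0:ffff:192.0.2.1'));  // bool(true): full six-group prefix
  var_dump(is_ipv4_mapped_ipv6('::192.0.2.1'));               // bool(true): prefix entirely compressed
  var_dump(is_ipv4_mapped_ipv6('ffff:192.0.2.1'));            // bool(false): too few groups, no double colon
  var_dump(is_ipv4_mapped_ipv6('::ffff:192.0.2.256'));        // bool(false): IPv4 group out of range
```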

When used as a domain literal in an email address, the IP address must be contained within square brackets, and IPv6 or IPv4-mapped IPv6 addresses must be preceded by (unquoted) “IPv6:”. A regular expression to match for a domain literal would be as follows:

// Domain literal

'/^\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?1)(?>:(?1)){0,6})?::(?2)?))|(?>(?>IPv6:(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))\]$/iD'
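
To show the brackets and the “IPv6:” tag at work, here is a small sketch around the domain literal expression. The helper name is_domain_literal() is an assumption for illustration:

```php
<?php

  // Hypothetical helper: the pattern is the domain literal expression from the text
  function is_domain_literal($domain)
  {
    return preg_match('/^\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?1)(?>:(?1)){0,6})?::(?2)?))|(?>(?>IPv6:(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))\]$/iD', $domain) === 1;
  }

  var_dump(is_domain_literal('[192.0.2.1]'));              // bool(true)
  var_dump(is_domain_literal('[IPv6:2001:db8::1]'));       // bool(true)
  var_dump(is_domain_literal('[IPv6:::ffff:192.0.2.1]'));  // bool(true): IPv4-mapped IPv6 address
  var_dump(is_domain_literal('[2001:db8::1]'));            // bool(false): missing "IPv6:" tag
  var_dump(is_domain_literal('192.0.2.1'));                // bool(false): missing square brackets
```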

By bringing these regexes together, separating the local-part from the domain with an (unquoted) “@” symbol, and limiting the entire email address to 254 characters, we are left with the following which matches for every valid RFC 5321 email address:

// Email address

'/^(?!(?>"?(?>\\\[ -~]|[^"])"?){255,})(?!"?(?>\\\[ -~]|[^"]){65,}"?@)(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD'

We can then create a function which returns the return value of a (case-insensitive) preg_match on the above regular expression:

  /**
   * Validate an email address using RFC 5321
   *
   * @param string $email_address
   * @return integer
   */
  function is_valid_email_address_5321($email_address)
  {
    return preg_match('/^(?!(?>"?(?>\\\[ -~]|[^"])"?){255,})(?!"?(?>\\\[ -~]|[^"]){65,}"?@)(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD', $email_address);
  }

An obsolete version of the local-part is also possible, allowing for a mixture of atoms and quoted strings, separated by dots. An obsolete quoted string allows any US-ASCII character when part of a quoted-pair, and any US-ASCII character except the null, horizontal tab, new line, carriage return, backslash, and double quote characters when not. An obsolete local-part may only be empty if it is a single quoted string. The maximum length of an obsolete local-part, not including the double quotes enclosing a quoted string or the escaping backslash of a quoted-pair, is 64 characters. A regular expression to match for an obsolete local-part would be as follows:

// Obsolete local-part

'/^(?!"?(?>\\\[ -~]|[^"]){65,}"?$)([!#-\'*+\/-9=?^-~-]+|"(?>(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*")(?>\.(?1))*$/iD'
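
A brief sketch of the obsolete local-part check follows. Since the local-part is tested here on its own, the length lookahead in this variant ends at the end of the string rather than at an “@”. The helper name is_obsolete_local_part() is hypothetical:

```php
<?php

  // Hypothetical helper: atoms and quoted strings mixed freely, separated by dots
  function is_obsolete_local_part($local_part)
  {
    return preg_match('/^(?!"?(?>\\\[ -~]|[^"]){65,}"?$)([!#-\'*+\/-9=?^-~-]+|"(?>(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*")(?>\.(?1))*$/iD', $local_part) === 1;
  }

  var_dump(is_obsolete_local_part('first."middle name".last'));  // bool(true)
  var_dump(is_obsolete_local_part('""'));                        // bool(true): a single empty quoted string
  var_dump(is_obsolete_local_part('first."middle"last'));        // bool(false): missing dot after the quoted string
  var_dump(is_obsolete_local_part('.first'));                    // bool(false): leading dot
  var_dump(is_obsolete_local_part(str_repeat('a', 65)));         // bool(false): longer than 64 characters
```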

Comments and folding white spaces are also allowed in an email address: before and/or after the local-part, before and/or after the domain, and before and/or after any dot in a local-part and/or domain. Folding white space may also appear in a quoted string and/or in comments, and comments may nest. A comment is almost identical to a quoted string except that it is opened and closed with a left and a right parenthesis respectively, and that parentheses are only allowed as part of a quoted-pair (or as further comments), whereas double quotes may appear freely. Folding white space is one or more space and/or horizontal tab characters, optionally preceded by zero or more spaces and/or horizontal tabs followed by a carriage return and line feed pair. An obsolete form of folding white space is also allowed: one or more space and/or horizontal tab characters followed, optionally and any number of times, by a carriage return and line feed pair and one or more further space and/or horizontal tab characters. Folding white spaces, where allowed, are optional and may occur repeatedly. A regular expression to match for comments and folding white spaces would be as follows:

// Comments and folding white spaces

'/^((?>(?>(?>((?>[	 ]+(?>\x0D\x0A[	 ]+)*|(?>[	 ]*\x0D\x0A)?[	 ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)$/iD'
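
The sketch below tries the comments and folding white spaces expression against a few strings. The helper name is_cfws() is hypothetical, and the horizontal tab is written as \t inside the character classes so that the pattern survives copying:

```php
<?php

  // Hypothetical helper: matches a (possibly empty) run of comments and folding white spaces
  function is_cfws($string)
  {
    return preg_match('/^((?>(?>(?>((?>[\t ]+(?>\x0D\x0A[\t ]+)*|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)$/iD', $string) === 1;
  }

  var_dump(is_cfws('(comment)'));               // bool(true)
  var_dump(is_cfws(' (a (nested) comment) '));  // bool(true): comments may nest
  var_dump(is_cfws("\r\n (folded)"));           // bool(true): CRLF followed by a space
  var_dump(is_cfws(''));                        // bool(true): everything is optional
  var_dump(is_cfws('(unclosed'));               // bool(false): missing right parenthesis
```

The (?3) call is what allows nesting: a comment may contain another complete comment wherever ordinary comment text is allowed.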

We can now include these where appropriate in the earlier function to give us the following which matches for RFC 5322 email addresses:

  /**
   * Validate an email address using RFC 5322
   *
   * @param string $email_address
   * @return integer
   */
  function is_valid_email_address_5322($email_address)
  {
    return preg_match('/^(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){255,})(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){65,}@)((?>(?>(?>((?>[	 ]+(?>\x0D\x0A[	 ]+)*|(?>[	 ]*\x0D\x0A)?[	 ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?!(?1)[a-z0-9-]{64,})(?1)(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>(?1)\.(?!(?1)[a-z0-9-]{64,})(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f0-9]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?9)){3}))\])(?1)$/isD', $email_address);
  }

For a class which allows greater control over which type(s) of email address to validate, see below:

<?php

  /**
   * Squiloople Framework
   *
   * LICENSE: Feel free to use and redistribute this code.
   *
   * @author Michael Rushton <michael@squiloople.com>
   * @link http://squiloople.com/
   * @category Squiloople
   * @package Models
   * @subpackage Validators
   * @version 1.0
   * @copyright © 2012 Michael Rushton
   */

  // Define the namespace
  namespace Models\Validators;

  /**
   * Email Address Validator
   *
   * Validate email addresses according to the relevant standards
   */
  final class EmailAddressValidator
  {

    /**
     * The email address
     *
     * @access private
     * @var string $_email_address
     */
    private $_email_address;

    /**
     * A quoted string local part is either allowed (TRUE) or not (FALSE)
     *
     * @access private
     * @var boolean $_quoted_string
     */
    private $_quoted_string = FALSE;

    /**
     * An obsolete local part is either allowed (TRUE) or not (FALSE)
     *
     * @access private
     * @var boolean $_obsolete
     */
    private $_obsolete = FALSE;

    /**
     * A basic domain name is either required (TRUE) or not (FALSE)
     *
     * @access private
     * @var boolean $_basic_domain_name
     */
    private $_basic_domain_name = TRUE;

    /**
     * A domain literal domain is either allowed (TRUE) or not (FALSE)
     *
     * @access private
     * @var boolean $_domain_literal
     */
    private $_domain_literal = FALSE;

    /**
     * Comments and folding white spaces are either allowed (TRUE) or not (FALSE)
     *
     * @access private
     * @var boolean $_cfws
     */
    private $_cfws = FALSE;

    /**
     * Set the email address and turn on the relevant standard if required
     *
     * @access public
     * @param string $email_address
     * @param null|integer $standard
     */
    public function __construct($email_address, $standard = NULL)
    {

      // Set the email address
      $this->_email_address = $email_address;

      // Set the relevant standard or throw an exception if an unknown is requested
      switch ($standard)
      {

        // Keep the default settings if no standard requested
        case NULL:
          break;

        // Otherwise if RFC 5321 requested
        case 5321:
          $this->setStandard5321();
          break;

        // Otherwise if RFC 5322 requested
        case 5322:
          $this->setStandard5322();
          break;

        // Otherwise throw an exception
        default:
          throw new \Exception('Unknown RFC standard for email address validation.');

      }

    }

    /**
     * Call the constructor fluently
     *
     * @access public
     * @static
     * @param string $email_address
     * @param null|integer $standard
     * @return \Models\Validators\EmailAddressValidator
     */
    public static function setEmailAddress($email_address, $standard = NULL)
    {
      return new self($email_address, $standard);
    }

    /**
     * Validate the email address using a basic standard
     *
     * @access public
     * @return \Models\Validators\EmailAddressValidator
     */
    public function setStandardBasic()
    {

      // A quoted string local part is not allowed
      $this->_quoted_string = FALSE;

      // An obsolete local part is not allowed
      $this->_obsolete = FALSE;

      // A basic domain name is required
      $this->_basic_domain_name = TRUE;

      // A domain literal domain is not allowed
      $this->_domain_literal = FALSE;

      // Comments and folding white spaces are not allowed
      $this->_cfws = FALSE;

      // Return the \Models\Validators\EmailAddressValidator object
      return $this;

    }

    /**
     * Validate the email address using RFC 5321
     *
     * @access public
     * @return \Models\Validators\EmailAddressValidator
     */
    public function setStandard5321()
    {

      // A quoted string local part is allowed
      $this->_quoted_string = TRUE;

      // An obsolete local part is not allowed
      $this->_obsolete = FALSE;

      // A basic domain name is not required
      $this->_basic_domain_name = FALSE;

      // A domain literal domain is allowed
      $this->_domain_literal = TRUE;

      // Comments and folding white spaces are not allowed
      $this->_cfws = FALSE;

      // Return the \Models\Validators\EmailAddressValidator object
      return $this;

    }

    /**
     * Validate the email address using RFC 5322
     *
     * @access public
     * @return \Models\Validators\EmailAddressValidator
     */
    public function setStandard5322()
    {

      // A quoted string local part is disallowed
      $this->_quoted_string = FALSE;

      // An obsolete local part is allowed
      $this->_obsolete = TRUE;

      // A basic domain name is not required
      $this->_basic_domain_name = FALSE;

      // A domain literal domain is allowed
      $this->_domain_literal = TRUE;

      // Comments and folding white spaces are allowed
      $this->_cfws = TRUE;

      // Return the \Models\Validators\EmailAddressValidator object
      return $this;

    }

    /**
     * Either allow (TRUE) or do not allow (FALSE) a quoted string local part
     *
     * @access public
     * @param boolean $allow
     * @return \Models\Validators\EmailAddressValidator
     */
    public function setQuotedString($allow = TRUE)
    {

      // Either allow (TRUE) or do not allow (FALSE) a quoted string local part
      $this->_quoted_string = $allow;

      // Return the \Models\Validators\EmailAddressValidator object
      return $this;

    }

    /**
     * Either allow (TRUE) or do not allow (FALSE) an obsolete local part
     *
     * @access public
     * @param boolean $allow
     * @return \Models\Validators\EmailAddressValidator
     */
    public function setObsolete($allow = TRUE)
    {

      // Either allow (TRUE) or do not allow (FALSE) an obsolete local part
      $this->_obsolete = $allow;

      // Return the \Models\Validators\EmailAddressValidator object
      return $this;

    }

    /**
     * Either require (TRUE) or do not require (FALSE) a basic domain name
     *
     * @access public
     * @param boolean $allow
     * @return \Models\Validators\EmailAddressValidator
     */
    public function setBasicDomainName($allow = TRUE)
    {

      // Either require (TRUE) or do not require (FALSE) a basic domain name
      $this->_basic_domain_name = $allow;

      // Return the \Models\Validators\EmailAddressValidator object
      return $this;

    }

    /**
     * Either allow (TRUE) or do not allow (FALSE) a domain literal domain
     *
     * @access public
     * @param boolean $allow
     * @return \Models\Validators\EmailAddressValidator
     */
    public function setDomainLiteral($allow = TRUE)
    {

      // Either allow (TRUE) or do not allow (FALSE) a domain literal domain
      $this->_domain_literal = $allow;

      // Return the \Models\Validators\EmailAddressValidator object
      return $this;

    }

    /**
     * Either allow (TRUE) or do not allow (FALSE) comments and folding white spaces
     *
     * @access public
     * @param boolean $allow
     * @return \Models\Validators\EmailAddressValidator
     */
    public function setCFWS($allow = TRUE)
    {

      // Either allow (TRUE) or do not allow (FALSE) comments and folding white spaces
      $this->_cfws = $allow;

      // Return the \Models\Validators\EmailAddressValidator object
      return $this;

    }

    /**
     * Return the regular expression for a dot atom local part
     *
     * @access private
     * @return string
     */
    private function _getDotAtom()
    {
      return "([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*";
    }

    /**
     * Return the regular expression for a quoted string local part
     *
     * @access private
     * @return string
     */
    private function _getQuotedString()
    {
      return '"(?>[ !#-\[\]-~]|\\\[ -~])*"';
    }

    /**
     * Return the regular expression for an obsolete local part
     *
     * @access private
     * @return string
     */
    private function _getObsolete()
    {

      return '([!#-\'*+\/-9=?^-~-]+|"(?>'
        . $this->_getFWS()
        . '(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*'
        . $this->_getFWS()
        . '")(?>'
        . $this->_getCFWS()
        . '\.'
        . $this->_getCFWS()
        . '(?1))*';

    }

    /**
     * Return the regular expression for a domain name domain
     *
     * @access private
     * @return string
     */
    private function _getDomainName()
    {

      // Return the basic domain name format if required
      if ($this->_basic_domain_name)
      {

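        // The label is captured so that the numbered subpatterns used by the
        // domain literal expressions ((?3) to (?6)) keep their positions
        return '(?>' . $this->_getDomainNameLengthLimit()
          . '([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)'
          . $this->_getCFWS()
          . '\.'
          . $this->_getCFWS()
          . '){1,2}[a-z]{2,6}';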

      }

      // Otherwise return the full domain name format
      return $this->_getDomainNameLengthLimit()
        . '([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>'
        . $this->_getCFWS()
        . '\.'
        . $this->_getDomainNameLengthLimit()
        . $this->_getCFWS()
        . '(?2)){0,126}';

    }

    /**
     * Return the regular expression for an IPv6 address
     *
     * @access private
     * @return string
     */
    private function _getIPv6()
    {
      return '([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?';
    }

    /**
     * Return the regular expression for an IPv4-mapped IPv6 address
     *
     * @access private
     * @return string
     */
    private function _getIPv6v4()
    {
      return '(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?';
    }

    /**
     * Return the regular expression for an IPv4 address
     *
     * @access private
     * @return string
     */
    private function _getIPv4()
    {
      return '(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}';
    }

    /**
     * Return the regular expression for a domain literal domain
     *
     * @access private
     * @return string
     */
    private function _getDomainLiteral()
    {

      return '\[(?:(?>IPv6:(?>'
        . $this->_getIPv6()
        . '))|(?>(?>IPv6:(?>'
        . $this->_getIPv6v4()
        . '))?'
        . $this->_getIPv4()
        . '))\]';

    }

    /**
     * Return either the regular expression for folding white spaces or its backreference
     *
     * @access private
     * @param boolean $define
     * @return string
     */
    private function _getFWS($define = FALSE)
    {

      // Return the backreference if $define is set to FALSE otherwise return the regular expression
      if ($this->_cfws)
      {
        return !$define ? '(?P>fws)' : '(?<fws>(?>[	 ]+(?>\x0D\x0A[	 ]+)*|(?>[	 ]*\x0D\x0A)?[	 ]+)?)';
      }

      // Otherwise return an empty string as folding white spaces are not allowed
      return '';

    }

    /**
     * Return the regular expression for comments
     *
     * @access private
     * @return string
     */
    private function _getComments()
    {

      return '(?<comment>\((?>'
        . $this->_getFWS()
        . '(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?P>comment)))*'
        . $this->_getFWS()
        . '\))';

    }

    /**
     * Return either the regular expression for comments and folding white spaces or its backreference
     *
     * @access private
     * @param boolean $define
     * @return string
     */
    private function _getCFWS($define = FALSE)
    {

      // Return the backreference if $define is set to FALSE
      if ($this->_cfws && !$define)
      {
        return '(?P>cfws)';
      }

      // Otherwise return the regular expression
      if ($this->_cfws)
      {

        return '(?<cfws>(?>(?>(?>'
          . $this->_getFWS(TRUE)
          . $this->_getComments()
          . ')+'
          . $this->_getFWS()
          . ')|'
          . $this->_getFWS()
          . ')?)';

      }

      // Otherwise return an empty string as comments and folding white spaces are not allowed
      return '';

    }

    /**
     * Establish and return the valid format for the local part
     *
     * @access private
     * @return string
     */
    private function _getLocalPart()
    {

      // The local part may be obsolete if allowed
      if ($this->_obsolete)
      {
        return $this->_getObsolete();
      }

      // Otherwise the local part must be either a dot atom or a quoted string if the latter is allowed
      if ($this->_quoted_string)
      {
        return '(?>' . $this->_getDotAtom() . '|' . $this->_getQuotedString() . ')';
      }

      // Otherwise the local part must be a dot atom
      return $this->_getDotAtom();

    }

    /**
     * Establish and return the valid format for the domain
     *
     * @access private
     * @return string
     */
    private function _getDomain()
    {

      // The domain must be either a domain name or a domain literal if the latter is allowed
      if ($this->_domain_literal)
      {
        return '(?>' . $this->_getDomainName() . '|' . $this->_getDomainLiteral() . ')';
      }

      // Otherwise the domain must be a domain name
      return $this->_getDomainName();

    }

    /**
     * Return the email address length limit
     *
     * @access private
     * @return string
     */
    private function _getEmailAddressLengthLimit()
    {
      return '(?!(?>' . $this->_getCFWS() . '"?(?>\\\[ -~]|[^"])"?' . $this->_getCFWS() . '){255,})';
    }

    /**
     * Return the local part length limit
     *
     * @access private
     * @return string
     */
    private function _getLocalPartLengthLimit()
    {
      return '(?!(?>' . $this->_getCFWS() . '"?(?>\\\[ -~]|[^"])"?' . $this->_getCFWS() . '){65,}@)';
    }

    /**
     * Establish and return the domain name length limit
     *
     * @access private
     * @return string
     */
    private function _getDomainNameLengthLimit()
    {
      return '(?!' . $this->_getCFWS() . '[a-z0-9-]{64,})';
    }

    /**
     * Check to see if the domain can be resolved to MX RRs
     *
     * @access private
     * @param array $domain
     * @return integer|boolean
     */
    private function _verifyDomain($domain)
    {

      // Return 0 if the domain cannot be resolved to MX RRs
      if (!checkdnsrr(end($domain), 'MX'))
      {
        return 0;
      }

      // Otherwise return TRUE
      return TRUE;

    }

    /**
     * Perform the validation check on the email address's syntax and, if required, call _verifyDomain()
     *
     * @access public
     * @param boolean $verify
     * @return boolean|integer
     */
    public function isValid($verify = FALSE)
    {

      // Return FALSE if the email address has an incorrect syntax
      if (!preg_match(

          '/^'
        . $this->_getEmailAddressLengthLimit()
        . $this->_getLocalPartLengthLimit()
        . $this->_getCFWS()
        . $this->_getLocalPart()
        . $this->_getCFWS()
        . '@'
        . $this->_getCFWS()
        . $this->_getDomain()
        . $this->_getCFWS(TRUE)
        . '$/isD'
        , $this->_email_address

      ))
      {
        return FALSE;
      }

      // Otherwise check to see if the domain can be resolved to MX RRs if required
      if ($verify)
      {
        return $this->_verifyDomain(explode('@', $this->_email_address));
      }

      // Otherwise return 1
      return 1;

    }

  }

On creating the object, using either \Models\Validators\EmailAddressValidator::setEmailAddress($email_address) or new \Models\Validators\EmailAddressValidator($email_address), the default settings allow dot-atom@domain-name email addresses where the domain name may only have two or three labels and the top-level domain must be between two and six (alphabetic) characters inclusive in length. If the second (optional) parameter is set to 5321 then a quoted string local-part and a domain literal domain are allowed, as well as a more liberal domain name format. If the second (optional) parameter is set to 5322 then an obsolete local-part, a domain literal domain, and comments and folding white spaces are allowed, as well as a more liberal domain name format. To add a format, call its associated method with either no parameter or a true parameter. To remove a format, call its associated method with a false parameter. The setStandardBasic(), setStandard5321(), and setStandard5322() methods do not accept any parameters. To return the validation check (either 1 for valid or false for invalid), use the isValid() method. The following is a list of available settings:

setStandardBasic()   // A dot-atom local-part and a domain name domain are allowed
setStandard5321()    // A dot-atom or quoted string local-part and a domain name or domain literal domain are allowed
setStandard5322()    // An obsolete local-part, a domain name or domain literal domain, and comments and folding white spaces are allowed
setQuotedString()    // A quoted string local-part is allowed
setObsolete()        // An obsolete local-part is allowed
setBasicDomainName() // A basic domain name is required
setDomainLiteral()   // A domain literal domain is allowed
setCFWS()            // Comments and folding white spaces are allowed

If you pass a true parameter to the isValid() method then the _verifyDomain() method will be called to check to see if the domain can be resolved to MX RRs, but only if the email address is syntactically valid. If the verification is successful then isValid() will return true; if the verification is unsuccessful then it will return 0.
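
The MX check itself can be sketched on its own, assuming the address has already passed the syntax check. The function name email_domain_has_mx() and its $resolver parameter are assumptions made for this example (the class calls checkdnsrr() directly); the parameter only exists so that the logic can be tried with a stub instead of a real DNS lookup:

```php
<?php

  // Hypothetical helper: $resolver defaults to PHP's checkdnsrr() but can be
  // replaced with a stub so that no DNS lookup is needed
  function email_domain_has_mx($email_address, $resolver = 'checkdnsrr')
  {

    // The domain is everything after the last "@" (a quoted local-part may contain "@")
    $position = strrpos($email_address, '@');

    // Return FALSE if there is no "@" at all
    if ($position === false)
    {
      return false;
    }

    // Otherwise ask the resolver whether the domain has MX RRs
    return (bool) call_user_func($resolver, substr($email_address, $position + 1), 'MX');

  }

  // Stub resolver for illustration: pretends that only example.com has MX RRs
  $stub = function ($domain, $type) {
    return $domain === 'example.com' && $type === 'MX';
  };

  var_dump(email_domain_has_mx('user@example.com', $stub));   // bool(true)
  var_dump(email_domain_has_mx('user@example.org', $stub));   // bool(false)
  var_dump(email_domain_has_mx('"a@b"@example.com', $stub));  // bool(true): the last "@" is used
```

Calling email_domain_has_mx($email_address) without the second argument would perform a real lookup, so the result then depends on the network and the DNS at the time of the call.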

Alternatively, for those who’d like just a simple regular expression which allows the majority of in-use email addresses, use the following:

  /**
   * Validate an email address using a basic standard
   *
   * @param string $email_address
   * @return integer
   */
  function is_valid_email_address($email_address)
  {
    return preg_match("/^(?!.{255,})(?!.{65,}@)([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*@(?!.*[^.]{64,})(?>[a-z0-9](?>[a-z0-9-]*[a-z0-9])?\.){1,2}[a-z]{2,6}$/iD", $email_address);
  }
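
As a usage sketch, the basic function can be used to trim a list of addresses. The function is repeated here only so that the example runs on its own, and the sample addresses are made up:

```php
<?php

  // Copied from the basic function above so that the example is self-contained
  function is_valid_email_address($email_address)
  {
    return preg_match("/^(?!.{255,})(?!.{65,}@)([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*@(?!.*[^.]{64,})(?>[a-z0-9](?>[a-z0-9-]*[a-z0-9])?\.){1,2}[a-z]{2,6}$/iD", $email_address);
  }

  // Made-up sample data
  $emails = array(
    'user@example.com',
    'user@example.co.uk',
    'user@localhost',
    'user..name@example.com'
  );

  // array_filter() keeps the entries for which the callback returns a truthy value (1)
  $valid = array_values(array_filter($emails, 'is_valid_email_address'));

  print_r($valid);  // user@example.com and user@example.co.uk remain
```

Bear in mind that this basic format only accepts domain names of two or three labels with an alphabetic top-level domain of two to six characters, so an address such as user@mail.example.co.uk is rejected by it.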

For the official documentation on email addresses, please see RFC 5321 and RFC 5322.


Sunday, December 20th, 2009 PHP

18 Comments to Email Address Validation

  • Rae says:

    I found a code snippet you posted on linuxjournal.com for email validation.

    return preg_match("/^(?=.{5,254})(?:(?:\"[^\"]{1,62}\")|(?:(?!\.)(?!.*\.[.@])[a-z0-9!#$%&'*+\/=?^_`{|}~^.-]{1,64}))@(?:(?:\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\])|(?:(?!-)(?!.*-\$)(?!.*-\.)(?!.*\.-)(?!.*[^n]--)(?!.*[^x]n--)(?!n--)(?!.*[^.]xn--)(?:[a-z0-9-]{1,63}\.){1,127}(?:[a-z0-9-]{1,63})))$/i", $email);

    Would this be equivalent to this class you have on this site?

  • Michael says:

    No, that snippet lacks support for quoted string local parts, obsolete local parts, and IPv6 domain literal domains. Plus, it isn’t very efficient. The new regular expression is:

      function is_email_address($email_address) {
        return preg_match('/^(?!(?>\x22?(?>\x22\x40|\x5C?[\x00-\x7F])\x22?){255,})(?!(?>\x22?\x5C?[\x00-\x7F]\x22?){65,}@)(?>[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+|(?>\x22(?>[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|\x5C[\x00-\x7F])*\x22))(?>\.(?>[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+|(?>\x22(?>[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|\x5C[\x00-\x7F])*\x22)))*@(?>(?>(?!.*[^.]{64,})(?>(?>xn--)?[a-z0-9]+(?>-[a-z0-9]+)*\.){0,126}(?>xn--)?[a-z0-9]+(?>-[a-z0-9]+)*)|(?:\[(?>(?>IPv6:(?>(?>[a-f0-9]{1,4}(?>:[a-f0-9]{1,4}){7})|(?>(?!(?:.*[a-f0-9][:\]]){8,})(?>[a-f0-9]{1,4}(?>:[a-f0-9]{1,4}){0,6})?::(?>[a-f0-9]{1,4}(?>:[a-f0-9]{1,4}){0,6})?)))|(?>(?>IPv6:(?>(?>[a-f0-9]{1,4}(?>:[a-f0-9]{1,4}){5}:)|(?>(?!(?:.*[a-f0-9]:){6,})(?>[a-f0-9]{1,4}(?>:[a-f0-9]{1,4}){0,4})?::(?>[a-f0-9]{1,4}(?>:[a-f0-9]{1,4}){0,4}:)?)))?(?>25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?>25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}))\]))$/isD', $email_address);
      }
  • Rasmus says:

    This seems to miss the trivial case of a@foo which according to the RFCs and Dominic Sayers test cases should be invalid – http://www.dominicsayers.com/isemail/results.php

  • Michael says:

    Rasmus says: “This seems to miss the trivial case of a@foo which according to the RFCs and Dominic Sayers test cases should be invalid”.

    This is not the case. RFC 5321 says that “a domain name (or often just a “domain”) consists of one or more components, separated by dots if more than one appears. In the case of a top-level domain used by itself in an email address, a single string is used without any dots.”

    And a real-life example:

    checkdnsrr('ai', 'MX') // Returns true
    getmxrr('ai', $array) // Returns true
  • Rasmus says:

    Yes, but read further in that RFC you quoted. Section 2.3.5 says:

    Only resolvable, fully-qualified domain names (FQDNs) are permitted
    when domain names are used in SMTP. In other words, names that can
    be resolved to MX RRs or address (i.e., A or AAAA) RRs (as discussed
    in Section 5) are permitted, as are CNAME RRs whose targets can be
    resolved, in turn, to MX or address RRs. Local nicknames or
    unqualified names MUST NOT be used.

    Both Cal Henderson’s and Dominic Sayers’ validators label addresses of this form invalid.

  • Rasmus says:

    I guess the key here is the SMTP part. Are you checking for valid email addresses routeable on the public Internet or not. If you are, then this test has to fail as you can’t deliver to a TLD and if it isn’t a TLD then it must be a local nickname which is specifically disallowed by RFC 5321.

  • Rasmus says:

    So, here is my slightly modified version that enforces public Internet addresses:

    '/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/i'

  • Michael says:

    Some single-label domains can be resolved to MX RRs:

    checkdnsrr('ua', 'MX') // Returns true
    getmxrr('ua', $array) // Returns true

    And some examples of single-label domain email addresses:

    vince@ai
    paul@io
    root@km
    joost@tk
    admin@tt
    hostmaster@ua

    Source: Tony Finch – TLDs with MXs

  • Rasmus says:

    Right, I should have said cannot reliably be delivered to because of potential local hostname conflicts. There are actually quite a few RFC-defying addresses that work. gmail will happily deliver to foo.@example.com as well, for example, but that is also obviously an invalid address according to the RFCs. Of course, the HTML5 guys are allowing bogus addresses like that now. Check out their ABNF at: http://bit.ly/9eOdUm

  • Michael says:

    That doesn’t change the fact that they are syntactically valid (and that they are also in use). My code intends to allow every RFC 5322/5321 valid email address (excluding semantically invisible (unnecessary) content like folding white space and comments) and deny every invalid email address.

    If I am to also allow invalid email addresses (foo.@example.com, as you offer), then the entire purpose of an email address validator is moot.

    However, I accept that the RFCs allow impractical addresses like "My name".is."Michael"@[IPv6:FFFF::255.255.255.255], which is why I have also made a class which allows the developer to allow or disallow different types of local-parts and domains. Taking into account your replies, I have now also included a method to turn on and off single-label domain names (SetStrictDomain()).

  • james says:

    This code won’t run – there’s no SetStrictAtom function set. When you call the constructor, php throws an error. This is when you set strict = false.

    $x = EmailAddressValidator::SetEmailAddress($contact_email,false);

  • Michael says:

    Thanks for bringing that to my attention, James. I’m not sure how I missed that, but I’ve put it in.

  • Martin says:

    Hello,

    i dont understand this part.
    “Additionally, the top level domain must not begin with a number.”

    As far as i know 911.com is starting with a number and is a valid (and working) TLD.

  • Martin says:

    hahaha … fuck … forget it…

    mixed up domain and top level domain …
    Sorry. :)

  • Michael says:

    It turns out this regex is now the basis for PHP’s in-built filter_var() function. See source code

  • Chris says:

    Michael, very interesting but by the end I’m slightly confused. Is there some PHP code that I could use to trap, say 99% of false emails. It doesn’t have to be perfect but just easy to implement. I just want to ‘trim’ an email list before sending. Thanks in advance.

  • Michael says:

    If you have an array ($emails) of email addresses, try the following code:

    foreach ($emails as $key => $email) {
    
      if (EmailAddressValidator::SetEmailAddress($email, false)->Validate()) {
        mail($email, $subject, $message, $headers);
      }
    
      else {
        unset($emails[$key]);
      }
    
    }
    

    That will send an email to the valid addresses and remove from the array the invalid addresses.

  • [...] So a fairly complex regex is applied to the email address. The link given in the comment also leads to a somewhat more detailed explanation: Email Address Validator. [...]
