2009.03.11 @16:27 

The best regexp possible for email validation even in javascript

For a while now I have occasionally looked at this page when I needed a good regexp to validate emails: http://fightingforalostcause.net/misc/2006/compare-email-regex.php

I thought the idea was neat: collect some of the most used regexps on the web and compare them with a good unit test built on an interesting set of valid and invalid emails.

I totally agree with Ian that "It's my philosophy that it's better to accept a few invalid addresses than reject any valid ones, so I'm shooting for 0 false-positives and as few false-negatives as possible."

Sadly, his winning regexp does not seem to work in JavaScript, because it relies on advanced regexp features that JavaScript does not support.

And since the second one on his list seemed simpler and easier to enhance, while not being far from the finish line...

Here is my attempt to sanitize the world of email addresses:

based on Warren Gaebel's regexp

Here are the results :

Should be Valid:

l3tt3rsAndNumb3rs@domain.com : Valid
has-dash@domain.com : Valid
hasApostrophe.o'leary@domain.org : Valid
uncommonTLD@domain.museum : Valid
uncommonTLD@domain.travel : Valid
uncommonTLD@domain.mobi : Valid
countryCodeTLD@domain.uk : Valid
lettersInDomain@911.com : Valid
underscore_inLocal@domain.net : Valid
IPInsteadOfDomain@127.0.0.1 : Valid
IPAndPort@127.0.0.1:25 : Valid
subdomain@sub.domain.com : Valid
local@dash-inDomain.com : Valid
dot.inLocal@foo.com : Valid
a@singleLetterLocal.org : Valid
singleLetterDomain@x.org : Valid
&*=?^+{}'~@validCharsInLocal.net : Valid

 

Should NOT be Valid:

missingDomain@.com : Not Valid
@missingLocal.org : Not Valid
missingatSign.net : Not Valid
missingDot@com : Not Valid
two@@signs.com : Not Valid
colonButNoPort@127.0.0.1: : Not Valid
  : Not Valid
someone-else@127.0.0.1.26 : Not Valid
.localStartsWithDot@domain.com : Not Valid
localEndsWithDot.@domain.com : Not Valid
two..consecutiveDots@domain.com : Not Valid
domainStartsWithDash@-domain.com : Not Valid
domainEndsWithDash@domain-.com : Valid
TLDDoesntExist@domain.moc : Not Valid
numbersInTLD@domain.c0m : Not Valid
missingTLD@domain. : Not Valid
! "#$%(),/;<>[]`|@invalidCharsInLocal.org : Not Valid
invalidCharsInDomain@! "#$%(),/;<>_[]`|.org : Not Valid
local@SecondLevelDomainNamesAreInvalidIfTheyAreLongerThan64Charactersss.org : Valid

 

Javascript Unit Test code

This way you can use the same regexp code in several languages including Javascript.
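As an illustration of how such a unit test can be structured, here is a minimal sketch. The pattern `placeholderRegexp` and the function `runEmailTests` are hypothetical names introduced for this example; the pattern is a deliberately simplified stand-in and NOT the regexp discussed in this post. Only a handful of the addresses listed above are used.

```javascript
// Hypothetical placeholder pattern, NOT the regexp from this post:
// a local part without spaces or "@", an "@", then a dotted domain.
var placeholderRegexp = /^[^\s@]+@[^\s@.]+(\.[^\s@.]+)+$/;

// Each case pairs an address with the verdict we expect for it.
var cases = [
  { email: "has-dash@domain.com", expected: true },
  { email: "subdomain@sub.domain.com", expected: true },
  { email: "@missingLocal.org", expected: false },
  { email: "missingatSign.net", expected: false },
  { email: "two@@signs.com", expected: false }
];

// Runs every case and returns the addresses whose verdict
// differs from the expected one.
function runEmailTests(regexp, testCases) {
  var failures = [];
  for (var i = 0; i < testCases.length; i++) {
    var actual = regexp.test(testCases[i].email);
    if (actual !== testCases[i].expected) {
      failures.push(testCases[i].email);
    }
  }
  return failures;
}

var failures = runEmailTests(placeholderRegexp, cases);
console.log(failures.length === 0 ? "all cases pass" : "failed: " + failures.join(", "));
```

Swapping in another regexp only requires changing the first argument of `runEmailTests`, which makes it easy to compare several candidates against the same list.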

Note that it can be improved, but the "domain-" case is a bit painful to fix in a simple way. It is easy to forbid a domain from ending with a "-" if you split each (sub)domain into three parts, but you would then reject domain names with only one character. Resolving this without advanced tricks (lookahead/lookbehind) seems trickier.
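To make the trade-off concrete, here is a small sketch on a single domain label, taken in isolation. Both patterns are illustrations written for this example and are not excerpts of the regexp above. The second one shows one possible way around the problem that needs no lookahead: the middle and last parts are wrapped in an optional non-capturing group. Whether it fits cleanly into the full regexp is left untested here.

```javascript
// Three mandatory parts: first char, middle, last char.
// Needs at least two characters, so the "x" of "x.org" is rejected.
var threePartLabel = /^[a-z0-9][a-z0-9-]*[a-z0-9]$/i;

// Same idea, but the middle and last parts are grouped and optional,
// so a single character is accepted while a trailing "-" is still refused.
var optionalTailLabel = /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/i;

var labels = ["domain", "x", "dash-inDomain", "domain-", "-domain"];
for (var i = 0; i < labels.length; i++) {
  // Prints the label followed by the verdict of each pattern.
  console.log(labels[i], threePartLabel.test(labels[i]), optionalTailLabel.test(labels[i]));
}
```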

Another thing to keep in mind is that it uses a list of predefined TLDs: when new TLDs are created, the regexp needs to be updated, otherwise it will flag the new TLDs as invalid. On the other hand, it's neat to be able to catch typos in domain names such as "@xx.infos" where it should have been "@xx.info".

So don't forget to add to the list whenever a new TLD comes in (longer than two characters).
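One way to keep that maintenance cheap is to hold the TLDs in an array and build the pattern from it. The sketch below only checks the end of a domain. The names `longTlds` and `buildTldRegexp` are hypothetical, the list is deliberately incomplete, and the assumption that two-letter TLDs are matched generically is inferred from the remark above, not taken from the actual regexp.

```javascript
// Hypothetical, deliberately incomplete list of TLDs longer than two
// characters; two-letter TLDs are assumed to be matched by [a-z]{2}.
var longTlds = ["com", "org", "net", "info", "museum", "travel", "mobi"];

// Builds a pattern matching the end of a domain: a dot followed by
// either any two letters or one of the listed TLDs.
function buildTldRegexp(tlds) {
  return new RegExp("\\.(?:[a-z]{2}|" + tlds.join("|") + ")$", "i");
}

var tldRegexp = buildTldRegexp(longTlds);
console.log(tldRegexp.test("xx.info"));    // true
console.log(tldRegexp.test("xx.infos"));   // false: typo caught
console.log(tldRegexp.test("domain.moc")); // false: not in the list
```

Supporting a new TLD then comes down to adding one string to the array and rebuilding the pattern.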

If you don't want to have to update your old code, then change:

to :

This works, but it obviously adds a false negative in the unit test, since "moc" does not really exist on the net.