The best regexp possible for email validation even in javascript
For a while I regularly checked this page: http://fightingforalostcause.net/misc/2006/compare-email-regex.php
for a good regexp to validate emails.
I thought the idea was neat: take some of the most used regexps on the web and compare them with a good unit test built on an interesting set of valid and invalid emails.
I totally agree with Ian that "It's my philosophy that it's better to accept a few invalid addresses than reject any valid ones, so I'm shooting for 0 false-positives and as few false-negatives as possible."
Sadly, his winning regexp does not seem to work in JavaScript because it relies on advanced regexp features that JavaScript does not support.
And since the second one on his list seemed simpler and easier to enhance while not far from the finishing line...
Here is my attempt to sanitize the world of email addresses :
(based on Warren Gaebel's regexp)
Here are the results :
Should be Valid:
| l3tt3rsAndNumb3rs@domain.com | : | Valid |
| has-dash@domain.com | : | Valid |
| hasApostrophe.o'leary@domain.org | : | Valid |
| uncommonTLD@domain.museum | : | Valid |
| uncommonTLD@domain.travel | : | Valid |
| uncommonTLD@domain.mobi | : | Valid |
| countryCodeTLD@domain.uk | : | Valid |
| lettersInDomain@911.com | : | Valid |
| underscore_inLocal@domain.net | : | Valid |
| IPInsteadOfDomain@127.0.0.1 | : | Valid |
| IPAndPort@127.0.0.1:25 | : | Valid |
| subdomain@sub.domain.com | : | Valid |
| local@dash-inDomain.com | : | Valid |
| dot.inLocal@foo.com | : | Valid |
| a@singleLetterLocal.org | : | Valid |
| singleLetterDomain@x.org | : | Valid |
| &*=?^+{}'~@validCharsInLocal.net | : | Valid |
Should be NOT Valid :
| missingDomain@.com | : | Not Valid |
| @missingLocal.org | : | Not Valid |
| missingatSign.net | : | Not Valid |
| missingDot@com | : | Not Valid |
| two@@signs.com | : | Not Valid |
| colonButNoPort@127.0.0.1: | : | Not Valid |
| (empty string) | : | Not Valid |
| someone-else@127.0.0.1.26 | : | Not Valid |
| .localStartsWithDot@domain.com | : | Not Valid |
| localEndsWithDot.@domain.com | : | Not Valid |
| two..consecutiveDots@domain.com | : | Not Valid |
| domainStartsWithDash@-domain.com | : | Not Valid |
| domainEndsWithDash@domain-.com | : | Valid |
| TLDDoesntExist@domain.moc | : | Not Valid |
| numbersInTLD@domain.c0m | : | Not Valid |
| missingTLD@domain. | : | Not Valid |
| ! "#$%(),/;<>[]`|@invalidCharsInLocal.org | : | Not Valid |
| invalidCharsInDomain@! "#$%(),/;<>_[]`|.org | : | Not Valid |
| local@SecondLevelDomainNamesAreInvalidIfTheyAreLongerThan64Charactersss.org | : | Valid |
Javascript Unit Test code
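A minimal sketch of such a test harness follows. The pattern `emailRe` is a hypothetical, much simplified stand-in and not the regexp discussed in this post. The helper name `runEmailTests` is also an assumption. The addresses are a small subset of the tables above.

```javascript
// Hypothetical stand-in pattern, NOT the regexp discussed in this post:
// it only checks "local@label.label" with a purely alphabetic last label.
const emailRe = /^[^\s@]+@[^\s@.]+(?:\.[^\s@.]+)*\.[a-z]{2,}$/i;

// A few addresses taken from the result tables above.
const shouldBeValid = [
  "l3tt3rsAndNumb3rs@domain.com",
  "has-dash@domain.com",
  "subdomain@sub.domain.com",
  "a@singleLetterLocal.org",
];
const shouldBeInvalid = [
  "@missingLocal.org",
  "missingatSign.net",
  "missingDot@com",
  "two@@signs.com",
  "missingTLD@domain.",
];

// Run the pattern over both lists and collect every mismatch.
function runEmailTests(re, valid, invalid) {
  const failures = [];
  for (const address of valid) {
    if (!re.test(address)) failures.push({ address, expected: "Valid" });
  }
  for (const address of invalid) {
    if (re.test(address)) failures.push({ address, expected: "Not Valid" });
  }
  return failures;
}

const failures = runEmailTests(emailRe, shouldBeValid, shouldBeInvalid);
console.log(failures.length === 0 ? "all tests passed" : failures);
```

Swapping in another pattern only requires changing `emailRe`; the lists and the runner stay the same.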
This way you can use the same regexp code in several languages including Javascript.
Note that it can be improved, but the "domain-" case is a bit painful to fix in a simple way. It is easy to forbid a domain ending with "-" if you split each (sub)domain into three parts, but then you would also reject domain names with only one character. Resolving this without advanced tricks (lookahead/lookbehind) seems trickier.
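One possible direction, offered as a sketch rather than a drop-in fix for the regexp above: keep the first character mandatory and wrap the middle and last parts of each label in an optional non-capturing group. This assumes an optional group counts as simple enough. The names `label` and `domainRe` are hypothetical, and the pattern only checks a domain on its own, not a whole address.

```javascript
// One DNS-style label: starts and ends with a letter or digit,
// with letters, digits or dashes allowed in between.
// The optional group lets a single character match on its own.
const label = "[a-z0-9](?:[a-z0-9-]*[a-z0-9])?";

// Hypothetical helper pattern: two or more labels separated by dots.
const domainRe = new RegExp("^" + label + "(?:\\." + label + ")+$", "i");

console.log(domainRe.test("x.org"));        // true  (single-character label)
console.log(domainRe.test("dash-in.com"));  // true  (dash in the middle)
console.log(domainRe.test("domain-.com"));  // false (label ends with a dash)
console.log(domainRe.test("-domain.com"));  // false (label starts with a dash)
```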
Another thing to keep in mind is that it uses a list of predefined TLDs: when new TLDs are created, the regexp needs to be updated, otherwise it will flag the new TLDs as invalid. On the other hand, it's neat to be able to catch typos in domain names such as "@xx.infos" where it should have been "@xx.info".
So don't forget to add to the list if anything new comes in (longer than two characters).
If you don't want to update your old code, then change:
to :
This works (but obviously adds a false negative in the unit test, since "moc" does not really exist on the net).
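The trade-off between a fixed TLD list and a generic pattern can be sketched as follows. The list `knownTlds` is illustrative only and is not the list used by the regexp in this post. The helper `endsWithTld` is a hypothetical function that looks at nothing but the end of the address.

```javascript
// Illustrative TLD list only (an assumption, not the post's actual list).
const knownTlds = ["com", "org", "net", "info", "museum", "travel", "mobi"];

// Strict variant: any two-letter country code, plus the explicit list.
const strictTld = "(?:[a-z]{2}|" + knownTlds.join("|") + ")";
// Lenient variant: any alphabetic TLD of two or more letters.
const lenientTld = "[a-z]{2,}";

// Hypothetical helper that only checks how an address ends.
function endsWithTld(address, tldPattern) {
  return new RegExp("\\." + tldPattern + "$", "i").test(address);
}

console.log(endsWithTld("someone@xx.info", strictTld));            // true
console.log(endsWithTld("someone@xx.infos", strictTld));           // false (typo caught)
console.log(endsWithTld("someone@xx.infos", lenientTld));          // true  (typo missed)
console.log(endsWithTld("TLDDoesntExist@domain.moc", lenientTld)); // true  (the extra "Valid" in the unit test)
```

The strict variant needs maintenance whenever a new TLD appears; the lenient one never does, at the cost of accepting made-up TLDs.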