Should Rails be able to validate email out of the box?

We’ve all been there. You’re making a new Rails app, and you’re at the point where you have to validate email input. Uh oh. It’s not out of the box, and you have to go scouring the internet for either a Regex or a gem. — That, or if you’re methodical/experienced enough, you already have a go-to.

We’re leaving many users on their own to assess what a good email address should be, what rules to accept, etc. Those questions probably deserve some thought depending on the use case. But I think this is a case of over-configuration. Choosing among the many gems available imposes quite a cognitive load for something that is neither critical nor that valuable.

I believe validating email is a task common enough that it should be included in our API. Would such a pull request be welcome? If so, what would be important to get right?

We might want to include a reasonable format by convention, and let the people who really care configure at their leisure.

As to the logic, I would say one of the main goals is to prevent accidental user error. A few years back, I thought I was home free with a popular gem, when a few emails resembling first..last@gmail.com bounced with syntax errors on sending.

Here’s what I believe would catch a lot of entry mistakes while accepting 99.9% of submissions:

  • Disallow special characters besides commonly used ones (. _ - +)
  • Disallow quoting the local part ("<xyz>"@domain.com)
  • Disallow repetition of one non-alpha character (eg. .., __, etc.)
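To make the three rules above concrete, here is a minimal plain-Ruby sketch. The ConventionalEmail module and its constant names are hypothetical, and the domain check (letters, digits, hyphens, at least one dot) is an assumption of mine that the rules above do not spell out. The repetition rule is applied to the local part only, so domains containing "--" are not rejected by it.

```ruby
# Hypothetical helper illustrating the three rules; not an existing Rails API.
module ConventionalEmail
  # Local part: letters and digits plus the four common symbols . _ - +
  # Quotes are not in this set, so a quoted local part is rejected.
  LOCAL = /\A[A-Za-z0-9._+-]+\z/
  # Assumption: dot-separated labels of letters, digits and hyphens.
  DOMAIN = /\A[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+\z/
  # The same symbol twice in a row, such as ".." or "__".
  REPEATED_SYMBOL = /([._+-])\1/

  def self.valid?(address)
    parts = address.to_s.split("@", -1)
    return false unless parts.size == 2 # exactly one at-sign

    local, domain = parts
    local.match?(LOCAL) && !local.match?(REPEATED_SYMBOL) && domain.match?(DOMAIN)
  end
end
```

With this sketch, ConventionalEmail.valid?("first..last@gmail.com") is false while ConventionalEmail.valid?("user+tag@gmail.com") is true.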

I could conceivably see ways to add or remove certain validation rules.

Now I’m sure there will be plenty of opinions and disagreement. I’m just trying to see if there’s some sort of consensus on a reasonable convention that works for most end-developers and lets them move on to more interesting problems and be happy.

As a reminder, my main aim when I develop email validation is to be able to notify the user that what they’ve entered might not be what they wanted to enter.

Jonathan Allard via Rails rubyonrails@discoursemail.com writes:

We’ve all been there. You’re making a new Rails app, and you’re at the point where you have to validate email input. Uh oh. It’s not out of the box, and you have to go scouring the internet for either a Regex or a gem. — That, or if you’re methodical/experienced enough, you already have a go-to.

We’re leaving many users on their own to assess what a good email address should be, what rules to accept, etc. Those questions probably deserve some thought depending on the use case. But I think this is a case of over-configuration. Choosing among the many gems available imposes quite a cognitive load for something that is neither critical nor that valuable.

I believe validating email is a task common enough that it should be included in our API. Would such a pull request be welcome? If so, what would be important to get right?

Yes, we’ve all been there for sure. But what I found out after years of working with other people is that everyone has their own opinion about how an email should be validated.

I’m the kind of guy who thinks that the simplest validation is enough for an email. It’s hard to handle every possible case, and you can be sure that at some point you’ll have a user with a weird but still valid email address.

We might want to include a reasonable format by convention, and let the people who really care configure at their leisure.

As to the logic, I would say one of the main goals is to prevent accidental user error. A few years back, I thought I was home free with a popular gem, when a few emails resembling first..last@gmail.com bounced with syntax errors on sending.

Here’s what I believe would catch a lot of entry mistakes while accepting 99.9% of submissions:

  • Disallow special characters besides commonly used ones (. _ - +)

This is a valid email address ~user/list-name@lists.sr.ht.

  • Disallow quoting the local part ("<xyz>"@domain.com)
  • Disallow repetition of one non-alpha character (eg. .., __, etc.)

I think that validating all possible valid email formats will be a bit more challenging.

If you simply ignore the 0.1% edge cases then you’ll end up having issues popping on the Rails issue tracker complaining that the provided email validation is buggy.

I could conceivably see ways to add or remove certain validation rules.

Now I’m sure there will be plenty of opinions and disagreement. I’m just trying to see if there’s some sort of consensus on a reasonable convention that works for most end-developers and lets them move on to more interesting problems and be happy.

As a reminder, my main aim when I develop email validation is to be able to notify the user that what they’ve entered might not be what they wanted to enter.

I have to admit that I often ask myself why there is no validation for email addresses available out of the box, but then I remember there are so many opinions out there about this topic that it would be hard to find something that would please everyone.

Consensus aside, let’s recall that Rails is an opinionated framework and that some controversial decisions were made in the past. So maybe, for a non-critical topic such as email format validation, we can provide a validation. It would please a lot of developers for sure.

I’ve definitely been there on every one of my apps! In fact, I’m dealing with an edge case right now! I think it would be great to include this in the Rails API.

If you simply ignore the 0.1% edge cases then you’ll end up having issues popping on the Rails issue tracker complaining that the provided email validation is buggy.

I totally agree, but is it possible to get to 100% email validation by engaging the Rails community in improving the validation?

The way I have seen this sort of feature addition happen to Rails in the past is as follows:

  1. One or more gems are introduced that explore the feature. (check, there are LOTS of these)
  2. One or two make it to the top of the usage/contributions/versions heap. (unsure where the market stands)
  3. DHH or someone similar says, “Hey! We should Sherlock this!”
  4. It becomes part of the “Omakase” offering that is Rails.

I’ve also seen it go the other way, where functionality is stripped out of the core, anchored in an “optional” add-on gem, and used and maintained by a fervent few. (See ActiveResource.)

There is a deliberate tension between having an Everything Ready To Go framework, and having a Minimal, Composable framework. Neither extreme is going to satisfy everyone. Somewhere in the middle is an acceptable trade-off.

Also, recall: Devise’s default e-mail validator (very widely used) simply checks to see if the e-mail address contains a [string, @, string] sandwich in there somewhere.
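For illustration, a "string, @, string" check of that kind fits in one line of Ruby. The pattern below is what I understand Devise's default email_regexp to be in recent versions, so please check it against the version you actually use; the lax_email? helper name is made up. Note that it is anchored at both ends, so it is slightly stricter than "contains a sandwich somewhere".

```ruby
# Lax check: something without "@" or whitespace, an "@", then more of the same.
# Assumption: this mirrors Devise's default email_regexp; verify for your version.
LAX_EMAIL = /\A[^@\s]+@[^@\s]+\z/

# Hypothetical helper name, used only for this example.
def lax_email?(address)
  address.to_s.match?(LAX_EMAIL)
end
```

A check like this accepts first..last@gmail.com and user@localhost, and rejects only input with no at-sign, several at-signs, or whitespace.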

Walter

Maybe @dhh will have an abstraction from Hey for Rails!

Maybe! The way I have seen this play out, at least in the project I was involved with (VanillaJS, becoming RailsJS) was that the idea was implemented in core, but the code was refactored or wholesale replaced. I had wanted to mention that part, around step 4, but forgot to include it.

Walter

Thanks for your thoughtful points!

Maybe so. I’m not sure what share of developers really care about the format of emails beyond a few “reasonable” rules.

Besides, nobody would be forced to use the built-in validators. If you don’t use them and keep using current gems, it’s all good! I believe we should just be offering a simple option.

Proposed goals: reduce developer time spent on email validation, and help end users catch their own mistakes

You are correct. The thing is, I’ve seen real-world examples of this kind of validation not being very useful for catching user mistakes, leading to not being able to reach them.

Let me summarize my aims once more:

  • Reasonable defaults for people who don’t care, that work for >99.9% of users in general public applications
  • Easily configurable options for people who do care
  • Using defaults catches common mistakes by end users and helps them enter a reachable email

My aim is not (unfortunately for some) to allow every single obscure spec-valid address to pass validation, at least not with the “boring” validator. Still, I believe we could get there with a terse syntax.

The cognitive load of searching for, evaluating, and installing one more gem

What I’m also trying to avoid is the gem search headache, and installing yet another gem™.

I understand that we don’t want the library to be too large. However I would say that this is one of the features that are both frequent and light enough to amply justify it.

That’s a good point. Honestly, I was on the verge of making a gem before coming here. What’s holding me back is that this would make yet one more gem, which I’m not sure is very helpful to findability. It’s basically—unfortunately—a marketing problem. (The Standards XKCD also comes to mind)

As someone who’s particularly sensitive to (brain) working memory and cognitive load issues, writing one more gem wouldn’t help me a lot if I wasn’t the author and had to find it at the bottom of a Google search.

A proposed model: configurable strictness and specific rules

If I look at the email_address gem, several “types” of formats are mentioned, some stricter and some less strict.

I see two configuration models here:

  • Choose your own level of “strictness”
  • Allow or disallow certain “rules”

That follows my mantra that libraries should let you care as much (or as little) as you want. (Basically convention over configuration, with easy configuration.)

I’m pretty sure we can boil most of your validation preferences down to a few keywords.

So one could write:

class Record < ApplicationRecord
  # Simplest
  validates_email_syntax_of :email

  # More opinions
  validates_email_syntax_of :email, strictness: :only_require_at_sign

  validates_email_syntax_of :email, allow: :consecutive_symbols
end

So for instance, we could write something like:

validates_email_syntax_of :email, allow: :extended_symbols
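To make those keywords a bit more concrete, here is a plain-Ruby sketch of how strictness: and allow: might map onto checks. Everything in it is hypothetical: the EmailSyntax module, the :conventional default, and the short sample of "extended" symbols are placeholders of mine, not an existing Rails API. Beyond the at-sign check, it only looks at the local part.

```ruby
# Hypothetical sketch; none of these names exist in Rails.
module EmailSyntax
  AT_SIGN_ONLY     = /\A[^@\s]+@[^@\s]+\z/
  BASIC_SYMBOLS    = %w[. _ - +].freeze
  EXTENDED_SYMBOLS = %w[~ / ! = ?].freeze # assumption: an arbitrary sample

  def self.valid?(address, strictness: :conventional, allow: [])
    address = address.to_s
    return false unless address.match?(AT_SIGN_ONLY)
    return true if strictness == :only_require_at_sign

    allow = Array(allow) # accept a single symbol or a list of symbols
    symbols = BASIC_SYMBOLS.dup
    symbols += EXTENDED_SYMBOLS if allow.include?(:extended_symbols)

    chars = address.split("@", 2).first.chars
    # Every character must be a letter, a digit, or an allowed symbol.
    return false unless chars.all? { |c| c.match?(/[A-Za-z0-9]/) || symbols.include?(c) }

    # By default, the same symbol may not appear twice in a row.
    unless allow.include?(:consecutive_symbols)
      return false if chars.each_cons(2).any? { |a, b| a == b && symbols.include?(a) }
    end
    true
  end
end
```

The point of the sketch is only that a handful of keywords can select between a few fixed behaviours, so the caller never has to write or read a regex.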

Enumerating the different strictness levels and rules

So, what are the clusters of opinions on emails? If I say there are two types of developers, those who only need an at-sign and those who only allow conventional-looking emails, am I right? Surely not, but the exercise is just to think about the landscape of what’s allowable and how we can model/simplify it down to a few clusters of opinion, plus some small variations (rules).

Time well spent?

In the end, and this is purely opinion, I do not think it is time well spent as a developer to take a few hours to think about obscure email formats. Nor do I think it’s a good idea to have every Rails developer either think about it, or take a half hour to search-evaluate-install-configure an email gem.

I don’t even think it deserves this many opinions, at least for most (unspecialized) cases. Okay, maybe two or three schools of thought. But for every developer to have their own homebrewed version of email validity is too much in my opinion. The spec doesn’t cut it with respect to the real world.

Controversial opinion aside, I believe it is squarely within Rails philosophy to offer simple, built-in invocations for a large majority of devs. We could accommodate even those who care.

How about spending zero hours on “validating” and just send out a confirmation email and be done with it?

A technically “valid” email address that’s unusable – non-existent domain, user lost access, whatever – is of no value anyway.

Just to hopefully clarify and to answer your question: to be able to notify the user who has made a mistake typing their email.

?? which, again, is something your “validation” is not guaranteed to detect, so why bother?

I mean, you asked, and my viewpoint is no, there’s no value in adding this to Rails.

(I didn’t mean to offend with my previous reply, I just wanted to make the picture clearer. I edited it for extra clarity.)

To answer your question, we’re indeed not guaranteed to catch 100% (nothing is ever 100%), but I’m convinced we can add a validation that catches a huge majority of mistakes while denying very few accurately entered emails. This is all about tradeoffs.

This is a very tricky problem:

At Discourse we use this regex:

We certainly do not use this regex even though we know it exists.

I think there is much truth here.

Even if Rails included this regex exactly and named it (:sort_of_valid_email_address), we would still use our own version as the Rails version would certainly evolve over the years and we like control over how lax/strict we are with email validation.

I am not sure getting the right flavor here is worth the effort. It is possible the Mail gem already has something.

Maybe a 100% RFC compliant mode and a simple validates_email should be added for people that want strict email compliance. At least when you are talking RFC there is very little interpretation going on.

Hmm, that’s a good point about not wanting a Rails upgrade that changes the internal regex to render current data invalid. There’s probably a good case to be made that this should ideally be explicit.

On the other hand, if one doesn’t really care, I don’t see why this would start to be an issue.

For now, it leads me to the conclusion that if such a thing is included, either: email validation rules are not explicit and can change; or there is a deterministic way from invocation to validation rules. (Yes, regex is a way to do that. – Another inspiration could be migration DSL versioning, it inherits from AR::Migration[6.0] and so on.)
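As a thought experiment, the "deterministic way from invocation to validation rules" could look like a versioned lookup in the style of the migration DSL. The sketch below is purely hypothetical: the EmailFormat module, the version numbers, and the two placeholder patterns are invented for illustration and imply nothing about what real rule sets would contain.

```ruby
# Hypothetical: frozen, versioned rule sets, looked up like ActiveRecord::Migration[6.0].
module EmailFormat
  VERSIONS = {
    "1.0" => /\A[^@\s]+@[^@\s]+\z/,         # placeholder: only requires an at-sign
    "1.1" => /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/ # placeholder: also requires a dot in the domain
  }.freeze

  # EmailFormat[1.0] always returns the same pattern, whatever later versions add.
  def self.[](version)
    VERSIONS.fetch(version.to_s) do
      raise ArgumentError, "unknown email format version: #{version}"
    end
  end
end
```

An application pinned to EmailFormat[1.0] would keep accepting user@localhost even after a hypothetical 1.1 tightened the rules, which is the property being asked for.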

I’m wondering – out loud – whether a simpler invocation/process is possible, and whether a Rails macro is the right tool for making development easier.

(Mind you, I still expect large, mature, or specialized applications such as Discourse to be those who specify email validation more tightly. In my mind, it should be proportional to the complexity of the whole app.)

In practice we don’t find our lax email test a problem: all emails must actually work for an account to be activated. So the 100% reliable test is to send a message to an email and get the users to click on a validation link.

The lax test is enough to catch outlier issues like typos and so on.
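A rough plain-Ruby sketch of that combination (lax format test first, then a confirmation link) might look like the following. This is not Discourse's code: the EmailConfirmation class, its method names, and the in-memory state are hypothetical, and a real application would persist the digest, expire tokens, and actually send the message.

```ruby
require "securerandom"
require "digest/sha2"

# Hypothetical in-memory sketch of "lax check, then confirm by link".
class EmailConfirmation
  LAX_EMAIL = /\A[^@\s]+@[^@\s]+\z/ # assumption: a deliberately lax pattern

  attr_reader :email

  def initialize(email)
    raise ArgumentError, "does not look like an email" unless email.to_s.match?(LAX_EMAIL)

    @email = email
    @confirmed = false
  end

  # Returns the token to embed in the emailed link; only its digest is kept.
  def issue_token
    token = SecureRandom.urlsafe_base64(32)
    @token_digest = Digest::SHA256.hexdigest(token)
    token
  end

  # Marks the address as confirmed when the token from the link matches.
  def confirm(token)
    return false if @token_digest.nil?

    @confirmed ||= Digest::SHA256.hexdigest(token.to_s) == @token_digest
  end

  def confirmed?
    @confirmed
  end
end
```

The format check here only filters obvious slips; whether the address is reachable is decided by the user clicking the link.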

I lean towards no. As others have said, the way to validate an email address is to send an email to it, because there are plenty of valid-per-RFC email addresses that aren’t usable in practice due to middlebox incompatibilities and non-compliance.

I do see the usefulness of outright rejecting mistyped or obviously-not-usable addresses, so I guess I could be convinced. But it seems like the built-in regex would have to be too broad to be much help.

Re. Hey, we aren’t validating email addresses by regex currently—we only require the @. We’re relying on our mail servers bouncing messages to non-deliverable addresses.

Thanks for your thoughtful opinions @samsaffron and @georgeclaghorn!

So it looks like including it in the library isn’t the right move for now.

It seems like what would help the most at this point would be to just make the information more accessible, easier to find, and simple to implement/include. Answering questions like: What are the different validation styles? What do I need? Should I follow the RFC? If not, why and how? What regex should I use/should I use only regex? And so on.

I’ll try to write/make something about that soon. Meanwhile, more pointers or questions are welcome!

Just wanted to note that HTML5 has its own regex for email validation that is not strictly RFC compliant but is used for email form inputs. That’d be another option for an easy to use Rails default that matches the default client side behavior of forms.
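For reference, here is that pattern ported to Ruby as best I recall it from the HTML Standard's definition of a valid e-mail address; please compare it against the spec before relying on it. The HTML5_EMAIL constant and html5_email? helper are names I made up, and I swapped the spec's ^ and $ for \A and \z since those are the whole-string anchors in Ruby.

```ruby
# Assumption: transcribed from the HTML Standard's "valid e-mail address" pattern.
# "#" and "$" are escaped so Ruby never treats them as interpolation.
HTML5_EMAIL = /\A[a-zA-Z0-9.!\#\$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

# Hypothetical helper name, used only for this example.
def html5_email?(address)
  address.to_s.match?(HTML5_EMAIL)
end
```

This pattern is fairly permissive in the local part: it accepts consecutive dots and addresses like ~user/list-name@lists.sr.ht, but rejects quoted local parts and domain labels that start with a hyphen.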

I signed up for a website where the account creation allowed my email but the login form rejects it, and the account reset process broke. So now I’m completely locked out.

Getting email validation wrong is infuriating, and getting it right might actually be impossible.

It sure is. I’m curious, is there a feature your email has that trips some filters up?

The fact that the RFC is so out there (and HTML validation is different) seems problematic.

Gmail +tags to automatically label incoming email.