auto_link failing on URLs with square brackets in query params

Patch included in the ticket.

http://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/1353-auto_link-helper-fails-to-parse-some-urls-with-brackets

Rich

Something sensible needs to be done to auto_link_urls. It's a hideous regular expression which no one can hope to read, hacked around over the years to add characters here and there but there's always something it doesn't handle. Could we not rewrite this from scratch starting from the grammar given in the RFC on urls (or some other more complete solution than lobbing a character in here or there)?

Fred

Something sensible needs to be done to auto_link_urls. It's a hideous
regular expression which no one can hope to read, hacked around over
the years to add characters here and there but there's always
something it doesn't handle. Could we not rewrite this from scratch
starting from the grammar given in the RFC on urls (or some other more
complete solution than lobbing a character in here or there)?

Indeed I think it's about time to consider rejigging how all that code
functions. Perhaps it's worth looking into adopting addressable or
something similar? The URI library from stdlib has some nastiness
with it's parsing and matching code.

Care to take a look into it?

Something sensible needs to be done to auto_link_urls. It's a hideous
regular expression which no one can hope to read, hacked around over
the years to add characters here and there but there's always
something it doesn't handle. Could we not rewrite this from scratch
starting from the grammar given in the RFC on urls (or some other
more
complete solution than lobbing a character in here or there)?

Indeed I think it's about time to consider rejigging how all that code
functions. Perhaps it's worth looking into adopting addressable or
something similar? The URI library from stdlib has some nastiness
with it's parsing and matching code.

Care to take a look into it?

Don't see why not. I'll see what approaches I can dig up.

Fred

The problem is that URLs allow much more characters that people actually want to be autolinked with the URL. Common pitfall is the regexp capturing the punctuation after the URL in a sentence.

It’s all been nicely described on Coding Horror recently: http://www.codinghorror.com/blog/archives/001181.html

And more recently, there are URLs – even domain names – that include Unicode characters, not that browser support for it is on the rise.

I’d say: let’s autolink everything that starts with “https?://” or “www.” up to the first whitespace or punctuation character before the whitespace if that character isn’t a closing parenthesis or bracket that has a matching one on the beginning.

Whitelisting characters will always leave someone out, I’m afraid … But my solution is just a quick thought, there may be tons of cases where it wouldn’t be appropriate.

Typo. I meant to say “…now* that browser support for it is on the rise”

Could we not rewrite this from scratch
starting from the grammar given in the RFC on urls (or some other more
complete solution than lobbing a character in here or there)?

The problem is that URLs allow much more characters that people actually want to be autolinked with the URL. Common pitfall is the regexp capturing the punctuation after the URL in a sentence.

It's all been nicely described on Coding Horror recently: http://www.codinghorror.com/blog/archives/001181.html

That is indeed very timely!

And more recently, there are URLs -- even domain names -- that include Unicode characters, not that browser support for it is on the rise.

I'd say: let's autolink everything that starts with "https?://" or "www." up to the first whitespace or punctuation character before the whitespace *if* that character isn't a closing parenthesis or bracket that has a matching one on the beginning.

Whitelisting characters will always leave someone out, I'm afraid ... But my solution is just a quick thought, there may be tons of cases where it wouldn't be appropriate.

I think it's certainly in the right direction.

Fred

Patched: http://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/1353-auto_link-helper-fails-to-parse-some-urls-with-brackets#ticket-1353-3

Previously hideous regexp becomes this: %r{( https?:// | www. ) [^\s<]+}x

Added feature: intelligent closing bracket handling. You are now able to write this:

I’ve read an interesting article about

sprites (link: http://en.wikipedia.org/wiki/Sprite_(computer_graphics))

All this in less code than before.

Just as an FYI - this is awesome. People have always bitched at me
about the auto-linking stuff in my apps, and this looks like it solves
all their problems.

Thank you!
- Trevor