Timeout not obeyed when trying to open a bad URL

Here is something I've never seen before:

I have a list of URLs fed into Mechanize (which uses net/http to grab pages).

I have it set up like this (note: the original had a stray `@error_count` where `error_count` was meant; fixed here):

    require 'mechanize'

    agent = WWW::Mechanize.new
    error_count = 0
    begin
      Timeout::timeout(2) {
        @tracked_page = agent.get("http://#{site_url}")
      }
    rescue Timeout::Error => timeout_error
      puts "I TIMED OUT AFTER 2 SECS BUT I'M TRYING AGAIN: #{timeout_error}"
      error_count += 1
      if error_count < 5
        puts "ATTEMPT NUMBER #{error_count}, QUITTING AFTER 4 TRIES"
        retry
      end
    end

This is all well and good; it works fine and catches any timeout exceptions, except when it's trying to deal with one particular URL (www.webdevking.com).

This URL does not currently resolve to any host; it returns "unknown host" when you try to connect to it.

For some reason, when I run this code on www.webdevking.com, it hangs for upwards of 30 seconds, failing to obey the two-second timeout. I've tried setting Mechanize's own timeouts (agent.open_timeout and agent.read_timeout), but these aren't obeyed either. The timeout is obeyed for every URL but that one.
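One way to narrow this down is to time a bare DNS lookup on its own, separate from Mechanize, to see whether the delay is in name resolution rather than in the HTTP fetch. A quick diagnostic sketch (the `time_lookup` helper is just for illustration):

```ruby
require 'resolv'
require 'benchmark'

# Time a bare DNS lookup, swallowing the "does not resolve" error,
# to see how long name resolution alone takes for a host.
def time_lookup(host)
  Benchmark.realtime do
    begin
      Resolv.getaddress(host)
    rescue Resolv::ResolvError
      # Unresolvable host -- the interesting case here.
    end
  end
end

# Sanity check against a host that resolves locally (via /etc/hosts):
puts "localhost lookup took #{time_lookup('localhost').round(4)}s"
# Against the problem host you would call: time_lookup('www.webdevking.com')
```

If the lookup alone accounts for the 30 seconds, the hang is happening below Mechanize, in the resolver.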

I would like to get to the root of this problem, because a 30-second slowdown when processing a batch of URLs is unacceptable.

Any ideas?

Thanks in advance.

> Here is something I've never seen before:

I've seen this before. Bottom line is, Ruby's threading sucks: when running C code (extensions, or bits of the stdlib that call through to C code) you can block the entire Ruby interpreter, because MRI's green threads cannot preempt a blocking C call. Things that can do this include MySQL queries, some parts of name resolution (the blocking getaddrinfo call in particular), and, it would appear, some other bits of the networking libraries.
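A common workaround on MRI (a sketch, not guaranteed for every Ruby version): the stdlib's resolv-replace monkey-patches the socket classes to use the pure-Ruby Resolv resolver instead of the blocking C getaddrinfo call, which makes name resolution interruptible by Timeout. The `try_connect` helper below is just an illustration:

```ruby
require 'resolv-replace' # patch TCPSocket etc. to use pure-Ruby DNS
require 'socket'
require 'timeout'

# Attempt a TCP connection with a hard deadline. With resolv-replace
# loaded, even a stuck DNS lookup can be interrupted by Timeout.
def try_connect(host, port = 80, seconds = 2)
  Timeout.timeout(seconds) do
    TCPSocket.open(host, port) { :connected }
  end
rescue Timeout::Error
  :timed_out
rescue SocketError, SystemCallError
  :unreachable
end

puts try_connect("www.webdevking.com").inspect
```

On Ruby 1.8 the system_timer gem was another commonly suggested fix, since it reportedly uses OS-level timers rather than a green thread to enforce the deadline.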