Hello friends,
I need to write a regular expression that extracts and returns the domain name from a URL.
For example, if a user passes in any of the URLs below, it should save only "foo.com":
http://www.foo.com/
http://www.foo.com/something
http://foo.com/
https://something.foo.com/
Thanks for any help.
abhis
A good way to start is to experiment in an online regular expression editor such as Rubular:
http://rubular.com
Hey Srinivas,
Thanks for the reply.
I am able to get the output, but the problem is that I have to list every suffix in the pattern (co.uk|com|net|org|in).
So I am trying to figure out the best way to get the output without doing that.
# Literal dots are escaped; the optional trailing path is matched by (?:\/.*)?
url_pattern = /^(?:.+?\.)+(.+?\.(?:co\.uk|com|net|org|in))(:[0-9]{2,5})?(?:\/.*)?$/i
url = "http://www.foo.com"
url_pattern.match(url)
$1 #=> "foo.com"
Thanks
Abhishek
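One way to avoid listing every suffix in the pattern is to let the regex capture only the host and then work on its dot-separated labels. Below is a minimal sketch of that idea, not a complete solution: `HOST_RE` and `last_two_labels` are hypothetical names that do not come from this thread, and the sketch assumes the domain is always the last two labels, which is wrong for suffixes such as co.uk.

```ruby
# Hypothetical names; this is an illustrative sketch only.
# Captures the host of an http(s) URL: everything after the scheme
# up to the first "/", ":", "?" or "#".
HOST_RE = %r{\Ahttps?://([^/:?#]+)}i

# Returns the last two dot-separated labels of the host,
# or nil when the URL does not start with http:// or https://.
def last_two_labels(url)
  match = HOST_RE.match(url.strip)
  return nil unless match
  match[1].downcase.split('.').last(2).join('.')
end

%w[
  http://www.foo.com/
  http://www.foo.com/something
  http://foo.com/
  https://something.foo.com/
].each { |u| puts last_two_labels(u) }  # prints foo.com four times
```

The pattern no longer needs to know any suffix, but the "last two labels" rule then has to be refined separately for multi-part suffixes.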
Con
(Con)
November 12, 2009, 7:30am
4
require 'uri'
urls = ["http://www.foo.com/", "http://www.foo.com/something", "http://foo.com/", "https://something.foo.com/"]
# Keep only the last two dot-separated labels of each host
urls.each { |url| puts URI.parse(url).host.split(".")[-2, 2].join(".") }
Good luck,
-Conrad
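The one-liner above raises when the input has no scheme (`host` is nil) or when the host has a single label such as localhost (`[-2, 2]` returns nil). A small sketch of the same approach with those cases guarded follows; `naive_domain` is a hypothetical name, and returning nil for unusable input is an assumption about what the caller wants.

```ruby
require 'uri'

# Hypothetical wrapper around the last-two-labels approach.
# Returns nil when the input has no host or is not a valid URI.
def naive_domain(url)
  host = URI.parse(url.strip).host
  return nil if host.nil?
  # Array#last(2) returns the whole array when it has fewer than two items,
  # so a single-label host is returned unchanged instead of raising.
  host.split('.').last(2).join('.')
rescue URI::InvalidURIError
  nil
end

puts naive_domain("http://www.foo.com/something")  # foo.com
puts naive_domain("http://localhost/")             # localhost
p naive_domain("foo.com")                          # nil (no scheme, so no host)
```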
Hi Abhishek,
You can try the Addressable gem for this.
Step 1: Install the Addressable gem with the following command.
$ sudo gem install addressable
Step 2: I'll demonstrate in IRB; you can then integrate it into your Rails application.
$ irb
> require 'rubygems'
> require 'addressable/uri'
> uri = Addressable::URI.parse("http://google.com")
=> #<Addressable::URI:0xfdb9aee5c URI:http://google.com>
Step 3: Extract just the host with the following command.
> uri.host
=> "google.com"
There are many other options you can explore:
http://addressable.rubyforge.org/api/classes/Addressable/URI.html
Hope this helps!
Best regards,
Srinivas Iyer
http://twitter.com/srinivasiyermv
Con
(Con)
November 12, 2009, 10:31am
6
Hi, the Addressable gem doesn't return just the domain part of the web address; uri.host still includes the subdomain. For example:
irb(main):002:0> require 'addressable/uri'
=> true
irb(main):003:0> uri = Addressable::URI.parse("http://www.usc.edu/home.html")
=> #<Addressable::URI:0x90e89c URI:http://www.usc.edu/home.html>
irb(main):004:0> uri.host
=> "www.usc.edu"
-Conrad
Tony
(Tony)
November 12, 2009, 2:59pm
8
# Given a URL, return a domain
def self.url_to_domain(url)
  begin
    host = URI.parse(self.fix_url(url)).host
    host.gsub(/\Awww\./, "")
  rescue
    ""
  end
end
Tony
(Tony)
November 12, 2009, 3:16pm
9
Oops, I forgot to add the other function I was using:
# Prepend URL with http if necessary
def self.fix_url(u)
  !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
end
Note that you need to require uri:
require 'uri'
I put this in a module called Utilities so the whole thing is:
require 'uri'

module Utilities
  # Given a URL, return a domain
  def self.url_to_domain(url)
    begin
      host = URI.parse(self.fix_url(url)).host
      host.gsub(/\Awww\./, "")
    rescue
      ""
    end
  end

  # Prepend URL with http if necessary
  def self.fix_url(u)
    !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
  end
end
And you call it with Utilities::url_to_domain(u)
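For anyone trying this out, here is a short usage sketch. The module is restated in a slightly more compact form so the snippet runs on its own; the sample inputs are my own and are not from the posts above. Note that this approach only strips a leading www., so other subdomains are kept.

```ruby
require 'uri'

# Compact restatement of the Utilities module above, for a standalone run.
module Utilities
  # Returns the host without a leading "www.", or "" on any failure.
  def self.url_to_domain(url)
    host = URI.parse(fix_url(url)).host
    host.gsub(/\Awww\./, "")
  rescue
    ""
  end

  # Prepends http:// when the input has no http(s) scheme.
  def self.fix_url(u)
    u =~ /\Ahttps?:\/\//i ? u : "http://#{u}"
  end
end

puts Utilities.url_to_domain("http://www.foo.com/something")  # foo.com
puts Utilities.url_to_domain("foo.com")                       # foo.com
puts Utilities.url_to_domain("https://something.foo.com/")    # something.foo.com
p Utilities.url_to_domain("not a url")                        # ""
```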
Hello
Thanks, friends, for the superb solutions. Really appreciated.
Thanks
Abhis
fort
(fort)
November 23, 2009, 8:27pm
11
I faced the exact same situation a while ago, here's what I came up
with after reading the rest of this thread:
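#!/usr/bin/env ruby
require 'uri'

module DomainExtractor
  VALID_GENERIC_SUFIXES_RE = /^(com|net|org|co)$/

  # Returns the domain for a URL, or "" when it cannot be determined
  def self.extract(url)
    u = fix_url(url)
    uri = URI::parse(u)
    domain = uri.host
    chunks = domain.split('.')
    if ! (chunks[-1] =~ VALID_GENERIC_SUFIXES_RE).nil?
      # Generic suffix is the last label, e.g. google.com
      domain = chunks[-2, 2].join('.')
    elsif ! (chunks[-2] =~ VALID_GENERIC_SUFIXES_RE).nil?
      # Generic suffix followed by a country code, e.g. google.com.uy
      domain = chunks[-3, 3].join('.')
    else
      domain = ""
    end
    # Strip a leading "www." (\A anchors at the start of the string)
    domain.gsub(/\Awww\./, "")
  rescue
    ""
  end

  # Prepend http:// when the URL has no http(s) scheme
  def self.fix_url(url)
    !!( url !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{url}" : url
  end
end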
# test
urls = [
  "http://google.com",
  "http://www.google.com",
  "http://google.com.uy",
  "http://www.google.com.uy",
  "http://google.com.uy/index.html",
  "http://subdomain1.google.com.uy/index.html",
  "http://subdomain1.subdomain2.google.com",
  "http://www.subdomain1.google.com.uy/index.html",
  "http://subdomain1.google.net/index.html",
  "http://subdomain1.sub2.sub3.google.org.kz?test=3",
  "Email Troubleshooting - Missing Emails | Media Temple Community
+cron",
  "https://creaproject.basecamphq.com/projects/3620850/todo_items/413078/comments",
  "google.com",
  "google.com.uy",
  "google.com.uy/index.php",
  "sub1.sub2.google.com.uy?test=value",
  "www.sub1.sub2.google.com.uy?test=value",
  "http://sub1.sub2.google.com.uy?test=value",
  "http://www.wwwsub1.sub2.google.com.uy?test=value"
]
urls.each do |url|
  puts
  puts "URL : #{url}"
  result = DomainExtractor::extract(url)
  puts "result: #{result}"
end
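One limitation of the suffix regex above is that any host whose last or second-to-last label is not com, net, org or co (for example www.usc.edu) yields an empty string. A possible variation, sketched below, is to default to the last two labels and keep three only when the last two form a known multi-label suffix. `MULTI_LABEL_SUFFIXES` and `registrable_domain` are hypothetical names, and the three-entry list is a deliberately tiny sample; a complete list of such suffixes is maintained as the Public Suffix List.

```ruby
require 'uri'
require 'set'

# Hypothetical, deliberately tiny sample of multi-label suffixes.
MULTI_LABEL_SUFFIXES = Set.new(%w[co.uk com.uy org.kz]).freeze

# Returns the last two labels of the host, or the last three when the
# last two form a known multi-label suffix. Returns "" on failure.
def registrable_domain(url)
  candidate = url.strip
  candidate = "http://#{candidate}" unless candidate =~ %r{\Ahttps?://}i
  host = URI.parse(candidate).host
  return "" if host.nil?
  labels = host.downcase.split('.')
  keep = MULTI_LABEL_SUFFIXES.include?(labels.last(2).join('.')) ? 3 : 2
  # A bare suffix ("co.uk") or a single label has no domain to return.
  return "" if labels.size < keep
  labels.last(keep).join('.')
rescue URI::InvalidURIError
  ""
end

puts registrable_domain("http://www.usc.edu/home.html")            # usc.edu
puts registrable_domain("www.sub1.sub2.google.com.uy?test=value")  # google.com.uy
puts registrable_domain("http://subdomain1.google.net/index.html") # google.net
```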