Extract Domain name (url)

Hello Friends,
I need to write a regular expression that extracts and returns the domain name.

For example, if a user passes in any of the URLs below, it should save only "foo.com":

http://www.foo.com/
http://www.foo.com/something
http://foo.com/
https://something.foo.com/

Thanks for any help…

Thanks
abhis

A good way to start is to experiment in an online regular expression editor:

http://rubular.com

Hey srinivas,

Thanks for the reply.

I am able to get the output, but the only problem is that I have to list every suffix explicitly (uk|com|net|org|in).

So I am trying to figure out the best way to get the output.

url_pattern = /^(?:.+?\.)+(.+?\.(?:co\.uk|com|net|org|in))(:[0-9]{2,5})?\/.*$/i
url = "http://www.foo.com/"
url_pattern.match(url)
$1 #=> "foo.com"

Thanks
Abhishek
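
For anyone trying the regex route, here is a minimal sketch that handles all four sample URLs from the first post. It is not the exact pattern above: it anchors on the scheme and matches host labels only. The suffix list is still hard-coded, which is the limitation being asked about. DOMAIN_PATTERN and domain_from are hypothetical names chosen for this example.

```ruby
# Sketch only: scheme, optional leading subdomain labels (lazy), then the
# captured "label.suffix", an optional port, and either a path or the end
# of the string. The suffix alternation is an assumption and is incomplete.
DOMAIN_PATTERN = /\Ahttps?:\/\/(?:[^\/.]+\.)*?([^\/.]+\.(?:co\.uk|com|net|org|in))(?::\d{2,5})?(?:\/|\z)/i

# Returns the captured domain, or nil when the URL does not match.
def domain_from(url)
  match = DOMAIN_PATTERN.match(url)
  match && match[1]
end

p domain_from("http://www.foo.com/")          # => "foo.com"
p domain_from("http://www.foo.com/something") # => "foo.com"
p domain_from("http://foo.com/")              # => "foo.com"
p domain_from("https://something.foo.com/")   # => "foo.com"
p domain_from("http://www.foo.info/")         # => nil (suffix not in the list)
```

Any suffix missing from the alternation yields nil, so this only scales if the list is kept up to date.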

require 'uri'

urls = [ "http://www.foo.com/", "http://www.foo.com/something", "http://foo.com/", "https://something.foo.com/" ]

urls.each { |url| puts URI::parse( url ).host.split( "." )[-2,2].join(".") }

Good luck,

-Conrad
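
One caveat with taking the last two labels of the host: it works for the four sample URLs, but not for two-part suffixes such as co.uk. A small sketch of the same idea that makes this visible (last_two_labels is a hypothetical helper name):

```ruby
require 'uri'

# Hypothetical helper: returns the last two dot-separated labels of the host.
def last_two_labels(url)
  URI.parse(url).host.split('.').last(2).join('.')
end

puts last_two_labels('http://www.foo.com/something') # foo.com
puts last_two_labels('https://something.foo.com/')   # foo.com
puts last_two_labels('http://www.foo.co.uk/')        # co.uk, not foo.co.uk
```

So this is fine when every URL is known to end in a single-label suffix, and needs extra handling otherwise.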

Hi Abhishek,

You can try the Addressable gem for this.

Step 1: Install the Addressable gem with the following command.

          $ sudo gem install addressable

Step 2: I will demonstrate in IRB; you can then integrate it into
your Rails application.

            $ irb
             > require 'rubygems'
             > require 'addressable/uri'
             > uri = Addressable::URI.parse("http://google.com")
                  => #<Addressable::URI:0xfdb9aee5c URI:http://google.com>

Step 3: Extract only the host with the following command.

            > uri.host
             => "google.com"

There are many other options you can explore:
http://addressable.rubyforge.org/api/classes/Addressable/URI.html

Hope this helps!

Best regards,
Srinivas Iyer
http://talkonsomething.com
http://twitter.com/srinivasiyermv

Hi, the Addressable gem's host method returns the full host name, not just the domain part of the web address. For example,

irb(main):002:0> require 'addressable/uri'
=> true
irb(main):003:0> uri = Addressable::URI.parse("http://www.usc.edu/home.html")
=> #<Addressable::URI:0x90e89c URI:http://www.usc.edu/home.html>
irb(main):004:0> uri.host
=> "www.usc.edu"

-Conrad

# Given a URL, return a domain
  def self.url_to_domain(url)
    begin
      host = URI.parse(self.fix_url(url)).host
      host.gsub(/\Awww\./, "")
    rescue
      ""
    end
  end

Oops, I forgot to add the other function I was using:
# Prepend URL with http if necessary
  def self.fix_url(u)
    !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
  end

Note that you need to require uri:

require 'uri'

I put this in a module called Utilities so the whole thing is:

require 'uri'

module Utilities

  # Given a URL, return a domain
  def self.url_to_domain(url)
    begin
      host = URI.parse(self.fix_url(url)).host
      host.gsub(/\Awww\./, "")
    rescue
      ""
    end
  end

  # Prepend URL with http if necessary
  def self.fix_url(u)
    !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
  end

end

And you call it with Utilities::url_to_domain(u)
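
A quick usage sketch, with the module restated so the snippet runs on its own. The sample inputs are made up for illustration. Note that this approach only strips a leading "www.", so other subdomains are kept.

```ruby
require 'uri'

module Utilities
  # Given a URL, return the host with a leading "www." removed ("" on failure)
  def self.url_to_domain(url)
    begin
      host = URI.parse(self.fix_url(url)).host
      host.gsub(/\Awww\./, "")
    rescue
      ""
    end
  end

  # Prepend URL with http if necessary
  def self.fix_url(u)
    !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
  end
end

p Utilities.url_to_domain("http://www.foo.com/something") # => "foo.com"
p Utilities.url_to_domain("www.foo.com")                  # => "foo.com"
p Utilities.url_to_domain("https://something.foo.com/")   # => "something.foo.com"
p Utilities.url_to_domain("not a url")                    # => "" (parse error is rescued)
```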

Hello
Thanks, friends, for the superb solutions. Really appreciated.

Thanks
Abhis

I faced the exact same situation a while ago, here's what I came up
with after reading the rest of this thread:

#!/usr/bin/env ruby

require 'uri'

module DomainExtractor
    VALID_GENERIC_SUFIXES_RE = /^(com|net|org|co)$/

    def self.extract(url)
        u = fix_url(url)
        uri = URI::parse(u)
        domain = uri.host
        chunks = domain.split('.')

        if ! (chunks[-1] =~ VALID_GENERIC_SUFIXES_RE).nil?
            domain = chunks[-2, 2].join('.')
        elsif ! (chunks[-2] =~ VALID_GENERIC_SUFIXES_RE).nil?
            domain = chunks[-3, 3].join('.')
        else
            domain = ""
        end
        domain.gsub(/\Awww\./, "")
    rescue
      ""
    end

    def self.fix_url(url)
        !!( url !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{url}" : url
    end
end

# test
urls = [
    "http://google.com",
    "http://www.google.com",
    "http://google.com.uy",
    "http://www.google.com.uy",
    "http://google.com.uy/index.html",
    "http://subdomain1.google.com.uy/index.html",
    "http://subdomain1.subdomain2.google.com",
    "http://www.subdomain1.google.com.uy/index.html",
    "http://subdomain1.google.net/index.html",
    "http://subdomain1.sub2.sub3.google.org.kz?test=3",
    "http://kb.mediatemple.net/questions/251/Running+rake+tasks+from+cron",
    "https://creaproject.basecamphq.com/projects/3620850/todo_items/413078/comments",
    "google.com",
    "google.com.uy",
    "google.com.uy/index.php",
    "sub1.sub2.google.com.uy?test=value",
    "www.sub1.sub2.google.com.uy?test=value",
    "http://sub1.sub2.google.com.uy?test=value",
    "http://www.wwwsub1.sub2.google.com.uy?test=value"
]

urls.each do |url|
    puts
    puts "URL : #{url}"
    result = DomainExtractor::extract(url)
    puts "result: #{result}"
end
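
To make the expected results explicit, here is a compact sketch of the same suffix check with a few inputs and the values the logic should return. GENERIC and extract_domain are stand-in names for this example, not part of the module above. Hosts whose suffix is not covered by the list (for example .edu) come back as an empty string.

```ruby
require 'uri'

# Stand-in for the suffix list above: labels treated as "generic".
GENERIC = /\A(?:com|net|org|co)\z/

def extract_domain(url)
  # Prepend a scheme if the input has none, so URI.parse finds a host.
  url = "http://#{url}" unless url =~ /\Ahttps?:\/\//i
  chunks = URI.parse(url).host.split('.')

  if chunks[-1] =~ GENERIC      # e.g. google.com
    chunks[-2, 2].join('.')
  elsif chunks[-2] =~ GENERIC   # e.g. google.com.uy
    chunks[-3, 3].join('.')
  else
    ""                          # suffix not recognised
  end
rescue StandardError
  ""
end

p extract_domain("http://www.google.com")                            # => "google.com"
p extract_domain("google.com.uy/index.php")                          # => "google.com.uy"
p extract_domain("http://subdomain1.sub2.sub3.google.org.kz?test=3") # => "google.org.kz"
p extract_domain("www.usc.edu")                                      # => ""
```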