Extract domain name from a URL

Hello friends, I need to write a regular expression that extracts and returns the domain name.

For example, if a user passes any of the URLs below, it should save only "foo.com":

http://www.foo.com/
http://www.foo.com/something
http://foo.com/
https://something.foo.com/

Thanks for any help…

Thanks, Abhis

A good way to start is to experiment with an online regular expression editor.


Hey Srinivas,

Thanks for the reply.

I've managed to get the output, but the problem is that I have to list every suffix explicitly (uk|com|net|org|in).

So I'm trying to figure out the best way to get the output.

url_pattern = %r{\Ahttps?://(?:[^/:]+\.)?([^./:]+\.(?:co\.uk|com|net|org|in))(?::\d{2,5})?(?:/.*)?\z}i
url = "http://www.foo.com"
url_pattern.match(url)
$1 #=> "foo.com"

Thanks Abhishek
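A minimal sketch of the suffix-list idea, checked against the four URLs from the question plus two extra inputs. It assumes the suffix list from the post (co.uk, com, net, org, in) and a pattern that only looks at the host part, so dots in the path can't be mistaken for domain labels. The pattern and the helper name regex_domain are illustrative, not the poster's exact code. Any suffix that isn't listed (here .io) makes the match fail, which is the maintenance problem Abhishek describes.

```ruby
# Illustrative pattern: optional subdomain labels, then one label plus a
# listed suffix, an optional port, and an optional path. Case-insensitive.
DOMAIN_RE = %r{\Ahttps?://(?:[^/:]+\.)?([^./:]+\.(?:co\.uk|com|net|org|in))(?::\d{2,5})?(?:/.*)?\z}i

# Returns the captured domain, or nil when the URL's suffix is not listed.
def regex_domain(url)
  m = DOMAIN_RE.match(url)
  m && m[1]
end

urls = [
  "http://www.foo.com/",
  "http://www.foo.com/something",
  "http://foo.com/",
  "https://something.foo.com/",
  "http://www.foo.co.uk:8080/page",
  "http://www.foo.io/"
]

urls.each { |url| puts "#{url} -> #{regex_domain(url).inspect}" }
```

Every new suffix means editing the alternation, so this approach only scales as far as the list you maintain.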

require 'uri'

urls = [ "http://www.foo.com/", "http://www.foo.com/something", "http://foo.com/", "https://something.foo.com/" ]

urls.each { |url| puts URI.parse(url).host.split(".")[-2, 2].join(".") }

Good luck,
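The split-based version needs no suffix list, because it always keeps the last two labels of the host. A quick sketch of where that rule holds and where it breaks; the .co.uk and .com.uy hosts are illustrative inputs, and last_two_labels is just a hypothetical wrapper around the one-liner above:

```ruby
require 'uri'

# Keep only the last two dot-separated labels of the URL's host.
def last_two_labels(url)
  URI.parse(url).host.split(".")[-2, 2].join(".")
end

["http://www.foo.com/", "https://something.foo.com/",
 "http://www.foo.co.uk/", "http://www.foo.com.uy/"].each do |url|
  puts "#{url} -> #{last_two_labels(url)}"
end
```

With two-label public suffixes such as co.uk or com.uy you get the suffix itself ("co.uk", "com.uy") rather than the domain.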


Hi Abhishek

You can try the Addressable gem for this.

Step 1: Install the Addressable gem with the following command:

    $ sudo gem install addressable

Step 2: I'll walk through it in IRB; you can then integrate it into your Rails application.

    $ irb
    > require 'rubygems'
    > require 'addressable/uri'
    > uri = Addressable::URI.parse("http://google.com")
    => #<Addressable::URI:0xfdb9aee5c URI:http://google.com>

Step 3: Extract just the host with the following command:

    > uri.host
    => "google.com"

There are many other options you can explore: http://addressable.rubyforge.org/api/classes/Addressable/URI.html

Hope this helps !

Best regards, Srinivas Iyer
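If all you need is the host, the standard library's URI class gives the same values shown in this thread for Addressable (google.com, www.usc.edu) without installing a gem. A small sketch; host_of is a hypothetical helper name. A string without a scheme is parsed as a relative path, so its host is nil, which is why the later replies prepend http:// first.

```ruby
require 'uri'

# Return the host portion of a URL (nil when there is no scheme/authority).
def host_of(url)
  URI.parse(url).host
end

puts host_of("http://google.com")            # google.com
puts host_of("http://www.usc.edu/home.html") # www.usc.edu
p host_of("google.com")                      # nil: parsed as a relative path
```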


Hi Abhishek,

Note that the Addressable gem doesn't return just the domain part of a web address; host gives the full host name, including subdomains such as www. For example:

    irb(main):002:0> require 'addressable/uri'
    => true
    irb(main):003:0> uri = Addressable::URI.parse("http://www.usc.edu/home.html")
    => #<Addressable::URI:0x90e89c URI:http://www.usc.edu/home.html>
    irb(main):004:0> uri.host
    => "www.usc.edu"


# Given a URL, return a domain
def self.url_to_domain(url)
  begin
    host = URI.parse(self.fix_url(url)).host
    host.gsub(/\Awww\./, "")
  rescue
    ""
  end
end

Oops, I forgot to add the other function I was using:

# Prepend URL with http if necessary
def self.fix_url(u)
  (u !~ /\A(?:http:\/\/|https:\/\/)/i) ? "http://#{u}" : u
end

Note that you need to require uri:

require 'uri'

I put this in a module called Utilities so the whole thing is:

require 'uri'

module Utilities

  # Given a URL, return a domain
  def self.url_to_domain(url)
    begin
      host = URI.parse(self.fix_url(url)).host
      host.gsub(/\Awww\./, "")
    rescue
      ""
    end
  end

  # Prepend URL with http if necessary
  def self.fix_url(u)
    (u !~ /\A(?:http:\/\/|https:\/\/)/i) ? "http://#{u}" : u
  end

end


And you call it with Utilities::url_to_domain(u)
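A runnable sketch of the Utilities module with a few illustrative calls. It is a compacted copy (same behavior, written with a method-level rescue and a shorter scheme check), not the poster's verbatim code. It makes the trade-off visible: only a leading www. is stripped, so any other subdomain stays in the result, and unparseable input returns an empty string.

```ruby
require 'uri'

# Compacted copy of the Utilities module from the post, for a runnable demo.
module Utilities
  # Return the URL's host without a leading "www."; "" if parsing fails.
  def self.url_to_domain(url)
    host = URI.parse(fix_url(url)).host
    host.sub(/\Awww\./, "")
  rescue StandardError
    ""
  end

  # Prepend "http://" when the URL has no http/https scheme.
  def self.fix_url(u)
    u =~ /\Ahttps?:\/\//i ? u : "http://#{u}"
  end
end

["http://www.foo.com/", "foo.com", "https://something.foo.com/", "not a url"].each do |u|
  puts "#{u.inspect} -> #{Utilities.url_to_domain(u).inspect}"
end
```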

Hello, and thanks, friends, for the superb solutions. Really appreciated.

Thanks Abhis

I faced the exact same situation a while ago, here's what I came up with after reading the rest of this thread:

#!/usr/bin/env ruby

require 'uri'

module DomainExtractor
  VALID_GENERIC_SUFIXES_RE = /^(com|net|org|co)$/

  def self.extract(url)
    u = fix_url(url)
    uri = URI.parse(u)
    domain = uri.host
    chunks = domain.split('.')

    if chunks[-1] =~ VALID_GENERIC_SUFIXES_RE
      # e.g. google.com: keep the last two labels
      domain = chunks[-2, 2].join('.')
    elsif chunks[-2] =~ VALID_GENERIC_SUFIXES_RE
      # e.g. google.com.uy: keep the last three labels
      domain = chunks[-3, 3].join('.')
    else
      domain = ""
    end
    domain.gsub(/\Awww\./, "")
  rescue
    ""
  end

  # Prepend "http://" when the URL has no http/https scheme
  def self.fix_url(url)
    (url !~ /\A(?:http:\/\/|https:\/\/)/i) ? "http://#{url}" : url
  end
end

# test
urls = [
  "http://google.com",
  "http://www.google.com",
  "http://google.com.uy",
  "http://www.google.com.uy",
  "http://google.com.uy/index.html",
  "http://subdomain1.google.com.uy/index.html",
  "http://subdomain1.subdomain2.google.com",
  "http://www.subdomain1.google.com.uy/index.html",
  "http://subdomain1.google.net/index.html",
  "http://subdomain1.sub2.sub3.google.org.kz?test=3",
  "Email Troubleshooting - Missing Emails | Media Temple Community +cron",
  "https://creaproject.basecamphq.com/projects/3620850/todo_items/ 413078/comments",
  "google.com",
  "google.com.uy",
  "google.com.uy/index.php",
  "sub1.sub2.google.com.uy?test=value",
  "www.sub1.sub2.google.com.uy?test=value",
  "http://sub1.sub2.google.com.uy?test=value",
  "http://www.wwwsub1.sub2.google.com.uy?test=value"
]

urls.each do |url|
  puts
  puts "URL : #{url}"
  result = DomainExtractor::extract(url)
  puts "result: #{result}"
end
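The DomainExtractor approach can be generalized into a simplified public-suffix lookup: find the longest known suffix that ends the host and keep one more label in front of it. This sketch assumes a tiny, hypothetical SUFFIXES list and ignores the wildcard and exception rules of the real Public Suffix List; for production use, a maintained list (for example through the public_suffix gem) is the safer choice.

```ruby
require 'uri'

# Hypothetical, deliberately tiny suffix list; real code needs a full list.
SUFFIXES = %w[com net org co.uk com.uy org.kz].freeze

# Return the longest known suffix plus one label in front of it, or nil when
# the URL can't be parsed, has no host, or ends in an unknown suffix.
def registrable_domain(url)
  url = "http://#{url}" unless url =~ /\Ahttps?:\/\//i
  host = URI.parse(url).host
  return nil if host.nil?

  labels = host.downcase.split(".")
  # Start with the longest candidate (everything but the first label).
  (1...labels.size).each do |i|
    suffix = labels[i..-1].join(".")
    return labels[(i - 1)..-1].join(".") if SUFFIXES.include?(suffix)
  end
  nil
rescue URI::InvalidURIError
  nil
end

[
  "http://www.google.com.uy",
  "sub1.sub2.google.com.uy?test=value",
  "http://subdomain1.sub2.sub3.google.org.kz?test=3",
  "http://www.foo.co.uk/",
  "http://example.io/"
].each { |u| puts "#{u} -> #{registrable_domain(u).inspect}" }
```

Checking the longest suffix first matters once the list holds overlapping entries (say both uk and co.uk): the more specific rule wins.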