How to get data from another website?

Hi, everybody.

I am new to RoR.

I want to get all the data from another website.

Thanks a lot; I look forward to your replies.

Hi,
Look into Mechanize and Nokogiri.

Ivan Nastyukhin
dieinzige@me.com

Try Hpricot; it'll be useful.

Bala wrote:

Try Hpricot; it'll be useful.

Thanks, Bala.
I tried Hpricot, but I only get a little of the data on the website, not all of it.
I think some of the data is loaded by Ajax or JavaScript. Do you have any other
ideas?

Here is my code.
Controller:
  @doc = Hpricot(open("http://priceonline.hsc.com.vn"))

View (I output the whole of "@doc"):
  <%= @doc %>
But the data is not there, only some of the layout.

Please help. Thanks.

If you output it in the view, you may need to strip the HTML tags.

Try the following in the console.

require 'open-uri'
require 'nokogiri'

# On Ruby 3.0 and later, open-uri needs URI.open; plain open no longer accepts URLs.
doc = Nokogiri::HTML(URI.open("http://www.tamil.net").read)
puts doc.inner_html

If you want to place it in the view, try "doc.inner_text"
to strip the HTML tags.

You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk@googlegroups.com.
To unsubscribe from this group, send email to
rubyonrails-talk+unsubscribe@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en.

--
Nandri(Thanks in Tamil),
Amala Singh

Thanks, Amala Singh.
I followed your guide, but the result is the same as before.
I can't get all the data on the website.

It works well for other websites, but not for this one:
"http://priceonline.hsc.com.vn".
Do you have any other ideas?

Please help.

Aha, I checked it.
I did this:
doc = Nokogiri::HTML(URI.open("http://priceonline.hsc.com.vn").read)
my_file = File.new("abc.html", "w")
my_file.puts doc.inner_html
my_file.close

I compared abc.html with the website's source, and abc.html was exactly the same.

I think it has something to do with relative JavaScript/Ajax paths. The page source does not show the complete path:

<script src="JS/Ajax.js" type="text/javascript"></script>

Thanks a lot, Amala Singh!
But I don't understand this comment of yours. Could you explain it more clearly?

"I think it is something to do with relative JavaScript/Ajax stuff. Source is
not showing the complete path."

And this code:

<script src="JS/Ajax.js" type="text/javascript"></script>

What does it mean? What does it do? Where should I put it?
Thanks for your reply.

Since it is a relative path, the browser will try to load the JavaScript files from your own server.

So you need to download all the JavaScript files that the HTML refers to and
place them at the same relative paths on your server.
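
To find out which files those are, each relative "src" can be resolved against the address of the page it came from. The sketch below uses only Ruby's standard library; the helper name "absolute_url" is made up for illustration, and the trailing slash on the page URL is an assumption about how the site serves its front page.

```ruby
require 'uri'

# Hypothetical helper: resolve a possibly relative src against the URL of
# the page it was found on. Absolute URLs pass through unchanged.
def absolute_url(page_url, src)
  URI.join(page_url, src).to_s
end

page = 'http://priceonline.hsc.com.vn/'

puts absolute_url(page, 'JS/Ajax.js')
# → http://priceonline.hsc.com.vn/JS/Ajax.js
```

Downloading from the resolved addresses gives you the files to copy to the matching relative paths. Whether the scripts then behave correctly when served from a different server is not something this thread confirms.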