How to scrape a page without knowing its HTML structure

I’m building a module for my site that needs to import a user’s blog into the site. I can read the blog through its RSS feed, but the feed doesn’t give me the complete content, so I need to scrape the user’s blog page. How can I scrape a page without knowing its HTML structure? Can anyone help me with this? Thanks in advance.

You asked this exact question 4 days ago and got 2 answers, which basically said that you can't -- you have to know *something* about the way the pages are marked up.

It's still true. :)

Looking at the structure is probably the easiest way, but if you wanted something more complex, your scraping program could infer the layout structure and separate it from the content. The program would need to be fed multiple pages from the same blog, and it would assume the layout is the portion that stays mostly the same from page to page, while whatever changes is the content. That's an oversimplification, but that's the general idea.
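To make that idea a bit more concrete, here is a rough sketch in Python using only the standard library. It assumes you have already downloaded several pages from the same blog as HTML strings. The names `text_blocks`, `split_layout_and_content`, and the `layout_ratio` cutoff are made up for this illustration, and comparing exact text chunks is a much cruder heuristic than a real template-detection tool would use.

```python
from collections import Counter
from html.parser import HTMLParser


class TextBlockParser(HTMLParser):
    """Collects the visible text chunks of a page, skipping script/style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Normalize whitespace so the same text compares equal across pages.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.blocks.append(text)


def text_blocks(html):
    """Return the list of text chunks found in one HTML string."""
    parser = TextBlockParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


def split_layout_and_content(pages, layout_ratio=0.8):
    """Guess which text chunks are layout and which are content.

    pages: list of HTML strings from the same site.
    layout_ratio: hypothetical cutoff; a chunk seen on at least this
    fraction of the pages is treated as layout.
    Returns (set of layout chunks, list of per-page content chunk lists).
    """
    per_page = [text_blocks(page) for page in pages]

    # Count on how many pages each distinct chunk appears.
    counts = Counter()
    for blocks in per_page:
        counts.update(set(blocks))

    threshold = layout_ratio * len(pages)
    layout = {block for block, n in counts.items() if n >= threshold}
    content = [[b for b in blocks if b not in layout] for blocks in per_page]
    return layout, content
```

You would call `split_layout_and_content(pages)` with a list of page sources and then work with the per-page content lists. The more pages you feed it, the better the guess tends to be; with only one or two pages almost everything looks like layout.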

Good luck.