PDF text search in rails

ripan · June 3, 2008, 5:25am

Is there any plugin that can search inside PDF documents? For example, a user should be able to search for keywords in the PDF contents.

Jodi1 · June 3, 2008, 12:41pm

Good morning -

Mark_Thomas2 · June 3, 2008, 12:46pm

Is there any plugin that can search inside PDF documents?

Maybe you can try rpdf2txt (http://raa.ruby-lang.org/project/rpdf2txt/), or JRuby on Rails with one of the many Java PDF libraries. I'm not aware of a Rails plugin.

Mark_Thomas2 · June 3, 2008, 1:25pm

Someone submitted a patch for acts_as_solr to index documents - read the Google group for this project.

I didn't think Solr would do this, since it provides indexing and querying but not parsing of rich formats. However, there seems to be a patch that extracts text (but not metadata) from rich documents into Solr: see UpdateRichDocuments (Solr, Apache Software Foundation). The Solr committers are reluctant to use that patch, though, and would rather build a bridge from Apache Tika to Solr, even if that is further down the road.

I did find the patch to acts_as_solr here: http://www.nabble.com/Rich-Document-support-for-solr-ruby-and-acts_as_solr-p17161561.html. But since this patch relies on the uncommitted Solr patch, I wouldn't rely on it being viable for the long term.

A less tenuous solution may be to extract the text from the PDF with some other library (perhaps rpdf2txt or PDFBox) and index it using the standard acts_as_solr.

- Mark.
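
For anyone who wants to try the "extract first, then index" route, here is a minimal sketch in plain Ruby. It is not from the posts above and makes two assumptions: text extraction shells out to the `pdftotext` command-line tool from Poppler (swapped in for rpdf2txt or PDFBox, and it has to be installed on the PATH), and `KeywordIndex` is a hypothetical in-memory stand-in for the search engine so the example runs without Solr. In a real app the extracted text would more likely be saved to a model attribute that acts_as_solr indexes.

```ruby
require "open3"
require "set"

# Extract plain text from a PDF by shelling out to pdftotext.
# Assumption: Poppler's pdftotext is installed; "-" sends output to stdout.
def extract_pdf_text(path)
  out, status = Open3.capture2("pdftotext", "-enc", "UTF-8", path, "-")
  raise "pdftotext failed for #{path}" unless status.success?
  out
end

# Hypothetical in-memory keyword index, standing in for Solr.
# Maps each lowercased word to the set of document ids containing it.
class KeywordIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Tokenize the text into lowercase alphanumeric words and record doc_id.
  def add(doc_id, text)
    text.downcase.scan(/[a-z0-9]+/).each { |word| @postings[word] << doc_id }
  end

  # Return the sorted ids of documents containing the keyword.
  def search(keyword)
    @postings.fetch(keyword.downcase, Set.new).to_a.sort
  end
end
```

Usage would be along the lines of `index.add(doc.id, extract_pdf_text(doc.path))` when a PDF is uploaded, then `index.search("invoice")` at query time. Note the sketch only matches whole single words; phrase queries, stemming, and ranking are what a real engine such as Solr adds.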
