Is there any plugin that can search in PDF documents? For example, a user should be able to search for keywords in the PDF contents.
Good morning -
Is there any plugin that can search in PDF documents?
Maybe you can try rpdf2txt (http://raa.ruby-lang.org/project/rpdf2txt/), or JRuby on Rails (JRoR) with one of the many Java PDF libraries. I'm not aware of a Rails plugin.
Someone submitted a patch for acts_as_solr to index documents;
see the Google group for this project.
I didn't think Solr would do this, since it provides indexing and querying but not parsing of rich formats. However, there seems to be a patch that extracts text (but not metadata) from rich documents into Solr: UpdateRichDocuments. The Solr committers are reluctant to use that patch, though, and would rather build a bridge from Apache Tika to Solr, even if that is further down the road.
I did find the patch to acts_as_solr here: http://www.nabble.com/Rich-Document-support-for-solr-ruby-and-acts_as_solr-p17161561.html. But since this patch relies on the uncommitted Solr patch, I wouldn't rely on it being viable for the long term.
A less tenuous solution may be to extract the text from a PDF via some other library (perhaps rpdf2txt or PDFBox) and index it using the standard acts_as_solr.
- Mark.
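To make the "extract first, then index as plain text" idea concrete, here is a rough sketch rather than a tested recipe. It assumes the `pdftotext` command-line tool is installed, which is only a stand-in for rpdf2txt or PDFBox. The helper name `pdf_text_for_index`, the `Document` model and its `title`, `body` and `pdf_path` columns are all made up for illustration.

```ruby
require "open3"

# Assumption: `pdftotext` is on the PATH. Any extractor that takes a file
# path and returns a String (rpdf2txt, PDFBox via JRuby) could be swapped in.
PDFTOTEXT = lambda do |path|
  out, status = Open3.capture2("pdftotext", path, "-") # "-" sends text to stdout
  raise "text extraction failed for #{path}" unless status.success?
  out
end

# Hypothetical helper: returns the PDF contents as a single line of plain
# text, with runs of whitespace collapsed, ready to store in a text column.
def pdf_text_for_index(path, extractor = PDFTOTEXT)
  extractor.call(path).gsub(/\s+/, " ").strip
end

# In a Rails app the extracted text would live in an ordinary column that
# the standard acts_as_solr indexes like any other field, roughly:
#
#   class Document < ActiveRecord::Base
#     acts_as_solr :fields => [:title, :body]
#     before_save { |doc| doc.body = pdf_text_for_index(doc.pdf_path) }
#   end
#
#   Document.find_by_solr("keyword")
```

Passing the extractor as an argument keeps the indexing side independent of whichever PDF library ends up doing the parsing, so neither Solr patch is needed.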