In Search of Search!

I am implementing search on my site and was wondering what the best way to go about it would be. We want a full-text search and an advanced search.

We will have huge amounts of data to search, spread across multiple tables. I went through the plugins acts_as_ferret and acts_as_solr, but Ferret seems to have a locking problem at high load and Solr needs a Java server. So what do you recommend? How do we go about this? What will be the overheads of Solr, in terms of extra cost and complexity?

Details of my database: My site is about surveys on colleges, and hence will have surveys_table, comments_table, testimonial_table, etc. I want to give an option for the users to search these surveys!

Please suggest! I know this topic has been discussed many times, but I couldn't find the answer I wanted.

Thanks in advance!

I'm just theorizing here, but these are my 2 cents.

Because Ferret stores its index in files on the file system, you'll always have locking problems at high load. But it's still a much more elegant solution than anything else I can think of. My question is: how high does the load have to get before you start to see real problems? I know that in data warehousing they've gotten around some of these bottlenecks by caching searches. They take the top 10% of searches and store these on a separate system, and the rest go through the main system. Would a similar approach be fruitful? Say the search cache is updated every hour or so and contains the top 10% of searches, which might account for 50-60% of the search load. All this is stored in the RDBMS, so you don't have locking issues on those.

Leaving that paradigm, I wonder if there could be anything else that isn't DB-specific. I don't know. To get around the locking issue, you'd probably be best off keeping things in the database. So you could use something like MATCH in MySQL. If you were going to do this, however, maybe you'd want to stem your searches:
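To make the caching idea above a bit more concrete, here is a minimal sketch. It is only an illustration under assumptions: the class name, the `min_hits` threshold (a stand-in for "the top 10% of searches") and the in-memory Hash are all hypothetical, whereas the post suggests keeping the cache in the RDBMS.

```ruby
# Hypothetical cache for popular search queries (not part of any plugin).
# A query is cached only once it has been asked at least `min_hits` times,
# and a cached entry is reused until it is older than `ttl_seconds`.
class PopularSearchCache
  def initialize(ttl_seconds: 3600, min_hits: 2, clock: -> { Time.now })
    @ttl      = ttl_seconds
    @min_hits = min_hits
    @clock    = clock
    @hits     = Hash.new(0) # normalized query => times requested
    @entries  = {}          # normalized query => { results:, at: }
  end

  # Returns fresh cached results if present; otherwise runs the block
  # (the real search) and stores its result when the query is popular enough.
  def fetch(query)
    key = query.to_s.downcase.split.join(' ')
    @hits[key] += 1

    cached = @entries[key]
    return cached[:results] if cached && @clock.call - cached[:at] < @ttl

    results = yield(key)
    @entries[key] = { results: results, at: @clock.call } if @hits[key] >= @min_hits
    results
  end
end
```

In a controller, the block passed to `fetch` would be whatever slow search call you already have, so rare queries still go to the main index while repeated ones are answered from the cache.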

sudo gem install stemmer

In the controller:

    require 'stemmer'
    params[:q] = params[:q].split.map { |word| word.downcase.stem + '*' }.uniq.join(' ')

I don't know though. Taking this kind of approach leaves you open to all the gotchas that you'll have to build from scratch. You need to start getting tricky with finding phrases, etc.
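As a rough sketch of how the stemmed terms could feed a MySQL boolean-mode full-text query: the helper name, the table name `surveys` and the columns `title` and `body` are assumptions for illustration, and the stemmer is passed in as a callable so the snippet runs even without the stemmer gem installed.

```ruby
# Turn free-form user input into a MySQL boolean-mode prefix query.
# `stemmer` is any callable; with the stemmer gem loaded you could pass
# ->(w) { w.stem }. The default leaves words unchanged.
def boolean_prefix_query(raw, stemmer: ->(word) { word })
  # Keep only ASCII letters and digits, so boolean operators such as
  # + - " * ( ) typed by the user are dropped.
  words = raw.to_s.downcase.scan(/[a-z0-9]+/)
  terms = words.map { |word| "#{stemmer.call(word)}*" }
  terms.uniq.join(' ')
end

# Hypothetical table and column names; MATCH needs a FULLTEXT index on them.
SEARCH_SQL = 'SELECT id FROM surveys WHERE MATCH(title, body) ' \
             'AGAINST (? IN BOOLEAN MODE)'

query = boolean_prefix_query('College Surveys +college')
```

The resulting string would be bound to the `?` placeholder rather than interpolated into the SQL. Phrase search, stop words and ranking are exactly the gotchas that still have to be handled by hand with this approach.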

Sorry if I'm not leading you anywhere with these musings. Good luck

I don't have specific answers, but you can read about the DRb server and AAF (acts_as_ferret) here. It's a great list, and the authors of Ferret and acts_as_ferret are quite active:

http://rubyforge.org/pipermail/ferret-talk/2007-May/

Full archive: http://rubyforge.org/pipermail/ferret-talk/

Thanks for your insights! My guess is that it will take some time to hit high load. Also, I am not sure what 'high load' means. I mean, how much is high load for Ferret?

I will consider these suggestions and get back to you in case I need more answers! :)

Also, can anyone give me insights on solr?

vishwas

   > I mean how much is high load for ferret?

FYI, the Ferret DRb server is used at Technorati for one of their projects. See their comments:

   > FWIW, I'm running this in production with about 5 updates/sec
   > and 20-30 searches/second without problems.

   src: [AAF] remote indexing via DRb with acts_as_ferret - Ferret - Ruby-Forum

Note: the DRb server is part of AAF (the acts_as_ferret plugin), and it's a no-brainer to use. It
  1/ lets you launch a Ferret server (like a DB server, for example), and
  2/ redirects all the AAF calls (e.g. User.find_by_content("John")) to the server.
See:
   http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer
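For readers who have not used DRb before, here is a minimal sketch of the general pattern using only Ruby's standard DRb library. It is not the acts_as_ferret DRb server itself: `SearchServer` is a hypothetical stand-in that keeps documents in a Hash, where a real setup would wrap the Ferret index. The point is that one process owns the index and every other process calls it remotely, so writers do not contend for index files.

```ruby
require 'drb/drb'

# Hypothetical stand-in for the single process that owns the index.
class SearchServer
  def initialize
    @docs = {}
    @lock = Mutex.new # serializes access inside the one owning process
  end

  def add(id, text)
    @lock.synchronize { @docs[id] = text.downcase }
    true
  end

  # Returns the ids of documents whose text contains the term.
  def search(term)
    needle = term.downcase
    @lock.synchronize { @docs.select { |_id, text| text.include?(needle) }.keys }
  end
end

# Server side: port 0 asks the OS for any free port.
DRb.start_service('druby://localhost:0', SearchServer.new)
uri = DRb.uri

# Client side: each application process would hold a proxy like this.
remote = DRbObject.new_with_uri(uri)
remote.add(1, 'Survey of engineering colleges')
remote.add(2, 'Testimonial about campus life')
matches = remote.search('survey')
```

Here the server and the client live in one script for brevity; in practice the server would run as its own long-lived process on a fixed, known URI.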

For more specific questions, there is a dedicated Ferret mailing list :

   http://www.ruby-forum.com/forum/5

Alain Ravet

I've had too many problems with ferret in production to recommend using it. But I have had great luck with sphinx and acts_as_sphinx. sphinx is really fast and builds indexes in seconds not hours. And I have not had any locking issues with it.

Cheers- -- Ezra Zygmuntowicz-- Lead Rails Evangelist -- ez@engineyard.com -- Engine Yard, Serious Rails Hosting -- (866) 518-YARD (9273)

Ezra

   > I've had too many problems with ferret in production to recommend using it.

Was it with recent versions of Ferret/AAF, and the Drb server?

Alain

Alain-

  Unfortunately, yes. The DRb server does help, though. But I've had tons of segfaults and index corruptions with the latest Ferret. It works great in development or with small amounts of data, but once apps started to get a serious amount of data, the indexes got corrupted randomly and caused segfaults. It's happened in close to 10 apps.

  I really like Ferret's integration with Rails via acts_as_ferret, though, and would use it again if the segfaults and index corruption were fixed.

Cheers- -- Ezra Zygmuntowicz-- Lead Rails Evangelist -- ez@engineyard.com -- Engine Yard, Serious Rails Hosting -- (866) 518-YARD (9273)

If the requirement to run Java is surmountable, then Solr is a really great solution. From the way you described your data set, it sounds like scalability is going to be a major issue, which means you are probably going to need to put search on its own server anyway. So going the Solr route isn't so bad.

Also, I was blown away by how easy Solr is to get up and running. It really works out of the box. We as a community have heard the line "Java is heavy" so much that we forget how good Java can be in certain instances. Java for building webapps is a heavy stack, but Java for running Solr is very easy. Just fire up Jetty and you are done. I wouldn't discount Solr just because it requires Java. We are using it with a Rails front end called Solr Flare (see: Flare - Solr - Apache Software Foundation). And we did another project where we wanted to search files uploaded into a Joomla-based CMS. We tweaked Solr to parse .pdf, .xls, .doc, and .ppt files for search, and it was all very easy. The interface between Joomla and Solr was very simple to build, too.

I used acts_as_ferret for search on a community content site, and it works okay. It didn't blow me away, but it was super easy to set up and use. If you are just trying to get *something* in place, then it might be the baby step you need.

Eric Pugh

OK, thanks a ton! So, I was thinking I would start with Ferret and see how it goes before shifting to Solr.

But how easy is it to transition between these? What about the indexing and other engine-specific data required for each search type?

Also, one of the Rails experts suggested Hyper Estraier (acts_as_searchable), as it seems to be built for speed and scalability. Is this recommended?