I am getting ready to add search to the property model of a real estate site I am working on and am looking for advice on
which search plugin to use.
I came to Rails to escape Java, and I really don't want to run a Java server, which as far as I know rules out acts_as_solr.
So it looks like a choice between Soda Search and acts_as_ferret. How well will these two handle normalized vs. non-normalized
data? For example, if my model has relations to other objects (has_many etc.), will these search plugins index the items owned by
my model as well?
I've seen problems in people's apps with Ferret as the index gets
bigger: it will sometimes segfault and crash Mongrel, or you will
start to get Ferret locking errors on the index when multiple Rails
processes try to read/write the same index.
HyperEstraier and Sphinx have held up better with bigger indexes
than Ferret in the apps I've seen.
I did hear that they are working on a DRb daemon that will be the
only thing that writes to the Ferret index, with your app talking to it
over DRb. This may fix the locking and index corruption issues, but I
haven't seen it in the wild yet.
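
To make the single-writer idea concrete, here is a rough sketch using Ruby's standard DRb library. It is not the actual acts_as_ferret daemon: the IndexWriter class, the start_index_service helper and the in-memory hash standing in for the Ferret index are all made up for illustration.

```ruby
require "drb/drb"

# Hypothetical stand-in for the one process that owns the index.
# A plain Hash plays the role of the on-disk Ferret index here.
class IndexWriter
  def initialize
    @lock = Mutex.new
    @docs = {}
  end

  # All writes go through this one object, so they are serialized.
  def add(id, text)
    @lock.synchronize { @docs[id] = text }
    true
  end

  # Naive substring search, just enough to show the round trip.
  def search(word)
    needle = word.downcase
    @lock.synchronize do
      @docs.select { |_id, text| text.downcase.include?(needle) }.keys
    end
  end
end

# Starts the DRb service on a free local port and returns its URI.
def start_index_service
  DRb.start_service("druby://localhost:0", IndexWriter.new)
  DRb.uri
end

# Each Rails process would connect like this instead of opening the index.
uri = start_index_service
index = DRbObject.new_with_uri(uri)
index.add(1, "Three bedroom house with pool")
index.add(2, "Downtown loft")
puts index.search("pool").inspect
```

The point is only that the app processes never touch the index files themselves; they send add and search calls to the one process that does.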
The "problem" with tsearch2 is that it is Postgres-specific. Not a problem for
me, as I use ONLY Postgres. But if the app is MySQL-based, you are out
of luck.
But tsearch2 has a huge advantage over Ferret: it has lexical
capabilities. This means that searching for plurals or otherwise
modified words will find them and the related words. For example,
searching for rabbit or searching for rabbits will both bring up all docs
with rabbit or rabbits. This is a VERY cool feature, and it's language
specific. There are dictionaries for stop-words for many, many
languages.
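
Here is a toy illustration of that rabbit/rabbits behaviour in plain Ruby. It is only a sketch of the general idea (normalize words before comparing, drop stop-words), not how tsearch2 is implemented; the suffix rules, the STOPWORDS list and the method names are assumptions made up for the example.

```ruby
# Tiny, made-up stop-word list; real dictionaries are much larger.
STOPWORDS = %w[a an and the of with].freeze

# Very naive English plural stripping. Real stemmers handle far more cases.
def naive_stem(word)
  w = word.downcase
  return w.sub(/ies\z/, "y") if w.length > 4 && w.end_with?("ies")
  return w.chomp("s") if w.length > 3 && w.end_with?("s") && !w.end_with?("ss")
  w
end

# Split text into words, normalize each one, and drop stop-words.
def index_terms(text)
  text.scan(/[[:alpha:]]+/).map { |t| naive_stem(t) } - STOPWORDS
end

# A document matches when every normalized query term appears in it.
def matches?(query, document)
  (index_terms(query) - index_terms(document)).empty?
end

puts matches?("rabbit", "Two rabbits in the garden")
puts matches?("rabbits", "A rabbit hutch")
```

Because both the query and the document go through the same normalization, either form of the word finds either form in the text.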
Here's a couple I thought were helpful; they touch on the major issues:
sorting vs. searching, stemming/tokenization, stopwords, UTF-8 (I
think), tf/idf calculations
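
For anyone who hasn't met tf/idf before, a minimal sketch in plain Ruby follows. It uses one common textbook variant (term frequency times the natural log of total docs over docs containing the term); the search engines discussed in this thread each use their own formula, and the method names here are invented for the example.

```ruby
# Lowercase the text and split it into simple alphanumeric tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# tf  = share of the document's tokens that are the term
# idf = log(total docs / docs containing the term)
# corpus is an array of token arrays, one per document.
def tf_idf(term, doc_tokens, corpus)
  return 0.0 if doc_tokens.empty?
  docs_with_term = corpus.count { |tokens| tokens.include?(term) }
  return 0.0 if docs_with_term.zero?
  tf = doc_tokens.count(term).to_f / doc_tokens.size
  idf = Math.log(corpus.size.to_f / docs_with_term)
  tf * idf
end

listings = [
  "house with pool and garden",
  "house near downtown",
  "small house with garage"
]
corpus = listings.map { |text| tokenize(text) }

# "house" is in every listing, so it scores zero; "pool" is rare, so it counts.
puts tf_idf("house", corpus[0], corpus)
puts tf_idf("pool", corpus[0], corpus)
```

The takeaway is that a word appearing in every document tells you nothing about relevance, while a rare word that appears in a document weighs heavily for it.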