Full text search using tsearch2

VP11 · December 2, 2008, 7:43pm

Hi,

I implemented a full text search using postgres > 8.3. My performance is very good to index (using triggers/after_filter or crontab time based index update) and to search for something. With Postgres i can index only lexemes, other solutions, like sphinx based or aaf creates the index based in length. So everything bigger than 2/3 chars is indexed. I think to use a lexeme is better to avoid falses positives in a real world.

My questions are:

- Any good reason to dont use tsearch and choose a aaf/*sphinx solution ? - There is any way to make tsearch index the text by length and not by lexeme (i know, probably this question i should ask in a postgres forum)

Regards,

Victor

Robby_Russell1 · December 3, 2008, 4:34am

Victor,

As a long-time PostgreSQL advocate... I'd encourage you to look at using Sphinx and/or UltraSphinx.

Why? Configuration is way less complicated for your development and production environments. I like knowing that I have the advanced features of PostgreSQL available to me... but I'll side with working within Ruby as much as possible on our projects. Sphinx will allow you to evolve your indexing without needing to muck around the database, which should ease pains with deployments as well.

If you get to a point where Sphinx is falling over from not being to handle your load, you can evaluate using Tsearch2, but in the meantime, I'd just consider it premature optimization without much benefits int he short-term.

Having said that... I haven't worked with tsearch2 in 3+ years.

Good luck!

Robby

VP11 · December 4, 2008, 1:14pm

Hey Robby,

Thanks for your feedback,

Victor,

As a long-time PostgreSQL advocate... I'd encourage you to look at using Sphinx and/or UltraSphinx.

In fact, I used ultrasphinx, and with an average load, it wasnt uncommon to have to restart ultrasphinx daemons. Another point that, i tried to explain in my question, is the way that the data is indexed. In ultrasphinx, the document is broken in tokens and everything bigger then 2 chars is indexed. So in my opinion, it generates too much "false positives" (sorry guys, i could not find another word).. there is any way to make *sphinx index lexeme ?

Why? Configuration is way less complicated for your development and production environments. I like knowing that I have the advanced features of PostgreSQL available to me... but I'll side with working within Ruby as much as possible on our projects. Sphinx will allow you to evolve your indexing without needing to muck around the database, which should ease pains with deployments as well.

I think when i remove an extra daemon, i'm drying my system architecture, right ? I mean, if you dont really spend some hours reading (ultra|think) sphinx code, it will be always a black box in your system that you dont understand. So am I loosing something ? maybe the learn courve to tsearch and all postgres caveats are bigger then *really* learn how *sphinx based solutions works ?

Best regards,

Victor

Thorsten_Muller2 · December 4, 2008, 1:25pm

In ultrasphinx, the document is broken in tokens and everything bigger then 2 chars is indexed. So in my opinion, it generates too much "false positives" (sorry guys, i could not find another word).. there is any way to make *sphinx index lexeme ?

Sure, you can configure min_infix_len, min_word_len and whatever in your sphinx configuration.

I mean, if you dont really spend some hours reading (ultra|think) sphinx code, it will be always a black box in your system that you dont understand.

Whow, so you read Postgres code before using it? Same goes for Ruby, Rails, and 99% of what tools we use.

VP11 · December 4, 2008, 1:39pm

Whow, so you read Postgres code before using it? Same goes for Ruby, Rails, and 99% of what tools we use.

lol.. no, i mean at least the contrib/tsearch folder i read! ok the PostgreSQL: Documentation: 8.3: Full Text Search as well, and too be honest its better documented then the sphinx plugins

Thanks for the tip about ultrasphinx, but i would like to know if its possible to do opposite, to make it work with lexemes. Like for example if you look for run, you will find run, runs, ran and running. All of these words are forms of the same lexeme: RUN

I dont wanna be troll or blame the plugins, i heard a lot of good things about thinksphinx, not so good things about ultrasphinx and almost nothing about tsearch, that for me is working fine, except that some developers here comes with the questions: "if i type "te" it dont bring me my "test" post and ultrasphinx does". But well in real world users wont search for "te" and if they search for te, they will find everything like te,teahupo, tea, teta, tendency, etc, etc, etc.

Regards,

Victor

Thorsten_Muller2 · December 4, 2008, 2:01pm

Thanks for the tip about ultrasphinx, but i would like to know if its possible to do opposite, to make it work with lexemes. Like for example if you look for run, you will find run, runs, ran and running. All of these words are forms of the same lexeme: RUN

hmm, I don't know about "ran" but if you enable star search, then ru* wil at least find run, runs, and running. This may be possible too, but I can't say for sure, since I didn't need this kind of search Maybe have a look at the docs: http://www.sphinxsearch.com/docs/current.html#searching

I dont wanna be troll or blame the plugins, i heard a lot of good things about thinksphinx, not so good things about ultrasphinx and almost nothing about tsearch, that for me is working fine, except that some developers here comes with the questions: "if i type "te" it dont bring me my "test" post and ultrasphinx does". But well in real world users wont search for "te" and if they search for te, they will find everything like te,teahupo, tea, teta, tendency, etc, etc, etc.

lo, I agree. It was hard enough to read through lots of tutorials and documents. We started with ultrasphinx at some time back and it worked good enough for our needs. For new projects I would think about changing that to thinksphinx. Especially if I expect the index to be combined from several tables. But for basic search both should do well enough.

Frederick_Cheung · December 4, 2008, 2:02pm

Whow, so you read Postgres code before using it? Same goes for Ruby, Rails, and 99% of what tools we use.

lol.. no, i mean at least the contrib/tsearch folder i read! ok the PostgreSQL: Documentation: 8.3: Full Text Search as well, and too be honest its better documented then the sphinx plugins

Thanks for the tip about ultrasphinx, but i would like to know if its possible to do opposite, to make it work with lexemes. Like for example if you look for run, you will find run, runs, ran and running. All of these words are forms of the same lexeme: RUN

I dont wanna be troll or blame the plugins, i heard a lot of good things about thinksphinx, not so good things about ultrasphinx and almost nothing about tsearch, that for me is working fine, except that some developers here comes with the questions: "if i type "te" it dont bring me my "test" post and ultrasphinx does". But well in real world users wont search for "te" and if they search for te, they will find everything like te,teahupo, tea, teta, tendency, etc, etc, etc.

sphinx can do stemming. as for searching for prefixes, that's optional.

Fred

VP11 · December 4, 2008, 3:20pm

thanks for the tips. As documentation thinksphinx has nice railscast about. But geocoding searches for example are not documented until now (or at least, until last week hehe)

Regards,

Victor

Topic		Replies	Views
Search engine that works with MS SQL Server rubyonrails-talk	3	156	February 25, 2009
best way to search - alternatives to full text (thinking sphinx)? rubyonrails-talk	2	176	September 14, 2009
Thinking Sphinx performance problem rubyonrails-talk	0	120	January 14, 2014
Ultrasphinx help rubyonrails-talk	0	88	November 15, 2010
Rails 3.2 what are the best gems for full-text search in sql db ? rubyonrails-talk	10	224	October 14, 2012

Full text search using tsearch2

Related topics

More Resources