Rails and Sphinx

Hi, I'm currently using ferret + acts_as_ferret for searches in the db (not so big for now, just ~40k records, but it will be near 2M+ soon). Ferret is working quite well for now, but sometimes the indexes get corrupted (I use its DRb server in production) and I have to rebuild all of them, so I end up with bad indexes, angry users and time spent fixing things.

I'm looking at Sphinx and it seems to be the answer to these problems, as it doesn't appear to suffer from this kind of corruption. But I've seen that I have to rebuild the indexes regularly (every hour or less). Does it take long to rebuild them? Is there anyone who uses Sphinx in production?

Another big question is whether it's possible to index custom fields that aren't stored in the db. With ferret it's just a :fields => {:my_custom_field => {}} plus a method named my_custom_field that returns what I need (for example a list of strings/integers to index, like category ids "4 8 15 16 23 42", which I can then query for, say, 15). I currently have 8 fields, but soon there will be around 12-15. Is something like this possible with Sphinx?

For ferret there is acts_as_ferret; what do you suggest for Sphinx? acts_as_sphinx, or is there something else?

Thank you


Look into the 'delta' options. That way you have a main index and a
delta index of recent changes. Quicker to update and then nightly you
can rebuild the whole thing.
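To make the main/delta idea concrete, here is a plain-Ruby sketch of the split. It does not use any Sphinx or Thinking Sphinx API; `Doc`, `split_for_indexing` and the timestamps are made-up names and data for illustration only. The assumption is that every record carries an `updated_at` value and that you remember when the last full rebuild ran.

```ruby
# Conceptual sketch only: Doc and split_for_indexing are hypothetical names,
# not part of Sphinx or Thinking Sphinx.
Doc = Struct.new(:id, :text, :updated_at)

# Records changed after the last full rebuild go into the small delta index,
# which is cheap to reindex often. Everything else stays in the main index,
# which only needs the nightly full rebuild.
def split_for_indexing(docs, last_full_rebuild)
  delta, main = docs.partition { |doc| doc.updated_at > last_full_rebuild }
  { main: main, delta: delta }
end

if __FILE__ == $PROGRAM_NAME
  # Illustrative data: one untouched record, one edited after the rebuild.
  last_full_rebuild = Time.utc(2024, 1, 1, 3, 0, 0)
  docs = [
    Doc.new(1, "old record", Time.utc(2023, 12, 30)),
    Doc.new(2, "edited this morning", Time.utc(2024, 1, 1, 9, 30, 0)),
  ]
  parts = split_for_indexing(docs, last_full_rebuild)
  puts "main: #{parts[:main].map(&:id).inspect}, delta: #{parts[:delta].map(&:id).inspect}"
end
```

The point is that the frequent job only touches the handful of recently changed records, so it stays fast even when the main index covers millions of rows.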

Is there anyone who uses Sphinx in production?

I have in the past, yes, using the Thinking Sphinx plugin.

Worked great.

You can send an XML "docset" document to Sphinx to build the index: use the xmlpipe2 source type in your conf and have a script that writes the XML to STDOUT.

The docset contains a set of documents with IDs (which you might map to the id in a table in your database if you were using one). Each document contains a set of field tags with the information you want to index.
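A rough sketch of such a generator script, in plain Ruby with no gems. The helper names (`xml_escape`, `docset_xml`), the field names and the sample records are assumptions made up for illustration; only the `sphinx:docset`, `sphinx:schema`, `sphinx:field` and `sphinx:document` elements come from the xmlpipe2 format. The category ids are joined into a space-separated string, which is how a custom field like "4 8 15 16 23 42" could be fed in without it being a column in the db.

```ruby
# Hypothetical helpers for illustration; only the sphinx:* element names
# are taken from the xmlpipe2 format.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Escapes the characters that would otherwise break the XML.
def xml_escape(value)
  value.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# records: array of hashes with an :id key plus one key per field.
# fields:  names of the full-text fields to declare and emit.
def docset_xml(records, fields)
  lines = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>", "<sphinx:schema>"]
  fields.each { |name| lines << %(<sphinx:field name="#{xml_escape(name)}"/>) }
  lines << "</sphinx:schema>"
  records.each do |record|
    lines << %(<sphinx:document id="#{Integer(record.fetch(:id))}">)
    fields.each do |name|
      # Arrays (e.g. category ids) become one space-separated string.
      value = Array(record[name]).join(" ")
      lines << "<#{name}>#{xml_escape(value)}</#{name}>"
    end
    lines << "</sphinx:document>"
  end
  lines << "</sphinx:docset>"
  lines.join("\n") + "\n"
end

if __FILE__ == $PROGRAM_NAME
  # Made-up sample data; a real script would load its records from wherever they live.
  records = [
    { id: 1, title: "First post", category_ids: [4, 8, 15] },
    { id: 2, title: "Q & A", category_ids: [16, 23, 42] },
  ]
  puts docset_xml(records, [:title, :category_ids])
end
```

In the conf, the source would then point at the script through `type = xmlpipe2` and an `xmlpipe_command` line that runs it (the script path is up to you).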

I don’t know how that squares with large datasets - my app was very small. I use the Riddle client that comes with Thinking Sphinx to talk to searchd.

Some of this stuff is documented on the Sphinx website somewhere, though piecing it all together took some effort.


I'm worried that it would take quite a long time with a lot of records (>2M) :(

@Philip: I've read about delta indexes, and I'll go with them :) From what I've read, Sphinx is also a lot faster than ferret at rebuilding all the indexes :)