This is more of a theory question than a RoR question, but perhaps
there is some gem to help that I am unaware of.
I am trying to aggregate store listings provided by 15 or so online
shops. Some of them provide the manufacturer's SKU/MID information, but
some do not. When the SKU/MID is provided, it is a simple matter to
find the same items across multiple stores. However, I have found
myself with 5,000+ items just floating because they do not have a
SKU/MID. I am currently chugging through them to associate them with
the same items from the other stores.
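For context, the SKU-based matching amounts to roughly the following. This is only a minimal sketch: the listing hashes and their :store, :title and :sku keys are made up for illustration and are not my real schema, and normalizing the SKU by case and whitespace is an assumption about what counts as "the same" SKU.

```ruby
# Hypothetical listing records; the keys :store, :title and :sku are
# assumptions for illustration, not the real schema.
listings = [
  { store: "shop_a", title: "Acme Widget 3000",        sku: "ACM-3000" },
  { store: "shop_b", title: "ACME Widget 3000 (blue)", sku: " acm-3000 " },
  { store: "shop_c", title: "Widget 3000 by Acme",     sku: nil }
]

# Split listings into those with a usable SKU/MID and those without.
with_sku, floating = listings.partition { |l| !l[:sku].to_s.strip.empty? }

# Group listings that share a SKU, ignoring case and surrounding whitespace.
matched = with_sku.group_by { |l| l[:sku].strip.upcase }

matched.each do |sku, items|
  puts "#{sku}: #{items.map { |l| l[:store] }.join(', ')}"
end
puts "floating: #{floating.size}"
```

Everything that lands in `floating` is what I am stuck with.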
I am looking for a way to simplify this. Right now I use the Sphinx
index to try to find similar items, but it's hit or miss. Are there any
algorithms I should look into that will help me identify items that
are likely to be similar?