I’m creating an app that will import products from several XML feeds. In the XML there is a category specified, like T-Shirt for instance. The problem is that different resellers specify the categories differently. For instance, what one reseller calls “T-Shirts” another may call “T-Shirt”, a third “short sleeved shirts” and so on.
I want to somehow map these categories to the categories I have myself. So I need some tips on how I should create my database.
The idea I have is to create a “raw_categories” table which contains the name of the resellers category and a “category_id” which has a belongs_to relationship to my own “categories” table. Then when I import I simply try to find a raw_category which has a matching name and if there is one, pick it, otherwise add a new one. This new one I can then manually relate to one of my own categories.
Do you understand how I mean, and is it a good approach? Is there a better/more efficient way?
If this is a good idea. How do I do it in Rails? Should I use something like this (I think I’ve seen something like this in the API doc):
products model
has_one :category, :through => :raw_categories
I estimate that there will be about 40k to 100k products in the database.
Hi!
I'm creating an app that will import products from several XML feeds. In the
XML there is a category specified, like T-Shirt for instance. The problem is
that different resellers specify the categories differently. For instance,
what one reseller calls "T-Shirts" another may call "T-Shirt", a third
"short sleeved shirts" and so on.
I want to somehow map these categories to the categories I have myself. So I
need some tips on how I should create my database.
The idea I have is to create a "raw_categories" table which contains the
name of the resellers category and a "category_id" which has a belongs_to
relationship to my own "categories" table. Then when I import I simply try
to find a raw_category which has a matching name and if there is one, pick
it, otherwise add a new one. This new one I can then manually relate to one
of my own categories.
Do you understand how I mean, and is it a good approach? Is there a
better/more efficient way?
- If this is a good idea. How do I do it in Rails? Should I use something
like this (I think I've seen something like this in the API doc):
# products model
has_one :category, :through => :raw_categories
I think possibly you want product belongs_to raw_category and product
belongs_to category through raw_category. It has to be that way round
because you need raw_category has_many products.
Then you need category has_many raw_categories and raw_category
belongs_to product. You can also then say category has_many products
through raw_categories.
Are you sure you need this complexity however? You could just have
product belongs_to category (and the reverse) and work out which one
it is when you parse the xml, by looking it up in the raw_categories
table.
Yes, I was thinking about just having the reseller categories (raw categories) as a lookup table. So for each product I import I just check what category it should be in.
One issue with this is that it might get a bit messy if I would change the relations between a reseller category and a category. If I would change this I have to go through all the products and update them accordingly. If I would use the suggestion in my original post it would work automatically.
On the other hand, it might be very inefficient to do it like that… It would save some database work with a direct relation between products and my categories.
I think I will try it out if no one else has any other suggestions?
Thank you for the answer!
Yes, I was thinking about just having the reseller categories (raw
categories) as a lookup table. So for each product I import I just check
what category it should be in.
One issue with this is that it might get a bit messy if I would change the
relations between a reseller category and a category. If I would change this
I have to go through all the products and update them accordingly. If I
would use the suggestion in my original post it would work automatically.
Yes, if there is the slightest possibility of wanting to do this then
that is the way to go.
On the other hand, it might be very inefficient to do it like that... It
would save some database work with a direct relation between products and my
categories.
Don't worry about efficiency. I can virtually guarantee that the
critical issues for performance will not be those you imagine at the
start of a project. You can worry about that much later in the
unlikely event that you need to.