How should I index the terms table for maximum speed? It doesn't have to be
Rails migration doable. E.g. a primary key of (user_id, article_id, fnv) is
okay. fnv is a 63-bit Fowler-Noll-Vo hash.
def self.neighbors(user_id, article_id)
  sql = "SELECT t1.article_id, SUM(t1.freq * t2.freq) AS cosim FROM " \
        "tokens AS t1 JOIN tokens AS t2 " \
        "ON t1.fnv = t2.fnv WHERE t1.user_id = #{user_id} AND t2.user_id = #{user_id} AND " \
        "t1.scoring = 1 AND t2.scoring = 0 AND t2.article_id = #{article_id} GROUP BY t1.article_id " \
        "ORDER BY cosim DESC LIMIT 3"
  connection.select_rows(sql)
end
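
One caveat on that method as written: user_id and article_id are interpolated
into the SQL raw, so if they can ever come from request params this is open to
SQL injection. A sketch of the same query with the values coerced and quoted
(connection.quote is standard ActiveRecord; nothing else changes):

  def self.neighbors(user_id, article_id)
    uid = connection.quote(user_id.to_i)    # to_i plus quote: belt and braces
    aid = connection.quote(article_id.to_i)
    sql = "SELECT t1.article_id, SUM(t1.freq * t2.freq) AS cosim FROM " \
          "tokens AS t1 JOIN tokens AS t2 " \
          "ON t1.fnv = t2.fnv WHERE t1.user_id = #{uid} AND t2.user_id = #{uid} AND " \
          "t1.scoring = 1 AND t2.scoring = 0 AND t2.article_id = #{aid} " \
          "GROUP BY t1.article_id ORDER BY cosim DESC LIMIT 3"
    connection.select_rows(sql)
  end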

Jeffrey L. Taylor wrote in post #958953:
> How should I index the terms table for maximum speed?

How can we tell you? You neglected to say how you're using that
table... or is the query you posted the only one you're interested in?

> It doesn't have to be Rails migration doable.

But it will be, since adding indices generally is.
Run EXPLAIN SELECT on this query (or whatever your DB's equivalent is).
See where it's doing full table scans and add indices as appropriate.
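
From script/console that's a one-liner; a sketch, assuming MySQL and sample
ids of 1 (select_all returns one row of plan output per table in the join):

  plan = ActiveRecord::Base.connection.select_all(
    "EXPLAIN SELECT t1.article_id, SUM(t1.freq * t2.freq) AS cosim " +
    "FROM tokens AS t1 JOIN tokens AS t2 ON t1.fnv = t2.fnv " +
    "WHERE t1.user_id = 1 AND t2.user_id = 1 " +
    "AND t1.scoring = 1 AND t2.scoring = 0 AND t2.article_id = 1 " +
    "GROUP BY t1.article_id ORDER BY cosim DESC LIMIT 3")
  # look for type "ALL" (a full scan) and key NULL (no usable index)
  plan.each { |row| puts row.inspect }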

Quoting Marnen Laibow-Koser <lists@ruby-forum.com>:
> How can we tell you? You neglected to say how you're using that
> table... or is the query you posted the only one you're interested in?

Correct, this is the one I care about. All others are trivial in terms of
resources compared to this one.

> But it will be, since adding indices generally is.

Composite keys are not supported in stock Rails. And I will not switch DB
servers next week, so having the index creation in portable form is not a
necessity.

> Run EXPLAIN SELECT on this query (or whatever your DB's equivalent is).
> See where it's doing full table scans and add indices as appropriate.

explain extended SELECT t1.article_id, SUM(t1.freq * t2.freq) FROM tokens AS t1
JOIN tokens AS t2 ON t1.token = t2.token AND t1.user_id = 1 AND t2.user_id = 1
AND t1.scoring = 1 AND t2.scoring = 0 GROUP BY article_id;

+----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                                        |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------------+
|  1 | SIMPLE      | t1    | ALL  | user_id       | NULL | NULL    | NULL | 34773 |   100.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | t2    | ALL  | user_id       | NULL | NULL    | NULL | 34773 |   100.00 | Using where; Using join buffer               |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------------+
2 rows in set, 1 warning (0.07 sec)

You could try an index on (user_id, scoring), and you definitely want one
on token (any time you do a join, you want an index on the columns you join
on). You might also try adding article_id to that first index, to see
whether you can get the DB to use it to help with the grouping.
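
In migration form, that suggestion might look like this (a sketch -- table
and column names as in the EXPLAIN above, whose join column is token; the
first snippet calls it fnv):

  class AddTokenIndexes < ActiveRecord::Migration
    def self.up
      # covers the user_id/scoring predicates on both sides of the self-join
      add_index :tokens, [:user_id, :scoring]
      # the join column: every join wants an index on the columns joined on
      add_index :tokens, :token
      # variant worth testing: add_index :tokens, [:user_id, :scoring, :article_id]
      # to see whether the DB can use it for the GROUP BY as well
    end

    def self.down
      remove_index :tokens, :column => [:user_id, :scoring]
      remove_index :tokens, :column => :token
    end
  end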

Jeffrey L. Taylor wrote in post #959191:
> Composite keys are not supported in stock Rails.

Composite *primary keys* are not (but there's a plugin). Composite
*indices* are. You can specify multiple columns in add_index.
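
For example, the composite key from the first post, expressed as a unique
index (a sketch; :unique gives you the same no-duplicates guarantee a
composite primary key would):

  add_index :tokens, [:user_id, :article_id, :fnv], :unique => true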

> And I will not switch DB servers next week, so having the index creation
> in portable form is not a necessity.

Never write SQL in Rails unless there's no other way.

Quoting Marnen Laibow-Koser <lists@ruby-forum.com>:
> Composite *primary keys* are not (but there's a plugin). Composite
> *indices* are. You can specify multiple columns in add_index.

Sorry, I left out a constraint that turned out to be critical: this is the
primary and only index on this table. I've tried the plug-in. It was not
reliable in my use scenario.

> Never write SQL in Rails unless there's no other way.

If it is feasible in Rails, I do it that way. However, I was explicitly
lifting the constraint of "doable in a Rails migration" for this use case.
If the use case included dozens of servers and more than one kind of DB
server, then staying in Rails would make sense. I have one production server
(with test database) and one or two development setups (with test databases).
Setup in the database client program is very doable. Fighting to shoehorn a
use case that Rails does not handle into Rails is not productive.

Jeffrey L. Taylor wrote:
> Sorry, I left out a constraint that turned out to be critical: this is
> the primary and only index on this table. I've tried the plug-in. It
> was not reliable in my use scenario.

Then here's what I would suggest. Keep the composite index, but do not
make it the primary key. Use a surrogate key (that is, a Rails-style id
field) instead. You'll be happier. Rails will be happier. Composite
primary keys tend to be annoying (yes, I've worked with them
successfully, but I'm glad I no longer have to), and anyway surrogate
keys are nicer because you can guarantee that they'll never ever change
over the life of the record.
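
Concretely, something like this (a sketch; the column types are assumed, and
fnv gets :limit => 8, i.e. BIGINT, since it is a 63-bit hash):

  class CreateTokens < ActiveRecord::Migration
    def self.up
      create_table :tokens do |t|    # implicit surrogate primary key "id"
        t.integer :user_id,    :null => false
        t.integer :article_id, :null => false
        t.integer :fnv,        :limit => 8, :null => false
        t.float   :freq,       :null => false   # or t.integer, if freq is a count
        t.integer :scoring,    :null => false
      end
      # the old composite primary key lives on as a unique index
      add_index :tokens, [:user_id, :article_id, :fnv], :unique => true
    end

    def self.down
      drop_table :tokens
    end
  end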

> Setup in the database client program is very doable.

But it's a bad idea since it defeats automation. The right way to do
this -- if you *must* -- is to put raw SQL in the migration and set the
schema dumper to use SQL. That way you can still use migrations and the
schema file.
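
Something like this (a sketch; on Rails 2 the config line goes in
config/environment.rb, and a composite primary key is exactly the kind of
DDL that has to stay raw):

  # dump the schema as SQL so DB-specific DDL survives the dump/load cycle
  config.active_record.schema_format = :sql

  class MakeCompositeKeyPrimary < ActiveRecord::Migration
    def self.up
      # no Rails helper for this, so drop to SQL -- but inside a migration,
      # so every environment still picks it up automatically
      execute "ALTER TABLE tokens ADD PRIMARY KEY (user_id, article_id, fnv)"
    end

    def self.down
      execute "ALTER TABLE tokens DROP PRIMARY KEY"
    end
  end

(Assumes MySQL and a tokens table created with :id => false; a table that
already has Rails' id primary key can't take a second one.)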

> Fighting to shoehorn a use case that Rails does not handle into Rails is
> not productive.

On the contrary, it's very productive. That's how many of the best
Rails plugins got developed.
But in this case, all you need to do is add a surrogate key.