SOT MySQL index question

How should I index the terms table for maximum speed? It doesn't have to be Rails migration doable. E.g. a primary key of (user_id, article_id, fnv) is okay. fnv is a 63-bit Fowler-Noll-Vo hash.

  def self.neighbors(user_id, article_id)
    sql = "SELECT t1.article_id, SUM(t1.freq * t2.freq) AS cosim FROM " \
      "tokens AS t1 JOIN tokens AS t2 " \
      "ON t1.fnv = t2.fnv WHERE t1.user_id = #{user_id} AND t2.user_id = #{user_id} AND " \
      "t1.scoring = 1 AND t2.scoring = 0 AND t2.article_id = #{article_id} GROUP BY t1.article_id " \
      "ORDER BY cosim DESC LIMIT 3"
    connection.select_rows(sql)
  end
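One caveat with the method as written: user_id and article_id are interpolated straight into the SQL string, so if they can ever come from untrusted request params, it's worth coercing them first. A minimal sketch (neighbors_sql is a hypothetical helper that only builds the string):

```ruby
# Hypothetical guard: Integer() raises ArgumentError on anything that
# isn't a clean integer, so hostile input never reaches the SQL string.
def neighbors_sql(user_id, article_id)
  uid = Integer(user_id)
  aid = Integer(article_id)
  "SELECT t1.article_id, SUM(t1.freq * t2.freq) AS cosim " \
    "FROM tokens AS t1 JOIN tokens AS t2 ON t1.fnv = t2.fnv " \
    "WHERE t1.user_id = #{uid} AND t2.user_id = #{uid} " \
    "AND t1.scoring = 1 AND t2.scoring = 0 AND t2.article_id = #{aid} " \
    "GROUP BY t1.article_id ORDER BY cosim DESC LIMIT 3"
end
```

The resulting string can then be handed to connection.select_rows exactly as before.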

TIA,   Jeffrey

Jeffrey L. Taylor wrote in post #958953:

How should I index the terms table for maximum speed?

How can we tell you? You neglected to say how you're using that table...or is the query below the only one you're interested in?

It doesn't have to be Rails migration doable.

But it will be, since adding indices generally is.

E.g. a primary key of (user_id, article_id, fnv) is okay. fnv is a 63-bit Fowler-Noll-Vo hash.

  def self.neighbors(user_id, article_id)
    sql = "SELECT t1.article_id, SUM(t1.freq * t2.freq) AS cosim FROM " \
      "tokens AS t1 JOIN tokens AS t2 " \
      "ON t1.fnv = t2.fnv WHERE t1.user_id = #{user_id} AND t2.user_id = #{user_id} AND " \
      "t1.scoring = 1 AND t2.scoring = 0 AND t2.article_id = #{article_id} GROUP BY t1.article_id " \
      "ORDER BY cosim DESC LIMIT 3"
    connection.select_rows(sql)
  end

Run EXPLAIN SELECT on this query (or whatever your DB's equivalent is). See where it's doing full table scans and add indices as appropriate.

TIA,   Jeffrey

Best,

Quoting Marnen Laibow-Koser <lists@ruby-forum.com>:

Jeffrey L. Taylor wrote in post #958953:
> How should I index the terms table for maximum speed?

How can we tell you? You neglected to say how you're using that table...or is the query below the only one you're interested in?

Correct, this is the one I care about. All others are trivial in terms of resources compared to this one.

> It doesn't have to be Rails migration doable.

But it will be, since adding indices generally is.

Composite keys are not supported in stock Rails. And I will not switch DB servers next week, so having the index creation in portable form is not a necessity.

Quoting Marnen Laibow-Koser <li...@ruby-forum.com>:

> Run EXPLAIN SELECT on this query (or whatever your DB's equivalent is).
> See where it's doing full table scans and add indices as appropriate.

> explain extended SELECT t1.article_id, SUM(t1.freq * t2.freq)
>   FROM tokens AS t1 JOIN tokens AS t2 ON t1.token = t2.token
>   AND t1.user_id = 1 AND t2.user_id = 1
>   AND t1.scoring = 1 AND t2.scoring = 0 GROUP BY article_id;
>
> +----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------------+
> | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                                        |
> +----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------------+
> |  1 | SIMPLE      | t1    | ALL  | user_id       | NULL | NULL    | NULL | 34773 |   100.00 | Using where; Using temporary; Using filesort |
> |  1 | SIMPLE      | t2    | ALL  | user_id       | NULL | NULL    | NULL | 34773 |   100.00 | Using where; Using join buffer               |
> +----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------------+
> 2 rows in set, 1 warning (0.07 sec)

You could try an index on (user_id, scoring), and you definitely want one on token (any time you do a join, you want an index on the columns you join on). You might also try adding article_id to that first index to see whether you can get the DB to use it to help with the grouping.
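A minimal sketch of those suggestions as MySQL DDL (the index names here are invented):

```sql
-- Filter index: user_id and scoring appear in the WHERE clause;
-- appending article_id may additionally help the GROUP BY.
ALTER TABLE tokens ADD INDEX idx_user_scoring_article (user_id, scoring, article_id);

-- Join index: the ON clause joins on token.
ALTER TABLE tokens ADD INDEX idx_token (token);
```

After adding these, re-running the EXPLAIN should show the key column populated instead of NULL for at least one side of the join.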

Fred

Jeffrey L. Taylor wrote in post #959191:

Quoting Marnen Laibow-Koser <lists@ruby-forum.com>:

Jeffrey L. Taylor wrote in post #958953:
> How should I index the terms table for maximum speed?

How can we tell you? You neglected to say how you're using that table...or is the query below the only one you're interested in?

Correct, this is the one I care about. All others are trivial in terms of resources compared to this one.

> It doesn't have to be Rails migration doable.

But it will be, since adding indices generally is.

Composite keys are not supported in stock Rails.

Composite *primary keys* are not (but there's a plugin). Composite *indices* are. You can specify multiple columns in add_index.
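In migration form, that looks something like this (a sketch; the class name is invented, Rails 2.x-era up/down syntax):

```ruby
# Composite *index* via add_index -- no plugin needed.
class AddIndexesToTokens < ActiveRecord::Migration
  def self.up
    add_index :tokens, [:user_id, :article_id, :fnv]
  end

  def self.down
    remove_index :tokens, :column => [:user_id, :article_id, :fnv]
  end
end
```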

And I will not switch DB servers next week, so having the index creation in portable form is not a necessity.

Never write SQL in Rails unless there's no other way.

Best,

Quoting Marnen Laibow-Koser <lists@ruby-forum.com>:

Jeffrey L. Taylor wrote in post #959191:
> Composite keys are not supported in stock Rails.

Composite *primary keys* are not (but there's a plugin). Composite *indices* are. You can specify multiple columns in add_index.

Sorry I left out a constraint that turned out to be critical. This is the primary and only index on this table. I've tried the plug-in. It was not reliable in my use scenario.

> And I will not switch DB servers next week, so having the index creation in portable form is not a necessity.

Never write SQL in Rails unless there's no other way.

If it is feasible in Rails, I do it that way. However, I was explicitly lifting the constraint of being doable in Rails for this use case. If the use case involved dozens of servers and multiple DB server products, then staying in Rails would make sense. I have one production server (with a test database) and one or two development setups (with test databases). Setup in the database client program is very doable. Fighting to shoehorn a use case that Rails does not handle into Rails is not productive.

Jeffrey

Jeffrey L. Taylor wrote in post #960038:

Quoting Marnen Laibow-Koser <lists@ruby-forum.com>:

Composite *primary keys* are not (but there's a plugin). Composite *indices* are. You can specify multiple columns in add_index.

Sorry I left out a constraint that turned out to be critical. This is the primary and only index on this table. I've tried the plug-in. It was not reliable in my use scenario.

Then here's what I would suggest. Keep the composite index, but do not make it the primary key. Use a surrogate key (that is, a Rails-style id field) instead. You'll be happier. Rails will be happier. Composite primary keys tend to be annoying (yes, I've worked with them successfully, but I'm glad I no longer have to), and anyway surrogate keys are nicer because you can guarantee that they'll never ever change over the life of the record.
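As a sketch, assuming the column set implied by the query (types are guessed):

```ruby
# Schema sketch only: column names come from the query upthread;
# :limit => 8 gives a BIGINT wide enough for a 63-bit fnv hash.
create_table :tokens do |t|        # creates the surrogate "id" primary key
  t.integer :user_id
  t.integer :article_id
  t.integer :fnv, :limit => 8
  t.integer :scoring
  t.float   :freq
end

# The old composite primary key becomes a plain (unique) composite index.
add_index :tokens, [:user_id, :article_id, :fnv], :unique => true
```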

> And I will not switch DB servers next week, so having the index creation in portable form is not a necessity.

Never write SQL in Rails unless there's no other way.

If it is feasible in Rails, I do it that way. However, I was explicitly lifting the constraint of being doable in Rails for this use case. If the use case involved dozens of servers and multiple DB server products, then staying in Rails would make sense. I have one production server (with a test database) and one or two development setups (with test databases). Setup in the database client program is very doable.

But it's a bad idea since it defeats automation. The right way to do this -- if you *must* -- is to put raw SQL in the migration and set the schema dumper to use SQL. That way you can still use migrations and the schema file.
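For the record, that looks something like this (Rails 2.x-era config; the class and index names are invented):

```ruby
# config/environment.rb -- dump the schema as SQL so DB-specific DDL survives
config.active_record.schema_format = :sql

# A migration that runs raw, MySQL-specific SQL:
class AddCompositeIndexToTokens < ActiveRecord::Migration
  def self.up
    execute "CREATE UNIQUE INDEX idx_user_article_fnv ON tokens (user_id, article_id, fnv)"
  end

  def self.down
    execute "DROP INDEX idx_user_article_fnv ON tokens"
  end
end
```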

Fighting to shoehorn a use case that Rails does not handle into Rails is not productive.

On the contrary, it's very productive. That's how many of the best Rails plugins got developed.

But in this case, all you need to do is add a surrogate key.

Jeffrey

Best,