Optimising code

I'm running a website that displays projects and it takes too long for one page to render. It shows all projects but that takes up to 1 second, a bit too much for a very much visited page. How can I improve the following code snippet without starting to cache?

<% for project in @projects %>   <% invalidproject = (controller.authenticate and ((Connection.count(["project_id = ? and kind = 'p'", project.id]) == 0) or (Connection.count(["project_id = ? and kind = 's'", project.id]) == 0))) %>

Where Project has_many :connections. Can I use eager loading there? This count method generates way to much queries now...

Bart

Have you looked at your logs? Eager loading seems like a good choice,
but it would be great to know whether you're really executing a lot
of SQL prior to making such an optimization.

Bart Braem wrote:

I'm running a website that displays projects and it takes too long for one page to render. It shows all projects but that takes up to 1 second, a bit too much for a very much visited page. How can I improve the following code snippet without starting to cache?

<% for project in @projects %>   <% invalidproject = (controller.authenticate and ((Connection.count(["project_id = ? and kind = 'p'", project.id]) == 0) or (Connection.count(["project_id = ? and kind = 's'", project.id]) == 0))) %>

Where Project has_many :connections. Can I use eager loading there? This count method generates way to much queries now...   

Also, make sure you use pagination. It will make a huge difference.

I'm running a website that displays projects and it takes too long for one page to render. It shows all projects but that takes up to 1 second, a bit too much for a very much visited page. How can I improve the following code snippet without starting to cache?

<% for project in @projects %>   <% invalidproject = (controller.authenticate and ((Connection.count(["project_id = ? and kind = 'p'", project.id]) == 0) or (Connection.count(["project_id = ? and kind = 's'", project.id]) == 0))) %>

First and foremost you can improve this code by pulling it out of the view. At least the counting stuff should be pushed back into the Project model class, the decision whether a project is valid or not belongs exactly there. Regarding authentication, if it involves more than a simple, single method call, hide it away in a helper method.

Where Project has_many :connections. Can I use eager loading there? This count method generates way to much queries now...

When you say that you don't want to use caching, I take it, you're talking of page or fragment caching. Look up counter_cache in the docs for belongs_to, that's a way of caching much better suited to your problem. I would treat the question whether to use eager loading separately. Eager loading is great if you know that you have to access all the loaded objects eventually. Otherwise it is costly.

Michael

Steven Ross wrote:

Have you looked at your logs? Eager loading seems like a good choice, but it would be great to know whether you're really executing a lot of SQL prior to making such an optimization.

I put the server in development mode and tried the page: Completed in 0.52014 (1 reqs/sec) | Rendering: 0.18694 (35%) | DB: 0.24157 (46%) That's not fun, because it shows lots of queries like SQL (0.000585) SELECT count(*) AS count_all FROM assignments WHERE (project_id = 11) There are around 100 projects there so I don't like the idea of hammering the database like that... Can I force a join somewhere to get around those queries?

Thanks for your help, Bart

Michael Schuerig wrote:

I'm running a website that displays projects and it takes too long for one page to render. It shows all projects but that takes up to 1 second, a bit too much for a very much visited page. How can I improve the following code snippet without starting to cache?

<% for project in @projects %>   <% invalidproject = (controller.authenticate and ((Connection.count(["project_id = ? and kind = 'p'", project.id]) == 0) or (Connection.count(["project_id = ? and kind = 's'", project.id]) == 0))) %>

First and foremost you can improve this code by pulling it out of the view. At least the counting stuff should be pushed back into the Project model class, the decision whether a project is valid or not belongs exactly there. Regarding authentication, if it involves more than a simple, single method call, hide it away in a helper method.

The counting is put into the Project model class. But is it enough to just put this method there:     def invalid_for_students       return ((Connection.count(["project_id = ? and kind = 'p'", id]) == 0) or (Connection.count(["project_id = ? and kind = 's'", id]) == 0))     end That doesn't feel very optimized to me, altough I wonder wether it can be improved. (See below also.)

The authentication does not need more method calls as it works based on sessions. I don't a helper method would work here.

Where Project has_many :connections. Can I use eager loading there? This count method generates way to much queries now...

When you say that you don't want to use caching, I take it, you're talking of page or fragment caching. Look up counter_cache in the docs for belongs_to, that's a way of caching much better suited to your problem. I would treat the question whether to use eager loading separately. Eager loading is great if you know that you have to access all the loaded objects eventually. Otherwise it is costly.

Yes I meant page and fragment caching. counter_cache seems nice but due to the nature of the associations it looks difficult. I have to count based on a condition, which does not seem possible with counter caching.

As mentioned in the other post, I now think the following series of queries is the problem: SELECT count(*) AS count_all FROM assignments WHERE (assignments.project_id = 311) But I wonder wether that can be improved. The only reference to assignments in the view is the method project.assignments.size so that triggers it.

I have some fairly complicated SQL queries to get to the list of projects, similar to this: SELECT p.* FROM projects AS p WHERE p.kind like '%somekind%' and p.active = 'y' and (SELECT count(*) FROM connections WHERE project_id = p.id and kind = 'p') > 0 and (SELECT count(*) FROM connections WHERE project_id = p.id and kind = 's') > 0 and (SELECT count(*) FROM assignments WHERE project_id = p.id and year < ? ) = 0 and (SELECT count(*) FROM assignments WHERE project_id = p.id and year >= ? ) <= p.number_of_students ORDER BY title ASC

This is a fairly heavy query and it is repeated for another p.kind with other years too. I did not write it and I do not like it, but it expresses the correct application logic. This seems to be the root cause, but how can I fix it? I'm thinking about splitting the connection table into different kinds, but then there are more tables and models of course. Which makes me wonder wether it's not better splitting out the project table. It's based on a (currently) fixed number of kinds, altough I think the like p.kind part of the query should not be too heavy.

Thanks for your input, this is really interesting!

Bart

Jamey Cribbs wrote:

Also, make sure you use pagination. It will make a huge difference.

I know it would, but that's not possible here. For political reasons I can't use pagination.

Bart

Hey,

okay, you *can* use eager loading to reduce the number of hits to the database (there's a 'but' coming at the end of this:

@projects = Project.find(:all, :conditions => ['your_conditions = ?', :whatever],   :include => [:connections, :assignments])

@projects.each do |project|   # get the count of 'p' kind connections   project.connections.inject(0) {|m,v| m += 1 if v.kind == 'p'; m}   # get the count of 's' kind connections   project.connections.inject(0) {|m,v| m += 1 if v.kind == 'p'; m} end

So... because you've loaded all the records into memory as regular ruby objects you can traverse them and do whatever calculations/tests you need, without hitting the database again.

BUT (told you it was coming) - this may actually be slower than the way you're doing things now. If you have a massive query cache (mysql) *and* these queries are executed repeatedly you will probably find that eager loading with :include is *much* slower. You also need to make sure you've correctly indexed all your join columns if you use eager loading - otherwise it's virtually guaranteed to be slower (even without a big query cache to skew the results).

You seem to be missing one bit of the puzzle which will reduce your hundreds-of-queries to a small handful - namely grouping your counts. Consider this:

@projects = Project.find(:all, :conditions => 'yadda') # NO :include HERE

@p_counts = Connection.count(:group => 'project_id',     :conditions => ['project_id in (?) and kind = ?', @projects.collect(&:id), 'p'])

@s_counts = Connection.count(:group => 'project_id',     :conditions => ['project_id in (?) and kind = ?', @projects.collect(&:id), 's'])

That's 3 queries, no eager loading, but everything you need to figure out the counts for connections of kind p and s for the projects you've loaded.

@p_counts and @s_counts are hashes keyed on project_id, where the value is the counts. Note that if one of the projects has no p or s connections it won't have an entry in the hash. As such you need to use to_i to coerce any nil lookups to be 0. See below:

@projects.each do |project|   # any 'p' connections for this project?   @p_counts[project.id].to_i == 0

  # any 's' connections for this project?   @s_counts[project.id].to_i == 0 end

Something to try out anyhow, and something that you can use to compare against an eager loading solution. I'd be willing to bet a beer that grouped counts will be faster than eager loading (at least 2 X as fast).

Hope this helps, Trevor

Don’t know which DB you use, but I would recommend a) a good book about SQL b) using your database’s performance analyzer (e.g. EXPLAIN PLAN).

There are only a few rules to speed up database access:

  1. optimize your queries
  2. optimize your queries

With Oracle, your query would most likely cause (assuming you have indexes only on the primary and foreign keys):

  • full table scan on projects(1)

    • 2x index scan + table access on connections (for each project found in step (1))
    • 2x index scan + table access on assignments (for each project found in step (1))

So, assuming projects returns 100 rows, this will run 200 times through each of the other tables (bad, when these countain a few million rows).

I would at first get rid of the connection scans (easy - a simple inner join), and then try to speed up the remaining subqueries and the join (perhaps by using indexes on connections.kind and assignments.year).

A quick shot to optimize this thing in Oracle (I’m sure this is not the optimal query for - say - postgresql) would result in:

SELECT p.*

FROM connections c1, connections c2, projects p

WHERE c1.project_id = p.id and c1.kind = ‘p’

and c2.project_id = p.id and c2.kind = ‘s’ and not exists ( SELECT 1 FROM assignments WHERE project_id = p.id AND year < ?)

and p.number_of_students >= ( SELECT count(*) FROM assignments WHERE project_id = p.id and year >= ? )

and p.kind like ‘%somekind%’

and p.active = ‘y’

Have fun!

Thomas

Judging by the shown code, I'd say the original poster has more pressing needs than to optimize queries. Squeezing performance out of ill-conceived code is the wrong approach. The problem appears to be one of algorithm, not micro-optimization.

Recommended reading:

Robert C. Martin Agile Software Development Prentice-Hall 2003

Graeme C. Simsion, Graham C. Witt Data Modelling Essentials MKP 2005 (2nd ed.)

Michael

Michael Schuerig wrote:

Judging by the shown code, I'd say the original poster has more pressing needs than to optimize queries. Squeezing performance out of ill-conceived code is the wrong approach. The problem appears to be one of algorithm, not micro-optimization.

I know that the data modelling is not optimal. This is an inherited project and I don't like the speed right now. I'm strongly thinking of making separate tables for connections with kind p and kind s, that would remove a lot of "select from connections where kind = 's'" queries...

Bart