Optimising code

I'm running a website that displays projects and it takes too long for one
page to render. It shows all projects but that takes up to 1 second, a bit
too much for a very much visited page. How can I improve the following code
snippet without starting to cache?

<% for project in @projects %>
  <% invalidproject = (controller.authenticate and
((Connection.count(["project_id = ? and kind = 'p'", project.id]) == 0) or
(Connection.count(["project_id = ? and kind = 's'", project.id]) == 0))) %>

Where Project has_many :connections. Can I use eager loading there? This
count method generates way to much queries now...

Bart

Have you looked at your logs? Eager loading seems like a good choice,
but it would be great to know whether you're really executing a lot
of SQL prior to making such an optimization.

Bart Braem wrote:

I'm running a website that displays projects and it takes too long for one
page to render. It shows all projects but that takes up to 1 second, a bit
too much for a very much visited page. How can I improve the following code
snippet without starting to cache?

<% for project in @projects %>
  <% invalidproject = (controller.authenticate and
((Connection.count(["project_id = ? and kind = 'p'", project.id]) == 0) or
(Connection.count(["project_id = ? and kind = 's'", project.id]) == 0))) %>

Where Project has_many :connections. Can I use eager loading there? This
count method generates way to much queries now...
  

Also, make sure you use pagination. It will make a huge difference.

I'm running a website that displays projects and it takes too long
for one page to render. It shows all projects but that takes up to 1
second, a bit too much for a very much visited page. How can I
improve the following code snippet without starting to cache?

<% for project in @projects %>
  <% invalidproject = (controller.authenticate and
((Connection.count(["project_id = ? and kind = 'p'", project.id]) ==
0) or (Connection.count(["project_id = ? and kind = 's'",
project.id]) == 0))) %>

First and foremost you can improve this code by pulling it out of the
view. At least the counting stuff should be pushed back into the
Project model class, the decision whether a project is valid or not
belongs exactly there. Regarding authentication, if it involves more
than a simple, single method call, hide it away in a helper method.

Where Project has_many :connections. Can I use eager loading there?
This count method generates way to much queries now...

When you say that you don't want to use caching, I take it, you're
talking of page or fragment caching. Look up counter_cache in the docs
for belongs_to, that's a way of caching much better suited to your
problem. I would treat the question whether to use eager loading
separately. Eager loading is great if you know that you have to access
all the loaded objects eventually. Otherwise it is costly.

Michael

Steven Ross wrote:

Have you looked at your logs? Eager loading seems like a good choice,
but it would be great to know whether you're really executing a lot
of SQL prior to making such an optimization.

I put the server in development mode and tried the page:
Completed in 0.52014 (1 reqs/sec) | Rendering: 0.18694 (35%) | DB: 0.24157
(46%)
That's not fun, because it shows lots of queries like
SQL (0.000585) SELECT count(*) AS count_all FROM assignments WHERE
(project_id = 11)
There are around 100 projects there so I don't like the idea of hammering
the database like that...
Can I force a join somewhere to get around those queries?

Thanks for your help,
Bart

Michael Schuerig wrote:

I'm running a website that displays projects and it takes too long
for one page to render. It shows all projects but that takes up to 1
second, a bit too much for a very much visited page. How can I
improve the following code snippet without starting to cache?

<% for project in @projects %>
  <% invalidproject = (controller.authenticate and
((Connection.count(["project_id = ? and kind = 'p'", project.id]) ==
0) or (Connection.count(["project_id = ? and kind = 's'",
project.id]) == 0))) %>

First and foremost you can improve this code by pulling it out of the
view. At least the counting stuff should be pushed back into the
Project model class, the decision whether a project is valid or not
belongs exactly there. Regarding authentication, if it involves more
than a simple, single method call, hide it away in a helper method.

The counting is put into the Project model class. But is it enough to just
put this method there:
    def invalid_for_students
      return ((Connection.count(["project_id = ? and kind = 'p'", id]) == 0)
or (Connection.count(["project_id = ? and kind = 's'", id]) == 0))
    end
That doesn't feel very optimized to me, altough I wonder wether it can be
improved. (See below also.)

The authentication does not need more method calls as it works based on
sessions. I don't a helper method would work here.

Where Project has_many :connections. Can I use eager loading there?
This count method generates way to much queries now...

When you say that you don't want to use caching, I take it, you're
talking of page or fragment caching. Look up counter_cache in the docs
for belongs_to, that's a way of caching much better suited to your
problem. I would treat the question whether to use eager loading
separately. Eager loading is great if you know that you have to access
all the loaded objects eventually. Otherwise it is costly.

Yes I meant page and fragment caching. counter_cache seems nice but due to
the nature of the associations it looks difficult. I have to count based on
a condition, which does not seem possible with counter caching.

As mentioned in the other post, I now think the following series of queries
is the problem:
SELECT count(*) AS count_all FROM assignments WHERE (assignments.project_id
= 311)
But I wonder wether that can be improved. The only reference to assignments
in the view is the method project.assignments.size so that triggers it.

I have some fairly complicated SQL queries to get to the list of projects,
similar to this:
SELECT p.* FROM projects AS p WHERE p.kind like '%somekind%' and p.active
= 'y' and
(SELECT count(*) FROM connections WHERE project_id = p.id and kind = 'p') >
0 and
(SELECT count(*) FROM connections WHERE project_id = p.id and kind = 's') >
0 and
(SELECT count(*) FROM assignments WHERE project_id = p.id and year < ? ) =
0 and
(SELECT count(*) FROM assignments WHERE project_id = p.id and year >= ? ) <=
p.number_of_students
ORDER BY title ASC

This is a fairly heavy query and it is repeated for another p.kind with
other years too. I did not write it and I do not like it, but it expresses
the correct application logic. This seems to be the root cause, but how can
I fix it?
I'm thinking about splitting the connection table into different kinds, but
then there are more tables and models of course.
Which makes me wonder wether it's not better splitting out the project
table. It's based on a (currently) fixed number of kinds, altough I think
the like p.kind part of the query should not be too heavy.

Thanks for your input, this is really interesting!

Bart

Jamey Cribbs wrote:

Also, make sure you use pagination. It will make a huge difference.

I know it would, but that's not possible here. For political reasons I can't
use pagination.

Bart

Hey,

okay, you *can* use eager loading to reduce the number of hits to the
database (there's a 'but' coming at the end of this:

@projects = Project.find(:all, :conditions => ['your_conditions = ?',
:whatever],
  :include => [:connections, :assignments])

@projects.each do |project|
  # get the count of 'p' kind connections
  project.connections.inject(0) {|m,v| m += 1 if v.kind == 'p'; m}
  # get the count of 's' kind connections
  project.connections.inject(0) {|m,v| m += 1 if v.kind == 'p'; m}
end

So... because you've loaded all the records into memory as regular
ruby objects you can traverse them and do whatever calculations/tests
you need, without hitting the database again.

BUT (told you it was coming) - this may actually be slower than the
way you're doing things now. If you have a massive query cache
(mysql) *and* these queries are executed repeatedly you will probably
find that eager loading with :include is *much* slower. You also need
to make sure you've correctly indexed all your join columns if you use
eager loading - otherwise it's virtually guaranteed to be slower (even
without a big query cache to skew the results).

You seem to be missing one bit of the puzzle which will reduce your
hundreds-of-queries to a small handful - namely grouping your counts.
Consider this:

@projects = Project.find(:all, :conditions => 'yadda') # NO :include HERE

@p_counts = Connection.count(:group => 'project_id',
    :conditions => ['project_id in (?) and kind = ?',
@projects.collect(&:id), 'p'])

@s_counts = Connection.count(:group => 'project_id',
    :conditions => ['project_id in (?) and kind = ?',
@projects.collect(&:id), 's'])

That's 3 queries, no eager loading, but everything you need to figure
out the counts for connections of kind p and s for the projects you've
loaded.

@p_counts and @s_counts are hashes keyed on project_id, where the
value is the counts. Note that if one of the projects has no p or s
connections it won't have an entry in the hash. As such you need to
use to_i to coerce any nil lookups to be 0. See below:

@projects.each do |project|
  # any 'p' connections for this project?
  @p_counts[project.id].to_i == 0

  # any 's' connections for this project?
  @s_counts[project.id].to_i == 0
end

Something to try out anyhow, and something that you can use to compare
against an eager loading solution. I'd be willing to bet a beer that
grouped counts will be faster than eager loading (at least 2 X as
fast).

Hope this helps,
Trevor

Don’t know which DB you use, but I would recommend
a) a good book about SQL
b) using your database’s performance analyzer (e.g. EXPLAIN PLAN).

There are only a few rules to speed up database access:

  1. optimize your queries
  2. optimize your queries

With Oracle, your query would most likely cause (assuming you have indexes only on the primary and foreign keys):

  • full table scan on projects(1)

    • 2x index scan + table access on connections (for each project found in step (1))
    • 2x index scan + table access on assignments (for each project found in step (1))

So, assuming projects returns 100 rows, this will run 200 times through each of the other tables (bad, when these countain a few million rows).

I would at first get rid of the connection scans (easy - a simple inner join), and then try to speed up the remaining subqueries and the join (perhaps by using indexes on connections.kind and assignments.year).

A quick shot to optimize this thing in Oracle (I’m sure this is not the optimal query for - say - postgresql) would result in:

SELECT p.*

FROM connections c1,
connections c2,
projects p

WHERE c1.project_id = p.id
and c1.kind = ‘p’

and c2.project_id = p.id
and c2.kind = ‘s’
and not exists (
SELECT 1
FROM assignments
WHERE project_id = p.id
AND year < ?)

and p.number_of_students >= (
SELECT count(*) FROM assignments WHERE project_id = p.id and year >= ?
)

and p.kind like ‘%somekind%’

and p.active = ‘y’

Have fun!

Thomas

Judging by the shown code, I'd say the original poster has more pressing
needs than to optimize queries. Squeezing performance out of
ill-conceived code is the wrong approach. The problem appears to be one
of algorithm, not micro-optimization.

Recommended reading:

Robert C. Martin
Agile Software Development
Prentice-Hall 2003

Graeme C. Simsion, Graham C. Witt
Data Modelling Essentials
MKP 2005 (2nd ed.)

Michael

Michael Schuerig wrote:

Judging by the shown code, I'd say the original poster has more pressing
needs than to optimize queries. Squeezing performance out of
ill-conceived code is the wrong approach. The problem appears to be one
of algorithm, not micro-optimization.

I know that the data modelling is not optimal. This is an inherited project
and I don't like the speed right now.
I'm strongly thinking of making separate tables for connections with kind p
and kind s, that would remove a lot of "select from connections where kind
= 's'" queries...

Bart