when is it okay to resort to find_by_sql?

When is it appropriate to resort to find_by_sql?

I'm all for portable, easily maintained code. I'm a HUGE fan the new ActiveRecord::Relation model in Rails 3.0 I keenly feel the peril of dropping into find_by_sql().

Question 1: is there's a snowball's chance in Hades for the following code to be written WITHOUT resorting find_by_sql()?

inner_query =<<EOQ      SELECT DATE(st.datetime) AS start_time,             DATE(et.datetime - INTERVAL 1 DAY) AS end_time,             DATEDIFF(et.datetime, st.datetime) AS day_count,             sbs.quantity AS quantity,             sbs.cost AS cost        FROM service_bills as sbs INNER JOIN time_dimensions AS st ON st.id = sbs.start_time_id INNER JOIN time_dimensions AS et ON et.id = sbs.end_time_id       WHERE sbs.metered_service_id IN (#{metered_service_ids}) EOQ

I'm sure it's possible using a ServiceBill.find() construct where each clause is essentially a quoted string from the above, but by that point, I'd be hard pressed to say that is clearer or more maintainable.

Question 2: Nested queries. In my system, inner_query is, in fact, an inner query that appears within this little monster:

outer_query =<<EOQ      SELECT st.id AS start_time_id,             et.id AS end_time_id,             DATE(st.datetime) AS time,             SUM(bills.quantity/bills.day_count) AS daily_quantity,             SUM(bills.cost/bills.day_count) AS daily_cost        FROM time_dimensions AS st INNER JOIN time_dimensions AS et ON DATE(st.datetime + INTERVAL 1 DAY) = DATE(et.datetime) INNER JOIN (#{inner_query}) AS bills       WHERE st.datetime BETWEEN bills.start_time AND bills.end_time         AND st.is_midnight = 1    GROUP BY st.id EOQ

Even if both of these queries could be cast into a find() construct, I don't see how to create the equivalent of nested queries: it seems like I'd have to "pop back up" to the Rails level, which is problematic for really large datasets.

So: If ActiveRecord::Relation.find() really cannot handle queries of this complexity, I'm okay with that. But if it can, I'd welcome pointers or documentation on how to code them.

TIA.

- ff

Fearless Fool wrote:

When is it appropriate to resort to find_by_sql?

When you need something that can't be done any other way.

I'm all for portable, easily maintained code. I'm a HUGE fan the new ActiveRecord::Relation model in Rails 3.0 I keenly feel the peril of dropping into find_by_sql().

Question 1: is there's a snowball's chance in Hades for the following code to be written WITHOUT resorting find_by_sql()?

inner_query =<<EOQ      SELECT DATE(st.datetime) AS start_time,             DATE(et.datetime - INTERVAL 1 DAY) AS end_time,             DATEDIFF(et.datetime, st.datetime) AS day_count,             sbs.quantity AS quantity,             sbs.cost AS cost        FROM service_bills as sbs INNER JOIN time_dimensions AS st ON st.id = sbs.start_time_id INNER JOIN time_dimensions AS et ON et.id = sbs.end_time_id       WHERE sbs.metered_service_id IN (#{metered_service_ids}) EOQ

I'm sure it's possible using a ServiceBill.find() construct where each clause is essentially a quoted string from the above, but by that point, I'd be hard pressed to say that is clearer or more maintainable.

If it is possible to use find instead of find_by_sql, then always do so. If nothing else, ActiveRecord may be able to reason better about the query string.

Question 2: Nested queries. In my system, inner_query is, in fact, an inner query that appears within this little monster:

outer_query =<<EOQ      SELECT st.id AS start_time_id,             et.id AS end_time_id,             DATE(st.datetime) AS time,             SUM(bills.quantity/bills.day_count) AS daily_quantity,             SUM(bills.cost/bills.day_count) AS daily_cost        FROM time_dimensions AS st INNER JOIN time_dimensions AS et ON DATE(st.datetime + INTERVAL 1 DAY) = DATE(et.datetime) INNER JOIN (#{inner_query}) AS bills       WHERE st.datetime BETWEEN bills.start_time AND bills.end_time         AND st.is_midnight = 1    GROUP BY st.id EOQ

Even if both of these queries could be cast into a find() construct, I don't see how to create the equivalent of nested queries: it seems like I'd have to "pop back up" to the Rails level, which is problematic for really large datasets.

There are relatively few cases (not none, but few) where a subquery should not be replaced by a join. Are you sure you need that subquery?

So: If ActiveRecord::Relation.find() really cannot handle queries of this complexity, I'm okay with that. But if it can, I'd welcome pointers or documentation on how to code them.

Can't help there; haven't started playing with Rails 3 yet.

TIA.

- ff

Best,

Marnen Laibow-Koser wrote (among other things...)

Question 2: Nested queries. In my system, inner_query is, in fact, an inner query that appears within this little monster: ...

There are relatively few cases (not none, but few) where a subquery should not be replaced by a join. Are you sure you need that subquery?

Well, then, Manen, help me understand if I need that subquery.

The inner_query construct fetches "monthly" utility bills. (I say "monthly" in quotes because the start and end dates don't generally fall on month boundaries.) The outer_query construct "explodes" the monthly bills into daily bills, dividing by the number of days spanned by the bill, and summing multiple bills that fall on the same day (since they can overlap). I've thought about this for a while, but don't see how to do both of those functions via a join.

I'll sleep on it, but if you have any insights, let me know. Thanks.

- ff

Fearless Fool wrote:

Well, then, Manen, help me understand if I need that subquery.

Fiddlesticks. Marnen, pardon my typo (again) on the spelling of your name.

- Frealess Foole

Fearless Fool wrote:

Marnen Laibow-Koser wrote (among other things...)

Question 2: Nested queries. In my system, inner_query is, in fact, an inner query that appears within this little monster: ...

There are relatively few cases (not none, but few) where a subquery should not be replaced by a join. Are you sure you need that subquery?

Well, then, Manen, help me understand if I need that subquery.

The inner_query construct fetches "monthly" utility bills. (I say "monthly" in quotes because the start and end dates don't generally fall on month boundaries.) The outer_query construct "explodes" the monthly bills into daily bills, dividing by the number of days spanned by the bill, and summing multiple bills that fall on the same day (since they can overlap). I've thought about this for a while, but don't see how to do both of those functions via a join.

I'll have to think about it some more. However, a quick look at your two queries suggest that you may be trying to do too much in the database and too little in the application.

I'll sleep on it, but if you have any insights, let me know. Thanks.

I'll have to spend some more time on it.

- ff

Best,

Marnen Laibow-Koser wrote:

There are relatively few cases (not none, but few) where a subquery should not be replaced by a join. Are you sure you need that subquery?

Marnen: I owe your a debt of gratitude. Your note spurred me to *really* grok SQL joins. After thirty six hours of study and anti-social behavior, I've got it. I think I'll write a blog post called "The Zen of SQL JOIN: a gentle guide for procedural programmers". (Teaser: Use JOINs to generate cartesian products of tables. Use ON, WHERE, and HAVING to filter the combinatorial explosion to only those rows that you want.)

And you are absolutely right: subqueries can (always? often?) be replaced by studlier joins. In my system, I've re-cast what was previously a four-level (!!) nested query with a "flat" join across seven tables. And it now runs like a bat out of hell.

I *STILL* don't know how to ask the ActiveRecord query system to alias a table in a join, as in:   ...   INNER JOIN vertices AS v1 ON v1.id = edge.v1_id   INNER JOIN vertices AS v2 ON v2.id = edge.v2_id   ... so I'm still using find_by_sql(). If someone can suggest a clean ActiveRecord syntax for this kind of query, I'm all ears.

- ff