Proposal: Emit SQL notification with Arel query builder

We have a use case in which we would like to have access to the relation of emitted queries for further inspection. Similar to sql.active_record but with access to the query builder (Arel). Even Arel being a private API would this be desirable ?

One work around that we looked at was to patch the to_sql_binds method in DatabaseStatements and emit a custom notification from there passing the arel object, however this seems a fragile approach.

Can you describe the use case?

I’d like to have the ability to inspect queries to verify if certain attributes are being used. This will help us validate that a given attribute is not being used and serve as input to safely remove it.

(I’m on the same team as Elson)

Receiving the raw SQL alone, without the query builder object, means that one has to parse the SQL to understand what it might be doing.

Instead of using the Arel object, we considered pattern matching on the raw SQL, but we run into the blind spot that Active Record often introduces aliases, making it unreliable.

We could go the route of using an actual SQL parser, but:

  • There doesn’t seem to be many SQL parsers in ruby that are well maintained
  • All the gems that I could find require a java or C binding, making it heavier than I would prefer.
  • It feels wasteful when we already have a perfectly good query builder will all the right information already.
1 Like

Thank you for the examples. Having a builder is probably better than a raw sql, but the query can theoretically be built in a different ways and even via raw SQL using something like Arel.sql(some part of sql), so this still can have complications when trying to get something useful from it.

1 Like

That’s true. We’d still not be able to cover the usage of Arel for raw queries, but we have very limited cases of that in our app, so this wouldn’t be a problem.

While I’ve been talking about wanting to make Arel [moderately] more public for years, I think I would worry about the degree of exposure this creates – it flips around from “[in some/many cases] if you give us an Arel node, we’ll know what to do with it” to more overtly implying, at the very least, that the Arel structure we build internally is a canonical form.

(I also share @fatkodima’s concern that it could easily feel safer / more comprehensive than it is in practice – even without explicit application-level Arel.sql, there have been, and likely still are [hopefully diminishing] circumstances where AR will itself pre-flatten query fragments.)

Do you have a sense of what this change would look like to implement?

In a similar way I’ve needed to be able to find the alias names Arel would choose while walking the AST tree, and have created a thing that monitors as the tree is being walked so that you can know the correlation names involved. For instance, if you have City, Airport, and Flight models set up like this:

Then add The Brick into your project, and when creating an ActiveRecord relation in order to query flights like this:

flights = Flight.joins(departure: :city, arrival: :city).left_joins(alternate_arrival: :city)

Which will generate this kind of SQL:

SELECT "flights".* FROM "flights"
 INNER JOIN "airports" ON "airports"."id" = "flights"."departure_id"
 INNER JOIN "cities" ON "cities"."id" = "airports"."city_id"
 INNER JOIN "airports" "arrivals_flights" ON "arrivals_flights"."id" = "flights"."arrival_id"
 INNER JOIN "cities" "cities_airports" ON "cities_airports"."id" = "arrivals_flights"."city_id"
 LEFT OUTER JOIN "airports" "alternate_arrivals_flights" ON "alternate_arrivals_flights"."id" = "flights"."alternate_arrival_id"
 LEFT OUTER JOIN "cities" "cities_airports_2" ON "cities_airports_2"."id" = "alternate_arrivals_flights"."city_id"

Then you can retrieve a useful hash that indicates all the correlation names that Arel has chosen for the ActiveRecord query like this:

flights.brick_links

  => 
{""=>"flights",
 "departure"=>"airports",
 "departure.city"=>"cities",
 "arrival"=>"arrivals_flights",
 "arrival.city"=>"cities_airports",
 "alternate_arrival"=>"alternate_arrivals_flights",
 "alternate_arrival.city"=>"cities_airports_2"}

Might not completely solve your issue of wanting to know all attributes, but it will give you the thing that is most difficult to find – having a way to know the resulting correlation names.

Does this process (aliasing joined/referenced table columns) happen in a deterministic manner? I ask because one of the earliest things I learned in Rails was that the HABTM join table algorithm was simply 1. take the table names being joined, 2. alphabetize them, 3. snake-case the product. If something equally predictable happens in the Arel join logic, it might be possible to apply it from outside and use the result accurately.

Walter

This gem is as deterministic as Arel itself – only because that’s all it uses – it takes its cues 100% from Arel. No having to maintain an “outside” approach because it doesn’t dream up anything on its own. It listens to exactly what Arel chooses for each generated name for every JOINed item.

The way that Arel does this has changed a little over the years – you won’t see the same names built out under Rails 4.1 as you will under all newer versions. So if you build something with this gem and then upgrade Rails and cause the correlation names to change, your code won’t break. It’s still telling you the exact names that it has chosen this time around.

I have thrown extremely complicated queries at it, and have not been able to break it. Feel free to test it to the extreme!