ActiveRecord/Arel delete with joins?

Hello,

What’s the best/easiest way to write this delete statement using ActiveRecord 3.1?

DELETE e1

FROM events e1

JOIN events e2

WHERE e1.subject_type = e2.subject_type

AND e1.subject_id = e2.subject_id

AND e1.origin_type = e2.origin_type

AND e1.origin_id = e2.origin_id

AND e1.id > e2.id

Thanks for the help.

Not answering the question I am afraid, but I think it is unwise to assume anything about the id sequence. Presumably here you are assuming that id values are assigned in an increasing sequence, but I don't think this is necessarily guaranteed in the general case. I think it might be better to use created_at, if that is what you really mean. On the other hand if in reality you do not care which one you delete and have the id test only to make sure that you delete only one of them then please ignore my comment.

Since you are interested in the best way to code it (rather than just hacking in the sql) then presumably it is something that happens routimnely rather than some tidying up operation that you have to do once. Would it not be possible using validations or similar to ensure that the duplicate record situation does not happen in the first place?

Colin

All of this is true, and you should try these things before the following.

You don't need a join. You're defining all your conditions on one table just fine:

Event.where(:subject_type => e1.subject_type, ... ).where('id > ?', e1.id).destroy_all

Hmm. Interesting. On the one hand, I’m glad people are looking out for each other and advice is given on best practices. On the other hand, I forgot what it’s like to ask for help on the internet and have everything you do under heavy scrutiny… :slight_smile:

Presumption incorrect. It is a one off and not routine code, but that doesn’t stop me from wanting to learn how to better use AR/Arel. Also, I created a unique index as soon as I realized there were dupes and I cleaned them out. I also added validation (which isn’t guaranteed to work, hence the unique index in the db), and test cases/specs for the situation.

About the id vs created_at… I disagree and consciously chose the former. I think either will work fine and it’s ok to make assumptions about the uniqueness (for sure) and order (comfortably sure) of primary keys for a given adapter. I’m familiar with the Postgres and MySQL adapters and I know they create unique, auto incrementing primary keys for each table.

That said, I am open to notion that I’m wrong or do not fully understand something though, so…

Thanks for the help,

– C

> Hello, > What's the best/easiest way to write this delete statement using > ActiveRecord 3.1? > > DELETE e1 > FROM events e1 > JOIN events e2 > WHERE e1.subject_type = e2.subject_type > AND e1.subject_id = e2.subject_id > AND e1.origin_type = e2.origin_type > AND e1.origin_id = e2.origin_id > AND e1.id > e2.id

Not answering the question I am afraid, but I think it is unwise to assume anything about the id sequence. Presumably here you are assuming that id values are assigned in an increasing sequence, but I don't think this is necessarily guaranteed in the general case. I think it might be better to use created_at, if that is what you really mean. On the other hand if in reality you do not care which one you delete and have the id test only to make sure that you delete only one of them then please ignore my comment.

Since you are interested in the best way to code it (rather than just hacking in the sql) then presumably it is something that happens routimnely rather than some tidying up operation that you have to do once. Would it not be possible using validations or similar to ensure that the duplicate record situation does not happen in the first place?

Colin

Hmm. Interesting. On the one hand, I'm glad people are looking out for each other and advice is given on best practices. On the other hand, I forgot what it's like to ask for help on the internet and have everything you do under heavy scrutiny… :slight_smile:

I did start by apologising for not answering the question :slight_smile: I suspect it may be that for maybe 50% of questions asked here the best result for the OP is not to have his question directly answered but to suggest alternative ways of approaching the problem. No I have not done the research to prove that, it is just my feeling. To answer a question by suggesting alternatives is therefore a perfectly valid, and often helpful response.

Presumption incorrect. It is a one off and not routine code, but that doesn't stop me from wanting to learn how to better use AR/Arel. Also, I created a unique index as soon as I realized there were dupes and I cleaned them out. I also added validation (which isn't guaranteed to work, hence the unique index in the db), and test cases/specs for the situation.

OK, I did not realise that this was an academic question. In that case my suggestion is of no use to you. You never know, it may be of use to someone else finding this thread in the future, in which case I have not entirely wasted my time.

About the id vs created_at… I disagree and consciously chose the former. I think either will work fine and it's ok to make assumptions about the uniqueness (for sure) and order (comfortably sure) of primary keys for a given adapter. I'm familiar with the Postgres and MySQL adapters and I know they create unique, auto incrementing primary keys for each table.

Certainly the id values will be unique, there is no question about that. I seem to remember reading about the situation with multiple servers where each server will get given a batch of id values it could use, so that the id values would not necessarily be in the same order as created_at. I may be mistaken however. Also consider the possibility in a few years time of someone migrating the code onto a different db adaptor. The code might then break. I think the point is that Rails does not guarantee that id values will increase monotonically and therefore it is not a good idea to rely on this

That said, I am open to notion that I'm wrong or do not fully understand something though, so… Thanks for the help,

I don't think I have been much help. I can answer part of the question though. You ask for the "best/easiest" way to write the statement. The *easiest* way is just to code in the sql as you already have the sql available, so it is easy. I still don't know whether there is any *better* way though. It is certainly not obvious to me, sorry again.

Colin

This doesn't do the same thing as the original query. This deletes all the events which have the same subject_type etc as a specific event e1, but created after e1, i.e. remove what you might consider duplicates of the specific event e1. The original query on the other hand deletes all duplicates, not only those that are duplicates of a specific even.

Personally I would just use the raw sql in a call to delete_all. A complicated use of arel isn't necessarily "better" or easier to use than sql (portability concerns aside, but then the whole idea that you can move any significantly sized app just by changing a line in database.yml is a bit of a myth anyway).

you could do something like

    class Event < ..       def self.duplicate_events         event_alias = Event.arel_table.alias         scoped.joins(event_alias).where(:subject_id => event_alias[:subject_id], ...).where(Event.arel_table[:id].gt(event_alias[:id]))       end     end

which gives you an Event.duplicate_events scope

unfortunately (at least with the version of ActiveRecord/Arel in rails 3.0.x you can't do Event.duplicate_events.delete_all, because the bit of Arel that deals with deletes always does "delete from blah" and doesn't allow you to say "delete t1 from t1 join t2, ..."

Fred

Personally I would just use the raw sql in a call to delete_all. A

complicated use of arel isn’t necessarily “better” or easier to use

than sql (portability concerns aside, but then the whole idea that you

can move any significantly sized app just by changing a line in

database.yml is a bit of a myth anyway).

you could do something like

class Event < ..

  def self.duplicate_events

    event_alias = Event.arel_table.alias

    scoped.joins(event_alias).where(:subject_id =>

event_alias[:subject_id], …).where(Event.arel_table[:id].gt(event_alias[:id]))

  end

end

That results in the following error:

RuntimeError: unknown class: Arel::Nodes::TableAlias

Distilled down:

Event.joins(Event.arel_table.alias)

Any ideas?

And yeah… it seems like just putting SQL into a delete_all call is the easiest way to go. Still interested in this Arel stuff though.

– C