nested includes in AR and to_json / to_xml

Hi,

I updated my project to Rails 3 (beta 4) and got a problem with
to_json and to_xml.

I have the model structure: Projects has_many Tasks habtm Users

After the line @projects = Project.includes(:tasks
=> :users).where('tasks.id is not null')
mysql returns:

Pr1 | task1 | Peter |
Pr2 | task2 | Peter |
Pr2 | task2 | David |
Pr2 | task2 | Bernd |
TestProject | task3 | Peter |
Pr2 | task4 | David |

When I try to get the xml version with:
@projects.to_xml(:include => {:tasks => {:include => :users}})

I get three times the same task for the project Pr2:

  <project>
    <name>Pr2</name>
    <tasks>
      <task>
        <name>task2</name>
          <users>
            <user> <name>Peter</name> </user>
            <user> <name>David</name> </user>
            <user> <name>Bernd</name> </user>
          </users>
       </task>
      <task>
        <name>task2</name>
          <users>
            <user> <name>Peter</name> </user>
            <user> <name>David</name> </user>
            <user> <name>Bernd</name> </user>
          </users>
       </task>
      <task>
        <name>task2</name>
          <users>
            <user> <name>Peter</name> </user>
            <user> <name>David</name> </user>
            <user> <name>Bernd</name> </user>
          </users>
       </task>
    </tasks>
  </project>

It should return one task with 3 users only, this is the same for
to_json. This code works in Rails 2.3.8 (to_xml and to_json returns
one task with 3 users), after updating to Rails 3 beta 4 the to_xml/
to_json methods doesn't work as expected.

So how can I get the xml/json result from an ActiveRecord::Relation
set (with nested includes)?

Thanks for any help!
David

Hi David,

This is what I found with rails edge.

puts Project.find(2).to_xml(:include => {:tasks => {:include
=> :users}})

<project>
  <name>pr2</name>
  <created-at type="datetime">2010-06-25T03:50:57Z</created-at>
  <updated-at type="datetime">2010-06-25T03:50:57Z</updated-at>
  <id type="integer">2</id>
  <tasks type="array">
    <task>
      <name>task2</name>
      <project-id type="integer">2</project-id>
      <id type="integer">2</id>
      <users type="array">
        <user>
          <name>Peter</name>
          <task-id type="NilClass">2</task-id>
          <id type="integer">1</id>
          <user-id type="NilClass">1</user-id>
        </user>
        <user>
          <name>David</name>
          <task-id type="NilClass">2</task-id>
          <id type="integer">2</id>
          <user-id type="NilClass">2</user-id>
        </user>
        <user>
          <name>Bernd</name>
          <task-id type="NilClass">2</task-id>
          <id type="integer">3</id>
          <user-id type="NilClass">3</user-id>
        </user>
      </users>
    </task>
  </tasks>
</project>

ActiveRecord::Schema.define(:version => 20100625013546) do

  create_table "projects", :force => true do |t|
    t.string "name"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  create_table "tasks", :force => true do |t|
    t.string "name"
    t.integer "project_id"
  end

  create_table "tasks_users", :id => false, :force => true do |t|
    t.integer "task_id", :null => false
    t.integer "user_id", :null => false
  end

  create_table "users", :force => true do |t|
    t.string "name"
  end

end
class Project < ActiveRecord::Base
  has_many :tasks
end
class Task < ActiveRecord::Base
  belongs_to :project
  has_and_belongs_to_many :users
end
class User < ActiveRecord::Base
  has_and_belongs_to_many :tasks
end

# seeds.rb
task1 = Task.create!(:name => 'task1')
task2 = Task.create!(:name => 'task2')
task3 = Task.create!(:name => 'task3')

pr1 = Project.create!(:name => 'pr1')
pr2 = Project.create!(:name => 'pr2')

task2.project_id = pr2.id
task2.save!

task2.users.create(:name => 'Peter')
task2.users.create(:name => 'David')
task2.users.create(:name => 'Bernd')

Hi Neeraj,

could you try

puts Project.includes(:tasks => :users).where('tasks.id is not
null').to_xml(:include => {:tasks => {:include => :users}})

this will return 3 times the same task (task2).

Without the where('tasks.id is not null'), it returns the correct
values. The differnce is in the sql query:

For Project.includes(:tasks => :users) the sql queries are:

SELECT `projects`.* FROM `projects`
SELECT `tasks`.* FROM `tasks` WHERE (`tasks`.project_id IN (1,2))
SELECT `task_users`.* FROM `task_users` WHERE (`task_users`.task_id =
2)
SELECT `users`.* FROM `users` WHERE (`users`.`id` IN (1,2,3))

Project.includes(:tasks => :users).where('tasks.id is not null')
creates a single sql query:

SELECT `projects`.`id` AS t0_r0, `projects`.`name` AS t0_r1,
`projects`.`created_at` AS t0_r2, `projects`.`updated_at` AS t0_r3,
`tasks`.`id` AS t1_r0, `tasks`.`name` AS t1_r1, `tasks`.`project_id`
AS t1_r2, `tasks`.`created_at` AS t1_r3, `tasks`.`updated_at` AS
t1_r4, `users`.`id` AS t2_r0, `users`.`name` AS t2_r1,
`users`.`created_at` AS t2_r2, `users`.`updated_at` AS t2_r3 FROM
`projects` LEFT OUTER JOIN `tasks` ON `tasks`.`project_id` =
`projects`.`id` LEFT OUTER JOIN `task_users` ON `tasks`.`id` =
`task_users`.`task_id` LEFT OUTER JOIN `users` ON `users`.`id` =
`task_users`.`user_id` WHERE (tasks.id is not null)

I think the problem is, that the single sql query returns 3 rows for
task2 (as described in my first post), since Rails edge the to_json/
to_xml return these 3 rows (and each row includes 3 users).

I am able to reproduce this problem.

Can you open a ticket so that it's documented. We will have rest of
discussion on that ticket.

Notice that first one works fine but the second one fails.

# works fine
puts Project.includes(:tasks).where('tasks.id is not
null').to_xml(:include => {:tasks => {:include => :users}})

# causes duplicates
puts Project.includes(:tasks => :users).where('tasks.id is not
null').to_xml(:include => {:tasks => {:include => :users}})

Ticket:
https://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/4971-nested-includes-in-ar-causes-duplicates-in-to_json-to_xml