Finding Child Records Efficiently

I have a family-tree structure to a person model.

I want to be able to find people by specifying an array that corresponds to the family tree. So if Abe is grandpa, homer is dad and bart is the son, bart's array would be [abe,homer,bart].

To find bart, I can't just use find_by_name("bart") as there might be another person called bart (with differnt parents and therefore a different array). So, to find this particular bart, I want to find abe first, then find homer, then bart.

How can I do an ititial database query of find_by_name("abe") that will find the "abe" record as well as all his descendant records, so that any subsequent filtering is efficient and doesn't hit the database?

Hope that makes sense, cheers

DAZ

DAZ wrote:

I have a family-tree structure to a person model.

I want to be able to find people by specifying an array that corresponds to the family tree. So if Abe is grandpa, homer is dad and bart is the son, bart's array would be

To find bart, I can't just use find_by_name("bart") as there might be another person called bart (with differnt parents and therefore a different array). So, to find this particular bart, I want to find abe first, then find homer, then bart.

How can I do an ititial database query of find_by_name("abe") that will find the "abe" record as well as all his descendant records, so that any subsequent filtering is efficient and doesn't hit the database?

Efficient retrieval of all descendants or all ancestors of a record requires the addition of nested-set capability.

One alternative is to make the ancestor array a string key to each record ("abe|homer|bart"), allowing instant retrieval.

Another would be to build the sql iteratively:

ancestors = %w(abe homer bart).reverse conds = {"people.name" => ancestors.shift} joins = nil ancestors.each_with_index do |name, i|    joins = joins.nil? ? :parent : {:parent => joins}    conds.update("#parents_people#{'_' + (i+1).to_s if i!=0}.name" => name) end

bart = Person.find(:first, :conditions => conds, :joins => joins)

DAZ wrote:

I have a family-tree structure to a person model.

I want to be able to find people by specifying an array that corresponds to the family tree. So if Abe is grandpa, homer is dad and bart is the son, bart's array would be [abe,homer,bart].

Your requirement probably exceeds ActiveRecord's DSL.

Even if you did this...

   class Person      has_many :children      has_many :grand_children, :through => :children

(with the :class_name => Person and whatnot)

...you cannot stitch them together in a find statement. You could of course :include => [:children, :grand_children], but then there's no way to distinguish from three instances of the same table in your :conditions fields. You must hit either 'name' or 'persons.name', but then 'persons' is ambiguous.

And even if all that worked, you then can't handle a variable length array.

The best bet is to count the elements in the array, then build a :joins string that pulls in and numbers the persons table that number of times. Sketch what you need in raw SQL first, such as in a query editor. Then build a :conditions string (possibly using named replacements - :conditions => ['yack yack', {:name1 => name[1], :name2 => name[2], etc}]) that stamps in each name.

One alternative is to make the ancestor array a string key to each record ("abe|homer|bart"), allowing instant retrieval.

This seems like a relatively good idea, could have a string-key called family_tree or something and just do find_by_family_tree("abe|homer| bart") This doesn't quite feel right - it seems like the only info you should need to keep is a person's parent (from which you can then find their parent and so forth). It might also lead to some very long strings eventually!

Another would be to build the sql iteratively:

This is what I'm doing at the moment - I'm using the "betternestedset" plugin, so have access to a "children" method that returns an arrary of children. Will this have all been pre-fetched efficiently? And if so, is the iterative code the way to do it?

   tree = ["abe","homer","bart"]     person = Person.find_by_name(tree[0])       if tree.size > 1         1.upto(tree.size - 1) do |i|           person = person.children.find { |child| child.name == tree [i] }         end       end     @person = person

Thanks for the help,

DAZ

DAZ wrote:

One alternative is to make the ancestor array a string key to each record ("abe|homer|bart"), allowing instant retrieval.

This seems like a relatively good idea, could have a string-key called family_tree or something and just do find_by_family_tree("abe|homer| bart") This doesn't quite feel right - it seems like the only info you should need to keep is a person's parent (from which you can then find their parent and so forth). It might also lead to some very long strings eventually!

I tend to like this idea as well, and is probably what I would do. The caching of the ancestry would seem to be the most efficient. However, there is one change I would make. I would reverse the ancestry so it would be ("bart|homer|abe"). Doing this would make string comparisons more efficient because you are looking for "bart" so having "bart" at the beginning of the string may slightly increase the string comparison efficiency.

This doesn't quite feel right - it seems like the only info you should need to keep is a person's parent (from which you can then find their parent and so forth). It might also lead to some very long strings eventually!

This, however, I don't agree with. You need the full ancestry to whatever the "root" objects is. The idea is to make one comparison with no joins. The only way to do that I can think of is to have all the information available at the row level.

If you end up with a very deep hierarchy, which would result in very long strings, you could consider hashing the strings. This way you would have a consistent length string to compare.

Example MD5 Hash: bart>homer>abe = a92b11363ef020716f1ce3104e0cb0d8 (32 chars) bart>homer>abe>john>william>ted>sam>joeseph>adam>bart>bill>jack = ff2f4ad181a029935b89a36c7cd1dfbe (32 chars)

Obviously in some cases your losing rather than gaining, but at least the string is consistent in length no matter how long the input string becomes.

DAZ wrote:

One alternative is to make the ancestor array a string key to each record ("abe|homer|bart"), allowing instant retrieval.

This seems like a relatively good idea, could have a string-key called family_tree or something and just do find_by_family_tree("abe|homer| bart") This doesn't quite feel right - it seems like the only info you should need to keep is a person's parent (from which you can then find their parent and so forth). It might also lead to some very long strings eventually!

It does require maintenance: whenever a name changes you have to update the string in all children and all ancestors.

Another would be to build the sql iteratively:

This is what I'm doing at the moment - I'm using the "betternestedset" plugin, so have access to a "children" method that returns an arrary of children. Will this have all been pre-fetched efficiently? And if so, is the iterative code the way to do it?

   tree = ["abe","homer","bart"]     person = Person.find_by_name(tree[0])       if tree.size > 1         1.upto(tree.size - 1) do |i|           person = person.children.find { |child| child.name == tree [i] }         end       end     @person = person

First-generation children are efficiently retrieved in both better_nested_set and acts_as_tree via the parent key. better_nested_set also allows you to efficiently retrieve all generations of children using the all_children method.

So your current method will be making one DB call per generation. And you should put the name match into the SQL conditions rather than in a Ruby loop.

But have a look at the code in my last post. It can find the correct Bart using only one DB call by matching up the unique ancestors chain.

Thanks or these replies Robert and Mark!

I guess that keeping the ancestors as a string in the database makes more sense than I first thought. It is interesting that Robert refers to this as 'caching' the family-tree, which obviously it is, but I've not really thought of caching information like this (ie database level relationships) before. Consequently, as Mark says it needs to be updated whenever a name changes, which shouldn't be too difficult to do (cycle through all the descendants and update that row?).

This has the added advantage of acting as a sort of permalink row as well. Thinking about pretty urls, it wouldn't be too hard to use this string to generate:

http://appname/people/abe/homer/bart

This would be the url that took you directly to Bart's url. Could I use the '/' character as a separator at the DB level?

Robert - I really the idea of hashing the string to limit the length, I've never thought of this as a use for hashing before, thanks!

Mark - I'm not sure what you mean by this:

you should put the name match into the SQL conditions rather than in a Ruby loop.

I guess that you mean to use SQL rather than looping through ruby arrays, but I'm afraid my SQL is very poor - I usually only use the Rails helpers to do basic stuff. I'm struggling to follow your example, but I guess that it corresponds to what Philip was alluding to by saying to construct a joins string?

......And if I used this SQL method, which would be the most efficient? I'm erring towards the 'cached' string method at the moment as it easily allows uniqueness validations and as I mentioned could be used to form pretty urls.

Thanks so much again to both of you - I'm really learning lots more than I anticipated from this thread.

cheers,

DAZ

In a tree-based model, you have some tradeoffs if you want to retrieve multiple values efficiently. Usually this is an insert-time vs query- time tradeoff. For example, if you wanted to keep track of the number of descendants of any particular node, you would either have to traverse the node's subtree to count all the children every time you wanted that information, or whenever you insert a child, you add one to the parent's descendant-count field, then the grandparent's, and so on, eventually bubbling this information up to the root. This way every node is guaranteed to have an up-to-date count.

Your situation is similar. You can add a descendants field containing an array of descendant IDs. Using the built-in activerecord serialization mechanism:

class Person < ActiveRecord::Base   serialize :descendants   ... end

Every time you create a new Person, all you need to do is

  descendants = new_person.parent.descendants   descendants << new_person.parent.id

Of course, this will be a bit more complex if you want to include dependants from multiple parents, but the idea is the same. Now every person includes a list of descendant IDs, and you'll need only a single query.

-- Mark.

I forgot you wanted actual names instead of IDs, but the same principle applies. You can serialize strings in the same fashion. Also, in my previous post I said 'descendants', when really 'ancestors' is more appropriate.

-- Mark.

DAZ wrote:

I guess that keeping the ancestors as a string in the database makes more sense than I first thought. It is interesting that Robert refers to this as 'caching' the family-tree, which obviously it is, but I've not really thought of caching information like this (ie database level relationships) before. Consequently, as Mark says it needs to be updated whenever a name changes, which shouldn't be too difficult to do (cycle through all the descendants and update that row?).

Yes, you could use a recursive (tree) or batch (nested-set) traverse. Or could actually do it in one statement, like

os = sanitize("/#{old_name}/") ns = sanitize("/#{new_name}/") Person.update_all "key = replace(key, #{os}, #{ns})",                   ['key like ?', "/path/to/#{old_name}/%"]

This has the added advantage of acting as a sort of permalink row as well. Thinking about pretty urls, it wouldn't be too hard to use this string to generate:

http://appname/people/abe/homer/bart

This would be the url that took you directly to Bart's url. Could I use the '/' character as a separator at the DB level?

Yes, good idea.

Mark - I'm not sure what you mean by this:

you should put the name match into the SQL conditions rather than in a Ruby loop.

I guess that you mean to use SQL rather than looping through ruby arrays, but I'm afraid my SQL is very poor - I usually only use the Rails helpers to do basic stuff. I'm struggling to follow your example, but I guess that it corresponds to what Philip was alluding to by saying to construct a joins string?

person.children.find(:conditions => ['name = ?', name])

Mark Reginald James wrote:

os = sanitize("/#{old_name}/") ns = sanitize("/#{new_name}/")

Whoops, you'd need to add the prefix /path/to on these to ensure you only replace the name in the correct context.