xml-containing models

Because the definitions of certain objects in our models are growing
out of control, we're looking into the possibility of simply defining
some of our objects in XML, rather than using many models and having
to use complicated join queries that make our application quite heavy.

Basically, we're defining machines that have many, many attributes.
Not all machines use the same attribute definitions and some of them
are very different from other machines, but we want to define them all
in one "root" model called "machine". Our application builds "bigger
machines" using these smaller machines, so we want the application to
know specifically what it can and cannot do. In the basic MVC
structure, this would mean we would have to have many models and they
all would be linked one way or another. Putting them together gets you
all you need to know about the machine.

What we would like to do however is define our machines like this:
<Machine>
  <colour>Red</colour>
  <width>200</width>
  <height> ...
  ...
  <touchpad>
    <line id="1">
      <button id="1">Power switch</button>
      <button id="2">Reset switch</button>
    </line>
  </touchpad>
  <bcu type="built-in">1029</bcu>
</Machine>

This is, of course, a simplified example.

We would like to have a model with simply an id (which is the
machine's id within the application) and a field to put this xml into.
So far so good, I of course know how to do this.

However, this would, as far as I can tell, limit the great
possibilities that Rails models have to offer. I would not be able to
simply say:

@machine = Machine.find(params[:id])
num_buttons = @machine.buttons.count
...

I would have to do something with @machine.xml instead and I'm not
very sure what.

So, my question is this: how can I put all that XML-data into a MySQL-
table and get information from it the smart way?

I'm pretty much stuck here. Is there anyone who has an idea? I can't
seem to find anything about this.

Hi,

I'd be inclined to do something with an index type table, that mapped
important bits of the XML into the DB. Maybe with machine_id,
attribute_type, attribute_name, attribute_value as the columns. It can be
a bit more work to keep it up, but you get the full flexibility of XML,
but can access the important attributes using the DB.

Then to get the button count as per your example you'd go

MachineIndex.count(:all, :conditions => {:machine_id => :id,
:attribute_type => 'button'}) or something similar.

Cheers
Simon

Thank you for your response, Simon!

It's an option, but I prefer an option in which I could actually use
one, readable definition of a machine in one single MySQL field.

I know I will have to map all those attributes some way or another,
but I can think of situations in which I do not only have <type
id=1>value</type>, but many more options. I'm just not sure how I can
map the xml-content correctly. It has to be pretty solid and
searchable. I want to be able to, for example, look for red machines
that have six buttons (simplified example, of course).

I know your way will work fine, but I think it would be beautiful to
do it this way, simply because the database will be very readable on
itself. I know that when searching for a machine with certain
attributes, performance will probably be alot higher doing it your
way, but I'm fine with that.

Don't. Just don't. Your database is not an XML parser; see this
article:

http://thedailywtf.com/Articles/JOIN_ON_WTF.aspx

Note that the WTF database described there is almost exactly what
you've described.

On a more serious note, I'd recommend that you take a look at some of
the schemaless databases, like CouchDB, for your situation. They'll
give you the flexibility to define your machines, with more ability to
query parts of the data.

--Matt Jones

But to the OP, I have another question: what exactly is getting out of
control about your schema? There's nothing wrong with having lots of
tables (and lots of model classes) if your data is best modelled that
way. Is this simply a "fear of adding classes" situation, or is there
some other reason for not wanting to do it that way?

Well... It's basically due to the fact that it's very hard to define
these machines. They all have certain limitations, possibilities,
functions that differ from each other quite a bit but are very similar
in another way. Apart from that, the client I'm building this system
for wants to be able to add functionality that has never been defined
before (e.g. 3 years from now, these machines will have evolved quite
a bit) without the need to call me. So we were looking at Zoo (http://
zoo.yootheme.com/) and saw that this was a very versatile way to do
it. However, other suggestions are always welcome and I greatly
appreciate the help!

The other thing is that we depend on MySQL for other parts of the
application.

We also have an application that, in some ways, does the same things
to the same data as we want to, but just isn't as powerful as our
application. We have (limited) access to its database. We've browsed
through it though and I'm very sure that's -not- the way we want to do
this. I'm just unsure about the right way.

The other option (and I doubt it is supported in Rails yet) is Oracle's,
XML support. It is limited, but allows you to do finds by xpath. We've
used it for a JEE project, and it can make this sort of thing a bit more
manageable.

Simon

I have been working with a project which has a similar sort of issue
where there are too many possible sorts of observations a doctor can
make on a scan to create fields for them all. One means of solving
this was to create a 'ObservationType' an ObservationData and
ObservationTypeData (join) table. (Its much much more this this, but
simplified is better). IIt is still easliy searched with the finders
and you can skip a lot of model logic.

Observation has many ObservationDatas through ObservationTypeDatas
(Join Table)
ObservationTypeData belongs to ObservationData, ObservationType
Observation has many ObservationTypes through ObservationDatas
ObservationData has one ObservationType

So, simply put, my user can create an observation type and apply it to
any observation.

You have the idea of a parent relationship going on a well, so you
will need another join table which joins a parent with a child say
Machine has many BiggerMachines, class = Machine, through
MachineParents
MachineParents (join table) belongs to Machine (fk child_id)
MachineParents (join table) belongs to Machine (fk paret_id)

Really this would only require 4 to 6 tables to give you parentchild
with dynamic atributes.

I use this a lot and it works very well when using flex itemreneders,
datgrids, and trees to show and update your data. I personally don't
think it wouldn't be a lot of fun to implement UI for this in ajax/js
but I'm sure its possible! Take a good look at the nested attributes,
or my addition of dangerousnestedattributes and maybe you will see
some ideas there too.

Good luck,
-Josh
http://jibwa.com

The other option (and I doubt it is supported in Rails yet) is Oracle's,
XML support. It is limited, but allows you to do finds by xpath. We've
used it for a JEE project, and it can make this sort of thing a bit more
manageable.

and a few other products/databases do this too eg DB2 pureXML and
marklogic. Probably no rails integration though.

Fred

Right, we've decided today that XML-parsing is not the way to go, as
long as my client can easily add functions to the machines. So I'm
combining all your ideas into a suitable idea (Josh's comments are
especially helpful) and got to this:

A model called "machine" simply to define the unique options of a
single machine:

:name
:product_code

A model called "machine_attribute" to define attributes that have some
overlap between different machines, linked to "machine" with a join
table called "machine_machine_attribute":

:type
:value

A model called "machine_constructor" to define parts of the machine,
for example a touch panel or a display, linked to "machine" with a
join table called "machine_machine_constructor":

:type

A model called "machine_constructor_child" to define the constructors,
for example buttons or the possibilities of a (set of) button(s),
linked to "machine_constructor" by a table
"machine_constructor_machine_constructor_child", but it can also only
be linked by its own parent_id for unlimited depth (this is very
important):

:type
:value
:parent_id
:position # needs to be sortable for consistency

This way we will be able to completely define a machine. Because we
need to have unlimited depth I couldn't go deeper than this. Every
machine will have attributes and have at least one constructor block,
but the constructor blocks are where machines differ from each other.
Some have multiple functions but no buttons, some only have a movement
sensor and some functionality, some constructors are smart and have
some sort of touch screen and so on.

Is this implementable? Say I would like to build the machine with
product_code "207" from the database: how would that work? I know I
will have to define associations for the models, but I'm unsure as to
how I can use the parent_id in the last model to descend or ascend.
For example, if a button is contained by a line and a line is the
root, linked to constructor.type "touch_panel". How do I get from that
constructor to its buttons? Will there be a method similar to this:

@buttons = MachineConstructorChild.find( ??? )

Thank you very much for reading this humongous post! I'm very eager to
hear what you have to say about the model structure as well.

Update. Again...

I've decided to deepen the machine_attribute model to include both
constructors and their children. The machine_attribute model is now
actually the same as my machine_constructor_child, but also has

In the XML-model example in the first post, I specified an attribute
called width. Its type would be "attribute", its name "width" and its
value "200". Through the model, I would set, for a machine:

has_many :machine_machine_attributes
has_many :machine_attributes, :through => :machine_machine_attributes

And for a machine_attribute:

has_many :machine_machine_attributes
has_many :machines, :through => :machine_machine_attributes
has_many :nested_attributes, :class_name => :machine_attribute

The type "attribute" in the width example may seem confusing: I'll
have to come up with a better name. What I mean is I want to be able
to specify whether an attribute is an actual single attribute or a
"special" one like colour, constructor (and inside constructor you'll
have the special ones called "line", "button", "function" et cetera)
or bcu_compatibility. A colour needs to be chosen (type=option,
name=colour, value=red), but a button has to be linked to an object
(e.g. conveyor_belt where id=1), so it's a droppable in my AJAX-layout
(type=droppable, name=button, value=nil). A constructor is a
container, just like a button line, which is no more than a way to be
able to seperate certain things from each other: it contains no real
information. A function is an option, again.

I hope this makes sense. Now my question: how to extend this model
logic to be able to use this in my controller without having to
descend manually into the machine? Is there any way I can tell my
controller (or model) how it should handle an attribute with
type=option or type=container or even type=attribute (which is no more
than a line stating information about a machine.

The objects with type=option need to be grouped by name, the
droppables need to be grouped by their container and containers need
to be grouped by their container or just listed by their positions.

I guess the nested_attributes thingy isn't the way to go, after trying
to build this. Does anyone have an idea?

I think you ought to go read about self-referential relationships...
Social networking is all about people who have friends who have blah
blah blah.

I'll look into it.

After all, a machine is built out of some number of components,
components have attributes and are built from some number of other
components or parts, and parts are, well parts (it has to end somewhere,
parts are components with no subsidiary components?).

Sure, it ends. But at this moment I can't say where it will end, which
is why I keep on stressing that theoretically, there should be endless
depth.

I once worked on a system where we modeled utility rates, and were
tasked with building 'the ultimately flexible rate model' engine.
Basically, a rate had a name and some attributes and N components. Each
Component had attributes and (possibly) some (sub-)components.

Well, that sounds like what I'm trying to do. However, I think our
situations differ in that I want certain attributes to behave
differently than other attributes.

Also, I need to be able to define a scope for the position value, so
that any product can be positioned (using Prototype's
sortable_element) within its containing element (parent element),
along with its children, but not outside its containing element. I
guess this is model logic, but I don't know where to start.