Implementation of Sets in AR

I am at the very early design stages of a web service to implement a "booking engine". I want it to be a generic in so far as the resource being booked could be anything, a tennis court, hotel room, ski chalet or airline seat. It's up to the client (RoR) application calling the webservice to determine what the resource is, who can book it etc etc.

so really all I need is "booking" and "availability" where availability is a set of resources, and a booking is a subset (not necessarily contiguous) of resources. But the point is that a resource is just a unique numeric key, it has no other attribute. So it would be terrific to implement the webservice schema using set theory in Ruby.

require 'set' require 'yaml' availability = Set.new [1,2,3,4,5,6,7,8,9,10,11,12,14] booking = Set.new(4..8).to_a

and then be able to use the standard set operators to check there is availability (is a subset) and all of the others.

So what I need is to store a set of integers as a set within the database (in a column?) to represent a set of resources. That could be done by serializing the Set object but then I lose the ability of database queries (in particular indexing performance gains) to do something like, show me all of the bookings of the resources [11, 14, 16], which is of course another set.

But a relational database is just set theory in disguise I hear you say! But surely my webservice does not need to have a table called Resource with a column called application_id and a column called resource_id (and the extra id created by Active Record).

My gut feel is that I am approaching this in the wrong way. Perhaps Active Record is the wrong approach. I have had a look at Nested Sets but that isn't the functionality I am after. Or am I?

Does anyone have any suggestions as to a sensible design approach?

O.

Owain wrote:

I am at the very early design stages of a web service to implement a "booking engine". I want it to be a generic in so far as the resource being booked could be anything, a tennis court, hotel room, ski chalet or airline seat. It's up to the client (RoR) application calling the webservice to determine what the resource is, who can book it etc etc.

so really all I need is "booking" and "availability" where availability is a set of resources, and a booking is a subset (not necessarily contiguous) of resources. But the point is that a resource is just a unique numeric key, it has no other attribute. So it would be terrific to implement the webservice schema using set theory in Ruby.

require 'set' require 'yaml' availability = Set.new [1,2,3,4,5,6,7,8,9,10,11,12,14] booking = Set.new(4..8).to_a

and then be able to use the standard set operators to check there is availability (is a subset) and all of the others.

So what I need is to store a set of integers as a set within the database (in a column?) to represent a set of resources.

No, you probably don't.

That could be done by serializing the Set object but then I lose the ability of database queries (in particular indexing performance gains) to do something like, show me all of the bookings of the resources [11, 14, 16], which is of course another set.

But a relational database is just set theory in disguise I hear you say! But surely my webservice does not need to have a table called Resource with a column called application_id and a column called resource_id (and the extra id created by Active Record).

No, I think it does. You *want* each resource to be a separate object in your application, and you *want* each resource to be a separate object in your DB, I think. That will give you the queries you mention above, and keep each resource individually addressable.

Why do you think you don't need to do this? It seems like the best way to me.

My gut feel is that I am approaching this in the wrong way.

You absolutely are. You should be using the relational model for what it's good at, not trying to defeat it.

Perhaps Active Record is the wrong approach.

ActiveRecord is the right approach. Serialized arrays are the wrong approac.

I have had a look at Nested Sets but that isn't the functionality I am after. Or am I?

No. Those are for tree structures.

Does anyone have any suggestions as to a sensible design approach?

Yes! Keep each resource as a separate record.

O.

Best,

Marnen

Thank you for taking the trouble to reply.

Owain wrote:

> That could > be done by serializing the Set object but then I lose the ability of > database queries (in particular indexing performance gains) to do > something like, show me all of the bookings of the resources [11, 14, > 16], which is of course another set.

> But a relational database is just set theory in disguise I hear you > say! But surely my webservice does not need to have a table called > Resource with a column called application_id and a column called > resource_id (and the extra id created by Active Record).

No, I think it does. You *want* each resource to be a separate object in your application, and you *want* each resource to be a separate object in your DB, I think. That will give you the queries you mention above, and keep each resource individually addressable.

Why do you think you don't need to do this? It seems like the best way to me.

The thing is that there is no other attribute of the model needed by the booking engine other than 'id'. So imagine that the resources being booked were coloured pencils. A booking would be for red, blue and green. There is no need to have another table for colours. The availability object would hold all of the available colours. I don't really see the point of the having a table with the following rows:

application_id, resource pencil_booker, red pencil_booker, orange pencil_booker, yellow pencil_booker, green pencil_booker, blue pencil_booker, indigo pencil_booker, violet pencil_booker, dayglow_yellow

And maybe this is the difference, this table DOES exist in another form in the RoR application that calls the webservice. So the calling service knows that orange has an id of 1234567 and the booking engine sees that as a resource available and for booking. The booking service is not domain specific, it just implements some booking logic. Taking a real example, using our traditional approach I would need to set up a the resource table with a row for every day that is made available (using its Julian form as an id) with a HABTM relationship through an associative entity table just to be able to be able to find all bookings for a particular day or week?

A sledgehammer to crack a nut?

O.

Owain wrote:

Marnen

Thank you for taking the trouble to reply.

Owain wrote:

No, I think it does. �You *want* each resource to be a separate object in your application, and you *want* each resource to be a separate object in your DB, I think. �That will give you the queries you mention above, and keep each resource individually addressable.

Why do you think you don't need to do this? �It seems like the best way to me.

The thing is that there is no other attribute of the model needed by the booking engine other than 'id'. So imagine that the resources being booked were coloured pencils. A booking would be for red, blue and green. There is no need to have another table for colours. The availability object would hold all of the available colours.

...but wouldn't tell you anything about the unavailable colors, and wouldn't easily let you query the availability of an individual color.

I don't really see the point of the having a table with the following rows:

application_id, resource pencil_booker, red pencil_booker, orange pencil_booker, yellow pencil_booker, green pencil_booker, blue pencil_booker, indigo pencil_booker, violet pencil_booker, dayglow_yellow

The point is that each resource really is a different entity in your application's terms, so it should be treated like one.

And maybe this is the difference, this table DOES exist in another form in the RoR application that calls the webservice. So the calling service knows that orange has an id of 1234567 and the booking engine sees that as a resource available and for booking. The booking service is not domain specific, it just implements some booking logic.

Then perhaps the booking service shouldn't store resources at all, but should just call the actual agency (say, the pencil box) responsible for reservations and return a response code.

Taking a real example, using our traditional approach I would need to set up a the resource table with a row for every day that is made available (using its Julian form as an id) with a HABTM relationship through an associative entity table just to be able to be able to find all bookings for a particular day or week?

No. You probably wouldn't want to consider a day to be a resource. Rather, each booking should have a start time and and an end time.

A sledgehammer to crack a nut?

No, just poor data modeling.

O.

Best,

Owain wrote: > Marnen

> Thank you for taking the trouble to reply.

>> Owain wrote:

>> No, I think it does. You *want* each resource to be a separate object >> in your application, and you *want* each resource to be a separate >> object in your DB, I think. That will give you the queries you mention >> above, and keep each resource individually addressable.

>> Why do you think you don't need to do this? It seems like the best way >> to me.

> The thing is that there is no other attribute of the model needed by > the booking engine other than 'id'. So imagine that the resources > being booked were coloured pencils. A booking would be for red, blue > and green. There is no need to have another table for colours. The > availability object would hold all of the available colours.

...but wouldn't tell you anything about the unavailable colors, and wouldn't easily let you query the availability of an individual color.

> I don't > really see the point of the having a table with the following rows:

> application_id, resource > pencil_booker, red > pencil_booker, orange > pencil_booker, yellow > pencil_booker, green > pencil_booker, blue > pencil_booker, indigo > pencil_booker, violet > pencil_booker, dayglow_yellow

The point is that each resource really is a different entity in your application's terms, so it should be treated like one.

> And maybe this is the difference, this table DOES exist in another > form in the RoR application that calls the webservice. So the calling > service knows that orange has an id of 1234567 and the booking engine > sees that as a resource available and for booking. The booking > service is not domain specific, it just implements some booking > logic.

Then perhaps the booking service shouldn't store resources at all, but should just call the actual agency (say, the pencil box) responsible for reservations and return a response code.

It does need to track booked resources for the yield management functionality of the booking engine. i.e. it does dynamic price calculation based on availability.

> Taking a real example, using our traditional approach I would need to > set up a the resource table with a row for every day that is made > available (using its Julian form as an id) with a HABTM relationship > through an associative entity table just to be able to be able to find > all bookings for a particular day or week?

No. You probably wouldn't want to consider a day to be a resource. Rather, each booking should have a start time and and an end time.

That doesn't cleanly work for non-contiguous periods, which I need.

> A sledgehammer to crack a nut?

No, just poor data modeling.

I think we can easily fall into the trap of confusing data modelling with foisting a data model onto a database implementation. There is a logical model and there is the normalised technical model. I am really at the logical model stage where a resource is just a "thing" and that's all I need to track in the booking engine. It has no other attributes. It looks like MongoDB has support for arrays and therefore sets where you can query against arrays. See http://railstips.org/blog/archives/2009/12/18/why-i-think-mongo-is-to-databases-what-rails-was-to-frameworks/ Picking the right tool for the logical design is not always possible but at this stage I am lucky enough to have a blank(ish) sheet of paper.

I'll put some more effort into the de-normalised approach and see where I get to.

O.

O.

Owain wrote:

>> object in your DB, I think. That will give you the queries you mention

> pencil_booker, green > And maybe this is the difference, this table DOES exist in another > form in the RoR application that calls the webservice. So the calling > service knows that orange has an id of 1234567 and the booking engine > sees that as a resource available and for booking. �The booking > service is not domain specific, it just implements some booking > logic.

Then perhaps the booking service shouldn't store resources at all, but should just call the actual agency (say, the pencil box) responsible for reservations and return a response code.

It does need to track booked resources for the yield management functionality of the booking engine. i.e. it does dynamic price calculation based on availability.

Then it at least needs to store a resource count. Perhaps that's *all* it needs to store, not a separate record for each resource or an array of resource IDs?

> Taking a real example, using our traditional approach I would need to > set up a the resource table with a row for every day that is made > available (using its Julian form as an id) with a HABTM relationship > through an associative entity table just to be able to be able to find > all bookings for a particular day or week?

No. �You probably wouldn't want to consider a day to be a resource. Rather, each booking should have a start time and and an end time.

That doesn't cleanly work for non-contiguous periods, which I need.

It can, with some further tricks.

Can you get into more detail about your use case? It seems like every time I try to simplify your data model, we discover a new wrinkle...

> A sledgehammer to crack a nut?

No, just poor data modeling.

I think we can easily fall into the trap of confusing data modelling with foisting a data model onto a database implementation. There is a logical model and there is the normalised technical model.

The two are interdependent.

I am really at the logical model stage where a resource is just a "thing" and that's all I need to track in the booking engine.

If it is a "thing", it needs a record IMHO.

[...]

I'll put some more effort into the de-normalised approach and see where I get to.

It is the wrong approach. It will be harder to query and will not scale. Don't waste any more time on it.

O.

O.

Best,