Excluding/Lazy loading BLOB fields from normal Rails ActiveRecord operations

Dear Rails enthusiasts,

I have database that has several blob fields. Some of them aren’t exactly photos and I really want to keep them in the database, but some of them are quite “huge” (several hundred Ks).

I can’t use http://www.kanthak.net/opensource/file_column/ plugin because it uses files and I want to store them in the DB.

The main (or the only) problem is performance.

Whenever I do a find, or SomeOtherModel.find( … :include => ‘modelcontainingblob’)… The query will be soooooo slow as expected. Not much because of a lot of rows but mainly because of the unnecessary blob columns loaded.

I’ve written an article about this is http://web2withrubyonrails.gauldong.net/2007/01/18/lazy-loading-blobs-in-active-record/ , the main solutions being

  1. move the blob fields to a separate table
  2. use views

After some time, it seems both approaches have complicated downsides that I can’t bear for a problem “so simple”.

There is this find :select but it seems that I have to use it everywhere, and it can’t be used by find :include (by other models) and I don’t really know what other methods use find.

What I want is that the default find for the model will exclude the BLOB fields (or any fields I specify). And only load/save them if I explicitly request them (probably upon first access). This will mean an increase in the number of queries if I want to access all the BLOB fields on a table/model, but will drastically increase the performance of loading the whole record/records.

I did do some research on Mr. Google and the rails groups and haven’t got satisfactory explanation.

Please help. Thank you.

I have database that has several blob fields. Some of them aren't exactly photos and I really want to keep them in the database, but some of them are quite "huge" (several hundred Ks).

I can't use http://www.kanthak.net/opensource/file_column/ plugin because it uses files and I want to store them in the DB.

The main (or the only) problem is performance.

Whenever I do a find, or SomeOtherModel.find( ....... :include => 'modelcontainingblob')... The query will be soooooo slow as expected. Not much because of a lot of rows but mainly because of the unnecessary blob columns loaded.

I've written an article about this is http://web2withrubyonrails.gauldong.net/2007/01/18/lazy-loading-blobs-in-active-record/, the main solutions being 1. move the blob fields to a separate table 2. use views

After some time, it seems both approaches have complicated downsides that I can't bear for a problem "so simple".

There is this find :select but it seems that I have to use it everywhere, and it can't be used by find :include (by other models) and I don't really know what other methods use find.

What I want is that the default find for the model will exclude the BLOB fields (or any fields I specify). And only load/save them if I explicitly request them (probably upon first access). This will mean an increase in the number of queries if I want to access all the BLOB fields on a table/model, but will drastically increase the performance of loading the whole record/records.

Hrm.. seems that if you don't want them loaded by default then a separate table is the way to go. If you're worried about "table clutter" maybe a mix of STI and polymorphism would help...

class Blob < ActiveRecord::Base    belongs_to :blobable, :polymorphic => true end

class Image < Blob end

class FlashVideo < Blob end

# ... etc ...

class MyThing < ActiveRecord::Base    has_one :image, :as => :blobable end

class MyOtherThing < ActiveRecord::Base    has_one :image, :as => :blobable    has_one :flash_video, :as => :blobable end

-philip

Thank you Philip :slight_smile: That’s a nice workaround indeed.

The blob table(s) would have to be having a lot of records, each doesn’t have any intrinsic meaning as a “record”, or a blob record might have multiple blob columns… that might introduce yet another performance problem as all columns are loaded at once by ActiveRecord. Not to mention that the database become incohesive as now there isn’t a 1-1 association between a model and a table (there are “dummy” blob tables). :frowning:

I’m looking for a more elegant solution like Propel’s (PHP) lazyLoad. Propel is a high-level DBA layer that I used when I was developing with PHP. It’s solution is the cleanest I’ve ever seen:

http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505

1

<?xml version="1.0" encoding="ISO-8859-1" standalone ="no"?>

2

3

4 5

6

7

[8](http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L8) [15](http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L15) [20](http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L20) [25](http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L25) [30](http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L30) [31](http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L31) [34](http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L34) [35](http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L35)
[36](http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L36)

37 Alright it uses XML which we all used to hate, but the lazyLoad semantic is very useful here and I used it pretty often.

This example uses postgres views to exclude the fields to lazy load - then goes back to the full table when pulling out the specific fields.

http://web2withrubyonrails.gauldong.net/2007/01/18/lazy-loading-blobs-in-active-record/

basically the same as Philip’s solution - but requires one table and one view, postgres (and is pretty messy!)

As well there was talk about an AR extension to ignore certain attributes - but I don’t know what happened to it - I’ve looked around trac but couldn’t find it.

Cheers,

Jodi

General Partner

The nNovation Group inc.

www.nnovation.ca/blog

on-innovation.gif

Dear Jodi,

This example uses postgres views to exclude the fields to lazy load - then goes back to the full table when pulling out the specific fields.

http://web2withrubyonrails.gauldong.net/2007/01/18/lazy-loading-blobs-in-active-record/

basically the same as Philip’s solution - but requires one table and one view, postgres (and is pretty messy!)

Thanks Jodi. I know about that that. I wrote it. :slight_smile: I thought that was good, but, it turns out the views just complicate things further than separate tables. (views ARE good. updatable views + Rails ActiveRecord = very uglily bad, at least in my experience)

As well there was talk about an AR extension to ignore certain attributes - but I don’t know what happened to it - I’ve looked around trac but couldn’t find it.

Really? It does seem that nobody thought this is important. I think using the wrong database design pattern here… :frowning:

What is everybody using in this case? It seems most people just use separate tables (which is easiest, but cluttering the DB) and files (which is much more complex, and helped by existence of Mr. Kanthak’s file_column plugin).

I don’t feel like writing a plugin just for this… especially not to hack ActiveRecord. :frowning: Maybe I’ll have to surrender to separate tables…

lol. the web is so small.

how about find_by_sql?

or check out this ticket : http://dev.rubyonrails.org/ticket/2546

summary, using find(:select), then catch the MissingAttribute exception to lazily load the attribute when object.blob is accessed.

I’d likely take that route.

Jodi

Hrm.. seems that if you don't want them loaded by default then a separate table is the way to go. If you're worried about "table clutter" maybe a mix of STI and polymorphism would help...

class Blob < ActiveRecord::Base    belongs_to :blobable, :polymorphic => true end

class Image < Blob end

class FlashVideo < Blob end

# ... etc ...

class MyThing < ActiveRecord::Base    has_one :image, :as => :blobable end

class MyOtherThing < ActiveRecord::Base    has_one :image, :as => :blobable    has_one :flash_video, :as => :blobable end

-philip

Thank you Philip :slight_smile: That's a nice workaround indeed.

The blob table(s) would have to be having a lot of records, each doesn't have any intrinsic meaning as a "record", or a blob record might have multiple blob columns... that might introduce yet another performance problem as all columns are loaded at once by ActiveRecord. Not to mention that the database become incohesive as now there isn't a 1-1 association between a model and a table (there are "dummy" blob tables). :frowning:

There's only *one* extra table... "blobs". Everything else is typed to that table. And there's no load issue until you do "mything.image"...

And has_many works exactly the same as has_one.

We have 40,000 rows in our "blob" table, and while we don't store the blogs in the database itself, we do use it to store all our various types of media... images, flash, wmv, rm, flv, icons, logos, etc... what's nice is that for icons/logos those inherit from images, but semantically we can treat them different.

-philip

Thanks Philip… By your experience in 40,000 rows, it seems that this is the way to go.

As you said, the typing will come in handy in some situations. There are some gripes I feel about this concept, which is decohesiveness (the actual model and its blobs reside in separate tables) and coupling (the blob table contains lots of data that doesn’t have any meaning, and is coupled with all other tables).

What I wonder is how I’d do this if I have several blob fields in a model having the same “type”? Do I really have to create several different [dummy] classes for this in the blob table?

For example, if Product has “blob fields” left_image, top_image, front_image, that means I have to define LeftImage, TopImage, and FrontImage classes just for the purpose of identifying them in the blob table? (so each product will have 3 records in blob table)

Or I just use a single Image class then use has_many? (so each product will have arbitrary number of records in blob table)

…or both approaches? (each product will have arbitrary blobs each with its own types)

I’m curious about your experience.

Yes, Jodi… find :select is exactly what I want… except that it doesn’t come ‘by default’. (I wanted to put this :select into the model itself, so everybody using find without specifying columns explicitly, will use the default columns I set that omits the BLOBs)

With find :include, matters get worse… since :include doesn’t allow you to specify anything on the “inner find”.

I really would love to have this approach, but for now, Rails seems to force me into hacking it when I have to use this approach. (having to find&replace manually all .find using :select, then consciously avoiding :include … dunno what else )

plugins, plugins, plugins… Where are you when I need one? :wink:

There's only *one* extra table... "blobs". Everything else is typed to that table. And there's no load issue until you do "mything.image"...

And has_many works exactly the same as has_one.

We have 40,000 rows in our "blob" table, and while we don't store the blogs in the database itself, we do use it to store all our various types of media... images, flash, wmv, rm, flv, icons, logos, etc... what's nice is that for icons/logos those inherit from images, but semantically we can treat them different.

-philip

Thanks Philip... By your experience in 40,000 rows, it seems that this is the way to go.

Keep in mind, we don't store the blob data itself in the database... just meta-data about it (file size, url, width, height, duration, etc.)

As you said, the typing will come in handy in some situations. There are some gripes I feel about this concept, which is decohesiveness (the actual model and its blobs reside in separate tables) and coupling (the blob table contains lots of data that doesn't have any meaning, and is coupled with all other tables).

What I wonder is how I'd do this if I have several blob fields in a model having the same "type"? Do I really have to create several different [dummy] classes for this in the blob table?

Depends on if you want to uniquely identify them.

For example, if Product has "blob fields" left_image, top_image, front_image, that means I have to define LeftImage, TopImage, and FrontImage classes just for the purpose of identifying them in the blob table? (so each product will have 3 records in blob table)

I would create the 3 separate classes myself. That makes it easy to refer to them in your product views since you're going to want one of them specifically.

Or I just use a single Image class then use has_many? (so each product will have arbitrary number of records in blob table)

Do this when your products have a series of screenshots (for example) and you don't care what order they come back in.

I'm curious about your experience.

Do both, depending on the need. I recently did a project that had a bunch of articles that belonged to a category. The category had a Logo and an Icon. The article had a PreferredImage and had many Images.

All of those classes were inherited from the Image class so shared in the common methods we have to manipulate images (like thumbnail generation).

-philip

Thank you Philip, your advices are so useful. I’m beginning to consider the merits of having separate tables.

Any plugins you’d like to suggest? (image- or blob- or polymorph- related)

Is there any talk of adding lazy loaded attributes to model objects? I think this is a fundamental piece of functionality missing from ActiveRecord that ORM frameworks like Hibernate and EJB3 have.

If you think so, write a plugin that implements this. As a starting point, I suggest you prototype this functionality by

- using #find(:all, :select => '...') - explicitly implementing accessors for the lazy attributes which load them when needed

Then, if you're satisfied it's actually worth it, look at ActiveRecord::Base#construct_finder_sql. Making it work with included associations will be a bit trickier, however. Nonetheless, I'm sure you will find help in this group.

Michael

Any plugins you'd like to suggest? (image- or blob- or polymorph- related)

Nope. We're not actually using any for this project.

In another project though I really like flex_image for it's ability to resize/dropshadow/watermark and then cache to disk...

-philip

Thank you Philip. I’ll check that out… :slight_smile:

Thank you Philip. I’ll check that out. :slight_smile:

Dear Mario,

Hendy,

Regardless of the issue related to lazy loading of BLOBs have you hit or seen this issue:

http://www.ruby-forum.com/topic/109387

I have not seen anyone with an answer to this. Thanks Mario

I’m less of a Rails ‘hacker’ lately.

Considering the rapid development of Rails and its plugins… I would avoid jumping the hoops just to force making the DBA happy.

100 MB BLOB column? Come on, just use the filesystem. Somebody has heard of S3.

It’s not a “solution” to the real problem. But it’s an alternative that I think should be considered with great care.

There are lots of ups about Rails’ ActiveRecord model, and that’s where it shines. But there are pitfalls that make us have to ‘abandon’ old time-tested DB practices.

For example, rather than making a single tabel span tens of columns and having multiple bloated ones… I now prefer narrow tables. And related models just by using has_one relationship. Relationships are easy in Rails. Sometimes much easier than limiting the returned columns using ‘SELECT’!

With proper indexes and good DBMS I think this can result in more maintainable app while not reducing performance.