Increased memory requirements on 1.2

I recently upgraded a Rails app of mine to run on edge (and the
1-2-pre-release branch) and noticed that my FCGIs required roughly
6-8MB more memory after just a couple of requests.

For example, each FCGI on edge would start at around 40MB and rise to
~46MB after a couple of requests.

I downgraded my app back to 1.1.6 and each FCGI would start at around
33MB and rise to ~38MB.

As a result of the increased memory requirements, my app can no longer
run on TextDrive using edge, since they have a resident memory cap of
48MB which I would quickly hit (and they just kill your FCGI when you
do).

Is this increased memory requirement to be expected?

I would be happy to give you more details about my setup if you think
it may have an impact on why the memory usage is so high in my case.

Thanks,

Tom Davies

http://atomgiant.com
http://gifthat.com

Is this increased memory requirement to be expected?

I would be happy to give you more details about my setup if you think
it may have an impact on why the memory usage is so high in my case.

I've had a sneaking suspicion about this for a while. Even Beast, my
500 LOC forum, starts off with about 40MB.

First I compared Beast on edge vs Beast on rev 4818 (edge Rails at the
time of its first release). They both started at about 22MB, but
Beast on edge jumped to about 37MB after making a single request to 4
controllers. I then incremented the revision number until I started
seeing the same memory increases, and found it started after changeset
5223, the one that introduced ActiveSupport::Multibyte. Anyone else
seeing similar results?

I ran my app using Mongrel's -B flag to see if I was leaking memory,
and one thing I noticed was that the second-highest entry is related
to Multibyte:

ActiveSupport::Multibyte::Handlers::Codepoint, 17721

So, it would seem that multibyte may be the culprit.

Tom
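
For reference, the -B flag turns on Mongrel's debugging mode, which
(among other things) logs per-class object counts like the line above.
A rough, hedged way to get a similar tally in plain Ruby, by walking
the heap with ObjectSpace, is something like:

    # Count live instances per class and print the biggest offenders.
    # This is only a rough equivalent of Mongrel's object report.
    counts = Hash.new(0)
    ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }

    counts.sort_by { |klass, n| -n }.first(10).each do |klass, n|
      puts "#{klass}, #{n}"
    end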

Does redefining KCODE, or making String#chars just return self, make
any difference?
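
The experiment being suggested is roughly the following (Ruby 1.8 era;
a quick hack to take Multibyte out of the picture, not a recommended
fix):

    # Keep KCODE at "none" and stub String#chars to skip the multibyte
    # proxy entirely, then watch whether memory still climbs after a
    # few requests.
    $KCODE = 'n'

    class String
      def chars
        self
      end
    end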

Nope, the unicode tables are always loaded and frankly take up a lot of space. The reason they're always loaded is that we don't know in advance if someone is going to set KCODE to utf-8.

Manfred

Isn't there some kind of Ruby event that fires when a constant or a
variable gets assigned? That way you could do your setup the moment
KCODE gets set. But it would not help much for users like me who set
it almost all the time.

Couldn't we lazy load the UnicodeDatabase instance?

      # UniCode Database
      UCD = UnicodeDatabase.new

That way, instead of chewing up that RAM every time, we only chew it
up when using the utf8 handler?

Also, does the utf8_handler_proc implementation need UCD?
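
A minimal sketch of that idea, assuming the handlers can reach the
database through a memoized reader instead of the UCD constant (the
accessor name here is illustrative, not the actual patch):

    # Build the database on first access instead of at require time, so
    # apps that never touch the utf8 handler never pay for the tables.
    module ActiveSupport
      module Multibyte
        module Handlers
          def self.unicode_database          # assumed accessor name
            @unicode_database ||= UnicodeDatabase.new
          end
        end
      end
    end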

Yes, we can lazy load the database, I think that's the best solution...

The _proc implementation wouldn't need the UCD if all the unicode operations we need were implemented in utf8_proc, but unfortunately downcase is broken in utf8_proc at the moment. All the alternative unicode handlers are descendants of utf8_handler, so we always need the pure Ruby fallback.

Manfred

Yes, we can lazy load the database, I think that's the best solution...

Awesome, care to make a patch?

The _proc implementation wouldn't need the UCD if all the unicode
operations we need were implemented in utf8_proc, but unfortunately
downcase is broken in utf8_proc at the moment. All the alternative
unicode handlers are descendants of utf8_handler, so we always need
the pure Ruby fallback.

Either way, if KCODE is set to something other than 'u', then we
should probably ensure that people don't have to load the big
codepoints table. That way people in memory-constrained environments
can still run.

Then if someone really cares about memory usage and String#chars, they
could improve one of the C-based handlers to completely replace the
pure-Ruby fallback.

Yes, we can lazy load the database, I think that's the best solution...

Awesome, care to make a patch?

Sure. I'll get one done before the end of the weekend.

The _proc implementation wouldn't need the UCD if all the unicode
operations we need were implemented in utf8_proc, but unfortunately
downcase is broken in utf8_proc at the moment. All the alternative
unicode handlers are descendants of utf8_handler, so we always need
the pure Ruby fallback.

Either way, if KCODE is set to something other than 'u', then we
should probably ensure that people don't have to load the big
codepoints table. That way people in memory-constrained environments
can still run.

I agree. I didn't notice it took that much memory. I never expected that a marshaled file of 600KB could grow to more than 15MB in Ruby.
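
A rough way to see that kind of blow-up for yourself, using a
synthetic table rather than the real unicode data (RSS is read via ps,
so this assumes a Unix-like system; numbers will vary):

    # Dump a nested structure, then compare the file size on disk with
    # how much the process grows after loading it back. The in-memory
    # structure is typically many times the marshaled size.
    table = {}
    50_000.times { |i| table[i] = [i.to_s * 8, i, [i, i + 1]] }
    File.open('table.dump', 'wb') { |f| Marshal.dump(table, f) }

    rss = lambda { `ps -o rss= -p #{Process.pid}`.to_i }  # KB
    before = rss.call
    loaded = File.open('table.dump', 'rb') { |f| Marshal.load(f) }
    puts "dump file: #{File.size('table.dump') / 1024}KB"
    puts "RSS growth after load: #{rss.call - before}KB (#{loaded.size} entries)"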

Then if someone really cares about memory usage and String#chars, they
could improve one of the C-based handlers to completely replace the
pure-Ruby fallback.

That was also on the agenda, but I have to admit I don't know if I'll have time for that anytime this year (:

Manfred

Hi all,

I can't take a look at the multibyte support right now, but could this
http://redhanded.hobix.com/inspect/theMarshWalker.html be used to
further reduce the memory requirements by lazy loading the codepoints
themselves, even at the cost of increased load times for the first
pages?

For people needing a bit of international language support on shared
hosting this could be a useful compromise (if at all possible).

jean

Thanks for the interesting suggestion, but loading all the codepoints separately would bring the unicode operations to a slow and grinding halt. We _can_ load the unicode database in parts. I will investigate some solutions.

Manfred
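
For what it's worth, "loading the database in parts" could look
roughly like this. The table names, one-file-per-table layout, and
load_table helper are assumptions for illustration, not the actual
implementation:

    # Give every unicode table its own memoized reader so only the
    # tables an operation actually touches get unmarshaled, instead of
    # loading everything up front.
    class UnicodeDatabase
      TABLE_NAMES = [:codepoints, :composition_map, :boundary]  # assumed

      TABLE_NAMES.each do |name|
        define_method(name) do
          @tables ||= {}
          @tables[name] ||= load_table(name)
        end
      end

      private

      def load_table(name)
        File.open("#{name}.dump", 'rb') { |f| Marshal.load(f) }
      end
    end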

Sure. I'll get one done before the end of the weekend.

Excellent. Thanks for your help, guys. I'm sure everyone running on
shared hosts will thank you as well.

Tom Davies

http://atomgiant.com
http://gifthat.com

Sure. I'll get one done before the end of the weekend.

http://dev.rubyonrails.org/changeset/5476

AS tests all pass. All good? I'll merge this to stable if it is.

All the tests run for me too and I don't see any problems with this solution. Thanks, you just freed up my weekend (:

Manfred