Increased memory requirements on 1.2

I just recently upgraded a Rails app of mine to run on edge (the 1-2-pre-release branch) and I noticed my FCGI processes required roughly 6-8 MB more memory after just a couple of requests.

For example, each FCGI on edge would start around 40 MB and rise to ~46 MB after a couple of requests.

I downgraded my app back to 1.1.6 and each FCGI would start at around 33 MB and rise to ~38 MB.

As a result of the increased memory requirements, my app can no longer run on TextDrive using edge, since they have a max resident memory cap of 48 MB which I would quickly hit (and they just kill your FCGI when you do).

Is this increased memory requirement to be expected?

I would be happy to give you more details about my setup if you think it may have an impact on why the memory usage is so high in my case.

Thanks,

Tom Davies

http://atomgiant.com http://gifthat.com

> Is this increased memory requirement to be expected?
>
> I would be happy to give you more details about my setup if you think it may have an impact on why the memory usage is so high in my case.

I've had a sneaking suspicion about this for a while. Even Beast, my 500 LOC forum, starts off with about 40 MB.

First I compared Beast on edge vs Beast on rev 4818 (edge Rails at the time of its first release). They both started at about 22 MB, but Beast on edge jumped to about 37 MB after making a single request to 4 controllers. I then incremented the revision number until I started seeing the same memory increases, and found it started after changeset 5223, the one that introduced ActiveSupport::Multibyte. Anyone else seeing similar results?

I ran my app using Mongrel's -B flag to see if I was leaking memory, and one thing I noticed was that the second-highest entry is related to Multibyte:

ActiveSupport::Multibyte::Handlers::Codepoint, 17721
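For anyone who wants to reproduce this kind of count without Mongrel's debug flag, a rough equivalent using core Ruby's ObjectSpace (no extra dependencies) is:

```ruby
# Rough sketch: count live instances per class with ObjectSpace,
# similar in spirit to what Mongrel's -B flag reports.
# Run inside the process you want to inspect (e.g. script/console).
counts = Hash.new(0)
ObjectSpace.each_object(Object) { |obj| counts[obj.class.to_s] += 1 }

# Print the ten most numerous classes, highest first.
counts.sort_by { |_, n| -n }.first(10).each do |klass, n|
  puts "#{klass}, #{n}"
end
```

If the Codepoint count dominates here too, that points at the unicode tables rather than an app-level leak.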

So, it would seem that multibyte may be the culprit.

Tom

Does redefining KCODE, or making String#chars just return self, make any difference?
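The second experiment suggested above is a one-line monkey patch. Note this bypasses the multibyte proxy entirely, so anything relying on unicode-aware string operations would break while it's in place:

```ruby
# Experiment: make String#chars return the string itself instead of
# the ActiveSupport::Multibyte proxy, taking multibyte out of the
# picture so its memory cost can be measured in isolation.
class String
  def chars
    self
  end
end
```

In a Rails app this would go in an initializer, before any multibyte-aware code runs.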

Nope, the unicode tables are always loaded and frankly take up a lot of space. The reason they're always loaded is that we don't know in advance if someone is going to set KCODE to utf-8.

Manfred

Isn't there some kind of Ruby event that fires when a constant or a variable gets assigned? That way you could do the loading at the moment KCODE gets set. It wouldn't help much for users like me who set it almost all the time, though.

Couldn't we lazy load the UnicodeDatabase instance?

      # UniCode Database
      UCD = UnicodeDatabase.new

That way, instead of chewing up that RAM every time, we just chew it up when using the utf8 handler?

Also, does the utf8_handler_proc implementation need UCD?

Yes, we can lazy load the database, I think that's the best solution...

The _proc implementation wouldn't need the UCD if all the unicode operations we need were implemented in utf8_proc, but unfortunately downcase is broken in utf8_proc at the moment. All the alternative unicode handlers are descendants of utf8_handler, so we always have the pure Ruby fallback.

Manfred
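The lazy-load fix agreed on above could look roughly like this. This is a sketch, not the actual changeset: the accessor name is assumed, and the loader body (which in Rails unmarshals the ~600 KB table file) is stubbed:

```ruby
# Sketch: wrap the unicode database in lazy accessors so the
# marshaled codepoint tables are only read on first use, instead
# of being built eagerly when the constant is defined.
module Handlers
  class UnicodeDatabase
    def codepoints
      @codepoints ||= load_codepoints
    end

    private

    # In Rails this would Marshal.load the codepoint table file;
    # stubbed as an empty hash for the sketch.
    def load_codepoints
      {}
    end
  end

  # Replace the eagerly-built UCD constant with a memoized accessor.
  def self.ucd
    @ucd ||= UnicodeDatabase.new
  end
end
```

Until something calls Handlers.ucd.codepoints, no table data is resident, so apps that never touch the utf8 handler pay nothing.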

> Yes, we can lazy load the database, I think that's the best solution...

Awesome, care to make a patch?

> The _proc implementation wouldn't need the UCD if all the unicode operations we need were implemented in utf8_proc, but unfortunately downcase is broken in utf8_proc at the moment. All the alternative unicode handlers are descendants of utf8_handler, so we always have the pure Ruby fallback.

Either way, if KCODE is set to something other than 'u', then we should probably ensure that people don't have to load the big codepoints table. That way people with memory constrained environments can still run.

Then if someone really cares about memory usage and String#chars, they could improve one of the c-based handlers to completely replace the pure-ruby fallback position.
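The KCODE guard suggested above might be sketched as follows. The handler names here are illustrative, not the actual Rails constants:

```ruby
# Sketch: only trigger loading the big codepoint tables when $KCODE
# actually indicates UTF-8; everything else gets a cheap passthrough.
def unicode_kcode?(kcode)
  # Ruby 1.8 treats any $KCODE value starting with 'u' or 'U' as UTF-8.
  !!(kcode =~ /^u/i)
end

def handler_for(kcode)
  if unicode_kcode?(kcode)
    :utf8_handler    # would pull in the codepoint tables on demand
  else
    :passthru_handler  # hypothetical no-op handler, no table cost
  end
end
```

With a guard like this, apps in memory-constrained environments that leave $KCODE at 'NONE' never pay for the tables at all.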

>> Yes, we can lazy load the database, I think that's the best solution...
>
> Awesome, care to make a patch?

Sure. I'll get one done before the end of the weekend.

>> The _proc implementation wouldn't need the UCD if all the unicode operations we need were implemented in utf8_proc, but unfortunately downcase is broken in utf8_proc at the moment. All the alternative unicode handlers are descendants of utf8_handler, so we always have the pure Ruby fallback.

> Either way, if KCODE is set to something other than 'u', then we should probably ensure that people don't have to load the big codepoints table. That way people with memory constrained environments can still run.

I agree. I didn't notice it took that much memory. I never expected that a marshaled file of 600 KB could grow to more than 15 MB in Ruby.

> Then if someone really cares about memory usage and String#chars, they could improve one of the c-based handlers to completely replace the pure-ruby fallback position.

That was also on the agenda, but I have to admit I don't know if I'll have time for that anytime this year (:

Manfred

Hi all,

I can't take a look at the multibyte support right now, but could this http://redhanded.hobix.com/inspect/theMarshWalker.html be used to further reduce the memory requirements by lazy loading the codepoints themselves, even at the cost of increased load times for the first pages?

For people needing a bit of international language support on shared hosting, this could be a useful compromise (if at all possible).

jean

Thanks for the interesting suggestion, but loading all the codepoints separately would grind the unicode operations to a halt. We _can_ load the unicode database in parts. I will investigate some solutions.

Manfred
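Loading in parts, as suggested above, could hypothetically look like this: partition the table into fixed-size banks and unmarshal a bank only on first lookup. The file layout and names are invented for illustration:

```ruby
# Hypothetical: codepoint attributes split across per-bank marshal
# files, loaded lazily so scripts an app never touches cost nothing.
class BankedCodepointTable
  BANK_SIZE = 0x1000  # codepoints per bank; tunes memory vs. I/O

  def initialize(dir)
    @dir   = dir
    @banks = {}
  end

  # Look up a codepoint, faulting in its bank on first access.
  def [](codepoint)
    bank = codepoint / BANK_SIZE
    @banks[bank] ||= load_bank(bank)
    @banks[bank][codepoint]
  end

  private

  def load_bank(bank)
    path = File.join(@dir, "bank_#{bank.to_s(16)}.marshal")
    File.exist?(path) ? Marshal.load(File.read(path)) : {}
  end
end
```

The bank size trades first-hit latency against resident memory; loading each codepoint individually (as the Marshal-walker link would imply) sits at the slow extreme of that trade-off, which is why banks are a middle ground.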

> Sure. I'll get one done before the end of the weekend.

Excellent. Thanks for your help guys. I am sure everyone running on shared hosts will thank you as well.

Tom Davies

http://atomgiant.com http://gifthat.com

> Sure. I'll get one done before the end of the weekend.

http://dev.rubyonrails.org/changeset/5476

AS tests all pass. All good? I'll merge this to stable if it is.

All the tests run for me too and I don't see any problems with this solution. Thanks, you just freed up my weekend (:

Manfred