Unicorn memory bloat, unsure how to gather useful data to improve

Hi all,

We use ruby 2.0.0 and rails 3.2.19 in production, with OOBGC. We are having problems understanding why our unicorn workers grow from around 400MB to 800-900MB within a few hours of a deploy. We use New Relic and memory_tracker, but neither has helped us pinpoint the cause.

We see that every so often, the memory_tracker log outputs lines with *** such as:

02-03 06:34:38 pid:17885 rss=320.28 vsize=649.02 *** {:total_allocated_object=>169444, :count=>0, :rss=>12.775423999999987, :heap_used=>2597, :in_use=>866139} /iphone_data/get_article

We aren’t quite sure when memory_tracker chooses to print these lines, but they seem important; our working assumption is that they flag the most problematic actions, the ones that grow the heap the most. Here’s an example of processing the log, counting how many times each action triggers these *** lines over a day:

$ zgrep total_allocated_object log/memtracker_urls.log-20150203.gz|awk '{print $12}'|sed -r 's/[0-9]+(-[^/]+)?/:id/g'|sort|uniq -c|sort -n|tail -n20

 20 /webapp/article/:id
 22 /iphone_data/notification
 24 /users/ajax_get_user_details_card
 26 /articles/get_ajax_alternatives_by_symbol
 28 /account/portfolio/holdings
 31 /iphone_data/sync_portfolios
 32 /account/ajax_authors_portfolio
 34 /iphone_data/get_bookmarks
 35 /news/:id
 46 /user_comments_tracking/ajax_create_notification_cookies
 47 /memcached:id/article_bottom/:id
 57 /etf_hub
 63 /account/portfolio/day_watch
 96 /account/portfolio/news
106 /account/no_cookie_key
110 /api/v:id/notification_inbox/get_last_notifications
147 /article/:id
171 /instablog/:id/:id
419 /iphone_data/get_data

2519 /iphone_data/get_article

It is worth pointing out that across our sample of problematic (three-star) actions, the ratio of (times the action grew the heap and printed ***) to (times the action was called) is roughly the same. The actions get_article and get_data simply get called far more often than the other actions.
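Since memory_tracker's trigger condition is opaque, one way to gather comparable per-request numbers yourself is to diff GC.stat around each request. A minimal sketch (the module and label names are made up; in Rails 3.2 this could be wired up as an around_filter in ApplicationController):

```ruby
# Sketch: count objects allocated while a block runs by diffing GC.stat.
# The counter key is :total_allocated_object on ruby 2.0 and was renamed
# to :total_allocated_objects in 2.1, hence the fallback.
module AllocationLogger
  ALLOC_KEY =
    GC.stat.key?(:total_allocated_objects) ? :total_allocated_objects
                                           : :total_allocated_object

  def self.measure(label)
    before = GC.stat[ALLOC_KEY]
    result = yield
    warn "#{label}: allocated=#{GC.stat[ALLOC_KEY] - before}"
    result
  end
end

# In Rails 3.2 this could sit in an around_filter, e.g.:
#   around_filter { |c, action| AllocationLogger.measure(c.request.path, &action) }
AllocationLogger.measure("/iphone_data/get_article") do
  10_000.times.map { |i| "article-#{i}" }   # stand-in for the real action
end
```

Logging the delta per request and aggregating by path, like your zgrep pipeline does, would give allocation numbers that don't depend on memory_tracker's *** threshold.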

Are there any practical ways to better understand what makes these actions grow the heap so much, apart from staring at the code? We feel we lack the data / visibility needed to make improvements. Please let me know if this is too vague, and whether more details would help the list provide better responses.


I had to track down an issue like this recently and found rack-mini-profiler very useful. As well as the sorts of numbers you already have, it can print how many objects of each class were allocated, per request and in total, which sometimes helps pin down what is being leaked.