Hi all,
We use ruby 2.0.0 and rails 3.2.19 in production, with OOBGC. We are having problems understanding why our unicorn workers grow from around 400MB to 8-900MB in a number of hours after a deploy. We use New Relic and memory_tracker, but
We see that every so often, the memory_tracker log outputs lines with *** such as:
02-03 06:34:38 pid:17885 rss=320.28 vsize=649.02 *** {:total_allocated_object=>169444, :count=>0, :rss=>12.775423999999987, :heap_used=>2597, :in_use=>866139} /iphone_data/get_article
We aren’t quite sure when memory_tracker chooses to print these lines, but they seem important. I think we can define them as showing the more problematic actions, that grow the heap the most. Here’s an example output of processing the log, showing the number of times a given action triggers these *** lines over a day:
$ zgrep total_allocated_object log/memtracker_urls.log-20150203.gz|awk ‘{print $12}’|sed -r ‘s/[0-9]+(-[^/]+)?/:id/g’|sort|uniq -c|sort -n|tail -n20
20 /webapp/article/:id
22 /iphone_data/notification
24 /users/ajax_get_user_details_card
26 /articles/get_ajax_alternatives_by_symbol
28 /account/portfolio/holdings
31 /iphone_data/sync_portfolios
32 /account/ajax_authors_portfolio
34 /iphone_data/get_bookmarks
35 /news/:id
46 /user_comments_tracking/ajax_create_notification_cookies
47 /memcached:id/article_bottom/:id
57 /etf_hub
63 /account/portfolio/day_watch
96 /account/portfolio/news
106 /account/no_cookie_key
110 /api/v:id/notification_inbox/get_last_notifications
147 /article/:id
171 /instablog/:id/:id
419 /iphone_data/get_data
2519 /iphone_data/get_article
It is worth pointing out that in our sample of problematic (three-star) actions, the ratio (number of times action grows heap and prints *** to memory_tracker / number of times action called) is not much different. The actions get_article and get_data simply get called far more than other actions.
Are there any practical solutions to better-understanding what makes these actions grow the heap so much, apart from staring at the code? We feel we lack data / visibility in order to make improvements. Please let me know if this is too vague, and if more details can help provide better responses from the list.
Antony