NewRelic causing OutOfMemoryError in Play Framework app

Today I disabled the NewRelic JVM agent on one of my projects. While the Play Framework server was streaming a ZipOutputStream to a client, the NewRelic agent would for some reason gather massive amounts of data, causing the JVM to garbage collect continuously until the app became unresponsive and finally crashed:

Uncaught error from thread [play-akka.actor.default-dispatcher-33] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[play]
java.lang.OutOfMemoryError: GC overhead limit exceeded
Uncaught error from thread [play-scheduler-1] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[play]
java.lang.OutOfMemoryError: GC overhead limit exceeded
[ERROR] [02/27/2015 14:45:10.002] [application-scheduler-1] [ActorSystem(application)] exception on LARS’ timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded

Since September I had been using NewRelic’s v3.10.0 agent. The specific zip streaming that was causing the issue was a feature supposed to be used starting February 27, but of course the feature was tested before. Both locally and in production the feature seemed to work for smaller numbers of files. In production, however, the typical number of files in the zip would be more than 1000, each of them ranging from several KB to 0.5MB. As soon as we discovered the issues we started delving into what could have caused the symptoms: a server that would not handle any more requests, using 100% CPU and its maximum allowed memory size (-Xmx1024m). We refactored the complete logic responsible for serving the zips, but to no avail. Locally the new method seemed better: the zipping would no longer continue to use resources after the request was closed prematurely. We also wrote a test that would simulate zipping random files; this also worked, locally.

Locally however, no NewRelic was installed. How could a service responsible for showing problems ever be the cause of the problems, we thought.

Once February 27 had passed, the zip streaming was enabled for our users. We saw an immediate increase in downtime: the server would hang, we would ironically get an e-mail from NewRelic, and we would restart the server. Of course the logs directed us to the culprit: the initiation of a zip stream was always the last action before the downtime.

This weekend I decided to investigate, and learned some new tricks. Having never made a heap dump before, this felt quite tricky at first, but in the end it was very easy:

> ssh server "ps x | grep play"
> ssh server "sudo -u play jmap -dump:file=/dump.hprof <processid>"
> scp server:/dump.hprof .
> # Open the dump with Eclipse Memory Analyzer

Eclipse Memory Analyzer is a free tool that is very easy to use. It starts by importing your dump file (which is way faster in Run in Background mode!) and then shows really helpful statistics:

After seeing this analysis I was baffled: how could this be? So I searched and found this thread on the New Relic forum from October 2014. More people had this issue! In December they released version 3.12.1 which has the following release notes:

Improvements

  • Play 2 async activity is no longer tracked when transaction is ignored.
  • Reduced GC overhead when monitoring Play 2 applications.
  • Reduced memory usage when inspecting slowest SQL statements

[..]

… they said. So I updated to a new version of the agent. It did not work. The server would still hang because of the many Transactions stored in the queue. New statistics:

Too bad that the update did not fix the issue. I hope that NewRelic can fix it in the future, as I really liked the assurance of getting mails when the server is down, as well as being able to drill down into performance issues. By the way: besides the issue still being there, the app’s performance also decreased when using the updated agent:

[Figure: web request performance before and after updating from New Relic agent 3.10 to 3.14, showing the performance decrease]

 

[OS X] High process numbers

Ever experienced high process numbers (20k+) in Finder? That’s probably because some process keeps respawning over and over. Some launch agents fail to start and launchd keeps trying to start them. This fills your hard disk with logs and keeps it busy writing, preventing it from spinning down.

12-07-12 11:11:25,732 com.apple.launchd: (org.postgresql.postgres[66258]) Exited with code: 1
12-07-12 11:11:25,732 com.apple.launchd: (org.postgresql.postgres) Throttling respawn: Will start in 10 seconds
12-07-12 11:11:34,034 com.apple.launchd: (com.edb.launchd.postgresql-9.1[66265]) getpwnam("postgres") failed
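To see which jobs are respawning most often, the "Throttling respawn" lines can be counted per job label. The following is only a rough sketch: the function name respawn_counts is made up for illustration, and it assumes log lines in the same format as the excerpt above.

```shell
# Count "Throttling respawn" messages per launchd job label.
# Reads log lines on stdin and prints "<count> <label>", most frequent first.
# respawn_counts is a hypothetical helper name, not an existing tool.
respawn_counts() {
    grep 'Throttling respawn' \
        | sed -E 's/.*\(([^)[]+)(\[[0-9]+\])?\).*/\1/' \
        | sort | uniq -c | sort -rn \
        | awk '{print $1, $2}'
}
```

Feed it whatever file these launchd messages end up in on your system, for example `respawn_counts < /var/log/system.log` (that path is an assumption and may differ per OS X version).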

You should probably fix the error that prevents the agent or daemon from starting, but that depends on the kind of agent. In my case I didn’t need the agents that were spawning. I needed neither a PostgreSQL server nor the Wiki server that OS X provides, which comes with all kinds of collab* processes. Please proceed only if you don’t need the agent you are going to remove, permanently!

So, how do you stop these annoying little agents? First determine the process name. Then look up the matching job in launchctl:

$ sudo launchctl list | grep annoyingAgent

You can unload/remove this job from launchd by passing its label to launchctl:

$ sudo launchctl remove 'com.your.annoying.agent'

Please check now that your computer is still functioning. Do this first, as you can still easily go back at this point: run ‘launchctl load -w’ with the path to the agent’s plist file to re-add the agent.

When you reboot your machine the processes sometimes come back. To permanently disable them, run:

$ locate 'com.your.annoying.agent.plist' | while IFS= read -r line; do sudo mv "$line" "$line.disabled"; done
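The one-liner above can also be wrapped in a small shell function that copes with spaces in paths and skips entries that no longer exist (the locate database can be stale). This is only a sketch: the name disable_plists is made up here, and falling back to sudo only when the directory is not writable is my own assumption, not part of the recipe above.

```shell
# Rename every path read from stdin to "<path>.disabled".
# disable_plists is a hypothetical helper name.
disable_plists() {
    while IFS= read -r plist; do
        # Skip stale entries that are no longer regular files.
        [ -f "$plist" ] || continue
        if [ -w "$(dirname "$plist")" ]; then
            mv -- "$plist" "$plist.disabled"
        else
            # System directories usually need root to rename files.
            sudo mv -- "$plist" "$plist.disabled"
        fi
    done
}
```

Usage would then be: locate 'com.your.annoying.agent.plist' | disable_plists. Renaming the file back (dropping the .disabled suffix) undoes the change.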

Control your home server

Last year I published my Ultimate Single Sign On tool for setting up a perfect Home Server. The work on this project is not yet done and I can use all the help you guys can offer me. Setup is easy: just use VirtualBox, Ubuntu Server and install after a git clone of git://github.com/hermanbanken/Ultimate-Single-Sign-On-Enviroment-Installer.git.

When you have this (or your own working LDAP server) up and running, your users can log in on Mac OS X and Ubuntu and can use their home directories and stuff. But they couldn’t change their credentials and user info, until now.

LDAP Control is a control centre for your Home Server where your users can manage their data (feature #2) but can also start and plan backups (feature #1), browse your family’s Address Book (feature #3), manage your favorite series and movies through the media centre (feature #4), and much more (when you guys help me).

iPad 2/JailbreakMe 3.0 FAQ

Since there is a lot of confusion out there, and since I’m repeating myself all the time (which I do not really like), I made this little write up of questions that are continuously being asked (my personal FAQ). Please note that this is a global explanation. Don’t try to argue with me on specific details.
This FAQ has been written by @veeence and is being powered by @hermanbanken.


New Author, introducing: Sean

In the last few months this blog started to generate a lot more traffic than before. Mostly via the tweets from my fellow student (@veeence), but my site also gained a few returning visitors interested in the Mac server scene. As these folks started to test the scripts I posted here, they found some bugs and posted some comments. One of them is Sean, and today I’m announcing Sean as a new writer for this site.

Sean will mostly write about Mac server imitation implementations, complete with Single Sign On and Windows support. But of course he can write about whatever he wants, as long as it’s a little bit relevant for you guys!

So stay tuned and if you like the blog, subscribe to the RSS feed!

WhatsApp on Mac or PC

Update 13/01/13: run WhatsApp on your pc/mac without a phone/VNC!

WhatsApp is a fairly new app for multiple mobile devices. Thanks to the App Store, Android Market, etc. it grew huge in just a few months, especially since people started to use it as a replacement for SMS chats and because your identity is based on your mobile number.

I think WhatsApp as a service has one disadvantage: it only works on mobile devices, although that might be the reason it is so popular. At the moment it’s a replacement for SMS, whereas services that offer similar features, like Google Talk, work on multiple devices and so are not thought of as some kind of replacement for SMS. The nice thing is that everyone who has WhatsApp has a smartphone and therefore can also download Google Talk apps should WhatsApp become less popular.

The nice thing about a full-size qwerty keyboard is that you can type way faster and therefore communicate faster. The other day I was having a conversation on WhatsApp and messages were being sent almost faster than they could be read. In order to keep up with the chat I really needed a full-size keyboard. Then I came up with the following idea: install a VNC server on my iPhone and connect to it from my Mac!

So here is the manual to do so:

  1. Jailbreak iPhone
  2. Install Veency
  3. Install a VNC client on your Mac, like “Chicken of VNC” or “JollysFastVNC”.
  4. Connect to your iPhone

Until somebody decides to write an app that can use WhatsApp as a server – I might get that idea in my head and end up writing it myself, so stay tuned – we’ll have to use this epic work-around or continue using our phone’s keyboard.