We recently made the move off heroku to a more basic box because we ran into a couple of issues. Although I don’t think that these are issues you will face in many other places, it proved to be worth moving.
First, it’s worth noting a bit of context for the application we were building. Although being web-based, we are building a prototype application connecting up a pre-designed user interface to a number of backend systems. We do not own the backend systems, we simply integrate with them. Our goal is to quickly see how difficult some of the implementation of an interface such as one envisaged would be, and also to discover gaps or contradictions in the data structures we get back by working through a real example.
Our First Problem
The first problem we had is a hard timeout, limited to 30 seconds and enforced by the routing layer by Heroku. They document it very well here. As they mention in their document, “the timeout value is not configurable” and is a reasonable constraint on normal web applications that you want to scale. Given our current design and architecture, moving away from a synchronous, stateless application to a model of asynchronous, polling would certainly add more application complexity to our code.
Our Second Problem
The second problem we had was strange behaviour with a java application deployed. When connected up to other external systems, the java application seemed to continue growing in its memory use. Restarts, new redeploys didn’t seem to fix it. We tried this labs feature (log-runtime-metrics) to get more insight but could only see memory go up and up. We tried setting the max heap size to a small amount but that didn’t seem to do anything. Concerned we had a real memory leak, we ran the code locally with the same production settings with a very small heap size and could not replicate it.
Given where we are, we wanted to move fast without worrying about the infrastructure taking more development time, or unnecessary application code complexity. So we simply moved off. Spun up a new virtual box and were up and running in about half a day. It probably took another day and half to move all of our continuous deployment tasks to our new platform too.