Socket hang up crashes fixed with node.js domains

@tomgco and I were hacking late on a new Clock node.js project. The caffeine-fuelled @tomgco loves pounding the browser's refresh button like a freaking machine gun, and then I heard “Oh, my web server has crashed!” Developers pummel refresh, it's a fact of life, but it doesn't normally crash the HTTP server. Earlier that day we'd upgraded node to 0.8.20, so it didn't take long to turn our attention to the changelog, and then to a tweet that Tom had spotted.

https://twitter.com/nodejs/status/303893363877363712

‘No more leaking memory’: this killer line fills me with mixed emotions. Memory leaks have been our worst enemy since we switched to node.js, sneaking up on us, killing our services at peak times and keeping me up at night reading the dtrace manual. Naturally I’m ecstatic to find there will be fewer of them, but at the same time, ALL MY NODE APPS ARE LEAKING MEMORY and the fix requires a code change. Dang!

After some googling and testing, we confirmed that the following fix in 0.8.20 was causing our development web server to crash:

  http: Raise hangup error on destroyed socket write (isaacs)

Here is the original commit:

https://github.com/isaacs/node/commit/e261156e7386e3d870543bee4218c7f106bfcf22

The pull request bringing it down into the stable branch: https://github.com/joyent/node/pull/4775

Issues from other projects were already coming in: https://github.com/ether/etherpad-lite/issues/1541 and https://github.com/LearnBoost/socket.io/issues/1160

In case you missed it, this isn’t going to be fixed properly in 0.8:

“The proper fix is to treat ECONNRESET correctly. However, this is a behavior/semantics change, and cannot land in a stable branch. So, the full-of-sad bandaid fix is to not put data into the output buffer if the socket is destroyed, and also remove anything that is in the output buffer when the HTTP request sees that it closes.” - isaacs
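The bandaid isaacs describes lives inside node's http module. As a rough application-level analogue, a handler can check whether the client is still connected before writing. The sketch below is a hypothetical helper, `endIfConnected`, not part of node; it assumes the response exposes its socket as `res.socket` (or the older alias `res.connection`) with a boolean `destroyed` flag, as node's `net.Socket` does.

```javascript
// Hypothetical helper (not a node API): end the response only if the
// client's socket is still open, so nothing is written or buffered for a
// client that has already hung up.
// Assumes res.socket (or the older alias res.connection) has a boolean
// `destroyed` flag, as net.Socket does.
function endIfConnected(res, body) {
  var socket = res.socket || res.connection
  if (!socket || socket.destroyed) {
    return false // client already gone: skip the write entirely
  }
  res.end(body)
  return true
}
```

A client can disconnect without node having noticed yet, so this check narrows the window rather than replacing an `'error'` handler.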

We just needed a ‘bandaid’ for our 0.8 apps, and I was actually glad to have a good reason to retrofit domains around them.

The Problem

Below is a simple web server that waits 5 seconds before responding. In 0.8.20 it crashes if the client hangs up before the response is sent.

var http = require('http')

http.createServer(function (req, res) {

  // Wait 5 seconds before responding
  setTimeout(function () {
    res.writeHead(200, {'Content-Type': 'text/plain'})
    res.end('Hello World\n')
  }, 5000)

}).listen(1337, '127.0.0.1')

// Log resident memory every 2 seconds to watch for leaks
setInterval(function () {
  console.log(process.memoryUsage().rss)
}, 2000)

console.log('Server running at http://127.0.0.1:1337/')

With a node version before 0.8.20, run the server and then:

  curl http://127.0.0.1:1337/ & sleep 2 && killall curl

This kills the connection after 2 seconds. You won't see any errors from the server; instead you get a memory leak.

Now switch to 0.8.20 (we use nave to switch node versions quickly):

  nave use 0.8.20

Run the server, then run the same curl one-liner:

  curl http://127.0.0.1:1337/ & sleep 2 && killall curl

This time the server throws an error and dies:

timers.js:103
            if (!process.listeners('uncaughtException').length) throw e;
                                                                      ^
Error: socket hang up
    at createHangUpError (http.js:1360:15)
    at ServerResponse.OutgoingMessage._writeRaw (http.js:507:26)
    at ServerResponse.OutgoingMessage._send (http.js:476:15)
    at ServerResponse.OutgoingMessage.write (http.js:740:18)
    at ServerResponse.OutgoingMessage.end (http.js:882:16)
    at Object._onTimeout (/socket-hangup/server.js:8:9)
    at Timer.list.ontimeout (timers.js:101:19)
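The trace shows the hang-up error being created inside `res.end()` and then rethrown from the timer callback. The simplified model below (a hypothetical `createModelResponse`, not node's real `http.js`) sketches the mechanism: writing after the client has gone emits an `'error'` event, and an `EventEmitter` throws any `'error'` event that has no listener.

```javascript
var EventEmitter = require('events').EventEmitter

// Simplified model of a 0.8.20 ServerResponse; not node's real code.
// A write after the client has gone emits an 'error' event.
function createModelResponse() {
  var res = new EventEmitter()
  res.socketDestroyed = false // set to true when the client hangs up
  res.written = []
  res.end = function (chunk) {
    if (res.socketDestroyed) {
      // Message and code modelled on node's "socket hang up" error
      var error = new Error('socket hang up')
      error.code = 'ECONNRESET'
      res.emit('error', error) // EventEmitter throws this if nobody listens
      return false
    }
    res.written.push(chunk)
    return true
  }
  return res
}

// No 'error' listener: the write throws. In the real server nothing
// catches it, so the process dies as in the trace above.
var unguarded = createModelResponse()
unguarded.socketDestroyed = true
var crash = null
try {
  unguarded.end('Hello World\n')
} catch (err) {
  crash = err
}

// With an 'error' listener attached, the same write is reported, not thrown
var guarded = createModelResponse()
var reported = null
guarded.on('error', function (err) {
  reported = err
})
guarded.socketDestroyed = true
guarded.end('Hello World\n')
```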

Our Solution

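Wrap each request and response in its own domain. An 'error' emitted by either object is then passed to that domain's error handler instead of being thrown and crashing the whole server.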

var http = require('http')
  , domain = require('domain')
  , serverDomain = domain.create()

// Domain for the server
serverDomain.run(function () {

  http.createServer(function (req, res) {

    var reqd = domain.create()
    reqd.add(req)
    reqd.add(res)

    // On error dispose of the domain
    reqd.on('error', function (error) {
      console.error('Error', error, req.url)
      reqd.dispose()
    })

    // Wait 5 seconds before responding
    setTimeout(function () {
      res.writeHead(200, {'Content-Type': 'text/plain'})
      res.end('Hello World\n')
    }, 5000)

  }).listen(1337, '127.0.0.1')

})


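// Log resident memory every 2 seconds; when node is started with
// --expose-gc, also force a garbage collection so real leaks stand out
setInterval(function () {
  console.log(process.memoryUsage().rss)
  if (typeof gc === 'function') {
    gc()
  }
}, 2000)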

console.log('Server running at http://127.0.0.1:1337/')
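The server relies on one property of domains: once an emitter has been passed to `domain.add()`, an `'error'` event it emits with no listener of its own goes to the domain's `'error'` listener rather than being thrown. This standalone sketch shows that with a plain `EventEmitter` standing in for `res`; it runs on current node versions, where the `domain` module still exists but is deprecated.

```javascript
var domain = require('domain')
var EventEmitter = require('events').EventEmitter

var reqd = domain.create()
var handled = []

// The domain's error listener, like reqd.on('error', ...) in the server above
reqd.on('error', function (error) {
  handled.push(error)
})

// Stand-in for `res`: any EventEmitter can be added to a domain
var res = new EventEmitter()
reqd.add(res)

var hangUp = new Error('socket hang up')
hangUp.code = 'ECONNRESET'

// `res` has no 'error' listener of its own, yet this does not throw:
// the error is routed to the domain it was added to
res.emit('error', hangUp)
```

node also sets `error.domain` and `error.domainEmitter` on the routed error, which can help when logging which request failed.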

Express

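If you are using Express 3, you can apply the same fix by creating the HTTP server yourself and passing each request to the Express app from inside its domain: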

var http = require('http')
  , domain = require('domain')
  , serverDomain = domain.create()
  , express = require('express')
  , app = express()

app.get('/', function (req, res) {

  // Wait 5 seconds before responding
  setTimeout(function () {
    res.send('Hello World')
  }, 5000)

})

// Domain for the server
serverDomain.run(function () {

  http.createServer(function (req, res) {

    var reqd = domain.create()
    reqd.add(req)
    reqd.add(res)

    // On error dispose of the domain
    reqd.on('error', function (error) {
      console.error('Error', error.code, error.message, req.url)
      reqd.dispose()
    })

    // Pass the request to express
    app(req, res)

  }).listen(1337, '127.0.0.1')

})

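// Log resident memory every 2 seconds; when node is started with
// --expose-gc, also force a garbage collection so real leaks stand out
setInterval(function () {
  console.log(process.memoryUsage().rss)
  if (typeof gc === 'function') {
    gc()
  }
}, 2000)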

console.log('Server running at http://127.0.0.1:1337/')
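Both servers repeat the same per-request domain setup. One way to factor it out is a small wrapper; `withRequestDomain` and its `onError` callback are hypothetical names, not part of node or Express. `domain.dispose()`, used in the servers above, was removed from later node releases, so this sketch destroys the response on error instead, assuming it has a `destroy()` method as `http.ServerResponse` does.

```javascript
var domain = require('domain')

// Hypothetical helper: gives each request its own domain, as both servers
// above do, then calls the real handler. `onError` is an optional callback
// for logging, e.g. function (error, req) { console.error(error, req.url) }
function withRequestDomain(handler, onError) {
  return function (req, res) {
    var reqd = domain.create()
    reqd.add(req)
    reqd.add(res)

    reqd.on('error', function (error) {
      if (onError) {
        onError(error, req)
      }
      // reqd.dispose() is gone in later node releases, so tear down the
      // response directly (assumes res has destroy(), like ServerResponse)
      if (typeof res.destroy === 'function') {
        res.destroy()
      }
    })

    handler(req, res)
  }
}
```

The plain server would then become `http.createServer(withRequestDomain(handleRequest)).listen(1337, '127.0.0.1')`, where `handleRequest` is a hypothetical name for your existing `function (req, res)`. Because an Express 3 app is itself a request handler (the example above calls `app(req, res)`), it can be passed in the same way.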

We’ve not got this into production yet, but this patch looks like it will get us by. If you have a better solution, please let us know.

