Scaling Node.js and MongoDB on Joyent Cloud
Here at Clock, we recently deployed a large Node.js and MongoDB application, designed for horizontal scalability, to Joyent's cloud platform for a client. This document gives a brief overview of the configuration and our observations from the setup.
We are not explaining how to design a scalable application; it's assumed that any Node instance can correctly handle any request at any time, and therefore sessions (or any other shared data) are available to all of them. We store the session data in the MongoDB replica set to achieve this.
All deployments are on SmartMachine appliances at the same data centre. These are instances of SmartOS, which is descended from OpenSolaris (via the illumos fork). As a Linux house, we found getting used to the Solaris commands somewhat frustrating, but resources such as the Rosetta Stone for Unix proved useful.
Within Joyent's cloud, there is a possibility that other clients will have access to the private interfaces of your instances. Therefore it's critical that they are correctly firewalled, and this should of course be done first. Joyent provide documentation for configuring the firewall. On each server, after initial setup of the ipfilter service, we configured /etc/ipf/ipf.conf as follows:
One line per server in the cluster, like this:
pass in quick from <remote private IP> to <this machine's private IP>
A final line to block all other IPs:
block in from any to <this machine's private IP>
Then a final command to enable the new rules:
svcadm restart ipfilter
This is clearly a poor (long-winded, error-prone, etc.) way to configure firewall rules, and we are actively looking into ways to manage them better.
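One way the rules might be made less error-prone, sketched here purely as an illustration and not as something we run in production, is to generate them from a single list of the cluster's private IPs. The function name buildIpfRules and the addresses used are hypothetical.

```javascript
// Sketch: build the rule lines for /etc/ipf/ipf.conf from a list of private IPs.
// `buildIpfRules`, `localIp` and `peerIps` are hypothetical names for illustration.
function buildIpfRules(localIp, peerIps) {
  var rules = peerIps
    // A machine does not need a rule for its own address.
    .filter(function (ip) { return ip !== localIp; })
    .map(function (ip) {
      return 'pass in quick from ' + ip + ' to ' + localIp;
    });
  // Final catch-all: block everything else aimed at the private interface.
  rules.push('block in from any to ' + localIp);
  return rules.join('\n') + '\n';
}

// Example with made-up addresses: the same list can be reused on every server.
var cluster = ['10.0.0.1', '10.0.0.2', '10.0.0.3'];
console.log(buildIpfRules('10.0.0.1', cluster));
```

The output would still need to be written to /etc/ipf/ipf.conf on each machine and applied with "svcadm restart ipfilter", as described above.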
Node.js deployment
The application is deployed and hosted on instances of the "Node.js SmartMachine", version 1.3.3. This comes with git-deploy and Node.js v0.6.8. Once set up, deployment boils down to a simple "git push". Their git hooks will then clone the repo into a timestamped directory under /home/node/node-service/releases/, compile any npm modules (if they haven't been compiled before) and start the application. If anything fails, it will roll back to the previous deployment. We've found the process to be effective and straightforward.
MongoDB configuration
We set up multiple instances of the "MongoDB SmartMachine Database Appliance", version 1.0.6. These are SmartOS instances which come preconfigured with MongoDB 2.0.1. MongoDB is, of course, delightfully simple to configure in a replica set. There is good documentation, but this is an overview of the observations and changes we made to the Joyent appliances:
- The whole of MongoDB (including config) is installed in /mongodb.
- The database is configured to start automatically and run as the “mongodb” user.
- /mongodb/mongodb.conf was edited to add or amend these lines:
bind_ip = 127.0.0.1,<this machine's private IP>
replSet = <your set name>
rest = true
- Restart the service with "sudo svcadm restart mongodb".
MongoDB arbiter
A minimal MongoDB replica set ideally needs 3 servers (so that a majority can still elect a new primary if one fails). If you start with only 2 Mongo appliances, the 3rd can be configured as an arbiter on one of the web nodes. Joyent provide simple instructions to install MongoDB on a Node.js SmartMachine. However, presently, the install script leaves you with MongoDB running as root (!). To avoid that, add a new user, and edit the SMF manifest which is embedded in mongodbnode.sh before running it. To enable the arbiter to be part of the replica set, we changed the start method in the SMF manifest within mongodbnode.sh to the following:
<exec_method name='start' type='method' exec='/root/local/bin/mongod --bind_ip=127.0.0.1,<this machine's private IP> --replSet=<your set name> --journal --dbpath /root/local/var/mongodb &' timeout_seconds='60'/>
It's worth noting that the script strangely installs Mongo in /root/local. To run the shell you need to specify the full path: /root/local/bin/mongo.
Now the cluster is at a stage where the replica set can be initialised by following 10gen's documentation.
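As a rough illustration of what that initialisation involves, the sketch below builds the kind of configuration document that is passed to rs.initiate() in the mongo shell: two data-bearing members plus the arbiter. The helper name, the set name, the addresses and the use of the default port 27017 are all assumptions made for the example.

```javascript
// Sketch: build a replica set configuration document.
// `buildReplSetConfig` is a hypothetical helper; hosts are "ip:port" strings.
function buildReplSetConfig(setName, dataHosts, arbiterHost) {
  // Each member needs a unique numeric _id and a host address.
  var members = dataHosts.map(function (host, i) {
    return { _id: i, host: host };
  });
  // The arbiter votes in elections but holds no data.
  if (arbiterHost) {
    members.push({ _id: members.length, host: arbiterHost, arbiterOnly: true });
  }
  return { _id: setName, members: members };
}

// Example with made-up addresses: two Mongo appliances and an arbiter on a web node.
var config = buildReplSetConfig(
  'exampleSet',
  ['10.0.0.11:27017', '10.0.0.12:27017'],
  '10.0.0.1:27017'
);
console.log(JSON.stringify(config, null, 2));
```

A document of this shape would then be passed to rs.initiate() from the mongo shell on one of the data nodes; the set name must match the replSet value configured on every member.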
Load-balancing the Node.js instances
Node.js code changes
We use Christian Kvalheim's MongoDB Native Node.js Driver, which already has replica set support. From the documentation you can see that it's very simple to use, so switching from a single Mongo instance is trivial. Be sure to enable the "read_secondary" option to take advantage of reading from all servers. (This option is referred to as slaveOk by Mongo itself.)
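A minimal sketch of such a connection follows. It assumes the driver API that was current at the time of writing (Server, ReplSetServers and Db, with the rs_name and read_secondary options); later driver releases changed this interface. The helper name, host list, set name and database name are hypothetical, and the driver module is passed in as an argument only so that the sketch does not depend on a particular installed version.

```javascript
// Sketch only: `mongo` is expected to be the object returned by require('mongodb')
// for a driver release that still exposes Server, ReplSetServers and Db.
function openReplicaSetDb(mongo, hosts, setName, dbName, callback) {
  // One Server entry per replica set member the driver should try.
  var servers = hosts.map(function (h) {
    return new mongo.Server(h.host, h.port, { auto_reconnect: true });
  });
  var replSet = new mongo.ReplSetServers(servers, {
    rs_name: setName,      // must match the replSet name configured in mongodb.conf
    read_secondary: true   // allow reads from secondaries (slaveOk in Mongo's terms)
  });
  var db = new mongo.Db(dbName, replSet, {});
  db.open(callback);
  return db;
}

// Hypothetical usage (requires the driver to be installed):
// openReplicaSetDb(require('mongodb'),
//   [{ host: '10.0.0.11', port: 27017 }, { host: '10.0.0.12', port: 27017 }],
//   'exampleSet', 'exampleDb',
//   function (err, db) { if (err) throw err; /* use db here */ });
```

Bear in mind that reads served by a secondary can lag slightly behind the primary, so this option suits data that tolerates brief staleness.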
For coders, it's useful to develop against a test replica set. To aid this, we wrote mongo-replset-test, a simple bash script to create throw-away replica sets on the local machine.
As you can see, on the whole we've found the configuration of this site on Joyent's SmartMachines fairly easy and straightforward. In terms of fail-over, failure of either MongoDB or Node.js instances should not impact the running of the cluster. However, it's worth understanding where your servers are physically located. When you create a SmartMachine, Joyent's algorithm will try to locate the new instance in a cabinet not shared by your other servers. If it can't, then it will try to ensure that it's on a different server. If that still can't be achieved, you could find two of your instances running on the same physical server. That last case is unlikely but clearly undesirable in terms of fail-over. The only way to verify the location of your instances is to raise a support ticket, and a Joyent technician will confirm it for you.
In regard to expanding the site's capacity, we can now horizontally scale the web nodes by adding more instances. It's a very quick process, but the firewalling setup is an ongoing issue which we need to address.
The MongoDB replica set can also be scaled easily in terms of database reads. All writes must go through a single server (which is dynamically elected by the cluster, and automatically recognised by the Node.js driver). If this is a limitation, a sharding configuration is required to scale further.