MongoDB Performance on ZFS and Linux

Here at Clock we love ZFS, and have been running it in production on our Linux file servers for several years. It provides us with numerous excellent features, such as lightweight snapshotting, incremental send/receive, and transparent compression. With the recent release of Ubuntu 16.04 (Xenial), official ZFS support has arrived, and we are keen to integrate it fully into our next-generation hosting stack.

As a Node.js and MongoDB house, one of our main concerns has been how MongoDB will perform on ZFS on Linux, especially after reading about potential problems other people have faced. There really isn’t much data out there to put our minds at rest.

We decided to set up a method of benchmarking MongoDB on the officially supported EXT4 and XFS filesystems, then compare against ZFS with different options enabled. The idea was that we could figure out how ZFS compares, and whether there are any options we can set that noticeably affect performance.

There are a few caveats to our testing, so we are aware that these results need to be taken with a pinch of salt. They are intended only as an indicator of the relative performance of the filesystems, not a definitive guide to which is best to use.

Setup

The main variable that may affect the results is the hardware we chose to use. We spun up a 4GB Linode instance with four cores and four virtual disks: one for the latest Ubuntu 16.04 image, and one apiece for the filesystems we intended to test, namely EXT4, XFS and ZFS.

The issue with this approach is that the system is running on a virtualised machine with shared hardware, so there may be variations in the performance available to the machine. In an ideal world we would run this on a physical machine with identical disks, but that wasn’t feasible for this investigation.

We decided to use the latest stable version of MongoDB, 3.2.5; ZFS was at version 0.6.5.6, provided by the zfsutils-linux package for Xenial.

To benchmark the performance we investigated a few options, such as YCSB, and even considered writing our own benchmark based on examples of our real-world data and queries. However, we settled on using a Java-based tool, sysbench-mongodb. This made it easy to configure and run consistent, repeatable tests that would push the database to its limits.

Methodology

First the drives were mounted to directories reflecting their filesystems. This made it easy to switch the filesystem that MongoDB was using.

Filesystem  Type  Size  Used  Avail  Use%  Mounted on
/dev/sda    ext4  7.7G  1.8G  5.5G   25%   /
/dev/sdc    ext4   20G   44M   19G    1%   /ext4
/dev/sdd    xfs    20G   33M   20G    1%   /xfs
tank        zfs    19G    0M   19G    0%   /zfs

The drives were set up and formatted with the default options of mkfs.ext4, mkfs.xfs and zpool create. I then wrote a script, which can be found here, to exercise these disks with the sysbench-mongodb utility and log the results. If you want to see the specific commands we used, please have a look at the script.
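For reference, the formatting and mounting steps look roughly like this. This is a sketch, not a copy of our script: the ZFS device name (/dev/sde) is an assumption, since the mount table above only shows the pool name.

```shell
# Format the benchmark disks with default options.
# /dev/sdc and /dev/sdd match the mount table above; /dev/sde is assumed.
mkfs.ext4 /dev/sdc
mkfs.xfs /dev/sdd
zpool create tank /dev/sde

# Mount each filesystem at a directory named after it
mkdir -p /ext4 /xfs
mount /dev/sdc /ext4
mount /dev/sdd /xfs
# ZFS mounts the pool at /tank by default; point it at /zfs instead
zfs set mountpoint=/zfs tank
```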

The script works by destroying and recreating the ZFS volume with the option being tested, then starting a mongod instance using the filesystem's mount point as the dbpath, for example mongod --directoryperdb --dbpath /zfs. We then simply run the sysbench-mongodb script and pull out the results of the run.
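One iteration of that loop looks something like the following. Again this is a sketch under assumptions: the device name and log paths are illustrative, and the name of sysbench-mongodb's run script may differ in your checkout.

```shell
# Recreate the pool with the option under test (device name is assumed)
zpool destroy tank
zpool create -m /zfs tank /dev/sde
zfs set compression=lz4 tank    # or whichever option is being tested

# Start mongod against the freshly created filesystem
mongod --directoryperdb --dbpath /zfs --fork --logpath /var/log/mongod-bench.log

# Run the benchmark and capture its output
(cd sysbench-mongodb && ./run.simple.bash) | tee /tmp/zfs-lz4.log

# Shut mongod down cleanly before the next iteration
mongod --dbpath /zfs --shutdown
```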

The parameters we decided to test for ZFS are listed below:

Defaults (ashift = auto, recordsize = 128K, compression = off)
Defaults & ashift = {9, 12}
Defaults & recordsize = {8K, 64K}
Defaults & compression = {lz4, gzip}
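For anyone wanting to reproduce the variants, each setting maps onto a ZFS command along these lines (pool name from the mount table above; note that ashift can only be set at pool creation time, while the other two are live properties):

```shell
# ashift must be chosen when the pool is created (9 = 512B sectors, 12 = 4K)
zpool create -o ashift=12 -m /zfs tank /dev/sde

# recordsize and compression can be changed on an existing dataset;
# new values apply to subsequently written blocks
zfs set recordsize=8K tank        # tested variants: 8K, 64K (default 128K)
zfs set compression=lz4 tank      # tested variants: lz4, gzip (default off)
```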

We also edited the sysbench-mongodb config ever so slightly. We opted for the FSYNC_SAFE write concern to ensure that data was actually being written to disk, not just held in RAM. We also reduced the number of documents per collection to 1,000,000, tenfold fewer than the default 10,000,000. This was simply to save time on each “Load” step, something we aren’t too concerned with as our applications are principally read-heavy.
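In the tool's config script those two tweaks amount to a couple of variable changes, roughly as below. The variable names here are from memory of sysbench-mongodb's config and may not match your checkout exactly, so check the script itself before relying on them.

```shell
# sysbench-mongodb config tweaks (variable names are an assumption)
export WRITE_CONCERN=FSYNC_SAFE                # flush to disk, don't just acknowledge from RAM
export NUM_DOCUMENTS_PER_COLLECTION=1000000    # down from the default 10,000,000
```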

We ran the benchmarks ten times for each filesystem. For each run we recorded the last cumulative average of inserts per second for the “Load” stage, and the last cumulative average of transactions per second for the “Execute” stage. Averaging those figures gave us a representative number for each filesystem.
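The averaging step itself is just an arithmetic mean over the ten per-run figures, which a one-liner handles; the numbers below are illustrative, not from our runs.

```shell
# Mean of the last cumulative averages from ten runs (illustrative numbers)
printf '%s\n' 4210 4180 4305 4250 4198 4275 4222 4190 4260 4240 |
    awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
```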

Results

You can download the raw data here if you want to perform your own analysis of the results.

As we suspected, ZFS doesn’t perform quite as well as the other filesystems, but it is worth noting that with the default settings it is only slightly slower. Most importantly, we didn’t uncover any of the show-stopping performance issues hinted at in the discussions mentioned above. Unless you need the utmost query performance, ZFS certainly looks to be a viable option. Moreover, we feel the benefits gained by using ZFS are more than worth the minor performance penalty.

This investigation has been far from definitive, but we hope it has provided you with a rough overview of how these filesystems perform. If you know of ways to improve our results, or the performance of MongoDB on ZFS, please do let us know. We are keen to hear your experiences!
