How to test HDD & SSD for journal / metadata compatibility?

Have you read a lot about fast SSDs that do not perform well in a Ceph cluster? About SSDs that die quickly in a Ceph cluster? Should you prefer HDDs over SSDs? Do you have no idea how to determine what is true and what is a myth?

To understand this article fully, you should also read these two articles:

Available tools under Linux for our tests

Under Linux we are fortunate to have a number of tools that help us test our drives in depth. Well-known tools are:

  • fio
    flexible I/O tester
  • dd
    disk dump
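
Before you start, you can quickly check whether both tools are available on your system (dd is part of the GNU coreutils and should be present almost everywhere; fio often has to be installed from your distribution's repositories):

fio --version
dd --version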

Controller-dependent impacts on your tests

Linux uses disk caches, and so do hardware controllers. These caches will falsify our tests, since we would measure the speed of the RAM instead of the speed of the drive.

Disable the Linux cache

Linux uses a cache for all drives, so first we need to disable the Linux write cache on our drive:

sudo hdparm -W 0 /dev/hda

In the above example we disable the write cache for the drive /dev/hda.
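
To verify the setting, you can query the write-cache state by passing -W without a value; hdparm then reports whether write-caching is on or off:

sudo hdparm -W /dev/hda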

Disable HP controller cache

HP controllers typically provide RAID array management, which in turn uses controller-based caches to optimize write and read performance. We need to disable these caches as well!

sudo hpacucli ctrl slot=2 modify dwc=disable
sudo hpacucli controller slot=2 logicaldrive 1 modify arrayaccelerator=disable

The above commands assume that your HP controller is in slot 2 and your logical drive is number 1.
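
To verify that the caches are really disabled, you can display the controller details and check the reported cache settings (again assuming your controller is in slot 2):

sudo hpacucli ctrl slot=2 show detail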

Using fio

For the more thorough tests we use fio. This is the sample command for our tests:

sudo fio --filename=/dev/<yourDevice> --direct=1 --sync=1 --rw=write --bs=4k --numjobs=<numOfConcurrentThreads> --iodepth=1 --runtime=600 --time_based --group_reporting --name=journal-test

Let us dive into the command options:

  • --filename
    the device we want to test,
    e.g. /dev/sda or /dev/nvme0n1
  • --direct
    here we tell fio to work with O_DIRECT (for details see Performance considerations for journal / metadata disks)
  • --sync
    here we tell fio to work with O_DSYNC (for details see Performance considerations for journal / metadata disks)
  • --rw
    which IO pattern to use. In our case we use write for sequential writes, as journal/metadata writes are always sequential (for details see Performance considerations for journal / metadata disks)
  • --bs
    this is the block size. With the block size we simulate the object size handed over from a client to our OSD. In our example we are submitting 4K objects.
    4K is probably a worst-case scenario. If you know your workload in terms of block size, you can modify it here to your needs.
  • --numjobs
    here we simulate the number of concurrent client accesses to our Ceph OSD. It sets the number of threads that will be running; think of them as ceph-osd daemons writing to the journal.
  • --iodepth
    with a depth of 1 we are submitting the IOs one by one.
  • --runtime
    job duration in seconds. You should test different durations to ensure that all optimizations of the HDD/SSD vendor are exhausted and you get down to the real sustained performance of your drive.
    These optimizations are typically tiers of different storage types that boost the short-term performance: RAM, caches, and on SSDs faster cell types in front of the slower cells used for the final data storage.
  • --time_based
    if you have a fast drive, the test could finish before reaching the "runtime" limit. This option ensures that the test runs for the specified runtime: fio reruns the operation over and over again until the total runtime is reached.
  • --group_reporting
    this tells fio to report one overall value. With multiple threads (numjobs), fio would normally report each job independently, which does not help us here. With this parameter we group the results into one overall result.
  • --name
    the name of this run/test

IMPORTANT NOTICE

The above command will overwrite the data on your drive! Be careful to pick the correct drive, or your operating system or even your data will be overwritten!
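
If you cannot sacrifice a raw device, fio can also run against a regular file on a mounted filesystem. Note that this measures the filesystem as well as the drive, so treat it only as a rough sketch; /mnt/test/fiofile is just a placeholder path:

sudo fio --filename=/mnt/test/fiofile --size=1G --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test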

Tests to saturate your fastest drives

Your disks can be so fast that you need multiple tests to max out (saturate) them. We can try this with the --runtime or the --numjobs parameter.

Test via long running tests

As already mentioned above, you need to test different durations with the --runtime parameter to get past the caches and drive tunings. Start with 60 seconds and increase the duration step-wise. My tests often look like this:

  • 60 seconds
    first test
  • 180 seconds
    second test
  • 600 seconds
    third test

If the 600-second test shows nearly the same values as the previous tests, you have either already reached saturation or you cannot reach saturation this way.
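
If you want to automate these runs, a small shell loop can sweep the durations for you; this is just a sketch, replace /dev/<yourDevice> with your actual device:

for runtime in 60 180 600; do
    sudo fio --filename=/dev/<yourDevice> --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=${runtime} --time_based --group_reporting --name=journal-test-${runtime}s
done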

Test via parallel running jobs

If you have really fast drives, the tests named above will not saturate your drive. To saturate the drive we have to run more parallel threads against it with the --numjobs parameter. My tactic here is to start with one thread and increase the count by one each run (see the sketch after this list):

  • --numjobs=1
  • --numjobs=2
  • --numjobs=3
  • --numjobs=4
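
This sweep can be scripted the same way as the runtime sweep above (again just a sketch with /dev/<yourDevice> as a placeholder):

for jobs in 1 2 3 4; do
    sudo fio --filename=/dev/<yourDevice> --direct=1 --sync=1 --rw=write --bs=4k --numjobs=${jobs} --iodepth=1 --runtime=600 --time_based --group_reporting --name=journal-test-${jobs}jobs
done

Stop increasing the thread count once the aggregated result no longer grows; at that point the drive is saturated.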

Using dd

Sometimes you have neither fio on your system nor the option to install it. In this case you can also use dd. These are the sample commands for our dd tests. First let us generate a random test file and ensure that it is synced to disk:

sudo dd if=/dev/urandom of=randomtestfile bs=1M count=1024
sudo sync

Now let us run the test against our drive:

sudo dd if=randomtestfile of=/dev/<yourdrive> bs=4k count=100000 oflag=direct,dsync

Let us dive into the command options:

  • if
    the input file, which is either the random device (first command) or our test file (second command)
  • of
    the output file, which is either our test file (first command) or the drive we want to test (second command)
  • bs
    this is the block size. With the block size we simulate the object size handed over from a client to our OSD. In our example we are submitting 4K objects.
    4K is probably a worst-case scenario. If you know your workload in terms of block size, you can modify it here to your needs.
  • count
    here we tell dd how many blocks to process
  • oflag
    here we define the IO flags O_DIRECT and O_DSYNC (for details see Performance considerations for journal / metadata disks)
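
If your workload uses larger blocks, the same dd test could look like this; a sketch with 1M blocks and an adjusted count so that roughly the same amount of data is written:

sudo dd if=randomtestfile of=/dev/<yourdrive> bs=1M count=400 oflag=direct,dsync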

Community-driven performance measurements

You can submit your own measurements here as well, so that other users can save themselves the work that you yourself have profited from. Please share your details; it takes only a minute and may save others hours or days of needless analysis.

(Submission form, step 1 of 4, your tested drive: the drive vendor, the drive model, e.g. MZ-V6P512 for the Samsung 960 Pro with 512 GB of capacity, the drive capacity in GB, and the firmware version of your drive.)
