Why should a business owner or decision-maker choose a Ceph storage cluster?

You may have read that Ceph was built with the goal of making storage easy to scale, both horizontally and vertically. But what does this really mean for your business?

For which business factors should you consider a Ceph storage cluster?

For your business, you typically have the following considerations:

  • you want a reliable solution, in terms of
    • fail-safe systems, in which individual parts can fail without bringing the whole system into an inoperable or unavailable state
    • easy maintenance, so that your teams can operate the system easily
    • easy access to support in business-critical cases
    • fair value for the price paid with regard to performance, availability and reliability
    • finally, a future-proof investment, so you can build your business or systems on top of or around the new (storage) system
  • you want a reliable and stable supply chain
    • having multiple suppliers from which you can obtain spare parts
    • having multiple suppliers from which you can obtain extension parts
    • having multiple vendors between which you can easily switch based on your requirements and needs
    • having multiple options (vendors and suppliers) to avoid classic vendor lock-in, from which typically only the vendor or supplier profits, at your expense (prices go up over time, and you are locked in with no option to switch)
  • considering the technology part
    • you want to achieve a solt

And what is the current storage market situation?

The big players on the market, NetApp, Dell EMC, HPE, IBM and many others, have one thing in common: once you buy into their storage systems, you are locked in. You need more disks? No problem, but you have to buy specially “tested” and “firmwared” versions of the standard disks from the common disk vendors. Since the storage arrays only work with the defined firmware and the specially tested drivers, you sometimes pay several times the price at which you could buy the same drive on the open market.

You want to buy spare parts on the open market, e.g. disks or disk trays? Often other vendors are intentionally blocked by the storage vendors, whose systems simply check whether the hardware was built by the storage vendor. If it comes from the OEM market, the system will simply ignore the new disk or the replacement part.

Or you have the situation that your hardware is working well and would keep working for longer than you planned for. But what happens then? The vendor will limit its support for this hardware, and you have to replace it with a new version. And what is the gain for you?

Or consider the software stack. To have multi-node clustering, or even multi-node clustering across multiple datacenters, you have to buy metro-cluster licenses, which can exceed the hardware price several times over.

Staying with the software stack: each vendor builds its own user interface, often using different terms as well. This makes it harder for storage admins to switch from one vendor to another. And if you consider automation tasks around the storage devices? Here, too, every vendor does its own thing.

Have you ever faced the issue that you have a stable, running server and, due to software upgrade requirements, you have to upgrade your OS to the latest version? And have you also seen this fail because a hardware vendor, e.g. of your RAID controller card, no longer provides support for the given OS version?

It looks like hyper-converged storage systems, which are also available on most virtualization platforms like Microsoft’s Hyper-V or VMware’s vSphere, help here. Yes, in some respects they do. You can use nearly any storage type you want underneath your server nodes. But here, too, you are locked in to one vendor: the one providing the hyper-converged software (VMware, Microsoft, or any other).

What would be the best option from the business perspective?

Let us sum up what is really relevant for reliable storage from different perspectives.

  • a disk failure, which is to be expected sooner or later, shall not interrupt access to your data
  • a disk failure shall never cause data loss
  • backups and restores shall be easy to perform
  • growing demand for storage space shall be easy to meet, ideally without any restrictions
  • growing demand for bandwidth shall be met by extending the system
  • growing demand for IOPS shall be met by extending the system

In enterprise environments we also have these needs:

  • role-based / department-based access to data storage
  • local or regional copies of the data for fast data access
    (fast IOPS / high bandwidth)
  • transparent copies of the data
    often also called a metro cluster in the storage system; this helps to keep local storage for each region
    (admins do not have to maintain the copies, they only configure the environment)

Let's now go down to the technical side.

Storage reliability on the physical level

On the physical side, we want our data to be “always” available and never lost. We also want the option to recover accidentally deleted or modified data.

  • redundancy of the storage targets (disks)
  • redundancy of the wiring to the storage targets
  • redundancy of the stored data on the disks
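
To make the last point more tangible, here is a minimal sketch of what data redundancy costs in capacity. It assumes plain N-way replication, meaning every piece of data is stored on N different disks. The function name and the disk counts and sizes are made up for illustration and are not taken from any product.

```python
def replication_summary(disk_count: int, disk_size_tb: float, copies: int) -> dict:
    """Rough capacity figures for N-way replication (illustrative only)."""
    if copies < 1 or disk_count < copies:
        raise ValueError("need at least one copy and at least as many disks as copies")
    raw_tb = disk_count * disk_size_tb
    return {
        # Total capacity of all disks together.
        "raw_tb": raw_tb,
        # Each piece of data occupies space on `copies` disks.
        "usable_tb": raw_tb / copies,
        # With copies on different disks, one copy survives this many failures.
        "tolerated_disk_failures": copies - 1,
    }


# Hypothetical example: 12 disks of 4 TB each, every piece of data stored 3 times.
summary = replication_summary(disk_count=12, disk_size_tb=4.0, copies=3)
print(summary)
```

The trade-off is visible directly: more copies tolerate more simultaneous disk failures, but every additional copy reduces the usable share of the capacity you paid for.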

What options do you have?

You could switch to hyper-converged storage systems, which are also available on most virtualization platforms like Microsoft’s Hyper-V or VMware’s vSphere. But what happens here? The same thing again: you have a full vendor lock-in. You cannot easily change your decision as your needs change.
In this variant you can take full advantage of local storage (NVMe, RAID and so on) along with SAN/NAS-based storage. But what is the issue here? All these options have one thing in common: you have to count on one hardware vendor to supply you with drivers and support.

You could