Australian NCI Adds Ceph Object Storage to Lustre Filesystems


Object storage has garnered growing interest from organizations in recent years as a convenient way to store and manage the increasing amounts of data they accumulate, especially when that data is a mix of structured and unstructured content plus large volumes of machine-generated telemetry.

While Amazon, with its S3 cloud storage service, can take much of the credit for popularizing object stores and their benefits of high scalability and low cost, not all organizations want to store data in the cloud. A number of vendors have also offered on-premises object storage systems over the years, and several have taken the opportunity to build solutions around open source storage platforms such as Ceph.

One of those Ceph distributors is SoftIron, which is developing its line of scalable HyperDrive storage appliances, with claimed wire-speed performance and a Storage Manager console that it says makes it much easier to monitor and manage all storage software and hardware, particularly across a large estate of appliances.

These claims about the ease of use and cost savings that can be achieved through object storage proved compelling enough that Australia’s National Computational Infrastructure (NCI), a high-performance computing and data services facility located at the Australian National University in Canberra, is adopting SoftIron storage appliances to meet some of its new storage requirements.

NCI is one of two national Tier 1 HPC facilities funded by the Australian Federal Government, the other being located in Perth, on the other side of the country. According to Andrew Howard, NCI’s associate director of cloud services, NCI differs from many other HPC sites in the way it tightly integrates its HPC compute and storage with its cloud facilities, which provide edge-based services that don’t fit into a traditional HPC workload environment.

NCI houses Australia’s fastest research supercomputer, the 9.26 petaflops ‘Gadi’ system. Gadi has 3,200 nodes built with Intel “Cascade Lake” Xeon SP processors and Nvidia V100 GPU accelerators, interconnected by a 200 Gb/s InfiniBand HDR fabric using a dragonfly+ topology. The NCI handles a wide range of scientific workloads, according to Howard, while the Pawsey Supercomputing Center in Perth is largely dedicated to ongoing work for the Square Kilometer Array radio observatory.

“We tend to deal with all science activities other than high-energy physics,” Howard told The Next Platform. “So we’re probably one of the biggest consumers of network data on the planet, and we consolidate a lot of international datasets so they’re available to our users and the data can be directly computed on our HPC and our cloud systems.”

The HPC compute nodes are served by a storage subsystem consisting of NetApp enterprise-class storage arrays running a Lustre parallel file system, connected by 200 Gb/s or 400 Gb/s InfiniBand. However, Howard says that NCI has looked at a number of additional and emerging use cases that are better suited to Ceph-style object storage.

“Typically, this is the type of use case for delivering larger, more static datasets. So that’s where we need read performance for data publishing and where we need to share data between HPC and cloud facilities in an efficient way,” says Howard.

By moving these large static datasets to object storage, NCI can avoid tying up its HPC file system and instead keep it free for demanding, high-intensity applications. Another advantage is that an object-based interface allows the facility to support very long-lived URLs pointing to the data itself, which are published in data catalogs that researchers can access.
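As a rough illustration of how a catalog entry might point straight at an object, the sketch below builds a path-style S3 URL from a bucket name and an object key. The endpoint, bucket, and key are hypothetical placeholders and are not NCI's actual addresses or naming scheme.

```python
from urllib.parse import quote

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "https://objects.example.org"


def object_url(bucket: str, key: str, endpoint: str = ENDPOINT) -> str:
    """Build a path-style S3 URL of the form <endpoint>/<bucket>/<key>."""
    # Leave "/" unescaped in the key so its folder-like structure stays readable;
    # every other unsafe character (spaces, for example) is percent-encoded.
    return f"{endpoint.rstrip('/')}/{quote(bucket, safe='')}/{quote(key, safe='/')}"


# A made-up catalog record that publishes the object's URL.
catalog_entry = {
    "title": "Example satellite scene",
    "url": object_url("satellite-imagery", "sentinel-2/2021/scene 001.tif"),
}
print(catalog_entry["url"])
```

Because the URL depends only on the endpoint, bucket, and key, it stays valid for as long as those three names are kept stable, regardless of which physical appliance holds the data.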

“This is particularly important in the field of climate and weather. We store approximately 30 years of all satellite imagery, which is available to Australian climate researchers and the Bureau of Meteorology for weather forecasting and ongoing climate research in agriculture,” says Howard.

Overall, the NCI facility’s storage is effectively divided into four areas, with the Lustre HPC file system representing the top tier. The next tier includes systems like the SoftIron Ceph storage, which deliver a different kind of performance, according to Howard.

“The biggest differentiator is that Lustre is only available on InfiniBand. All of our other storage services are available on Ethernet, so we have a 100 Gb/s Ethernet backbone across the entire facility,” he says.

Another tier is more traditional volume-based storage, sitting on the same level as NCI’s cloud-based storage. Underlying everything is a hierarchical storage system, which holds archives of up to 30 years of scientific data for certain disciplines and also serves as an ongoing backup facility as users continue to generate more data and perform more calculations on it.

“One of our typical workflows, and one area where we plan to use SoftIron equipment, is where we receive data from ESA Sentinel satellites, perform quality analysis to ensure consistent data capture using our NIRIN cloud, publish the data to a collection for longitudinal analysis as part of an HPC workflow, and then publish the results of the analysis to the collection,” says Howard.

Using an S3 interface here, as almost all object storage offerings provide these days, allows NCI to decouple ongoing production data capture from the facility’s quarterly maintenance cycle (power, cooling, construction work), he explained.

The NCI also has similar workflows for data received from telescopes, genome sequencers, sensor clouds, and high-resolution agricultural imagery to monitor crop development. All of these share the need for a quality assurance process on the data, followed by a processing step for data augmentation (such as geo-rectification or cloud removal from the satellite imagery), allowing the site to use the most energy-efficient platform for the task, according to Howard.

Regarding the reasons for choosing SoftIron, Howard says it was not only the ease of use that SoftIron offers on top of Ceph, but also the modularity and low maintenance effort required by the HyperDrive storage appliances.

“Typically, about half of one team member looks after SoftIron on an occasional basis, whereas the support overhead for Lustre is one to two team members who are needed on an ongoing basis. But our investment in Lustre is much bigger, over 80 PB at the moment, so it’s kind of horses for courses,” he said.

The GUI and additional tools that SoftIron has added to its Ceph implementation include a single pane of glass to monitor service health. When a disk fails, it shows exactly which disk it is and what action should be taken.

It boils down to something as simple as a “Ceph button” on each drive enclosure in a HyperDrive appliance, which when pressed indicates that maintenance is going to be performed on those drives. The team member can then remove the enclosure, swap out the failed disk, replace the enclosure, and then press the Ceph button again to tell the appliance that the disks are available again.

“It really allows us to deploy less experienced staff members to maintain SoftIron, because they’re just pointed in exactly the right direction in terms of what they need to do for hardware swapping and maintenance,” explained Howard.

“Anyone can create their own Ceph cluster, and for the most part it works really, really well. But when things go wrong, just being able to find where the faults are, and having a really nice GUI to help with maintenance, makes life much easier in terms of being able to deliver a really resilient high performance system for our users,” he added.

While Ceph provides an S3-compatible interface, objects can also be served through an NFS interface, which could be useful for further integration with more traditional HPC applications.

“We have been an HPC facility for about twenty years in different forms. NFS is therefore an integral part of our service offerings. We have workloads that access Lustre via NFS in our cloud context, we have other NFS services, and some of our legacy services still use NFS, so it still has a place among the most popular storage access protocols that exist between HPC and the cloud. The fact that SoftIron could also provide an NFS interface was just another tick in the box: if we need it, it’s there, we just need to activate it,” Howard says.

NCI is initially taking delivery of enough SoftIron HyperDrive appliances to provide 12.5 PB of object storage. These are currently being delivered and will be installed once enough space is available in the data center to accommodate them.

“Running a data center is a bit like trying to shuffle the deck chairs. There’s never enough power, there’s never enough space, and we usually use up all the space we have. So it’s a matter of removing some old equipment and clearing out five racks to fit it in,” says Howard.

