Two clusters, two challenges

10/15/2024 | Maximilian Dorzweiler

The first (sub)challenge of the day began early at Zurich HB, where we boarded the train to Lugano after one group member nearly missed it. Luckily, everybody made it, albeit some a little tired. As always, the scenic train ride didn’t disappoint, and the three hours went by like a breeze.

View from the Gotthard railway

Once at CSCS, we first met with Hussein (an HPC mastermind who works at CSCS and mentors our team), who laid out the game plan for the day. It consisted of two tasks:

  • Reassemble the competition cluster that just arrived back from ISC24
  • Merge our piora and racklette clusters together

Our Sbrinz cluster returning from ISC24

We wasted no time and got straight to work on the first task. In the warehouse part of the building stood our ISC cluster, still encased in its wooden transport crate. After freeing the cluster from the crate, we had to reassemble the nodes. Some of them had kindly been lent to us by E4, so we had to repackage those and prepare them for shipping. Before doing that, though, we took the chance to peek inside the (very expensive) nodes, marvel at their complexity, and remove some hardware components that belonged to CSCS.

Preparing and installing Sbrinz

Having successfully completed our first task of the day, we treated ourselves to lunch at a local pizzeria with authentic Ticino pizza and a great view of the surrounding hills. There was no time to bask, though, as the most important task still awaited us. Back at CSCS, we first got to work putting some nodes from our competition cluster back into our stationary cluster in the server room. After that, we rewired the InfiniBand and Ethernet connections between the nodes of piora so that they could connect to the nodes of the racklette cluster. Now all that was left to do was to install an updated version of Rocky Linux on the piora cluster and configure it to accommodate the changes. This, however, turned into quite a challenge, which we initially blamed on corrupted downloads or faulty USB drives. Only after a long debugging session did we conclude that the issue might be related to the different processor architectures of our clusters...

But the issues didn’t stop there. Coincidentally, on the trip back, we noticed that the train’s display system had some technical problems of its own, albeit Windows-related ones. Nevertheless, everyone returned home safe and happy with the progress we had made.

Issues after the (presumably) recent upgrade from Windows 98