Two clusters, two challenges
10/15/2024 | Maximilian Dorzweiler
The first (sub)challenge of the day began early at Zurich HB, where we boarded the train to Lugano after one group member nearly missed it. Luckily, everybody made it, albeit some a little tired. As always, the scenic train ride didn’t disappoint, and the three hours flew by.
Once at CSCS, we first met with Hussein (an HPC mastermind who works at CSCS and mentors us), who laid out the game plan for the day. It consisted of two tasks:
- Reassemble the competition cluster that had just arrived back from ISC24
- Merge our piora and racklette clusters together
We wasted no time and got straight to work on the first task. Off in the warehouse part of the building sat our ISC cluster, still encased in its wooden transportation crate. After freeing the cluster from the crate, we had to reassemble the nodes. Since some had kindly been lent to us by E4, we had to repackage those and prepare them for shipping. Before doing that, though, we took the chance to peek inside the (very expensive) nodes, marvel at their complexity, and remove some hardware components that belonged to CSCS.
Having successfully completed our first task of the day, we treated ourselves to lunch at a local pizzeria with authentic Ticino pizza and a great view of the surrounding hills.
There was no time to bask, though, as the most important task still awaited us. Back at CSCS, we first put some nodes from our competition cluster back into our stationary cluster in the server room. We then rewired the InfiniBand and Ethernet connections between the nodes of piora so that they could connect to the nodes of the racklette cluster. Now all that was left to do was to install an updated version of Rocky Linux on the piora cluster and configure it to accommodate the changes. This, however, turned into quite a challenge, which we initially blamed on corrupted downloads or faulty USB drives. Only after an extended debugging session did we conclude that the issue might be related to the different processor architectures of our clusters...
But the issues didn’t stop there. Coincidentally, on the way home, we noticed that the train’s display system was having technical problems of its own, albeit Windows-related ones. Nevertheless, everyone returned home safe and happy with the progress we had made.