Student Challenge 2025-2026
Benchmarking AI Factories on the MeluXina supercomputer
The objective of this challenge is to prepare students for the upcoming AI Factories in the European Union. These AI Factories will harness the power of next-generation HPC and AI systems to revolutionise data processing, analytics, and model deployment. Through this challenge, students will gain practical skills in AI benchmarking, system monitoring, and real-world deployment scenarios—equipping them to design and operate future AI Factory workflows at scale.
MeluXina Supercomputer: Global description & history
MeluXina is Luxembourg’s flagship supercomputer, built as part of the EuroHPC Joint Undertaking to strengthen Europe’s edge in high-performance computing. In operation since 2021, it is based on the EVIDEN BullSequana XH2000 platform. With 18 petaflops of computing power and 20 petabytes of storage, its architecture is tailored to the demanding needs of advanced computational tasks, from AI/ML workloads to scientific simulations.
What makes MeluXina stand out is its modular design, which means it can scale flexibly to support both traditional HPC needs and modern AI applications. Since going live, it’s become an important driver for research, industrial innovation, and digital transformation in Luxembourg and across Europe.
Global plan of the challenge
The challenge will span 4 months, with students organised into teams. It follows these steps:
- Onboarding
  - Introduction to MeluXina and best practices for research and commercial HPC use.
  - Familiarisation with Slurm, storage systems, and monitoring tools (a minimal Slurm job submission sketch in Python follows this plan).
- Exploration & Adoption
  - In-depth exploration of the assigned topic.
  - Define objectives, identify tools and methodologies, and clarify performance metrics.
- Prototyping
  - Development of applications, monitoring dashboards, or benchmarking scripts.
  - Iterative testing and validation.
- Evaluation & Testing
  - Deployment on MeluXina at realistic scales.
  - Performance measurements, resource usage profiling, and scalability testing.
- Report Building
  - Documentation of methodologies, results, and recommended best practices.
  - Creation of comprehensive final reports.
- Defense
  - Each team will present their results and defend their findings in a final session.
  - Q&A and feedback for improvement.
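To make the onboarding phase concrete, here is one possible way a team might drive Slurm from Python. The script name, partition name, and resource values below are hypothetical placeholders, not MeluXina-specific settings.

```python
# Minimal sketch: submitting a benchmark batch script to Slurm from Python.
# "run_benchmark.sh" and the partition name are hypothetical placeholders.
import subprocess

def submit_benchmark(script: str, nodes: int = 1, time_limit: str = "00:30:00") -> str:
    """Submit a batch script with sbatch and return the Slurm job ID."""
    cmd = [
        "sbatch",
        "--parsable",              # print only the job ID on success
        f"--nodes={nodes}",
        f"--time={time_limit}",
        "--partition=gpu",         # placeholder partition name
        script,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    job_id = submit_benchmark("run_benchmark.sh", nodes=2)
    print(f"Submitted Slurm job {job_id}")
```

The same pattern extends naturally to job arrays or dependency chains (`--dependency=afterok:<jobid>`) when benchmark stages must run in sequence.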
Challenge topic: Developing a global benchmarking framework for AI Factory workloads
Objectives:
- Design and implement a unified benchmarking framework to evaluate end-to-end performance of critical AI Factory components (a minimal interface sketch follows this list).
- Include benchmarks for:
  - File storage, relational databases (e.g., PostgreSQL), and object storage (e.g., S3)
  - Inference servers (vLLM, Triton, etc.)
  - Vector databases (Chroma, Faiss, Milvus, Weaviate)
- Enable reproducible, modular benchmarking scenarios using Slurm orchestration.
- Provide comparative insights, performance guidelines, and scalability assessments.
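One way to meet the unified-framework objective is to give every component benchmark the same lifecycle and result record. The sketch below is a minimal Python interface under that assumption; the class and method names are illustrative, not a fixed specification.

```python
# Sketch of a modular benchmark interface: each component (storage,
# inference, vector DB) would subclass Benchmark and plug into the
# same runner. Names here are illustrative assumptions.
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
import time

@dataclass
class BenchmarkResult:
    """Common result record shared by all component benchmarks."""
    name: str
    duration_s: float
    metrics: dict = field(default_factory=dict)

class Benchmark(ABC):
    @abstractmethod
    def setup(self) -> None:
        """Deploy or connect to the service under test."""

    @abstractmethod
    def run(self) -> dict:
        """Execute the workload; return raw metrics (counts, latencies, ...)."""

    @abstractmethod
    def teardown(self) -> None:
        """Release resources after the run."""

    def execute(self) -> BenchmarkResult:
        """Run the full lifecycle, timing only the workload itself."""
        self.setup()
        start = time.perf_counter()
        try:
            metrics = self.run()
            duration = time.perf_counter() - start
        finally:
            self.teardown()
        return BenchmarkResult(type(self).__name__, duration, metrics)
```

Because every scenario emits the same BenchmarkResult, results from PostgreSQL, vLLM, or Milvus runs can be aggregated and compared by a single reporting layer.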
Timeline
- Month 1: Analyse MeluXina’s architecture; survey APIs and services for storage, inference, and retrieval; design the benchmark framework architecture.
- Month 2: Develop modular benchmark components:
  - Generic service deployments: storage, inference, vector DB
  - Load generators based on Dask/Spark/Slurm for inference and retrieval tasks (a simplified single-node sketch follows this timeline)
  - Common data schema and metrics collection interface
- Month 3: Execute benchmarks using Slurm; collect throughput, latency, resource usage, and scaling metrics across all components.
- Month 4: Integrate results; generate dashboards and comparisons; finalise documentation and present findings.
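As a simplified stand-in for the Month 2 load generators, the sketch below drives a user-supplied request function from a thread pool and summarises latency and throughput. A production run would replace the thread pool with Dask or Spark workers fanned out across Slurm nodes; the returned metrics dictionary is an assumed schema, not a fixed one.

```python
# Single-node load generator sketch: fires request_fn repeatedly and
# reports throughput plus latency percentiles.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def generate_load(request_fn, n_requests: int = 1000, concurrency: int = 16) -> dict:
    latencies = []  # list.append is atomic under CPython's GIL

    def timed_call(_):
        t0 = time.perf_counter()
        request_fn()  # e.g. one inference request or one vector DB query
        latencies.append(time.perf_counter() - t0)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed_call, range(n_requests)))
    elapsed = time.perf_counter() - start

    q = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {
        "requests": n_requests,
        "throughput_rps": n_requests / elapsed,
        "latency_p50_s": q[49],
        "latency_p99_s": q[98],
    }

if __name__ == "__main__":
    # Stand-in workload: a 10 ms sleep in place of a real service call.
    print(generate_load(lambda: time.sleep(0.01), n_requests=200))
```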
Tools & stacks:
- Modular framework using Python and Slurm
- Python DB drivers (e.g., psycopg2) and the S3 SDK for storage benchmarks (see the S3 timing sketch after this list)
- GPU-accelerated inference servers in containerised environments
- Dockerised vector DB deployments for scalable search testing
- Prometheus & Grafana for unified monitoring
- Slurm for orchestrated, synchronised benchmark execution
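For the storage side, a micro-benchmark can be as small as timing PUT/GET round trips through the S3 SDK (boto3), as hinted at above. In this sketch the endpoint URL and bucket name are placeholders for whatever S3-compatible service the team deploys, and credentials are assumed to be configured in the environment.

```python
# Hedged S3 timing sketch using boto3; endpoint and bucket are placeholders.
import time
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")  # placeholder
BUCKET = "benchmark-bucket"  # placeholder bucket, assumed to exist

def time_put_get(key: str, size_bytes: int) -> tuple[float, float]:
    """Time one PUT and one GET of a synthetic object of the given size."""
    payload = b"x" * size_bytes

    t0 = time.perf_counter()
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    put_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    get_s = time.perf_counter() - t0
    return put_s, get_s

for size in (1_024, 1_048_576, 16_777_216):  # 1 KiB, 1 MiB, 16 MiB
    put_s, get_s = time_put_get(f"bench/{size}", size)
    print(f"{size:>10} B  put {put_s:.4f}s  get {get_s:.4f}s")
```

The same timing pattern carries over to psycopg2 query benchmarks or vector DB search calls, so the storage-layer benchmarks can share one metrics schema.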
Supervision & Mentoring
Supervision by Dr. Farouk Mansouri:
- Dr. Farouk Mansouri will oversee the challenge, providing strategic and technical supervision at a commitment of 4 hours per month.
- Responsibilities:
  - Overall coordination and alignment with the AI Factory vision.
  - Weekly progress reviews.
  - Technical deep-dives on HPC practices and system optimisation.
Mentoring:
- Technical support and best practices.
- Guidance on tool selection, deployment, and optimisation.
- Assistance with debugging, benchmarking analysis, and report writing.
- Preparation for the final defense.