Reducing Language Model Inference Latency using CPU-Assisted Serving Theodoros Aslanidis, Sokol Kosta, Raffaele Montella, Spyros Lalis, Dimitris Chatzopoulos EuroMLSys 2026: Proceedings of the 6th European Workshop on Machine Learning and Systems, 2026 The growing demand for language model (LM) inference is placing significant strain on datacenter resources, particularly GPUs, which are costly and often scarce. This leads service operators to face long request queues or to throttle users to cope with limited GPU availability. The conventional response is to scale out GPU-equipped servers, but this incurs substantial capital and operational expenses. In this work, we propose an alternative strategy that leverages idle CPU nodes, a resource commonly available in modern datacenter clusters. Our approach exploits GPU virtualization to forward GPU API calls from CPU-only nodes to remote GPUs, while performing CPU-intensive computations locally. For LMs where the primary bottleneck is CPU execution rather than GPU utilization, this mechanism allows idle CPUs to effectively augment serving capacity without requiring additional GPUs. Assuming high-speed interconnects typical of modern datacenters, the overhead of remote CPU-GPU communication is amortized, yielding improvements in job completion time and overall throughput. By converting idle CPUs into cost-free contributors to LM serving, our method reduces request queueing delays and provides a practical pathway to increase service efficiency without incurring additional GPU provisioning costs or sacrificing model accuracy, thereby saving on operational expenses. Extensive experimentation on a testbed with ten popular LMs and across four widely used datasets demonstrates that our ready-to-use open-source system can reduce LM inference-serving delays by up to 98%.
Streaming I/O for scientific workflow engine acceleration Simone Perrotta, Ciro Giuseppe De Vita, Gennaro Mellone, Marco Edoardo Santimaria, Massimo Torquati, Javier Garcia Blas, Raffaele Montella Future Generation Computer Systems, 2026 Scientific workflows are increasingly characterized by complex task dependencies and large-scale data exchanges, which place significant pressure on the input/output (I/O) systems of traditional Workflow Engines (WFEs). These challenges are particularly evident in data-intensive and real-time processing contexts, where conventional disk-based I/O mechanisms often become performance bottlenecks. This paper presents an approach to enhancing the DAGonStar scientific workflow engine by integrating CAPIO, a middleware designed to support memory-based streaming I/O. The integration combines DAGonStar’s orchestration capabilities with CAPIO’s efficient data handling to better support workflows operating on continuous or large-scale datasets. We describe the architectural modifications introduced to enable this collaboration and provide an analysis of the resulting system. The proposed solution aims to improve the responsiveness and flexibility of scientific workflows by streamlining data transfers and simplifying task coordination. This work contributes to the evolution of workflow systems toward more efficient and scalable models for scientific computing.
• Integration of memory-based streaming I/O into scientific workflow engines.
• Automated generation of synchronization rules through dependency analysis of DAGonStar workflows.
• Enhanced pipeline task execution efficiency via system call interception with CAPIO.
• Benchmark evaluation showing up to 33% reduction in execution time with DAGonCAPIO.
• Support for both local batch and SLURM-based distributed executions.
DAGonStore: Reliable Data Management for Workflows on the Computing Continuum with DynoStore and DAGonStar Dante D. Sanchez-Gallegos, J. L. Gonzalez-Compean, Jesus Carretero, Raffaele Montella Proceedings of the SC'25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2025 The computing continuum has emerged as a promising paradigm for decentralized data processing. This approach brings computation closer to data sources, reducing latency and enabling faster insights. However, managing such distributed systems introduces new challenges, particularly in ensuring the availability and reliability of data across heterogeneous and failure-prone environments. In this paper, we focus on addressing these challenges by introducing DAGonStore as a novel component of the DAGonStar workflow engine, integrating it with the DynoStore wide-area storage system to provide resilient and location-transparent data access. DAGonStore implements reliability and availability schemes based on erasure codes and utilization-aware load-balancing to guarantee that input and output data remain accessible and consistent, even in the presence of storage node failures or disconnections. We validate our approach through different tests, demonstrating that DAGonStore enables scalable and fault-tolerant workflow execution across the computing continuum with minimal user intervention.
Advanced Data Elaboration Angelo Ciaramella, Raffaele Montella Digital Platforms from Technical Foundations to Legal and Economic Implications Volume 1, 2025
Federated Learning for Distributed Weather Forecasting: A Practical Approach on Real Multidimensional Georeferenced Data CEUR Workshop Proceedings, 2025
G-Litter Marine Litter Dataset Augmentation with Diffusion Models and Large Language Models on GPU Acceleration Gennaro Mellone, Ciro Giuseppe De Vita, Emanuel Di Nardo, Giuseppe Coviello, Diana Di Luccio, Pietro Patrizio Ciro Aucelli, Angelo Ciaramella, Raffaele Montella Proceedings 33rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2025, 2025 Marine litter detection is crucial for environmental monitoring, yet the imbalance in existing datasets limits model performance in identifying various types of waste accurately. This paper presents an efficient data augmentation pipeline that combines generative diffusion models (e.g., Stable Diffusion) and Large Language Models (LLMs) to expand the G-Litter dataset, a marine litter dataset designed for autonomous detection in heterogeneous environments. Leveraging scalable diffusion models for image generation and Alpaca LLMs for diverse prompt generation, our approach augments underrepresented classes by generating over 200 additional images per class, significantly improving the dataset’s balance. Training YOLOv8 for object detection on the augmented G-Litter dataset increased detection performance, improving recall by 7.82% and mAP50 by 3.87% compared with baseline results. This study emphasizes the potential for combining generative AI with HPC resources to automate data augmentation on large-scale, unstructured datasets, particularly in edge computing contexts for real-time marine monitoring. The models were tested on real videos captured during simulated missions, demonstrating a superior ability to detect submerged objects in dynamic scenarios. These results highlight the potential of generative AI techniques to improve dataset quality and detection model performance, laying the foundation for further expansion in real-time marine monitoring.
Requirements analysis in ARCAD-IA project CEUR Workshop Proceedings, 2025
Preserving and improving the legacy of eScience: the GLOBO experience Carmine Coppola, Guido Davoli, Federico Fabiano, Ciro Giuseppe De Vita, Diana Di Luccio, Pasquale Corvino, Antonella Pirozzi, Andrea Alessandri, Raffaele Montella Proceedings 2025 IEEE International Conference on eScience, eScience 2025, 2025
AI and HPC for intense rain event early warning leveraging real-time weather radar Diana Di Luccio, Ciro Giuseppe De Vita, Gennaro Mellone, Dante D. Sánchez-Gallegos, Pasquale Corvino, Mario Di Sarno, Pasquale De Luca, Emanuel Di Nardo, Vincenzo Capozzi, Vincenzo Bucciero, Raffaele Montella Proceedings 2025 IEEE International Conference on eScience, eScience 2025, 2025
Message from the General Chair Proceedings 2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2024, 2024
Adaptive HPC Input/Output Systems Jesus Carretero, Javier Garcia-Blas, André Brinkmann, Marc Vef, Jean-Baptiste Besnard, Massimo Torquati, Yi Ju, Raffaele Montella Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2024
Safeguarding the Marine and Coastal Environment with Artificial Intelligence CEUR Workshop Proceedings, 2024
Re-assessing the Usability of FGPE Programming Learning Environment with SUS and UMUX Proceedings of the Information Systems Education Conference, ISECON, 2024
GAMAI, an AI-Powered Programming Exercise Gamifier Tool Raffaele Montella, Ciro Giuseppe De Vita, Gennaro Mellone, Tullio Ciricillo, Dario Caramiello, Diana Di Luccio, Sokol Kosta, Robertas Damasevicius, Rytis Maskeliunas, Ricardo Queiros, Jakub Swacha Communications in Computer and Information Science, 2024
Malleability Techniques for HPC Systems Jesus Carretero, David Exposito, Alberto Cascajo, Raffaele Montella Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2023
Message from the General Chairs: PDP 2023 Raffaele Montella, Angelo Ciaramella, Marco Lapegna, Marco Danelutto, Dora Blanco Heras Proceedings 2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2023, 2023
Message from the Organizing Committee Chairs: PDP 2023 Raffaele Montella, Angelo Ciaramella, Marco Lapegna, Marco Danelutto, Dora Blanco Heras Proceedings 2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2023, 2023
AI-based Monitoring of Coastal and Marine Environments CEUR Workshop Proceedings, 2023
Parallel and hierarchically-distributed Shoreline Alert Model (SAM) Ciro Giuseppe de Vita, Gennaro Mellone, Aniello Florio, Catherine Alessandra Torres Charles, Diana Di Luccio, Marco Lapegna, Guido Benassai, Giorgio Budillon, Raffaele Montella Proceedings 2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2023, 2023
Artificial Intelligence for mussels farm quality assessment and prediction Ciro Giuseppe de Vita, Gennaro Mellone, Francesca Barchiesi, Diana Di Luccio, Angelo Ciaramella, Raffaele Montella 2022 IEEE International Workshop on Metrology for the Sea; Learning to Measure Sea Health Parameters, MetroSea 2022, Proceedings, 2022
AIQUAM: Artificial Intelligence-based water QUAlity Model Ciro Giuseppe De Vita, Gennaro Mellone, Diana Di Luccio, Sokol Kosta, Angelo Ciaramella, Raffaele Montella Proceedings 2022 IEEE 18th International Conference on eScience, eScience 2022, 2022
The Italian research on HPC key technologies across EuroHPC Marco Aldinucci, Giovanni Agosta, Antonio Andreini, Claudio A. Ardagna, Andrea Bartolini, Alessandro Cilardo, Biagio Cosenza, Marco Danelutto, Roberto Esposito, William Fornaciari, Roberto Giorgi, Davide Lengani, Raffaele Montella, Mauro Olivieri, Sergio Saponara, Daniele Simoni, Massimo Torquati Proceedings of the 18th ACM International Conference on Computing Frontiers 2021, CF 2021, 2021
A Roadmap to Gamify Programming Education Jakub Swacha, Ricardo Queirós, José Carlos Paiva, José Paulo Leal, Sokol Kosta, et al. OpenAccess Series in Informatics, 2020
Evidences of atmospheric pressure drop and sea level alteration in the Ligurian Sea 2019 IMEKO TC19 International Workshop on Metrology for the Sea; Learning to Measure Sea Health Parameters, MetroSea 2019, 2020
StormSeeker: A machine-learning-based Mediterranean storm tracer Raffaele Montella, Diana Di Luccio, Angelo Ciaramella, Ian Foster Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2019
DagOn∗: Executing Direct Acyclic Graphs as Parallel Jobs on Anything Raffaele Montella, Diana Di Luccio, Sokol Kosta Proceedings of WORKS 2018: 13th Workshop on Workflows in Support of Large-Scale Science, Held in Conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis, 2018
Shoreline rotation analysis of embayed beaches in the Central Thyrrenian Sea Guido Benassai, Diana Di Luccio, Luigi Mucerino, Gianluigi Di Paola, Carmen Maria Rosskopf, Giovanni Pugliano, Umberto Robustelli, Raffaele Montella 2018 IEEE International Workshop on Metrology for the Sea; Learning to Measure Sea Health Parameters, MetroSea 2018, Proceedings, 2018
Processing of crowd-sourced data from an internet of floating things Raffaele Montella, Diana Di Luccio, Livia Marcellino, Ardelio Galletti, Sokol Kosta, Alison Brizius, Ian Foster Proceedings of WORKS 2017: 12th Workshop on Workflows in Support of Large-Scale Science, Held in Conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis, 2017
Numerical and implementation issues in food quality modeling for human diseases prevention A. Galletti, R. Montella, L. Marcellino, A. Riccio, D. Di Luccio, A. Brizius, I. Foster HEALTHINF 2017: 10th International Conference on Health Informatics, Proceedings; Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017, 2017
Enabling Android-based devices to high-end GPGPUs Raffaele Montella, Carmine Ferraro, Sokol Kosta, Valentina Pelliccia, Giulio Giunta Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2016
Virtualizing CUDA enabled GPGPUs on ARM clusters Raffaele Montella, Giulio Giunta, Giuliano Laccetti, Marco Lapegna, Carlo Palmieri, Carmine Ferraro, Valentina Pelliccia Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2016
Applications of the FACE-IT portal and workflow engine for operational food quality prediction and assessment: Mussel farm monitoring in the Bay of Napoli, Italy CEUR Workshop Proceedings, 2016
FACE-IT: A science gateway for food security research Raffaele Montella, David Kelly, Wei Xiong, Alison Brizius, Joshua Elliott, Ravi Madduri, Ketan Maheshwari, Cheryl Porter, Peter Vilter, Michael Wilde, Meng Zhang, Ian Foster Concurrency and Computation Practice and Experience, 2015
SOLE: Linking research papers with science objects Quan Pham, Tanu Malik, Ian Foster, Roberto Di Lauro, Raffaele Montella Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2012
State of Posidonia oceanica meadows around the Sardinian coast Proceedings of the 7th International Conference on the Mediterranean Coastal Environment, MEDCOAST 2005, 2005
RECENT SCHOLAR PUBLICATIONS
Reducing Language Model Inference Latency using CPU-Assisted Serving T Aslanidis, S Kosta, R Montella, S Lalis, D Chatzopoulos Proceedings of the Sixth European Workshop on Machine Learning and Systems …, 2026
Advanced Data Elaboration A Ciaramella, R Montella Digital Platforms-From Technical Foundations to Legal and Economic …, 2026
New digital services and goods M Agovino, A Albanese, G Andreotti, C Angrisani, O Ardovino, E Arezzo, ... Digital Platforms-From Technical Foundations to Legal and Economic Implications, 2026
Digital Platforms-From Technical Foundations to Legal and Economic Implications M Agovino, A Albanese, G Andreotti, C Angrisani, O Ardovino, E Arezzo, ... Springer, 2026
DAGonStore: Reliable Data Management for Workflows on the Computing Continuum with DynoStore and DAGonStar DD Sanchez-Gallegos, JL Gonzalez-Compean, J Carretero, R Montella Proceedings of the SC'25 Workshops of the International Conference for High …, 2025
Computational resources for LLM computing: The need for GPU virtualization TH Cheng, R Montella, KE Skouby, S Kosta 2025
AI and HPC for intense rain event early warning leveraging real-time weather radar D Di Luccio, CG De Vita, G Mellone, DD Sánchez-Gallegos, P Corvino, ... 2025 IEEE International Conference on eScience (eScience), 315-316, 2025
Preserving and improving the legacy of eScience: the GLOBO experience C Coppola, G Davoli, F Fabiano, CG De Vita, D Di Luccio, P Corvino, ... 2025 IEEE International Conference on eScience (eScience), 329-330, 2025
Directed Acyclic Graph on Cross-Application Programmable I/O: Adding streaming flavour to scientific workflows S Perrotta, ME Santimaria, CG De Vita, M Torquati, D Di Luccio, P Corvino, ... 2025 IEEE International Conference on eScience (eScience), 313-314, 2025
A coupled Lagrangian-AI hierarchical and heterogeneous model for predicting bacteria contamination in farmed mussels CG De Vita, G Mellone, D Di Luccio, J Garcia-Blas, F Barchiesi, ... Future Generation Computer Systems, 108108, 2025 Citations: 1
Streaming I/O for scientific workflow engine acceleration S Perrotta, CG De Vita, G Mellone, ME Santimaria, M Torquati, JG Blas, ... Future Generation Computer Systems, 107978, 2025 Citations: 3
Euro-Par 2024: Parallel Processing Workshops: Euro-Par 2024 International Workshops, Madrid, Spain, August 26–30, 2024, Proceedings, Part I S Caino-Lores, D Zeinalipour, TD Doudali, DE Singh, GEM Garzón, ... Springer Nature, 2025
Exploring the Effectiveness of Slot Attention-Based Classifier in Detecting Underwater Marine Litter: A Study G Mellone, E Di Nardo, CG De Vita, R Montella, PPC Aucelli, ... Advanced Neural Artificial Intelligence: Theories and Applications, 203-212, 2025
G-Litter Marine Litter Dataset Augmentation with Diffusion Models and Large Language Models on GPU Acceleration G Mellone, CG De Vita, E Di Nardo, G Coviello, D Di Luccio, PPC Aucelli, ... 2025 33rd Euromicro International Conference on Parallel, Distributed, and …, 2025 Citations: 2
Federated Learning for Distributed Weather Forecasting: A Practical Approach on Real Multidimensional Georeferenced Data A Di Vicino, G Fiorillo, L Galluccio, R Montella 2025
Requirements analysis in ARCAD-IA project PPC Aucelli, F Camastra, A Ciaramella, E Di Nardo, A Ferone, A Maratea, ... 2025
Certamen Artificialis Intelligentia: Evaluating AI in Solving AI-generated Programming Exercises C Coppola, S Perrotta, C Giuseppe De Vita, G Mellone, D Di Luccio, ... 2025
In-Browser C++ Interpreter for Lightweight Intelligent Programming Learning Environments T Blažauskas, A Rauba, J Swacha, R Montella, R Maskeliunas 6th International Computer Programming Education Conference (ICPEC 2025), 14 …, 2025
Developing a GIS Framework for Effective Weather Routing in the Tyrrhenian Sea E Alcaras, G Budillon, Y Cotroneo, D Di Luccio, R Montella, C Parente 2024 IEEE International Workshop on Metrology for the Sea; Learning to …, 2024 Citations: 1
A high-performance, parallel, and hierarchically distributed model for coastal run-up events simulation and forecasting D Di Luccio, CG De Vita, A Florio, G Mellone, CA Torres Charles, ... The Journal of Supercomputing 80 (15), 22748-22769, 2024 Citations: 7
MOST CITED SCHOLAR PUBLICATIONS
A GPGPU transparent virtualization component for high performance computing clouds G Giunta, R Montella, G Agrillo, G Coviello Euro-Par 2010-Parallel Processing, 379-391, 2010 Citations: 319
Wave run-up prediction and observation in a micro-tidal beach D Di Luccio, G Benassai, G Budillon, L Mucerino, R Montella, ... Natural Hazards and Earth System Sciences 18 (11), 2841-2857, 2018 Citations: 62
Coastal marine data crowdsourcing using the internet of floating things: improving the results of a water quality model D Di Luccio, A Riccio, A Galletti, G Laccetti, M Lapegna, L Marcellino, ... IEEE Access 8, 101209-101223, 2020 Citations: 58
Rip current evidence by hydrodynamic simulations, bathymetric surveys and UAV observation G Benassai, P Aucelli, G Budillon, M De Stefano, D Di Luccio, G Di Paola, ... Natural Hazards and Earth System Sciences 17 (9), 1493-1503, 2017 Citations: 56
DagOn*: Executing direct acyclic graphs as parallel jobs on anything R Montella, D Di Luccio, S Kosta 2018 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS), 64-73, 2018 Citations: 52
On the virtualization of CUDA based GPU remoting on ARM and X86 machines in the GVirtuS framework R Montella, G Giunta, G Laccetti, M Lapegna, C Palmieri, C Ferraro, ... International Journal of Parallel Programming 45 (5), 1142-1163, 2017 Citations: 45
Monitoring and modelling coastal vulnerability and mitigation proposal for an archaeological site (Kaulonia, Southern Italy) D Di Luccio, G Benassai, G Di Paola, CM Rosskopf, L Mucerino, ... Sustainability 10 (6), 2017, 2018 Citations: 43
WaComM: A parallel Water quality Community Model for pollutant transport and dispersion operational predictions R Montella, D Di Luccio, P Troiano, A Riccio, A Brizius, I Foster 2016 12th International Conference on Signal-Image Technology & Internet …, 2016 Citations: 43
Accelerating Linux and Android applications on low‐power devices through remote GPGPU offloading R Montella, S Kosta, D Oro, J Vera, C Fernández, C Palmieri, D Di Luccio, ... Concurrency and Computation: Practice and Experience 29 (24), e4286, 2017 Citations: 40
Shoreline rotation analysis of embayed beaches by means of in situ and remote surveys D Di Luccio, G Benassai, G Di Paola, L Mucerino, A Buono, CM Rosskopf, ... Sustainability 11 (3), 725, 2019 Citations: 39
A fast, secure, reliable, and resilient data transfer framework for pervasive IoT applications R Montella, M Ruggieri, S Kosta IEEE INFOCOM 2018-IEEE conference on computer communications workshops …, 2018 Citations: 39
FACE‐IT: A science gateway for food security research R Montella, D Kelly, W Xiong, A Brizius, J Elliott, R Madduri, ... Concurrency and Computation: Practice and Experience 27 (16), 4423-4436, 2015 Citations: 39
pPOM: A nested, scalable, parallel and Fortran 90 implementation of the Princeton Ocean Model G Giunta, P Mariani, R Montella, A Riccio Environmental Modelling & Software 22 (1), 117-122, 2007 Citations: 39
A grid computing based virtual laboratory for environmental simulations I Ascione, G Giunta, P Mariani, R Montella, A Riccio Euro-Par 2006 Parallel Processing, 1085-1094, 2006 Citations: 39
An efficient pattern-based approach for workflow supporting large-scale science: The DagOnStar experience DD Sánchez-Gallegos, D Di Luccio, S Kosta, JL Gonzalez-Compean, ... Future Generation Computer Systems 122, 187-203, 2021 Citations: 38
Virtualizing high-end GPGPUs on ARM clusters for the next generation of high performance cloud computing R Montella, G Giunta, G Laccetti Cluster Computing 17 (1), 139-152, 2014 Citations: 37
Virtualizing general purpose GPUs for high performance cloud computing: an application to a fluid simulator R Di Lauro, F Giannone, L Ambrosio, R Montella 2012 IEEE 10th International Symposium on Parallel and Distributed …, 2012 Citations: 37
Using GPGPU accelerated interpolation algorithms for marine bathymetry processing with on-premises and cloud based computational resources L Marcellino, R Montella, S Kosta, A Galletti, D Di Luccio, V Santopietro, ... International Conference on Parallel Processing and Applied Mathematics, 14-24, 2017 Citations: 36
SOLE: linking research papers with science objects Q Pham, T Malik, I Foster, R Di Lauro, R Montella International Provenance and Annotation Workshop, 203-208, 2012 Citations: 36
Workflow-based automatic processing for internet of floating things crowdsourced data R Montella, D Di Luccio, L Marcellino, A Galletti, S Kosta, G Giunta, ... Future Generation Computer Systems 94, 103-119, 2019 Citations: 35