Edixon Parraga

@uab.cat

Universitat Autònoma de Barcelona

RESEARCH, TEACHING, or OTHER INTERESTS

Computer Engineering, Hardware and Architecture, Artificial Intelligence

7 Scopus Publications

  • Parallel I/O analysis in distributed deep learning applications on high-performance computing
    Edixon Parraga, Betzabeth Leon, Sandra Mendez, Dolores Rexachs, Emilio Luque
    Journal of Supercomputing, 2025
    Distributed deep learning (DDL) applications generate heavy input/output (I/O) workloads that can create bottlenecks in high-performance computing (HPC) systems. Their optimal I/O configuration depends on factors such as access patterns, storage hardware, dataset size, and execution scale. This study proposes a systematic methodology for characterizing and optimizing I/O behavior in DDL applications, represented through the deep learning I/O benchmark (DLIO), and validated with the real DeepGalaxy application. We evaluate access modes, file formats, and Lustre file system configurations, demonstrating that stripe counts optimized for the access pattern and application scale can reduce I/O and execution times, achieving up to 18 GiB/s of bandwidth and a 5X increase in IOPS. HDF5 provides balanced performance, while TFRecord stands out in bandwidth-intensive scenarios. Shared access minimizes contention and improves scalability in multi-node executions. The results are consolidated into configuration guidelines that offer practical recommendations for practitioners to tune DDL applications for efficient execution in HPC environments.
  • Deep learning data handling: exploring file formats and access strategies
    Edixon Parraga, Betzabeth Leon, Sandra Mendez, Dolores Rexachs, Daniel Franco, Emilio Luque
    Cluster Computing, 2025
    Deep learning applications process massive amounts of data, which places a considerable input/output (I/O) load on computer systems, and finding strategies to manage that data efficiently is a significant challenge. During training, interaction with the I/O system intensifies as files are continuously accessed to read datasets. This persistent access can overload the file system, which in turn degrades application performance and storage system utilization. Several factors influence the I/O of these applications, and one of the most relevant is the file format in which datasets are stored. The choice of file format depends on the use case, as each format defines how information is stored. Some file formats have features that promote efficient access to datasets during the training phase, which can improve the performance of deep learning applications. The format should also suit the execution context, in this case an HPC system with a parallel file system. We propose an image preprocessing method for cases where performance improves with parallel file access; it transforms image datasets from their original JPEG format to the more efficient HDF5 format. Our research thus focuses on understanding the mode of data access, the spatial and temporal access patterns, and the level of parallelism in file access in order to determine whether it is advisable to change the storage format.
  • An Empirical Method for Processing I/O Traces to Analyze the Performance of DL Applications
    Edixon Parraga, Betzabeth Leon, Sandra Mendez, Dolores Rexachs, Remo Suppi, Emilio Luque
    Communications in Computer and Information Science, 2025
  • Analyzing the Influence of File Formats on I/O Patterns in Deep Learning
    Betzabeth Leon, Edixon Parraga, Sandra Mendez, Dolores Rexachs, Remo Suppi, Emilio Luque
    Communications in Computer and Information Science, 2025
  • A Methodical Approach to Parallel IO Analysis in Distributed Deep Learning Applications
    Edixon Parraga, Betzabeth Leon, Sandra Mendez, Dolores Rexachs, Remo Suppi, Emilio Luque
    Communications in Computer and Information Science, 2025
  • File Access Patterns of Distributed Deep Learning Applications
    Edixon Parraga, Betzabeth Leon, Sandra Mendez, Dolores Rexachs, Emilio Luque
    Communications in Computer and Information Science, 2022
  • Analyzing the I/O Patterns of Deep Learning Applications
    Edixon Párraga, Betzabeth León, Román Bond, Diego Encinas, Aprigio Bezerra, Sandra Mendez, Dolores Rexachs, Emilio Luque
    Communications in Computer and Information Science, 2021