Marcelo Vinicius Cysneiros Aragao

Scopus Publications

A Sample-Based, Multistage Machine Learning Pipeline for Scalable IoT Threat Detection
Marcelo V. C. Aragão, Tiago de M. Pereira, Felipe A. P. de Figueiredo, Samuel B. Mafra
IEEE Embedded Systems Letters, 2026
The rapid growth of IoT devices demands scalable and efficient threat detection solutions. This paper introduces a sample-based, multi-stage machine learning (ML) pipeline for IoT threat detection using the CICIoT2023 dataset, integrating feature selection, data balancing, and hyperparameter optimization to improve detection accuracy while reducing the computational overhead associated with training. We evaluate multiple models across binary, multiclass, and fine-grained tasks, showing that training with 10% sampling achieves the best trade-off between accuracy and efficiency. Compared to prior methods, our approach eliminates GPU dependence, maintains low latency, and preserves state-of-the-art performance while enabling scalable training for high generalization capacity. Additionally, we provide model selection guidelines based on dataset complexity and computational constraints. The results show that sample-based training enables effective threat detection on large datasets, producing models that generalize well to diverse IoT attack scenarios and thus supporting practical applicability in real-world deployments.
Large-Scale Benchmarking of Intrusion Detection Datasets With GPU-Accelerated Data Pipelines, Complexity Analysis, and Model Evaluation
Marcelo V. C. Aragão, Felipe A. P. de Figueiredo, Samuel B. Mafra
International Journal of Intelligent Systems, 2026
Intrusion detection systems (IDSs) are critical for identifying malicious activity in computer networks; however, the evaluation of machine learning (ML)-based IDSs remains inconsistent and fragmented. Many existing studies rely on outdated datasets, neglect computational complexity, or use limited performance metrics. Additionally, few works leverage the full potential of modern graphics processing unit (GPU) acceleration. The objective of this study is to establish a scalable, reproducible, and standardized benchmarking framework for intrusion detection. We present an end-to-end, GPU-accelerated pipeline that integrates automated data preprocessing, intrinsic dataset complexity analysis, and multiobjective hyperparameter optimization (HPO) across more than 70 publicly available datasets. Our numerical findings demonstrate that stratified sampling rates of 10% are sufficient to maintain statistical signal integrity, with class probability deviations remaining below 0.01 relative to the full population. Furthermore, feature-reduced configurations decrease the model size by a median of 60% while maintaining weighted F1 scores within 0.01 of the baseline. Finally, experimental complexity analysis reveals that the GPU-accelerated modeling stages achieve empirical time-invariance (O(1)), reducing training latency by up to two orders of magnitude compared with traditional central processing unit (CPU) workflows. These contributions offer a rigorous quantitative view of the performance-efficiency trade-offs essential for next-generation IDS evaluation.
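The stratified-sampling check described in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the function names (stratified_sample, class_probs, max_prob_deviation), the three hypothetical class labels, and the use of Python's standard library instead of a GPU dataframe stack are all assumptions made for illustration. The sketch draws roughly 10% of the rows from each class and reports the largest absolute difference between the class probabilities in the sample and in the full population.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, frac, seed=0):
    """Return sorted indices of a stratified sample: each class keeps ~frac of its rows."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    sample = []
    for idxs in by_class.values():
        k = max(1, round(frac * len(idxs)))  # keep at least one row per class
        sample.extend(rng.sample(idxs, k))
    return sorted(sample)


def class_probs(labels):
    """Empirical class probabilities P(y) of a label sequence."""
    counts = Counter(labels)
    n = len(labels)
    return {y: c / n for y, c in counts.items()}


def max_prob_deviation(population, subset):
    """Largest |P_population(y) - P_subset(y)| over all classes in the population."""
    p, q = class_probs(population), class_probs(subset)
    return max(abs(p[y] - q.get(y, 0.0)) for y in p)


if __name__ == "__main__":
    # Hypothetical imbalanced traffic labels (assumed, not from the paper's datasets)
    rng = random.Random(42)
    labels = rng.choices(["benign", "ddos", "recon"], weights=[90, 9, 1], k=100_000)
    idx = stratified_sample(labels, frac=0.10)
    subset = [labels[i] for i in idx]
    print(len(subset), max_prob_deviation(labels, subset))
```

Because each class is sampled separately, class proportions are preserved up to rounding, so the deviation shrinks as class counts grow; the 0.01 bound quoted in the abstract comes from the paper's experiments, not from this toy example.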
A practical evaluation of AutoML tools for binary, multiclass, and multilabel classification
Marcelo V. C. Aragão, Augusto G. Afonso, Rafaela C. Ferraz, Rairon G. Ferreira, Sávio G. Leite, Felipe A. P. de Figueiredo, Samuel B. Mafra
Scientific Reports, 2025
Selecting the most suitable Automated Machine Learning (AutoML) tool is pivotal for achieving optimal performance in diverse classification tasks, including binary, multiclass, and multilabel scenarios. The wide range of frameworks with distinct features and capabilities complicates this decision, necessitating a systematic evaluation. This study benchmarks sixteen AutoML tools, including AutoGluon, AutoSklearn, TPOT, PyCaret, and Lightwood, across all three classification types using 21 real-world datasets. Unlike prior studies focusing on a subset of classification tasks or a limited number of tools, we provide a unified evaluation of sixteen frameworks, incorporating feature-based comparisons, time-constrained experiments, and multi-tier statistical validation. We also compare our findings with four representative prior benchmarks to contextualize our results within the existing literature. A key contribution of our study is the in-depth assessment of multilabel classification, exploring both native and label-powerset representations and revealing that several tools lack robust multilabel capabilities. Our findings demonstrate that AutoSklearn excels in predictive performance for binary and multiclass settings, albeit at longer training times, while Lightwood and AutoKeras offer faster training at the cost of predictive performance on complex datasets. AutoGluon emerges as the best overall solution, balancing predictive accuracy with computational efficiency. Our statistical analysis—at per-dataset, across-datasets, and all-datasets levels—confirms significant performance differences among tools, highlighting accuracy-speed trade-offs in AutoML. These insights underscore the importance of aligning tool selection with specific problem characteristics and resource constraints. The open-source code and reproducible experimental protocols further ensure the study’s value as a robust resource for researchers and practitioners.
Dynamic-Balancing AutoML for Imbalanced Tabular Data With Adaptive Resampling and Complexity-Aware Analysis
Marcelo V. C. Aragão, Tiago de M. Pereira, Mateus de F. Carvalho, Felipe A. P. de Figueiredo, Samuel B. Mafra
International Journal of Intelligent Systems, 2025
Handling class imbalance is a fundamental challenge in supervised learning, particularly in real‐world scenarios where minority classes are critical yet underrepresented. This paper presents a novel dynamic‐balancing pipeline that enhances automated machine learning (AutoML) performance on imbalanced tabular datasets. The proposed approach integrates both traditional and generative resampling techniques with adaptive, class‐specific thresholds, enabling automated and dataset‐sensitive balancing strategies. To assess its generalizability, the pipeline is applied uniformly across binary, multiclass, and multilabel classification tasks. Each configuration is evaluated within an AutoML framework using performance and efficiency metrics, with outcomes validated through statistical testing and effect size analysis. The study also incorporates dataset complexity measures—including feature‐label dependency and class overlap—to investigate how structural characteristics affect balancing efficacy. By combining principled resampling, exhaustive grid search, and rigorous evaluation, the pipeline enables more robust and efficient AutoML workflows. This work contributes a flexible and reproducible framework for addressing class imbalance, particularly in multilabel contexts, and establishes a foundation for scalable, complexity‐aware resampling in automated model development.
Interactive Control System for Automated Guide Vehicles
João Paulo Carvalho Henriques, Daniel Nunes Teixeira, Matheus Brandani Mendes Rosa, Miguel José Abdala Ribeiro, Egídio Raimundo Neto, Marcelo Vinicius Cysneiros Aragão, João Pedro Maglhães de Paula Paiva
2025 13th International Conference on Control, Mechatronics and Automation (ICCMA 2025), 2025
This work presents a new mapping methodology for AGV (Automated Guided Vehicle) systems. The goal is to position the vehicle using a PID (Proportional-Integral-Derivative) controller, without requiring physical interaction with the path to be followed. The article provides a broad overview of the data processing and the techniques applied to achieve accurate positioning without external feedback. The results are satisfactory, although some sensor limitations must be taken into account.
A Study and Evaluation of Classifiers for Anti-Spam Systems
Marcelo V. C. Aragao, Isaac C. Ferreira, Edvard M. Oliveira, Bruno T. Kuehne, Edmilson M. Moreira, Otavio A. S. Carpinteiro
IEEE Access, 2021
The volume of e-mails has been increasing in recent years. However, since 2005, at least half of these e-mails have been spam. This massive traffic of unwanted messages causes losses to users, such as excessive and unnecessary use of network bandwidth, loss of productivity, and exposure of inappropriate content to unsuitable audiences. This paper studies the application of machine learning models to e-mail classification in existing anti-spam systems and, in particular, in the new anti-spam system Open-MaLBAS. Experiments on several data sets both demonstrated the feasibility of the proposal and led to a powerful combination of techniques, methods, and models that can be successfully applied to e-mail classification in anti-spam systems.
The Development of the Open Machine-Learning-Based Anti-Spam (Open-MaLBAS)
Isaac C. Ferreira, Marcelo V. C. Aragão, Edvard M. Oliveira, Bruno T. Kuehne, Edmilson M. Moreira, Otávio A. S. Carpinteiro
IEEE Access, 2021
Spam e-mails are unsolicited e-mails received by users of the e-mail service. They cause serious harm to organizations because they waste, among other things, computational and network resources. To reduce this damage, organizations use anti-spam systems: software that classifies e-mails in order to separate legitimate messages from spam. The best current commercial and open-source anti-spam systems, in particular the well-known commercial anti-spam CanIt-PRO, use various techniques, such as blacklists and/or SMTP extensions, to classify e-mails. Unfortunately, both blacklists and SMTP extensions have serious drawbacks, such as low scalability and high computational and network costs. This paper introduces the Open Machine-Learning-Based Anti-Spam (Open-MaLBAS). Unlike the best current anti-spam systems, Open-MaLBAS uses neither blacklists nor SMTP extensions, relying solely on machine learning models for e-mail classification. Open-MaLBAS was compared to CanIt-PRO in a series of experiments on a database of 862,227 real e-mails collected over three months at the Federal University of Itajubá, Brazil, and previously classified by CanIt-PRO. In these experiments, Open-MaLBAS correctly classified 81.48% and 98.13% of the e-mails in the database using the two evaluated models, Multi-Layer Perceptron and Random Forest, respectively. In addition, its time to classify all e-mails in the database was up to 88% shorter than that of CanIt-PRO. Open-MaLBAS is implemented in Java, released under a free software license, and available on GitHub.
Factorial design analysis applied to the performance of SMS anti-spam filtering systems
Marcelo V.C. Aragão, Edielson Prevato Frigieri, Carlos A. Ynoguti, Anderson P. Paiva
Expert Systems with Applications, 2016

RECENT SCHOLAR PUBLICATIONS

Large‐Scale Benchmarking of Intrusion Detection Datasets With GPU‐Accelerated Data Pipelines, Complexity Analysis, and Model Evaluation
MVC Aragão, FAP Figueiredo, SB Mafra
International Journal of Intelligent Systems 2026 (1), 9925751, 2026
Interactive Control System for Automated Guide Vehicles
JPC Henriques, DN Teixeira, MBM Rosa, MJA Ribeiro, ER Neto, ...
2025 13th International Conference on Control, Mechatronics and Automation …, 2025
A practical evaluation of AutoML tools for binary, multiclass, and multilabel classification
MVC Aragão, AG Afonso, RC Ferraz, RG Ferreira, SG Leite, ...
Scientific Reports 15 (1), 17682, 2025
Citations: 21
A Sample-Based, Multi-Stage Machine Learning Pipeline for Scalable IoT Threat Detection
MVC Aragão, TM Pereira, FAP de Figueiredo, SB Mafra
IEEE Embedded Systems Letters, 2025
Citations: 4
Dynamic‐Balancing AutoML for Imbalanced Tabular Data With Adaptive Resampling and Complexity‐Aware Analysis
MVC Aragão, TM Pereira, MF Carvalho, FAP Figueiredo, SB Mafra
International Journal of Intelligent Systems 2025 (1), 3986105, 2025
Citations: 2
Enhancing AutoML performance for imbalanced tabular data classification: A self-balancing pipeline
MVC Aragão, M de Freitas Carvalho, T de Morais Pereira, ...
2024
Citations: 3
ML-based novelty detection and classification of security threats in IoT networks
MVC Aragão, GP Ambrósio, FAP de Figueiredo
presented at the Simpósio Bras. Telecomun. Process. Sinais, São José dos …, 2023
Citations: 2
Análise de tráfego de rede com machine learning para identificação de ameaças a dispositivos IoT [Network traffic analysis with machine learning for identifying threats to IoT devices]
MVC Aragão, S Mafra, FAP de Figueiredo
Proceedings of the 40th Brazilian Symposium on Telecommunications and Signal …, 2022
Citations: 4
A study and evaluation of classifiers for anti-spam systems
MVC Aragao, IC Ferreira, EM Oliveira, BT Kuehne, EM Moreira, ...
IEEE Access 9, 157482-157498, 2021
Citations: 3
The development of the open machine-learning-based anti-spam (open-malbas)
IC Ferreira, MVC Aragão, EM Oliveira, BT Kuehne, EM Moreira, ...
IEEE Access 9, 138618-138632, 2021
Citations: 6
Factorial design analysis applied to the performance of SMS anti-spam filtering systems
MVC Aragao, EP Frigieri, CA Ynoguti, AP Paiva
Expert Systems with Applications 64, 589-604, 2016
Citations: 24
Otimizando o treinamento e a topologia de um decodificador de canal baseado em redes neurais [Optimizing the training and topology of a neural-network-based channel decoder]
MVC Aragão, SB Mafra, FAP de Figueiredo
Polar 2, 1 , 0
Citations: 2

MOST CITED SCHOLAR PUBLICATIONS

Factorial design analysis applied to the performance of SMS anti-spam filtering systems
MVC Aragao, EP Frigieri, CA Ynoguti, AP Paiva
Expert Systems with Applications 64, 589-604, 2016
Citations: 24
A practical evaluation of AutoML tools for binary, multiclass, and multilabel classification
MVC Aragão, AG Afonso, RC Ferraz, RG Ferreira, SG Leite, ...
Scientific Reports 15 (1), 17682, 2025
Citations: 21
The development of the open machine-learning-based anti-spam (open-malbas)
IC Ferreira, MVC Aragão, EM Oliveira, BT Kuehne, EM Moreira, ...
IEEE Access 9, 138618-138632, 2021
Citations: 6
A Sample-Based, Multi-Stage Machine Learning Pipeline for Scalable IoT Threat Detection
MVC Aragão, TM Pereira, FAP de Figueiredo, SB Mafra
IEEE Embedded Systems Letters, 2025
Citations: 4
Análise de tráfego de rede com machine learning para identificação de ameaças a dispositivos IoT [Network traffic analysis with machine learning for identifying threats to IoT devices]
MVC Aragão, S Mafra, FAP de Figueiredo
Proceedings of the 40th Brazilian Symposium on Telecommunications and Signal …, 2022
Citations: 4
Enhancing AutoML performance for imbalanced tabular data classification: A self-balancing pipeline
MVC Aragão, M de Freitas Carvalho, T de Morais Pereira, ...
2024
Citations: 3
A study and evaluation of classifiers for anti-spam systems
MVC Aragao, IC Ferreira, EM Oliveira, BT Kuehne, EM Moreira, ...
IEEE Access 9, 157482-157498, 2021
Citations: 3
Dynamic‐Balancing AutoML for Imbalanced Tabular Data With Adaptive Resampling and Complexity‐Aware Analysis
MVC Aragão, TM Pereira, MF Carvalho, FAP Figueiredo, SB Mafra
International Journal of Intelligent Systems 2025 (1), 3986105, 2025
Citations: 2
ML-based novelty detection and classification of security threats in IoT networks
MVC Aragão, GP Ambrósio, FAP de Figueiredo
presented at the Simpósio Bras. Telecomun. Process. Sinais, São José dos …, 2023
Citations: 2
Otimizando o treinamento e a topologia de um decodificador de canal baseado em redes neurais [Optimizing the training and topology of a neural-network-based channel decoder]
MVC Aragão, SB Mafra, FAP de Figueiredo
Polar 2, 1 , 0
Citations: 2
Large‐Scale Benchmarking of Intrusion Detection Datasets With GPU‐Accelerated Data Pipelines, Complexity Analysis, and Model Evaluation
MVC Aragão, FAP Figueiredo, SB Mafra
International Journal of Intelligent Systems 2026 (1), 9925751, 2026
Interactive Control System for Automated Guide Vehicles
JPC Henriques, DN Teixeira, MBM Rosa, MJA Ribeiro, ER Neto, ...
2025 13th International Conference on Control, Mechatronics and Automation …, 2025
