Distilling Apple DepthPro for RGB-LiDAR depth estimation Manuel Abreu, Luís Garrote, Urbano J. Nunes Robotics and Autonomous Systems, 2026 This work presents a two-stage autoencoder architecture that improves depth estimation for Autonomous Mobile Robot (AMR) applications by distilling Apple’s DepthPro model and integrating LiDAR data. It addresses limitations of existing depth estimation methods in warehouse robotics, where accurate depth perception is essential for tasks such as pallet picking and placing. The two-stage autoencoder combines the strengths of RGB-based depth estimation with sparse but accurate LiDAR measurements. The first stage applies knowledge distillation to the Apple DepthPro model, maintaining structural integrity while producing a more efficient architecture suitable for mobile robots (ResNet18, ResNet50, MobileNetV2, Swin-T, ViT-B-16, and MobileNetV3-S). The second stage incorporates LiDAR point clouds, projected to image space, into the loss function to align the estimated depth with real-world geometric measurements while preserving the structural integrity obtained in the first stage. Three autoencoder variants with different multimodal fusion strategies are explored: Variant I uses three independent encoders that process RGB, depth, and segmentation data simultaneously; Variant II employs two encoders handling bimodal pairs (RGB with depth or RGB with segmentation); and Variant III serves as a single-encoder baseline using only RGB or depth data. Each variant is evaluated with both direct concatenation and attention-based feature fusion. The evaluation used real-world data collected in a warehouse environment and covered various combinations of architecture variants, fusion strategies, and loss functions.
The reported results demonstrate improvements in accuracy, perceptual quality, and robustness across varying scenes and lighting conditions with the proposed two-stage approach. • Two-stage depth estimation autoencoder architecture. In the first stage, the depth estimation model is distilled from Apple’s DepthPro for structural geometry integrity. The second stage refines the estimated depth with accurate metric (LiDAR) values via fine-tuning, considering depth consistency and structural geometry. • Real-world evaluation using a dataset acquired with an AMR with onboard LiDAR and camera sensors, in an industrial warehouse.
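As a rough illustration of the two-stage idea described above, the sketch below shows one way the two training objectives could be written. It is not the authors' implementation: the abstract does not specify the loss terms, so the L1 distillation term, the horizontal-gradient structure term, the `structure_weight` value, and the convention that a LiDAR value of 0.0 means "no return" are all assumptions made for illustration. Plain Python lists are used to keep the example dependency-free.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized 2D depth maps."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def horizontal_gradient_loss(pred, teacher):
    """Assumed structure term: compare horizontal depth differences of both maps."""
    diffs = []
    for p_row, t_row in zip(pred, teacher):
        for c in range(len(p_row) - 1):
            pred_step = p_row[c + 1] - p_row[c]
            teacher_step = t_row[c + 1] - t_row[c]
            diffs.append(abs(pred_step - teacher_step))
    return sum(diffs) / len(diffs) if diffs else 0.0


def sparse_lidar_loss(pred, lidar, no_return=0.0):
    """Mean absolute error, evaluated only at pixels that carry a LiDAR depth."""
    errors = [
        abs(p - l)
        for p_row, l_row in zip(pred, lidar)
        for p, l in zip(p_row, l_row)
        if l != no_return  # assumption: no_return marks pixels without a LiDAR point
    ]
    return sum(errors) / len(errors) if errors else 0.0


def stage_one_loss(student, teacher):
    """Stage 1 (distillation): the student imitates the teacher's depth map."""
    return mean_abs_diff(student, teacher)


def stage_two_loss(pred, lidar, teacher, structure_weight=0.1):
    """Stage 2 (fine-tuning): metric alignment to LiDAR plus a structure-preserving term."""
    metric_term = sparse_lidar_loss(pred, lidar)
    structure_term = horizontal_gradient_loss(pred, teacher)
    return metric_term + structure_weight * structure_term
```

In this sketch the stage-two structure term compares depth differences rather than absolute depths, so the student can shift towards the LiDAR scale without being pulled back to the teacher's absolute values.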
Generalization of Machine and Deep Learning Models for Brain-Computer Interfaces Across Sessions and Paradigms in a Completely Locked-In Patient Luís Garrote, Rute Bettencourt, João Perdiz, Gabriel Pires, Urbano J. Nunes IEEE International Workshop on Robot and Human Communication RO-MAN, 2025 Brain-Computer Interfaces (BCIs) are one of the few remaining communication options for individuals in a Completely Locked-In State (CLIS), where all voluntary motor functions are lost. However, decoding electroencephalographic (EEG) signals in CLIS is particularly challenging due to low signal-to-noise ratios, high intra- and inter-session variability, and cognitive fluctuations. In this study, we systematically evaluate classical and deep learning-based (DL) classification methods on a longitudinal P300-based BCI dataset acquired from a CLIS patient over ten months, comprising seven different stimulation paradigms. A systematic approach is followed to assess model generalization across BCI sessions and paradigms. Overall, more than 40 approaches are compared, including spatial filters for feature extraction with standard classifiers, as well as DL methods based on CNNs and attention-based architectures. All methods are evaluated with raw input data and three different normalization strategies. Additionally, SMOTE data augmentation is applied to upsample the minority class. The results show high generalization performance across sessions and paradigms, with some approaches achieving nearly 100% performance. Normalization strategies significantly influence performance, while SMOTE often leads to performance degradation. These findings offer valuable insights for designing more robust BCI systems tailored to CLIS users, showing that collecting data across sessions and multiple BCI paradigms can improve BCI performance while reducing or eliminating the need for per-session calibration. Although very promising, these results are based on offline analysis.
The best-performing approaches therefore require online validation before deployment in real-world CLIS scenarios.
A deep learning-based global and segmentation-based semantic feature fusion approach for indoor scene classification Ricardo Pereira, Tiago Barros, Luís Garrote, Ana Lopes, Urbano J. Nunes Pattern Recognition Letters, 2024 This work proposes a novel approach that uses a semantic segmentation mask to obtain a 2D spatial layout of the segmentation categories across the scene, termed segmentation-based semantic features (SSFs). These features represent, per segmentation category, the pixel count as well as the 2D average position and the respective standard deviation values. Moreover, a two-branch network, GS2F2App, is proposed that exploits CNN-based global features extracted from RGB images and segmentation-based features extracted from the proposed SSFs. GS2F2App was evaluated on two indoor scene benchmark datasets, SUN RGB-D and NYU Depth V2, achieving state-of-the-art results on both.
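The per-category statistics described above (pixel count, 2D average position, and standard deviation) can be sketched as follows. This is a minimal illustration rather than the paper's code: the function name, the use of raw pixel coordinates without normalization, the population standard deviation, and the all-zero feature for absent categories are assumptions.

```python
from math import sqrt


def segmentation_semantic_features(mask, num_categories):
    """Return, per category, (pixel count, mean row, mean col, std row, std col).

    `mask` is a 2D list of integer category labels in [0, num_categories).
    """
    # Running sums per category: count, sum(row), sum(col), sum(row^2), sum(col^2).
    sums = [[0, 0.0, 0.0, 0.0, 0.0] for _ in range(num_categories)]
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            acc = sums[label]
            acc[0] += 1
            acc[1] += r
            acc[2] += c
            acc[3] += r * r
            acc[4] += c * c

    features = []
    for count, sum_r, sum_c, sum_rr, sum_cc in sums:
        if count == 0:
            # Assumption: categories absent from the mask get an all-zero feature.
            features.append((0, 0.0, 0.0, 0.0, 0.0))
            continue
        mean_r = sum_r / count
        mean_c = sum_c / count
        # Population standard deviation; max() guards against tiny negative round-off.
        std_r = sqrt(max(sum_rr / count - mean_r * mean_r, 0.0))
        std_c = sqrt(max(sum_cc / count - mean_c * mean_c, 0.0))
        features.append((count, mean_r, mean_c, std_r, std_c))
    return features
```

Concatenating the per-category tuples yields a fixed-length vector of size `5 * num_categories`, which is the kind of input a second network branch could consume alongside the CNN-based global features.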
Multimodal Human Detection using RGB, Thermal and LiDAR modalities for Robotic Perception Kennedy O. S. Mota, Luís Garrote, Cristiano Premebida IEEE International Conference on Automation Science and Engineering, 2024 People detection is a relevant research topic in artificial perception, with a wide range of applications from security and surveillance to robotics and autonomous driving. Overcoming challenges in this field involves advanced algorithms, combinations of machine learning approaches, and the use of sensory data, e.g., from cameras and LiDARs. This work addresses the problem of people detection using YOLO, a state-of-the-art object detection method, trained on three distinct data sources: LiDAR, RGB (color) and ‘thermal’ (long-wave infrared) images. The rationale for combining multiple sensory representations relies on the assumption that each sensor has its own advantages and disadvantages, but together they normally complement each other, especially in real-world conditions. LiDAR contributes to a physically interpretable mapping of the environment, providing precise information regarding the size/dimension and location of objects, while RGB and thermal provide relevant textural features. The sensors have been calibrated with respect to each other, allowing the LiDAR point clouds to be projected onto the image plane and then up-sampled to create dense depth maps (DM) that enable direct use of the YOLO framework. To support the experiments, a new multi-sensory dataset has been collected using a mobile robot. Besides single-modality models, this paper also explores early- and late-fusion strategies. Finally, the new dataset has been made available in a GitHub repository.
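The projection step mentioned above, from calibrated LiDAR points to a depth image, can be sketched with a standard pinhole camera model. This is an illustrative sketch only: the function name, the parameter names, the nearest-point rule for pixels hit more than once, and the use of 0.0 for empty pixels are assumptions, and the up-sampling step that produces the dense map is not shown.

```python
def project_lidar_to_depth_map(points, rotation, translation,
                               fx, fy, cx, cy, width, height):
    """Project 3D LiDAR points into a sparse depth image (0.0 means no point).

    `rotation` (3x3 nested list) and `translation` (length-3 list) are the
    LiDAR-to-camera extrinsics; fx, fy, cx, cy are pinhole intrinsics in pixels.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        # Transform the point from the LiDAR frame into the camera frame.
        xc = rotation[0][0] * x + rotation[0][1] * y + rotation[0][2] * z + translation[0]
        yc = rotation[1][0] * x + rotation[1][1] * y + rotation[1][2] * z + translation[1]
        zc = rotation[2][0] * x + rotation[2][1] * y + rotation[2][2] * z + translation[2]
        if zc <= 0.0:
            continue  # point lies behind the camera
        # Pinhole projection onto pixel coordinates (u: column, v: row).
        u = int(round(fx * xc / zc + cx))
        v = int(round(fy * yc / zc + cy))
        if 0 <= u < width and 0 <= v < height:
            # Assumption: when several points land on one pixel, keep the nearest.
            if depth[v][u] == 0.0 or zc < depth[v][u]:
                depth[v][u] = zc
    return depth
```

The result is sparse because a LiDAR returns far fewer points than the image has pixels, which is why an interpolation or up-sampling stage is needed before the map can be fed to an image-based detector.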
Exploiting 3D Grids for Indoor SLAM in Featureless Scenarios Luís Garrote, Ulisses Reverendo, Urbano J. Nunes 2024 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC 2024), 2024 Accurate multi-sensor localization is a challenging task in the navigation of AMRs. Precise localization strategies are essential for AMRs to perform their missions safely in their surrounding environments. This work proposes a novel ROS-based, modular particle filter framework built on 3D grids that can be used for Simultaneous Localization and Mapping (SLAM) or as a standalone robust localization strategy. The framework uses odometry and 3D LiDAR data as inputs for localization and SLAM. To further improve localization and representation alignment, a pose refinement stage is employed using Levenberg-Marquardt minimization. The refinement stage considers keypoints in the environment to improve localization and uses the raw 3D point cloud for map maintenance. A pyramid-like 3D grid resolution is used to aid the refinement of the representation, improving pose estimates in featureless scenarios. Experimental validation was carried out with data acquired using an in-house platform in a set of indoor and semi-structured scenarios containing critical featureless areas. The obtained results highlight the robustness of the proposed framework in both SLAM and localization tasks. The code (ROS package) is made available in a GitHub repository.
Multimodal Human Detection Using YOLO and Representation Learning for Robot Perception Kennedy O. S. Mota, Diogo S. De Oliveira, Luís Garrote, Cristiano Premebida 2024 7th Iberian Robotics Conference (ROBOT 2024), 2024 This work concentrates on the problem of multisensor people detection using YOLO trained on four distinct modalities: depth and intensity LiDAR maps, RGB, and ‘thermal’ images. RGB cameras, ubiquitous in this application domain, offer high resolution but struggle with adverse lighting conditions, producing overexposed or underexposed images that degrade the performance of the algorithms. Thermal (long-wave infrared) cameras are more resilient to varying light conditions and provide complementary textural features, although with lower resolution than RGB cameras. LiDAR sensors, while having a significantly lower resolution, contribute to a physically interpretable mapping of the environment, providing precise information regarding the size/dimension and location of objects. The main goal of this work is to tackle people detection using deep models trained on single- and multi-modality representations. To support the experimental part, this work introduces a new multimodal dataset called MID-3K. MID-3K allows the development of data fusion strategies by leveraging four modalities obtained from three distinct exteroceptive sensors mounted on a mobile robot. Building on a single-modality YOLO framework, we propose a multimodal representation learning approach to improve the baseline performance and to capture more relevant features across all input modalities. The proposed detection pipeline is evaluated on the MID-3K dataset, and the reported results are based on state-of-the-art performance measures. The new dataset is available in a GitHub repository (MID-3K dataset: https://kennedyk1.github.io/MID-3K/).
Two-Stream Architecture with Contrastive and Self-Supervised Attention Feature Fusion for Error-related Potentials Classification Luís Garrote, João Perdiz, Mine Yasemin, Gabriel Pires, Urbano J. Nunes IEEE International Workshop on Robot and Human Communication RO-MAN, 2024 Error-related potentials (ErrPs) extracted from electroencephalographic signals hold potential for application in Brain-Machine Interfaces, in contexts such as robot teleoperation or shared control in assistive platforms. Their use has not yet been fully realized because of difficulties in signal classification, caused in part by the non-stationary and noisy nature of the signal. This work proposes a new approach to ErrP classification based on a two-stream deep learning architecture with three training stages. The first stage is a self-supervised autoencoder architecture with a multi-head attention layer that provides relevant latent features. The second stage comprises a supervised contrastive learning approach with two backbone networks, where one inherits weights from the first stage and the other is updated by considering the distribution of the feature embeddings. The final stage comprises supervised classification, where the two backbones are fused and used to classify the input EEG signal. At the end of the three stages, a data-driven two-stream ErrP model is obtained. Twenty-five variants of the proposed approach using the Deep Convolutional Network, Shallow Convolutional Network and EEGNet backbones were tested in an ablation study and benchmarked against a large number of classical classification methods, using data from the BNCI dataset, with the aim of assessing cross-subject generalization capabilities. The proposed approach obtained the best results overall, highlighting its ability to capture relevant representations of the EEG signal.
DepthCN: Vehicle detection using 3D-LIDAR and ConvNet Alireza Asvadi, Luis Garrote, Cristiano Premebida, Paulo Peixoto, Urbano J. Nunes IEEE Conference on Intelligent Transportation Systems Proceedings ITSC, 2017
Attention-Based Multimodal Fusion for Robust 6D Pose Estimation in Cluttered Industrial Environments M Abreu, E Borges, J Perdiz, L Garrote, A Mendes, UJ Nunes 2026 IEEE International Conference on Autonomous Robot Systems and …, 2026
Distilling Apple DepthPro for RGB-LiDAR depth estimation M Abreu, L Garrote, UJ Nunes Robotics and Autonomous Systems, 105437, 2026 Citations: 1
Generalization of Machine and Deep Learning Models for Brain-Computer Interfaces Across Sessions and Paradigms in a Completely Locked-In Patient L Garrote, R Bettencourt, J Perdiz, G Pires, UJ Nunes 2025 34th IEEE International Conference on Robot and Human Interactive …, 2025
Multimodal Human Detection Using YOLO and Representation Learning for Robot Perception KOS Mota, D S. de Oliveira, L Garrote, C Premebida 7th Iberian Robotics Conference (ROBOT2024), 2024 Citations: 2
Multimodal 6D Detection of Industrial Pallets, in Real and Virtual Environments, with Applications in Industrial AMRs J Lourenço, G Arsénio, L Garrote, UJ Nunes Proceedings of the 21st International Conference on Informatics in Control …, 2024
A Modular Multimodal Multi-Object Tracking-by-Detection Approach, with Applications in Outdoor and Indoor Environments E Borges, L Garrote, UJ Nunes Proceedings of the 21st International Conference on Informatics in Control …, 2024
Pointnetpgap-slc: A 3d lidar-based place recognition approach with segment-level consistency training for mobile robots in horticulture T Barros, L Garrote, P Conde, MJ Coombes, C Liu, C Premebida, ... IEEE Robotics and Automation Letters 9 (11), 10471-10478, 2024 Citations: 9
Multimodal human detection using RGB, thermal and LiDAR modalities for robotic perception KOS Mota, L Garrote, C Premebida 2024 IEEE 20th International Conference on Automation Science and …, 2024 Citations: 1
Two-Stream Architecture with Contrastive and Self-Supervised Attention Feature Fusion for Error-related Potentials Classification L Garrote, J Perdiz, M Yasemin, G Pires, UJ Nunes 2024 33rd IEEE International Conference on Robot and Human Interactive …, 2024 Citations: 2
Exploiting 3d grids for indoor slam in featureless scenarios L Garrote, U Reverendo, UJ Nunes 2024 IEEE International Conference on Autonomous Robot Systems and …, 2024 Citations: 3
2024 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC) C Santos, E Pedrosa, JL Lima, L Garrote, L Louro, P Fonseca, S Paiva, ... 2024 Citations: 1
Exploiting object-based and segmentation-based semantic features for deep learning-based indoor scene classification R Pereira, L Garrote, T Barros, A Lopes, UJ Nunes arXiv preprint arXiv:2404.07739, 2024 Citations: 4
A deep learning-based global and segmentation-based semantic feature fusion approach for indoor scene classification R Pereira, T Barros, L Garrote, A Lopes, UJ Nunes Pattern Recognition Letters 179, 24-30, 2024 Citations: 27
DeepRL-Based Robot Local Motion Planning in Unknown Dynamic Indoor Environments G Gonçalves, D Palaio, L Garrote, UJ Nunes Robot 2023: Sixth Iberian Robotics Conference: Advances in Robotics, Volume …, 2024
TReR: A lightweight transformer re-ranking approach for 3D LiDAR place recognition T Barros, L Garrote, M Aleksandrov, C Premebida, UJ Nunes 2023 IEEE 26th International Conference on Intelligent Transportation …, 2023 Citations: 6
Late-fusion multimodal human detection based on rgb and thermal images for robotic perception E Sousa, KOS Mota, IP Gomes, L Garrote, DF Wolf, C Premebida 2023 European Conference on Mobile Robots (ECMR), 1-6, 2023 Citations: 11
Costmap-based local motion planning using deep reinforcement learning L Garrote, J Perdiz, UJ Nunes 2023 32nd IEEE International Conference on Robot and Human Interactive …, 2023 Citations: 3
Orchnet: A robust global feature aggregation approach for 3d lidar-based place recognition in orchards T Barros, L Garrote, P Conde, MJ Coombes, C Liu, C Premebida, ... arXiv preprint arXiv:2303.00477, 2023 Citations: 3
Attdlnet: Attention-based deep network for 3d lidar place recognition T Barros, L Garrote, R Pereira, C Premebida, UJ Nunes Iberian Robotics conference, 309-320, 2022 Citations: 36
Dynamic environment-based visual user interface system for intuitive navigation target selection for brain-actuated wheelchairs R Pereira, A Cruz, L Garrote, G Pires, A Lopes, UJ Nunes 2022 31st IEEE International Conference on Robot and Human Interactive …, 2022 Citations: 7
MOST CITED SCHOLAR PUBLICATIONS
Multimodal vehicle detection: fusing 3D-LIDAR and color camera data A Asvadi, L Garrote, C Premebida, P Peixoto, UJ Nunes Pattern Recognition Letters 115, 20-29, 2018 Citations: 305
Sort and deep-sort based multi-object tracking for mobile robotics: Evaluation with new data association metrics R Pereira, G Carvalho, L Garrote, UJ Nunes Applied Sciences 12 (3), 1319, 2022 Citations: 160
DepthCN: Vehicle detection using 3D-LIDAR and ConvNet A Asvadi, L Garrote, C Premebida, P Peixoto, UJ Nunes 2017 IEEE 20th international conference on intelligent transportation …, 2017 Citations: 142
High-resolution lidar-based depth mapping using bilateral filter C Premebida, L Garrote, A Asvadi, AP Ribeiro, U Nunes 2016 IEEE 19th international conference on intelligent transportation …, 2016 Citations: 92
Attdlnet: Attention-based deep network for 3d lidar place recognition T Barros, L Garrote, R Pereira, C Premebida, UJ Nunes Iberian Robotics conference, 309-320, 2022 Citations: 36
Autonomous electric vehicle: Steering and path-following control systems M Silva, L Garrote, F Moita, M Martins, U Nunes 2012 16th IEEE Mediterranean electrotechnical conference, 442-445, 2012 Citations: 35
Place recognition survey: An update on deep learning approaches T Barros, R Pereira, L Garrote, C Premebida, UJ Nunes arXiv preprint arXiv:2106.10458, 2021 Citations: 32
Real-time deep convnet-based vehicle detection using 3d-lidar reflection intensity data A Asvadi, L Garrote, C Premebida, P Peixoto, UJ Nunes Iberian Robotics conference, 475-486, 2017 Citations: 30
An RRT-based navigation approach for mobile robots and automated vehicles L Garrote, C Premebida, M Silva, U Nunes 2014 12th IEEE International Conference on Industrial Informatics (INDIN …, 2014 Citations: 29
A deep learning-based global and segmentation-based semantic feature fusion approach for indoor scene classification R Pereira, T Barros, L Garrote, A Lopes, UJ Nunes Pattern Recognition Letters 179, 24-30, 2024 Citations: 27
Test and evaluation of connected and autonomous vehicles in real-world scenarios J Pereira, C Premebida, A Asvadi, F Cannata, L Garrote, UJ Nunes 2019 IEEE Intelligent Vehicles Symposium (IV), 14-19, 2019 Citations: 24
3D point cloud downsampling for 2D indoor scene modelling in mobile robotics L Garrote, J Rosa, J Paulo, C Premebida, P Peixoto, UJ Nunes 2017 IEEE international conference on autonomous robot systems and …, 2017 Citations: 24
Modular software architecture for human-robot interaction applied to the InterBot mobile robot R Cruz, L Garrote, A Lopes, UJ Nunes 2018 IEEE International Conference on Autonomous Robot Systems and …, 2018 Citations: 22
Deep-learning based global and semantic feature fusion for indoor scene classification R Pereira, N Gonçalves, L Garrote, T Barros, A Lopes, UJ Nunes 2020 IEEE international conference on autonomous robot systems and …, 2020 Citations: 20
Mobile robot localization with reinforcement learning map update decision aided by an absolute indoor positioning system L Garrote, M Torres, T Barros, J Perdiz, C Premebida, UJ Nunes 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2019 Citations: 18
Robot-assisted navigation for a robotic walker with aided user intent L Garrote, J Paulo, J Perdiz, P Peixoto, UJ Nunes 2018 27th IEEE international symposium on robot and human interactive …, 2018 Citations: 18
A Deep Learning-based Indoor Scene Classification Approach Enhanced with Inter-Object Distance Semantic Features R Pereira, L Garrote, T Barros, A Lopes, UJ Nunes 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2021 Citations: 17
Reinforcement learning aided robot-assisted navigation: A utility and RRT two-stage approach L Garrote, J Paulo, UJ Nunes International Journal of Social Robotics 12 (3), 689-707, 2020 Citations: 17
A reinforcement learning assisted eye-driven computer game employing a decision tree-based approach and CNN classification J Perdiz, L Garrote, G Pires, UJ Nunes IEEE Access 9, 46011-46021, 2021 Citations: 15
Absolute indoor positioning-aided laser-based particle filter localization with a refinement stage L Garrote, T Barros, R Pereira, UJ Nunes IECON 2019-45th Annual Conference of the IEEE Industrial Electronics Society …, 2019 Citations: 14