Interpretable Vision Transformers in Monocular Depth Estimation via SVDA Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos Mathematics, 2026 Monocular depth estimation is a central problem in computer vision with applications in robotics, augmented reality, and autonomous driving, yet the self-attention mechanisms used by modern Transformer architectures remain opaque. In this work, we integrate SVD-Inspired Attention (SVDA) into the Dense Prediction Transformer (DPT), introducing a spectrally structured attention formulation for dense prediction that decouples directional alignment from spectral modulation through a learnable diagonal matrix embedded in normalized query–key interactions. Experiments on KITTI and NYU-v2 show that SVDA preserves competitive predictive performance while enabling intrinsic interpretability: on KITTI, AbsRel improves from 0.058 to 0.056 and δ1 from 0.976 to 0.979, while on NYU-v2, AbsRel improves from 0.133 to 0.124 and δ1 from 0.865 to 0.872. This is achieved with only 0.01% additional parameters, at the cost of a measurable runtime overhead associated with the added normalization and spectral modulation. More importantly, SVDA enables six spectral indicators that quantify entropy, rank, sparsity, alignment, selectivity, and robustness, revealing consistent cross-dataset and depth-wise patterns in how attention organizes during training. These properties make the model easier to inspect and better suited to applications where transparency and reliability are important, such as robotics and autonomous navigation.
On Segment-Aware Monocular Depth Estimation Using Vision Transformers Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos Information Switzerland, 2026 Monocular Depth Estimation (MDE) infers per-pixel scene geometry from a single RGB image. Despite recent progress, global MDE models often blur depth discontinuities at object boundaries and fail to capture object-level structure. Segment-aware depth estimation addresses this limitation by exploiting semantic segmentation to decompose depth prediction into simpler, class-specific subproblems. In this work, we study semantic-aware MDE in a multi-branch design where each semantic class is handled by a lightweight Vision Transformer (ViT) branch that predicts dense depth for its class while suppressing interference from other regions. We further examine fusion strategies that merge the branch outputs into a single prediction: (i) a learnable cross-attention fusion module that predicts depth from the stack of per-class proposals and masks, and (ii) a parameter-free stitched summation that sums mask-gated outputs. The proposed architecture is simple, scalable, end-to-end trainable, and compatible with arbitrary transformer backbones. Experiments on Virtual KITTI 2, where ground-truth depth and semantic labels are available, show that segment-aware modeling produces sharper depth boundaries and improves standard error metrics compared to a single-branch baseline (AbsRel 0.243→0.152; RMSE 11.952→9.101). Finally, we find that the parameter-free summation matches, and in most cases improves upon, the accuracy of learned fusion while adding no computational overhead.
Interpretable Vision Transformers in Image Classification via SVDA Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos IEEE Access, 2026 Vision Transformers (ViTs) have achieved state-of-the-art performance in image classification, yet their attention mechanisms often remain opaque and exhibit dense, non-structured behaviors. In this work, we adapt our previously proposed SVD-Inspired Attention (SVDA) mechanism to the ViT architecture, introducing a geometrically grounded formulation that enhances interpretability, sparsity, and spectral structure. We use the interpretability indicators originally proposed with SVDA to monitor attention dynamics during training and assess structural properties of the learned representations. Experimental evaluations on four widely used benchmarks (CIFAR-10, FashionMNIST, CIFAR-100, and ImageNet-100), together with an additional pretrained fine-tuning study in a standard ViT setting, show that SVDA preserves competitive classification behavior in our experimental settings while providing descriptive diagnostics of attention structure. In the pretrained setting, we integrate the exact SVDA operator into the late transformer blocks of a standard pretrained ViT and fine-tune on ImageNet-100, providing additional evidence that the proposed mechanism remains viable beyond compact from-scratch training. While the current framework offers descriptive insights rather than prescriptive guidance, our results establish SVDA as a comprehensive and informative tool for analyzing and developing structured attention models in computer vision. This work lays the foundation for future advances in explainable AI, spectral diagnostics, and attention-based model design.
Geometry Meets Attention: Interpretable Transformers via SVD Inspiration Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos IEEE Access, 2025 Self-attention is a cornerstone of modern deep learning, yet its dense dot-product formulation offers limited interpretability and lacks explicit structural constraints. We propose SVD-inspired Attention (SVDA), a novel self-attention mechanism that introduces normalized query/key projections and a learnable diagonal spectral modulation, drawing direct motivation from the structure of Singular Value Decomposition (SVD). This formulation separates directional alignment from spectral emphasis, offering a geometrically grounded and interpretable variant of attention. We formalize SVDA within a standard multi-head Transformer architecture and introduce a suite of structure-aware indicators, such as spectral entropy, effective rank, and selectivity, that quantify interpretability and sparsity in attention dynamics. Our analysis highlights SVDA’s capacity for structured, energy-aware attention without compromising architectural compatibility or expressiveness. This work provides a theoretical foundation and diagnostic framework for structured attention models aimed at interpretability, compression, and semantic transparency.
Towards Explainability in Monocular Depth Estimation Vasileios Arampatzakis, George Pavlidis, Kyriakos Pantoglou, Nikolaos Mitianoudis, Nikos Papamarkos Communications in Computer and Information Science, 2025
Monocular Depth Estimation: A Thorough Review Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 Estimation of depth in two-dimensional images is among the challenging topics in Computer Vision. This is a well-studied but also an ill-posed problem, which has long been the focus of intense research. This paper is an in-depth review of the topic, presenting two aspects, one that considers the mechanisms of human depth perception, and another that includes the various Deep Learning approaches. The methods are presented in a compact and structured way that outlines the topic and categorizes the approaches according to the line of research followed in the recent decade. Although there has been significant advancement in the topic, this progress has come without any connection to human depth perception or to the potential benefits it could offer.
Towards memristive crossbar-based neuromorphic HW accelerators for signal processing I. Vourkas, A. Abusleme, N. Vasileiadis, G. Ch. Sirakoulis, N. Papamarkos 2017 6th International Conference on Modern Circuits and Systems Technologies MOCAST 2017, 2017 Research progress in neuromorphic hardware, capable of biological perception and cognitive information processing, is leading the way towards a revolution in computing technology. Current research efforts have focused mainly on resistive switching devices, the electronic analog of synapses in artificial neural networks (ANNs), and the crossbar nanoarchitecture, for its huge connectivity and maximum integration density. In this context, this work presents the design and simulation of a memristive crossbar-based ANN for text recognition tasks, implementing a novel computing algorithm. In this case study, important issues during the application mapping process are identified and properly addressed at device and circuit level. The computing capabilities of the proposed system are highlighted through SPICE-level circuit simulations, which show excellent agreement with theoretical simulation results.
A system for restoration and structural retrieval of documents Maria Ntonti, Nikos Papamarkos Proceedings of the IASTED International Conference on Modelling Identification and Control, 2017 This paper describes a new technique for document retrieval using a web camera in an office environment. The architecture of the aforementioned method consists of three main stages: segmentation, restoration and retrieval. Firstly, the document image is taken with a web camera and the segmentation process is applied to locate the four corners of the document and isolate it. Then we proceed to document restoration by applying filtering, skew and curvature correction as well as removal of any redundant object that does not belong to the document. In the third and final stage, a feature vector is extracted and is compared with the documents of the database.
Applying conformal geometry for creating a 3D model spatial-consistent texture map George Ioannakis, Christodoulos Chamzas, Anestis Koutsoudis, Nikolaos Papamarkos, Ioannis Pratikakis, Fotis Arnaoutoglou, Nikolaos Mitianoudis, Thomas Sgouros 2016 Digital Media Industry and Academic Forum DMIAF 2016 Proceedings, 2016 The aim of this research is to achieve spatial consistency of the UV map. We present an approach to produce a fully spatially consistent UV mapping based on the planar parameterisation of the mesh. We apply our method on a 3D digital replica of an ancient Greek Lekythos vessel. We parameterise the mesh of a 3D model onto a unit square 2D plane using computational conformal geometry techniques. The proposed method is genus independent, due to an iterative 3D mesh cutting procedure. Having now the texture of a 3D model depicted on a spatially continuous two dimensional structure enables us to efficiently apply a vast range of image processing based techniques and algorithms.
Real time hand detection in a complex background Ekaterini Stergiopoulou, Kyriakos Sgouropoulos, Nikos Nikolaou, Nikos Papamarkos, Nikos Mitianoudis Engineering Applications of Artificial Intelligence, 2014
Conversion of color documents to grayscale Iliana Papamarkou, Nikos Papamarkos 2013 21st Mediterranean Conference on Control and Automation MED 2013 Conference Proceedings, 2013
Guidance, navigation, and control of an unmanned hovercraft Kilsoo Kim, Young-Ki Lee, Sehwan Oh, David Moroniti, Dimitri Mavris, George J. Vachtsevanos, Nikos Papamarkos, George Georgoulas 2013 21st Mediterranean Conference on Control and Automation MED 2013 Conference Proceedings, 2013
img(Anaktisi): A web content based image retrieval system Konstantinos Zagoris, Savvas A. Chatzichristofis, Nikos Papamarkos, Yiannis S. Boutalis 2009 2nd International Workshop on Similarity Search and Applications SISAP 2009, 2009
Color quantization based on PCA and Kohonen SOFM D. Mavridis, N. Papamarkos Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2009
Estimation of proper parameter values for document binarization Proceedings of the 10th IASTED International Conference on Computer Graphics and Imaging CGIM 2008, 2008
An evaluation technique for binarization algorithms Journal of Universal Computer Science, 2008
Developing document image retrieval system MCCSIS 08 IADIS Multi Conference on Computer Science and Information Systems Proceedings of Computer Graphics and Visualization 2008 and Gaming 2008 Design for Engaging Experience Soc Interaction, 2008
Skew correction in documents with several differently skewed text areas VISAPP 2007 2nd International Conference on Computer Vision Theory and Applications Proceedings, 2007
Adaptive document binarization: A human vision approach VISAPP 2007 2nd International Conference on Computer Vision Theory and Applications Proceedings, 2007
Color segmentation of complex document images VISAPP 2006 Proceedings of the 1st International Conference on Computer Vision Theory and Applications, 2006
Text localization in color documents VISAPP 2006 Proceedings of the 1st International Conference on Computer Vision Theory and Applications, 2006
Color reduction by using a new self-growing and self-organized neural network Institute of Mathematics and Its Applications Vision Video and Graphics 2005 VVG 2005, 2005
Automatic evaluation of document binarization results E. Badekas, N. Papamarkos Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2005
A window-based gray-scale inverse Hough transform algorithm and its applications on gray-scale line filtering IEEE International Conference on Image Processing, 2001
A new adaptive color quantization technique Antonios Atsalakis, Nikos Papamarkos, Charalambos Strouthopoulos Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2001
Separation of overlapping characters N. Papamarkos, T. Koutalianos Proceedings of the IEEE International Conference on Electronics Circuits and Systems, 1999
Document block identification using a neural network International Conference on Digital Signal Processing DSP, 1997
Off-line signature verification using multiple neural network classification structures International Conference on Digital Signal Processing DSP, 1997
Sampling on polar coordinates International Conference on Digital Signal Processing DSP, 1997
Image segmentation and linear feature identification using rectangular block decomposition Proceedings of the IEEE International Conference on Electronics Circuits and Systems, 1996
Determination of run-length smoothing values for document segmentation Proceedings of the IEEE International Conference on Electronics Circuits and Systems, 1996
Time domain design of 1-D IIR digital filters with coefficients of finite word length AMSE Review Association for the Advancement of Modelling and Simulation Techniques in Enterprises, 1990
On the approximation of the magnitude response of 1-D IIR digital filters using linear programming AMSE Review Association for the Advancement of Modelling and Simulation Techniques in Enterprises, 1990
On the approximation of the magnitude response of 2-D IIR digital filters using linear programming Proceedings IEEE International Symposium on Circuits and Systems, 1985
Interpretable Vision Transformers in Image Classification via SVDA V Arampatzakis, G Pavlidis, N Mitianoudis, N Papamarkos IEEE Access, 2026
Interpretable Vision Transformers in Monocular Depth Estimation via SVDA V Arampatzakis, G Pavlidis, N Mitianoudis, N Papamarkos Mathematics 14 (8), 1272, 2026
On Segment-Aware Monocular Depth Estimation Using Vision Transformers V Arampatzakis, G Pavlidis, N Mitianoudis, N Papamarkos Information 17 (2), 145, 2026
Geometry meets attention: Interpretable transformers via SVD inspiration V Arampatzakis, G Pavlidis, N Mitianoudis, N Papamarkos IEEE Access, 2025 Citations: 3
A Dilated MultiRes Visual Attention U-Net for historical document image binarization N Detsikas, N Mitianoudis, N Papamarkos Signal Processing: Image Communication 122, 117102, 2024 Citations: 6
Monocular depth estimation: A thorough review V Arampatzakis, G Pavlidis, N Mitianoudis, N Papamarkos IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (4), 2396-2414, 2023 Citations: 128
Towards explainability in monocular depth estimation V Arampatzakis, G Pavlidis, K Pantoglou, N Mitianoudis, N Papamarkos Joint European Conference on Machine Learning and Knowledge Discovery in …, 2023 Citations: 3
Automatic classification of earthquake-induced building damages E Vrochidou, I Andreadis, N Papamarkos, M Zervakis LAP LAMBERT Academic Publishing, 2018
Towards memristive crossbar-based neuromorphic HW accelerators for signal processing I Vourkas, Á Abusleme, N Vasileiadis, GC Sirakoulis, N Papamarkos 2017 6th International Conference on Modern Circuits and Systems …, 2017
Applying conformal geometry for creating a 3d model spatial-consistent texture map G Ioannakis, C Chamzas, A Koutsoudis, N Papamarkos, I Pratikakis, ... 2016 Digital Media Industry & Academic Forum (DMIAF), 117-120, 2016 Citations: 2
A New Sharpening Technique for Medical Images using Wavelets and Image Fusion. P Zafeiridis, N Papamarkos, S Goumas, I Seimenis Journal of Engineering Science & Technology Review 9 (3), 2016 Citations: 16
Object-Panorama using SIFT/SURF descriptors and Tamura texture features G Ioannakis, A Koutsoudis, N Papamarkos, C Chamzas Ioannis Liritzis University of the Aegean, GR Arne Flaten Ball State …, 2015
Document image binarization using local features and Gaussian mixture modeling N Mitianoudis, N Papamarkos Image and Vision Computing 38, 33-51, 2015 Citations: 71
A dynamic gesture and posture recognition system K Sgouropoulos, E Stergiopoulou, N Papamarkos Journal of Intelligent & Robotic Systems 76 (2), 283-296, 2014 Citations: 26
Multi-spectral document image binarization using image fusion and background subtraction techniques N Mitianoudis, N Papamarkos 2014 IEEE international conference on image processing (ICIP), 5172-5176, 2014 Citations: 20
Microcalcification oriented content-based mammogram retrieval for breast cancer diagnosis L Tsochatzidis, K Zagoris, M Savelonas, N Papamarkos, I Pratikakis, ... 2014 IEEE International Conference on Imaging Systems and Techniques (IST …, 2014 Citations: 6
Real time hand detection in a complex background E Stergiopoulou, K Sgouropoulos, N Nikolaou, N Papamarkos, ... Engineering Applications of Artificial Intelligence 35, 54-70, 2014 Citations: 73
Local co-occurrence and contrast mapping for document image binarization N Mitianoudis, N Papamarkos 2014 14th International Conference on Frontiers in Handwriting Recognition …, 2014 Citations: 8
A novel image sharpening technique based on 2D-DWT and image fusion I Papamarkou, N Papamarkos, S Theochari 17th International Conference on Information Fusion (FUSION), 1-8, 2014 Citations: 5
Distinction between handwritten and machine-printed text based on the bag of visual words model K Zagoris, I Pratikakis, A Antonacopoulos, B Gatos, N Papamarkos Pattern Recognition 47 (3), 1051-1062, 2014 Citations: 67
MOST CITED SCHOLAR PUBLICATIONS
Hand gesture recognition using a neural network shape fitting technique E Stergiopoulou, N Papamarkos Engineering Applications of Artificial Intelligence 22 (8), 1141-1158, 2009 Citations: 388
A new signature verification technique based on a two-stage neural network classifier H Baltzakis, N Papamarkos Engineering Applications of Artificial Intelligence 14 (1), 95-103, 2001 Citations: 375
A new approach for multilevel threshold selection N Papamarkos, B Gatos CVGIP: Graphical Models and Image Processing 56 (5), 357-370, 1994 Citations: 213
Adaptive color reduction N Papamarkos, AE Atsalakis, CP Strouthopoulos IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 32 …, 2002 Citations: 184
An Evaluation Technique for Binarization Algorithms. P Stathis, E Kavallieratou, N Papamarkos J. Univers. Comput. Sci. 14 (18), 3011-3030, 2008 Citations: 160
Skew detection and text line position determination in digitized documents B Gatos, N Papamarkos, C Chamzas Pattern Recognition 30 (9), 1505-1519, 1997 Citations: 159
Segmentation of historical machine-printed documents using adaptive run length smoothing and skeleton segmentation paths N Nikolaou, M Makridis, B Gatos, N Stamatopoulos, N Papamarkos Image and Vision Computing 28 (4), 590-604, 2010 Citations: 146
Accurate image retrieval based on compact composite descriptors and relevance feedback information SA Chatzichristofis, K Zagoris, YS Boutalis, N Papamarkos International Journal of Pattern Recognition and Artificial Intelligence 24 …, 2010 Citations: 139
Multithresholding of color and gray-level images through a neural network technique N Papamarkos, C Strouthopoulos, I Andreadis Image and Vision Computing 18 (3), 213-222, 2000 Citations: 139
Monocular depth estimation: A thorough review V Arampatzakis, G Pavlidis, N Mitianoudis, N Papamarkos IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (4), 2396-2414, 2023 Citations: 128
Color reduction for complex document images N Nikolaou, N Papamarkos International Journal of Imaging Systems and Technology 19 (1), 14-26, 2009 Citations: 124
Text identification for document image analysis using a neural network C Strouthopoulos, N Papamarkos Image and Vision Computing 16 (12-13), 879-896, 1998 Citations: 93
Color reduction and estimation of the number of dominant colors by using a self-growing and self-organized neural gas A Atsalakis, N Papamarkos Engineering Applications of Artificial Intelligence 19 (7), 769-786, 2006 Citations: 91
Text extraction in complex color documents C Strouthopoulos, N Papamarkos, AE Atsalakis Pattern Recognition 35 (8), 1743-1758, 2002 Citations: 87
A new approach for the design of digital integrators N Papamarkos, C Chamzas IEEE Transactions on Circuits and Systems I: Fundamental Theory and …, 2002 Citations: 80
On the inverse Hough transform AL Kesidis, N Papamarkos IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (12), 1329 …, 1999 Citations: 80
Optimal combination of document binarization techniques using a self-organizing map neural network E Badekas, N Papamarkos Engineering Applications of Artificial Intelligence 20 (1), 11-24, 2007 Citations: 74
Real time hand detection in a complex background E Stergiopoulou, K Sgouropoulos, N Nikolaou, N Papamarkos, ... Engineering Applications of Artificial Intelligence 35, 54-70, 2014 Citations: 73
Document image binarization using local features and Gaussian mixture modeling N Mitianoudis, N Papamarkos Image and Vision Computing 38, 33-51, 2015 Citations: 71
Gray-level reduction using local spatial features N Papamarkos, A Atsalakis Computer Vision and Image Understanding 78 (3), 336-350, 2000 Citations: 70