MAGnet: Multiscale Attention Guided Network for Enhanced Road Extraction from Satellite Imagery Nomaiya Bashree, Tareque Bashar Ovi, Hussain Nyeem, Md Abdul Wahed, Faiaz Hasanuzzaman Rhythm, Ayat Subah Alam IET Image Processing, 2026 Efficient extraction of roads from high‐resolution satellite images is critical for urban planning, disaster management and autonomous navigation, especially in complex urban environments. Existing segmentation techniques require significant manual effort and are prone to low accuracy; algorithms based on convolutional neural networks, such as U‐Net, improve upon this. Still, their symmetrical encoder–decoder design fails to capture multi‐scale features, suffers from poor gradient flow and creates a semantic gap between encoded and decoded features. To mitigate these issues, we present MAGnet, a multiscale attention guided network that enhances road extraction by incorporating an attention guided regional feature block for multiscale feature fusion, employing squeeze and excitation for channel refinement, and addressing overfitting in conventional U‐shaped architectures. MAGnet integrates a focus gate system in skip connections to mitigate vanishing gradients and feature redundancy, alongside a tri‐level attention unit to bridge the disparity in information representation between the encoder and decoder through channel, spatial and pixel‐level attention. MAGnet achieves improved performance on benchmark datasets such as Massachusetts Roads and DeepGlobe, with an increase of more than 5% in Dice coefficient and 3% in mean intersection over union over top models. Its computational efficiency is underscored by a parameter count of 14.22M, 55.76 Giga floating‐point operations and 27.86 Giga multiply‐accumulate operations. Furthermore, explainable artificial intelligence techniques are used to make MAGnet's decision‐making more interpretable. These results suggest that MAGnet offers a computationally efficient and interpretable approach to road extraction from high‐resolution satellite imagery.
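The Dice coefficient and mean intersection over union quoted in the abstract above can be illustrated with a minimal sketch for binary road masks. The function names and the nested-list mask format are hypothetical, and averaging IoU over the road and background classes is an assumption; the abstract does not state the averaging convention the authors used.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # road predicted, road present
            elif p and not t:
                fp += 1      # road predicted, background present
            elif t and not p:
                fn += 1      # road missed
    if tp + fp + fn == 0:
        return 1.0, 1.0      # convention chosen here: two empty masks agree fully
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou


def mean_iou(pred, target):
    """Average the IoU of the road class and the background class (assumed convention)."""
    _, road_iou = dice_and_iou(pred, target)
    inverted_pred = [[1 - p for p in row] for row in pred]
    inverted_target = [[1 - t for t in row] for row in target]
    _, background_iou = dice_and_iou(inverted_pred, inverted_target)
    return (road_iou + background_iou) / 2
```

For a single mask pair, Dice and IoU are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU for the same prediction.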
SEA-Net: Dual Attention U-Net for Bleeding Segmentation in Capsule Endoscopy Images Tareque Bashar Ovi, Nomaiya Bashree, Hussain Nyeem, Md Abdul Wahed, Faiaz Hasanuzzaman Rhythm, Disha Chowdhury International Journal of Imaging Systems and Technology, 2026 Gastrointestinal (GI) bleeding, arising from various conditions, can be critical if untreated. Wireless capsule endoscopy (WCE) is a highly effective method for detecting GI bleeding, offering full visualization of the GI tract. However, the large number of images generated per patient poses challenges for clinicians, leading to prolonged analysis times and increased risk of human error. This emphasizes the need for computer‐aided diagnosis systems. In this study, we introduce SEA‐Net (Structured Efficient Attention Network), a novel deep learning network for detecting bleeding regions in WCE images. SEA‐Net integrates a Convolutional Block Attention Module (CBAM) with long skip connections to enhance gradient flow and improve blood region localization. The EfficientNet‐B4 encoder improves feature extraction efficiency and generalizability. A five‐fold cross validation demonstrates consistent performance, while generalization tests, including precision‐recall curves, ROC curves, and F1 measure, further validate the model's robustness. Minimal performance degradation was observed when the training data was reduced from 80% to 20%. Experimental results show that SEA‐Net achieves a Dice score of 93.64% and an IoU score of 88.61% on a publicly available WCE dataset, outperforming state‐of‐the‐art models and highlighting its strong potential for clinical application.
High-capacity reversible data hiding with iterative dual pixel value ordering Md Abdul Wahed, Hussain Nyeem Alexandria Engineering Journal, 2025 This paper presents an iterative dual pixel value ordering (I-DPVO) scheme for reversible data hiding (RDH), designed to enhance embedding capacity while preserving image fidelity. Unlike conventional PVO-based methods, I-DPVO introduces a recursive embedding strategy that alternates between horizontal and vertical pixel correlations, allowing it to adapt dynamically to varying capacity requirements. This iterative approach refines embedding flexibility by leveraging multi-directional pixel dependencies, reducing distortion through a structured backward embedding phase. By efficiently redistributing pixel modifications across multiple iterations, I-DPVO mitigates degradation even at higher payloads, maintaining reversibility with minimal impact on visual quality. Experimental evaluations across diverse images validate the effectiveness of the proposed method, demonstrating a more favourable rate–distortion trade-off, with increased embedding capacity and high peak signal-to-noise ratio (PSNR) compared to existing PVO-based schemes. The adaptability and efficiency of I-DPVO make it a promising solution for applications requiring high-capacity embedding with strict fidelity constraints, such as metadata annotation, electronic record management, and tamper detection.
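For orientation, the conventional PVO embedding that the abstract above contrasts with I-DPVO can be sketched for a single block. This is a minimal sketch of the classical maximum-side PVO step only, not the iterative dual scheme of the paper; the function names are hypothetical, and pixel overflow at 255 is assumed not to occur (practical schemes typically record such blocks in a location map).

```python
def pvo_embed_max(block, bit):
    """Embed at most one bit into the largest pixel of a flattened block.

    Returns (marked_block, embedded), where embedded tells whether the bit was used.
    """
    # Stable sort of indices by pixel value; ties keep their index order.
    order = sorted(range(len(block)), key=lambda i: block[i])
    i_max, i_second = order[-1], order[-2]
    error = block[i_max] - block[i_second]  # prediction error, always >= 0
    marked = list(block)
    if error == 1:
        marked[i_max] += bit   # expand: error stays 1 for bit 0, becomes 2 for bit 1
        return marked, True
    if error > 1:
        marked[i_max] += 1     # shift so the block cannot be mistaken for a carrier
    return marked, False       # blocks with error == 0 are left untouched


def pvo_extract_max(marked):
    """Invert pvo_embed_max. Returns (original_block, bit), with bit None if no bit was carried."""
    order = sorted(range(len(marked)), key=lambda i: marked[i])
    i_max, i_second = order[-1], order[-2]
    error = marked[i_max] - marked[i_second]
    block = list(marked)
    if error == 1:
        return block, 0
    if error == 2:
        block[i_max] -= 1
        return block, 1
    if error > 2:
        block[i_max] -= 1      # undo the shift
    return block, None
```

Reversibility rests on the fact that the largest pixel is only ever increased, so it remains the largest after embedding and the decoder recovers the same ordering.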
LipBengal: Pioneering Bengali lip-reading dataset for pronunciation mapping through lip gestures Md. Tanvir Rahman Sahed, Md. Tanjil Islam Aronno, Hussain Nyeem, Md. Abdul Wahed, Tashrif Ahsan, R Rafiul Islam, Tareque Bashar Ovi, Manab Kumar Kundu, Jane Alam Sadeef Data in Brief, 2025 The LipBengal dataset represents a significant advancement in Bengali lip-reading and visual speech recognition research, poised to drive future applications and technological progress. Although Bengali is the seventh most spoken language worldwide, with approximately 265 million speakers, it and other linguistically rich, widely spoken languages have been largely overlooked by the research community. LipBengal fills this gap by offering a pioneering dataset tailored for Bengali lip-reading, comprising visual data from 150 speakers across 54 classes, encompassing Bengali phonemes, alphabets, and symbols. Captured under diverse and uncontrolled conditions, LipBengal stands as the most extensive Bengali lip-reading dataset to date, designed to facilitate robust benchmarking and validation of novel deep learning architectures. Detailed annotations extend from phoneme-level classifications to full sentence constructions, providing a granular and comprehensive dataset. The primary potential of LipBengal lies in its thorough coverage of Bengali phonemes, capturing diverse lip movements linked to distinct sounds. This rich dataset holds promise for training accurate lip-reading models, with implications for improved accessibility, enhanced speech recognition, silent speech interfaces, and linguistic research. The dataset's diversity in speaker backgrounds enhances its utility, ensuring broader representation of Bengali pronunciation patterns. Meticulous annotation and curation further bolster its quality and reliability, making LipBengal a valuable asset for researchers and developers in the field.
Multi-Level Dual Pixel Value Ordering based High-Capacity Reversible Data Hiding Md Abdul Wahed, Hussain Nyeem 2025 International Conference on Electrical Computer and Communication Engineering ECCE 2025, 2025 We introduce a high-capacity Reversible Data Hiding (RDH) scheme with a multi-level dual Pixel Value Ordering (DPVO) approach. The DPVO process operates in two phases: forward and backward. While conventional PVO-based embedding is applied to image blocks in the forward phase, the backward phase’s refined embedding uses maximum and minimum pixel sets to apply PVO with prediction-error expansion (PEE) selectively, allowing partial restoration of original pixel values. By applying the DPVO approach to the image blocks in horizontal and vertical directions at successive levels, we exploit the inherent pixel correlations within images, substantially enhancing the data embedding capacity while largely preserving image quality. The proposed scheme demonstrates competitive image quality at low embedding rates and achieves significantly higher capacity with minimal quality loss, making it well suited to applications such as metadata embedding, digital watermarking, and electronic records, where high-capacity data embedding is required and maintaining image fidelity is also crucial.
DoubleUNet++: Channel-Aware Gated Attention for Road Extraction in Satellite Imagery Faiaz Hasanuzzaman Rhythm, Nomaiya Bashree, Tareque Bashar Ovi, Hussain Nyeem, Md Abdul Wahed 2025 IEEE International Conference on Quantum Photonics Artificial Intelligence and Networking QPAIN 2025, 2025 Road extraction from satellite imagery faces critical challenges in capturing contextual spatial relationships and mitigating background interference. We introduce DoubleUNet++, which improves the DoubleUNet architecture with two main enhancements: (1) a channel-aware gated attention (CAGA) mechanism in the skip connections of the second U-Net, dynamically weighting channel features via decoder gating signals, and (2) Squeeze-and-Excitation layers in the attention path for adaptive channel-wise feature recalibration. Evaluated on the CHN6-CUG dataset (4,511 annotated images), our model achieves state-of-the-art performance with a Dice score of 78.52 and an mIoU of 64.94, representing absolute gains of 3.32% and 3.74% over the baseline DoubleUNet. Despite these improvements, DoubleUNet++ maintains similar computational complexity (109.33 GFLOPs, 54.64 GMacs) compared to the baseline (107.41 GFLOPs, 53.68 GMacs). The architecture keeps a comparable model size (29.5M parameters) while significantly improving precision (81.45% vs 73.78%) through refined spatial focus on road-relevant regions. Qualitative analysis demonstrates the CAGA mechanism's effectiveness in reducing misclassification from non-road objects such as vegetation and buildings while preserving road topology. These advancements establish DoubleUNet++ as a robust solution for large-scale road network mapping without requiring additional annotated data.
Enhancing U2Net for Precise Road Extraction from Satellite Images via Channel Refinement Faiaz Hasanuzzaman Rhythm, Tareque Bashar Ovi, Nomaiya Bashree, Hussain Nyeem, Md Abdul Wahed 2025 IEEE International Conference on Quantum Photonics Artificial Intelligence and Networking QPAIN 2025, 2025 Road extraction from very high-resolution (VHR) satellite imagery is necessary for remote sensing applications such as city management and navigation. In this study, we introduce SE-U²Net, a novel deep learning model for precise road extraction. Our proposed model integrates the Squeeze and Excitation (SE) network with the U²Net architecture for enhanced selective channel refinement and better contextual information capture. SE-U²Net employs Residual U-blocks (RSU), which combine receptive fields of different sizes, enabling the model to capture multiscale contextual data efficiently. The baseline U²Net has 75.06 GFLOPs and 37.51 GMacs, while SE-U²Net maintains this efficiency with 75.07 GFLOPs and 37.51 GMacs, despite its enhanced representational capacity through SE blocks. Empirical results on the DeepGlobe dataset demonstrate that SE-U²Net outperforms the base U²Net model and other state-of-the-art models, achieving superior performance in terms of Dice coefficient, mean Intersection over Union (mIoU), precision, and recall. These results underscore SE-U²Net's potential as a robust tool for high-accuracy road segmentation from VHR satellite imagery.
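The squeeze-and-excitation channel refinement mentioned in the abstract above can be sketched in plain Python. This follows the general SE design (global average pooling, a bottleneck of two fully connected layers, sigmoid gating) and is not the authors' implementation; the function name, the nested-list tensor layout, the omission of biases, and the weight values in the usage note are all assumptions made for illustration.

```python
import math


def squeeze_excite(feature_map, w_reduce, w_expand):
    """Recalibrate the channels of a feature map shaped [C][H][W] (nested lists).

    w_reduce: [C_r][C] weights of the bottleneck layer (followed by ReLU).
    w_expand: [C][C_r] weights of the expansion layer (followed by sigmoid).
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: the bottleneck MLP produces one gate in (0, 1) per channel.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce
    ]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: every value in a channel is multiplied by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates
```

With a two-channel input such as `[[[1, 3], [5, 7]], [[2, 2], [2, 2]]]` and hypothetical weights `w_reduce = [[1.0, -1.0]]`, `w_expand = [[1.0], [-1.0]]`, the first channel receives a larger gate than the second. Because the gates only rescale existing channels, the extra compute is small, which is consistent with the near-identical GFLOPs reported for the baseline and SE variant.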
Generalising Violence Detection with a New Near-Real-World Violence Dataset Mahmudul Haque, Hussain Nyeem, Tareque Bashar Ovi, Al Nahid, Md. Sabbir Hossain Molla, Md. Tanjim Mahmud Tuhin, Fardin Shahab, Ayat Subah Alam, Saadia Binte Alam 2024 3rd International Conference on Advancement in Electrical and Electronic Engineering ICAEEE 2024, 2024
Revisiting Deep Learning Models for Road Lane Detection Raiyan Ibne Hafiz, Toaha Bin Faruq, Hussain Nyeem 2021 5th International Conference on Electrical Engineering and Information and Communication Technology ICEEICT 2021, 2021
RONI segmentation for medical image watermarking Chowdhury M. Abid Rahman, Hussain Nyeem 2016 3rd International Conference on Electrical Engineering and Information and Communication Technology ICEEICT 2016, 2017
Developing a digital image watermarking model Hussain Nyeem, Wageeh Boles, Colin Boyd Proceedings 2011 International Conference on Digital Image Computing Techniques and Applications DICTA 2011, 2011