Highly Accurate Skin Cancer Diagnosis Using HMT-NET and Vision Transformer Models
DOI: 10.18287/JBPE25.11.040308
Abstract
Skin cancer, especially melanoma, is one of the most aggressive and fatal cancers, with rising global incidence driven largely by ultraviolet exposure. Early detection is critical, but visual similarities among lesions complicate diagnosis. Clinical methods include self-examination, dermoscopy, and biopsy, while computational approaches range from traditional computer-aided diagnosis (CAD) systems to deep learning models such as convolutional neural networks and Vision Transformers. Although these models improve accuracy, they face limitations such as high data requirements, limited interpretability, and sensitivity to inconsistent image quality, emphasizing the need for scalable and explainable diagnostic systems in clinical settings. This study proposes a framework that combines a Hybrid Multi-layer Transformer Network (HMT-NET) with a Vision Transformer (ViT) for accurate skin lesion segmentation and multiclass skin cancer classification. Evaluated on the HAM10000 and ISIC2019 datasets, the model achieved a high Dice score and Jaccard index and 98.75% classification accuracy, indicating strong performance and reliability for integration into automated dermatological diagnostic systems.
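The Dice score and Jaccard index reported for segmentation both measure overlap between a predicted lesion mask and a reference mask. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function name `dice_and_jaccard` and the toy masks are illustrative assumptions.

```python
def dice_and_jaccard(pred, truth):
    """Return (Dice, Jaccard) for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; not taken from the paper.
    """
    flat_pred = [bool(v) for row in pred for v in row]
    flat_truth = [bool(v) for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as lesion in both masks.
    intersection = sum(p and t for p, t in zip(flat_pred, flat_truth))
    pred_area = sum(flat_pred)
    truth_area = sum(flat_truth)
    union = pred_area + truth_area - intersection

    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + truth_area)
    jaccard = intersection / union
    return dice, jaccard


# Toy 3x3 masks (assumed data): 1 = lesion pixel, 0 = background.
predicted = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
reference = [[0, 1, 1],
             [0, 1, 0],
             [0, 1, 0]]

dice, jaccard = dice_and_jaccard(predicted, reference)
print(f"Dice = {dice:.2f}, Jaccard = {jaccard:.2f}")
```

For any pair of masks the two metrics are related by Dice = 2J / (1 + J), so they rank segmentations identically, although Dice is never lower than Jaccard.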