Abstract:
Diabetic Retinopathy (DR) is a medical condition in which high blood sugar levels damage the retina. If DR is not treated in time, it can lead to vision impairment or even blindness. Therefore, early detection and timely treatment are very important. Treatment depends on the severity, or grade, of DR, which must be determined correctly. DR diagnosis is difficult because the retinal lesions (e.g., microaneurysms and hemorrhages) are very small relative to the retina (covering only a few pixels in fundus images) and are visually similar to one another. DR is usually diagnosed manually by human experts, which is time-consuming, subjective, and prone to error. Recently, automated solutions based on Deep Learning (DL), especially Convolutional Neural Networks (CNNs), have been proposed and have achieved remarkable success in DR grading, producing effective predictions in a short time. The main aim of this work is to build a state-of-the-art CNN-based approach for DR diagnosis using fundus images. This work investigates in detail three aspects that determine the success of a CNN model in DR grading: (i) the pooling technique, (ii) the loss function used to optimise the parameters, and (iii) the CNN architecture. No review or comparative study of pooling techniques was found that examines which technique is appropriate in which scenario (e.g., class-specific features that cover the entire image versus only a part of it). Therefore, a comprehensive review and a comparative study of these techniques were carried out in this thesis, which reveal that the pooling technique should be chosen by considering the scale of the class-specific features and the problem under investigation. In addition, DR grading, being an ordinal classification task, is greatly influenced by the choice of loss function, which has not been thoroughly investigated in the literature.
Thus, this work investigates different loss functions for DR grading: Cross-Entropy loss, Mean Squared Error (MSE) loss, Soft Ordinal Regression loss, and Quadratic Weighted Kappa (QWK) loss. Single-eye experiments on the large-scale, widely used Kaggle EyePACS dataset show that the MSE loss yields the best QWK score (0.832), where QWK measures the agreement between the proposed model and human experts. Finally, a Siamese-network-based CNN architecture is proposed that uses patient-level features (i.e., features extracted from both eyes) to boost single-eye DR diagnosis performance, treated as a fine-grained classification problem. Different approaches, such as bilinear pooling, were investigated to integrate the features of both eyes. Comparative experiments on the Kaggle EyePACS dataset show that the proposed approach achieves a new state-of-the-art QWK of 0.860.
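
As a rough illustration of the QWK metric referred to above, the following minimal sketch computes the standard quadratic weighted kappa between two sets of integer grades. It is not the thesis implementation: the function name `quadratic_weighted_kappa`, the default of five grades (0 to 4), and the convention of returning 1.0 when every sample falls in a single grade are assumptions made for this example.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_grades=5):
    """Standard QWK between two equal-length lists of integer grades.

    Grades are assumed to lie in [0, num_grades); five grades is an
    assumption for this sketch, not a detail stated in the abstract.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("inputs must be non-empty and of equal length")

    # Observed confusion matrix: observed[i][j] counts samples graded
    # i by the first rater and j by the second.
    observed = [[0] * num_grades for _ in range(num_grades)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal histograms of each rater's grades.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_grades))
              for j in range(num_grades)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_grades):
        for j in range(num_grades):
            # Quadratic penalty: disagreements far apart cost more.
            weight = (i - j) ** 2 / (num_grades - 1) ** 2
            # Count expected under chance agreement (independent raters).
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Every sample sits in one grade for both raters; treating this
        # as perfect agreement is a convention assumed for this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

Under this definition, identical gradings score 1, chance-level agreement scores about 0, and systematically opposite gradings score negative values. Because the penalty grows with the squared distance between grades, confusing grade 0 with grade 4 is penalised far more than confusing adjacent grades, which is why the metric suits an ordinal task such as DR grading.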