Plant Disease Detection

Plant Pathology Challenge 2020

Agriculture is one of the most important sectors for human existence, and it is the backbone of the economy in most developing countries.
Plant disease is a major factor affecting food production. It is important to identify a disease at an early stage and diagnose the problem so that the appropriate treatment can begin.
In this case study we will train a model on the leaf image dataset from the ‘Plant Pathology Challenge’ to classify each image into one of several disease categories.

Misdiagnosing crop diseases can lead to the misuse of chemicals, which in turn leads to the emergence of resistant pathogen strains, increased input costs, and more outbreaks with significant economic loss and environmental impact.
Current disease diagnosis based on human scouting is time-consuming and expensive. The challenge here is to train a model that classifies a given image into one of four classes: Healthy, Multiple Diseases, Rust, or Scab.

We need to classify a given leaf image into one of the 4 categories, so this can be formulated as a multi-class classification problem.
We will use the AUC (Area Under the ROC Curve) score to judge the performance of our model.
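To make the metric concrete, here is a minimal sketch of how an AUC score can be computed for a 4-class problem. The source does not say how the score is averaged over classes, so this sketch assumes a one-vs-rest AUC per class, averaged over the four columns. The function names, the class order and the toy data are all hypothetical.

```python
from typing import List, Sequence

# Assumed class order, for illustration only.
CLASSES = ["healthy", "multiple_diseases", "rust", "scab"]


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_columnwise_auc(y_true: List[List[int]], y_prob: List[List[float]]) -> float:
    """Average the one-vs-rest AUC over all class columns."""
    n_classes = len(y_true[0])
    aucs = []
    for k in range(n_classes):
        labels = [row[k] for row in y_true]
        scores = [row[k] for row in y_prob]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes


# Toy example: one-hot labels and predicted probabilities for 4 images.
y_true = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
y_prob = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.3, 0.1, 0.2, 0.4],
]
print(mean_columnwise_auc(y_true, y_prob))
```

The pairwise loop is quadratic in the number of images, which is fine for a dataset of this size; a library implementation would be the usual choice in practice.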

This data is taken from the Plant Pathology 2020 competition. It consists of high-quality, real-life RGB images of multiple apple foliar disease symptoms, captured during the 2019 growing season from commercially grown cultivars in an unsprayed apple orchard at Cornell AgriTech (Geneva, New York, USA).

Sample images from the data set showing symptoms of cedar apple rust (A), apple scab (B), multiple diseases on a single leaf (C), and healthy leaves (D)

The dataset has a total of 1821 labeled images for training and another 1821 unlabeled images for testing model performance. Each image has a height of 1365 and a width of 2048 pixels. Here is the distribution of the image classes in the training dataset:

Distribution of Train Dataset

6. Data Preprocessing
Large images need more memory and computation per training step, and in architectures that flatten feature maps into dense layers they also need more trainable parameters. If compute is limited, reducing the image size can be the savior. If the dataset contains images of different sizes, resizing them to one fixed size becomes mandatory. The target size should be chosen carefully: it must not throw away too much information, yet it must not overload the model either. Overly large inputs result in very slow training and resource-exhausted (out-of-memory) errors. In this case study, I reduced my images to 10% of the original size, and it gave decently good results.
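As a rough illustration of what the reduction buys, the sketch below computes the target size and the number of input values per image. It assumes "10% of the original size" means each side is scaled to 10% and rounded down; the text does not spell this out, and the helper names are hypothetical.

```python
from typing import Tuple


def scaled_size(height: int, width: int, scale: float) -> Tuple[int, int]:
    """Scale each side by `scale`, rounding down, with a minimum of 1 pixel."""
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    return max(1, int(height * scale)), max(1, int(width * scale))


def input_values(height: int, width: int, channels: int = 3) -> int:
    """Number of values the model reads for one image (RGB by default)."""
    return height * width * channels


ORIG_H, ORIG_W = 1365, 2048  # image size stated for this dataset
new_h, new_w = scaled_size(ORIG_H, ORIG_W, 0.10)

print(new_h, new_w)                   # 136 204
print(input_values(ORIG_H, ORIG_W))   # 8386560
print(input_values(new_h, new_w))     # 83232
```

Under this assumption each image shrinks by roughly a factor of 100 in input values, which is where the savings in memory and training time come from.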

Along with this, it's also a good idea to use image augmentation. In some cases, it can be a game changer. For those who are not aware, image augmentation is the process of creating new training examples from existing ones by applying operations such as scaling, cropping, flipping, padding and many more to the images. It is especially useful when the dataset is limited.
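The operations named above can be sketched on a tiny image stored as a nested list of pixel values. This is only an illustration of the idea, not the augmentation pipeline used in this case study (which the text does not show); every function name here is hypothetical, and a real pipeline would use an image library.

```python
import random
from typing import List

Image = List[List[int]]  # rows of pixel values, single channel for brevity


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def pad(img: Image, border: int, fill: int = 0) -> Image:
    """Add `border` pixels of `fill` on every side."""
    width = len(img[0]) + 2 * border
    blank = [fill] * width
    padded = [blank[:] for _ in range(border)]
    padded += [[fill] * border + row + [fill] * border for row in img]
    padded += [blank[:] for _ in range(border)]
    return padded


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def random_augment(img: Image, rng: random.Random) -> Image:
    """Random flips, then pad by 1 and crop back: a small random shift."""
    out = img
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    if rng.random() < 0.5:
        out = flip_vertical(out)
    h, w = len(img), len(img[0])
    top, left = rng.randint(0, 2), rng.randint(0, 2)
    return crop(pad(out, 1), top, left, h, w)


leaf = [[1, 2, 3], [4, 5, 6]]
rng = random.Random(0)
new_examples = [random_augment(leaf, rng) for _ in range(4)]
```

Each call produces a slightly different copy of the same leaf with the same label, which is how a small dataset is stretched into more training examples.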

In this case study, Image Augmentation helped me get a better AUC Score.

Convolutional Neural Network (CNN) models have been the preferred choice over Multilayer Perceptrons (MLPs) for image classification tasks. Pick any state-of-the-art (SOTA) architecture for images and you will most probably find a CNN inside it.
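One way to see why is to count parameters. The sketch below compares a single dense layer with a single 3 x 3 convolutional layer on the same input. It assumes a 136 x 204 RGB input (roughly 10% of the original 1365 x 2048 per side) and 64 units or filters; both numbers are illustrative choices, not values taken from the model in this case study.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    """Weights plus biases of a 2D convolution; independent of image size."""
    return kernel_h * kernel_w * in_channels * filters + filters


flat_inputs = 136 * 204 * 3  # assumed input, flattened for the MLP

print(dense_params(flat_inputs, 64))   # 5326912
print(conv2d_params(3, 3, 3, 64))      # 1792
```

The convolution reuses the same small kernel at every position in the image, so its parameter count stays fixed while the dense layer grows with every extra pixel.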

Here are a few advantages of using a CNN over an MLP for image classification:

For this case study, I used transfer learning. In simple words, transfer learning means making use of pre-trained models. The advantage is that these models are the product of extensive research and were trained on high-end hardware, so in most cases they give very good results.
I tried various models, and an EfficientNet model initialized with ‘noisy-student’ weights gave the best results. This model is also based on CNNs.

Below is the code for my best model-

Below is the plot of the model score over every epoch and the confusion matrix showing the performance of the model on the train and test datasets.
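For readers unfamiliar with confusion matrices, here is a minimal sketch of how one is built from true and predicted labels. The label names and the five example predictions are made up for illustration; they are not results from this model.

```python
from typing import List, Sequence

# Assumed class order, for illustration only.
LABELS = ["healthy", "multiple_diseases", "rust", "scab"]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Made-up labels and predictions for five images.
y_true = ["healthy", "rust", "scab", "multiple_diseases", "multiple_diseases"]
y_pred = ["healthy", "rust", "scab", "scab", "multiple_diseases"]

for label, row in zip(LABELS, confusion_matrix(y_true, y_pred, LABELS)):
    print(f"{label:>18}: {row}")
```

Counts on the diagonal are correct predictions; off-diagonal counts show which classes get confused with each other, such as a multi-diseased leaf predicted as scab in this toy example.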

On the Kaggle leaderboard this model got a public score of 0.94532 and a private score of 0.94441.

Then I tried ensembling the outputs of some of the top-performing models, which got a public score of 0.95040 and a private score of 0.94317.
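The text does not say how the model outputs were combined. A common and simple option is to average the predicted class probabilities, optionally with weights, and the sketch below assumes exactly that. The function name and the two sets of predictions are hypothetical.

```python
from typing import List, Optional

Predictions = List[List[float]]  # one row of class probabilities per image


def average_ensemble(predictions: List[Predictions],
                     weights: Optional[List[float]] = None) -> Predictions:
    """Blend several models' probabilities with a (weighted) average."""
    n_models = len(predictions)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_rows, n_classes = len(predictions[0]), len(predictions[0][0])
    blended = [[0.0] * n_classes for _ in range(n_rows)]
    for model_pred, w in zip(predictions, weights):
        for i, row in enumerate(model_pred):
            for k, p in enumerate(row):
                blended[i][k] += (w / total) * p
    return blended


# Made-up predictions from two models for a single image.
model_a = [[0.80, 0.10, 0.05, 0.05]]
model_b = [[0.60, 0.20, 0.10, 0.10]]

print(average_ensemble([model_a, model_b]))
print(average_ensemble([model_a, model_b], weights=[3.0, 1.0]))
```

Because the weights are normalized, each blended row still sums to 1 as long as every model's row does.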

Using larger images and a larger batch size would likely improve the score further.

After training, I investigated which kinds of images my model classified correctly and which kinds it failed on.

For this I divided my whole data set into three -

After looking at a number of samples, here is what I observed:

2. The images which had a brownish midrib and brownish spots were mostly classified as scab.

Leaves which were classified as Scab

3. The images which had yellow-orange patches were mostly classified as rust.

Leaves which were classified as Rust

4. The images which had a mix of brown and yellow-orange spots were mostly classified as Multi Diseased.

Leaves which were classified as Multi Diseased

5. There are cases in which the leaf appears to be spotless but is actually diseased. In a few such cases the model predicted Healthy.

Leaves which were diseased but were classified as Healthy

6. There were also some cases in which the model classified a multi-diseased leaf as either Scab or Rust.

To improve on points 5 and 6 above, the model needs to be trained with more such images.

I used simple HTML and Flask to build a web application for the model. Check out this demo:

There are many interesting experiments which we can do -

For everyone who wishes to experiment with and improve my work, the whole code is available on my GitHub profile (link given below).

Profile
