Object Detection in Aerial Imagery Using a Deep Learning Neural Network (Review on Longan Fruit Trees)

Published on July 1, 2026

Longan is one of the most important subtropical fruits in Thailand, which is currently the world's largest longan producer. Longan is cultivated mostly in the northern part of the country, where 172,229 households grow it and 80% of the production is exported to China. The crop therefore has huge economic value and supports the livelihoods of many Thai farmers.

This article presents an approach for detecting and counting longan trees and explains how it can be applied to real-time fertilizer spraying by enabling a VPU (the Intel Movidius Neural Compute Stick) on a fertilizer spraying drone (Figures 4 and 5). Counting trees has many advantages: it gives farmers valuable insight into their farmland, supports yield prediction, helps with irrigation management and growth monitoring, improves farm productivity and thus profit, and enables automated spraying of fertilizers and pesticides using the Intel Movidius Neural Compute Stick. Our primary goal was to detect longan trees on the fly for fertilizer spraying, which requires the detector to run in near real time, so speed mattered as much as accuracy. However, choosing an object detection algorithm involves a large trade-off between speed and accuracy.
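Counting follows directly from detection: once the detector returns boxes and confidence scores for an image, the tree count is the number of confident, non-duplicate boxes. The sketch below illustrates this with a score threshold and greedy non-maximum suppression (NMS). It is not the authors' pipeline; the box format (pixel ymin, xmin, ymax, xmax), the 0.5 thresholds and the function names are all assumptions. Faster R-CNN pipelines usually apply NMS internally already, so the explicit version here mainly makes the counting logic visible.

```python
"""Hypothetical sketch: count trees from raw detector output.

Assumptions (not from the article): boxes are (ymin, xmin, ymax, xmax) in
pixels, one confidence score per box, and both thresholds are illustrative.
"""
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (ymin, xmin, ymax, xmax)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_trees(boxes: List[Box], scores: List[float],
                score_thresh: float = 0.5, iou_thresh: float = 0.5) -> int:
    """Drop low-confidence boxes, suppress duplicates greedily, return the count."""
    # Indices of confident boxes, highest score first.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    kept: List[Box] = []
    for i in order:
        # A box counts as a new tree only if it does not heavily overlap a kept box.
        if all(iou(boxes[i], k) < iou_thresh for k in kept):
            kept.append(boxes[i])
    return len(kept)


if __name__ == "__main__":
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (200, 200, 240, 240)]
    scores = [0.9, 0.8, 0.7, 0.3]
    # Box 2 duplicates box 1 and box 4 is below the score threshold.
    print(count_trees(boxes, scores))  # 2
```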

Figure 1: Study area, consisting of longan plantations in the tropical farmland of northern Thailand.

The data were acquired on May 27, 2018 over an area of 0.95 sq. km consisting mainly of longan plantation. The aerial imagery was captured at a resolution of 4000 x 3000 pixels, so training on full frames would spend more time per epoch resizing the input images than actually training. Our architecture resizes images to 112 x 112 during training, which means a full frame would be shrunk by a factor of roughly 27 to 36 along each side, discarding much of the fine detail that small objects depend on. To prevent this, we used the GDAL tool gdal2tiles (https://www.gdal.org/) to cut the images into 256 x 256 tiles before feeding the data to the Convolutional Neural Network (CNN). A total of 3,500 longan trees were manually annotated; 350 annotations were held out as the test dataset and the remaining 3,150 formed the training dataset. Because the dataset was not very large, we applied data augmentation techniques (6): flipping along the horizontal and vertical axes and adjusting brightness, saturation and contrast.
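To make the tiling arithmetic concrete, here is a minimal sketch that splits a 4000 x 3000 frame into a plain grid of 256 x 256 windows and mirrors box annotations to match the horizontal and vertical flips used for augmentation. It is a swapped-in illustration, not gdal2tiles itself (which builds a zoom-level tile pyramid), and the function names and the normalized (ymin, xmin, ymax, xmax) box format are assumptions. Brightness, saturation and contrast changes alter only pixel values, so boxes need no adjustment for them.

```python
"""Hypothetical sketch: grid tiling of a large frame and box flipping.

This is NOT gdal2tiles; it is a plain grid split used only to show the
arithmetic. Boxes are assumed normalized to [0, 1] within a tile.
"""
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # normalized (ymin, xmin, ymax, xmax)


def tile_windows(width: int, height: int, tile: int = 256) -> List[Tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1) pixel windows covering the image row by row.

    Edge windows are clipped to the image border, so they can be smaller
    than `tile` and would need padding before reaching the network.
    """
    windows = []
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            windows.append((x0, y0, min(x0 + tile, width), min(y0 + tile, height)))
    return windows


def flip_boxes(boxes: List[Box], horizontal: bool = True) -> List[Box]:
    """Mirror normalized boxes to match a horizontally or vertically flipped tile."""
    out = []
    for ymin, xmin, ymax, xmax in boxes:
        if horizontal:
            # Left-right flip: x -> 1 - x, so xmin and xmax swap roles.
            out.append((ymin, 1.0 - xmax, ymax, 1.0 - xmin))
        else:
            # Top-bottom flip: y -> 1 - y, so ymin and ymax swap roles.
            out.append((1.0 - ymax, xmin, 1.0 - ymin, xmax))
    return out


if __name__ == "__main__":
    print(len(tile_windows(4000, 3000)))  # 192
    # Per-side downscaling if a full frame were resized straight to 112 x 112.
    print(round(4000 / 112, 1), round(3000 / 112, 1))  # 35.7 26.8
    print(flip_boxes([(0.1, 0.2, 0.3, 0.4)]))
```

With these sizes the grid has 16 columns and 12 rows (192 tiles), and the right-most column and bottom row are only 160 and 184 pixels wide and tall respectively, so in practice they would be padded or handled with a small overlap.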

Figure 2: Implemented methodological workflow for longan detection

The CNN used in this study was ResNet-101 (1,2), a very deep residual network with 101 layers, implemented in the TensorFlow framework. The main hyperparameters of the CNN, such as the number of kernels in the first and second convolutional layers and the number of hidden units in the fully connected (FC) layers, were adjusted iteratively until we found the combination that gave the highest accuracy on the test dataset. For the object detection algorithm, we applied Faster R-CNN, a two-stage detector that uses a Region Proposal Network (RPN) to generate candidate object regions and that performs at the state of the art in terms of both speed and accuracy. For more details on Faster R-CNN, see references 3 and 4, and especially reference 5, which explains it with code.

In my experience, detecting small objects in aerial imagery is difficult and requires tuning the parameters that control detection accuracy, such as the anchor boxes (scales and aspect ratios), the number of proposals, and the stride used in the Region Proposal Network stage. One tip: reduce the anchor scales and bring the aspect ratios closer to those of the objects you want to detect, and increase the number of proposals, but not so much that detection performance suffers, since more proposals also increase computation time. The implemented deep learning architecture performed very well at detecting longan trees, as shown in Figure 3.
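The anchor-box tip can be illustrated with a short sketch. It assumes a common Faster R-CNN convention (used, for example, by the grid anchor generator in the TensorFlow Object Detection API): aspect ratio means width / height, and each anchor's sides come from a 256-pixel base size multiplied by the scale. The helper names and example scales are hypothetical and are not the settings used in this study. The sketch shows that smaller scales produce smaller anchors, and that halving the RPN stride quadruples the number of anchors the network has to score.

```python
"""Hypothetical sketch of Faster R-CNN style anchor shapes and counts.

Assumed convention: aspect_ratio = width / height, base anchor size 256 px.
Not the authors' code or configuration.
"""
import math
from itertools import product
from typing import List, Sequence, Tuple


def anchor_shapes(scales: Sequence[float], aspect_ratios: Sequence[float],
                  base_size: float = 256.0) -> List[Tuple[float, float]]:
    """Return (height, width) in pixels for every scale/aspect-ratio pair."""
    shapes = []
    for s, ar in product(scales, aspect_ratios):
        r = math.sqrt(ar)
        # Area stays (s * base_size)^2 while width / height equals ar.
        shapes.append((s * base_size / r, s * base_size * r))
    return shapes


def num_anchors(image_h: int, image_w: int, stride: int,
                scales: Sequence[float], aspect_ratios: Sequence[float]) -> int:
    """One set of anchors per feature-map cell; cells = ceil(size / stride)."""
    cells = math.ceil(image_h / stride) * math.ceil(image_w / stride)
    return cells * len(scales) * len(aspect_ratios)


if __name__ == "__main__":
    # Hypothetical small, square anchors for roughly round tree crowns.
    print(anchor_shapes([0.125, 0.25, 0.5], [1.0]))  # [(32.0, 32.0), (64.0, 64.0), (128.0, 128.0)]
    grid = ([0.5, 1.0, 2.0], [0.5, 1.0, 2.0])
    print(num_anchors(256, 256, 16, *grid))  # 2304
    print(num_anchors(256, 256, 8, *grid))   # 9216
```

The number of proposals then caps how many of these scored anchors survive into the second stage, so it is a separate knob from the anchor count itself.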

Figure 3: Detection results using the workflow
Figure 4: Instruments for data acquisition (DJI drones) and in-situ fertilizer spraying (custom-made green drone), with a VPU enabled by the Movidius Neural Compute Stick (Figure 5).
Figure 5: Movidius Neural Compute Stick, enabling a VPU in fertilizer spraying applications

We now have a good model for detecting longan trees; the remaining work is to convert the frozen inference graph into the graph format, which is the only format supported by the Intel Movidius Neural Compute Stick. I have also started posting detailed tutorials on Medium about embedded vision AI with the Movidius stick, so please have a look: https://medium.com/@ghimire.aiesecer. Future work will explore different classification and object detection approaches and combine them to maximize both accuracy and speed for real-time fertilizer spraying applications.

Suman Ghimire

CEO/Chairman

Suman writes about precision agriculture technology on the topic of "Enhanced Vision for farmers in Nepal".