Published as a conference paper at ICLR 2020
S ATELLITE - BASED P REDICTION OF F ORAGE C ONDI -
                                         TIONS FOR L IVESTOCK IN N ORTHERN K ENYA
                                          Andrew Hobbs                                            Stacey Svetlichnaya
                                          University of California - Davis                        Weights & Biases
                                          awhobbs@ucdavis.edu                                     stacey@wandb.com
                                                                                     A BSTRACT
arXiv:2004.04081v2 [cs.CV] 23 Apr 2020
                                                  This paper introduces the first dataset of satellite images labeled with forage qual-
                                                  ity by on-the-ground experts and provides proof of concept for applying computer
                                                  vision methods to index-based drought insurance. We also present the results of a
                                                  collaborative benchmark tool used to crowdsource an accurate machine learning
                                                  model on the dataset. Our methods significantly outperform the existing tech-
                                                  nology for an insurance program in Northern Kenya, suggesting that a computer
                                                  vision-based approach could substantially benefit pastoralists, whose exposure to
                                                  droughts is severe and worsening with climate change.
                                         1   I NTRODUCTION
                                         In recent years, a new literature has emerged to combine computer vision methods with satellite im-
                                         agery and apply these to important challenges in developing country agriculture. One set of studies
                                         focuses on the potential to use computer vision as a less expensive and more accurate substitute for
                                         traditional forms of data collection such as crop cutting and surveys. Burke & Lobell (2017) show
                                         that satellite images are as accurate as survey data in identifying variation in agricultural yields
                                         for smallholder farmers. Rustowicz et al. (2019) provide the first crop type semantic segmenta-
                                         tion dataset for smallholder farms in Africa, with labeled data from South Sudan and Ghana, and
                                         train a neural network that outperforms the previous state of the art. Kamilaris & Prenafeta-Boldú
                                         (2018) review the use of convolutional neural networks (CNNs) in agricultural applications, and
                                         note that while CNNs often outperform other methods, data quality is critical to success. Our study
                                         also builds on a growing literature applying computer vision methods to aerial data more generally.
                                         Maggiori et al. (2016) introduce a CNN-based framework for dense labeling of remote-sensing de-
                                         rived images, and Mnih & Hinton (2012) find that using an appropriately chosen loss function can
                                         substantially improve results when using aerial data with noisy labels.
                                         Weather risk is one of the most important challenges affecting developing country farmers, and it is
                                         especially serious for the pastoralists of Northern Kenya. Droughts frequently kill large numbers of
                                         livestock, which are the primary assets owned by pastoralist households and a key source of nutri-
                                         tion. Worse, climate change is expected to increase drought frequency and severity Seneviratne et al.
                                         (2017). Index-based insurance is designed to reduce the cost of droughts by using some measure-
                                         ment (the index) to predict average losses in a given region. When average losses reach a threshold,
                                         index insurance products provide farmers or pastoralists with cash to offset their losses. Early index
                                         insurance products used rainfall monitors and crop cuttings to estimate average losses; more recent
                                         products have relied on remote sensing methods and satellite data. In the region studied in this pa-
                                         per, the International Livestock Research Institute’s (ILRI) Index Based Livestock Insurance (IBLI)
                                         program has been in operation since 2011 (Chantarat et al., 2013). It relies on an index derived
                                         from the Normalized Differenced Vegetation Index (NDVI), a remote sensing-based measure that
                                         has been shown to accurately capture changes in green vegetation.
                                         As studied in detail by Jensen et al. (2019), index accuracy and cost are important determinants of
                                         the value of insurance for farmers. This study explores whether a computer vision-based approach
                                         can outperform the NDVI-based approach currently used for the IBLI program in Northern Kenya.
                                         Index accuracy plays a strong role in determining the value of index insurance to farmers, and poor
                                         accuracy is a major factor limiting farmer adoption of insurance (Carter et al., 2017). Our results
                                         demonstrate that computer vision methods can generate more accurate indices for insurance pur-
                                                                                           1
Published as a conference paper at ICLR 2020
poses than existing methods and provide proof-of-concept for a machine learning-based insurance
product.
2     DATASET
The dataset relies on two sources: 1) ground-based labels and photos from pastoralists assessing for-
age conditions on site and 2) satellite images from NASA’s LANDSAT mission.1 We select satellite
images centered on the geolocations for which we have labeled ground-level data. This approach
yields a supervised labeling of forage quality on satellite images, and it is highly generalizable. In
any cases where ground-level data on an agricultural outcome is available (e.g. crop yields, disease
status), this method can be used to develop a satellite-based index. Furthermore, it enables model
ensembling or transfer learning across countries or indices (e.g. yields of related crops), which could
improve model robustness and reduce the need for new supervised labels.
      Figure 1: Images were labeled using pastoralist report of forage conditions on the ground.
2.1    G ROUND - LEVEL L ABELS
A core challenge in applying computer vision methods to satellite data, particularly in develop-
ing countries, is the need for large numbers of accurate labels. Generating these labels is espe-
cially difficult because human experts cannot explicitly label aerial data with outcomes like poverty
level or forage quality. A number of creative approaches have been used to circumvent this prob-
lem. Sheehan et al. (2019) combine georeferenced Wikipedia entries with satellite images to predict
community-level asset wealth and education, and Jean et al. (2016) and Tingzon et al. (2019) use
nighttime lights data to train models that estimate consumption expenditure and asset wealth.
We tackle the labeling problem by leveraging a set of labels generated on the ground by pastoralist
nomads–experts in assessing forage quality in the region of Northern Kenya (data collected by ILRI
in 2015). At each reporting point, pastoralists were asked to estimate the number of cows that could
eat for a single day within 20 meters of their location on a scale of 0-3+. Labels are discrete, so
pastoralists must pick either 0, 1, 2, or 3+ based on their best estimate.
2.2    G ROUND - LEVEL I MAGES
At each labeled location, pastoralist data collectors also took a photo of the surroundings from their
vantage point. We used a subset of these photos to validate the initial pastoralist estimates through
Amazon Mechanical Turk. They are included with the dataset as a potential tool for checking data
quality. This verified subset of images also forms the validation set for the benchmark.
   1
     In recent years, much higher resolution satellite images have become available from companies like Planet
and Digital Globe. This paper uses 30m resolution data freely available from LANDSAT because higher res-
olution images were not available during the time period when the ground-level labels were generated. We
expect that even better results will be possible using higher-resolution images in the future.
                                                      2
Published as a conference paper at ICLR 2020
2.3     S ATELLITE I MAGES
For each labeled location, we downloaded a 65x65 pixel satellite image from within one week of
the date on which the label was generated. When images were not available that met this criterion,
the observation was dropped. The resulting dataset contains over 100,000 labeled images.
LANDSAT images are relatively low spatial resolution: each pixel represents a 30 meter square.
However, each pixel contains more data than in typical images: in addition to the standard 3 RGB
bands, LANDSAT contains 8 other bands with information outside of the visible spectrum. The
near-infrared band is of particular interest, because it is known to be helpful in measuring vegetation
and is an important part of the index currently used in the IBLI program.
3       C OLLABORATIVE B ENCHMARK
The data described above are part of an ongoing collaborative benchmark hosted by Weights &
Biases2 . The platform differs from many others in the sense that all participants can share their code
and training workflows, discuss their approaches, and potentially build on each others’ findings.
This collaborative structure is designed to promote faster progress and knowledge sharing across
participants (Biewald, 2020).
                        Figure 2: The top ten entries as of February 16, 2020.
The highest accuracy reported at the time of writing is 77.8%. The model is based on the EfficientNet
architecture proposed by Tan & Le (2019).3 This model slightly outperformed several ResNet-based
models and a tuned version of the CNN baseline. The CNN baseline is an intentionally simple
example for the benchmark: three convolutional blocks with max pooling, dropout on blocks 2 and
3, and two fully-connected layers before the softmax. The layer sizes and hyperparameters are easily
configurable to encourage participants to explore model variants. All of the models in the leaderboad
significantly outperform an NDVI-based model, which achieved a predictive accuracy of just 67%.
Since the difference between a basic CNN and a ResNet is only a few percent accuracy, there are
clear opportunities for improvement: tuning different network architectures, loss functions, optimiz-
ers, and other hyperparameters. On the data side, promising next steps include a finer-grain analysis
of the spectral bands (of the 11, a few seem to add more noise than signal, such that removing them
improves accuracy), filtering out images with obscuring clouds, data augmentation (e.g. rotate, flip),
and addressing the class imbalance (roughly 60% of the data is of class 0, classes 1 and 2 have 15%
each, and the remaining 10% is class 3+). The benchmark collaboration will continue to support
and cross-pollinate these model improvements.
4       A NALYSIS
In this section we explore several extensions to the modeling. First, we explore how much predictive
accuracy is lost when the images are center-cropped, reducing context. There is little loss of accuracy
    2
     See https://www.wandb.com/articles/droughtwatch to learn more about the benchmark
    3
     Details     on   this  as   well     as   other    leading    models     can   be   examined    at
https://app.wandb.ai/wandb/droughtwatch/benchmark/leaderboard
                                                   3
Published as a conference paper at ICLR 2020
even when cropping to 35x35 pixel rather than 65x65 pixel images. Second, we explore reframing
the problem as an ordinal regression problem to ensure that predictions are close even if not perfect.
4.1   I MAGE S IZE
Because each pixel is 30 meters across, the 65x65 pixel images in the dataset represent nearly 1
kilometer squares. While spatial context likely helps generate better predictions, it is intuitive that
these images may be larger than necessary. Sensitivity tests indicate that networks trained on much
smaller squares are able to perform similarly well: prediction accuracy did not decrease substantially
in our tests until images were smaller than 35x35. Smaller images also reduce training and prediction
time (since new image data must be downloaded from a server).
4.2   O RDINALITY
The benchmark and initial analysis relied on standard neural network architecture for categorical
outputs: labels encoded as one-hot vectors and the final layer as a softmax activating at most one
neuron in the output layer. A prediction is accurate only if it is exactly correct, which means that
mistakenly categorizing a ‘3’ as a ‘0’ is scored identically to categorizing a ‘2’ as a ‘1’, even though
the second is intuitively a smaller error.
Ordinality is especially important for this problem for two reasons. First, there is inherent subjectiv-
ity and uncertainty in the labeling process. Two expert pastoralists could look at the same 20 meter
circle, and one might estimate it could support one cow while the other would estimate two. It’s
much less likely that one would say ‘3+’ and the other ‘0.’ Second, for effective index insurance it is
not essential that we have an exactly correct estimate of conditions at each point in space and time,
but it is essential that our index only make small errors. While a small error might lead the insurance
product to slightly underpay in the event of a drought, a large error might lead to a complete failure
to pay, defeating the purpose of the product.
We follow the approach described by Cheng et al. (2008), replacing the one-hot vectors with a
format that preserves information on ordinality within the sample. For example, a location that can
support 0 cattle is encoded [0,0,0], 1 is encoded as [1,0,0], 2 as [1,1,0], and 3+ as [1,1,1]. Each entry
in the output vector can be thought of as predicting whether the carrying capacity is greater than or
equal to some value: a ‘1’ in the first position of the array means that the location can support one
or more animals; a ‘1’ in the second position means that it can support two or more, etc. Rather than
a softmax, the final layer uses a sigmoid activation function to activate more than one output.
Ordinal accuracy can be evaluated with a binary cross-entropy loss function and binary accuracy
to calculate the share of matching outputs between the prediction and target. In other words, a ‘2’
categorized as a ‘1’ receives an accuracy of 66.66%, while a ‘3’ categorized as a ‘0’ receives an
accuracy of 0% . Scored this way, our ordinal model achieves a validation accuracy of 87.88%.
Because of the difference in scoring and loss functions, the accuracy from the ordinal regressions is
not directly comparable to the categorical models. However, evaluating the best categorical model
with ordinal accuracy would only improve the metric: some of the errors previously counted as 0%
would add 33.33% or 66.66%, depending on the distance between the labels.
5     C ONCLUSION
Computer vision methods have great promise as a means of shielding farmers from the negative
effects of climate change due to their ability to flexibly map satellite data to information on yields,
forage quality, or other measures collected from the ground. This paper provides an initial dataset for
exploring the potential of computer vision for index insurance and proof of concept that computer
vision methods can outperform existing remote sensing techniques. Once trained and deployed,
computer vision models can assess conditions on demand, making it possible to monitor more fre-
quently and compensate the insured more quickly. By increasing both the accuracy and the speed
of insurance payments, computer vision has the potential to substantially increase the benefit of
insurance for farmers.
                                                   4
Published as a conference paper at ICLR 2020
ACKNOWLEDGMENTS
We would like to thank Nathan Jensen at the International Livestock Research Institute for making
the forage quality data we describe in this paper available for the benchmark competition we describe
in this paper.
The forage quality data used in this paper was collected through a research collaboration between the
International Livestock Research Institute, Cornell University, and UC San Diego. It was supported
by the Atkinson Centre for a Sustainable Futures Academic Venture Fund, Australian Aid through
the AusAID Development Research Awards Scheme Agreement No. 66138, the National Science
Foundation (0832782, 1059284, 1522054), and ARO grant W911-NF-14-1-0498.
R EFERENCES
Lukas Biewald. Experiment tracking with weights and biases, 2020. URL https://www.
  wandb.com/. Software available from wandb.com.
Marshall Burke and David B Lobell. Satellite-based assessment of yield variation and its determi-
 nants in smallholder african systems. Proceedings of the National Academy of Sciences, 114(9):
 2189–2194, 2017.
Michael Carter, Alain de Janvry, Elisabeth Sadoulet, and Alexandros Sarris. Index insurance for
 developing country agriculture: a reassessment. Annual Review of Resource Economics, 9:421–
 438, 2017.
Sommarat Chantarat, Andrew G Mude, Christopher B Barrett, and Michael R Carter. Designing
  index-based livestock insurance for managing asset risk in northern kenya. Journal of Risk and
  Insurance, 80(1):205–237, 2013.
Jianlin Cheng, Zheng Wang, and Gianluca Pollastri. A neural network approach to ordinal regres-
   sion. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress
   on Computational Intelligence), pp. 1279–1284. IEEE, 2008.
Neal Jean, Marshall Burke, Michael Xie, W Matthew Davis, David B Lobell, and Stefano Ermon.
  Combining satellite imagery and machine learning to predict poverty. Science, 353(6301):790–
  794, 2016.
Nathaniel Jensen, Quentin Stoeffler, Francesco Fava, Anton Vrieling, Clement Atzberger, Michele
  Meroni, Andrew Mude, and Michael Carter. Does the design matter? comparing satellite-based
  indices for insuring pastoralists against drought. Ecological economics, 162:59–73, 2019.
A Kamilaris and FX Prenafeta-Boldú. A review of the use of convolutional neural networks in
  agriculture. The Journal of Agricultural Science, 156(3):312–322, 2018.
Emmanuel Maggiori, Yuliya Tarabalka, Guillaume Charpiat, and Pierre Alliez. Convolutional neural
  networks for large-scale remote-sensing image classification. IEEE Transactions on Geoscience
  and Remote Sensing, 55(2):645–657, 2016.
Volodymyr Mnih and Geoffrey E Hinton. Learning to label aerial images from noisy data. In
  Proceedings of the 29th International conference on machine learning (ICML-12), pp. 567–574,
  2012.
Rose M Rustowicz, Robin Cheong, Lijing Wang, Stefano Ermon, Marshall Burke, and David Lobell.
  Semantic segmentation of crop type in africa: A novel dataset and analysis of deep learning
  methods. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
  Workshops, pp. 75–82, 2019.
Sonia I Seneviratne, Neville Nicholls, David Easterling, Clare M Goodess, Shinjiro Kanae, James
  Kossin, Yali Luo, Jose Marengo, Kathleen McInnes, Mohammad Rahimi, et al. Changes in
  climate extremes and their impacts on the natural physical environment. 2017.
                                                 5
Published as a conference paper at ICLR 2020
Evan Sheehan, Chenlin Meng, Matthew Tan, Burak Uzkent, Neal Jean, Marshall Burke, David Lo-
  bell, and Stefano Ermon. Predicting economic development using geolocated wikipedia articles.
  In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &
  Data Mining, pp. 2698–2706, 2019.
Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural net-
 works. In International Conference on Machine Learning, pp. 6105–6114, 2019.
Isabelle Tingzon, Ardie Orden, Stephanie Sy, Vedran Sekara, Ingmar Weber, Masoomali Fatehkia,
   Manuel Garcia Herranz, and Dohyung Kim. Mapping poverty in the philippines using machine
   learning, satellite imagery, and crowd-sourced geospatial information. International Archives of
   the Photogrammetry, Remote Sensing and Spatial Information Sciences, 42(4/W19), 2019.