Data Accuracy and Precision


In Module 1: Data 101, we introduced the different types of data: physical records, digital records, qualitative data, quantitative data, and more. Errors can occur in any type of data, and it is important to understand how errors affect both the accuracy and the precision of your data.

Accuracy: How closely the data reflects the TRUE value (i.e. how close is your data to reality?).

Precision: How reproducible a measurement is under equivalent circumstances (i.e. are repeated observations similar to one another?).

Consider the image of four dartboards below, where the true value (i.e. reality) is represented by the red bullseye and the measured data (i.e. observations) are represented by the darts landing on the dartboard.


Scenario (a): Three observations are collected but none of them reflect reality very well.  The three observations are also widely dispersed. Low accuracy and low precision.

Scenario (b): Three observations are very similar but none of them reflect the true value very well - they are equally off target. Low accuracy, but high precision.

Scenario (c): Three observations are relatively close to reality yet they are all still fairly spread out. High accuracy, but low precision.

Scenario (d): All three observations are close to the true value AND all three observations are tightly grouped together. High accuracy and high precision.
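The four scenarios can also be expressed numerically. The sketch below is one possible way to do this, not a standard prescribed by this module: it treats accuracy as the distance between the average observation and the true value (the bias) and precision as the standard deviation of the observations (the spread). The true value of 100 and all of the observations are hypothetical numbers chosen only to mimic the dartboards.

```python
from statistics import mean, stdev

def bias(observations, true_value):
    """Distance between the average observation and the true value.
    Smaller values indicate higher accuracy."""
    return abs(mean(observations) - true_value)

def spread(observations):
    """Sample standard deviation of the observations.
    Smaller values indicate higher precision."""
    return stdev(observations)

TRUE_VALUE = 100.0  # hypothetical true value (the bullseye)

# Hypothetical observations, one list per dartboard scenario.
scenarios = {
    "a": [70.0, 85.0, 115.0],    # off target and widely dispersed
    "b": [119.0, 120.0, 121.0],  # tightly grouped but off target
    "c": [90.0, 100.0, 110.0],   # centred on target but dispersed
    "d": [99.0, 100.0, 101.0],   # tightly grouped and on target
}

for name, obs in scenarios.items():
    print(f"Scenario ({name}): bias = {bias(obs, TRUE_VALUE):.1f}, "
          f"spread = {spread(obs):.1f}")
```

A low bias with a high spread corresponds to scenario (c), while a high bias with a low spread corresponds to scenario (b). Note that bias can only be computed when the true value is known, which is rarely the case with field data.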



Accuracy vs. Precision - Real World Examples

For a more concrete example, let’s look at some data from Lakeland College’s Student Managed Farm in Vermilion, Alberta. Because the farm is used for educational purposes, it uses multiple agtech platforms for data collection and display. For example, Lakeland’s New Holland combine typically runs with a number of wiring harnesses, or pucks, from different agtech vendors installed for teaching purposes.

The following three figures are harvest reports for a quarter section of wheat from the 2022 growing season generated by different agtech software tools. We will use this set of harvest reports to examine the accuracy and precision of the data collection in this field by comparing the results from the different pieces of software.


Figure 1: Harvest report generated at Lakeland College using Farmer’s Edge Farm Command software.


Figure 2: Harvest report generated by New Holland MyPLM software at Lakeland College.


Figure 3: Harvest report generated by Bayer Climate FieldView at Lakeland College.



Yield Report Data - Lakeland College 2022

| Metric | Farm Command | New Holland | FieldView |
| --- | --- | --- | --- |
| Field Acres | 134.71 | 135.9 | N/A |
| Acres Harvested | 134.78 | 135.4 | 188.6 |
| Average yield (bu/acre) | 63.75 | 70.94 | 59 |
| Total yield (bu) | 8592.09 | 9602.10 | N/A |
| Average moisture (%) | 11.87 | 12.1 | 11.8 |

As a first step, let’s look at the data from the Farmer’s Edge and New Holland reports. As you can see, there is relatively little variation in the measured field acres, acres harvested, and average moisture across these two platforms (less than 2% in each case). The average and total yield differ more noticeably, by roughly 11 to 12%. Overall, this suggests that the acreage and moisture observations may be fairly precise.
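Each report can also be checked against itself before it is compared with the others. The sketch below tests whether the reported total yield matches the acres harvested multiplied by the average yield. It assumes that each platform derives its total from acres harvested rather than field acres, which the reports do not confirm, and the function names are illustrative.

```python
# Values copied from the yield report table.
# FieldView is omitted because no total yield is listed for it.
reports = {
    "Farm Command": {"acres_harvested": 134.78, "avg_yield": 63.75, "total_yield": 8592.09},
    "New Holland": {"acres_harvested": 135.4, "avg_yield": 70.94, "total_yield": 9602.10},
}

def implied_total(report):
    """Total yield (bu) implied by acres harvested x average yield."""
    return report["acres_harvested"] * report["avg_yield"]

def relative_gap(report):
    """Difference between reported and implied total yield, in percent."""
    expected = implied_total(report)
    return (report["total_yield"] - expected) / expected * 100.0

for name, report in reports.items():
    print(f"{name}: implied {implied_total(report):.1f} bu, "
          f"gap {relative_gap(report):+.2f}%")
```

Both reports agree with themselves to within a small fraction of a percent, so the differences between the platforms are not simply arithmetic slips within a single report.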

Now let’s look at the final set of observations, from Bayer Climate FieldView. In this case, the harvested acreage is significantly higher at 188.6 acres, approximately 40% larger than the value obtained from Farmer’s Edge or New Holland. The average yield reported by FieldView also differs from the other platforms (59 bu/acre versus 63.75 and 70.94 bu/acre); no total yield is listed for FieldView in the table above.
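The size of these gaps can be worked out directly from the table values. The sketch below computes a signed percent difference; the helper name `percent_difference` is illustrative and not part of any agtech platform.

```python
def percent_difference(value, reference):
    """Signed difference of value relative to reference, in percent."""
    return (value - reference) / reference * 100.0

# Values copied from the yield report table.
acres_harvested = {"Farm Command": 134.78, "New Holland": 135.4, "FieldView": 188.6}
avg_yield = {"Farm Command": 63.75, "New Holland": 70.94, "FieldView": 59.0}

metrics = (("acres harvested", acres_harvested), ("average yield", avg_yield))
for metric, values in metrics:
    for platform in ("Farm Command", "New Holland"):
        diff = percent_difference(values["FieldView"], values[platform])
        print(f"FieldView vs {platform}, {metric}: {diff:+.1f}%")
```

The result depends on which value is used as the reference: the FieldView acreage is about 40% larger than the Farm Command acreage, while the Farm Command acreage is about 29% smaller than the FieldView acreage. Stating the reference alongside any percentage avoids confusion.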

Having seen the additional data from the FieldView software, we are left wondering about the overall accuracy and precision of our yield measurements. What is causing the FieldView data to diverge from the other two datasets? Some potential explanations could be:

  • Different field boundary definitions.
  • Incorrect or neglected settings in the data collection equipment (e.g. yield monitor in the combine) or software.
  • Calibration of the data collection equipment was not done correctly.

Despite the larger difference between data from FieldView and the other two platforms, it is difficult to come to a definitive conclusion about which value is more accurate from this limited set of observations.

While a commercial farm isn’t likely to have multiple agtech platforms collecting yield data to perform this kind of cross-comparison, the staff at Lakeland College could consider a number of approaches to incrementally troubleshoot the yield data they are measuring including: 

  • Confirming the calibration and configuration: Before the next harvesting operation, the calibration and configuration of each piece of equipment, and the settings that are applied within the agtech software tool, should be checked and rechecked. 

  • Checking with an alternative data source: The accuracy of the yield data measured by the combine yield monitor and the three agtech software tools could be compared with an independent measure of yield, taken for example from a grain cart with a scale designed to measure the weight of the grain deposited in the cart. In the context of remotely sensed data (e.g. satellite or drone imagery) this approach to confirming data with additional measurements taken on-farm is called ground truthing. 
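A comparison against an independent reference might look like the sketch below. The grain cart weight is a HYPOTHETICAL number invented for illustration; no cart data is reported for this field. The conversion assumes the commonly used standard bushel weight of 60 lb for wheat and ignores any moisture adjustment, so confirm both for your own crop and region.

```python
WHEAT_LB_PER_BUSHEL = 60.0  # assumed standard bushel weight for wheat

def cart_weight_to_bushels(total_weight_lb, lb_per_bushel=WHEAT_LB_PER_BUSHEL):
    """Convert a scale weight in pounds to bushels."""
    return total_weight_lb / lb_per_bushel

def percent_error(measured, reference):
    """Signed error of a measurement relative to a reference, in percent."""
    return (measured - reference) / reference * 100.0

# Reported totals from the yield report table (bu).
reported_total_yield = {"Farm Command": 8592.09, "New Holland": 9602.10}

# HYPOTHETICAL grain cart total, invented for illustration only.
cart_total_lb = 540_000.0
reference_bu = cart_weight_to_bushels(cart_total_lb)

for platform, total in reported_total_yield.items():
    error = percent_error(total, reference_bu)
    print(f"{platform}: {error:+.1f}% relative to the grain cart")
```

With a real cart weight in place of the placeholder, the platform with the smallest error would be the most accurate for this field, provided the cart scale itself has been calibrated.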

Ultimately, getting to the bottom of this data quality issue will require more observations and the close attention of the staff operating the harvesting equipment. This example shows the importance of thinking critically about the data you measure, not to mention the importance of taking an incremental approach to resolving data quality issues. 


