What Makes Good Ag Big Data?

Agriculture technologies using data to offer farmers insights on their operations are gathering pace. In 2015 precision agriculture startups, which all use data in some way, raised $661 million, according to AgFunder’s annual agtech investing report. They ranged from data collection hardware devices and drones to pure software companies offering data analysis and decision support tools. (We explored the offerings of eight different big data companies in a story last year.)

In 2016, companies in the segment will no doubt continue to raise capital from VCs as the farm becomes increasingly digitized. But there’s still a long way to go before technologies using big data become commonplace on the farm.

One of the issues facing the segment is the standardization of data across different technologies and applications. AgGateway, a non-profit business consortium, is currently developing a set of data standards for the whole range of companies using or selling ag big data. This is particularly important for software-only companies, which plan to accumulate and use data from several different sources.

As you will have read in our story from last year, there are some integrated big data companies offering both sides of the equation: data collection, and decision support software analyzing the data. As one farmer and agtech entrepreneur told AgFunderNews, these integrated offerings can reach the market faster than those reliant on data feed partnerships and standardization with third parties. Current partnership activity in precision ag also points to this, as we reported earlier this week.

But as Wade Barnes, CEO of Farmers Edge, and John Corbett, CEO of aWhere, have told AgFunderNews in the past, the technology is only as good as the data it collects. This is another big challenge facing the subsector: how to ensure that the data being collected and analyzed are reliable and accurate.

We spoke to three of the most established ag data companies to find out more about their methods of data collection and how they think their data will set them apart from others.

Farmers Edge, the Canadian precision ag company which recently raised $41 million, collects data from its own sensors attached to farm machinery, dubbed the CanPlug. It acquired this technology in 2014. It also installs weather stations every 2,500 acres as part of its effort to digitize the farm.

aWhere is an agronomic weather data platform, which has started sharing its data with third parties such as MyAgCentral, a cloud-based data platform for growers, and more recently AgVerdict, the strategic decision-making toolkit from Wilbur-Ellis. aWhere’s CEO John Corbett prides himself on the quality, depth, and locality of the weather data his stations provide.

FarmLink, an ag data analytics company which recently raised $24.6 million in a Series C round, collects data from sensors attached to the machinery it leases out. Like the others, it also uses data from public sources such as the USDA and satellite imagery.

What makes good data?

Farmers Edge CEO Wade Barnes: While other industry players are relying on pre-existing data pools, Farmers Edge is creating new, high-quality data sets in regions that have traditionally lacked accurate, valid, reliable, complete, unbiased and consistent field intelligence. A large part of our value proposition is equipping growers with a rich, temporal-geospatial data collection and analytics platform that optimizes data quality. Our independent farm management platform collects real-time, field-centric information, processes it and then delivers to growers the highest-quality and most accurate data in the industry. Through our proprietary, patented platform, we create management zones and provide real-time data at the field level, enabling highly precise, predictive models that support growers’ decisions around a number of essential variables, including crop stages, field operations, pest and disease pressure, logistics management, and soil needs and nutrient requirements.

aWhere CEO John Corbett: For us, and for any part of the agricultural value chain that we provide data to, the data must meet the 4 C’s to ensure its value: Current, Correct, Complete and Consistent. First, data must be Current and Correct; these are the first two metrics that define high-quality data. The data must be Current so that the signal is actionable when the customer wants it and, of course, it must be correct. The data must also be Complete and Consistent. Complete means the data cannot be lacking in a key attribute. For agriculture, all ag-meteorology variables are needed, including temperature, rainfall, humidity, solar radiation and wind. Other themes require different variables, but clients need a complete solution set. Consistency covers on-demand availability and accessibility. This means providing standard access across the industry (i.e., a robust API), responding to client issues, and never breaking access, keeping service at 100 percent.
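Corbett's first three C's can be read as checks applied to each incoming weather record. The Python sketch below is a minimal illustration, not aWhere's implementation: the variable names, the 24-hour freshness window and the plausibility ranges are all assumptions chosen for the example, and a simple range check is only a crude stand-in for "Correct", which in practice means comparison against independent data. Consistency, in Corbett's sense of reliable API access, is a property of the service rather than of a single record, so it is not coded here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field names for the ag-met variables Corbett lists.
REQUIRED_VARIABLES = (
    "temperature_c",
    "rainfall_mm",
    "humidity_pct",
    "solar_radiation_mj_m2",
    "wind_speed_m_s",
)

# Hypothetical plausibility ranges, standing in for a real "Correct" check.
PLAUSIBLE_RANGES = {
    "temperature_c": (-60.0, 60.0),
    "rainfall_mm": (0.0, 500.0),
    "humidity_pct": (0.0, 100.0),
    "solar_radiation_mj_m2": (0.0, 45.0),
    "wind_speed_m_s": (0.0, 75.0),
}


def check_record(record, observed_at, now, max_age=timedelta(hours=24)):
    """Return a list of problems found in one daily weather record.

    An empty list means the record passed the Current, Complete and
    (approximate) Correct checks.
    """
    problems = []
    # Current: the observation must be recent enough to act on.
    if now - observed_at > max_age:
        problems.append("stale")
    for name in REQUIRED_VARIABLES:
        value = record.get(name)
        # Complete: every required variable must be present.
        if value is None:
            problems.append(f"missing {name}")
            continue
        # Correct (approximated): the value must lie in a plausible range.
        low, high = PLAUSIBLE_RANGES[name]
        if not low <= value <= high:
            problems.append(f"implausible {name}")
    return problems


now = datetime(2016, 5, 2, 12, tzinfo=timezone.utc)
record = {"temperature_c": 18.5, "rainfall_mm": 4.2,
          "humidity_pct": 71.0, "solar_radiation_mj_m2": 19.3}
print(check_record(record, now - timedelta(hours=3), now))
# ['missing wind_speed_m_s']
```

Returning a list of named problems, rather than a single pass/fail flag, makes it easy to see which of the C's a record failed.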

FarmLink CEO Ron LeMay: Imagine you are visiting Washington, D.C., and want to see the White House. You enter “1600 Pennsylvania Ave SE” into your GPS instead of the intended “1600 Pennsylvania Ave NW”, and instead of the President’s home, you find an apartment building. The lesson? One slight error can lead to significantly different outcomes, and in agriculture, potentially much costlier ones. Good isn’t good enough when you are making decisions that directly impact the bottom line. Agriculture must demand high-quality data, not just good data. High-quality data must be reliable enough to justify important decisions; that is what makes it actionable. To make data actionable, extraordinary care should be taken across the board, from data collection through quality assurance to management, with rigorous checks at each step. In our first year, we discarded the majority of our data because it lacked the quality producers need as a foundation for their big decisions. Fast forward to last year: more than 95 percent of collected data met our standard for being actionable. The adage “garbage in, garbage out” rings true. By accessing data they trust, farmers can gain deeper insights to increase yield and profitability while reducing risk. Poor-quality data is not only a financial risk for farmers but could also slow adoption of the precision agriculture tools and practices needed to have a timely and lasting impact on global productivity.

How do you vet the quality of your data?

Farmers Edge: We assure data quality through proactive governance. The Farmers Edge data Quality Assurance (QA) program begins in the field and is applied across our entire business platform, from equipment calibration to automated data collection methods, data storage schemes, application code algorithms and data science applications. By comparison, other data QA programs we’ve come across can be less rigorous, relying only on basic integrity checks: removing duplicates from the data set, standardizing field values and detecting null values. Our data QA is strict and intensive, and it is scaling globally.
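The baseline checks Barnes contrasts with the Farmers Edge program (removing duplicates, standardizing field values and detecting nulls) can be sketched in a few lines of Python. This is a hypothetical illustration, not Farmers Edge code: the record keys and the crop alias table are assumptions made for the example.

```python
# Hypothetical alias table used to standardize free-text crop names.
CROP_ALIASES = {"maize": "corn", "rapeseed": "canola"}


def basic_integrity_checks(rows):
    """Standardize crop names, set aside rows with nulls, drop exact duplicates.

    rows: list of dicts with hypothetical keys such as 'field_id', 'date',
    'crop' and 'yield_bu_ac'. Returns (clean_rows, rows_with_nulls).
    The input dicts are not modified.
    """
    seen = set()
    clean, with_nulls = [], []
    for row in rows:
        # Standardize field values: trim, lowercase, map known aliases.
        crop = row.get("crop")
        if isinstance(crop, str):
            crop = crop.strip().lower()
            crop = CROP_ALIASES.get(crop, crop)
        row = {**row, "crop": crop}
        # Null detection: rows with any missing value are set aside.
        if any(value is None for value in row.values()):
            with_nulls.append(row)
            continue
        # Duplicate removal: rows identical after standardizing count once.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        clean.append(row)
    return clean, with_nulls


rows = [
    {"field_id": "F1", "date": "2015-09-20", "crop": " Maize", "yield_bu_ac": 182.0},
    {"field_id": "F1", "date": "2015-09-20", "crop": "corn", "yield_bu_ac": 182.0},
    {"field_id": "F2", "date": "2015-09-21", "crop": "Rapeseed", "yield_bu_ac": None},
]
clean, with_nulls = basic_integrity_checks(rows)
print(len(clean), len(with_nulls))  # 1 1
```

Standardizing before deduplicating matters: " Maize" and "corn" only collapse into one record once both are mapped to the same label.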

aWhere: We test our data against independent data with similar characteristics and attributes. Where aWhere agricultural intelligence distinguishes itself is in the specific targets we focus on. Our quality tests target agricultural issues and agricultural business questions by geography and time. We optimize our data for the growing season, for agricultural areas and for agriculturally sensitive thresholds: the decision-driving information. For example, we examine the accuracy of temperature data through the lens of ‘growing degree days’ and then test it against physiological observations in the field. Do models predict pomological maturity accurately? Does our temperature data drive growth stage models so that field checks of the models line up? This is an important point because quality is not defined first, or only, through comparison to a ground station; ground stations have their own quality issues, from sensor calibration to physical location. Yes, we compare to ground stations, but we do so through the lens of an agricultural decision. Is it time to irrigate? To spray at a critical growth stage? These are the decision-driving questions for which accuracy and data quality must be vetted for agricultural operations.
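Growing degree days (GDD) measure heat accumulated above a crop-specific base temperature, which is why they make a useful lens on temperature accuracy: errors in daily temperatures shift the predicted date of each growth stage. aWhere does not say which formula it uses. The sketch below assumes the common modified average method with a 10 °C base and a 30 °C cap (the 86/50 °F convention often used for corn). The stage threshold passed to `first_day_reaching` is hypothetical; the idea is that a predicted stage date can then be checked against what is actually observed in the field.

```python
def daily_gdd(t_max, t_min, t_base=10.0, t_cap=30.0):
    """Growing degree days (deg C) for one day, modified average method.

    Both temperatures are clamped to [t_base, t_cap] before averaging,
    so a single day never contributes a negative value.
    """
    t_max = min(max(t_max, t_base), t_cap)
    t_min = min(max(t_min, t_base), t_cap)
    return (t_max + t_min) / 2.0 - t_base


def accumulated_gdd(daily_temps, t_base=10.0, t_cap=30.0):
    """Running GDD totals for a sequence of (t_max, t_min) pairs."""
    totals, total = [], 0.0
    for t_max, t_min in daily_temps:
        total += daily_gdd(t_max, t_min, t_base, t_cap)
        totals.append(total)
    return totals


def first_day_reaching(totals, threshold):
    """1-based day on which accumulated GDD first meets a stage threshold."""
    for day, total in enumerate(totals, start=1):
        if total >= threshold:
            return day
    return None


totals = accumulated_gdd([(28.0, 14.0), (33.0, 16.0), (12.0, 4.0)])
print(totals)                          # [11.0, 24.0, 25.0]
print(first_day_reaching(totals, 20))  # 2
```

On day 2 the 33 °C maximum is capped at 30 °C, and on day 3 the 4 °C minimum is raised to the 10 °C base; both clamps are what make the method "modified".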

For rainfall, we focus on the 2 mm to 35 mm range over a 24-hour period. We optimize our algorithms and our interpretation of satellite data to be most accurate in this range during the growing season. Rainfall above this threshold largely becomes surface runoff, so soil moisture does not increase further. The quality of ag-met data depends on the business question being studied, which is why we optimize our algorithms to be accurate around the key thresholds that support agricultural business questions.
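One way to read this is as a scoring rule: judge rainfall estimates mainly on the days that matter for decisions. The sketch below is an assumption about how such an evaluation might look, not aWhere's algorithm. It classifies 24-hour totals against the quoted 2 mm to 35 mm band and computes a mean absolute error against a reference gauge, counting only days on which the gauge reading falls inside the band.

```python
# Band quoted by aWhere: 2 mm to 35 mm of rain in a 24-hour period.
BAND_LOW_MM = 2.0
BAND_HIGH_MM = 35.0


def classify_daily_rainfall(mm):
    """Place a 24-hour rainfall total relative to the decision band."""
    if mm < 0:
        raise ValueError("rainfall cannot be negative")
    if mm < BAND_LOW_MM:
        return "below band"
    if mm <= BAND_HIGH_MM:
        return "in band"
    return "above band (runoff-dominated)"


def band_mae(estimates_mm, gauge_mm):
    """Mean absolute error of estimates, scored only on days when the
    reference gauge reading falls inside the decision band.
    Returns None if no day qualifies."""
    errors = [abs(est - ref) for est, ref in zip(estimates_mm, gauge_mm)
              if BAND_LOW_MM <= ref <= BAND_HIGH_MM]
    return sum(errors) / len(errors) if errors else None


print(classify_daily_rainfall(12.0))  # in band
print(band_mae([0.5, 10.0, 20.0, 60.0], [0.0, 12.0, 19.0, 80.0]))  # 1.5
```

In the example, the 20 mm miss on the 80 mm storm day does not count against the estimate, which mirrors the argument that extra rain beyond the band mostly runs off.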

FarmLink: Our quality control starts at the point of collection, including a 300-point inspection, 28 points of which affect yield readings, to ensure the data collected in real time are accurate. Our process compensates for operator inconsistencies, combine calibration errors and land variability. Quality assurance and validation continue as our data move from the field into our proprietary data set. From there, we combine machine learning with human insight to model the impact of environmental variables and farming practices on yield. The resulting analytics pinpoint what drives productivity in the most profitable and sustainable manner in a specific field.
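The answer above does not say how FarmLink corrects for calibration errors or operator inconsistencies. A widely used approach to yield monitor calibration is to compare monitor-reported weights with weighed loads (scale tickets) and rescale readings by the ratio; points logged during turns or stops are also commonly filtered out. The sketch below illustrates both steps under assumed inputs; the data format and the ground-speed window are hypothetical.

```python
def calibration_factor(scale_weights, monitor_weights):
    """Ratio of weighed (scale ticket) grain to what the yield monitor
    reported for the same calibration loads."""
    return sum(scale_weights) / sum(monitor_weights)


def clean_yield_points(points, factor, min_speed_kmh=3.0, max_speed_kmh=12.0):
    """Drop points logged at implausible ground speeds, then rescale yields.

    points: list of (yield_bu_ac, speed_kmh) tuples (hypothetical format).
    The speed window is a hypothetical stand-in for filtering readings taken
    while the operator was turning, stopping or accelerating.
    """
    return [
        yield_bu_ac * factor
        for yield_bu_ac, speed_kmh in points
        if min_speed_kmh <= speed_kmh <= max_speed_kmh
    ]


factor = calibration_factor([9500.0, 9500.0], [10000.0, 10000.0])
print(factor)  # 0.95
print(clean_yield_points([(200.0, 7.5), (40.0, 1.2), (180.0, 8.0)], factor))
```

Here the monitor over-reported by about 5 percent, so every retained reading is scaled down, and the point logged at 1.2 km/h is discarded before scaling.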

How do you curate data?

Farmers Edge: Our platform analyzes and curates data from a variety of sources, including weather conditions, farm equipment and soil, and routinely processes query results that can contain up to hundreds of billions of unique pieces of information. As the world’s leader in successfully integrating multi- and hyperspectral satellite and aerial imagery, our system can process terabytes and even petabytes of data, which allows us to create variable rate prescriptive models and to monitor overall crop health. Farmers Edge uses and has mastered the kinds of information systems that power the largest websites and data-mining services in the world, including distributed file systems, structured and unstructured data storage, dynamic data retrieval methods and many advanced statistical modeling approaches, such as Markov Chain Monte Carlo methods.

aWhere: As an online ‘solution as a service’ organization, we consider data curation important. As time passes, data that were current and actionable ‘this season’ become historical, yet fast, on-demand access remains important as risk, trends and environmental change over time become the questions our data address. Our curation therefore includes not only ever-expanding access and evolving API models but also a reference to the source and algorithm involved in creating the data.
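Keeping a reference to the source and algorithm behind each value amounts to storing provenance alongside the data, so that a historical query returns not just numbers but where they came from. A minimal sketch, assuming a hypothetical schema and made-up station, product and algorithm identifiers:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One data point that carries its own provenance (hypothetical schema)."""
    location_id: str
    day: date
    variable: str
    value: float
    source: str     # where the raw input came from, e.g. a station or satellite product
    algorithm: str  # name and version of the algorithm that produced the value


def history(observations, location_id, variable, start, end):
    """Date-ordered series for one location and variable, provenance intact."""
    return sorted(
        (obs for obs in observations
         if obs.location_id == location_id
         and obs.variable == variable
         and start <= obs.day <= end),
        key=lambda obs: obs.day,
    )


archive = [
    Observation("loc-1", date(2015, 6, 2), "rainfall_mm", 8.1, "satellite-A", "rain-est v2"),
    Observation("loc-1", date(2015, 6, 1), "rainfall_mm", 3.4, "station-17", "qc v1"),
    Observation("loc-2", date(2015, 6, 1), "rainfall_mm", 0.0, "station-9", "qc v1"),
]
series = history(archive, "loc-1", "rainfall_mm", date(2015, 6, 1), date(2015, 6, 30))
for obs in series:
    print(obs.day, obs.value, obs.source, obs.algorithm)
```

Making the record immutable (`frozen=True`) reflects the idea that historical values should not be silently overwritten; a reprocessed value would be stored as a new observation with a new algorithm version.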

FarmLink: FarmLink analysis is based on a complex recipe of private and public data sources, tested and retested, to produce unique indices with an accuracy level greater than 90 percent. We combine our proprietary data with 20 years of information from public data providers including USDA, NRCS, NOAA and others. Our data platform has been validated by collaborations with industry leaders to deliver highly sophisticated tools for top producers and the industry.

What do you think is the next big horizon for ag big data? How do you see it growing as a segment during 2016? We’d love to hear from you, especially if you are a farmer using a data technology! Send your two cents to: [email protected].
