Automated QC of Environmental Data

Part 2: A Quality Control API

This is the second in a two-part series on automated quality control of environmental data. The first part gives an overview of quality control with some specific methods for environmental data. In this post, I describe how I created an API (api.crceanalytics.com) to automatically flag uploaded data using some of the techniques described in Part 1.

Motivation

A couple of years ago I began work on a Python package, EnviroDataQC (https://github.com/chrisrycx/EnviroDataQC), that I could use to perform automated quality control on meteorological data being uploaded to DyaconLive, a web portal for Dyacon weather stations. The library didn’t need to be overly complex; I just wanted a way to perform range and behavioral checks in a few different contexts without having to rewrite quality control code every time. Recently, I thought it might be interesting to build an API that would connect to EnviroDataQC, creating a means for anyone, anywhere to quickly perform data QC. I built a prototype limited to air temperature data at api.crceanalytics.com and gave it a simple, interactive front end. I think the concept could be taken a lot further eventually. ...

February 3, 2022 · 7 min · Chris Cox

Automated Quality Control of Environmental Data

Part 1: Quality Control Background

Overview

This is the first of a two-part series on automated quality control of environmental data. This part gives an overview of quality control methods and the second part (under development) details how I used Python to create a demonstration API (api.crceanalytics.com) that performs automated QC on uploaded air temperature data.

Environmental Data Quality Control

Data always needs some form of quality control after it is collected. This is especially true of environmental data collected from autonomous sensors placed in challenging environments. Sensors often break or experience electrical problems that lead to gaps in data and/or anomalous readings. Furthermore, bad data may result from calibration or maintenance performed while the sensor is operating. Preventative maintenance and ongoing data monitoring, practices known as quality assurance (Campbell et al., 2013), help to reduce the amount of problematic data collected. However, as sensor technology gets cheaper, much more data is collected, and manual methods for identifying suspicious and bad data become increasingly difficult and subjective (Jones et al., 2018). Automation of data quality control is going to be increasingly important for ensuring high quality environmental data. ...

January 14, 2022 · 8 min · Chris Cox