Usage

Pre-requisites

To access the data from Netatmo citizen weather stations (CWS), you need a Netatmo username, password, client ID and client secret. You can obtain these credentials by following the steps below:

  1. Create a Netatmo account at auth.netatmo.com/access/signup. The email address and password that you enter will be the username and password used in netatmo-geopy, respectively.
  2. From your account, navigate to dev.netatmo.com/apps/ and click "Create" to create an app. This only serves to obtain a client ID and client secret, so you do not need to enter any specific information in "app name" and "description".
  3. Once the app is created, save the generated "client ID" and "client secret", which appear in the form below (entitled "App Technical Parameters"). These are the client ID and client secret used in netatmo-geopy.

Features

First import netatmo-geopy as in:

import netatmo_geopy as nat

You can then use netatmo-geopy to get the CWS temperature measurements for a region of interest as in:

# latitude/longitude bounds of the region of interest
lon_sw, lat_sw, lon_ne, lat_ne = 6.5175, 46.5012, 6.7870, 46.6058

# init the CWS recorder
cws_recorder = nat.CWSRecorder(
    lon_sw,
    lat_sw,
    lon_ne,
    lat_ne,
    username="<your-netatmo-username>",
    password="<your-netatmo-password>",
    client_id="<your-netatmo-client-id>",
    client_secret="<your-netatmo-client-secret>",
)

Alternatively, instead of passing the Netatmo credentials to the initialization of CWSRecorder, you can set them as the NETATMO_USERNAME, NETATMO_PASSWORD, NETATMO_CLIENT_ID, and NETATMO_CLIENT_SECRET environment variables, and netatmo-geopy will use them automatically when required. To keep the code snippets concise, the remainder of this page assumes that the Netatmo credentials are provided through these environment variables.
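
Before relying on the environment variables, it can help to check that all four are actually set. The following is a minimal sketch using only the standard library; the helper name missing_netatmo_env_vars is hypothetical and is not part of netatmo-geopy:

```python
import os

# Variable names as listed in the paragraph above.
NETATMO_ENV_VARS = (
    "NETATMO_USERNAME",
    "NETATMO_PASSWORD",
    "NETATMO_CLIENT_ID",
    "NETATMO_CLIENT_SECRET",
)


def missing_netatmo_env_vars(environ=None):
    """Return the credential variables that are unset or empty.

    Hypothetical helper for illustration, not a netatmo-geopy function.
    """
    environ = os.environ if environ is None else environ
    return [name for name in NETATMO_ENV_VARS if not environ.get(name)]


missing = missing_netatmo_env_vars()
if missing:
    print("Missing Netatmo credentials:", ", ".join(missing))
```

Running this check once at the start of a script makes a missing credential visible before any request is attempted.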

Then, the current snapshot of CWS measurements in the region can be obtained as in:

gdf = cws_recorder.get_snapshot_gdf()
gdf.head()
                   2022-02-12T19:13                  geometry
station_id
02:00:00:01:5e:e0               6.6  POINT (6.82799 46.47089)
02:00:00:22:c0:c0               4.9  POINT (6.82904 46.47005)
02:00:00:2f:0b:16               3.5  POINT (6.82516 46.47294)
02:00:00:59:00:2a               3.8  POINT (6.84547 46.46779)
02:00:00:52:ed:5a               3.8  POINT (6.87359 46.47067)

You can also use the plot_snapshot function to plot the data on a map:

nat.plot_snapshot(gdf)

[Figure: lausanne-snapshot]

Schedule a periodic job to record CWS data for a region

It is possible to use netatmo-geopy to set up a periodic job to record CWS measurements. To that end, you need to provide the time_unit argument to the initialization of CWSRecorder, as in:

snapshot_data_dir = "data/lausanne"
cws_recorder = nat.CWSRecorder(
    lon_sw, lat_sw, lon_ne, lat_ne, dst_dir=snapshot_data_dir, time_unit="hour"
)

which will dump an hourly snapshot of CWS measurements to the directory specified with the dst_dir argument. The time_unit argument can be combined with the interval, at and until arguments, e.g., the following task will record the CWS measurements of the region at the 30th minute of every three hours for the next 24 hours:

from datetime import datetime, timedelta

cws_recorder = nat.CWSRecorder(
    lon_sw,
    lat_sw,
    lon_ne,
    lat_ne,
    dst_dir=snapshot_data_dir,
    time_unit="hours",
    interval=3,
    at=":30",
    until=datetime.now() + timedelta(hours=24),
)

See the documentation of schedule for more examples on scheduling periodic jobs.

Note that Netatmo CWS data are measured every 5 minutes by the modules and sent to the servers every 10 minutes, so the period when recording CWS data should not be shorter than 10 minutes.
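
The 10-minute lower bound can be checked before creating a recorder. The sketch below is a hypothetical standalone helper (not part of netatmo-geopy); the set of unit names it accepts is an assumption based on the "hour" and "hours" values used above:

```python
from datetime import timedelta

# Netatmo servers receive new data every 10 minutes (see the note above).
MIN_RECORDING_PERIOD = timedelta(minutes=10)

# Assumed unit names; only "hour" and "hours" appear on this page.
_UNIT_DURATIONS = {
    "second": timedelta(seconds=1),
    "seconds": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "minutes": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "hours": timedelta(hours=1),
    "day": timedelta(days=1),
    "days": timedelta(days=1),
}


def recording_period(time_unit, interval=1):
    """Return the time between two snapshots as a timedelta."""
    return _UNIT_DURATIONS[time_unit] * interval


def is_period_long_enough(time_unit, interval=1):
    """Check that the period is not shorter than the server update rate."""
    return recording_period(time_unit, interval) >= MIN_RECORDING_PERIOD


print(is_period_long_enough("hours", 3))    # every three hours
print(is_period_long_enough("minutes", 5))  # shorter than 10 minutes
```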

Assemble CWS snapshots into a single time-series geo-data frame

After a time series of snapshots has been dumped to a directory, the CWSDataset class can be used to assemble the data into a single geo-data frame, available as the ts_gdf attribute:

cws_dataset = nat.CWSDataset(snapshot_data_dir=snapshot_data_dir)
cws_dataset.ts_gdf.head()
                   2022-02-06 09:30:00  2022-02-06 12:30:00  2022-02-06 15:30:00  2022-02-06 18:30:00  2022-02-06 21:30:00  2022-02-07 00:30:00  2022-02-07 03:30:00  2022-02-07 06:30:00                  geometry
station_id
02:00:00:01:5e:e0             1.542567             4.815597             0.550991             0.948516             2.600634             0.312831             3.088689             3.442664  POINT (6.82799 46.47089)
02:00:00:22:c0:c0             4.093453             2.291656             0.258319             3.346670             4.571841             2.299931             0.447544             4.558038  POINT (6.82904 46.47005)
02:00:00:2f:0b:16             1.588176             4.521104             3.060942             1.931824             3.027879             2.567090             1.326534             0.043705  POINT (6.82516 46.47294)
02:00:00:59:00:2a             0.452659             2.443335             2.270666             0.867035             3.965786             2.200247             3.443507             1.314949  POINT (6.84547 46.46779)
02:00:00:52:ed:5a             1.022992             1.795367             1.099024             2.775641             1.663362             1.033040             1.875658             1.031009  POINT (6.87359 46.47067)
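
To illustrate the layout of this frame (one row per station, one column per snapshot time), here is a library-independent sketch that pivots a set of snapshots into per-station series. It is not how CWSDataset is implemented; the function name and the dictionary-based input format are assumptions made for illustration:

```python
def assemble_time_series(snapshots):
    """Pivot {timestamp: {station_id: value}} into per-station series.

    Returns the sorted timestamps and a {station_id: [values]} mapping in
    which each list follows the timestamp order. A station missing from a
    snapshot gets None at that position.
    """
    timestamps = sorted(snapshots)
    station_ids = sorted({sid for snap in snapshots.values() for sid in snap})
    series = {
        sid: [snapshots[ts].get(sid) for ts in timestamps]
        for sid in station_ids
    }
    return timestamps, series


# Toy input reusing two station ids and values from the table above.
snapshots = {
    "2022-02-06 12:30:00": {"02:00:00:01:5e:e0": 4.815597},
    "2022-02-06 09:30:00": {
        "02:00:00:01:5e:e0": 1.542567,
        "02:00:00:22:c0:c0": 4.093453,
    },
}
timestamps, series = assemble_time_series(snapshots)
print(timestamps)
print(series)
```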

Quality controls

To ensure the quality and reliability of the collected CWS temperature measurements, the CWSDataset class implements three quality control methods based on the work of Napoly et al. (2018) [1].

Duplicated station locations

First, multiple stations may share the same location, which is likely due to an incorrect setup that led to automatic location assignment based on the IP address of the wireless network. Such stations can be identified with the get_mislocated_stations method as in:

mislocated_stations = cws_dataset.get_mislocated_stations()
mislocated_stations.head()
station_id
02:00:00:01:5e:e0    False
02:00:00:22:c0:c0    False
02:00:00:2f:0b:16    False
02:00:00:59:00:2a    False
02:00:00:52:ed:5a    False
Name: geometry, dtype: bool

Then, pandas boolean indexing can be used to filter out the mislocated stations from the time series geo-data frame as in:

cws_dataset.ts_gdf[~mislocated_stations]
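
The idea behind this check can be sketched without any geospatial library: count how many stations report each coordinate pair and flag every station whose location is shared. This is an illustration under the assumption that locations are compared for exact equality; it is not the implementation of get_mislocated_stations, and the third station id below is made up for the example:

```python
from collections import Counter


def duplicated_location_flags(station_coords):
    """Map each station id to True if another station shares its location.

    station_coords: {station_id: (lon, lat)}
    """
    counts = Counter(station_coords.values())
    return {sid: counts[xy] > 1 for sid, xy in station_coords.items()}


station_coords = {
    "02:00:00:01:5e:e0": (6.82799, 46.47089),
    "02:00:00:22:c0:c0": (6.82904, 46.47005),
    # Hypothetical station reusing the first station's coordinates.
    "ff:ff:ff:ff:ff:ff": (6.82799, 46.47089),
}
flags = duplicated_location_flags(station_coords)
print(flags)
```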

Outlier stations

Measurements can show suspicious deviations from a normal distribution. Stations with a high occurrence of such measurements may be affected by radiative errors in non-shaded areas or by other measurement errors [2]. A boolean series of stations that may be considered outliers (based on a modified z-score using robust Qn variance estimators) can be obtained with the get_outlier_stations method as in:

outlier_stations = cws_dataset.get_outlier_stations()

The statistical parameters used to determine whether a station is considered an outlier can be customized using the low_alpha, high_alpha and station_outlier_threshold arguments.
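
To make the role of these parameters concrete, the following standard-library sketch computes a modified z-score with the Qn scale estimator for the measurements of a single snapshot, flags values outside the normal quantiles given by low_alpha and high_alpha, and marks a station as an outlier when its share of flagged measurements exceeds a threshold. It is a simplified illustration, not the code of get_outlier_stations, and the default values (0.01, 0.95 and 0.2) are assumptions rather than the library's documented defaults:

```python
from itertools import combinations
from math import comb
from statistics import NormalDist, median

# Commonly quoted factor that makes Qn comparable to the standard
# deviation for normally distributed data.
QN_CONSISTENCY = 2.2219


def qn_scale(values):
    """Qn scale: scaled k-th smallest pairwise absolute difference."""
    n = len(values)
    k = comb(n // 2 + 1, 2)
    diffs = sorted(abs(a - b) for a, b in combinations(values, 2))
    return QN_CONSISTENCY * diffs[k - 1]


def robust_z_scores(values):
    """Modified z-scores: distance to the median in units of Qn."""
    scale = qn_scale(values)
    if scale == 0:
        raise ValueError("Qn is zero; too many identical measurements")
    center = median(values)
    return [(v - center) / scale for v in values]


def flag_outliers(values, low_alpha=0.01, high_alpha=0.95):
    """Flag values whose z-score lies outside the normal quantiles."""
    norm = NormalDist()
    low, high = norm.inv_cdf(low_alpha), norm.inv_cdf(high_alpha)
    return [z < low or z > high for z in robust_z_scores(values)]


def station_is_outlier(flags_over_time, station_outlier_threshold=0.2):
    """A station is an outlier if too many of its measurements are flagged."""
    return sum(flags_over_time) / len(flags_over_time) > station_outlier_threshold


# One toy snapshot: five plausible temperatures and one suspicious value.
snapshot = [3.5, 3.8, 3.8, 4.9, 6.6, 25.0]
print(flag_outliers(snapshot))
```

Because the bounds are asymmetric (a lower quantile of 0.01 and an upper quantile of 0.95 in this sketch), unusually warm measurements are flagged more readily than unusually cold ones, which matches the concern about radiative errors described above.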

Indoor stations

Finally, stations whose time series of measurements show low correlations with the spatial median time series are likely set up indoors. These stations can be determined using the get_indoor_stations method as in:

indoor_stations = cws_dataset.get_indoor_stations()

The Pearson correlation threshold to consider that a station is set up indoors can be customized using the station_indoor_corr_threshold argument.
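
The check can be sketched in plain Python: build the spatial median time series across all stations, correlate each station's series with it, and flag stations below the threshold. This is an illustration only, not the implementation of get_indoor_stations; the threshold of 0.9 and the station names are assumptions chosen for the toy data:

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spatial_median_series(station_series):
    """Median across stations at each time step."""
    return [median(values) for values in zip(*station_series.values())]


def indoor_flags(station_series, station_indoor_corr_threshold=0.9):
    """Flag stations that correlate weakly with the spatial median."""
    reference = spatial_median_series(station_series)
    return {
        sid: pearson(series, reference) < station_indoor_corr_threshold
        for sid, series in station_series.items()
    }


# Toy data: three stations follow the same daily cycle, one barely changes.
station_series = {
    "outdoor_a": [1.0, 3.0, 6.0, 4.0, 2.0],
    "outdoor_b": [1.5, 3.5, 6.5, 4.5, 2.5],
    "outdoor_c": [0.5, 2.8, 5.9, 4.2, 1.8],
    "indoor": [20.2, 20.0, 19.8, 20.0, 20.2],
}
print(indoor_flags(station_series))
```

Note that a perfectly constant series has zero variance, so its correlation is undefined; this sketch does not handle that case.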

Combining quality controls

In order to apply all quality controls, the boolean indexes described above can be combined as in:

ts_qc_gdf = cws_dataset.ts_gdf[~(mislocated_stations | outlier_stations | indoor_stations)]

References


  1. Adrien Napoly, Tom Grassmann, Fred Meier, and Daniel Fenner. Development and application of a statistically-based quality control for crowdsourced air temperature data. Frontiers in Earth Science, 6:118, 2018.

  2. Fred Meier, Daniel Fenner, Tom Grassmann, Marco Otto, and Dieter Scherer. Crowdsourcing air temperature from citizen weather stations for urban climate research. Urban Climate, 19:170–191, 2017.