Creating a CF-1.6 timeSeries using pocean

Creating a CF-1.6 timeSeries using pocean#

Created: 2018-02-27

IOOS recommends to data providers that their netCDF files follow the CF-1.6 standard. In this notebook we will create a CF-1.6 compliant file that follows file that follows the Discrete Sampling Geometries (DSG) of a timeSeries from a pandas DataFrame.

The pocean module can handle all the DSGs described in the CF-1.6 document: point, timeSeries, trajectory, profile, timeSeriesProfile, and trajectoryProfile. These DSGs array may be represented in the netCDF file as:

  • orthogonal multidimensional: when the coordinates along the element axis of the features are identical;

  • incomplete multidimensional: when the features within a collection do not all have the same number but space is not an issue and using longest feature to all features is convenient;

  • contiguous ragged: can be used if the size of each feature is known;

  • indexed ragged: stores the features interleaved along the sample dimension in the data variable.

Here we will use the orthogonal multidimensional array to represent time-series data from am hypothetical current meter. We’ll use fake data for this example for convenience.

Our fake data represents a current meter located at 10 meters depth collected last week.

from datetime import datetime, timedelta

import numpy as np
import pandas as pd

x = np.arange(100, 110, 0.1)
start = datetime.now() - timedelta(days=7)

df = pd.DataFrame(
    {
        "time": [start + timedelta(days=n) for n in range(len(x))],
        "longitude": -48.6256,
        "latitude": -27.5717,
        "depth": 10,
        "u": np.sin(x),
        "v": np.cos(x),
        "station": "fake buoy",
    }
)


df.tail()
time longitude latitude depth u v station
95 2022-05-10 15:18:50.687502 -48.6256 -27.5717 10 0.440129 -0.897934 fake buoy
96 2022-05-11 15:18:50.687502 -48.6256 -27.5717 10 0.348287 -0.937388 fake buoy
97 2022-05-12 15:18:50.687502 -48.6256 -27.5717 10 0.252964 -0.967476 fake buoy
98 2022-05-13 15:18:50.687502 -48.6256 -27.5717 10 0.155114 -0.987897 fake buoy
99 2022-05-14 15:18:50.687502 -48.6256 -27.5717 10 0.055714 -0.998447 fake buoy

Let’s take a look at our fake data.

%matplotlib inline


import matplotlib.pyplot as plt
from oceans.plotting import stick_plot

q = stick_plot([t.to_pydatetime() for t in df["time"]], df["u"], df["v"])

ref = 1
qk = plt.quiverkey(
    q, 0.1, 0.85, ref, f"{ref} m s$^{-1}$", labelpos="N", coordinates="axes"
)

plt.xticks(rotation=70)
(array([19024., 19038., 19052., 19066., 19083., 19097., 19113., 19127.]),
 [Text(0, 0, ''),
  Text(0, 0, ''),
  Text(0, 0, ''),
  Text(0, 0, ''),
  Text(0, 0, ''),
  Text(0, 0, ''),
  Text(0, 0, ''),
  Text(0, 0, '')])
../../../_images/3693c31daccfd03e68828196d0d9f036fc5849ce723ea9033a78dffe01f20017.png

pocean.dsg is relatively simple to use. The user must provide a DataFrame, like the one above, and a dictionary of attributes that maps to the data and adhere to the DSG conventions desired.

Because we want the file to work seamlessly with ERDDAP we also added some ERDDAP specific attributes like cdm_timeseries_variables, and subsetVariables.

attributes = {
    "global": {
        "title": "Fake mooring",
        "summary": "Vector current meter ADCP @ 10 m",
        "institution": "Restaurant at the end of the universe",
        "cdm_timeseries_variables": "station",
        "subsetVariables": "depth",
        # These are only the required attributions from
        # https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html#attribution
        "creator_country": "USA",
        "creator_email": "fake_email@somedomain.org",
        "creator_institution": "IOOS",
        "creator_sector": "academic",
        "creator_url": "https://ioos.github.io/ioos_code_lab/content/intro.html",
        "publisher_country": "USA",
        "publisher_email": "fake_email@somedomain.org",
        "publisher_institution": "IOOS",
        "publisher_url": "https://ioos.github.io/ioos_code_lab/content/intro.html",
    },
    "longitude": {
        "units": "degrees_east",
        "standard_name": "longitude",
    },
    "latitude": {
        "units": "degrees_north",
        "standard_name": "latitude",
    },
    "z": {
        "units": "m",
        "standard_name": "depth",
        "positive": "down",
    },
    "u": {
        "units": "m/s",
        "standard_name": "eastward_sea_water_velocity",
    },
    "v": {
        "units": "m/s",
        "standard_name": "northward_sea_water_velocity",
    },
    "station": {"cf_role": "timeseries_id"},
}

We also need to map the our data axes to pocean’s defaults. This step is not needed if the data axes are already named like the default ones.

axes = {"t": "time", "x": "longitude", "y": "latitude", "z": "depth"}
from pocean.dsg.timeseries.om import OrthogonalMultidimensionalTimeseries
from pocean.utils import downcast_dataframe

df = downcast_dataframe(df)  # safely cast depth np.int64 to np.int32
dsg = OrthogonalMultidimensionalTimeseries.from_dataframe(
    df,
    output="fake_buoy.nc",
    attributes=attributes,
    axes=axes,
)

The OrthogonalMultidimensionalTimeseries saves the DataFrame into a CF-1.6 TimeSeries DSG.

!ncdump -h fake_buoy.nc
netcdf fake_buoy {
dimensions:
	station = 1 ;
	time = 100 ;
variables:
	int crs ;
	double time(time) ;
		time:units = "seconds since 1990-01-01 00:00:00Z" ;
		time:standard_name = "time" ;
		time:axis = "T" ;
	string station(station) ;
		station:cf_role = "timeseries_id" ;
		station:long_name = "station identifier" ;
	double latitude(station) ;
		latitude:axis = "Y" ;
		latitude:units = "degrees_north" ;
		latitude:standard_name = "latitude" ;
	double longitude(station) ;
		longitude:axis = "X" ;
		longitude:units = "degrees_east" ;
		longitude:standard_name = "longitude" ;
	int depth(station) ;
		depth:_FillValue = -9999 ;
		depth:axis = "Z" ;
	double u(station, time) ;
		u:_FillValue = -9999.9 ;
		u:units = "m/s" ;
		u:standard_name = "eastward_sea_water_velocity" ;
		u:coordinates = "time depth longitude latitude" ;
	double v(station, time) ;
		v:_FillValue = -9999.9 ;
		v:units = "m/s" ;
		v:standard_name = "northward_sea_water_velocity" ;
		v:coordinates = "time depth longitude latitude" ;

// global attributes:
		:Conventions = "CF-1.6" ;
		:date_created = "2022-02-11T18:18:00Z" ;
		:featureType = "timeseries" ;
		:cdm_data_type = "Timeseries" ;
		:title = "Fake mooring" ;
		:summary = "Vector current meter ADCP @ 10 m" ;
		:institution = "Restaurant at the end of the universe" ;
		:cdm_timeseries_variables = "station" ;
		:subsetVariables = "depth" ;
		:creator_country = "USA" ;
		:creator_email = "fake_email@somedomain.org" ;
		:creator_institution = "IOOS" ;
		:creator_sector = "academic" ;
		:creator_url = "https://ioos.github.io/ioos_code_lab/content/intro.html" ;
		:publisher_country = "USA" ;
		:publisher_email = "fake_email@somedomain.org" ;
		:publisher_institution = "IOOS" ;
		:publisher_url = "https://ioos.github.io/ioos_code_lab/content/intro.html" ;
}

It also outputs the dsg object for inspection. Let us check a few things to see if our objects was created as expected. (Note that some of the metadata was “free” due t the built-in defaults in pocean.

dsg.getncattr("featureType")
'timeseries'
type(dsg)
pocean.dsg.timeseries.om.OrthogonalMultidimensionalTimeseries

In addition to standard netCDF4-python object .variables method pocean’s DSGs provides an “categorized” version of the variables in the data_vars, ancillary_vars, and the DSG axes methods.

[(v.standard_name) for v in dsg.data_vars()]
['eastward_sea_water_velocity', 'northward_sea_water_velocity']
dsg.axes("T")
[<class 'netCDF4._netCDF4.Variable'>
 float64 time(time)
     units: seconds since 1990-01-01 00:00:00Z
     standard_name: time
     axis: T
 unlimited dimensions: 
 current shape = (100,)
 filling on, default _FillValue of 9.969209968386869e+36 used]
dsg.axes("Z")
[<class 'netCDF4._netCDF4.Variable'>
 int32 depth(station)
     _FillValue: -9999
     axis: Z
 unlimited dimensions: 
 current shape = (1,)
 filling on]
dsg.vatts("station")
{'cf_role': 'timeseries_id', 'long_name': 'station identifier'}
dsg["station"][:]
array(['fake buoy'], dtype=object)
dsg.vatts("u")
{'_FillValue': -9999.9,
 'units': 'm/s',
 'standard_name': 'eastward_sea_water_velocity',
 'coordinates': 'time depth longitude latitude'}

We can easily round-trip back to the pandas DataFrame object.

dsg.to_dataframe().head()
t x y z station u v
0 2022-02-04 15:18:50.687502 -48.6256 -27.5717 10 fake buoy -0.506366 0.862319
1 2022-02-05 15:18:50.687502 -48.6256 -27.5717 10 fake buoy -0.417748 0.908563
2 2022-02-06 15:18:50.687502 -48.6256 -27.5717 10 fake buoy -0.324956 0.945729
3 2022-02-07 15:18:50.687502 -48.6256 -27.5717 10 fake buoy -0.228917 0.973446
4 2022-02-08 15:18:50.687502 -48.6256 -27.5717 10 fake buoy -0.130591 0.991436

For more information on pocean please check the docs.