I'm working on a data plugin for the upcoming Sentinel 5 precursor mission (S5P, L2 data only, launch sometime this year). We'd like to have the CIS toolchain available during the commissioning phase so we can compare with a variety of other instruments and models to understand what we do wrong ;-)
I'm one of the developers of the S5P L2 file format, so from that perspective I think I have the knowledge to write the data plugin. I also have several years of Python experience, so I guess I'll be fine there. I just need to get my head around the structure of CIS, and see how some of the idiosyncrasies of the instrument on Sentinel 5 precursor can be handled (some questions follow below).
A short introduction to S5P.
The Sentinel 5 precursor mission carries a single instrument: TROPOMI. Sentinel 5 precursor is an atmospheric composition mission, observing backscattered solar radiance and solar irradiance, and retrieving trace gas columns, ozone profiles, aerosol properties and cloud information from these spectra. More information on the instrument, sample files and documentation can be found at http://www.tropomi.eu
For the discussion here it is important to know that the file format of S5P (both L1B and L2) is netCDF4, with all the metadata we could think of, as fully CF compliant as we can make it, but using hierarchy (groups) to organise the data. Attributes are used to link to geolocation (and other ancillary data fields), but because the latitude and longitude fields may reside in another group, I fear that some special glue is still needed for these files.
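To make the cross-group linking concrete, here is a minimal sketch of the lookup such glue code might perform. It is hypothetical: the group names below are made up for illustration, and it assumes a reference in a `coordinates`-style attribute is either an absolute path, a path relative to the referring variable's group, or a bare name that is searched for first in the variable's own group and then in each parent group (similar in spirit to the group-path rules in CF 1.8). It works on a plain set of variable paths, so it does not depend on any netCDF library.

```python
import posixpath


def resolve_reference(ref, group_path, available):
    """Resolve one entry of a 'coordinates'-style attribute to a variable path.

    ref:        name as written in the attribute, e.g. 'latitude',
                '/PRODUCT/latitude' or '../GEOLOCATIONS/latitude'.
    group_path: absolute path of the group holding the referring variable.
    available:  set of absolute variable paths present in the file.
    Returns the absolute path of the referenced variable, or None.
    (Hypothetical helper; the search rules are an assumption, see above.)
    """
    if ref.startswith('/'):
        # Absolute path: use as-is (normalised).
        candidate = posixpath.normpath(ref)
        return candidate if candidate in available else None
    if '/' in ref:
        # Relative path: interpret relative to the referring group.
        candidate = posixpath.normpath(posixpath.join(group_path, ref))
        return candidate if candidate in available else None
    # Bare name: search the referring group, then each ancestor group.
    group = posixpath.normpath(group_path)
    while True:
        candidate = posixpath.join(group, ref)
        if candidate in available:
            return candidate
        if group == '/':
            return None
        group = posixpath.dirname(group)
```

A bare `latitude` referenced from a nested support-data group would then resolve to the latitude field one or two levels up, which is the kind of glue a plugin would need if its reader only looks in the variable's own group.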
Time storage is always tricky. We use a two-step approach: all variables have an initial time dimension (of length 1, so this is a dummy dimension). The value stored in the accompanying time array is the reference time for this orbit, which is UTC midnight before the start of the orbit. The value is stored as seconds since 2010-01-01 (as indicated in the units attribute of the variable). The same reference time is also stored in global attributes in several different units (ISO date/time string, Julian day to keep IDL users happy, days since 1950-01-01 as used in many models, seconds since 1970-01-01 to help out C and Python programmers). The actual time of observation is stored in an additional delta_time variable, which gives the offset with respect to the reference time in milliseconds. This two-step approach solves a few issues (and no doubt creates others): it absorbs leap seconds so level 2 users don't have to bother with them, and it decouples the flight direction from the time dimension. The flight direction is of course closely coupled to the latitude or "Y" dimension (we'll be in a polar orbit). This decoupling allows us to follow the CF order of dimensions, T, (Z), Y, X, more closely.
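A reader could reconstruct absolute observation times from the two-step scheme roughly as follows. This is a minimal sketch: the function names are hypothetical, it assumes the reference time and delta_time values have already been read as plain numbers, and it relies on Python's `datetime`, which ignores leap seconds (in line with the two-step design). The second function shows how the alternative units kept in the global attributes relate to the stored value; the exact attribute names in the files are not assumed.

```python
from datetime import datetime, timedelta, timezone

# Epochs named in the units described above (all UTC).
EPOCH_2010 = datetime(2010, 1, 1, tzinfo=timezone.utc)
EPOCH_1970 = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_1950 = datetime(1950, 1, 1, tzinfo=timezone.utc)
JD_OF_EPOCH_1970 = 2440587.5  # Julian day at 1970-01-01T00:00:00 UTC


def observation_times(reference_time_s, delta_time_ms):
    """Absolute times from the orbit reference time (seconds since
    2010-01-01) and the per-scanline offsets (milliseconds)."""
    reference = EPOCH_2010 + timedelta(seconds=float(reference_time_s))
    return [reference + timedelta(milliseconds=float(dt)) for dt in delta_time_ms]


def reference_time_in_other_units(reference_time_s):
    """The same reference time in the alternative units of the global attributes."""
    reference = EPOCH_2010 + timedelta(seconds=float(reference_time_s))
    seconds_since_1970 = (reference - EPOCH_1970).total_seconds()
    return {
        "iso": reference.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "julian_day": JD_OF_EPOCH_1970 + seconds_since_1970 / 86400.0,
        "days_since_1950": (reference - EPOCH_1950).total_seconds() / 86400.0,
        "seconds_since_1970": seconds_since_1970,
    }
```

For example, a reference time of 220924800 s corresponds to 2017-01-01T00:00:00Z, and a delta_time of 1080 ms (one observation later) to 00:00:01.080 on that day.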
Observations on CIS:
In the CIS documentation I noticed that some code examples use lines that are too long to fit in the available space. This is somewhat annoying and does not help in understanding the code. Nested vertical scroll bars are also awkward, but this may well be a limitation of Sphinx itself. The offline version works better, so I'll use that instead; in my local copy I figured out where to increase the maximum line width, so that is solved as well.
In the MODIS example I noticed a small mistake:
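"""regex_list = [r'.*' + product + '.*\.hdf' for product in product_names]""" should be """regex_list = [r'.*' + product + r'.*\.hdf' for product in product_names]""" (the raw prefix applies only to an individual string literal, not to all parts of an expression, and the character that needs escaping is in the last part). The original happens to work because Python leaves the unrecognised escape \. unchanged, but recent Python versions warn about such escapes, so the raw prefix is the correct form.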
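A quick check of the corrected pattern list: with the raw prefix on every literal part, `\.` reaches the regex engine as an escaped (literal) dot. The product names and file names below are illustrative only, and `re.match` is used for the test without assuming which matching call CIS itself uses.

```python
import re

product_names = ['MYD06_L2', 'MYD04_L2']  # illustrative MODIS product names
# Raw prefix on every literal part, so '\.' is a regex escape for a literal dot.
regex_list = [r'.*' + product + r'.*\.hdf' for product in product_names]


def matches_any(filename, patterns):
    # re.match anchors at the start of the name; chosen for illustration only.
    return any(re.match(p, filename) for p in patterns)
```

A name such as `MYD06_L2.A2016001.0000.006.hdf` matches, while one ending in `xhdf` (no literal dot before the extension) does not.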
In the data plugin reference there is also a mistake: """return [r'.*CODE*.nc']""" should be """return [r'.*CODE.*\.nc']""" (in the original, E* means "zero or more E characters" and the unescaped . matches any character).
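The difference between the two patterns is easy to demonstrate. In this sketch `CODE` stands for the product code placeholder and the file names are made up; `re.match` is used for illustration, without assuming which matching call CIS uses internally.

```python
import re

BUGGY = r'.*CODE*.nc'    # pattern as printed in the reference
FIXED = r'.*CODE.*\.nc'  # corrected pattern


def matches(pattern, filename):
    # re.match anchors at the start of the name only.
    return re.match(pattern, filename) is not None
```

The printed pattern fails on a realistic name such as `S5P_TEST_CODE_20170101.nc` (after `COD` it expects exactly one arbitrary character followed by `nc`), yet accepts a nonsense name like `S5P_CODxnc`; the corrected pattern behaves the other way round.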
We retrieve ozone profiles from the observations. From the MODIS example I can't quite work out how to organise these: the profiles have an extra vertical dimension, so how should that be handled? Initially I will simply skip these products (or at least these fields), as that will make things easier. Another option is an additional analysis plugin that computes partial integrals over a profile, resulting in a single value per ground pixel. But as I said, this is not a top priority (though if it requires me to structure the reading plugin differently, I'd like to hear that).
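The "partial integral" option could reduce each profile to one number per ground pixel along these lines. This is a sketch under stated assumptions, not the actual S5P product layout: it assumes the profile is available as per-layer sub-columns with pressures at the layer boundaries (levels), and that each sub-column is spread uniformly in pressure within its layer, so a partially covered layer contributes proportionally.

```python
import numpy as np


def partial_column(subcolumns, level_pressure, p_low, p_high):
    """Integrate layer sub-columns over the pressure range [p_low, p_high].

    subcolumns:     (scanline, ground_pixel, layer) per-layer columns.
    level_pressure: (scanline, ground_pixel, layer + 1) boundary pressures;
                    layer k lies between levels k and k + 1 (either order).
    Returns a (scanline, ground_pixel) array: one value per ground pixel.
    Assumes each sub-column is uniformly distributed in pressure in its layer.
    """
    lower = np.minimum(level_pressure[..., :-1], level_pressure[..., 1:])
    upper = np.maximum(level_pressure[..., :-1], level_pressure[..., 1:])
    # Pressure overlap of each layer with the requested range (>= 0).
    overlap = np.clip(np.minimum(upper, p_high) - np.maximum(lower, p_low), 0.0, None)
    thickness = upper - lower
    fraction = np.divide(overlap, thickness,
                         out=np.zeros_like(overlap), where=thickness > 0)
    return np.sum(subcolumns * fraction, axis=-1)
```

With layers bounded at 1000, 800, 500 and 100 hPa holding 1, 2 and 4 units, the 300 to 1000 hPa column is 1 + 2 + 0.5 × 4 = 5. Because the result has the same (scanline, ground_pixel) shape as a single-level field, the reading plugin would not need to change for this approach.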
What I also do not quite understand is how granules are dealt with. MODIS has 5-minute granules of constant size. The nominal granule size for S5P comes in two flavours. In near real-time operation the level 2 granule covers 5 minutes of observations (each observation takes 1080 ms and contains 77 to 450 spectra, depending on the band). For offline processing the granule is a whole orbit (sunlit side only). This means that the number of observations per granule is not fixed (once per day a solar observation is added, shortening the period for radiance observations). My guess is that the number of spectra per observation (the width of the swath) should be constant, but that along the other dimension the only requirement is that the data match the latitude and longitude of that granule, not those of other granules.
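If that guess is right, a reader could join granules of different lengths simply by concatenating along the along-track axis. A 5-minute near real-time granule would hold roughly 278 scanlines (300 s / 1.08 s ≈ 277.8), while an orbit granule holds many more. The function below is hypothetical and is not how CIS itself combines files; it only shows the constraint described above: the swath width must agree between granules, and within a granule the data must share its shape with its own latitude and longitude.

```python
import numpy as np


def concatenate_granules(granules):
    """Join granules along the along-track (scanline) axis.

    granules: list of (data, latitude, longitude) tuples; within one granule
    the three arrays share the shape (scanline, ground_pixel). The number of
    scanlines may differ between granules; the swath width may not.
    """
    for data, lat, lon in granules:
        if not (data.shape == lat.shape == lon.shape):
            raise ValueError("data and geolocation must have the same shape")
    if len({data.shape[1] for data, _, _ in granules}) != 1:
        raise ValueError("all granules must have the same swath width")
    # zip(*granules) regroups into (all data, all latitudes, all longitudes).
    return tuple(np.concatenate(parts, axis=0) for parts in zip(*granules))
```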
The next issue that I need to resolve is an instrument feature: the four detectors (and therefore the different products derived from different parts of the spectrum) have a different geospatial sampling. When loading a single product this isn't an issue, but when loading products derived from two different detectors (say tropospheric NO₂ and cloud properties) it has to be handled. Of course, when deriving the tropospheric NO₂ column we already take this mismatch into account when we use the cloud product, but we'll want to perform comparisons involving multiple products. We have extra knowledge of how to combine different bands, as the mapping is fixed, at least between four of the eight bands. We do need an extra lookup table for this. How can the cis command line tool be given access to such an extra lookup table? Independent mapping onto a fixed L3 grid is also possible, but cloud products from different orbits in particular should not be aggregated onto an L3 grid because of (natural) variability.
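Assuming the fixed band-to-band mapping can be expressed as a weight matrix, applying it is a single matrix product per scanline. Both the function and the lookup-table format here are hypothetical: the table is taken to give, for every target ground pixel, the fractional contribution of each source ground pixel, with rows summing to 1. The sketch handles the across-track direction only; any along-track offset between detectors is not covered.

```python
import numpy as np


def map_between_bands(source, weights):
    """Resample one band's data onto another band's ground pixels.

    source:  (scanline, n_source_pixels) array from the source band.
    weights: (n_target_pixels, n_source_pixels) lookup table; each row gives
             the contribution of every source pixel to one target pixel and
             is assumed to sum to 1.
    Returns a (scanline, n_target_pixels) array.
    """
    source = np.asarray(source, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if weights.shape[1] != source.shape[1]:
        raise ValueError("lookup table does not match the source swath width")
    if not np.allclose(weights.sum(axis=1), 1.0):
        raise ValueError("lookup table rows must sum to 1")
    # Across-track only: every scanline is mapped with the same weights.
    return source @ weights.T
```

Whatever mechanism the cis tool offers for passing the table in, the plugin side would then only need to load one small array.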
A final question, related to the previous one: S5P will fly in an afternoon orbit, relatively close to the A-train, but actually in loose formation with NPP. For offline processing the cloud product from the VIIRS instrument will be mapped onto the S5P observational grid for cloud screening. It would be nice to be able to map L2 data from OMI/Aura and MODIS/Aqua onto S5P L2 for comparison without going through L3. Can CIS be of use here? Of course an additional plugin for OMI would have to be written, but its limitations are very similar to those of TROPOMI.
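At its simplest, mapping another instrument's L2 pixels onto S5P pixels without an L3 step is a nearest-neighbour search on the sphere. The sketch below is a brute-force illustration with a hypothetical function name and a distance threshold chosen by the caller; it ignores pixel footprints and observation-time differences, which a real comparison would also have to take into account, and it is not CIS's own collocation code.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(x) for x in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def collocate_nearest(t_lat, t_lon, s_lat, s_lon, s_val, max_km):
    """For each target (e.g. S5P) pixel, take the value of the nearest source
    pixel (e.g. OMI or MODIS) if it lies within max_km, else NaN."""
    shape = np.shape(t_lat)
    t_lat = np.ravel(t_lat).astype(float)
    t_lon = np.ravel(t_lon).astype(float)
    s_lat = np.ravel(s_lat).astype(float)
    s_lon = np.ravel(s_lon).astype(float)
    s_val = np.ravel(s_val).astype(float)
    # Full (n_target, n_source) distance matrix: only suitable for small inputs.
    dist = haversine_km(t_lat[:, None], t_lon[:, None], s_lat[None, :], s_lon[None, :])
    nearest = np.argmin(dist, axis=1)
    best = dist[np.arange(t_lat.size), nearest]
    return np.where(best <= max_km, s_val[nearest], np.nan).reshape(shape)
```

For full orbits the distance matrix becomes far too large; a spatial index (for example a k-d tree on 3-D unit vectors) would be the usual replacement for the brute-force step.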
Sorry for the long post, and thanks for reading this far.
Maarten Sneep (KNMI)