problems in aggregated AERONET monthly data

Masaru Yoshioka
problems in aggregated AERONET monthly data

Hi.

I thought monthly averaging of AERONET point data using cis aggregate was working OK, so I wrote a short script and calculated July monthly averages for all of the more than 1000 stations over their entire records. This took about 8 hours last evening, which I think is pretty good.

However, I found that the results are not always good. So I have been checking and testing lots of things, but I haven't been able to figure out what was wrong.

Like I wrote in yesterday's post, I did this;

cis aggregate -v AOT_500:920801_171209_Izana.lev20 t=[1997-06-01T00:00:00,1997-08,P30D] -o AOT_500_Izana_monthly_1997-0608.nc

This gave me three values: 0.0112922097186701, 0.0396474479087452, _ (no value). These are similar to the values in the downloaded monthly average data (0.011310, 0.042982, N/A). I repeated this today and got the same results.

But now I'm calculating monthly averages and storing the result for each month in a separate file. In AOT_500_Izana_monthly_1997-06.nc created last night, AOT_500 has no value. This is how it looks;

AOT_500 =
{_} ;

I repeated this today just like this;

cis aggregate -v AOT_500:920801_171209_Izana.lev20 t=[1997-06-01T00:00:00,1997-06,P30D] -o AOT_500_Izana_monthly_1997-06.nc

The result is the same and the netCDF file does not have a value.

This is strange because there are lots of valid measurements from 17th June 1997 onwards. I noticed the same thing at Abisko for July 2007. Measurements start on 23rd July 2007 at this station, and a few tens of measurements are available in this month. This is far fewer than the typical number of measurements in a month, but still a sample large enough for statistical analysis. However, cis aggregate gives no value, while the downloaded monthly data show a value of 0.075950. I repeated cis aggregate for Abisko just as I did above for Izana, but the result was the same.

Here is another problem. In some cases the downloaded and aggregated monthly averages have similar values. The values for July 1997 at Izana are 0.042982 and 0.0438219161585366, which are very close to each other. But those for August 2007 at Abisko are 0.067408 and 0.0726925972222222, which differ somewhat more. At Abracos_Hill, the aggregated values for July of 1999-2005 are

0.0948166771929825, 0.137734431818182, no value, no value, 0.257221920716113, 0.285831646551724, 0.15355733126935

some of which are very different from the values in the downloaded data;

0.174853, 0.138214, N/A, 0.241016, 0.234410, 0.302466, 0.248706 .

Now I calculated monthly average for July 1999 from the point data in Excel and got 0.168352892, which is close to the value in the downloaded monthly data. There are about 680 measurements covering the entire length of the month. I repeated the calculation one more time manually;

cis aggregate -q --force-overwrite AOT_500:920801_171209_Abracos_Hill.lev20 t=[1999-07-01T00:00:00,1999-07,P31D] -o AOT_500_Abracos_Hill_monthly_1999-07.nc

and I got 0.0978945903225806 . For some reason this is not exactly the same as the result from yesterday but still similar to it.

Can you see any problem or error in what I have done? Do you have any idea what went wrong? Can you think of anything else I could try and check?

Thanks,
Masaru

duncanwp
That's strange, this works for me:

$ cis aggregate -v AOT_500:970101_971231_Izana/970101_971231_Izana.lev20 t=[1997-06-01T00:00:00,1997-06,P30D] -o AOT_500_Izana_monthly_1997-06.nc

$ ncdump AOT_500_Izana_monthly_1997-06.nc

netcdf AOT_500_Izana_monthly_1997-06 {
dimensions:
longitude = 1 ;
latitude = 1 ;
altitude = 1 ;
time = UNLIMITED ; // (1 currently)
bnds = 2 ;
variables:
double AOT_500(longitude, latitude, altitude, time) ;
AOT_500:_FillValue = NaN ;
AOT_500:long_name = "AOT_500" ;
AOT_500:units = "1" ;
double longitude(longitude) ;
longitude:axis = "X" ;
longitude:bounds = "longitude_bnds" ;
longitude:units = "degrees_east" ;
longitude:standard_name = "longitude" ;
double longitude_bnds(longitude, bnds) ;
double latitude(latitude) ;
latitude:axis = "Y" ;
latitude:bounds = "latitude_bnds" ;
latitude:units = "degrees_north" ;
latitude:standard_name = "latitude" ;
double latitude_bnds(latitude, bnds) ;
double altitude(altitude) ;
altitude:bounds = "altitude_bnds" ;
altitude:units = "meters" ;
altitude:standard_name = "altitude" ;
double altitude_bnds(altitude, bnds) ;
double time(time) ;
time:axis = "T" ;
time:bounds = "time_bnds" ;
time:units = "days since 1600-01-01 00:00:00" ;
time:standard_name = "time" ;
time:calendar = "gregorian" ;
double time_bnds(time, bnds) ;
double AOT_500_std_dev(longitude, latitude, altitude, time) ;
AOT_500_std_dev:_FillValue = NaN ;
AOT_500_std_dev:long_name = "Corrected sample standard deviation of AOT_500" ;
AOT_500_std_dev:units = "1" ;
double AOT_500_num_points(longitude, latitude, altitude, time) ;
AOT_500_num_points:_FillValue = NaN ;
AOT_500_num_points:long_name = "Number of points used to calculate the mean of AOT_500" ;

// global attributes:
:history = "2017-12-18T09:14:07Z Aggregated using CIS version 1.5.4\n variables: [\'AOT_500\']\n from files: [\'970101_971231_Izana/970101_971231_Izana.lev20\']\n using new grid: {\'time\': slice(145153.0, 145170.0, 30.0)}\n with kernel: ." ;
:Conventions = "CF-1.5" ;
data:

AOT_500 =
{0.0111407727272727} ;

longitude = -16.499 ;

longitude_bnds =
-16.499, -16.499 ;

latitude = 28.309 ;

latitude_bnds =
28.309, 28.309 ;

altitude = 2391 ;

altitude_bnds =
2391, 2391 ;

time = 145168 ;

time_bnds =
145153, 145170 ;

AOT_500_std_dev =
{0.00180107062299243} ;

AOT_500_num_points =
{44} ;
}

Which version of CIS are you using?

It's also worth pointing out that averaging the Aeronet data like this is likely to introduce some significant errors compared to model monthly means. For example, the Aeronet instrument can only measure during clear daytime conditions, whereas the model will give you an all-sky, all-day mean. This is discussed in one of my colleague's papers here: https://www.atmos-chem-phys.net/16/1065/2016/

By first collocating the model values onto the Aeronet observations and then averaging both over a month (or longer) you should minimise these sampling errors.
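As a rough illustration of why the sampling matters, here is a minimal Python sketch with made-up numbers (it is not the CIS collocation itself). It assumes a hypothetical hourly model series that is lower by day than by night, and observation times that fall only in daytime. The helper names `nearest_index` and `collocated_mean` are invented for this example.

```python
from bisect import bisect_left
from statistics import fmean


def nearest_index(sorted_times, t):
    """Return the index of the entry in sorted_times closest to t."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    # Pick whichever neighbour is closer (ties go to the earlier one).
    return i if sorted_times[i] - t < t - sorted_times[i - 1] else i - 1


def collocated_mean(model_times, model_values, obs_times):
    """Mean of the model sampled only at the observation times."""
    sampled = [model_values[nearest_index(model_times, t)] for t in obs_times]
    return fmean(sampled)


# Toy data (hypothetical): one day of hourly model AOT.
model_times = [float(h) for h in range(24)]  # hours since midnight
model_values = [0.1 if 6 <= h < 18 else 0.3 for h in range(24)]
obs_times = [9.25, 10.5, 12.0, 13.75, 15.5]  # daytime-only observations

all_day_mean = fmean(model_values)
sampled_mean = collocated_mean(model_times, model_values, obs_times)
print(all_day_mean, sampled_mean)
```

In this toy case the all-day model mean is about 0.2, while the model sampled at the observation times averages about 0.1. Comparing observations against the second number removes the day/night sampling difference.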

Masaru Yoshioka
Hi. Thanks. So it works for you. I am using the cis installed on JASMIN, and

$ cis version
Using CIS version: 1.5.4 (Stable)

OK, so this is the version. So it is not very old, even if it is not the latest one?

duncanwp
Yes, that version should be fine.

What does cis info tell you about the Aeronet file you're using?

Mine looks like:

[dwatsonparris@jasmin-sci1 ~]$ cis info AOT_500:970101_971231_Izana.lev20
Ungridded data: AOT_500 / (1)
Shape = (2100,)

Total number of points = 2100
Number of non-masked points = 2097
Long name = AOT_500
Standard name = None
Units = 1
Missing value = -999.0
Range = (0.0037169999999999998, 0.251361)
History =
Coordinates:
longitude
Long name =
Standard name = longitude
Units = degrees_east
Missing value = None
Range = (-16.498999999999999, -16.498999999999999)
History =
latitude
Long name =
Standard name = latitude
Units = degrees_north
Missing value = None
Range = (28.309000000000001, 28.309000000000001)
History =
altitude
Long name =
Standard name = altitude
Units = meters
Missing value = None
Range = (2391.0, 2391.0)
History =
time
Long name =
Standard name = time
Units = days since 1600-01-01 00:00:00
Missing value = None
Range = (1997-06-17 07:58:37, 1997-07-26 08:01:09)
History =

Masaru Yoshioka
Mine looks like this, and it is the same as yours except for the length of the data. Masaru

$ cis info AOT_500:920801_171209_Izana.lev20
Ungridded data: AOT_500 / (1)
Shape = (171935,)

Total number of points = 171935
Number of non-masked points = 163920
Long name = AOT_500
Standard name = None
Units = 1
Missing value = -999.0
Range = (-0.0058910000000000004, 1.341121)
History =
Coordinates:
longitude
Long name =
Standard name = longitude
Units = degrees_east
Missing value = None
Range = (-16.498999999999999, -16.498999999999999)
History =
latitude
Long name =
Standard name = latitude
Units = degrees_north
Missing value = None
Range = (28.309000000000001, 28.309000000000001)
History =
altitude
Long name =
Standard name = altitude
Units = meters
Missing value = None
Range = (2391.0, 2391.0)
History =
time
Long name =
Standard name = time
Units = days since 1600-01-01 00:00:00
Missing value = None
Range = (1997-06-17 07:58:37, 2016-11-10 17:09:03)
History =

Masaru Yoshioka
update

I have now created a file containing all point data for the month like this;

cis subset AOT_500:ALL_POINTS/920801_171209_Izana.lev20 time=[1997-06-01T00:00:00,1997-06] -o Monthly/AOT_500_Izana_1997-06.nc

and then aggregated this to make monthly average data;

cis aggregate AOT_500:Monthly/AOT_500_Izana_1997-06.nc time=[1997-06-01T00:00:00,1997-06,P30D] -o monave/AOT_500_Izana_1997-06_monave_test.nc

This gave me a result of 0.0137529851851852 . For comparison, these are the values obtained by the other methods;

My previous cis result: _ (no value)
Duncan's result with CIS: 0.0111407727272727
Value in AERONET monthly data: 0.011310
Value calculated with excel: 0.01129221

So I have to say this result is still not quite right.

At first I could not calculate the average with nco. ncra gives me an error because there is no record dimension, and ncea doesn't do anything... but then I realised I could create a record dimension and use ncra. So I tried this;

ncecat Monthly/AOT_500_Izana_1997-06.nc Monthly/AOT_500_Izana_1997-06_tmp.nc # Add record dimension named "record"
ncpdq -O -a obs,record Monthly/AOT_500_Izana_1997-06_tmp.nc Monthly/AOT_500_Izana_1997-06_tmp.nc # Switch "record" and "obs" so "obs" becomes record dimension
ncwa -O -a record Monthly/AOT_500_Izana_1997-06_tmp.nc Monthly/AOT_500_Izana_1997-06_tmp.nc # Remove "record"
ncra -O Monthly/AOT_500_Izana_1997-06_tmp.nc monave/AOT_500_Izana_1997-06_monave_nco.nc # Average along record dimension "obs"

This gave me: 0.0112922097186701 !!!! This is exactly the same result as Excel. Maybe I should do this even though it is a bit of a detour. I copied the full result below.

So please could you let me know if you come up with a solution in cis aggregate or any other advice related to this issue? Until then I've got at least one way that works for me.

Thanks,
Masaru

ncdump monave/AOT_500_Izana_1997-06_monave_nco.nc |less

netcdf AOT_500_Izana_1997-06_monave_nco {
dimensions:
obs = UNLIMITED ; // (1 currently)
variables:
double longitude(obs) ;
longitude:standard_name = "longitude" ;
longitude:units = "degrees_east" ;
longitude:cell_methods = "record: mean obs: mean" ;
double latitude(obs) ;
latitude:standard_name = "latitude" ;
latitude:units = "degrees_north" ;
latitude:cell_methods = "record: mean obs: mean" ;
double altitude(obs) ;
altitude:standard_name = "altitude" ;
altitude:units = "meters" ;
altitude:cell_methods = "record: mean obs: mean" ;
double time(obs) ;
time:standard_name = "time" ;
time:units = "days since 1600-01-01 00:00:00" ;
time:calendar = "gregorian" ;
time:cell_methods = "record: mean obs: mean" ;
double AOT_500(obs) ;
AOT_500:_FillValue = -999. ;
AOT_500:units = "1" ;
AOT_500:long_name = "AOT_500" ;
AOT_500:missing_value = -999. ;
AOT_500:history = "2017-12-22T11:12:06Z Subsetted using limits: time: [145153.0, 145182.999988]" ;
AOT_500:cell_methods = "record: mean obs: mean" ;

// global attributes:
:source = "CIS1.5.4" ;
:history = "Fri Dec 22 12:41:00 2017: ncra Monthly/AOT_500_Izana_1997-06_tmp.nc monave/AOT_500_Izana_1997-06_monave_nco.nc\nFri Dec 22 12:40:02 2017: ncwa -O -a record Monthly/AOT_500_Izana_1997-06_tmp.nc Monthly/AOT_500_Izana_1997-06_tmp.nc\nFri Dec 22 12:39:19 2017: ncpdq -O -a obs,record Monthly/AOT_500_Izana_1997-06_tmp.nc Monthly/AOT_500_Izana_1997-06_tmp.nc\nFri Dec 22 12:38:08 2017: ncecat Monthly/AOT_500_Izana_1997-06.nc Monthly/AOT_500_Izana_1997-06_tmp.nc" ;
:NCO = "\"4.5.5\"" ;
:nco_openmp_thread_number = 1 ;
data:

longitude = -16.4989999999998 ;

latitude = 28.3090000000001 ;

altitude = 2391 ;

time = 145176.224122494 ;

AOT_500 = 0.0112922097186701 ;
}

Masaru Yoshioka
correction to #6

The command at the top was wrong. This is what I actually did;

cis subset AOT_500:ALL_POINTS/920801_171209_Izana.lev20 time=[1997-06-01T00:00:00,1997-06-30T23:59:59] -o Monthly/AOT_500_Izana_1997-06.nc

Masaru Yoshioka
Oh, but ncra does not calculate standard deviation! It's actually quite important for my current work! There doesn't seem to be a way to calculate standard deviation along a dimension in nco.

So the best solution would be to somehow make cis aggregate work correctly. If it is not possible, I will need to write my own program to calculate monthly statistics and save them into a netCDF file. Of course it is possible but then I will need to go through an additional step...
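For reference, the core of such a program is small. Below is a minimal Python sketch, assuming the AOT values for one month have already been read into a list and that missing values are flagged with -999.0 as in the cis info output. The function name `monthly_stats` is made up for this example, and writing the result to netCDF is left out. It returns the mean, the corrected sample standard deviation (n - 1 in the denominator, the same quantity named in the long_name of AOT_500_std_dev) and the number of valid points.

```python
from math import isnan
from statistics import mean, stdev


def monthly_stats(values, missing=-999.0):
    """Return (mean, sample standard deviation, count) of the valid values.

    Values equal to `missing` and NaNs are ignored. The standard deviation
    uses n - 1 in the denominator and is None when fewer than two valid
    points are available.
    """
    valid = [v for v in values if v != missing and not isnan(v)]
    n = len(valid)
    if n == 0:
        return None, None, 0
    avg = mean(valid)
    sd = stdev(valid) if n > 1 else None
    return avg, sd, n


# Illustrative values only, not real AERONET data.
avg, sd, n = monthly_stats([0.01, 0.02, -999.0, 0.03])
print(avg, sd, n)
```

Returning the count alongside the mean makes it easy to spot months that are based on only a handful of measurements.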

Do you have any idea how I can make cis aggregate work well for me? Please can you give me any advice on this?

Masaru

james_oneill
Workaround

Hi Masaru,

You're right: there seems to be an issue with CIS (or possibly one of its dependencies) when it comes to aggregating over time. More specifically, it doesn't always do what the documentation says.

What it says: "When a date/time is used as a range start, the earliest date/time compatible with the supplied components is used (e.g., 2010-04 is treated as 2010-04-01T00:00:00) and when used as a range end, the latest compatible date/time is used."

What it does: A bit unpredictable, but you can ascertain what time bounds it has used by looking at the "time_bnds" variable (use 'ncdump' on the output .nc file to see this info). See examples further below.

There is a fairly easy workaround, which is to always be explicit with your time strings. For example, to aggregate only the June 1997 data, use the command: cis aggregate AOT_500:920801_171209_Izana.lev20 t=[1997-06-01T00:00:00,1997-06-30T23:59:59,P30D]. This gives a result of 0.01129..., which is exactly the same as the value you calculated in Excel.
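If you are scripting this over many stations and months, the explicit time string can be generated rather than typed by hand. Here is a minimal Python sketch. The function name `explicit_month_range` is made up for this example, and setting the step to the number of days in the month is an assumption that simply mirrors the P30D and P31D used in the commands in this thread.

```python
import calendar


def explicit_month_range(year, month):
    """Build a fully explicit t=[start,end,step] argument for one month."""
    ndays = calendar.monthrange(year, month)[1]  # days in this month
    start = f"{year:04d}-{month:02d}-01T00:00:00"
    end = f"{year:04d}-{month:02d}-{ndays:02d}T23:59:59"
    return f"t=[{start},{end},P{ndays}D]"


print(explicit_month_range(1997, 6))
# t=[1997-06-01T00:00:00,1997-06-30T23:59:59,P30D]
```

The returned string can be passed to cis aggregate in place of the abbreviated form. It is still worth checking time_bnds in the output file afterwards.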

It might be that one of the CIS developers picks up on this issue and fixes it in a later release (or at least updates the documentation), but in the meantime, this is a very easy workaround.

Thanks,
James

===

Example 1 - Your first command in the forum post:
Command: cis aggregate AOT_500:920801_171209_Izana.lev20 t=[1997-06-01T00:00:00,1997-08,P30D]
time_bnds: 145153, 145183, 145213, 145243
Result: Seems fine. The first bound (145153) is equivalent to 1/6/97 (in CIS's units of "days since 1/1/1600") and there are then 30 days between each pair of bounds.

Example 2 - Your second command in the forum post:
Command: cis aggregate AOT_500:920801_171209_Izana.lev20 t=[1997-06-01T00:00:00,1997-06,P30D]
time_bnds: 145153, 145164
Result: The first bound is still correct, but the second bound is only 11 days after this. Since June 1997 only has valid data from the 17th of the month onwards, no valid data are captured in this range and so CIS reports no value as a result.

Example 3 - Duncan's first command in the forum post (I can't run it myself as I don't have his file, but I can see his ncdump output in the post):
Command: cis aggregate AOT_500:970101_971231_Izana.lev20 t=[1997-06-01T00:00:00,1997-06,P30D]
time_bnds: 145153, 145170
Result: Again, the first bound is fine, but the second bound is only 17 days after this. As a result, Duncan's output value (0.01114...) is actually just the mean of all the values from 17/6/97. This can be confirmed by running the command "cis aggregate AOT_500:920801_171209_Izana.lev20 t=[1997-06-17T00:00:00,1997-06-17T23:59:59,P1D]", which again gives 0.01114...
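To check bounds like these yourself, the time_bnds numbers can be converted to calendar dates. This is a minimal Python sketch that assumes only the time units shown in the ncdump output ("days since 1600-01-01 00:00:00"). The name `days_to_datetime` is made up for this example.

```python
from datetime import datetime, timedelta

# Reference date taken from the time units in the ncdump output.
CIS_EPOCH = datetime(1600, 1, 1)


def days_to_datetime(days):
    """Convert 'days since 1600-01-01 00:00:00' to a datetime."""
    return CIS_EPOCH + timedelta(days=days)


for bound in (145153, 145164, 145170, 145183):
    print(bound, days_to_datetime(bound).isoformat())
# 145153 1997-06-01T00:00:00
# 145164 1997-06-12T00:00:00
# 145170 1997-06-18T00:00:00
# 145183 1997-07-01T00:00:00
```

The second bound in Example 2 (12 June) falls before the first measurement on 17 June, which is why no value is returned. The second bound in Example 3 (18 June) captures only the data from 17 June.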

Masaru Yoshioka
Re: Workaround

Thank you, James, for looking into this problem. I didn't know that we can specify the period to aggregate in such a precise way. But I noticed that it can actually be found in an example in the documentation (1st example on p. 34 of the documentation "Release 1.6.0 (Stable)", Dec 04, 2017). From now on, I will specify the period this way.

Masaru
