How do I stop cis col from extrapolating?

Masaru Yoshioka
How do I stop cis col from extrapolating?

As I mentioned, I ran

cis col aod550_total:aod550_total_teafwa_pb20080101.nc /group_workspaces/jasmin2/crescendo/Data/AERONET/AOT/LEV20/ALL_POINTS/920801_171209_Izana.lev20:collocator=lin[extrapolate=False] -o aod550_total_teafwa_pb20080101_Izana.nc

and this seems to have given me data that are spatially and temporally collocated on the AERONET point measurements. This is real progress considering how much I have struggled over the past week or so. However, I want to take one more step forward. The result has a time dimension of length 171935! This must be because cis col extrapolated the simulation data over the entire period of measurements (1997-2016).

aod550_total_teafwa_pb20080101.nc contains 3-hourly aod550 data for one day, at 0, 3, 6, 9, 12, 15, 18 and 21 hours GMT. 920801_171209_Izana.lev20 also has only 8 measurements on 01/01/2008, all made between 9:59 and 12:20. Collocating the model data onto these measurements therefore needs only interpolation, and I expect only 8 data points in the output. Do you see what I mean?

The documentation says cis col does not extrapolate data by default, but it seems to have done so. So I added ":collocator=lin[extrapolate=False]", but this did not change the result at all.

Is there a way to stop cis col from extrapolating data in time? If the input data is for 20080101, I want the result to contain the same number of collocated data points as there are measurements in the reference data on that day. If there is no measurement on that day in the reference data, I want no data in the result. Is it possible to do this?

Thanks,
Masaru

Masaru Yoshioka
workaround

Although I still don't know how to stop the extrapolation, I now have an idea for working around this issue. I can prepare monthly reference data like this:

cis subset -v AOT_500:ALL_POINTS/920801_171209_Izana.lev20 time=[2008-01-01T00:00:00,2008-01-31T23:59:59] -o Monthly/AOT_500_Izana_2008-01.nc

Then I can collocate 3-hourly data for a month like this:

cis col aod550_total:aod550_total_teafwa_pb200801??.nc /group_workspaces/jasmin2/crescendo/Data/AERONET/AOT/LEV20/Monthly/AOT_500_Izana_2008-01.nc -o aod550_total_teafwa_pb200801_Izana.nc

This now works because the time dimension is named 'time' and not 't' in these input files.

Now aod550_total_teafwa_pb200801_Izana.nc has exactly the same dimension structure as AOT_500_Izana_2008-01.nc, so I can take monthly averages of both and compare them. I will do that for 440 nm. I previously requested only AOD 550 in the 3-hourly outputs and am now running a UM job to get AOD 440.
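The two steps above could be generated per month with a small loop. This is only a sketch: it prints the commands rather than running them, the paths and file-name patterns are copied from the commands in this thread, and the variable and function names (station, aeronet_dir, days_in_month, monthly_commands) are my own assumptions.

```shell
#!/bin/sh
# Dry run: print the cis commands for each month instead of executing them.
# Paths and file-name patterns are copied from the commands in this thread;
# the variable names, function names and the loop itself are illustrative.

station="Izana"
aeronet_dir="/group_workspaces/jasmin2/crescendo/Data/AERONET/AOT/LEV20"
station_file="${aeronet_dir}/ALL_POINTS/920801_171209_${station}.lev20"
year=2008

# Number of days in a month (Gregorian leap-year rule); month is zero-padded.
days_in_month() {
    y=$1
    m=$2
    case "$m" in
        04|06|09|11) echo 30 ;;
        02)
            if [ $((y % 4)) -eq 0 ] && { [ $((y % 100)) -ne 0 ] || [ $((y % 400)) -eq 0 ]; }; then
                echo 29
            else
                echo 28
            fi
            ;;
        *) echo 31 ;;
    esac
}

# Print the subset command and the collocation command for one month.
monthly_commands() {
    y=$1
    m=$2
    last=$(days_in_month "$y" "$m")
    ref="${aeronet_dir}/Monthly/AOT_500_${station}_${y}-${m}.nc"
    echo "cis subset -v AOT_500:${station_file} time=[${y}-${m}-01T00:00:00,${y}-${m}-${last}T23:59:59] -o ${ref}"
    echo "cis col aod550_total:aod550_total_teafwa_pb${y}${m}??.nc ${ref} -o aod550_total_teafwa_pb${y}${m}_${station}.nc"
}

for month in 01 07; do
    monthly_commands "$year" "$month"
done
```

Once the printed commands look right, the output can be piped to sh, or the echo calls replaced with the commands themselves.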

The only problem is the computation time. This cis subset took as long as 9 minutes. I have more than 1000 locations and 12 months for our simulation results alone, and I will have to compare outputs from 5 models (or maybe a few more). That is 9*12*5*1000 = 540,000 min = 9000 hours, which is prohibitively long. If I limit this to 2 months, such as January and July, it will still take 1500 hours, or about 62 days. If I split that into 10 jobs and run them in parallel, it will be done in a week or so.
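The estimate above can be reproduced with shell arithmetic. This is a rough sketch using the numbers from this post (9 minutes per run, 1000 locations, 5 models); the function name estimate_hours is my own, and the result is rounded down to whole hours.

```shell
#!/bin/sh
# Rough runtime estimate using the numbers quoted in this post.
# estimate_hours is an illustrative helper, not part of cis.

# Args: minutes per run, locations, months, models, parallel jobs.
# Prints the wall-clock time in whole hours (integer division, rounded down).
estimate_hours() {
    minutes_per_run=$1
    locations=$2
    months=$3
    models=$4
    jobs=$5
    total_minutes=$((minutes_per_run * locations * months * models))
    echo $((total_minutes / 60 / jobs))
}

echo "12 months, 1 job:  $(estimate_hours 9 1000 12 5 1) hours"
echo "2 months, 1 job:   $(estimate_hours 9 1000 2 5 1) hours"
echo "2 months, 10 jobs: $(estimate_hours 9 1000 2 5 10) hours"
```

The last case comes to 150 hours, a little over 6 days, which matches the "a week or so" figure.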

If you have a better idea, could you please suggest it? Thanks.

Masaru

Masaru Yoshioka
update

I tried cis col again and this time it only took about 15 seconds! So it was a temporary issue.

So even though I still don't know how to stop cis col from extrapolating the data in time, I'm happy enough with this workaround.

Thanks,
Masaru
