Ensemble Reduction

12.6 Advanced: Ensemble Reduction Techniques

Background When building ensembles there is always the possibility of redundancy: some ensemble members may differ little from one another. An ensemble can then contain redundant information that overemphasizes certain transport and dispersion features, which might be inaccurate. For example, a sub-group of members might all use the same meteorological data, which may be less accurate than another meteorological data set used by fewer members. An ensemble mean that includes these redundant members would therefore be more biased than one constructed from only independent members. The current version of the ensemble reduction program requires a 1:1 correspondence between each model and observed data record, so the model and measured data sets must match record for record.

In this section we demonstrate how to apply a reduction technique with the intent of producing more accurate results than those obtained with the full ensemble. In this evaluation the ensemble mean of every possible member combination (pairs, trios, etc.) is compared with the measured data values to find the combination that produces the minimum mean square error.
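The exhaustive search described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used by the reduction program: the function name best_subset, the dictionary layout, the min_size parameter, and the toy values are all assumptions, and plain mean square error is used here where the program may apply a normalized form.

```python
from itertools import combinations


def best_subset(members, observed, min_size=2):
    """Return (mse, member_names) for the combination whose ensemble
    mean has the smallest mean square error against the observations.

    members  : dict mapping a member name to a list of model values
    observed : list of measured values, 1:1 with every member list
    min_size : smallest group size to test (assumed 2, i.e. pairs)
    """
    names = sorted(members)
    n_obs = len(observed)
    best = None
    for size in range(min_size, len(names) + 1):
        for combo in combinations(names, size):
            # Ensemble mean of this combination at each record
            mean = [sum(members[n][i] for n in combo) / size
                    for i in range(n_obs)]
            # Mean square error of that ensemble mean
            mse = sum((m - o) ** 2 for m, o in zip(mean, observed)) / n_obs
            if best is None or mse < best[0]:
                best = (mse, combo)
    return best


# Toy values for illustration only (not tutorial results)
observed = [1.0, 2.0, 3.0]
members = {
    "001": [0.0, 1.0, 2.0],
    "002": [5.0, 5.0, 5.0],
    "003": [2.0, 3.0, 4.0],
}
result = best_subset(members, observed)
```

With five members the search covers only 26 combinations of two or more members, so brute force is inexpensive; the count grows quickly for larger ensembles.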

We are continuing from the last section where we analyzed five ensemble members compared with measurements: hysplit2.001 (wrf27uw), hysplit2.002 (era40), hysplit2.003 (narr), hysplit2.004 (wrf27), and hysplit2.005 (wrf09). Re-create these results if they are not already available in the working directory.

As with the previous example, it is necessary to ensure that in the Setup Run / Grid Menu the concentration output file is named just hysplit2. The base name will then be passed through the GUI to the ensemble scripts, where the programs automatically search for the 3-digit suffix.

If you just completed the previous Ensemble Verification section, then you should already have created a series of five DATEM-formatted files with the base name hysplit_datem followed by a 3-digit suffix corresponding to each ensemble member. Otherwise go back and complete that section. To continue in this section, delete member hysplit_datem.006 from the ../working directory. Recall that this member represents the ensemble mean of all five members. We will be creating a new ensemble mean of the independent members and do not want to include the old ensemble mean in the reduction calculation.

Now open the Display / Ensemble / Reduction menu. Select the base name of the DATEM files corresponding to the ensemble members and then select the DATEM file corresponding to the measured values. Press the Apply button and the file reduc.txt will be created. This file contains the minimum mean square error for all the possible model combinations. Note that group #2 has the minimum error and consists of members 1 and 5.

The next step is to create a new ensemble mean consisting of members 1 and 5. Open the Concentration / Utilities / Binary File Merge menu. Enter hysplit2 as the input name, which will be treated as a wildcard to find all the input file names. The output file should be named as the next file in the sequence: hysplit2.007. Because only two input files (001 and 005) will be added to create file 007, the multiplier should be changed from 1.0 to 0.50 to convert the sum to an average.
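The role of the multiplier can be illustrated with a short Python sketch. It assumes the merge adds the input values point by point and scales the sum by the multiplier; the function name merge_scaled and the flat lists standing in for concentration grids are hypothetical, and the numbers are illustrative only.

```python
def merge_scaled(grids, multiplier=1.0):
    """Add several equally sized grids point by point and scale the sum.

    grids      : list of lists, one list of values per input file
    multiplier : factor applied to the sum (1/N turns it into a mean)
    """
    return [multiplier * sum(values) for values in zip(*grids)]


# Two inputs, so a multiplier of 0.50 converts the sum to an average
member_001 = [2.0, 4.0]
member_005 = [4.0, 8.0]
mean_007 = merge_scaled([member_001, member_005], multiplier=0.50)
```

For N input files the equivalent multiplier is 1/N, which is why the value changes from 1.0 to 0.50 when two members are merged.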

Now press the yellow Create Filenames button and the file INFILE will be created. The base name field will have been erased and the name INFILE will appear in the input name field. Open this file in Notepad and delete the unwanted files (002, 003, and 004). Then press the green Process Files button and the desired ensemble mean output file hysplit2.007 will be created in the working directory.
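The manual edit of INFILE could also be scripted. The sketch below assumes INFILE is a plain text file listing one input file name per line, which is an assumption about the file layout rather than something stated here; the helper name keep_members is hypothetical.

```python
def keep_members(lines, suffixes):
    """Keep only the file names whose suffix after the last dot
    is in the given set, dropping blank lines."""
    kept = []
    for line in lines:
        name = line.strip()
        if name and name.rsplit(".", 1)[-1] in suffixes:
            kept.append(name)
    return kept


# Example list standing in for the contents of INFILE
infile_lines = [
    "hysplit2.001",
    "hysplit2.002",
    "hysplit2.003",
    "hysplit2.004",
    "hysplit2.005",
]
selected = keep_members(infile_lines, {"001", "005"})
```

Writing the selected names back to INFILE, one per line, would leave only the two members chosen by the reduction.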

To recompute the statistics of all the ensemble members, including the new ensemble mean, open the Concentration / Display / Ensemble / Statistics menu. Its simple interface requires only the name of the measured data file and the concentration units conversion factor. After these values are set, press the yellow Execute button to run the script and display a listing of the output file summarizing the performance statistics of each member. Compared with the previous ensemble mean (006), this reduced ensemble mean (007) shows about the same or better performance in all six metrics (note that NMSE was used as the reduction metric). Do not delete the sumstat.txt file; it will be used in exercise #12.

In this case the reduction identified members #1 and #5. Although both use WRF data, they differ in spatial resolution and boundary layer physics.

Further reading Stein et al., 2015, Potential Use of Transport and Dispersion Model Ensembles for Forecasting Applications.
