Trajectory / Special Runs / Clustering

The Trajectory Cluster Analysis window contains the series of tasks needed to run a trajectory cluster analysis.  This differs from most, if not all, other HYSPLIT GUI windows, which run only one program or perform one task.  Given a set of trajectories beginning at one location, the cluster analysis objectively groups them into subsets, called clusters, such that each cluster differs from the others.  The program will usually produce at least one possible outcome set of clusters. If more than one outcome is given, the user must then subjectively choose one for the final result. The trajectories to be clustered can be created in a variety of ways. Trajectory output file names should begin with a common base name, as defined in the setup menu (for example, tdump), followed by some arbitrary identification text (e.g. the date), as created, for example, by the Run Daily menu option.

Cluster member trajectories are assigned based on latitude and longitude as described below, not height. Diagnostic variables (precipitation, etc) in the trajectory endpoints files are ignored.

Description of clustering process:

Initially, total spatial variance is zero. Each trajectory is defined to be a cluster; in other words, there are N trajectories and N clusters. For the first iteration, which two clusters (trajectories) are paired? For every possible pair of trajectories, the cluster spatial variance (SPVAR) is calculated. SPVAR is the sum of the squared distances between the endpoints of the cluster's component trajectories and the corresponding endpoints of the mean of the trajectories in that cluster. Then the total spatial variance (TSV), the sum of all the cluster spatial variances, is calculated. The pair of clusters combined is the one that gives the lowest increase in total spatial variance. After the first iteration, the number of clusters is N-1. Once paired, clusters always stay together.

D = distance between a trajectory endpoint and the corresponding cluster-mean endpoint

SPVAR = SUM(all trajectories in cluster) [SUM(all trajectory endpoints) {D*D} ]
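
As an illustration of the SPVAR formula above, here is a minimal Python sketch. It is not the HYSPLIT source code. It assumes that each trajectory is a list of (latitude, longitude) endpoints in degrees, that all trajectories in a cluster have the same number of endpoints, that D is a great-circle (haversine) distance in km, and that the cluster mean is the arithmetic mean of latitude and longitude at each endpoint. The function names are hypothetical.

```python
import math

def great_circle_km(p, q):
    """Haversine distance in km between two (lat, lon) points in degrees (assumed metric)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def cluster_mean(cluster):
    """Endpoint-by-endpoint arithmetic mean of the trajectories in a cluster (assumed)."""
    n = len(cluster)
    return [(sum(t[i][0] for t in cluster) / n, sum(t[i][1] for t in cluster) / n)
            for i in range(len(cluster[0]))]

def spvar(cluster, dist=great_circle_km):
    """SPVAR = SUM(all trajectories in cluster) [SUM(all trajectory endpoints) {D*D}]."""
    mean = cluster_mean(cluster)
    return sum(dist(pt, m) ** 2 for traj in cluster for pt, m in zip(traj, mean))
```

A cluster holding a single trajectory coincides with its own mean, so its SPVAR is zero. This is why the total spatial variance is zero before the first iteration.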


For the second iteration, which two clusters are paired? The clusters are now either individual trajectories or the cluster of two trajectories that were paired in the first iteration. Again every combination is tried, and the SPVAR and TSV for each are calculated. The two clusters combined are the ones that result in the lowest increase in TSV. The percent change in TSV and the number of clusters (N-2) are written to a file.

The iterations continue until the last two clusters are combined, resulting in N trajectories in one cluster.
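
The iterations described above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the HYSPLIT cluster program: for brevity the distance is a squared planar difference in degrees rather than a true geographic distance, every trajectory is assumed to have the same number of (latitude, longitude) endpoints, and the names spvar and cluster_trajectories are hypothetical.

```python
from itertools import combinations

def spvar(cluster):
    """Cluster spatial variance using squared planar distance in degrees (a simplification)."""
    n = len(cluster)
    total = 0.0
    for i in range(len(cluster[0])):
        mlat = sum(t[i][0] for t in cluster) / n
        mlon = sum(t[i][1] for t in cluster) / n
        total += sum((t[i][0] - mlat) ** 2 + (t[i][1] - mlon) ** 2 for t in cluster)
    return total

def cluster_trajectories(trajectories):
    """Merge clusters pairwise until one remains.

    Returns a list of (number_of_clusters, TSV, clusters), one entry per iteration.
    """
    clusters = [[t] for t in trajectories]  # N trajectories, N clusters, TSV = 0
    history = [(len(clusters), 0.0, [list(c) for c in clusters])]
    while len(clusters) > 1:
        # Try every pair of clusters; keep the pair whose merge increases TSV the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: spvar(clusters[ij[0]] + clusters[ij[1]])
            - spvar(clusters[ij[0]]) - spvar(clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]  # paired clusters stay together from here on
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        tsv = sum(spvar(c) for c in clusters)
        history.append((len(clusters), tsv, [list(c) for c in clusters]))
    return history
```

Because only the two merged clusters change, the increase in TSV for a candidate pair equals the SPVAR of the merged cluster minus the SPVARs of its two parts, so the sketch does not recompute the full TSV for every pair. Trying every pair at every iteration is simple but slow for large N.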

In the first several clustering iterations the TSV increases greatly. For much of the clustering it then typically increases at a small, generally constant rate, but at some point it again increases rapidly, indicating that the clusters being combined are not very similar. This later increase suggests where to stop the clustering. It is clearly seen in a plot of percent change in TSV vs. number of clusters, where the number of clusters decreases to the right on the plot. The iterative step just before the large increase in the change of TSV (to its left on the plot) gives the final number of clusters. Typically there are a few "large" increases.
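
The selection rule above can be illustrated with a short Python sketch. It makes two assumptions that are not taken from the HYSPLIT documentation: the percent change in TSV is computed relative to the TSV of the previous iteration, and an increase counts as "large" when it exceeds a user-chosen threshold (threshold_pct, a hypothetical parameter with an arbitrary default). The input is a list of (number of clusters, TSV) pairs ordered by decreasing number of clusters.

```python
def percent_changes(history):
    """Return (clusters_before, clusters_after, percent change in TSV) for each merge.

    Steps whose previous TSV is zero are skipped because the percent change is undefined.
    """
    changes = []
    for (n_prev, tsv_prev), (n_now, tsv_now) in zip(history, history[1:]):
        if tsv_prev > 0:
            changes.append((n_prev, n_now, 100.0 * (tsv_now - tsv_prev) / tsv_prev))
    return changes

def candidate_cluster_counts(history, threshold_pct=30.0):
    """Numbers of clusters just before each merge whose TSV increase exceeds the threshold."""
    return [n_prev for n_prev, _, pct in percent_changes(history) if pct > threshold_pct]
```

Each value returned by candidate_cluster_counts is a possible final number of clusters. When several are returned, the choice among them remains subjective.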

How to run the cluster analysis:

The window shown is from the Run Example case. For Run Standard, the Run_ID is "Standard", Hours to cluster is "36", the Endpoints folder (directory) is "c:/hysplit/cluster/endpts", and the Number of clusters is set to "1". Run Example performs cluster analysis on the example set of 12-h duration forward trajectories. Note one of the trajectories has a duration less than 12 hours and so it is not clustered. The number of trajectories in the example set is small to keep the cluster section of the HYSPLIT PC package a reasonable size.

Step 1  Inputs.

Step 2  Run Cluster Program.  Possible solutions to the cluster analysis will be available at the end of this step.

  The cluster program produces these output files:

Step 3  Get Results.  This step may be repeated using different numbers of clusters. If you exit the GUI, but have not archived your results, enter the Run_ID and the Working folder again from Step 1, then continue with Step 3. If you have already archived the results, but want to try a different number of clusters, manually copy everything from the archive folder to the cluster working folder, enter the Run_ID and the Archive and working folders, then enter the number of clusters, etc.
