6.3 Clustering Equations


Previous

HOME


Next

In the clustering process the spatial variance (SV) is computed between each endpoint (k) along trajectory (j) within its cluster (i):

  • SVi,j = Σk (Pj,k - Mi,k)2
where the sum is taken over the number of endpoints along the trajectory and P and M are the position vectors for the individual trajectory and its cluster mean trajectory, respectively. The cluster spatial variance (CSV) is then just the sum of the spatial variance of all trajectories within its cluster:
  • CSVi = Σj SVi,j
and the total spatial variance (TSV) is the sum of the CSV over all clusters:
  • TSV = Σi CSVj,k

Clustering starts by assigning each trajectory to its own cluster, so that there are i clusters with j=1 trajectories in each cluster. In each iteration the cluster number is reduced by one as two clusters are merged together. So that after the 2nd iteration there will be i-1 clusters consisting of one cluster with two trajectories and the remainder with one trajectory. This process continues until only one cluster remains.

With each iteration, the TSV is computed for each cluster merge combination, that is the trajectories in cluster 1 are added to those in cluster 2, cluster 3, and so on, until the TSV is computed for all remaining cluster combinations, resulting in (i2-i)/2 computations per iteration. The combination with the minimum TSV is passed on to the next iteration. This process continues until there is only one cluster. Large groups of trajectories could take quite some time to complete the clustering process.

Initially the TSV rises dramatically, then levels off for the remaining iterations until near the end of the computation, at which point disparate clusters start to be merged and the TSV goes up again. The ideal final number of clusters would be prior to the point of rise.