Re: Evaluation question
Posted by 2cents on March 12, 2002 at 18:29:27:

Hi Roger:

Let's say there are 3 outcomes to a prediction: a HIT (all 3 parameters matched), a partial hit (one or two parameters matched), and a complete miss (no parameters were matched).

It sounds like you want to address both the partial hit and complete miss cases with some method that assigns a "degree of closeness" to the partial hit or miss (for training or evaluation purposes).

My suggestion is to expand the range of the 3 stated parameters in turn (location, time window, magnitude) until the prediction includes the earthquake that actually happened, and then calculate the probability of that expanded prediction.

For example, someone predicts a mag. 3.3-4.3 quake in a given region and time window. Instead a mag. 5 happens. You could then calculate the probability of the prediction with its magnitude range expanded to include the mag. 5.
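How the probability itself gets calculated is left open above. One common way, offered here only as a sketch, is to assume quakes arrive as a Poisson process at the historical rate for the region and magnitude range. The function name and the catalog counts below are hypothetical, made up purely for illustration:

```python
import math

def prob_at_least_one(past_events, catalog_years, window_days):
    """Chance of one or more qualifying quakes in the time window.

    Assumption: quakes arrive as a Poisson process at the historical
    rate (past_events counted over catalog_years in the same region
    and magnitude range).
    """
    rate_per_day = past_events / (catalog_years * 365.25)
    return 1.0 - math.exp(-rate_per_day * window_days)

# Hypothetical catalog counts for one region over 30 years.
p_original = prob_at_least_one(past_events=120, catalog_years=30, window_days=10)  # mag 3.3-4.3
p_expanded = prob_at_least_one(past_events=126, catalog_years=30, window_days=10)  # mag 3.3-5.0

print(f"original: {p_original:.3f}  expanded: {p_expanded:.3f}")
```

Widening any parameter can only add qualifying quakes, so the expanded probability is never lower than the original one.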

Now a rule with a decision threshold could be used. If the probability of the expanded prediction is less than some threshold between 0.0 and 1.0 (say 33% or 25%), then the prediction would be retained as a partial hit, with the probability of the "captured earthquake" noted in a scoring column.

For example column headings could be:

1) Orig. Prediction - Time, Location, Mag (T,L,M)
1b) Odds of a complete Hit.
2) Expanded Prediction - (Use new Mag range to include Mag 5)
2b) Probability of earthquake using "captured earthquake" (the mag. 5)

If no earthquake can be found near the predicted time, location, and magnitude with a probability less than, say, 33% or 25% (1 in 3 or 1 in 4), then the prediction is ruled a complete miss.
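The threshold rule above could be sketched roughly like this. The function name, the labels, and the default threshold of 0.33 are just illustrative choices, and the candidate probabilities are assumed to have been calculated already for each nearby earthquake:

```python
def classify(full_hit, candidate_probs, threshold=0.33):
    """Label one prediction as HIT, PARTIAL, or MISS.

    candidate_probs holds the probability of the expanded prediction
    for each nearby earthquake (assumed to be computed elsewhere).
    """
    if full_hit:
        return "HIT", []
    # Anything at or above the threshold counts as background seismicity.
    kept = [p for p in candidate_probs if p < threshold]
    if kept:
        return "PARTIAL", kept
    return "MISS", []

print(classify(False, [0.50, 0.20]))  # one candidate survives the threshold
print(classify(False, [0.50]))        # nothing survives
```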

For each predictor, an accumulated score will develop that shows the accumulated odds for Hits, Misses, and Partial hits. The probabilities of the expanded predictions with the captured earthquakes (all multiplied together over time) give an indication of how close the predictor is (or is not) over time.
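One practical note on multiplying many probabilities together: with a long record (a thousand predictions, say) the plain product tends to underflow to zero in floating point, so it may be easier to add logarithms instead. A minimal sketch, with a made-up function name and an illustrative record of 300 hits at 1 in 4 odds each:

```python
import math

def log10_accumulated_odds(probabilities):
    """Return log10 of 1 / (p1 * p2 * ... * pn), summing logs to avoid underflow."""
    return -sum(math.log10(p) for p in probabilities)

# Illustrative record: 300 hits, each with a 1 in 4 chance.
hits = [0.25] * 300
exponent = log10_accumulated_odds(hits)
print(f"accumulated odds: about 1 in 10**{exponent:.1f}")
```

A bigger exponent means the record is less likely to be luck, and two predictors can be compared by exponent directly.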

A sample might be as shown for predictor X over time:

1 - Made 1000 predictions
2 - Had 30% exactly correct predictions (300). Accumulated odds are 1 in 1000 (or whatever it turns out to be). The accumulated odds are the individual probabilities of a correct prediction multiplied together and then inverted. So in this case, say the average odds for this predictor are 1 in 4: we would raise 0.25 to the 300th power and then invert it.

Accumulated odds = 1 / (0.25)**300

Note: A predictor could always adjust the parameters to do 1 in 4 odds predictions or 1 in 20 or whatever they choose.

3 - Using the probability threshold of 33% to decide HIT or MISS, accumulated 70% incorrect predictions (note a MISS is defined as an incorrect prediction, not as an earthquake that happened with no prediction made for it at all).
Upon searching a list of earthquakes and calculating the probability after expanding the original prediction in time window, location (range ring), or magnitude, all the cases where the odds are above the threshold (e.g. 33%) are tossed out of further consideration (ruled as background seismicity earthquakes).

What remains is a set of earthquakes with varying probabilities, depending on how far their parameters fall from the original prediction.

Let's say there were 5 earthquakes in expanded predictions with the following probabilities of happening (I can't spell occurrence?):

0.26 (26 %)
0.20
0.15
0.10
0.08

Clearly the 0.08 earthquake will make the predictor look good whereas the 0.26 probability case will not.

A decision could be made at this point to accumulate (as for the hit cases) either the best or worst (or both) probability case as part of the predictor's record of success/failure.
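Picking the best and worst case from the list above might look like this. It is only a sketch; the function name is made up and the threshold default of 0.33 is the example value from earlier:

```python
def best_and_worst(probabilities, threshold=0.33):
    """Return (lowest, highest) probability among quakes under the threshold."""
    kept = [p for p in probabilities if p < threshold]
    if not kept:
        return None  # nothing left: the prediction is a complete miss
    return min(kept), max(kept)

captured = [0.26, 0.20, 0.15, 0.10, 0.08]
best, worst = best_and_worst(captured)
print(f"best case: {best}  worst case: {worst}")
```

The lowest probability is the "best" case for the predictor, since capturing an unlikely quake is harder to do by chance.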

The benefit of this approach (assuming I explained it well enough) is that it becomes more or less "geography" independent, because it does not use "hardwired" ranges per magnitude (as other approaches do). Seismicity varies around the globe, so hardwired ranges will create an evaluation bias IMO.

Ok...so to try (again) and create a summary performance for predictor X it would be as follows:

1 - Made 100 predictions
2 - 25% were hits. Average odds for those 25 are 1 in 4 (or whatever it is).
3 - Using a threshold of 33%, had 25% complete misses (since quakes with odds above 33% were ruled as misses).
4 - Had 50% partial hits with average high odds (if using the low probability quake on the list) of 1 in 10 and average low odds of 1 in 3.3 (if using the high probability quake on the list).
Or accumulated odds could be used in place of average odds, or vice versa, depending on preference.
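A summary like the one above could be tallied along these lines. This is a rough sketch: the record format and function name are made up, the four sample records are hypothetical, and "average odds" is assumed here to mean the inverse of the mean probability:

```python
from collections import Counter

def summarise(records):
    """records: list of (outcome, probability) pairs; probability may be None."""
    if not records:
        return {}
    counts = Counter(outcome for outcome, _ in records)
    summary = {}
    for outcome in ("HIT", "PARTIAL", "MISS"):
        probs = [p for o, p in records if o == outcome and p is not None]
        mean_p = sum(probs) / len(probs) if probs else None
        summary[outcome] = {
            "share": counts[outcome] / len(records),
            # Assumption: average odds = 1 / mean probability.
            "avg_odds_1_in": 1.0 / mean_p if mean_p else None,
        }
    return summary

# Hypothetical record of four predictions for predictor X.
records = [("HIT", 0.25), ("PARTIAL", 0.10), ("PARTIAL", 0.10), ("MISS", None)]
report = summarise(records)
for outcome, stats in report.items():
    print(outcome, stats)
```

Running it twice for the partial hits, once with the best-case and once with the worst-case probability per prediction, would give the high and low average odds.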

This method would quickly sort out who is onto something and who is not.

Of course the question of handling aftershocks, etc. is another topic of discussion (though one could argue that the probabilities method above could account for this effect and give low weighting to it automatically).

Comments are welcome.

Just my 2cents worth...

