# 398 probability/cab.p

A cab was involved in a hit and run accident at night. Two cab companies,
the Green and the Blue, operate in the city. Here is some data:

a) Although the two companies are equal in size, 85% of cab
accidents in the city involve Green cabs and 15% involve Blue cabs.

b) A witness identified the cab in this particular accident as Blue.
The court tested the reliability of the witness under the same circumstances
that existed on the night of the accident and concluded that the witness
correctly identified each one of the two colors 80% of the time and failed
20% of the time.

What is the probability that the cab involved in the accident was
Blue rather than Green?

If it looks like an obvious problem in statistics, then consider the
following argument:

The probability that the color of the cab was Blue is 80%! After all,
the witness is correct 80% of the time, and this time he said it was Blue!

What else need be considered? Nothing, right?

If we apply Bayes' theorem (a pretty basic statistical theorem) we
get a much lower probability. But why should we consider statistical
theorems when the problem appears so clear cut? Should we just accept the
80% figure as correct?

probability/cab.s

The court's tests don't apply directly. According to the wording, the
witness, given any mix of cabs, names the right color 80% of the time.
That is the chance that he says Blue when the cab is Blue, not the
chance that the cab is Blue when he says Blue. Given that 85% of the
cabs in accidents are Green and 15% are Blue, he will say 20% of the
Green cabs and 80% of the Blue cabs are Blue. That's 20% of 85% plus
80% of 15%, or 17% + 12% = 29% of all the cabs that the witness will
say are Blue. Of those, only 12/29 are actually Blue. Thus
P(cab is Blue | witness claims Blue) = 12/29.
That's just a little over 41%.
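
The arithmetic above can be checked with a short Python sketch. The
function name posterior_blue and its parameter names are made up for
this illustration; the only inputs are the 15% base rate and the 80%
witness accuracy given in the problem.

```python
from fractions import Fraction

def posterior_blue(prior_blue, accuracy):
    """Return P(cab is Blue | witness says Blue) using Bayes' theorem.

    prior_blue: fraction of accident cabs that are Blue (hypothetical name)
    accuracy:   chance the witness names the true color (hypothetical name)
    """
    prior_green = 1 - prior_blue
    # Witness says Blue and the cab really is Blue.
    says_blue_and_blue = accuracy * prior_blue
    # Witness says Blue but the cab is actually Green.
    says_blue_and_green = (1 - accuracy) * prior_green
    # Fraction of "Blue" reports that are correct.
    return says_blue_and_blue / (says_blue_and_blue + says_blue_and_green)

result = posterior_blue(Fraction(15, 100), Fraction(80, 100))
print(result)  # 12/29
```

Exact fractions are used instead of floats so that the result comes out
as 12/29 rather than a rounded decimal.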

Think of it this way... suppose you had a robot watching parts on a
conveyor belt to spot defective parts, and suppose the robot made a
correct determination only 50% of the time (I know, you should
probably get rid of the robot...). If one out of a billion parts is
defective, then to a very good approximation you'd expect half your
parts to be rejected by the robot. That's 500 million per billion.
But you wouldn't expect more than one of those to be genuinely
defective. So given the mix of parts, a lot more than 50% of the
REJECTED parts will be rejected by mistake (even though 50% of ALL the
parts are correctly identified, and in particular, 50% of the
defective parts are rejected).
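
The robot numbers can be worked the same way. This is only a sketch
under the figures stated above (one defective part per billion, a
robot that is right 50% of the time); the variable names are invented
for the example.

```python
from fractions import Fraction

prior_defective = Fraction(1, 10**9)  # one part in a billion is defective
accuracy = Fraction(1, 2)             # robot is right half the time

# Defective parts that the robot correctly rejects.
rejected_defective = accuracy * prior_defective
# Good parts that the robot rejects by mistake.
rejected_good = (1 - accuracy) * (1 - prior_defective)

# Overall fraction of parts rejected.
rejected = rejected_defective + rejected_good
# Fraction of rejected parts that are genuinely defective.
p_defective_given_rejected = rejected_defective / rejected

print(rejected)                    # 1/2
print(p_defective_given_rejected)  # 1/1000000000
```

With a robot that is right only half the time, a rejection carries no
information: the chance that a rejected part is defective is the same
one in a billion that it was before the robot looked.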

When the biases get so enormous, things start getting quite a bit
more in line with intuition.

For a related real-life example of probability in the courtroom see
People v. Collins, 68 Cal. 2d 319 (1968).
