@@ -160,7 +160,7 @@ Pr(B | A) = Pr(A)/Pr(A) = 1
160160
161161But since _all_ probabilities are at most 1, it follows that `Pr(B | A) ≥ Pr(A)`.
162162
163- # Base-rate fallacy
163+ # Base-rate fallacy {.solved}
164164
165165The [base-rate fallacy](https://en.wikipedia.org/wiki/Base_rate_fallacy) is
166166about a common probabilistic fallacy involving conditional probabilities, which
@@ -210,3 +210,160 @@ medicine to a patient where we're more certain than not that they have the
210210disease. How many times do you need to get a positive result before you should
211211administer the medicine if you apply Bayesian updating after each positive
212212result?
213+
214+ ## Solution {#base-rate-fallacySolution .solution}
215+
216+ 1. Here are the relevant probabilities:
217+
218+ - ` Pr(HasDisease) = 0.01 `
219+ - ` Pr(TestPositive | HasDisease) = 0.9 `
220+ - ` Pr(¬TestPositive | ¬HasDisease) = 0.9 `
221+
222+ 2. According to Bayes' formula:
223+
224+ ```
225+ Pr(HasDisease | TestPositive)
226+ ```
227+ ```
228+ =
229+ ```
230+ ```
231+ [Pr(HasDisease) x Pr(TestPositive | HasDisease) ] / Pr(TestPositive)
232+ ```
233+
234+ We have `Pr(HasDisease)` and `Pr(TestPositive | HasDisease)`, but we need to figure out `Pr(TestPositive)`.
235+
236+ For this, we use the law of total probability. Applying it with `TestPositive` as `A` and `HasDisease` as `B` gives us:
237+
238+ ```
239+ Pr(TestPositive)
240+ ```
241+ ```
242+ =
243+ ```
244+ ```
245+ Pr(TestPositive | HasDisease)Pr(HasDisease)
246+ ```
247+ ```
248+ +
249+ ```
250+ ```
251+ Pr(TestPositive | ¬HasDisease)Pr(¬HasDisease)
252+ ```
253+
254+ We have `Pr(TestPositive | HasDisease)` and `Pr(HasDisease)` given. To obtain
255+ `Pr(TestPositive | ¬HasDisease)` and `Pr(¬HasDisease)`, we apply the negation laws:
256+ ```
257+ Pr(TestPositive | ¬HasDisease) = 1 - Pr(¬TestPositive | ¬HasDisease)
258+ ```
259+ ```
260+ Pr(¬HasDisease) = 1 - Pr(HasDisease)
261+ ```
262+
263+ Now we have all the relevant values and can calculate:
264+
265+ ```
266+ Pr(¬HasDisease) = 1 - Pr(HasDisease) = 1 - 0.01 = 0.99
267+ ```
268+ ```
269+ Pr(TestPositive | ¬HasDisease) = 1 - Pr(¬TestPositive | ¬HasDisease)
270+ ```
271+ ```
272+ = 1 - 0.9 = 0.1
273+ ```
274+ ```
275+ Pr(TestPositive | ¬HasDisease)Pr(¬HasDisease) = 0.99 x 0.1 = 0.099
276+ ```
277+
278+ So:
279+
280+ ```
281+ Pr(TestPositive)
282+ ```
283+ ```
284+ =
285+ ```
286+ ```
287+ Pr(TestPositive | HasDisease)Pr(HasDisease)
288+ ```
289+ ```
290+ +
291+ ```
292+ ```
293+ Pr(TestPositive | ¬HasDisease)Pr(¬HasDisease)
294+ ```
295+ ```
296+ =
297+ ```
298+ ```
299+ (0.9 x 0.01) + 0.099 = 0.108
300+ ```
301+
302+ Now for the final probability:
303+
304+ ```
305+ Pr(HasDisease | TestPositive)
306+ ```
307+ ```
308+ =
309+ ```
310+ ```
311+ [Pr(HasDisease) x Pr(TestPositive | HasDisease) ] / Pr(TestPositive)
312+ ```
313+ ```
314+ =
315+ ```
316+ ```
317+ (0.01 x 0.9) / 0.108 = 0.009 / 0.108 = 0.083...
318+ ```
319+
320+ In other words, when a randomly chosen person tests positive in this setup,
321+ the probability that they have the disease is only around 8%, even though the
322+ test is 90% reliable. This is because the prior probability of the person
323+ having the disease (the so-called base rate) is very low.
324+
325+ Note that it's important here that the person was chosen randomly. If there
326+ is additional information, such as them showing symptoms, the prior of them
327+ having the disease would be different.
328+
329+ 3. If we reset the prior using Bayesian updating, i.e. take the posterior from the first test as the new prior, we now have:
330+
331+ ```
332+ Pr(HasDisease) = 0.083...
333+ ```
334+
335+ This affects the value of `Pr(TestPositive)`, which we've calculated using `Pr(HasDisease)`.
336+
337+ We now get:
338+
339+ ```
340+ Pr(¬HasDisease) = 1 - Pr(HasDisease) = 1 - 0.083... = 0.916...
341+ ```
342+ ```
343+ Pr(TestPositive | ¬HasDisease)Pr(¬HasDisease) = 0.1 x 0.916... = 0.0916...
344+ ```
345+
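346+ So, we now have for `Pr(TestPositive)` that:
347+
348+ ```
349+ Pr(TestPositive) = (0.9 x 0.083...) + 0.0916... = 0.075 + 0.0916... = 0.166...
350+ ```
351+
352+ ```
353+ Pr(HasDisease | TestPositive)
354+ ```
355+ ```
356+ =
357+ ```
358+ ```
359+ [Pr(HasDisease) x Pr(TestPositive | HasDisease) ] / Pr(TestPositive)
360+ ```
361+ ```
362+ =
363+ ```
364+ ```
365+ (0.083... x 0.9) / 0.166... = 0.075 / 0.166... = 0.45
366+ ```
367+
368+ So, two consecutive positive tests on a random citizen raise the probability to only 45%, which is still below 50%. Updating once more with `Pr(HasDisease) = 0.45` gives `(0.45 x 0.9) / (0.45 x 0.9 + 0.55 x 0.1) = 0.405 / 0.46 ≈ 0.88`, so three consecutive positive tests are needed before we're more certain than not that they have the disease.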
369+
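The repeated updating in the solution can also be checked numerically. The following is a minimal Python sketch and not part of the original solution: the function names `bayes_update` and `positives_needed`, the `threshold` of 0.5 (reading "more certain than not" as a probability above 0.5), and the `max_tests` guard are all assumptions introduced for illustration.

```python
def bayes_update(prior: float, sensitivity: float, specificity: float) -> float:
    """Return Pr(HasDisease | TestPositive) after a single positive test."""
    # Negation law: Pr(TestPositive | ¬HasDisease) = 1 - Pr(¬TestPositive | ¬HasDisease)
    false_positive_rate = 1.0 - specificity
    # Law of total probability for Pr(TestPositive)
    pr_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    # Bayes' formula
    return sensitivity * prior / pr_positive


def positives_needed(prior, sensitivity, specificity, threshold=0.5, max_tests=100):
    """Count consecutive positive tests until the posterior exceeds `threshold`.

    Returns the count and the list of posteriors after each test.
    `max_tests` (hypothetical guard) stops the loop for uninformative tests.
    """
    posterior = prior
    history = []
    while posterior <= threshold and len(history) < max_tests:
        # The posterior of one test becomes the prior of the next.
        posterior = bayes_update(posterior, sensitivity, specificity)
        history.append(posterior)
    return len(history), history


count, history = positives_needed(prior=0.01, sensitivity=0.9, specificity=0.9)
for i, p in enumerate(history, start=1):
    print(f"after positive test {i}: {p:.3f}")
print("positive tests needed:", count)
```

With the values from the exercise (`Pr(HasDisease) = 0.01` and both test probabilities at 0.9), the printed posteriors are 0.083, 0.450 and 0.880, and the loop stops after the third positive test.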