Jaccard Coefficient — Unit 5

Overview

The Jaccard coefficient measures the similarity between two sets or, equivalently, between two asymmetric binary vectors, where a shared 1 (presence) is informative but a shared 0 (absence) is not.

Values range from 0 to 1, where values closer to 1 indicate greater similarity.

Jaccard Similarity Formula

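Similarity = F11 / (F11 + F10 + F01)

Here F11 is the number of attributes that are 1 in both records, F10 is the number that are 1 in the first record and 0 in the second, and F01 is the number that are 0 in the first record and 1 in the second. Attributes that are 0 in both records (F00) are ignored.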

Jaccard Distance / Dissimilarity

D = (F10 + F01) / (F11 + F10 + F01) = 1 - Similarity

A higher Jaccard distance indicates that two records are more dissimilar.
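As a minimal sketch of the two formulas above, the following Python functions take the three counts directly. The function names are illustrative, and returning a similarity of 1.0 when all three counts are zero is an assumed convention, not something the notes specify.

```python
def jaccard_similarity(f11: int, f10: int, f01: int) -> float:
    """Jaccard similarity from the three counts; F00 is deliberately ignored."""
    total = f11 + f10 + f01
    if total == 0:
        # Assumption: two records with no positive attributes count as identical.
        return 1.0
    return f11 / total


def jaccard_distance(f11: int, f10: int, f01: int) -> float:
    """Jaccard distance, equal to 1 minus the similarity."""
    return 1.0 - jaccard_similarity(f11, f10, f01)


# Illustrative counts: two shared positives and one mismatch.
print(round(jaccard_similarity(2, 0, 1), 2))  # 0.67
print(round(jaccard_distance(2, 0, 1), 2))    # 0.33
```

The similarity and the distance always sum to 1, so either one can be derived from the other.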

Task

Calculate the Jaccard distance (dissimilarity) for each possible pair of records: (Jack, Mary), (Jack, Jim), and (Mary, Jim).

Gender is a symmetric binary variable (both of its values are equally important), so it is excluded from the calculation; the Jaccard coefficient applies only to asymmetric binary attributes.

Step 1 — Convert Table to Positive/Negative Outputs

The meaning of the value A in the original table is unclear.

For these calculations, the table is encoded with 1 for a positive value and 0 for a negative value:

Name   Fever  Cough  Test-1  Test-2  Test-3  Test-4
Jack   1      0      1       0       0       0
Mary   1      0      1       0       1       0
Jim    1      1      0       0       0       0
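The counts F11, F10, and F01 used in the calculations can be read off the encoded table by comparing two rows attribute by attribute. The sketch below assumes the rows are stored as Python lists in column order; `records` and `contingency_counts` are hypothetical names chosen for illustration.

```python
from typing import Sequence, Tuple

# Encoded rows from the table (Fever, Cough, Test-1, Test-2, Test-3, Test-4).
records = {
    "Jack": [1, 0, 1, 0, 0, 0],
    "Mary": [1, 0, 1, 0, 1, 0],
    "Jim":  [1, 1, 0, 0, 0, 0],
}


def contingency_counts(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int, int]:
    """Return (F11, F10, F01) for two equal-length 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    f11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    f10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    f01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    return f11, f10, f01


print(contingency_counts(records["Jack"], records["Mary"]))  # (2, 0, 1)
print(contingency_counts(records["Jack"], records["Jim"]))   # (1, 1, 1)
print(contingency_counts(records["Mary"], records["Jim"]))   # (1, 2, 1)
```

Swapping the two arguments swaps F10 and F01, which leaves the Jaccard distance unchanged because the two counts are added together.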

Calculations

A) Jack and Mary

F11 = 2 (Fever, Test-1), F10 = 0, F01 = 1 (Test-3)

D = (0 + 1) / (2 + 0 + 1)

D = 1 / 3

D = 0.33

B) Jack and Jim

F11 = 1 (Fever), F10 = 1 (Test-1), F01 = 1 (Cough)

D = (1 + 1) / (1 + 1 + 1)

D = 2 / 3

D = 0.67

C) Mary and Jim

F11 = 1 (Fever), F10 = 2 (Test-1, Test-3), F01 = 1 (Cough)

D = (2 + 1) / (1 + 2 + 1)

D = 3 / 4

D = 0.75
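The three hand calculations can be checked with a short script. This is a sketch under the same encoding as the table, with illustrative names; a pair with no positive attributes at all is assumed to have distance 0.0.

```python
from itertools import combinations

# Encoded rows (Fever, Cough, Test-1, Test-2, Test-3, Test-4); Gender is excluded.
records = {
    "Jack": [1, 0, 1, 0, 0, 0],
    "Mary": [1, 0, 1, 0, 1, 0],
    "Jim":  [1, 1, 0, 0, 0, 0],
}


def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 vectors."""
    f11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    mismatches = sum(1 for x, y in zip(a, b) if x != y)  # F10 + F01
    total = f11 + mismatches
    # Assumption: two all-zero records are treated as identical.
    return mismatches / total if total else 0.0


# Distance for every unordered pair of names, in table order.
distances = {
    (n1, n2): jaccard_distance(records[n1], records[n2])
    for n1, n2 in combinations(records, 2)
}

for pair, d in distances.items():
    print(pair, round(d, 2))
# ('Jack', 'Mary') 0.33
# ('Jack', 'Jim') 0.67
# ('Mary', 'Jim') 0.75
```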

Conclusion

(Mary, Jim) has the greatest Jaccard distance (0.75), compared with 0.67 for (Jack, Jim) and 0.33 for (Jack, Mary).

Therefore, Mary and Jim are the most dissimilar pair, and Jack and Mary are the most similar.

