# CHI-SQUARE TEST For Machine Learning and Data Analytics

• By Sagar Gade
• October 6, 2023
• Machine Learning


▪ One of the most popular tests for an association between two categorical variables (features).

• The chi-square test compares observed frequencies with the frequencies expected if there were no association.
• It measures how far the observed counts deviate from the expected counts.
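This comparison can be written as a formula. The notation here is introduced for illustration and does not come from the original post. $O_{ij}$ is the observed count in row $i$, column $j$ of a table with $r$ rows and $c$ columns. $E_{ij}$ is the count expected under independence, $R_i$ and $C_j$ are the row and column totals, and $N$ is the grand total:

$$
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i\,C_j}{N},
\qquad
\mathrm{df} = (r-1)(c-1)
$$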

Chi-Square Test for Feature Selection in Machine Learning

## Steps to perform the Chi-Square Test For Machine Learning and Data Analytics:

1. Define the hypotheses.
2. Build a contingency table.
3. Find the expected values.
4. Calculate the chi-square statistic.
5. Accept or reject the null hypothesis.
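This post's example tests the AirBags and Type variables. For that example, step 1 would state the hypotheses roughly as follows:

$$
\begin{aligned}
H_0 &: \text{AirBags and Type are independent (no association)} \\
H_1 &: \text{AirBags and Type are associated (not independent)}
\end{aligned}
$$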

### 2. Contingency table

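A contingency table shows the frequency distribution of one categorical variable in its rows and another in its columns. It is used to study the relationship between the two variables. The table below cross-tabulates the airbag configuration (AirBags) against the vehicle type (Type).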

| AirBags / Type | Compact | Large | Midsize | Small | Sporty | Van |
|---|---|---|---|---|---|---|
| Driver & Passenger | 2 | 3 | 6 | 0 | 3 | 0 |
| Driver only | 9 | 6 | 11 | 5 | 8 | 3 |
| None | 5 | 0 | 4 | 16 | 2 | 6 |
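Steps 3 and 4 can be computed directly from this table. Each expected count is its row total times its column total, divided by the grand total. The statistic then sums the squared deviations, each scaled by its expected count. The sketch below assumes the counts above; the function name `chi_square_by_hand` is hypothetical:

```python
import numpy as np

# Rows: AirBags (Driver & Passenger, Driver only, None)
# Columns: Type (Compact, Large, Midsize, Small, Sporty, Van)
observed = np.array([
    [2, 3, 6, 0, 3, 0],
    [9, 6, 11, 5, 8, 3],
    [5, 0, 4, 16, 2, 6],
])

def chi_square_by_hand(table):
    """Return (statistic, dof, expected) for an r x c contingency table."""
    table = np.asarray(table, dtype=float)
    row_totals = table.sum(axis=1)   # R_i
    col_totals = table.sum(axis=0)   # C_j
    grand_total = table.sum()        # N
    # Step 3: expected counts under independence, E_ij = R_i * C_j / N
    expected = np.outer(row_totals, col_totals) / grand_total
    # Step 4: sum of (O - E)^2 / E over every cell
    statistic = ((table - expected) ** 2 / expected).sum()
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return statistic, dof, expected

statistic, dof, expected = chi_square_by_hand(observed)
print(round(statistic, 4), dof)
print(expected.round(4))
```

The statistic comes out at about 31.50 with 10 degrees of freedom. The expected counts agree with the array in the article's output.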

### 5. Accept or Reject the Null Hypothesis

If the p-value is less than the chosen significance level (0.05), we reject the null hypothesis that there is no association between the variables and accept the alternative hypothesis.
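The decision rule can be checked against the chi-square distribution. This is a minimal sketch, assuming SciPy's `scipy.stats.chi2`; the helper name `decide` is hypothetical. It plugs in the statistic and degrees of freedom reported in the output, and the p-value comparison is equivalent to checking the statistic against a critical value:

```python
from scipy.stats import chi2

def decide(statistic, dof, alpha=0.05):
    """Compare a chi-square statistic with the chi-square distribution."""
    p_value = chi2.sf(statistic, dof)     # P(X >= statistic) under H0
    critical = chi2.ppf(1 - alpha, dof)   # threshold for rejection at level alpha
    reject = p_value < alpha              # same as statistic > critical
    return p_value, critical, reject

p_value, critical, reject = decide(31.496973760366618, 10)
print(f"p = {p_value:.6f}, critical value = {critical:.3f}, reject H0: {reject}")
```

With 10 degrees of freedom the 5% critical value is about 18.31. A statistic of 31.50 lies well beyond it, which is why the p-value is so small.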

## Demo
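The printed output below has the shape of the return value of SciPy's `scipy.stats.chi2_contingency`: statistic, p-value, degrees of freedom, and expected counts. A minimal sketch of a demo producing it, assuming that function and the table above, might look like this:

```python
import numpy as np
from scipy.stats import chi2_contingency

# AirBags x Type contingency table
observed = np.array([
    [2, 3, 6, 0, 3, 0],    # Driver & Passenger
    [9, 6, 11, 5, 8, 3],   # Driver only
    [5, 0, 4, 16, 2, 6],   # None
])

# Yates' continuity correction is applied only when dof == 1,
# so it has no effect on this 3 x 6 table.
stat, p, dof, expected = chi2_contingency(observed)
print(stat, p, dof)
print(expected)
```

Recent SciPy versions return a result object rather than a plain tuple. It still unpacks into the same four values.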

## Conclusion

Output:

```
(31.496973760366618,
 0.0004854823787767891,
 10,
 array([[2.51685393, 1.41573034, 3.30337079, 3.30337079, 2.04494382, 1.41573034],
        [7.5505618 , 4.24719101, 9.91011236, 9.91011236, 6.13483146, 4.24719101],
        [5.93258427, 3.33707865, 7.78651685, 7.78651685, 4.82022472, 3.33707865]]))
```

From the output above:
- 31.49 is the chi-square statistic.
- 0.00048 is the p-value.
- 10 is the number of degrees of freedom.
- The array holds the expected frequencies.

As the p-value is less than 0.05, we reject the null hypothesis. The variables ‘AirBags’ and ‘Type’ are not independent of each other.
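The same test can support feature selection: each categorical feature is tested against the target, and the features are ranked by p-value. This is a minimal sketch assuming NumPy and SciPy. The helper names `crosstab` and `rank_features` are hypothetical, and the toy data is invented purely for illustration. It is far too small for the chi-square approximation to be reliable and only shows the mechanics:

```python
import numpy as np
from scipy.stats import chi2_contingency

def crosstab(feature, target):
    """Build a count table from two equal-length categorical arrays."""
    f_levels, f_idx = np.unique(feature, return_inverse=True)
    t_levels, t_idx = np.unique(target, return_inverse=True)
    table = np.zeros((len(f_levels), len(t_levels)), dtype=int)
    np.add.at(table, (f_idx, t_idx), 1)   # count each (feature, target) pair
    return table

def rank_features(features, target, alpha=0.05):
    """Rank categorical features by chi-square p-value against the target."""
    results = []
    for name, values in features.items():
        stat, p, _, _ = chi2_contingency(crosstab(values, target))
        results.append((name, stat, p, p < alpha))
    return sorted(results, key=lambda r: r[2])   # smallest p-value first

# Hypothetical toy data, for illustration only
target = np.array(["yes", "yes", "no", "no", "yes", "no", "yes", "no"])
features = {
    "colour": np.array(["red", "red", "blue", "blue", "red", "blue", "red", "blue"]),
    "size": np.array(["S", "L", "S", "L", "S", "L", "L", "S"]),
}
for name, stat, p, associated in rank_features(features, target):
    print(f"{name}: chi2={stat:.3f}, p={p:.4f}, associated={associated}")
```

`pandas.crosstab` can build the count table instead of the hand-written helper. scikit-learn's `SelectKBest` with the `chi2` score function is another common route, but it expects non-negative numeric features such as one-hot encoded categories.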
