Understanding Hypothesis Testing and its application Using Python

Contents of the blog:
  • What is hypothesis testing
  • Application of hypothesis testing 
  • Steps For Hypothesis Testing
  • Type 1 and Type 2 Error in Hypothesis Testing
  • T-test / t-value: definition and how to apply it using Python code
  • P-value: definition and how to apply it using Python code
  • ANOVA (Analysis of Variance) test: definition and how to apply it using Python code



HYPOTHESIS TESTING:

Hypothesis testing is a statistical technique for checking whether an assumption we have made about a population parameter is supported by the data. The population is the whole data set. In practice we usually cannot measure all of it, so we run the test on sample data (a subset of the population data). We then use the result from the sample to draw a conclusion about the population.

Example: Suppose I say that country A's GDP is greater than country B's, or that the average weight of a class is 60 kg. These are assumptions, and we now need to check with statistics whether they hold.

Application of Hypothesis Testing

Hypothesis testing is used to check, with a statistical method, whether an assumption you have made is supported by the data.


Steps For Hypothesis Testing
  • First, we state two hypotheses of which only one can be true. For example, suppose I say the average weight of the class is 40 kg. This statement is either true or false.
  • We write down which hypothesis test we are going to use and why it suits the data.
  • We run the test in our Python code.
  • After the test, we decide about the null hypothesis (defined below). Based on the result, we either reject the null hypothesis or fail to reject it.

Basic Terms in Hypothesis Testing

  • Null Hypothesis (H0): The default claim, which we assume to be true until the data gives strong evidence against it. For example, if I say the average weight of the class is 40 kg and assume it is true (not yet verified by statistics), that statement is the null hypothesis.
  • Alternate Hypothesis (H1): The claim that contradicts the null hypothesis. We adopt it only if the test rejects the null hypothesis. For example, "the average weight of the class is not 40 kg" is the alternate hypothesis.
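
The steps and the two hypotheses above can be sketched in a few lines. This is only a minimal illustration: the weights are made-up values for an imaginary class, the helper name `decide` is hypothetical, and SciPy's one-sample t-test (`scipy.stats.ttest_1samp`) is assumed to be a suitable test for this kind of data.

```python
from scipy import stats

# Step 1: state the hypotheses.
#   H0: the mean weight of the class is 40 kg
#   H1: the mean weight of the class is not 40 kg
hypothesized_mean = 40

# Made-up sample of weights (kg), for illustration only.
sample = [38, 42, 41, 39, 44, 43, 37, 41, 40, 39]

# Step 2: choose the test. One sample mean is compared against a fixed
# value, so a one-sample t-test is assumed to be appropriate here.
# Step 3: run the test.
t_value, p_value = stats.ttest_1samp(sample, hypothesized_mean)


def decide(p_value, alpha=0.05):
    """Step 4: turn a p-value into a decision about H0 (hypothetical helper)."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


decision = decide(p_value)
print("t value:", t_value)
print("p value:", p_value)
print("decision:", decision)
```

Keeping the decision rule in its own small function makes the significance level (alpha) explicit, so it is chosen before looking at the result.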

Type 1 and Type 2 Error in Hypothesis Testing

Type 1 Error: Suppose that after statistical analysis you reject the null hypothesis, but in reality the null hypothesis was correct. This is a false positive, or Type 1 error.
Example: Suppose you are playing the game Among Us. After talking with all the players, you decide one of them is guilty and remove him, but in reality he was innocent.

Type 2 Error: Suppose that after statistical analysis you fail to reject the null hypothesis, but in reality the null hypothesis was wrong. This is a false negative, or Type 2 error.
Example: Acquitting a criminal because you assume he/she is innocent, when in reality he/she is not.


Table of error types

                          Null hypothesis (H0) is
Decision about H0     True                     False
Don't reject          Correct inference        Type II error
                      (true negative)          (false negative)
                      (probability = 1−α)      (probability = β)
Reject                Type I error             Correct inference
                      (false positive)         (true positive)
                      (probability = α)        (probability = 1−β)
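
The probabilities α and β in the table can be estimated by simulation. The sketch below is one way to do it under stated assumptions, none of which come from the data used elsewhere in this post: weights are drawn from a normal distribution with standard deviation 5, H0 says the mean is 40, and the "H0 is false" case uses a true mean of 43. The helper name `rejection_rate` is hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_trials = 2000
sample_size = 30


def rejection_rate(true_mean, h0_mean=40.0):
    """Fraction of simulated samples for which H0 (mean == h0_mean) is rejected."""
    # One row per simulated experiment.
    samples = rng.normal(loc=true_mean, scale=5.0, size=(n_trials, sample_size))
    _, p_values = stats.ttest_1samp(samples, h0_mean, axis=1)
    return np.mean(p_values <= alpha)


# H0 is true (true mean is 40): every rejection is a Type I error.
type1_rate = rejection_rate(true_mean=40.0)

# H0 is false (true mean is 43): every non-rejection is a Type II error.
type2_rate = 1 - rejection_rate(true_mean=43.0)

print("Type I error rate :", type1_rate)
print("Type II error rate:", type2_rate)
```

The Type I rate should land close to alpha, because alpha is by definition the chance of rejecting a true null hypothesis. The Type II rate depends on the sample size and on how far the true mean is from the hypothesized one.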
 


T-test Definition: A t-test measures whether the means of two groups differ by more than chance would explain. To calculate it we need three values for each group: the mean, the standard deviation, and the number of data points. A one-sample variant compares the mean of a single group against a fixed value.
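
To show how the three values enter the calculation, here is a small sketch that computes the t statistic by hand and checks it against SciPy. It assumes the unequal-variance (Welch) form, t = (mean_a - mean_b) / sqrt(std_a²/n_a + std_b²/n_b), and the summary numbers are made up for illustration.

```python
import math
from scipy import stats

# Made-up summary statistics for two groups, for illustration only.
mean_a, std_a, n_a = 48.0, 5.0, 60
mean_b, std_b, n_b = 50.0, 6.0, 60

# Welch's t statistic: difference in means divided by its standard error.
standard_error = math.sqrt(std_a**2 / n_a + std_b**2 / n_b)
t_manual = (mean_a - mean_b) / standard_error

# SciPy computes the same statistic directly from the summary values.
t_scipy, p_value = stats.ttest_ind_from_stats(
    mean_a, std_a, n_a, mean_b, std_b, n_b, equal_var=False
)

print("manual t:", t_manual)
print("scipy t :", t_scipy)
print("p value :", p_value)
```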

P-value Definition: The p-value is a probability between 0 and 1. It is the chance of seeing a result at least as extreme as the one observed if the null hypothesis were true. If p <= 0.05, you reject the null hypothesis.
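
As a sketch of where the p-value comes from, the snippet below turns a t statistic into a two-sided p-value using the t distribution. The t value and the sample size of 10 are the ones from the one-sample example in this post. The calculation assumes a two-sided test with n - 1 degrees of freedom.

```python
from scipy import stats

# t statistic and sample size taken from the one-sample example in this post.
t_value = 0.7717436331412892
sample_size = 10
degrees_of_freedom = sample_size - 1

# Two-sided p-value: probability of a t at least this far from 0, in either tail.
p_value = 2 * stats.t.sf(abs(t_value), degrees_of_freedom)
print("p value is", p_value)
```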

Let's try to understand these using Python code.

One-Sample T-test

Here our dataset of weights is the population data. We first calculate its mean. After that, we randomly choose 10 values from the dataset as our sample.
H0 = The mean weight of the students is 30 (null hypothesis)
H1 = The mean weight of the students is not 30 (alternate hypothesis)

import numpy as np
weight = [30,23,35,25,36,28,36,28,29,30,26,35,28,26,27,29] #population data
mean = np.mean(weight)
mean

Output: 29.4375

#sample data: np.random.choice draws with replacement, and the values change on every run unless a seed is set
sample_size = 10
weight_sample = np.random.choice(weight,sample_size)
weight_sample
Output: array([36, 35, 29, 28, 26, 35, 28, 36, 36, 23])

from scipy.stats import ttest_1samp
# Test H0: the mean weight is 30
ttest_value ,p_value = ttest_1samp(weight_sample,30)
print('t- test value is ',ttest_value)
print('p value is ',p_value)
if p_value>.05:
    # A large p-value means there is no evidence against H0, not proof that H0 is true
    print('We fail to reject the null hypothesis')
else:
    print("We reject the null hypothesis")

Output:
t- test value is  0.7717436331412892
p value is  0.46004898227095714
We fail to reject the null hypothesis
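
The t value in the output can be checked by hand with the one-sample formula t = (sample mean - hypothesized mean) / (sample standard deviation / sqrt(n)). The sketch below hard-codes the sample printed above so that the result does not depend on the random draw.

```python
import numpy as np
from scipy.stats import ttest_1samp

# The sample printed in the output above, hard-coded so the run is repeatable.
weight_sample = np.array([36, 35, 29, 28, 26, 35, 28, 36, 36, 23])
hypothesized_mean = 30

sample_mean = weight_sample.mean()
sample_std = weight_sample.std(ddof=1)  # ddof=1 gives the sample standard deviation
standard_error = sample_std / np.sqrt(len(weight_sample))

t_manual = (sample_mean - hypothesized_mean) / standard_error
t_scipy, _ = ttest_1samp(weight_sample, hypothesized_mean)

print("manual t:", t_manual)
print("scipy t :", t_scipy)
```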


Two-sample T-test With Python

The independent samples t-test, or two-sample t-test, compares the means of two independent groups to determine whether there is statistical evidence that the associated population means are significantly different. It is a parametric test, also known as the independent t-test. In the code below, the null hypothesis is that class A and class B have the same mean weight.



Code 

import numpy as np
import pandas as pd
import scipy.stats as stats
np.random.seed(6)
School_weight=stats.poisson.rvs(loc=18,mu=37,size=1300) # simulated weights for the whole school
classA_weight=stats.poisson.rvs(loc=18,mu=30,size=60) # simulated weights for class A (drawn separately, not sampled from School_weight)
np.random.seed(12)
classb_weight=stats.poisson.rvs(loc=18,mu=33,size=60) # simulated weights for class B
cwa =classA_weight.mean()
cwb =classb_weight.mean()
sw = School_weight.mean()
print('classA weight mean', cwa)
print('classB weight mean', cwb)
print('school weight mean ', sw)

# equal_var=False runs Welch's t-test, which does not assume equal variances
_,p_value=stats.ttest_ind(a=classA_weight,b=classb_weight,equal_var=False)

print("p value ", p_value)

if p_value < 0.05:    # alpha value is 0.05 or 5%
    print(" we are rejecting null hypothesis")
else:
    print("we are failing to reject null hypothesis")

Output
classA weight mean 48.766666666666666
classB weight mean 50.63333333333333
school weight mean  55.176153846153845
p value  0.07526101562567064
we are failing to reject null hypothesis
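
The code above computes the school mean but does not test against it. One possible follow-up, which is not part of the original example, is a one-sample t-test of class A against the school mean. The sketch assumes the school mean can be treated as a fixed, known value, which ignores its own sampling error.

```python
import numpy as np
import scipy.stats as stats

# Same simulated data as above, regenerated so this block runs on its own.
np.random.seed(6)
School_weight = stats.poisson.rvs(loc=18, mu=37, size=1300)
classA_weight = stats.poisson.rvs(loc=18, mu=30, size=60)

# H0: the mean weight of class A equals the school mean.
t_value, p_value = stats.ttest_1samp(classA_weight, School_weight.mean())
print("t value ", t_value)
print("p value ", p_value)

if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("we are rejecting null hypothesis")
else:
    print("we are failing to reject null hypothesis")
```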


ANOVA Test:
The t-test works well when dealing with two groups, but sometimes we want to compare more than two groups at the same time.

For example, if we wanted to test whether petal_width differs based on some categorical variable like species, we have to compare the means of each level or group of the variable.

One-Way F-test (ANOVA):

It tells whether two or more groups are similar or not, based on their means and the F-score. The F-score is the ratio of the variation between the group means to the variation within the groups. The null hypothesis is that all group means are equal.

Example: There are 3 different categories of iris flowers, each with petal width measurements, and we need to check whether all 3 groups are similar or not.
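
To make the F-score less of a black box, the sketch below computes it by hand as (between-group mean square) / (within-group mean square) and compares the result with `scipy.stats.f_oneway`. The three groups are made-up numbers for illustration, not the iris data.

```python
import numpy as np
from scipy import stats

# Made-up measurements for three groups, for illustration only.
groups = [
    np.array([0.2, 0.3, 0.2, 0.4, 0.3]),
    np.array([1.3, 1.5, 1.2, 1.4, 1.6]),
    np.array([2.0, 2.3, 1.9, 2.1, 2.4]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)      # number of groups
n = len(all_values)  # total number of observations

# Between-group variation: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group variation: spread of the values around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
f_scipy, p_value = stats.f_oneway(*groups)

print("manual F:", f_manual)
print("scipy F :", f_scipy)
print("p value :", p_value)
```

A large F means the group means are far apart relative to the spread inside each group, which is what leads to a small p-value.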


Code : 

import seaborn as sns

d=sns.load_dataset('iris')

d.head()


   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

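# Keep only the measurement and the grouping column
# (pd and stats are the pandas and scipy.stats imports from the two-sample example)
d_anova = d[['petal_width','species']]
groups = pd.unique(d_anova.species.values)
groups

Output : array(['setosa', 'versicolor', 'virginica'], dtype=object)

# Map each species name to its petal_width values
d_data = {group:d_anova['petal_width'][d_anova.species == group] for group in groups}
d_data

# H0: all three species have the same mean petal width
F, p = stats.f_oneway(d_data['setosa'], d_data['versicolor'], d_data['virginica'])
if p<0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")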

Output : reject null hypothesis


