CONTENT
A/B Testing in Power BI
A/B testing is a popular statistical technique used to understand whether a change to method actually performs better. It is commonly used in website design, marketing campaigns, and health care.
Jackie Tejwani
Director - Business Intelligence
August 26, 2024
Introduction
In this blog I will go over on the 5 steps required to perform A/B testing in Power BI.
For the use case I will be using an example health care dataset. The dataset contains records of 479 patients with history of substance abuse and mental health conditions which have been split into two different programs:
Usual care: 336 patients (70%)
Intervention: 143 patients (30%)
Data Model
Steps
Step 1: Identify a goal
The goal at the end of the study is to understand is there a statistical difference between both the programs.
The study will be using a DLA-Assessment score which will be measured before the program and after. An increase in DLA-Assessment score means the patient has increase their general health post the program.
Finally we will measure the mean difference between both the programs to see which one is more effective.
Step 2: Develop two different segments (Test group & control group)
The patients will be split into two different types of programs
Test group
Patients will be put in an Intervention program: 143 patients (30%)
Control group
No change in the Patients care program: 336 patients (70%)
Step 3: Set up Hypothesis (null hypothesis & alternative hypothesis)
H0 = There is no difference between both the programs
Ha = There is a difference between both the programs
Step 4: Analyze results (Difference in means between both methods)
Calculating means
Mean of DLA assessment 1 before patients going on the program:
Average DLA base = AVERAGEX( 'fct_Substance Abuse Program' ,
'fct_Substance Abuse Program'[Daily Living Activity Tes 1]
)
Mean of DLA assessment 2 post patients going on the program:
Average DLA followup = AVERAGEX( 'fct_Substance Abuse Program' ,
'fct_Substance Abuse Program'[Daily Living Activity Test 2]
)
DLA Assessment Improvement:
AVERAGEX( 'fct_Substance Abuse Program',
[Daily Living Activity Test 2] - [Daily Living Activity Tes 1]
)
μ0 = Mean DLA Assessment Improvement for Intervention Program
CALCULATE( [Difference in Means],
FILTER( dim_Program, dim_Program[Program Key] = 1)
)
μ1 = Mean DLA Assessment Improvement for Usual Care program
CALCULATE( [Difference in Means],
FILTER( dim_Program, dim_Program[Program Key] = 2)
)
μ0 - μ1 = Difference in means
[Average DLA Followup Internvention] - [Average DLA Followup UsualCare]
Step 5: Calculate the statistical significance of the results (eliminate change by random chance)
Once we have calculated the difference in means between both programs, we can then check this result to see if the change is statistically significant.
For this example we will be using 2 tailed Z-test with a 95% confidence interval.
Calculating standard deviation
σ0 = STDEVX.P( ADDCOLUMNS( VALUES( 'fct_Substance Abuse Program'[Patient ID]) , "_Value" , [Average DLA followup] ), [_Value] )
σ1 = STDEVX.P( ADDCOLUMNS( VALUES( 'fct_Substance Abuse Program'[Patient ID]) , "_Value" , [Average DLA base] ), [_Value] )
Calculating standard error
Var SDUsualCare = POWER(CALCULATE( [SD DLA followup] , dim_Program[Program Key] = 2) , 2)
Var SDIntervention = POWER(CALCULATE( [SD DLA followup] , dim_Program[Program Key] = 1) , 2)
Return
SQRT( (DIVIDE( SDUsualCare , [Patient Count Usual Care] , 0) + (DIVIDE( SDIntervention , [Patient Count Intervention] , 0)
)))
Calculating Z-Score
DIVIDE( ([Difference in Means Follow up] - 0) , [Standard Error] , 0 )
Calculating Confidence Interval
Confidence Interval 1 = [Difference in Means Follow up] - (1.96 * [Standard Error] )
Confidence Interval 2 = [Difference in Means Follow up] + (1.96 * [Standard Error] )
Step 6: Interpreting the Results
The Z-score result reflects how many standard deviations away Difference in means between the two programs.
For a 95% confidence interval the critical value of the Z-score is either <-1.96 or >1.96 is considered statistically significant. This signifies that the difference in means between the two programs is unlikely due to just chance.
Introduction
In this blog I will go over on the 5 steps required to perform A/B testing in Power BI.
For the use case I will be using an example health care dataset. The dataset contains records of 479 patients with history of substance abuse and mental health conditions which have been split into two different programs:
Usual care: 336 patients (70%)
Intervention: 143 patients (30%)
Data Model
Steps
Step 1: Identify a goal
The goal at the end of the study is to understand is there a statistical difference between both the programs.
The study will be using a DLA-Assessment score which will be measured before the program and after. An increase in DLA-Assessment score means the patient has increase their general health post the program.
Finally we will measure the mean difference between both the programs to see which one is more effective.
Step 2: Develop two different segments (Test group & control group)
The patients will be split into two different types of programs
Test group
Patients will be put in an Intervention program: 143 patients (30%)
Control group
No change in the Patients care program: 336 patients (70%)
Step 3: Set up Hypothesis (null hypothesis & alternative hypothesis)
H0 = There is no difference between both the programs
Ha = There is a difference between both the programs
Step 4: Analyze results (Difference in means between both methods)
Calculating means
Mean of DLA assessment 1 before patients going on the program:
Average DLA base = AVERAGEX( 'fct_Substance Abuse Program' ,
'fct_Substance Abuse Program'[Daily Living Activity Tes 1]
)
Mean of DLA assessment 2 post patients going on the program:
Average DLA followup = AVERAGEX( 'fct_Substance Abuse Program' ,
'fct_Substance Abuse Program'[Daily Living Activity Test 2]
)
DLA Assessment Improvement:
AVERAGEX( 'fct_Substance Abuse Program',
[Daily Living Activity Test 2] - [Daily Living Activity Tes 1]
)
μ0 = Mean DLA Assessment Improvement for Intervention Program
CALCULATE( [Difference in Means],
FILTER( dim_Program, dim_Program[Program Key] = 1)
)
μ1 = Mean DLA Assessment Improvement for Usual Care program
CALCULATE( [Difference in Means],
FILTER( dim_Program, dim_Program[Program Key] = 2)
)
μ0 - μ1 = Difference in means
[Average DLA Followup Internvention] - [Average DLA Followup UsualCare]
Step 5: Calculate the statistical significance of the results (eliminate change by random chance)
Once we have calculated the difference in means between both programs, we can then check this result to see if the change is statistically significant.
For this example we will be using 2 tailed Z-test with a 95% confidence interval.
Calculating standard deviation
σ0 = STDEVX.P( ADDCOLUMNS( VALUES( 'fct_Substance Abuse Program'[Patient ID]) , "_Value" , [Average DLA followup] ), [_Value] )
σ1 = STDEVX.P( ADDCOLUMNS( VALUES( 'fct_Substance Abuse Program'[Patient ID]) , "_Value" , [Average DLA base] ), [_Value] )
Calculating standard error
Var SDUsualCare = POWER(CALCULATE( [SD DLA followup] , dim_Program[Program Key] = 2) , 2)
Var SDIntervention = POWER(CALCULATE( [SD DLA followup] , dim_Program[Program Key] = 1) , 2)
Return
SQRT( (DIVIDE( SDUsualCare , [Patient Count Usual Care] , 0) + (DIVIDE( SDIntervention , [Patient Count Intervention] , 0)
)))
Calculating Z-Score
DIVIDE( ([Difference in Means Follow up] - 0) , [Standard Error] , 0 )
Calculating Confidence Interval
Confidence Interval 1 = [Difference in Means Follow up] - (1.96 * [Standard Error] )
Confidence Interval 2 = [Difference in Means Follow up] + (1.96 * [Standard Error] )
Step 6: Interpreting the Results
The Z-score result reflects how many standard deviations away Difference in means between the two programs.
For a 95% confidence interval the critical value of the Z-score is either <-1.96 or >1.96 is considered statistically significant. This signifies that the difference in means between the two programs is unlikely due to just chance.
Introduction
In this blog I will go over on the 5 steps required to perform A/B testing in Power BI.
For the use case I will be using an example health care dataset. The dataset contains records of 479 patients with history of substance abuse and mental health conditions which have been split into two different programs:
Usual care: 336 patients (70%)
Intervention: 143 patients (30%)
Data Model
Steps
Step 1: Identify a goal
The goal at the end of the study is to understand is there a statistical difference between both the programs.
The study will be using a DLA-Assessment score which will be measured before the program and after. An increase in DLA-Assessment score means the patient has increase their general health post the program.
Finally we will measure the mean difference between both the programs to see which one is more effective.
Step 2: Develop two different segments (Test group & control group)
The patients will be split into two different types of programs
Test group
Patients will be put in an Intervention program: 143 patients (30%)
Control group
No change in the Patients care program: 336 patients (70%)
Step 3: Set up Hypothesis (null hypothesis & alternative hypothesis)
H0 = There is no difference between both the programs
Ha = There is a difference between both the programs
Step 4: Analyze results (Difference in means between both methods)
Calculating means
Mean of DLA assessment 1 before patients going on the program:
Average DLA base = AVERAGEX( 'fct_Substance Abuse Program' ,
'fct_Substance Abuse Program'[Daily Living Activity Tes 1]
)
Mean of DLA assessment 2 post patients going on the program:
Average DLA followup = AVERAGEX( 'fct_Substance Abuse Program' ,
'fct_Substance Abuse Program'[Daily Living Activity Test 2]
)
DLA Assessment Improvement:
AVERAGEX( 'fct_Substance Abuse Program',
[Daily Living Activity Test 2] - [Daily Living Activity Tes 1]
)
μ0 = Mean DLA Assessment Improvement for Intervention Program
CALCULATE( [Difference in Means],
FILTER( dim_Program, dim_Program[Program Key] = 1)
)
μ1 = Mean DLA Assessment Improvement for Usual Care program
CALCULATE( [Difference in Means],
FILTER( dim_Program, dim_Program[Program Key] = 2)
)
μ0 - μ1 = Difference in means
[Average DLA Followup Internvention] - [Average DLA Followup UsualCare]
Step 5: Calculate the statistical significance of the results (eliminate change by random chance)
Once we have calculated the difference in means between both programs, we can then check this result to see if the change is statistically significant.
For this example we will be using 2 tailed Z-test with a 95% confidence interval.
Calculating standard deviation
σ0 = STDEVX.P( ADDCOLUMNS( VALUES( 'fct_Substance Abuse Program'[Patient ID]) , "_Value" , [Average DLA followup] ), [_Value] )
σ1 = STDEVX.P( ADDCOLUMNS( VALUES( 'fct_Substance Abuse Program'[Patient ID]) , "_Value" , [Average DLA base] ), [_Value] )
Calculating standard error
Var SDUsualCare = POWER(CALCULATE( [SD DLA followup] , dim_Program[Program Key] = 2) , 2)
Var SDIntervention = POWER(CALCULATE( [SD DLA followup] , dim_Program[Program Key] = 1) , 2)
Return
SQRT( (DIVIDE( SDUsualCare , [Patient Count Usual Care] , 0) + (DIVIDE( SDIntervention , [Patient Count Intervention] , 0)
)))
Calculating Z-Score
DIVIDE( ([Difference in Means Follow up] - 0) , [Standard Error] , 0 )
Calculating Confidence Interval
Confidence Interval 1 = [Difference in Means Follow up] - (1.96 * [Standard Error] )
Confidence Interval 2 = [Difference in Means Follow up] + (1.96 * [Standard Error] )
Step 6: Interpreting the Results
The Z-score result reflects how many standard deviations away Difference in means between the two programs.
For a 95% confidence interval the critical value of the Z-score is either <-1.96 or >1.96 is considered statistically significant. This signifies that the difference in means between the two programs is unlikely due to just chance.
CONTENT
SHARE