A/B Testing in Power BI

A/B testing is a popular statistical technique used to understand whether a change to method actually performs better. It is commonly used in website design, marketing campaigns, and health care.

Jackie Tejwani

Jackie Tejwani

Director - Business Intelligence

August 26, 2024

Introduction

In this blog I will go over on the 5 steps required to perform A/B testing in Power BI.

For the use case I will be using an example health care dataset. The dataset contains records of 479 patients with history of substance abuse and mental health conditions which have been split into two different programs:

Usual care: 336 patients (70%)

Intervention: 143 patients (30%)


Data Model


Steps

Step 1: Identify a goal

The goal at the end of the study is to understand is there a statistical difference between both the programs.

The study will be using a DLA-Assessment score which will be measured before the program and after. An increase in DLA-Assessment score means the patient has increase their general health post the program.

Finally we will measure the mean difference between both the programs to see which one is more effective.


Step 2: Develop two different segments (Test group & control group)

The patients will be split into two different types of programs

Test group

Patients will be put in an Intervention program: 143 patients (30%)

Control group

No change in the Patients care program: 336 patients (70%)


Step 3:  Set up Hypothesis (null hypothesis & alternative hypothesis)

H0 = There is no difference between both the programs

Ha = There is a difference between both the programs


Step 4:  Analyze results (Difference in means between both methods)

Calculating means

Mean of DLA assessment 1 before patients going on the program:

Average DLA base = AVERAGEX( 'fct_Substance Abuse Program' , 
                             'fct_Substance Abuse Program'[Daily Living Activity Tes 1]
                           )


Mean of DLA assessment 2 post patients going on the program:

Average DLA followup = AVERAGEX( 'fct_Substance Abuse Program' , 
                                 'fct_Substance Abuse Program'[Daily Living Activity Test 2]
                               )

 

DLA Assessment Improvement:

AVERAGEX( 'fct_Substance Abuse Program',
          [Daily Living Activity Test 2] - [Daily Living Activity Tes 1] 
        )

 

μ0  = Mean DLA Assessment Improvement for Intervention Program

CALCULATE( [Difference in Means], 
           FILTER( dim_Program, dim_Program[Program Key] = 1)
         )

 

μ1 = Mean DLA Assessment Improvement for Usual Care program

CALCULATE( [Difference in Means], 
           FILTER( dim_Program, dim_Program[Program Key] = 2)
         )

 

μ0 - μ1 = Difference in means

[Average DLA Followup Internvention] - [Average DLA Followup UsualCare]


Step 5:  Calculate the statistical significance of the results (eliminate change by random chance)

Once we have calculated the difference in means between both programs, we can then check this result to see if the change is statistically significant.

For this example we will be using 2 tailed Z-test with a 95% confidence interval.


Calculating standard deviation

σ0 = STDEVX.P( ADDCOLUMNS( VALUES( 'fct_Substance Abuse Program'[Patient ID]) , "_Value" , [Average DLA followup] ), [_Value] )

σ1 = STDEVX.P( ADDCOLUMNS( VALUES( 'fct_Substance Abuse Program'[Patient ID]) , "_Value" , [Average DLA base] ), [_Value] )

 

Calculating standard error

Var SDUsualCare = POWER(CALCULATE( [SD DLA followup] , dim_Program[Program Key] = 2) , 2)

Var SDIntervention = POWER(CALCULATE( [SD DLA followup] , dim_Program[Program Key] = 1) , 2)

Return

SQRT( (DIVIDE( SDUsualCare , [Patient Count Usual Care] , 0) + (DIVIDE( SDIntervention , [Patient Count Intervention] , 0)

)))

 

Calculating Z-Score

DIVIDE( ([Difference in Means Follow up] - 0) , [Standard Error] , 0 )

 

Calculating Confidence Interval

Confidence Interval 1 = [Difference in Means Follow up] - (1.96 * [Standard Error] )

Confidence Interval 2 = [Difference in Means Follow up] + (1.96 * [Standard Error] )


Step 6:  Interpreting the Results

The Z-score result reflects how many standard deviations away Difference in means between the two programs.

For a 95% confidence interval the critical value of the Z-score is either <-1.96  or >1.96 is considered statistically significant. This signifies that the difference in means between the two programs is unlikely due to just chance.


Introduction

In this blog I will go over on the 5 steps required to perform A/B testing in Power BI.

For the use case I will be using an example health care dataset. The dataset contains records of 479 patients with history of substance abuse and mental health conditions which have been split into two different programs:

Usual care: 336 patients (70%)

Intervention: 143 patients (30%)


Data Model


Steps

Step 1: Identify a goal

The goal at the end of the study is to understand is there a statistical difference between both the programs.

The study will be using a DLA-Assessment score which will be measured before the program and after. An increase in DLA-Assessment score means the patient has increase their general health post the program.

Finally we will measure the mean difference between both the programs to see which one is more effective.


Step 2: Develop two different segments (Test group & control group)

The patients will be split into two different types of programs

Test group

Patients will be put in an Intervention program: 143 patients (30%)

Control group

No change in the Patients care program: 336 patients (70%)


Step 3:  Set up Hypothesis (null hypothesis & alternative hypothesis)

H0 = There is no difference between both the programs

Ha = There is a difference between both the programs


Step 4:  Analyze results (Difference in means between both methods)

Calculating means

Mean of DLA assessment 1 before patients going on the program:

Average DLA base = AVERAGEX( 'fct_Substance Abuse Program' , 
                             'fct_Substance Abuse Program'[Daily Living Activity Tes 1]
                           )


Mean of DLA assessment 2 post patients going on the program:

Average DLA followup = AVERAGEX( 'fct_Substance Abuse Program' , 
                                 'fct_Substance Abuse Program'[Daily Living Activity Test 2]
                               )

 

DLA Assessment Improvement:

AVERAGEX( 'fct_Substance Abuse Program',
          [Daily Living Activity Test 2] - [Daily Living Activity Tes 1] 
        )

 

μ0  = Mean DLA Assessment Improvement for Intervention Program

CALCULATE( [Difference in Means], 
           FILTER( dim_Program, dim_Program[Program Key] = 1)
         )

 

μ1 = Mean DLA Assessment Improvement for Usual Care program

CALCULATE( [Difference in Means], 
           FILTER( dim_Program, dim_Program[Program Key] = 2)
         )

 

μ0 - μ1 = Difference in means

[Average DLA Followup Internvention] - [Average DLA Followup UsualCare]


Step 5:  Calculate the statistical significance of the results (eliminate change by random chance)

Once we have calculated the difference in means between both programs, we can then check this result to see if the change is statistically significant.

For this example we will be using 2 tailed Z-test with a 95% confidence interval.


Calculating standard deviation

σ0 = STDEVX.P( ADDCOLUMNS( VALUES( 'fct_Substance Abuse Program'[Patient ID]) , "_Value" , [Average DLA followup] ), [_Value] )

σ1 = STDEVX.P( ADDCOLUMNS( VALUES( 'fct_Substance Abuse Program'[Patient ID]) , "_Value" , [Average DLA base] ), [_Value] )

 

Calculating standard error

Var SDUsualCare = POWER(CALCULATE( [SD DLA followup] , dim_Program[Program Key] = 2) , 2)

Var SDIntervention = POWER(CALCULATE( [SD DLA followup] , dim_Program[Program Key] = 1) , 2)

Return

SQRT( (DIVIDE( SDUsualCare , [Patient Count Usual Care] , 0) + (DIVIDE( SDIntervention , [Patient Count Intervention] , 0)

)))

 

Calculating Z-Score

DIVIDE( ([Difference in Means Follow up] - 0) , [Standard Error] , 0 )

 

Calculating Confidence Interval

Confidence Interval 1 = [Difference in Means Follow up] - (1.96 * [Standard Error] )

Confidence Interval 2 = [Difference in Means Follow up] + (1.96 * [Standard Error] )


Step 6:  Interpreting the Results

The Z-score result reflects how many standard deviations away Difference in means between the two programs.

For a 95% confidence interval the critical value of the Z-score is either <-1.96  or >1.96 is considered statistically significant. This signifies that the difference in means between the two programs is unlikely due to just chance.


Introduction

In this blog I will go over on the 5 steps required to perform A/B testing in Power BI.

For the use case I will be using an example health care dataset. The dataset contains records of 479 patients with history of substance abuse and mental health conditions which have been split into two different programs:

Usual care: 336 patients (70%)

Intervention: 143 patients (30%)


Data Model


Steps

Step 1: Identify a goal

The goal at the end of the study is to understand is there a statistical difference between both the programs.

The study will be using a DLA-Assessment score which will be measured before the program and after. An increase in DLA-Assessment score means the patient has increase their general health post the program.

Finally we will measure the mean difference between both the programs to see which one is more effective.


Step 2: Develop two different segments (Test group & control group)

The patients will be split into two different types of programs

Test group

Patients will be put in an Intervention program: 143 patients (30%)

Control group

No change in the Patients care program: 336 patients (70%)


Step 3:  Set up Hypothesis (null hypothesis & alternative hypothesis)

H0 = There is no difference between both the programs

Ha = There is a difference between both the programs


Step 4:  Analyze results (Difference in means between both methods)

Calculating means

Mean of DLA assessment 1 before patients going on the program:

Average DLA base = AVERAGEX( 'fct_Substance Abuse Program' , 
                             'fct_Substance Abuse Program'[Daily Living Activity Tes 1]
                           )


Mean of DLA assessment 2 post patients going on the program:

Average DLA followup = AVERAGEX( 'fct_Substance Abuse Program' , 
                                 'fct_Substance Abuse Program'[Daily Living Activity Test 2]
                               )

 

DLA Assessment Improvement:

AVERAGEX( 'fct_Substance Abuse Program',
          [Daily Living Activity Test 2] - [Daily Living Activity Tes 1] 
        )

 

μ0  = Mean DLA Assessment Improvement for Intervention Program

CALCULATE( [Difference in Means], 
           FILTER( dim_Program, dim_Program[Program Key] = 1)
         )

 

μ1 = Mean DLA Assessment Improvement for Usual Care program

CALCULATE( [Difference in Means], 
           FILTER( dim_Program, dim_Program[Program Key] = 2)
         )

 

μ0 - μ1 = Difference in means

[Average DLA Followup Internvention] - [Average DLA Followup UsualCare]


Step 5:  Calculate the statistical significance of the results (eliminate change by random chance)

Once we have calculated the difference in means between both programs, we can then check this result to see if the change is statistically significant.

For this example we will be using 2 tailed Z-test with a 95% confidence interval.


Calculating standard deviation

σ0 = STDEVX.P( ADDCOLUMNS( VALUES( 'fct_Substance Abuse Program'[Patient ID]) , "_Value" , [Average DLA followup] ), [_Value] )

σ1 = STDEVX.P( ADDCOLUMNS( VALUES( 'fct_Substance Abuse Program'[Patient ID]) , "_Value" , [Average DLA base] ), [_Value] )

 

Calculating standard error

Var SDUsualCare = POWER(CALCULATE( [SD DLA followup] , dim_Program[Program Key] = 2) , 2)

Var SDIntervention = POWER(CALCULATE( [SD DLA followup] , dim_Program[Program Key] = 1) , 2)

Return

SQRT( (DIVIDE( SDUsualCare , [Patient Count Usual Care] , 0) + (DIVIDE( SDIntervention , [Patient Count Intervention] , 0)

)))

 

Calculating Z-Score

DIVIDE( ([Difference in Means Follow up] - 0) , [Standard Error] , 0 )

 

Calculating Confidence Interval

Confidence Interval 1 = [Difference in Means Follow up] - (1.96 * [Standard Error] )

Confidence Interval 2 = [Difference in Means Follow up] + (1.96 * [Standard Error] )


Step 6:  Interpreting the Results

The Z-score result reflects how many standard deviations away Difference in means between the two programs.

For a 95% confidence interval the critical value of the Z-score is either <-1.96  or >1.96 is considered statistically significant. This signifies that the difference in means between the two programs is unlikely due to just chance.


SHARE