# Data wrangling# read "dta" fileslibrary(haven) # create tablelibrary(gtsummary)# data wrangling and other functionslibrary(tidyverse)# if you want to have a output similar to statalibrary(tidylog)library(epiDisplay) #some basic tests# Limit significant digits to 2, remove scientific notationlibrary(survival)options(digits =2, scipen =9)
# * BRAZILIAN CASE-CONTROL STUDY OF RISK FACTORS FOR INFANT DEATH FROM DIARRHOEA# * case 1=case, 0=control# * milkgp 1=breast only, 2=breast+other, 3=other only# * bf 1=breastfed, 2=not breastfed# * water Piped water supply: 1=in house, 2=in plot, 3=none# * wat2 1=in house/plot 2=none# * agegp Age group (months): 1=0-1, 2=2-3, 3=4-5, 4=6-8, 5=9-11# * agegp2 1=0-2, 2=3-5, 3=6-11# * milkgp 1=breast only, 2=breast+other, 3=other only# Check the class of each variableglimpse(diabraz)glimpse(diabraz2)# case need to be numericdiabraz2$case <-as.numeric(diabraz2$case)
Q1
Use the DIABRAZ.DTA dataset (1 control per case) and the match and mhodds commands to analyse the association between breast feeding (bf) and diarrhoea mortality. Use the mhodds command to estimate the odds ratio, calculate a confidence interval for the OR, and test the null hypothesis of no association. Repeat this analysis using the clogit command, and compare your results. [For comparison purposes you may like to repeat these analyses using ordinary (unconditional) logistic regression, which should not give the correct results].
Mantel-Haenszel
Show the code
diabraz |> pubh::mhor(case ~ pair / bf)
OR Lower.CI Upper.CI Pr(>|z|)
(Intercept) 1.11 0.60 2.05 0.740
pair 0.97 0.95 0.99 0.015
pair:bf 1.02 1.01 1.03 0.005
Common OR Lower CI Upper CI Pr(>|z|)
Cochran-Mantel-Haenszel: 4.8 2 12 < 0.001
Test for effect modification (interaction): p = 0.65
Show the code
# Other option#epiDisplay::mhor(diabraz$case, diabraz$bf, diabraz$pair, graph = F)
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
Q2
Were children with a piped water supply to the house at lower risk than those with a supply to the plot?
Show the code
mod1<-clogit(case ~ water + agegp2 +strata(set), data = diabraz2)tbl_regression(mod1, exponentiate = T)
Characteristic
OR
95% CI
p-value
Water supply, 3 groups
2.17
1.53, 3.07
<0.001
Age, 3 groups
1.23
0.99, 1.53
0.062
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
Show the code
mod2 <-clogit(case ~ water * agegp2 +strata(set), data = diabraz2)tbl_regression(mod2, exponentiate = T)
Characteristic
OR
95% CI
p-value
Water supply, 3 groups
1.71
0.89, 3.32
0.11
Age, 3 groups
0.99
0.56, 1.75
>0.9
Water supply, 3 groups * Age, 3 groups
1.12
0.85, 1.47
0.4
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
Show the code
lmtest::lrtest(mod1, mod2)
#Df
LogLik
Df
Chisq
Pr(>Chisq)
2
-175
NA
NA
NA
3
-174
1
0.66
0.42
Answer
Fit the main effects, water and agegp2, and then compare with the model which includes the interaction term, water*agegp2. This reveals no evidence of interaction ( \(\chi_{10}^2\)= 1.71, P=0.79). agegp2 has been used above, giving 4 interaction terms. With age grouped in 5 groups (agegp) there would be 8 interaction terms. It is best to avoid fitting interactions involving a large number of parameters, since such tests for interaction generally have low power
Q3
Using the full dataset (DIABRAZ2.DTA), use the variable milkgp to examine the effects of infant feeding practices on the risk of death from diarrhoea. You will need to make sure that you take appropriate account of age in your analysis, since age is likely to be a very strong confounder, and you may also need to consider the possible confounding effects of other risk factors. Draw up a table to show the effect of infant feeding practices.
Show the code
# crudemod1 <-clogit(case ~ milkgp +strata(set), data = diabraz2) # adjusted for sexmod2 <-clogit(case ~ milkgp + sex +strata(set), data = diabraz2) # adjusted for agemod3 <-clogit(case ~ milkgp + agegp +strata(set), data = diabraz2) # adjusted for age by monthmod4 <-clogit(case ~ milkgp +as.factor(age) +strata(set), data = diabraz2) # adjusted for age and mothers educationmod5 <-clogit(case ~ milkgp +as.factor(age) + meduc +strata(set), data = diabraz2) tbl1 <-tbl_regression(mod1, exponentiate = T)tbl2 <-tbl_regression(mod2, exponentiate = T)tbl3 <-tbl_regression(mod3, exponentiate = T)tbl4 <-tbl_regression(mod4, exponentiate = T)tbl5 <-tbl_regression(mod5, exponentiate = T)tbl_merge(list(tbl1, tbl2, tbl3, tbl4, tbl5),tab_spanner =c("**Crude**", "**Adjusted Sex**","**Adjusted agegroup**","**Adjusted age-month**","**Adjusted age-month/mother education**")) |>modify_column_hide(columns =c(p.value_1, p.value_2, p.value_3, p.value_4, p.value_5))# remove p values (tidier)
Characteristic
Crude
Adjusted Sex
Adjusted agegroup
Adjusted age-month
Adjusted age-month/mother education
OR
95% CI
OR
95% CI
OR
95% CI
OR
95% CI
OR
95% CI
Milk, 3 groups
2.51
1.90, 3.30
2.52
1.91, 3.32
3.74
2.65, 5.28
3.64
2.51, 5.28
4.03
2.73, 5.95
Sex
0.78
0.52, 1.16
Age, 5 groups
0.66
0.54, 0.79
as.factor(age)
0
—
—
—
—
1
1.00
0.23, 4.36
0.83
0.19, 3.70
2
0.84
0.20, 3.50
0.75
0.18, 3.19
3
3.08
0.74, 12.8
2.56
0.60, 10.8
4
1.18
0.28, 5.06
0.99
0.23, 4.32
5
0.79
0.17, 3.70
0.61
0.12, 2.99
6
1.13
0.24, 5.34
1.02
0.21, 4.88
7
0.73
0.16, 3.39
0.57
0.12, 2.78
8
0.45
0.10, 2.14
0.38
0.08, 1.86
9
0.06
0.01, 0.47
0.05
0.01, 0.42
10
0.29
0.05, 1.73
0.23
0.04, 1.39
11
0.22
0.04, 1.35
0.18
0.03, 1.10
Maternal education
0.68
0.52, 0.89
Abbreviations: CI = Confidence Interval, OR = Odds Ratio