3: Matched case-control studies

Author

Thiago Cerqueira Silva

Show the code
# Data wrangling
# read "dta" files
library(haven) 
# create table
library(gtsummary)
# data wrangling and other functions
library(tidyverse)
# if you want to have a output similar to stata
library(tidylog)
library(epiDisplay) #some basic tests
# Limit significant digits to 2, remove scientific notation
library(survival)
options(digits = 2, scipen = 9)
Show the code
diabraz <- read_dta("datasets/DIABRAZ.DTA")
diabraz2 <- read_dta("datasets/DIABRAZ2.DTA")
Show the code
# * BRAZILIAN CASE-CONTROL STUDY OF RISK FACTORS FOR INFANT DEATH FROM DIARRHOEA
# * case       1=case, 0=control
# * milkgp     1=breast only, 2=breast+other, 3=other only
# * bf         1=breastfed, 2=not breastfed
# * water      Piped water supply: 1=in house, 2=in plot, 3=none
# * wat2       1=in house/plot 2=none
# * agegp      Age group (months): 1=0-1, 2=2-3, 3=4-5, 4=6-8, 5=9-11
# * agegp2     1=0-2, 2=3-5, 3=6-11
# * milkgp     1=breast only, 2=breast+other, 3=other only
# Check the class of each variable
glimpse(diabraz)
glimpse(diabraz2)
# case need to be numeric
diabraz2$case <- as.numeric(diabraz2$case)

Q1

Use the DIABRAZ.DTA dataset (1 control per case) and the match and mhodds commands to analyse the association between breast feeding (bf) and diarrhoea mortality. Use the mhodds command to estimate the odds ratio, calculate a confidence interval for the OR, and test the null hypothesis of no association. Repeat this analysis using the clogit command, and compare your results. [For comparison purposes you may like to repeat these analyses using ordinary (unconditional) logistic regression, which should not give the correct results].

Mantel-Haenszel

Show the code
diabraz |>
  pubh::mhor(case ~ pair / bf)

              OR Lower.CI Upper.CI Pr(>|z|)
(Intercept) 1.11     0.60     2.05    0.740
pair        0.97     0.95     0.99    0.015
pair:bf     1.02     1.01     1.03    0.005

                          Common OR Lower CI Upper CI Pr(>|z|)
Cochran-Mantel-Haenszel:        4.8        2       12  < 0.001

Test for effect modification (interaction): p =  0.65 
 
Show the code
# Other option
#epiDisplay::mhor(diabraz$case, diabraz$bf, diabraz$pair, graph = F)

Conditional Logistic regression

Show the code
clogit_mod <- clogit(case ~ bf + strata(pair), diabraz)
tbl_regression(clogit_mod, exponentiate = T)
Characteristic OR 95% CI p-value
Milk, 2 groups 4.83 2.01, 11.6 <0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Unconditional Logistic regression

Show the code
unclogit_mod <- glm(case ~ bf, diabraz, family = binomial)
tbl_regression(unclogit_mod, exponentiate = T)
Characteristic OR 95% CI p-value
Milk, 2 groups 3.00 1.62, 5.63 <0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Q2

Were children with a piped water supply to the house at lower risk than those with a supply to the plot?

Show the code
mod1<- clogit(case ~ water + agegp2 + strata(set), data = diabraz2)
tbl_regression(mod1, exponentiate = T)
Characteristic OR 95% CI p-value
Water supply, 3 groups 2.17 1.53, 3.07 <0.001
Age, 3 groups 1.23 0.99, 1.53 0.062
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
Show the code
mod2 <- clogit(case ~ water * agegp2 + strata(set), data = diabraz2)
tbl_regression(mod2, exponentiate = T)
Characteristic OR 95% CI p-value
Water supply, 3 groups 1.71 0.89, 3.32 0.11
Age, 3 groups 0.99 0.56, 1.75 >0.9
Water supply, 3 groups * Age, 3 groups 1.12 0.85, 1.47 0.4
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
Show the code
lmtest::lrtest(mod1, mod2)
#Df LogLik Df Chisq Pr(>Chisq)
2 -175 NA NA NA
3 -174 1 0.66 0.42

Fit the main effects, water and agegp2, and then compare with the model which includes the interaction term, water*agegp2. This reveals no evidence of interaction ( \(\chi_{10}^2\)= 1.71, P=0.79). agegp2 has been used above, giving 4 interaction terms. With age grouped in 5 groups (agegp) there would be 8 interaction terms. It is best to avoid fitting interactions involving a large number of parameters, since such tests for interaction generally have low power

Q3

Using the full dataset (DIABRAZ2.DTA), use the variable milkgp to examine the effects of infant feeding practices on the risk of death from diarrhoea. You will need to make sure that you take appropriate account of age in your analysis, since age is likely to be a very strong confounder, and you may also need to consider the possible confounding effects of other risk factors. Draw up a table to show the effect of infant feeding practices.

Show the code
# crude
mod1 <- clogit(case ~ milkgp + strata(set), data = diabraz2) 

# adjusted for sex
mod2 <- clogit(case ~ milkgp + sex + strata(set), data = diabraz2) 

# adjusted for age
mod3 <- clogit(case ~ milkgp + agegp + strata(set), data = diabraz2) 

# adjusted for age by month
mod4 <- clogit(case ~ milkgp + as.factor(age) + strata(set), data = diabraz2) 

# adjusted for age and mothers education
mod5 <- clogit(case ~ milkgp + as.factor(age) + meduc + strata(set), data = diabraz2) 

tbl1 <- tbl_regression(mod1, exponentiate = T)
tbl2 <- tbl_regression(mod2, exponentiate = T)
tbl3 <- tbl_regression(mod3, exponentiate = T)
tbl4 <- tbl_regression(mod4, exponentiate = T)
tbl5 <- tbl_regression(mod5, exponentiate = T)
tbl_merge(list(tbl1,
               tbl2,
               tbl3,
               tbl4,
               tbl5),
          tab_spanner = c("**Crude**", "**Adjusted Sex**",
                          "**Adjusted agegroup**","**Adjusted age-month**",
                          "**Adjusted age-month/mother education**")) |> 
    modify_column_hide(columns = c(p.value_1, p.value_2,
                                   p.value_3, p.value_4,
                                   p.value_5))# remove p values (tidier)
Characteristic
Crude
Adjusted Sex
Adjusted agegroup
Adjusted age-month
Adjusted age-month/mother education
OR 95% CI OR 95% CI OR 95% CI OR 95% CI OR 95% CI
Milk, 3 groups 2.51 1.90, 3.30 2.52 1.91, 3.32 3.74 2.65, 5.28 3.64 2.51, 5.28 4.03 2.73, 5.95
Sex

0.78 0.52, 1.16





Age, 5 groups



0.66 0.54, 0.79



as.factor(age)









    0





    1





1.00 0.23, 4.36 0.83 0.19, 3.70
    2





0.84 0.20, 3.50 0.75 0.18, 3.19
    3





3.08 0.74, 12.8 2.56 0.60, 10.8
    4





1.18 0.28, 5.06 0.99 0.23, 4.32
    5





0.79 0.17, 3.70 0.61 0.12, 2.99
    6





1.13 0.24, 5.34 1.02 0.21, 4.88
    7





0.73 0.16, 3.39 0.57 0.12, 2.78
    8





0.45 0.10, 2.14 0.38 0.08, 1.86
    9





0.06 0.01, 0.47 0.05 0.01, 0.42
    10





0.29 0.05, 1.73 0.23 0.04, 1.39
    11





0.22 0.04, 1.35 0.18 0.03, 1.10
Maternal education







0.68 0.52, 0.89
Abbreviations: CI = Confidence Interval, OR = Odds Ratio