16: Additive & multiplicative models

Author

Thiago Cerqueira Silva

Q1

Paperwork question

Some notes: Additive scale in this practical can be seem as “risk difference”, while multiplicative as “risk ratio”. Don’t get confused with additive from “generalized additive models”

Q2

This question uses data from a retrospective cohort study conducted in southern Africa investigating the risk factors for tuberculosis (TB) among male gold miners (contained in goldmine.dta). We want to examine the joint effect of HIV status (hiv coded 0 for negative and 1 for positive) and silicosis (silic coded 0 for none, 1 for possible, 2 for early, and 3 for advanced) on the incidence of TB (tb coded 0 for negative and 1 for positive). Follow-up is assessed using entry and exit.

Show the code
library(haven) 
# create table
library(gtsummary)
# data wrangling and other functions
library(tidyverse)
# if you want to have a output similar to stata
library(tidylog)

# Limit significant digits to 3, remove scientific notation (most of the time, scipen=999 removes all time)
options(digits = 4, scipen = 9)
Show the code
gold <- read_dta("datasets/GOLDMINE.DTA")|> 
  mutate(across(where(is.labelled),as_factor))

**Outcome*: - tb (tubercolosis diagnosis, 0/1)

**Exposures*: - hiv (HIV status at entry: 0/1) - silic (silicosis grad: 0 = “none”, 1 = “possible”, 2 = “early”, 3 = “advanced”)

**Time*: -entry (date of entry, formatted as number of days) -exit (date of exit, formatted as number of days) -dob (date of birth, formatted as number of days)

Create follow-up time (years) and drop missing data in silicosis

Show the code
gold <- gold |> 
  mutate(silic = as.factor(silic)) |> 
  mutate(fup_y = (exit-entry)/365.25)

gold_2 <- gold |> 
  drop_na(silic)
  1. Use the strate and streg (or glm) commands to examine the crude effect of HIV status on TB.
Show the code
mod1 <- glm(tb ~ hiv + offset(log(fup_y)) , data=gold_2, family=poisson)
tbl1 <- tbl_regression(mod1, exponentiate=T)
mod2 <- glm(tb ~ silic+ offset(log(fup_y)) , data=gold_2, family=poisson)
tbl2 <- tbl_regression(mod2, exponentiate=T)
tbl_merge(list(tbl1,tbl2),
          tab_spanner = c("**HIV**","**Silicosis**"))
Characteristic
HIV
Silicosis
IRR 95% CI p-value IRR 95% CI p-value
HIV status at entry 4.59 3.48, 6.10 <0.001


silic





    0



    1


1.83 1.29, 2.54 <0.001
    2


3.53 2.13, 5.53 <0.001
    3


3.58 2.31, 5.35 <0.001
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio
  1. Are there any biological reasons why you might expect the effects of HIV status and silicosis on TB incidence to combine multiplicatively or additively?

One might expect the effects of HIV and silicosis on TB to combine additively because they have two separate pathways leading to active tuberculosis: HIV tends to cause TB through immunosuppression, whereas silicosis causes TB through lung impairment caused by prolonged silica dust exposure.

  1. Examine the joint effects of HIV status and silicosis on TB in a multiplicative Poisson model. Is there evidence of an interaction on the multiplicative scale?
Show the code
# gold_2_agg <- gold_2 |> 
#   group_by(hiv,silic) |> 
#   summarise(n_tb = sum(tb),
#       pyears = sum(fup_y))
# 
# mod3 <- glm(n_tb ~hiv+ silic +offset(log(pyears)), data=gold_2_agg, family=poisson)
# mod31 <- glm(n_tb ~hiv* silic +offset(log(pyears)), data=gold_2_agg, family=poisson)

mod3 <- glm(tb ~hiv+ silic+ offset(log(fup_y)) , data=gold_2, family=poisson)
mod31 <- glm(tb ~hiv*silic+ offset(log(fup_y)) , data=gold_2, family=poisson)

tbl1 <- tbl_regression(mod3, exponentiate=T)
tbl2 <- tbl_regression(mod31, exponentiate=T)
tbl_merge(list(tbl1,tbl2),
          tab_spanner = c("No interaction","Interaction"))
Characteristic
No interaction
Interaction
IRR 95% CI p-value IRR 95% CI p-value
HIV status at entry 4.82 3.65, 6.41 <0.001 4.85 3.33, 7.16 <0.001
silic





    0

    1 1.98 1.40, 2.76 <0.001 1.84 1.02, 3.19 0.035
    2 3.57 2.16, 5.60 <0.001 2.58 0.89, 5.94 0.046
    3 4.12 2.65, 6.15 <0.001 5.34 2.86, 9.46 <0.001
HIV status at entry * silic





    HIV status at entry * 1


1.12 0.56, 2.31 0.7
    HIV status at entry * 2


1.59 0.57, 5.18 0.4
    HIV status at entry * 3


0.60 0.25, 1.40 0.2
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio
Show the code
anova(mod3,mod31)
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
3943 1423 NA NA NA
3940 1420 3 2.786 0.4259

The LRT for interaction gives a p-value of 0.43, hence there is no evidence of an interaction on the multiplicative scale. The adjusted RRs for HIV status and silicosis are given in the model above; both effects are strong after adjusting for one another. We have no evidence to suggest that these effects do not combine multiplicatively, so there is an estimated 20-fold increase in the rate of developing TB among those HIV-positive who have advanced silicosis compared to those HIV-negative without silicosis (RR=4.82x4.12=19.86).

The code in Stata first aggregate the dataset, it is not strictly necessary.

Show the code
cat("Expected/Observed number of TB cases")
Expected/Observed number of TB cases
Show the code
gold_2 |>
  mutate(
    predicted_cases = predict(mod3, type = "response")
  ) |>
  group_by(hiv, silic) |>
  summarise(
    observed = sum(tb),
    predicted = sum(predicted_cases)
  )
hiv silic observed predicted
0 0 40 40.127
0 1 17 18.329
0 2 5 6.956
0 3 15 11.589
1 0 78 77.873
1 1 30 28.671
1 2 15 13.044
1 3 12 15.411

For all covariates, the observed number of TB events is similar to the expected number under the multiplicative model.

  1. Examine the joint effects of HIV infection and silicosis on TB in an additive rate model. Is there evidence of an interaction on the additive scale?

The answer to part (iv) suggests that the multiplicative model is appropriate and we could stop here. There is no evidence of an interaction on the multiplicative scale and for two out of the three categories of silicosis, the interaction was positive rather than negative. Hence, there is no statistical evidence to suggest additive effects. Let’s try fitting the additive model anyway, on the grounds that it is biologically plausible.

Show the code
# Create rate variable
gold_2 <- gold_2 |> 
  mutate(rate = tb/fup_y)

# use the poisson distribution with identity link
mod4 <- glm(rate ~hiv+ silic , data=gold_2, family=poisson(link="identity"),
            weights = fup_y)
mod41 <- glm(rate ~hiv*silic , data=gold_2, family=poisson(link="identity"),
            weights = fup_y)

anova(mod4,mod41)
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
3943 1437 NA NA NA
3940 1420 3 16.82 0.0008

The LRT is given by 16.82 [2*(-955.979- -964.390)] which is a chi-square with 3 df, giving p=0.0008 and so there is strong evidence of interaction on the additive scale. Note that all three interaction terms are positive and the “main effects” are also positive (so all RRs>1), suggesting that the effects are more than additive (likely multiplicative).

Show the code
# tbl_regression(mod3, exponentiate=T)
# tbl_regression(mod3, exponentiate=T)

gold_2 |> 
  mutate(
    predicted_cases = predict(mod4,type = "response")
  ) |> group_by(hiv, silic) |> 
  summarise(observed = sum(tb),
            predicted= sum(predicted_cases*fup_y))
hiv silic observed predicted
0 0 40 37.645
0 1 17 19.847
0 2 5 8.870
0 3 15 17.008
1 0 78 92.356
1 1 30 20.802
1 2 15 7.070
1 3 12 8.402

For some covariates, the observed number of TB events differs markedly from the expected number under the additive model. In summary, the strong evidence for an interaction (p=0.0008) and the discrepancies between the observed and expected values for the additive model suggest that the effects of HIV and silicosis do not combine additively.