10: Analysis of correlated outcome data

Author

Thiago Cerqueira Silva

Show the code

library(haven) 
# create table
library(gtsummary)
# data wrangling and other functions
library(tidyverse)
# if you want to have a output similar to stata
library(tidylog)
#cox
library(survival)
# plots survival
library(ggsurvfit)
# Limit significant digits to 2, remove scientific notation
options(digits = 2, scipen = 9)

papua <- read_dta("datasets/pngnew.dta") |> 
  mutate(across(where(is.labelled),as_factor))

In the first part of the practical we examine data from a pneumococcal vaccine trial performed in Papua New Guinea (pngnew.dta). The data are arranged by episode, so that each child in the trial may have several records, one for each episode of clinical pneumonia (without laboratory confirmation of pneumococcal infection) and, usually, one for the period from their last episode until they exited the trial. Thus, for each child we have data on the number of episodes they experienced (the numerator) and the length of time for which they were followed up (the denominator). We can thus calculate rates and rate ratios and an appropriate statistical model is a Poisson regression model with a random effect to account for between child variability in susceptibility/exposure.

Note

These data are from a pneumococcal vaccine trial performed in Papua New Guinea, assessing the vaccine efficacy in preventing clinical episodes of pneumonia among children.

Each child might have more than one record, because each record represents an episode of pneumonia (or the last period of follow-up, without pneumonia).

*Outcome* : any indicates whether the current period of observation ended in an episode or not

*Exposure* : vacc (vaccination: 1 = placebo, 2 = vaccine)

*Cluster* : id (child)

*Time* : -timein (date of entry in this follow-up period) -timeout (date of exit from this follow-up period) -dob (date of birth)

*Other* - sex (1 = male, 2 = female) - anyprev (0: no previous episodes of pneumonia, 1: any prev. episodes)

Q1

Each child had a unique identification number (id) which identifies which records belong to which child. Examine the data for child id 2921:

Show the code

papua <- papua |> 
  mutate(sex = factor(sex, levels = c(1, 2), labels = c("male", "female")),
                 vacc = factor(vacc, levels = c(1, 2), labels = c("placebo", "vaccine")),
                 pyrs = as.numeric(timeout - timein) / 365.25)


papua |> 
  filter(id==2921)

# A tibble: 5 × 10
     id sex   vacc      any anyprev timein     timeout    dob        datevac   
  <dbl> <fct> <fct>   <dbl>   <dbl> <date>     <date>     <date>     <date>    
1  2921 male  placebo     1       0 1981-10-26 1982-04-07 1976-12-08 1981-10-26
2  2921 male  placebo     1       1 1982-04-07 1982-10-18 1976-12-08 1981-10-26
3  2921 male  placebo     1       2 1982-10-18 1983-05-04 1976-12-08 1981-10-26
4  2921 male  placebo     1       3 1983-05-04 1983-05-25 1976-12-08 1981-10-26
5  2921 male  placebo     0       4 1983-05-25 1983-11-30 1976-12-08 1981-10-26
# ℹ 1 more variable: pyrs <dbl>

Q2

Individual children can have multiple records in the database, depending on how many episodes of clinical pneumonia they experienced. Use the following commands to examine the number of children in each vaccine group and the number of episodes per child

Show the code

papua |> group_by(id) |> 
  summarise(episodes = sum(any),
            vacc = max(as.numeric(vacc))) |> 
  mutate(episodes = as.character(episodes)) |> 
  tbl_summary(include = episodes, by = vacc)

Characteristic	1 N = 719¹	2 N = 671¹
episodes
0	225 (31%)	242 (36%)
1	183 (25%)	155 (23%)
10	0 (0%)	1 (0.1%)
11	2 (0.3%)	0 (0%)
2	131 (18%)	114 (17%)
3	73 (10%)	72 (11%)
4	53 (7.4%)	46 (6.9%)
5	24 (3.3%)	23 (3.4%)
6	14 (1.9%)	6 (0.9%)
7	10 (1.4%)	5 (0.7%)
8	3 (0.4%)	4 (0.6%)
9	1 (0.1%)	3 (0.4%)
¹ n (%)

Coding explanation

Compared to STATA, you dont need to worry about the dataset being modified unintentionally (R>Stata). The coding is grouping the individuals by id and doing operations in each small dataset (all rows from each id). sum(any) sums all values in the “any” variable, max(as.numeric(vacc)) is a workaround of recovering the group, as each participant is either vaccinated or unvaccinated. This approach wouldn’t work if the participant could be vaccinated after enter the study

Determine how many episodes there were in total in each vaccine group

Show the code

papua |> group_by(vacc) |> summarise(episodes = sum(any))

# A tibble: 2 × 2
  vacc    episodes
  <fct>      <dbl>
1 placebo     1205
2 vaccine     1038

Q3

Things you dont need to do in R.

Q4

Examine the incidence rates in the vaccinated and unvaccinated groups, and to calculate a rate ratio, ignoring any within-child clustering.

Show the code

pyears(Surv(pyrs,any) ~ vacc, data = papua, scale = 1) %>%
  summary(n = F, rate = T, ci.r = T, scale = 100)

Call: pyears(formula = Surv(pyrs, any) ~ vacc, data = papua, scale = 1)

number of observations = 3627

  vacc     Events   Time   Event rate   CI (rate)  
--------- -------- ------ ------------ ----------- 
 placebo    1205    1217       99       94 - 105  
 vaccine    1038    1160       90       84 -  95

Rate ratio (using Poisson and not MH - Stata solution)

Show the code

dt_pois <- (pyears(Surv(pyrs,any) ~ vacc, data = papua, scale = 1,
       data.frame = T))$data 
  glm(event~vacc + offset(log(pyears)), family=poisson,
      data=dt_pois) |> 
    tbl_regression(exponentiate = T)

Characteristic	IRR	95% CI	p-value
vacc
placebo	—	—
vaccine	0.90	0.83, 0.98	0.017
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

Show the code

    glm(event~vacc + offset(log(pyears)), family=poisson,
      data=dt_pois) |> 
    tbl_regression(exponentiate = F) |> 
    modify_column_unhide(column = std.error) #trick to show SE

Characteristic	log(IRR)	SE	95% CI	p-value
vacc
placebo	—	—	—
vaccine	-0.10	0.042	-0.18, -0.02	0.017
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio, SE = Standard Error

Q5

Refit the Poisson regression model, this time asking Stata to compute robust standard errors adjusted for clustering. What impact does taking account of the clustering have on your interpretation of the vaccine’s effect?

We have multiple options to calculate cluster robust SE in R.

Show the code

# Using the miceadds package (not my preferred option)
# pois_rob <- miceadds::glm.cluster(papua,
#                         any ~ vacc + offset(log(pyrs)),
#                         cluster = "id",
#                         family = "poisson")
# summary(pois_rob)

Show the code

# using sandwich and lmtest
library(sandwich)
library(lmtest)
pois_no_cluster <- glm(any ~ vacc + offset(log(pyrs)),
                family = "poisson",
                data = papua)
coeftest(pois_no_cluster, vcov = vcovCL, cluster = ~ id)


z test of coefficients:

            Estimate Std. Error z value Pr(>|z|)  
(Intercept) -0.00973    0.04172   -0.23    0.816  
vaccvaccine -0.10117    0.06087   -1.66    0.097 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show the code

# using parameters::model_parameters (requires sandwich package)
parameters::model_parameters(
  pois_no_cluster,
  vcov = "vcovCL",
  vcov_args = list(cluster=papua$id),
  exponentiate=T
)

Parameter      |  IRR |   SE |       95% CI |     z |     p
-----------------------------------------------------------
(Intercept)    | 0.99 | 0.04 | [0.91, 1.07] | -0.23 | 0.816
vacc [vaccine] | 0.90 | 0.06 | [0.80, 1.02] | -1.66 | 0.097

Answer

The SE increased from 0.042 to 0.061 (log IRR)

Q6

Now fit a random effects model to take account of within child clustering.

Show the code

mod_gamma <- coxph(Surv(pyrs, any) ~ vacc + frailty(id, 
    distribution = "gamma"), data = papua)
tbl_regression(mod_gamma,
               exponentiate = T)

Characteristic	HR	95% CI	p-value
vacc
placebo	—	—
vaccine	0.88	0.77, 1.00	0.046
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

Coding explanation

We are using a Cox regression with a fraitly term. In the stata exercise, it has been calculated using a parametric model

Q7

Show the code

library(lme4)
# Fit model
pois_re <- glmer(any ~ vacc + offset(log(pyrs)) + (1|id),
                 data = papua,
                 family = "poisson")

summary(pois_re)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: any ~ vacc + offset(log(pyrs)) + (1 | id)
   Data: papua

     AIC      BIC   logLik deviance df.resid 
    9758     9777    -4876     9752     3624 

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-1.554 -0.654  0.095  1.021 17.376 

Random effects:
 Groups Name        Variance Std.Dev.
 id     (Intercept) 0.673    0.821   
Number of obs: 3627, groups:  id, 1390

Fixed effects:
            Estimate Std. Error z value    Pr(>|z|)    
(Intercept)  -0.2704     0.0477   -5.67 0.000000014 ***
vaccvaccine  -0.1213     0.0648   -1.87       0.061 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
vaccvaccine -0.632

Show the code

# Output
broom.mixed::tidy(pois_re,
     conf.int = TRUE,
     exponentiate = TRUE,
     effects = "fixed")

# A tibble: 2 × 8
  effect term        estimate std.error statistic     p.value conf.low conf.high
  <chr>  <chr>          <dbl>     <dbl>     <dbl>       <dbl>    <dbl>     <dbl>
1 fixed  (Intercept)    0.763    0.0364     -5.67     1.43e-8    0.695     0.838
2 fixed  vaccvaccine    0.886    0.0574     -1.87     6.10e-2    0.780     1.01

Q8

Fit the model using age, instead time since study entry

Show the code

mod_gamma <- coxph(Surv(time = as.numeric(timein),
                        time2 = as.numeric(timeout),
                        origin = as.numeric(dob),
                        any) ~ vacc + frailty(id, 
    distribution = "gamma"), data = papua)
tbl_regression(mod_gamma,
               exponentiate = T)

Characteristic	HR	95% CI	p-value
vacc
placebo	—	—
vaccine	0.91	0.83, 1.01	0.078
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

Q9

Refit the random effects model also adjusting for age and sex.

Show the code

papua <- papua |> 
  mutate(age_years = as.numeric((timein - dob)/365.25))
pois_re <- glmer(any ~ vacc + offset(log(pyrs)) + 
                   age_years+
                   sex+
                   (1|id),
                 data = papua,
                 family = "poisson")

papua<- papua |> 
  mutate(agegrp = cut(age_years,
                              breaks = c(0, 1, 2, 3, 4,+Inf),
                              labels = c("0-1yr,", "1-2yr", "2-3yr", "3-4yr", ">=4yr")))
# Output
tbl_cont <- pois_re |> 
  tbl_regression(exponentiate=T)

pois_re_grp <- glmer(any ~ vacc + offset(log(pyrs)) + 
                   agegrp+
                     sex+
                   (1|id),
                 data = papua,
                 family = "poisson")

tbl_cat <- pois_re_grp |> 
  tbl_regression(exponentiate=T)

tbl_merge(
  list(tbl_cont, tbl_cat),
  tab_spanner = c("Continuous age","Categorical age")
)

Characteristic	Continuous age			Categorical age
Characteristic	IRR	95% CI	p-value	IRR	95% CI	p-value
vacc
placebo	—	—		—	—
vaccine	0.90	0.81, 1.01	0.073	0.91	0.82, 1.02	0.10
age_years	0.63	0.61, 0.66	<0.001
sex
male	—	—		—	—
female	1.02	0.91, 1.14	0.7	1.02	0.92, 1.14	0.7
agegrp
0-1yr,				—	—
1-2yr				0.67	0.59, 0.76	<0.001
2-3yr				0.41	0.35, 0.47	<0.001
3-4yr				0.29	0.25, 0.34	<0.001
>=4yr				0.16	0.14, 0.19	<0.001
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

Answer

Controlling for age and sex results in an adjusted rate ratio of 0.91, compared with an unadjusted rate ratio of 0.88. The evidence for a vaccine effect has weakened a little (P-value is now 0.10). Given that this was a fairly large RCT we would not expect strong confounding by age or sex (this can be checked by cross-tabulating age and sex at enrolment against vaccine group).

Q10

Open the data set (hhtb.dta) and explore the dataset (glimpse) or click on it in the environment tab.

Outcome : mantoux (tuberculin test result: 0 = negative, 1 = positive)
Exposure: cough (duration of cough in index case: 1 = <2 months, 2 = >=2 months)
Cluster: id (household, so = index case)
hiv - hiv (HIV status of index case: 1 = negative, 2 = positive)
age group: agegrp (age of contact, in years)

Show the code

hhtb <- read_dta("datasets/HHTB.DTA") |> 
  mutate(across(where(is.labelled),as_factor))

#glimpse(hhtb)

Q11

Each record represents an individual household contact of a TB case. The variable id indicates which household contacts belong to the same household (and hence are contacts of the same index case). Use the following commands to examine the distribution of contacts per household by HIV status of the index case.

Show the code

hhtb |> 
  group_by(id, hiv) |> 
  count(mantoux) |> 
  filter(mantoux==1) |> 
  mutate(hiv= as.factor(hiv)) |> 
  ggplot()+
  geom_histogram(aes(n, fill=hiv),
                 position = "dodge")+
  scale_x_continuous(breaks = seq(1:10))

Show the code

# same answer from stata
# hhtb |> 
#   group_by(id) |> 
#   count(hiv) |> 
#   filter(!is.na(hiv)) |>
#   ungroup() |> 
#   mutate(n = factor(n, sort(unique(n)), ordered = T)) |> 
#   tbl_summary(include = n, by = hiv)

Answer

The answer in STATA show you the table. I would rather see the distribution with a histogram. Here is showing the distribution of number of cases (mantoux=1) per household and HIV status.

Q12

Perform a similar analysis to examine the distribution of contacts per household by duration of cough of the index case. Why are only 58 households shown?

Show the code

hhtb |> 
  group_by(id) |> 
  count(cough) |> 
  filter(!is.na(cough)) |>
  ungroup() |> 
  mutate(n = factor(n, sort(unique(n)), ordered = T)) |> 
  tbl_summary(include = n, by = cough)

Characteristic	1 N = 32¹	2 N = 26¹
n
1	5 (16%)	2 (7.7%)
2	3 (9.4%)	5 (19%)
3	5 (16%)	7 (27%)
4	5 (16%)	4 (15%)
5	4 (13%)	2 (7.7%)
6	2 (6.3%)	4 (15%)
8	3 (9.4%)	1 (3.8%)
9	2 (6.3%)	1 (3.8%)
10	1 (3.1%)	0 (0%)
11	1 (3.1%)	0 (0%)
13	1 (3.1%)	0 (0%)
¹ n (%)

Q13

For household contacts, examine the distribution of tuberculin positivity by the duration of cough in the index case. Ignoring any clustering, what is the estimated odds ratio and associated 95% c.i.? What would you conclude about the association between duration of cough in the index case and tuberculin positivity in household contacts on the basis of this analysis (if you were unaware of the clustering issue)?

Show the code

table(hhtb$mantoux,hhtb$cough) |> 
  fisher.test()


    Fisher's Exact Test for Count Data

data:  table(hhtb$mantoux, hhtb$cough)
p-value = 0.04
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1.0 3.1
sample estimates:
odds ratio 
       1.8

Show the code

tab1 <- table(hhtb$mantoux,
              hhtb$cough)
epiR::epi.2by2(dat = tab1, method = "cohort.count", conf.level = 0.95, units = 100, 
   interpret = FALSE, outcome = "as.columns")

             Outcome +    Outcome -      Total                 Inc risk *
Exposed +           72           33        105     68.57 (58.78 to 77.28)
Exposed -           82           67        149     55.03 (46.68 to 63.18)
Total              154          100        254     60.63 (54.33 to 66.68)

Point estimates and 95% CIs:
-------------------------------------------------------------------
Inc risk ratio                                 1.25 (1.03, 1.51)
Inc odds ratio                                 1.78 (1.06, 3.01)
Attrib risk in the exposed *                   13.54 (1.59, 25.48)
Attrib fraction in the exposed (%)            19.74 (2.51, 33.93)
Attrib risk in the population *                5.60 (-4.40, 15.59)
Attrib fraction in the population (%)         9.23 (0.49, 17.20)
-------------------------------------------------------------------
Uncorrected chi2 test that OR = 1: chi2(1) = 4.729 Pr>chi2 = 0.030
Fisher exact test that OR = 1: Pr>chi2 = 0.037
 Wald confidence limits
 CI: confidence interval
 * Outcomes per 100 population units

Show the code

mod <- glm(mantoux~ cough, family=binomial, data=hhtb)
parameters::model_parameters(mod, exponentiate=T)

Parameter   | Odds Ratio |   SE |       95% CI |     z |     p
--------------------------------------------------------------
(Intercept) |       0.64 | 0.25 | [0.30, 1.36] | -1.16 | 0.247
cough       |       1.78 | 0.48 | [1.06, 3.03] |  2.16 | 0.030

Coding explanation

We can get odds ratios through multiple ways. None of them matched with STATA solution (Why use MH when there is no stratification?!)

Q14

Using the logit command, obtain a 95% confidence interval for the odds ratio based on the robust standard error taking account of clustering. Are your conclusions about the association between duration of cough in the index case and tuberculin positivity in household contacts different from question 13 above?

Show the code

parameters::model_parameters(
  mod,
  vcov = "vcovCL",
  vcov_args = list(cluster=hhtb$id),
  exponentiate=T
)

Parameter   | Odds Ratio |   SE |       95% CI |     z |     p
--------------------------------------------------------------
(Intercept) |       0.64 | 0.32 | [0.24, 1.73] | -0.88 | 0.377
cough       |       1.78 | 0.62 | [0.90, 3.52] |  1.67 | 0.096

Answer

The confidence interval has widened to (0.90, 3.52) and we should be (even) more sceptical than before about the strength of evidence for an association (P having increased from 0.03 to 0.10). After taking clustering into account, there is only weak evidence against the null hypothesis.

Q15

Now estimate the odds ratio and obtain confidence intervals based on the GEE approach with robust standard errors and assuming an exchangeable correlation matrix. How do your results change from those in question 14?

Show the code

library(geepack)
# observations from the same id should be contiguous
dep_gee <- geeglm(mantoux ~ cough,
               data = drop_na(hhtb, cough),  #geeglm doesnt accept missing values
               id = id, 
               family = "binomial",
               corstr = "exchangeable")
parameters::model_parameters(dep_gee, exponentiate=T)

Parameter   | Odds Ratio |   SE |       95% CI | Chi2(1) |     p
----------------------------------------------------------------
(Intercept) |       0.58 | 0.30 | [0.21, 1.60] |    1.10 | 0.294
cough       |       1.88 | 0.65 | [0.95, 3.71] |    3.27 | 0.071

Answer

The point estimate of the odds ratio for duration of cough is 1.87 with 95% c.i. (0.97, 3.60), P=0.07. The odds ratio is slightly larger than that obtained using only robust standard errors (1.78). The standard error of the log odds ratio is very similar to that obtained from the previous “robust” analysis (0.35 ~ 0.35).

The PI and CI in Stata is different: 1.88 with 95% c.i. (0.94, 3.73) (I dont know why)

Q16

Use the xtlogit command again, but this time with the re option, to fit a random effects model for tuberculin positivity in terms of the duration of cough in the index case. Is there evidence from this model of between-household variation (within-household correlation)? Use the quadchk,nooutput command to check the reliability of the estimates from the random effects model.

Show the code

tb_mixed <-  glmer(mantoux ~ cough + (1|id),
                 data = hhtb,
                 family = "binomial")

tbl_laplace <- tb_mixed |> 
  tbl_regression(exponentiate=T)


# You can get using adptative Gaussian quadrature using this
tb_mixed <-  glmer(mantoux ~ cough + (1|id),
                 data = hhtb,
                 family = "binomial",
                 nAGQ = 12)
tbl_quadrad <- tb_mixed |> 
  tbl_regression(exponentiate=T)

tbl_merge(
  list(tbl_laplace,
       tbl_quadrad),
  tab_spanner = c("Laplace","Adaptative")
)

Characteristic	Laplace			Adaptative
Characteristic	OR	95% CI	p-value	OR	95% CI	p-value
Duration of cough	2.15	0.96, 4.80	0.061	2.19	0.94, 5.08	0.069
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Show the code

#m2 <- GLMMadaptive::mixed_model(fixed = mantoux ~ cough, random = ~ 1 | id, data = hhtb,
 #                  family = binomial())

Answer

The estimated odds ratio is 2.19 (95% c.i. 0.94, 5.08; P=0.07). The estimate of the odds ratio is larger than that obtained from the GEE analysis (1.88).

Ps.:The default of glmer function uses a Laplace approximation and not the adaptative Gaussian quadrature

Coding explanation

I dont think there is a similar function as “quadchk” from Stata in R. You need to refit the model with different quadrature points (nAGQ)

Q17

Use a random effects model to investigate the association between duration of cough in the index case and odds of tuberculin positivity, controlling for the effect of index cases’ HIV status and the age of the household contact, taking account of within-household clustering. Write down some sentences to summarise and interpret your results.

Show the code

hhtb$agegrp <- factor(hhtb$agegrp)
tb_mixed <-  glmer(mantoux ~ cough + hiv +agegrp+ (1|id),
                 data = hhtb,
                 family = "binomial",
                 nAGQ = 12)
tb_mixed |> 
  tbl_regression(exponentiate=T)

Characteristic	OR	95% CI	p-value
Duration of cough	1.85	0.79, 4.31	0.2
HIV status of index case	0.29	0.12, 0.70	0.006
agegrp
1	—	—
2	1.45	0.69, 3.02	0.3
3	4.57	2.11, 9.89	<0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Answer

In this model we have controlled for both a cluster (household) level (HIV status of the index case) and an individual level covariate (age of the contact). “After controlling for the HIV status of the index case and the age of the contact, the OR for the association between duration of cough of 2+ months and mantoux positivity is 1.85. Compared to the OR from the unadjusted random effects model (2.19) the OR is somewhat reduced suggesting some counfounding by one or both of these variables.”

Q18

Use GEE to investigate the association between duration of cough in the index case and odds of tuberculin positivity, controlling for the effect of index cases’ HIV status and the age of the household contact, taking account of within-household clustering. What do you notice about the ORs compared to those obtained with the random effects model.

Show the code

logit_gee <- geepack::geeglm(mantoux ~ cough + hiv + agegrp,
                          data = drop_na(hhtb, cough),
                          id = id,
                          family = "binomial",
                          corstr = "exchangeable")
logit_gee_noage <- geepack::geeglm(mantoux ~ cough + hiv,
                          data = drop_na(hhtb, cough),
                          id = id,
                          family = "binomial",
                          corstr = "exchangeable")

logit_gee |> 
  tbl_regression(exponentiate=T)

Characteristic	OR	95% CI	p-value
Duration of cough	1.64	0.82, 3.30	0.2
HIV status of index case	0.36	0.18, 0.74	0.005
agegrp
1	—	—
2	1.37	0.68, 2.76	0.4
3	3.58	2.26, 5.67	<0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Answer

All of the ORs are closer to 1 than those when using random effects, as expected with a population average model.

Show the code

anova(logit_gee, logit_gee_noage)

Analysis of 'Wald statistic' Table

Model 1 mantoux ~ cough + hiv + agegrp 
Model 2 mantoux ~ cough + hiv
  Df X2  P(>|Chi|)    
1  2 31 0.00000019 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer

The null hypothesis is that after taking account of the index case’s HIV status and cough duration there is no association between the age of the household contact and their Mantoux test status. The alternative hypothesis is that after taking account of the index case’s HIV status and cough duration there is an association between the age of the household contact and their Mantoux test status. There is strong evidence against the null. Note that the test has 2 df because we are testing simultaneously that 2 ORs are both equal to 1.