Considerable progress in human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) prevention and treatment in recent years has allowed new HIV infections to fall by 39% and HIV-related deaths to fall by 51% between 2000 and 2019.1 Nevertheless, HIV remains a major global public health issue, with 38 million people living with HIV at the end of 2019.1 However, not all regions in the world are affected equally by HIV. East and southern Africa are home to about 6.2% of the world’s population – but account for over half (54%) of the total number of people living with HIV (20.6 million).2 Incidence also continues to be high, with 1.1 million people newly infected with HIV in 2018.3 In terms of the UNAIDS 90-90-90 goals, 81% of people living in the World Health Organization (WHO) African Region knew their HIV status, 64% of people living with HIV had access to antiretroviral therapy (ART), and 52% of people on ART had a suppressed viral load.3

Of interest within the African continent is its younger (aged 15-29) demographic. As of 2019, sub-Saharan Africa accounted for the third highest youth population (211 million) and is projected to have the largest increase (+89%) by 2050.4 This youth bulge has been attributed to improvements in nutrition and health services, which in turn have led to decreases in child mortality – thereby increasing childhood survival rates, creating consistently high fertility rates and allowing more children to survive into adulthood.5 Within this younger population, adolescent girls and younger women aged 15 to 29 years have been identified as being at a heightened risk of contracting HIV.6–9 Estimates from 2015 indicate that all younger people account for 34% of all new HIV infections, with adolescent girls and younger women comprising most of those new infections.10 Sub-Saharan Africa has an especially high burden with regards to this: 80% of all younger women with HIV infection reside there.11

The gender gap in HIV infection and the disproportionate risk to younger women have been evident for a long time – in 2001 epidemiologist Marie Laga and others authored a commentary titled “To stem HIV in Africa, prevent transmission to young women”.12 Women under 25 in Africa continue to be at a significantly higher risk of contracting HIV, being two to four times more likely than their male counterparts to be living with HIV.13 While HIV prevalence in the sub-Saharan African general population has decreased over the years, 1 in 5 new HIV infections occurred among adolescent girls and younger women, despite this group accounting for only 10% of the population.14

The reasons why younger women are disproportionately affected have been studied extensively, and authors conclude that the gap is created by traditional gender norms, differential use of condoms, number of sexual partners, transactional sex,15,16 intimate partner violence,17,18 and age-disparate relationships.19–21 These factors all seem to interrelate with one another, ultimately putting younger women at a heightened risk of contracting HIV.

There remains an absence in the literature of studies systematically examining how risky sexual behaviors contribute to and exacerbate the gender gap. This study uses survey data from three sub-Saharan African countries, Malawi, Tanzania and Zambia, to examine how risk behaviors contribute to the gender gap. We aim to contribute to the understanding of how risky sexual behaviors, namely having multiple partners and being in age-disparate relationships, contribute to the persisting gender gap in HIV prevalence across sub-Saharan Africa.


Our main research question relates to the role of risky sexual behavior in HIV transmission among men and women in the study countries. Namely, does risky sexual behavior affect the probability of contracting HIV? How do different patterns of sexual behavior among men and women explain the HIV gender gap: is it explained by different distributions or by different risks? Our logit model hypothesizes that the probability of HIV infection is higher with age, for women, and for individuals who live in urban environments, have higher levels of education, have multiple partners, have an older partner, or have lower wealth levels.

The primary data for our study is drawn from the 2015-2017 Population-based HIV Impact Assessment (PHIA) survey22 for Malawi, Tanzania and Zambia. The PHIA project consists of HIV-focused household surveys that are nationally representative. It uses a two-stage cluster sampling design of adults and adolescents aged 15 years and older (with some coverage of children aged 0-14 years) to assess the present state of the HIV epidemic in countries affected by it the most.

Our study focused specifically on younger adults to examine the relationship between HIV prevalence and sexual behavior, and the drivers of the HIV gender gap. Multiple partners, used to explore sexual behavior, was defined by whether the participant currently has more than one concurrent partner, namely two or three partners. As there was not a direct question that asked whether the participant had multiple partners, this was determined by identifying younger adults within our sample who had responded to the question regarding their second or third partner’s age. This variable was the closest estimator of a participant currently having multiple partners; other possible proxies either had lower response rates or were only asked of male participants.
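The proxy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the tuple layout are hypothetical, the PHIA variable names are not reproduced here, and the original analysis was carried out in Stata.

```python
from typing import Optional

def has_multiple_partners(second_partner_age: Optional[int],
                          third_partner_age: Optional[int]) -> int:
    """Return 1 if an age was reported for a second or third partner, else 0.

    A missing answer (None) is read as "no such partner", which mirrors the
    proxy described in the text rather than a direct survey question.
    """
    return int(second_partner_age is not None or third_partner_age is not None)

# Hypothetical respondents: (second partner's age, third partner's age)
respondents = [(None, None), (24, None), (31, 27)]
flags = [has_multiple_partners(a, b) for a, b in respondents]
print(flags)  # [0, 1, 1]
```

One consequence of this coding is that item non-response on the partner-age questions is indistinguishable from having a single partner, so the indicator may undercount concurrency.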

The second indicator of sexual behavior we explore is whether or not the participant had an older partner (either primary or non-primary). Age-disparate relationships have been repeatedly shown to be a risk factor for younger women across Africa. This age gap variable, older partner, identifies participants who have a partner at least five years their senior.
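A minimal sketch of the older-partner indicator follows, assuming each respondent's own age and the reported ages of their partners are available. The function name and inputs are hypothetical; only the five-year threshold comes from the text.

```python
from typing import Iterable, Optional

AGE_GAP_YEARS = 5  # threshold stated in the text

def has_older_partner(own_age: int, partner_ages: Iterable[Optional[int]]) -> int:
    """Return 1 if any reported partner is at least AGE_GAP_YEARS older, else 0."""
    return int(any(p is not None and p - own_age >= AGE_GAP_YEARS
                   for p in partner_ages))

print(has_older_partner(19, [23, None]))  # 0: the only reported gap is 4 years
print(has_older_partner(19, [23, 24]))    # 1: the second partner is 5 years older
```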

In the analysis of sexual behavior and risk we adjust for several factors: age, education level, wealth and urbanicity. Although education levels were categorized differently across the countries, all countries allowed education to be coded as a binary indicator of whether or not the participant had completed primary education. Household wealth status was measured by whether or not the household was in the lowest wealth quintile (the bottom 20%), which was calculated from household assets and reported in the survey data.

We use prevalence rather than incidence (recent infections) to measure HIV risk, as the survey data do not contain sufficient observations of recent infections (last 12 months) to reliably conduct the modeling work. We estimated a logit model for the probability of having HIV, using covariates including multiple partners, having an older partner, urbanicity, education level, age, and wealth:

$$\Pr(Y = 1 \mid X) = \frac{\exp(X\beta)}{1 + \exp(X\beta)} \qquad (1)$$

wherein Y is a binary variable for HIV status, and X is a vector for all the above covariates. We then used the linear decomposition approach first developed by Blinder23 and Oaxaca24 to quantify gender differences in prevalence. The Blinder-Oaxaca (BO) decomposition is a statistical method that explains the difference in the means of a dependent variable between two groups by decomposing the gap into the part that is due to differences in the mean values of the independent variables between the groups, and the part that is due to group differences in the effects of the independent variables. It has been extensively used in economics and sociology, e.g. for studying the gender wage gap. In line with our model, prevalence is modeled separately for men (A) and women (B):

$$Y_A = X_A\beta_A + \varepsilon_A \quad \text{and} \quad Y_B = X_B\beta_B + \varepsilon_B \qquad (2)$$

Since the expected value of the error term (εi) in a linear regression containing a constant term will be zero, the difference in the mean values of the dependent variable between the two groups can be evaluated as:

$$\Delta E(Y) = E(Y_A) - E(Y_B) = E(X_A)\beta_A - E(X_B)\beta_B \qquad (3)$$

This equation measures the gender gap as the difference between men’s and women’s expected prevalence rates, other things being equal. The decomposition method segments the gender gap in outcome means (i.e. ΔE(Y)) into three components: (1) the difference in the characteristics of each group, i.e. the values of the explanatory variables; (2) the gender difference in the riskiness of the risk factors themselves, i.e. the model coefficients estimated for each gender group (for age, sexual behaviors, etc.); and (3) the residual difference in prevalence between genders accounted for by the interaction between (1) and (2). Rewriting Eq. (3) using these three terms yields:

$$\underbrace{\Delta E(Y)}_{\text{gap}} = \underbrace{[E(X_A) - E(X_B)]\,\beta_B}_{\text{1st component}} + \underbrace{E(X_B)\,(\beta_A - \beta_B)}_{\text{2nd component}} + \underbrace{[E(X_A) - E(X_B)]\,(\beta_A - \beta_B)}_{\text{3rd component}} \qquad (4)$$

Equation (4) is known as the BO threefold decomposition written with respect to group B (women).25 In other words, group B’s mean outcome (level of the dependent variable for women) is viewed as the baseline, and we are imagining what it would take for the women’s mean prevalence to converge to that of Group A (comparable men).

The first term in the decomposition shows the part of the gap related to differences between men and women in the explanatory variables, or endowments, i.e. it denotes the mean change in women’s prevalence if they had the men’s values of the explanatory variables (age, wealth, urbanicity, education, sexual behaviors), while holding the coefficients of riskiness constant. The second term captures the portion of the gap stemming from the difference between the coefficients for the various risk factors when estimated separately for men and women (Eq. (2)), indicating the mean change in women’s prevalence if they had the riskiness experienced by men (i.e., the coefficients estimated for men), while holding endowments or risk factors constant. The third and final term denotes the residual portion of the total gap that exists due to the interaction of differences in endowments and coefficients between men and women, i.e. the portion of the gap that remains after controlling for the endowment and coefficient portions.
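The three terms can be illustrated with a short Python sketch of the threefold decomposition written from the viewpoint of group B (women). The covariate means and coefficients below are hypothetical numbers chosen only to show the arithmetic; they are not estimates from the PHIA data, and the study itself was run in Stata.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def threefold(mean_x_a, mean_x_b, beta_a, beta_b):
    """Threefold Blinder-Oaxaca decomposition with group B as the baseline.

    Returns (gap, endowments, coefficients, interaction), where
    gap = E(X_A)beta_A - E(X_B)beta_B.
    """
    dx = [a - b for a, b in zip(mean_x_a, mean_x_b)]  # difference in means
    db = [a - b for a, b in zip(beta_a, beta_b)]      # difference in coefficients
    endowments = dot(dx, beta_b)         # 1st component
    coefficients = dot(mean_x_b, db)     # 2nd component
    interaction = dot(dx, db)            # 3rd component
    gap = dot(mean_x_a, beta_a) - dot(mean_x_b, beta_b)
    return gap, endowments, coefficients, interaction

# Hypothetical inputs: [constant, multiple partners, older partner]
mean_x_men, mean_x_women = [1.0, 0.20, 0.05], [1.0, 0.05, 0.30]
beta_men, beta_women = [0.02, 0.03, 0.02], [0.05, 0.08, 0.04]

gap, endow, coef, inter = threefold(mean_x_men, mean_x_women, beta_men, beta_women)
print(f"gap={gap:.4f} endowments={endow:.4f} "
      f"coefficients={coef:.4f} interaction={inter:.4f}")
```

By construction the three components add up to the gap, which is why a component cannot simply be set to zero when it is statistically insignificant.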

Jann25 offers an approximate standard error that may be computed for each of these decomposition terms. However, not all studies using this method report statistical test results, since one cannot set a decomposition term to zero when it has low statistical significance: the terms would then no longer sum to the total computed gap. Nevertheless, we provide the statistical significance as additional information in the “Results and Discussion” section. We apply a standard BO decomposition using a linear probability model (LPM), which makes strong, simplifying assumptions about functional form. While other methods, such as sequential counterfactuals26,27 and linearization28, require fewer simplifying assumptions, they are not markedly sounder than the LPM, which has the advantage of also modeling conditional probabilities. All analyses were conducted using Stata/MP version 16 (StataCorp LLC, College Station, TX, USA).


Table 1 provides descriptive statistics on the variables of interest. In addition, we provide frequency statistics for gender-specific recent HIV infections by country to illustrate the small sample sizes we would have had for estimating our models if we had used incidence versus prevalence as the dependent variable. Furthermore, the descriptive statistics for men and women are useful to verify the endowments component in the BO decomposition tables below.

Table 1. HIV descriptive statistics for 2015-2017 for Malawi, Tanzania, and Zambia
Variables Malawi Tanzania Zambia
Men Women Men Women Men Women
HIV+ status, n (%) 771
Recent HIV infection, n (%) 10
Age, mean (sd) 22.17
Urban environment, n (%) 8,126
Completed primary school, n (%) 3,456
Wealth quintile, n (% of total individuals in quintile)
Lowest 2,956
Second 3,498
Middle 3,810
Fourth 4,712
Highest 7,950
Multiple partners, n (%) 1,430
Older partner by 5 years, n (%) 111

Note. Authors’ calculations using data from PHIA (2015-2017)22 for Malawi, Tanzania and Zambia.

Table 2 presents odds ratio estimates from the logit models for each of the three countries. As expected, we find that higher age, being female, and living in an urban environment increase the likelihood of having HIV, i.e., their odds ratios are larger than 1. On the other hand, higher educational attainment acts as a protective factor and decreases the likelihood of having HIV. Results for the wealth variable are inconclusive, with only the Zambia model suggesting a relationship between wealth and HIV status, in which higher levels of wealth are associated with a higher likelihood of having HIV. Results for the sexual behavior variables are consistent with our hypotheses, as having an older partner (by at least five years) and having multiple partners are both associated with an increased likelihood of having HIV. While the riskiness associated with each of these two sexual behaviors is similar in Malawi, having multiple partners is much riskier than having an older partner in both Tanzania and Zambia.

Table 2. Multivariate logistic regression modelling predicting likelihood of HIV+ serostatus
VARIABLES Malawi Tanzania Zambia
Age 1.161*** 1.167*** 1.128***
(0.019) (0.020) (0.018)
Female 2.589*** 3.054*** 3.648***
(0.430) (0.544) (0.636)
Urban 2.052*** 1.619** 1.409*
(0.302) (0.274) (0.223)
Education 0.705** 0.619** 1.004
(0.088) (0.093) (0.133)
Multiple partners 1.603** 2.267*** 2.137***
(0.273) (0.312) (0.373)
Older partner by 5 years 1.665*** 1.358* 1.241
(0.203) (0.185) (0.152)
Wealth quintile 0.952 0.960 1.142*
(0.052) (0.059) (0.074)
Constant 0.001*** 0.000*** 0.001***
(0.000) (0.000) (0.000)
Observations 5,693 9,576 5,350

Note. Coefficients are odds ratios. Standard errors in parentheses. *** P<0.001, ** P<0.01, * P<0.05.
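To make the odds ratios in Table 2 more concrete, the following Python sketch converts an odds ratio into a change in probability. The 3% baseline probability is a hypothetical value chosen for illustration; only the odds ratio for being female in Malawi (2.589) is taken from Table 2.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability implied by multiplying the baseline odds by odds_ratio."""
    odds = baseline_prob / (1.0 - baseline_prob)  # probability -> odds
    new_odds = odds * odds_ratio                  # scale the odds
    return new_odds / (1.0 + new_odds)            # odds -> probability

FEMALE_OR_MALAWI = 2.589  # odds ratio for "Female", Malawi column of Table 2
baseline = 0.03           # hypothetical baseline probability, for illustration only

p = apply_odds_ratio(baseline, FEMALE_OR_MALAWI)
print(f"{baseline:.3f} -> {p:.3f}")
```

Because HIV is relatively rare in this age group, the odds ratio is close to, but slightly larger than, the corresponding ratio of probabilities.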

We present the LPM decomposition models individually by country in Tables 3-5. Table 3 presents these results for Malawi. Column 1 represents the overall decomposition results, with men and women having prevalence rates of 3.1 percent (P<0.001) and 8.8 percent (P<0.001), respectively, and thus a gender gap of 5.8 percentage points (a gap almost twice the men’s rate). The lower half of the column represents the overall, or net, results, i.e. each component’s contribution to the gap. Overall, only the coefficients part is statistically significant at the 0.05 level, meaning that differences in the riskiness of the risk factors between men and women account for about half of the gender gap, i.e. 2.9 percentage points.

Table 3. Blinder-Oaxaca decomposition for Malawi (N = 5,693)
VARIABLES (1) Overall (2) Endowments (3) Coefficients (4) Interaction
Men 0.031*** - - -
Women 0.088*** - - -
Difference -0.058*** - - -
Endowments -0.009 - - -
Coefficients -0.029* - - -
Interaction -0.020 - - -
Age - -0.002 -0.123*** 0.001
(0.001) (0.037) (0.001)
Urban - -0.001 -0.017** 0.001
(0.001) (0.007) (0.001)
Education - -0.003* 0.002 0.001
(0.001) (0.005) (0.001)
Multiple partners - 0.013** -0.003 -0.010
(0.005) (0.001) (0.005)
Older partner by 5 years - -0.016*** 0.013 -0.012
(0.004) (0.013) (0.013)
Wealth quintile - 0.000 -0.007 0.000
(0.000) (0.020) (0.000)
Constant 0.105**

Note. Standard errors in parentheses. *** P<0.001, ** P<0.01, * P<0.05.

Columns 2-4 present results for individual variables in each component of the BO decomposition. Column 2 (endowments of risk factors) shows that more men have multiple partners than women (P<0.01), more women have an older partner than men (P<0.001), and more men complete primary education than women (P<0.05). Column 3 (riskiness of the risk factors) shows that each age increment is riskier for women (P<0.001), and that living in an urban environment is also riskier for women (P<0.01).

Table 4 presents results for the BO decomposition for Tanzania. Column 1 shows that men and women have prevalence rates of 1.6 percent (P<0.001) and 4.3 percent (P<0.001), respectively, with a gender gap of 2.8 percentage points, i.e. women having more than twice the rate of men. Overall, only the coefficients part is statistically significant (at the 0.001 level), suggesting (as in Malawi) that there are differences in the riskiness of risk factors between men and women. Here, these riskiness differentials account for the entire gap and then some, i.e. 3.4 percentage points. Column 2 shows that a larger share of men than women have multiple partners (P<0.001), and that more women than men have an older partner (P<0.05), much like in Malawi. Moreover, fewer women than men completed primary education (P<0.01), and more women than men live in urban environments (P<0.01). The results for Column 3 are consistent with those of Malawi, showing that having multiple partners is riskier for women (P<0.001), in addition to age (P<0.05) and urban environment (P<0.05) being riskier for women. Column 4 shows that an unexplained part of the gap is associated with both having multiple partners (P<0.001), which widens the gap, and living in an urban environment (P<0.05), which narrows the gap.

Table 4. Blinder-Oaxaca decomposition for Tanzania (N = 9,576)
VARIABLES (1) Overall (2) Endowments (3) Coefficients (4) Interaction
Men 0.016*** - - -
Women 0.043*** - - -
Difference -0.028*** - - -
Endowments 0.005 - - -
Coefficients -0.034*** - - -
Interaction 0.001 - - -
Age - 0.000 -0.043* -0.000
(0.000) (0.020) (0.000)
Urban - -0.001** -0.009* 0.001*
(0.000) (0.003) (0.001)
Education - -0.001** 0.004 0.001
(0.000) (0.002) (0.000)
Multiple partners - 0.013*** -0.004*** -0.010***
(0.002) (0.001) (0.003)
Older partner by 5 years - -0.006* -0.010 0.009
(0.003) (0.009) (0.009)
Wealth quintile - 0.000 0.002 -0.000
(0.000) (0.010) (0.000)
Constant 0.026

Note. Standard errors in parentheses. *** P<0.001, ** P<0.01, * P<0.05.

Table 5 presents results for the BO decomposition for Zambia. Column 1 shows that men and women have prevalence rates of 2.9 percent (P<0.001) and 9.7 percent (P<0.001), respectively, with a gender gap of 6.8 percentage points, i.e. women having more than three times the rate of men. Overall, only the interaction part is statistically significant at the 0.05 level, which suggests that the unexplained component accounts for over three quarters of the gender gap. Column 2 shows that more men than women have multiple partners (P<0.001), and that fewer men than women live in urban environments (P<0.01). Column 3 shows that having multiple partners is riskier for women (P<0.001), in addition to age (P<0.05) and urban environments (P<0.001) being riskier for women as well. The results for Column 4 are consistent with those for Tanzania, as they show that having multiple partners (P<0.001) widens the gap and an urban environment (P<0.01) narrows it in a way that is unexplained by the model.

Table 5. Blinder-Oaxaca decomposition for Zambia (N = 5,350)
VARIABLES (1) Overall (2) Endowments (3) Coefficients (4) Interaction
Men 0.029*** - - -
Women 0.097*** - - -
Difference -0.068*** - - -
Endowments 0.015 - - -
Coefficients -0.031 - - -
Interaction -0.052* - - -
Age - -0.001 -0.078* 0.001
(0.001) (0.040) (0.000)
Urban - -0.004** -0.031*** 0.006**
(0.001) (0.009) (0.002)
Education - -0.000 0.003 0.001
(0.001) (0.008) (0.002)
Multiple partners - 0.030*** -0.007*** -0.030***
(0.005) (0.001) (0.006)
Older partner by 5 years - -0.008 0.031 -0.030
(0.005) (0.021) (0.020)
Wealth quintile - -0.002 -0.029 0.001
(0.001) (0.023) (0.001)
Constant 0.080

Note. Standard errors in parentheses. *** P<0.001, ** P<0.01, * P<0.05.

Table 6 compares our previous LPM results to non-linear (logit) decomposition using the same predictor variables. We rely on Oaxaca and Ransom29 to apply a generalized linear decomposition model, and on Reimers30 and Cotton31 to treat the weighting matrix (omega) as a scalar matrix of both 0, assuming comparisons based upon the group with the highest expected values for the dependent variable, and 1, assuming the opposite and switching the reference group. The non-linear decomposition results for a scalar matrix of zero are mostly consistent with the linear probability model results. The overall decomposition findings are summarized in Table 7.

Table 6. Linear versus non-linear sensitivity analysis of overall decomposition components
Prevalence Gap Components Malawi Tanzania Zambia
Overall Prevalence Gender Gap 5.8% 2.8% 6.8%
Endowments of Risk Factors
Linear Probability Model (OLS)
Non-Linear, Logit (Omega = 1)
Non-Linear, Logit (Omega = 0)
Riskiness of Risk Factors (coefficients)
Linear Probability Model (OLS)
Non-Linear, Logit (Omega = 1)
Non-Linear, Logit (Omega = 0)
Residual (interactions)
Linear Probability Model (OLS)
Non-Linear, Logit (Omega = 1)
Non-Linear, Logit (Omega = 0)

Notes: OLS denotes ordinary least squares.


In this study, we use nationally representative survey data from the PHIA project to document gender differences in HIV prevalence and to examine the extent to which underlying individual characteristics account for consistent differences in the probability of having HIV. We found that having multiple partners and having an older partner carries a similar risk with regard to younger adults contracting HIV in Malawi; in Tanzania and Zambia the risk associated with having multiple partners is greater than the risk associated with having an older partner. In all three countries the risk of being a younger adult woman is higher than for a man, other risk factors the same.

We examined the gender differential further using the BO decomposition method. The results point to some consistent factors contributing to the HIV prevalence gap among younger women in these countries, but we also find substantial differences across countries. Three important sources of the gender gap in HIV prevalence are consistent across the three countries: (i) getting older is differentially riskier for women than men, (ii) living in urban areas is riskier for younger women than younger men, and (iii) having more partners is differentially riskier for younger women than younger men. To clarify, younger men generally have more partners than younger women, but our data show that the incremental risk of adding a partner is higher for younger women than for younger men.

As shown in Table 7, we find that observable characteristics explain over one-half (65.5%) of the differential in HIV prevalence between men and women in Malawi, and the entirety of the differential in Tanzania, but less than one-quarter (23.5%) of the differential in Zambia. The factors responsible for the ‘gap’ are not the same in these countries. In Malawi and Tanzania, the differentials in the riskiness of the risk factors for men and women are statistically significant and explain between half (Malawi) and the entirety (Tanzania) of the gender gap.
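The component shares quoted above follow from dividing each decomposition term by the overall gap. The Python sketch below does this with the rounded values printed in Tables 3-5, so its output can differ from Table 7 in the last decimal place (Table 7 was presumably computed from unrounded estimates).

```python
# (difference, endowments, coefficients, interaction) as printed in Tables 3-5
decomp = {
    "Malawi": (-0.058, -0.009, -0.029, -0.020),
    "Tanzania": (-0.028, 0.005, -0.034, 0.001),
    "Zambia": (-0.068, 0.015, -0.031, -0.052),
}

def shares(gap, *components):
    """Express each component as a percentage of the overall gap."""
    return [100.0 * c / gap for c in components]

results = {}
for country, (gap, *parts) in decomp.items():
    endow, coef, inter = shares(gap, *parts)
    results[country] = (endow, coef, inter)
    print(f"{country}: endowments {endow:.1f}%, coefficients {coef:.1f}%, "
          f"interaction {inter:.1f}%, explained {endow + coef:.1f}%")
```

The "explained" figure is the sum of the endowments and coefficients shares, which is how the 65.5% (Malawi) and 23.5% (Zambia) figures in the text arise.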

In Zambia, the endowments and interaction components were statistically significant (at the 0.1 and 0.05 levels, respectively) and explain most of the gender gap. The unexplained component is largest in Zambia, while in Malawi and Tanzania different risk distributions and differential risks drive the HIV prevalence gender gap dissimilarly. The cross-national pattern which emerges suggests that while more men have multiple partners than women, having multiple partners is riskier for women, thereby widening the gender gap. The phenomenon of women who have multiple partners also widens the gender gap in a way which is unexplained by the model.

The study has several limitations. First, the PHIA data are compiled from a self-reported survey, and missing data might well be non-random, i.e. individuals may choose not to answer certain questions or may withhold information for various reasons. Second, although most survey questions and responses are consistent across countries, some differ in wording or in the type of answers logged, and some questions are absent altogether. We also lack spatial data to better capture the impact of peri-urban living conditions, which are relevant for a non-negligible part of our study populations, leaving us with a rather crude binary construct for urbanicity. Finally, because of limited data, our model does not capture an individual’s sense of agency.

Our results suggest that governments and global aid agencies should be mindful of distinct national characteristics and disease spread patterns when considering prevention efforts. Policies that encourage younger women to form relationships with men in their own age cohort would be most impactful in Malawi, where many more younger women than younger men currently have partners who are five or more years their senior. An example of such a policy is the ‘Zones’ program by the ‘Young 1ove’32 organization in Botswana. Other policies can attempt to reduce non-monogamous relationships among younger adults, both men and women, in settings where multiple partners are a key driver of the gender gap.

Table 7. Summary of decomposition results, including significant risk factors in each country
HIV prevalence gap components Malawi Tanzania Zambia
Overall Prevalence Gender Gap 5.8% 2.8% 6.8%
Component 1 15.5% (not significant overall) -17.9% (not significant overall) -22% (significant overall)
Characteristics (endowment)
Malawi: 1. Fewer women complete primary education; 2. More women have an older partner
Tanzania: 1. More women live in urban environments; 2. Fewer women complete primary education; 3. More women have an older partner
Zambia: 1. More women live in urban environments; 2. Women have less wealth than men
Component 2 50% (significant overall) 121.5% (significant overall) 45.5% (not significant overall)
Level of risk factors (coefficients)
Malawi: 1. Age is riskier for women; 2. Urban environments are riskier for women
Tanzania: 1. Age is riskier for women; 2. Urban environments are riskier for women
Zambia: 1. Age is riskier for women; 2. Urban environments are riskier for women
Component 3 34.5% (not significant overall) -3.6% (not significant overall) 76.5% (significant overall)
Unexplained (interactions)
Malawi: 1. Having multiple partners explains some of the added risk for women
Tanzania: 1. Having multiple partners explains some of the added risk for women; 2. Urban environments explain some of the added risk for women
Zambia: 1. Having multiple partners explains some of the added risk for women; 2. Urban environments hint at added risk for men


We thank William Crown, Roya Sherafat-Kazemzadeh, Monica Jordan, Diana Bowser, and Donald Shepard of Brandeis University; Jennifer Kates and Josh Michaud of Kaiser Family Foundation; and Christopher Baum of Boston College for helpful comments and advice. Clare L. Hurley of Brandeis provided editorial and administrative assistance.


This paper was produced with funding from Centers for Disease Control and Prevention (CDC), Division of Global HIV/AIDS & TB (DGHT) under Cooperative Agreement Number U2GGH001531. Its contents are solely the responsibility of Cardno and Brandeis University and do not necessarily represent the official views of CDC.

Authorship contributions

Study design and review: ED, GG, AKN; computation and analysis: ED, FN; literature review: ED, FN; writing: ED, GG, FN.

Competing interests

The authors completed the Disclosure of Potential Conflicts of Interest forms (available upon request from the corresponding author) and declare no conflicts of interest.

Correspondence to:

Elad Daniels, PhD candidate, MA.

Institute of Global Health and Development, Schneider Institutes for Health Policy and Research, The Heller School for Social Policy and Management, Brandeis University, Waltham, Massachusetts, USA.
