The literature on the economic status of households (wealth or income) and human immunodeficiency virus (HIV) prevalence is extensive, though inconsistent. Reported relationships between prevalence and household wealth range from insignificant to strongly positive. Early researchers found that the HIV epidemic appeared to be fueled by poverty.1 More recent studies show, however, that household wealth is associated with higher HIV prevalence,2–5 and that the epidemic has tended to grow faster in richer countries.6,7 Yet the literature on prevalence is generated almost exclusively from Demographic and Health Surveys (DHS) data, supplemented with the DHS biomarker blood testing program. Because the DHS does not use a cohort design, its samples may contain an inflated number of surviving HIV-positive individuals in the wealthier group of respondents, who tend to live longer with HIV than poorer individuals.8 This inadvertent sample bias may make it appear that prevalence rises with wealth.9 In Tanzania, Parkhurst (2010) frames the survival problem by arguing that in the initial stages of the epidemic high wealth may be a risk factor, but as the epidemic matures, higher levels of wealth may be a protective factor. In general, cross-sectional observational data such as the DHS tend to over-represent wealthy HIV-positive people, who live longer than comparable poorer individuals, and this over-representation biases both descriptive statistics and regression estimates.


Research shows inconsistent patterns between prevalence and wealth across countries, between men and women, and across different levels of wealth. The wealth effects on prevalence have often been shown to be stronger for women than for men,3,4,10 though still somewhat inconsistently.5

Some studies11,12 also report that prevalence may be higher when the level of inequality of economic status (income or wealth) is higher. Other studies confirm the importance of both wealth and inequality of wealth on outcomes such as knowledge of transmission, particularly for women.6,13–15 Fox (2012) has tied some of these threads together in multivariate work by using both wealth and wealth inequality in econometric models of prevalence. Fox found that the HIV risk associated with inequality is more important than the effects associated with the level of wealth. Furthermore, that study found that in wealthier places the poor were at higher risk, but in poor places, the wealthy were at greater risk.6

The incidence of HIV has infrequently been examined for wealth effects using cohort methods, which are not subject to oversampling the rich. In a large study in rural South Africa, Barnighausen (2007) used a household asset-based relative wealth measure with three groups (top 20%, middle 40%, and bottom 40%) and found no significant difference in the hazard of acquiring HIV between the top and bottom wealth groups.16 The study did find a significantly higher hazard for the middle wealth group relative to the low wealth group. A second large cohort study of incidence, by Aulagnier (2011) in Namibia, found no relationship between HIV incidence and any of the socioeconomic status (SES) measures (household expenditures, education, insurance, employment status).17 The authors reported higher incidence among women than among men, and among women, incidence was higher when household expenditures (an income proxy) were lower.

Wealth effects on adherence to prescribed antiretroviral treatment (ART) regimens have also been studied. Peltzer (2013) reviewed 62 studies of SES measures and adherence in low- and middle-income countries and found inconsistent impacts of wealth and other economic measures on adherence.18 Some studies found economic status to be a positive driver of adherence; others found insignificant or even negative influences. A review of adherence studies in India by Sahay (2011) echoes this inconsistency in the economic drivers of adherence.19

Amid the inconsistencies across studies of household well-being and the HIV epidemic, three things seem clear in the literature: (1) neither low nor high economic status is consistently seen as driving prevalence or its contributing behaviors; (2) inequality in economic status may be more consistently associated with prevalence than is the level of economic status, though inequality has not been studied in relation to incidence or contributing behaviors; and (3) DHS sampling procedures may be exacerbating some of the confusing and inconsistent findings on the relationship between HIV prevalence and household economic well-being.


Individual-level DHS survey data for 2010-2016 were obtained from 29 countries through the DHS data portal. These data link survey responses with HIV test results from the biomarker data. Household wealth is the only direct measure of socioeconomic status collected in the DHS. It is measured by enumerating household assets (type of plumbing, transport vehicles, appliances, water source, etc.). DHS derives weights for the assets and computes a numeric wealth index score for each household. We measure each respondent's wealth with this country-specific wealth index by placing each adult in the low, medium, or high tertile of the wealth index in their country. We also construct a measure of wealth inequality for each DHS sampling region from the disparities in the wealth index across the surveyed households in that region. This measure is a Gini coefficient, a metric often used to quantify the degree of inequality in a population. Each household is assigned the Gini coefficient of its region of residence.
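A minimal sketch of the two wealth measures just described, assuming each household already has a non-negative wealth index score (actual DHS scores can be negative, and how they were handled for the Gini calculation is not stated here). The function names and record layout are hypothetical, and ties in the index are broken arbitrarily by sort order.

```python
from collections import defaultdict


def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality).

    Uses G = sum_i (2i - n - 1) * x_(i) / (n * sum(x)), with x_(1) <= ... <= x_(n).
    """
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


def assign_tertiles(scores):
    """Label each score 1 (low), 2 (medium) or 3 (high) by its rank within one country."""
    n = len(scores)
    labels = [0] * n
    for rank, idx in enumerate(sorted(range(n), key=lambda i: scores[i])):
        labels[idx] = rank * 3 // n + 1
    return labels


def regional_gini(households):
    """households: iterable of (region, score) pairs; returns {region: Gini}."""
    by_region = defaultdict(list)
    for region, score in households:
        by_region[region].append(score)
    return {region: gini(scores) for region, scores in by_region.items()}


# Hypothetical example: every household inherits its region's Gini value.
households = [("North", 10.0), ("North", 10.0), ("North", 10.0),
              ("South", 0.0), ("South", 0.0), ("South", 30.0)]
region_g = regional_gini(households)
household_gini = [region_g[region] for region, _ in households]
print(household_gini)
```

Assigning the regional value back to each household, rather than computing a household-level score, mirrors the text: inequality is a property of the place of residence, not of the household.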

In the analytic approach of this study, a descriptive analysis examined the raw data on the study measures, and logit models were estimated to show the association between measures of household economic well-being and HIV prevalence in adults. All models were estimated from the person-level data set containing 403,493 adult respondents (15 years and older) from the most recent (2010-2016) DHS surveys in 29 countries: 27 in sub-Saharan Africa, plus Haiti and the Dominican Republic. Table 1 lists the countries.

Table 1: DHS sample data (2010-16) by country, using DHS sample weights
Countries Biomarker sample size Adult HIV prevalence Estimated number of adults (15-49 years old) living with HIV
10 PEPFAR priority countries
Cote d'Ivoire 9,008 4.00% 412,525
Haiti 18,531 2.20% 123,861
Lesotho 6,096 24.80% 287,933
Malawi 14,779 9.00% 776,453
Namibia 8,858 14.30% 177,453
Rwanda 12,940 3.10% 177,966
Tanzania 17,745 5.10% 1,258,187
Uganda 21,416 7.30% 1,313,985
Zambia 29,007 13.50% 988,738
Zimbabwe 16,475 14.10% 1,163,460
19 other countries in the study
Angola 23,568 1.90% 243,684
Burkina Faso 15,389 1.00% 83,744
Burundi 15,904 1.00% 44,866
Cameroon 17,638 4.30% 431,926
Chad 14,202 1.60% 99,025
Congo, Democratic Republic of 18,614 1.10% 358,452
Dominican Republic 25,776 0.80% 44,066
Ethiopia 10,992 0.90% 465,793
Gabon 8,848 4.20% 39,339
Gambia 7,771 2.00% 17,675
Ghana 8,380 2.00% 292,767
Guinea 8,182 1.80% 97,101
Liberia 8,861 2.10% 41,396
Mali 11,270 1.10% 80,085
Mozambique 8,628 13.00% 1,500,895
Niger 14,600 0.40% 29,800
Senegal 9,917 0.70% 50,135
Sierra Leone 10,726 1.50% 51,253
Togo 9,172 2.50% 87,840
Total (All 29 countries) 403,294 4.80% 10,740,403

DHS – Demographic and Health Surveys; HIV – human immunodeficiency virus; PEPFAR – US President’s Emergency Plan for AIDS Relief.

Absent a cohort design, the DHS oversamples high-wealth individuals with HIV, who tend to live longer with the disease than poorer persons.8 As the epidemic matures, this bias can be expected to become more pronounced, and prevalence research is likely to show an increasingly large apparent impact of wealth on prevalence over time. To rebalance the sample, the econometric modeling uses inverse probability weighting (IPW).20,21 IPW has been used in DHS data to adjust for selection bias arising from respondents' refusal to provide a blood sample for biomarker testing.22–24

We use a similar technique to adjust for the non-random assignment to wealth categories in the DHS observational data. Each household's wealth index places it in one of three categories: low wealth (the bottom country tertile, omitted in the regression model), the medium tertile, and the high-wealth tertile (the top third of households in the country). We use a regression adjustment technique developed by Imbens and Wooldridge (2009),25 which is increasingly used in epidemiology and economics to isolate the effect of exposure to a ‘treatment’ (e.g., wealth) in observational studies like this one. The technique adjusts both for the predictors of ‘treatment’ (the likelihood of being in each wealth group) and for the effects of these ‘predicted wealth levels’ on the prevalence outcome. In Stata (StataCorp LLC, College Station, Texas, USA),26 this was done with the teffects ra command using logit models. The IPW regression adjustment technique uses three steps:

  1. A logit model predicts each respondent's wealth level from all the covariates listed below. The inverse probability weights for each individual, 1/(1 - pr), where pr is the predicted probability from this model, are computed in this step;

  2. Logit HIV prevalence models are estimated for each wealth category, using the inverse probability weights and the same covariates. These models generate, for each respondent, a predicted HIV prevalence at each wealth level;

  3. The average effects on predicted prevalence are computed for each wealth level. The difference between the average predicted prevalence from the middle- and low-wealth models is the average treatment effect (ATE) of having middle wealth, assuming everything else is the same for the individuals in the two samples; that is, the middle-wealth population is assumed to match the low-wealth sample on all covariates. The ATE of the high-wealth category is calculated in the same way. The potential outcome mean (POM) is also calculated: the average predicted prevalence if every respondent were in the low-wealth category.
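The three steps can be illustrated with a toy binary-treatment version (for example, middle vs. low wealth) that uses a single covariate and a hand-rolled Newton-Raphson logit, so the example needs only the standard library. This is a hedged sketch, not the authors' Stata code. It uses the usual inverse probability weights, 1/p for the treated group and 1/(1 - p) for the reference group. That weighted-outcome-model combination is what Stata's teffects ipwra estimator implements, whereas teffects ra fits the outcome models without weights. Function and variable names are hypothetical. As with the ATE and POM described above, effects are on the probability scale.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, y, w=None, iters=25):
    """Weighted logit of y (0/1) on an intercept and one covariate x, by Newton-Raphson."""
    if w is None:
        w = [1.0] * len(x)
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = expit(b0 + b1 * xi)
            r, v = wi * (yi - p), wi * p * (1.0 - p)  # score and information terms
            g0, g1 = g0 + r, g1 + r * xi
            h00, h01, h11 = h00 + v, h01 + v * xi, h11 + v * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: beta += H^-1 * gradient
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def ipw_regression_adjustment(x, treat, y):
    """Return (ate, pom_reference) for a binary treatment (1) vs. reference group (0)."""
    n = len(x)
    # Step 1: model the probability of treatment and build inverse probability weights.
    a0, a1 = fit_logit(x, treat)
    ps = [expit(a0 + a1 * xi) for xi in x]
    w = [1.0 / p if t == 1 else 1.0 / (1.0 - p) for p, t in zip(ps, treat)]

    # Step 2: weighted outcome logit within each group, predicted for every respondent.
    preds = {}
    for group in (0, 1):
        idx = [i for i in range(n) if treat[i] == group]
        c0, c1 = fit_logit([x[i] for i in idx], [y[i] for i in idx], [w[i] for i in idx])
        preds[group] = [expit(c0 + c1 * xi) for xi in x]

    # Step 3: average the predictions; ATE is the difference, POM the reference mean.
    pom_reference = sum(preds[0]) / n
    ate = sum(preds[1]) / n - pom_reference
    return ate, pom_reference


# Toy data: within each level of x, treated and reference outcomes have equal rates,
# so the true treatment effect is zero even though treatment depends on x.
x = [0] * 8 + [1] * 8
treat = [1, 1, 1, 1, 0, 0, 0, 0] + [1, 1, 0, 0, 0, 0, 0, 0]
y = [0, 0, 0, 1, 0, 0, 0, 1] + [0, 1, 0, 1, 0, 1, 0, 1]
ate, pom = ipw_regression_adjustment(x, treat, y)
print(round(ate, 6), round(pom, 6))
```

In the toy data, the treated group is concentrated at x = 0, so a raw comparison of outcome rates would show a spurious difference. The adjusted ATE recovers zero, and the POM is the average of the reference-group predictions over all sixteen respondents.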

The independent variables for both models include:

  • Age group dummy variables: the age groups are 15-17, 18-20, 21-23, 24-27, 28-30, 31-33, 34-43, and 44 and above. Age 15-17 is always omitted and is the reference group for interpreting the coefficients on the other age categories.

  • Gender: female = 1; male is the reference category (i.e., 0).

  • Place of residence: rural = 1; urban is the reference category.

  • Living arrangement: married or partnered = 1; single, widowed, divorced, etc. are the reference category.

  • Literacy: able to read = 1, otherwise 0.

  • Insurance: some health insurance is available to the respondent = 1, otherwise 0.

  • Private sector health care use in the last year: some = 1, none = 0.

  • Highest wealth-inequality tertile = 1; the tertile with the lowest wealth inequality is the reference category (lower cutoff for the highest-inequality tertile: Gini = 0.025; the highest value is 0.361).

  • Medium wealth-inequality tertile = 1; the tertile with the lowest wealth inequality is the reference category (lower cutoff for the medium-inequality tertile: Gini = 0.016; the lowest value of the Gini index is 0.005).

  • Country dummy variables: 1 if the observation is from that country, 0 otherwise, with Angola as the excluded reference country. Some models have more than one excluded country because of singularities in the data.
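As an illustration of the coding scheme in the list above, the following sketch turns one respondent record into the dummy covariates and omits each reference category. The record field names are hypothetical. Treating the inequality cutoffs (0.016 and 0.025) as inclusive lower bounds is an assumption.

```python
from bisect import bisect_right

AGE_LOWER_BOUNDS = [15, 18, 21, 24, 28, 31, 34, 44]
AGE_LABELS = ["15-17", "18-20", "21-23", "24-27", "28-30", "31-33", "34-43", "44+"]
MEDIUM_GINI_CUTOFF, HIGH_GINI_CUTOFF = 0.016, 0.025  # lower bounds (assumed inclusive)


def age_group(age):
    """Map an age in years (15 or older) to one of the study's age groups."""
    if age < AGE_LOWER_BOUNDS[0]:
        raise ValueError("respondents are 15 or older")
    return AGE_LABELS[bisect_right(AGE_LOWER_BOUNDS, age) - 1]


def encode(r, countries, reference_country="Angola"):
    """Build 0/1 covariates for one respondent; reference categories are omitted."""
    x = {}
    group = age_group(r["age"])
    for label in AGE_LABELS[1:]:  # 15-17 is the omitted reference group
        x["age_" + label] = int(group == label)
    x["female"] = int(r["sex"] == "female")
    x["rural"] = int(r["residence"] == "rural")
    x["married_or_partnered"] = int(r["partnered"])
    x["literate"] = int(r["can_read"])
    x["insured"] = int(r["insured"])
    x["private_care_use"] = int(r["private_care"])
    x["inequality_medium"] = int(MEDIUM_GINI_CUTOFF <= r["region_gini"] < HIGH_GINI_CUTOFF)
    x["inequality_high"] = int(r["region_gini"] >= HIGH_GINI_CUTOFF)
    for country in countries:
        if country != reference_country:
            x["country_" + country] = int(r["country"] == country)
    return x


respondent = {"age": 29, "sex": "female", "residence": "rural", "partnered": True,
              "can_read": False, "insured": False, "private_care": True,
              "region_gini": 0.020, "country": "Malawi"}
covariates = encode(respondent, ["Angola", "Malawi", "Zambia"])
print(covariates)
```

Omitting one level of each categorical variable, rather than including a full set of dummies, is what keeps the design matrix from being singular alongside the model constant.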

We examined collinearity among these covariates in the unweighted logistic regression shown in Table 3, calculating the variance inflation factor (VIF)27 for each coefficient in the logit model. No covariate has a VIF exceeding the conventional criterion of 5.0.28
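Below is a standard-library sketch of the VIF check. It assumes the usual definition VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing covariate j on the other covariates with an intercept. The sketch uses an equivalent shortcut: the VIFs are the diagonal elements of the inverse of the covariates' correlation matrix. Because VIFs depend only on the covariates, they are the same whether the outcome model is linear or logit. The function names and data are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlations between covariate columns (lists of equal length)."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each covariate = diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]   # correlation with x1 is 0.8
vifs = variance_inflation_factors([x1, x2])
print([round(v, 3) for v in vifs], all(v <= 5.0 for v in vifs))
```

With two covariates correlated at 0.8, each VIF is 1/(1 - 0.64), about 2.78, which is still below the 5.0 criterion.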


Table 1 lists the 29 countries in the study sample, using DHS sample weights. It shows considerable variation in prevalence rates, from less than 1% (Niger, Senegal, the Dominican Republic, and Ethiopia) to nearly 25% (Lesotho). The PEPFAR priority countries (listed first) have much higher prevalence rates (averaging 9.0%) than the other countries in the study sample.

Table 2 shows HIV prevalence rates for all DHS-sampled adults in the study’s 29 countries, by country wealth and wealth inequality tertiles and by demographic categories (age group, gender, and urban-rural location). The first row shows the poorest tertile, the least wealthy third of adults in each of the 29 countries. While their wealth will differ across countries, they share the common feature of being the least wealthy third of people in their own country.

The first three rows (wealth tertiles) show that prevalence rises with wealth: the right-most column shows overall prevalence rising from 3.9% in the low tertile to 5.6% in the high tertile. This pattern is repeated for almost all the demographic categories shown. Within wealth categories, prevalence generally increases with age through the 34-43 age group, then falls slightly in the 44-49 age group. Prevalence is consistently higher for women and for urban residents.

The table also shows that wealth inequality appears to be systematically related to prevalence: more inequality in the wealth distribution is associated with higher prevalence. Prevalence is lowest for persons residing in regions where wealth is homogeneous across households. This is more or less true for each demographic category. Within levels of inequality, the usual patterns are seen: higher prevalence at older ages, among women, and among urban residents.

Tables 3 and 4 show the regression results. Logistic regression coefficients are reported as log odds ratios; exponentiating a coefficient gives the odds ratio.
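Converting a reported log odds ratio into an odds ratio and a percentage change in odds is a one-line calculation. The sketch below applies it to the unweighted wealth coefficients in Table 3 (0.060 and -0.083). The function names are illustrative.

```python
import math


def odds_ratio(log_odds_ratio):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(log_odds_ratio)


def percent_change_in_odds(log_odds_ratio):
    """Percentage change in the odds relative to the reference category."""
    return 100.0 * (math.exp(log_odds_ratio) - 1.0)


for label, coef in [("middle wealth", 0.060), ("high wealth", -0.083)]:
    print(f"{label}: OR = {odds_ratio(coef):.3f}, "
          f"change in odds = {percent_change_in_odds(coef):+.1f}%")
```

A percentage change in odds is not a percentage-point change in prevalence. The two are close only when prevalence is low, and even then they are not the same quantity.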

In Table 3, the effects of covariates (including wealth inequality) are estimated separately for each wealth level (the omitted reference category is the low- or poor-wealth tertile). The coefficients presented for the middle- and high-wealth tertiles show differences in prevalence relative to persons in the ‘poor’ reference category, assuming all persons have the same characteristics as the ‘poor’ group except for their predicted wealth level. Reading the columns side by side shows how the likelihood of testing HIV positive differs across the predicted wealth categories, with the covariate values of the reference (poor) category as the counterfactual. For reference, an unadjusted logistic regression model is also included.

The basic demographic patterns seen in the descriptive analysis are confirmed in Table 3 in both the IPW models and the unadjusted model. Older persons (up to age 43) and women have higher prevalence; being married/partnered and living in rural areas are both protective against HIV.

The effect of wealth on prevalence is not consistent, particularly in the IPW models, where selection bias has been corrected. The average effects of the middle- and high-wealth categories computed from the IPW models are shown in Table 4. They are computed from the models in Table 3, assuming that the covariates are the same as for the reference category (i.e., the low-wealth tertile). As shown in Table 4, the average treatment effect (ATE) of middle wealth on prevalence is 0.005. This means that belonging to the middle wealth category raises predicted prevalence by half a percentage point relative to the low wealth group, cet. par. This small increase is still statistically significant (P<0.05). The table also shows that the highest wealth tertile does not raise the risk of having HIV; in fact, the high wealth group has a lower predicted prevalence than the middle wealth group. Other things the same, the high wealth group's predicted prevalence is only one-tenth of a percentage point higher than that of the low wealth (poor) tertile, a difference that is not significant. The last statistic in Table 4 is the potential outcome mean (POM): the expected prevalence rate for the poor wealth group is 5.7%. Using the IPW adjustment, the mean expected prevalence for the medium wealth group is therefore 6.2% (5.7 + 0.5), cet. par. The expected IPW-adjusted mean for the high wealth category is essentially the same as for the poor category.
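Treating the Table 4 IPW estimates as proportions, as the 5.7 + 0.5 calculation does, the adjusted mean prevalence for each tertile is the potential outcome mean plus that tertile's ATE. A small sketch of this arithmetic, using the Table 4 point estimates (the dictionary layout is illustrative):

```python
POM_LOW = 0.057                         # potential outcome mean, low-wealth tertile
ATE = {"middle": 0.005, "high": 0.001}  # average treatment effects vs. low wealth

adjusted_prevalence = {"low": POM_LOW}
for tertile, effect in ATE.items():
    adjusted_prevalence[tertile] = POM_LOW + effect

for tertile, p in adjusted_prevalence.items():
    print(f"{tertile}: {100 * p:.1f}%")
```

The high-wealth ATE is not significant in Table 4, so its adjusted 5.8% cannot be distinguished from the low-wealth 5.7%.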

This is not the pattern one would expect if more wealth were systematically associated with higher prevalence. The inconsistency of the wealth effects on prevalence is also evident in the unweighted model in the right-most column of Table 3. There, the medium wealth tertile has about 6% higher odds of being HIV positive than the poor tertile (coefficient 0.060), while the high wealth tertile has a negative coefficient (-0.083, about 8% lower odds). While neither the IPW nor the unadjusted model shows consistent wealth effects on prevalence, the estimated effect of being in the middle wealth category is far smaller in the IPW-adjusted model than in the unadjusted model (by as much as an order of magnitude).

The wealth interactions with the covariates in the IPW models in Table 3 reveal how wealth influences HIV prevalence in different segments of the population. More wealth is associated with higher prevalence for women (the female coefficient rises from 0.255 in the poorest tertile to 0.398 in the richest) and with less protection from being married/partnered. The age-group coefficients in Table 3 are always highest for the low wealth group, so the increments in prevalence for each age group (relative to the reference category, ages 15-17) are always largest for the low wealth group. In other words, the age gradient of prevalence is steeper for persons in low wealth households.

Regardless of the level of wealth, the models (both IPW and unweighted) consistently show that higher levels of wealth inequality in the region of residence increase the risk of an adult being HIV positive, other things remaining the same. The impacts of inequality on prevalence are generally larger for persons with low wealth than for persons with more household wealth. All inequality coefficients are somewhat smaller in the IPW-adjusted models than in the unweighted model. Other household well-being factors are broadly consistent across the models: literacy is associated with higher prevalence (significantly so in the low-wealth and unweighted models), while private sector use and insurance are associated with lower prevalence, other things constant (the middle-wealth estimates for both are not significant).

Table 2: HIV prevalence rates (%) in 29 countries by population category, using DHS sample weights
Tertile rank Age group (years) Gender Place of residence All adults
15-17 18-20 21-23 24-27 28-30 31-33 34-43 44-49 Male Female Urban Rural
Wealth tertiles
Low 0.80 1.40 2.30 3.10 4.40 5.50 6.30 5.40 3.20 4.50 4.60 3.80 3.90
Medium 1.10 1.90 2.90 3.90 4.90 6.50 7.40 6.60 3.70 5.50 5.60 4.30 4.60
High 1.20 2.00 3.40 5.10 6.60 8.30 9.30 8.40 4.30 6.70 6.10 4.30 5.60
Wealth Inequality Tertiles
Low 0.60 1.10 1.60 2.20 3.00 3.30 4.00 3.50 2.10 3.00 3.40 2.40 2.50
Medium 1.00 1.90 3.40 4.70 6.40 7.80 8.90 8.30 4.30 6.60 5.50 5.50 5.50
High 1.50 2.50 4.00 6.00 7.80 10.40 11.6 9.70 5.40 8.10 7.10 6.30 6.80
Average All Countries 1.0 1.8 3.0 4.2 5.5 7.00 7.80 6.90 3.80 5.70 5.90 4.00 4.80

HIV – human immunodeficiency virus

Table 3: Prevalence regression models across wealth levels and unweighted logistic regression
    Regression-adjusted IPW coefficients   Unweighted logistic regression (log odds ratio)
Covariates Poorest wealth (SE) Midlevel wealth (SE) Richest wealth (SE) Regression coefficient (SE)
Middle wealth NA NA NA 0.060*‡ (0.027)
High wealth NA NA NA -0.083**§ (0.025)
Age 18-20 1.095*** (0.147) 0.812*** (0.14) 0.626** (0.122) 0.728*** (0.133)
Age 21-23 1.717*** (0.144) 1.367*** (0.138) 1.311*** (0.116) 1.337*** (0.235)
Age 24-27 2.129 (0.135) 1.906*** (0.131) 1.784*** (0.112) 1.880*** (0.384)
Age 28-30 2.518 (0.136) 2.259*** (0.135) 2.290*** (0.116) 2.282*** (0.582)
Age 31-33 2.685 (0.136) 2.407*** (0.135) 2.314*** (0.116) 2.471*** (0.707)
Age 34-43 2.862*** (0.129) 2.603*** (0.125) 2.632*** (0.11) 2.694*** (0.825)
Age 44+ 2.705*** (0.131) 2.530*** (0.129) 2.655*** (0.113) 2.621*** (0.789)
Female 0.255*** (0.043) 0.364*** (0.046) 0.398*** (0.037) 0.357*** (0.029)
Literacy 0.143** (0.046) 0.048 (0.054) 0.05 (0.06) 0.140*** (0.029)
Married/Partnered -0.683*** (0.042) -0.621*** (0.047) -0.450*** (0.041) -0.573*** (0.011)
Rural -0.246** (0.069) -0.470*** (0.059) -0.377*** (0.036) -0.383*** (0.017)
Wealth inequality
T2 (vs. T1) middle inequality 0.316*** (0.053) 0.274*** (0.067) 0.316*** (0.061) 0.400*** (0.043)
T3 (vs. T1) highest inequality 0.387*** (0.076) 0.239*** (0.075) 0.275*** (0.064) 0.439*** (0.048)
Some private utilization -0.115** (0.042) -0.063 (0.045) -0.085** (0.038) -0.123*** (0.017)
Insured -0.445*** (0.092) 0.011 (0.096) -0.167** (0.058) -0.275*** (0.025)
Constant -5.593*** (0.174) -5.412*** (0.166) -5.809*** (0.153) -5.717*** (0)

IPW – inverse probability weighting; SE – standard error (shown in parentheses); * P < 0.01, ** P < 0.05, *** P < 0.001; † IPW models for the three wealth levels are two-step regression-adjusted estimators. Coefficients are log odds ratios, and the odds ratio is OR = exp(coefficient). ‡ OR = 1.060, indicating ~6% higher odds of being HIV seropositive (significant); § OR = 0.920, indicating ~8% lower odds of being HIV seropositive (significant).

Table 4: Comparison of wealth effects: IPW regression adjustment vs. unweighted logistic regression
Parameter   Inverse probability weighted (IPW) regression adjusted estimates Unweighted logistic regression coefficients
Effects of wealth Average effects of wealth (ATE)† (SE) Coefficient‡ (SE)
Middle-wealth category 0.005**§ (0.001) (counterfactual: if subjects were identical in all characteristics to those in the ‘low-wealth’ group) 0.060*‖ (0.027)
High-wealth category 0.001¶ (0.001) (counterfactual: if subjects were identical to those in the ‘low-wealth’ group) -0.083**– (0.025)
Potential outcome mean& 0.057***^ (0.001) (predicted value of prevalence for the low-wealth group)

ATE – average treatment effect for those in the wealth category; SE – standard error; *** P < 0.01, ** P < 0.05, * P < 0.1; † Average treatment effect: the increment in predicted prevalence (as a proportion, so 0.005 = 0.5 percentage points) for persons in this wealth group, relative to the prevalence of persons in the low-wealth group (the reference category); ‡ To calculate the odds ratio from a regression coefficient, use OR = exp(coefficient); § ~0.5 percentage point higher predicted prevalence of HIV seropositivity (significant); ‖ OR = 1.060, indicating ~6% higher odds of being HIV seropositive (significant); ¶ ~0.1 percentage point higher predicted prevalence (non-significant); – OR = 0.920, indicating ~8% lower odds of being HIV seropositive (significant); & Potential outcome mean: predicted prevalence of HIV seropositivity if everyone in the population belonged to the lowest wealth tertile; ^ 0.057 = 5.7% predicted prevalence (significant).


The importance of wealth in driving HIV prevalence does not follow a consistent pattern. After correcting for sample selection bias and controlling for other demographic and country factors, the regression models show that wealth effects on prevalence are inconsistent. Either the relationship between wealth and prevalence is insignificant, or the effect is larger at the middle level of wealth than at the high level.

Inequality of wealth in the region, however, is a positive and very consistent influence on HIV prevalence. In all three wealth categories (low, medium, and high household wealth tertiles), higher levels of wealth inequality in the local area are associated with a higher risk of an adult being HIV positive, other things constant.

Poverty (low wealth) is still a driving risk factor in the HIV epidemic. The age gradient is steeper for the poor: their increments in prevalence relative to ages 15-17 are larger in every age category. Marriage provides more risk protection when the household is poor, and rural residence provides less protection when poor. Furthermore, the risk associated with inequality seems most pronounced among the poor.

The importance of inequality as a substantial risk factor has implications for policy targeting. Policy might focus differentially on places with high levels of wealth inequality. This research suggests that supply-oriented policies (e.g., adding more providers and programs) could be effective if targeted to areas where wealth inequality is high and where pockets of low-wealth populations are found.

Rebalancing the DHS sample for the oversampling of wealthy persons altered the study results, which has implications for using observational DHS data to study disease patterns. The regression coefficients for wealth effects on prevalence were much smaller after reweighting the sample to reduce the oversample of wealthy persons with HIV. This confirms that some earlier findings of wealth’s importance in driving prevalence were a consequence of wealthy people living longer with HIV than their poorer counterparts, which makes wealth appear correlated with prevalence in observational data.


The authors would like to acknowledge Mike Ruffner at PEPFAR, whose support allowed us to undertake this research. We would also like to acknowledge Tymon Sloczynski, Assistant Professor in the Department of Economics and International Business School at Brandeis University, for his assistance with inverse probability weighting, and Clare L. Hurley at the Heller School for Social Policy and Management, Brandeis University, for editorial assistance. No human subjects’ data or identifiable information were used in this secondary analysis of the public use DHS survey data.


This paper was produced with funding from Centers for Disease Control and Prevention (CDC), Division of Global HIV/AIDS & TB (DGHT) under Cooperative Agreement Number U2GGH001531. Its contents are solely the responsibility of Cardno and Brandeis University and do not necessarily represent the official views of CDC.

Authorship contributions

All authors contributed equally to this manuscript.

Competing interests

The authors have completed the Unified Competing Interest form and declare no financial conflicts of interest.

Correspondence to:

Dr. Gary Gaumer
Institute for Global Health and Development
The Heller School for Social Policy and Management
Brandeis University, 415 South Street
Waltham, Massachusetts 02454-9110.