Population Assessment of Tobacco and Health (PATH) Study [United States] Master Linkage Files (ICPSR 38008)

Version Date: Apr 8, 2024

Principal Investigator(s):
United States Department of Health and Human Services. National Institutes of Health. National Institute on Drug Abuse; United States Department of Health and Human Services. Food and Drug Administration. Center for Tobacco Products

https://doi.org/10.3886/ICPSR38008.v13

Version V13

  • V13 [2024-04-08]
  • V12 [2023-12-15] unpublished
  • V11 [2023-09-18] unpublished
  • V9 [2023-03-31] unpublished
  • V8 [2022-12-16] unpublished
  • V7 [2022-10-07] unpublished
  • V6 [2022-05-11] unpublished
  • V5 [2022-04-21] unpublished
  • V4 [2021-12-16] unpublished
  • V3 [2021-09-29] unpublished
  • V2 [2021-06-03] unpublished
  • V1 [2021-04-27] unpublished

PATH Study MLF

The PATH Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). The study sampled over 150,000 mailing addresses across the United States to create a national sample of those who do and do not use tobacco.

45,971 adults and youth constitute the first (baseline) wave, Wave 1, of data collected by this longitudinal cohort study. These 45,971 adults and youth along with 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1) make up the 53,178 participants that constitute the Wave 1 Cohort. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, "shadow youth" are considered "aged-up youth" upon turning 12 years old, when they are asked to complete the Youth Interview after parental consent.

At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 4. This sample was recruited from residential addresses not selected for Wave 1 in the same sampled Primary Sampling Units (PSUs) and segments using similar within-household sampling procedures. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the civilian, noninstitutionalized population at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort.

At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 7. This sample was recruited from residential addresses not selected for Wave 1 or Wave 4 in the same sampled PSUs and segments using similar within-household sampling procedures. This second replenishment sample was combined for estimation and analysis purposes with Wave 7 adult and youth respondents from the Wave 4 Cohort who were at least age 15 and in the civilian, noninstitutionalized population at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort.

Please refer to the Restricted-Use Files User Guide, which provides further details about children designated as "shadow youth" and the formation of the Wave 1, Wave 4, and Wave 7 Cohorts.

Dataset 0001 (DS0001) contains the data from the Public-Use File Master Linkage File (PUF-MLF). This file contains 77 variables and 67,276 cases. The file provides a master list of every person's unique identification number and what type of respondent they were in each wave for data that are available in the Public-Use Files and Special Collection Public-Use Files.

Dataset 0002 (DS0002) contains the data from the Restricted-Use File Master Linkage File (RUF-MLF). This file contains 174 variables and 82,139 cases. The file provides a master list of every person's unique identification number and what type of respondent they were in each wave for data that are available in the Restricted-Use Files, Special Collection Restricted-Use Files, and Biomarker Restricted-Use Files.

United States Department of Health and Human Services. National Institutes of Health. National Institute on Drug Abuse, and United States Department of Health and Human Services. Food and Drug Administration. Center for Tobacco Products. Population Assessment of Tobacco and Health (PATH) Study [United States] Master Linkage Files. Inter-university Consortium for Political and Social Research [distributor], 2024-04-08. https://doi.org/10.3886/ICPSR38008.v13

United States Department of Health and Human Services. National Institutes of Health. National Institute on Drug Abuse, United States Department of Health and Human Services. Food and Drug Administration. Center for Tobacco Products

None

Users are reminded that these data are to be used solely for statistical analysis and reporting of aggregated information, and not for the investigation of specific individuals or organizations.

Access to the RUF-MLF data is restricted. Users interested in obtaining these data must complete a Restricted Data Use Agreement. Data are provided via ICPSR's Virtual Data Enclave (VDE). Apply for access to these data through the ICPSR VDE portal. Information and instructions are available within the data portal. For further assistance please reference the VDE Guide to learn about the application process, about using the VDE, and how to request disclosure review of VDE output.

Inter-university Consortium for Political and Social Research

2013 -- 2014 (Wave 1), 2014 -- 2015 (Wave 2), 2015 -- 2016 (Wave 3), 2016 -- 2018 (Wave 4), 2017 -- 2018 (Wave 4.5), 2018 -- 2019 (Wave 5), 2020 (Wave 5.5 and PATH Adult Telephone Survey (PATH-ATS)), 2021 (Wave 6), 2022 -- 2023 (Wave 7)
2013-09 -- 2014-12 (Wave 1), 2014-10 -- 2015-10 (Wave 2), 2015-10 -- 2016-10 (Wave 3), 2016-12 -- 2018-01 (Wave 4), 2017-12 -- 2018-12 (Wave 4.5), 2018-12 -- 2019-11 (Wave 5), 2020-07 -- 2020-12 (Wave 5.5), 2020-09 -- 2020-12 (PATH-ATS), 2021-03 -- 2021-11 (Wave 6), 2022-01 -- 2023-04 (Wave 7)
  1. The PUF-MLF is available for access by the general public. For the RUF-MLF, data are provided via ICPSR's Virtual Data Enclave (VDE) where researchers will work with data stored on secure ICPSR servers. Researchers will not possess actual physical copies of the data; however, they may request permission to access selected output outside the virtual environment after review by ICPSR. See the Access Notes to apply for access. Researchers are also encouraged to read the VDE Guide.

  2. The data files contain person-level identifiers (PERSONID) that link participants across waves of data collection. The PERSONID values are random and contain no direct or indirect personally identifiable information. Chapter 7 in the Public-Use Files User Guide contains information about linking data available for public use. Appendix G in the Restricted-Use Files User Guide also contains information and programming code on linking files together. The files are sorted by the variable PERSONID.

  3. The PUF-MLF includes indicator variables that identify cohort membership and those that identify the availability of interview data and weights for each participant. It also includes variables that identify biomarker core membership and those that indicate availability of biospecimens through the Biospecimen Access Program (BAP). The PUF-MLF can help analysts identify which Public-Use files contain data for a particular participant (or set of participants).

  4. The RUF-MLF includes indicator variables that identify cohort membership and those that identify the availability of interview data, weights, state identifier data, tobacco Universal Product Code (UPC) data, and biomarker data for each participant. It also includes variables that identify biomarker core membership and those that indicate availability of biospecimens through the BAP. The RUF-MLF can help analysts identify which Restricted-Use files contain data for a particular participant (or set of participants).

  5. The RUF-MLF will be extended as new data are released in the PATH Study RUF, Special Collection RUF, and Biomarker RUF collections. The PUF-MLF will be extended as new data are released in the PATH Study PUF and Special Collection PUF collections.

  6. The PATH Study's documentation is available for your use and may be reproduced in whole or in part without permission from NIH's National Institute on Drug Abuse or FDA's Center for Tobacco Products. Citation of the source is appreciated.

  7. Additional background information including answers to frequently asked questions for study participants and researchers can be found in the Researchers section of the PATH Study Series page.

  8. There are a variety of user guides available that describe the PATH Study as well as the use of specific types of data. Researchers can access the user guides on the PATH Study Series page or through the various collections: Restricted-Use Files, Public-Use Files, Special Collection Restricted-Use Files, Special Collection Public-Use Files, or Biomarker Restricted-Use Files.

  9. 2021-04-27 Latest versions of RUF-MLF and PUF-MLF were added to the collection, consolidating the various MLFs that were in each collection: Restricted-Use Files, Public-Use Files, Special Collection Restricted-Use Files, Special Collection Public-Use Files, and Biomarker Restricted-Use Files.

  10. The data for the PATH Study were collected and prepared by Westat under contract numbers HHSN271201100027C and HHSN271201600001C.
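Note 2 above describes linking files on PERSONID, and notes 3 and 4 describe indicator variables that flag which files hold data for each participant. The following is a minimal sketch of that idea in Python using toy in-memory data, not the code from the user guides. Only PERSONID is named in this documentation; the column names HAS_W1_ADULT and ANSWER, and the identifier values, are hypothetical.

```python
import csv
import io

# Toy stand-ins for an MLF extract and one data file. Only PERSONID comes from
# the documentation; HAS_W1_ADULT and ANSWER are hypothetical column names.
mlf_csv = """PERSONID,HAS_W1_ADULT
P0001,1
P0002,0
P0003,1
"""
wave1_csv = """PERSONID,ANSWER
P0001,yes
P0003,no
"""


def read_by_personid(text):
    """Index the rows of a CSV string by PERSONID."""
    return {row["PERSONID"]: row for row in csv.DictReader(io.StringIO(text))}


mlf = read_by_personid(mlf_csv)
wave1 = read_by_personid(wave1_csv)

# Left join: keep every person listed in the MLF and attach data where it exists.
linked = [
    {**mlf_row, "ANSWER": wave1.get(pid, {}).get("ANSWER")}
    for pid, mlf_row in sorted(mlf.items())
]

# Anyone flagged as having data but absent from the data file signals a mismatch.
flagged_but_missing = [
    pid
    for pid, row in mlf.items()
    if row["HAS_W1_ADULT"] == "1" and pid not in wave1
]

for row in linked:
    print(row)
print("flagged but missing:", flagged_but_missing)
```

Starting from the MLF rather than from a single data file keeps participants who have no record in that file, so availability can be checked before any analysis.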


The Population Assessment of Tobacco and Health (PATH) Study is a nationally representative longitudinal cohort study on tobacco use behavior, attitudes and beliefs, and tobacco-related health outcomes among adults and youth in the United States. The study's primary objectives are to:

  • Objective 1: Identify and explain between-person differences and within-person changes in tobacco-use patterns, including the rate and length of use by specific product type and brand, product/brand switching over time, uptake of new products, and dual- and poly-use of tobacco products (i.e., use of multiple products within the same time period and switching between multiple products).
  • Objective 2: Identify between-person differences and within-person changes in risk perceptions regarding harmful and potentially harmful constituents, new and emerging tobacco products, filters and other design features of tobacco products, packaging, and labeling; and identify other factors that may affect use, such as social influences and individual preferences.
  • Objective 3: Characterize the natural history of tobacco dependence, cessation, and relapse, including readiness and self-efficacy to quit, motivations for quitting, the number and length of quit attempts, and the length of abstinence related to various tobacco products.
  • Objective 4: Update the comprehensive baseline and subsequent waves of data on tobacco-use behaviors and related health conditions, including markers of exposure and tobacco-related disease processes identified from the collection and analysis of biospecimens, to assess between-person differences and within-person changes over time in health conditions potentially related to tobacco use, particularly with use of new and different tobacco products, including modified-risk tobacco products.
  • Objective 5: Assess associations between TCA-specific actions and tobacco-product use, risk perceptions and attitudes, use patterns, cessation outcomes, and tobacco-related intermediate endpoints (e.g., biomarkers of exposure and biomarkers related to disease). Analyses will attempt to account for other potential factors, such as demographics, local tobacco-control policies, and social, familial, and economic factors, that may influence the observed patterns.
  • Objective 6: Assess between-person differences and within-person changes over time in attitudes, behaviors, exposure to tobacco products, and related biomarkers among and within population sub-groups identified by such characteristics as race-ethnicity, gender, and/or age, or by risk factors, such as pregnancy or co-occurring substance use or mental health disorders.
  • Objective 7: To the extent to which sample sizes are sufficient, assess and compare samples of those who report former and never use of tobacco products for between-person differences and within-person changes in relapse and uptake, risk perceptions, and indicators of tobacco exposure and disease processes.
  • Objective 8: Use data from the PATH Study's baseline and follow-up waves on tobacco-use behaviors, attitudes, and related health conditions, including potential markers of exposure and related disease processes identified from the analysis of biospecimens, to screen and subsample respondents for participation in formative and/or nested studies conducted during and after the PATH Study's waves of data and biospecimen collection.

At Wave 1, the study sampled over 150,000 mailing addresses, which, using a four-stage stratified sampling design, yielded a sample of 45,971 respondents (32,320 adults and 13,651 youth) who completed a Wave 1 interview. People at least 9 years old living in a civilian, noninstitutionalized setting, whether or not they used tobacco, were considered for participation during Wave 1. Youth who turn 18 by the next wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1) are considered "aged-up youth" upon turning 12 years old, when they are asked to join the study. The 45,971 respondents and 7,207 shadow youth together form the 53,178-participant Wave 1 Cohort.
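As a quick consistency check, the Wave 1 counts quoted above can be added up. This sketch uses only the figures given in the paragraph; the variable names are illustrative.

```python
# Counts quoted in the Wave 1 description.
adults = 32_320
youth = 13_651
shadow_youth = 7_207

wave1_respondents = adults + youth                # completed a Wave 1 interview
wave1_cohort = wave1_respondents + shadow_youth   # respondents plus shadow youth

print(wave1_respondents)  # 45971
print(wave1_cohort)       # 53178
```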

At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 4. This sample was recruited from close to 174,000 mailing addresses not selected for Wave 1, in the same sampled PSUs and segments using similar within-household sampling procedures. To meet the needs for the Wave 4 Cohort shadow sample, a randomly selected subset of the sampled addresses (115,500 or close to two-thirds of the addresses) were screened solely to identify shadow youth ages 10 to 11. The remaining addresses (close to 58,500) were screened for adults, youth, and shadow youth ages 10 to 11. These are referred to as the "SO" (shadow youth only) and "AYS" (adults, youth, and shadow youth) replenishment samples, respectively. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the civilian, noninstitutionalized population at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort.

At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 7. This sample was recruited from close to 244,000 mailing addresses not selected for Wave 1 or Wave 4, in the same sampled PSUs and segments using similar within-household sampling procedures. To meet the needs for the Wave 7 youth sample and the need for a Wave 7 Cohort shadow sample, the address sample was randomly divided into three subsamples. A subset of about 111,500 addresses (or close to 45 percent) were screened solely to identify youth ages 9 to 14; another subset of about 97,000 addresses (or close to 40 percent) were screened to identify youth ages 9 to 17. The remaining addresses (close to 36,000) were screened for adults, youth, and shadow youth ages 9 to 11. These subsamples are referred to as the "YYO" (young youth only ages 9 to 14), "YO" (youth only ages 9 to 17) and "AYS" (adults ages 18 and above, youth ages 12 to 17, and shadow youth ages 9 to 11) replenishment samples, respectively. This replenishment sample was combined for estimation and analysis purposes with Wave 7 adult and youth respondents from the Wave 4 Cohort who were at least age 15 and in the civilian, noninstitutionalized population at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort.
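The address allocations described for the Wave 4 and Wave 7 replenishment samples can be tabulated in a few lines. This is a sketch that uses the rounded address counts quoted in the text ("close to" and "about" figures), so the totals and percentages are approximate; the function name `shares` is illustrative.

```python
# Rounded address counts quoted for each replenishment subsample.
wave4 = {"SO": 115_500, "AYS": 58_500}
wave7 = {"YYO": 111_500, "YO": 97_000, "AYS": 36_000}


def shares(subsamples):
    """Return the total address count and each subsample's percentage share."""
    total = sum(subsamples.values())
    return total, {name: round(100 * n / total, 1) for name, n in subsamples.items()}


wave4_total, wave4_shares = shares(wave4)
wave7_total, wave7_shares = shares(wave7)

print(wave4_total, wave4_shares)
print(wave7_total, wave7_shares)
```

The Wave 4 counts sum to 174,000, with the SO subsample at roughly two-thirds. The rounded Wave 7 counts sum to 244,500, slightly above the quoted "close to 244,000", which is consistent with each figure having been rounded independently.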

A four-stage stratified area probability sample design was used in the PATH Study, with a two-phase design for sampling adults at the final stage. At the first stage, a stratified sample of geographical primary sampling units (PSUs) was selected, in which a PSU is a county or group of counties. For the second stage, within each selected PSU, smaller geographical segments were formed and then a sample of these segments was drawn. At the third stage, the sampling frame consisted of the residential addresses located in these segments. The fourth stage selected adults and youth from the sampled households identified at these addresses, with varying sampling rates for adults by age, race, and tobacco use status. Adults were sampled in two phases - Phase 1 sampling used information provided in the household screener and Phase 2 sampling used information provided by the adult in the Phase 2 screener at the beginning of the Adult Instrument. Please consult the Public-Use Files User Guide or Restricted-Use Files User Guide for additional details about the sampling.
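To make the nesting of the four stages concrete, here is a toy simulation of a multi-stage cluster sample. It is a deliberately simplified sketch, not the PATH Study's procedure: the frame is invented, every stage uses equal-probability selection, and it omits stratification, the differential sampling rates by age, race, and tobacco use status, and the two-phase adult sampling. All names and sizes are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical frame: 20 PSUs x 5 segments x 10 addresses x 3 people each.
frame = {
    f"psu{p}": {
        f"seg{s}": {
            f"addr{a}": [f"psu{p}-seg{s}-addr{a}-person{i}" for i in range(3)]
            for a in range(10)
        }
        for s in range(5)
    }
    for p in range(20)
}


def draw_sample(frame, n_psus=4, n_segments=2, n_addresses=3):
    """Select PSUs, then segments, then addresses, then one person per address."""
    sample = []
    for psu in rng.sample(sorted(frame), n_psus):                    # stage 1
        segments = frame[psu]
        for seg in rng.sample(sorted(segments), n_segments):         # stage 2
            addresses = segments[seg]
            for addr in rng.sample(sorted(addresses), n_addresses):  # stage 3
                sample.append(rng.choice(addresses[addr]))           # stage 4
    return sample


people = draw_sample(frame)
print(len(people))
```

Each stage samples only within the units chosen at the previous stage, which is what concentrates fieldwork in a limited set of PSUs and segments.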

Longitudinal: Panel

People who use or do not use tobacco products in the civilian, non-institutionalized household population of the United States aged 9 and older at the time of Wave 1 (Wave 1 Cohort); People who use or do not use tobacco products in the civilian, non-institutionalized household population of the United States aged 10 and older at the time of Wave 4 (Wave 4 Cohort); People who use or do not use tobacco products in the civilian, non-institutionalized household population of the United States aged 9 or older at the time of Wave 7 (Wave 7 Cohort)

individual

In the PUF-MLF, indicator variables that identify the availability of interview data, weights, and biospecimens (through the BAP) for each participant (or set of participants) with Public-Use data.

In the RUF-MLF, indicator variables that identify the availability of interview data, weights, biomarker data, and biospecimens (through the BAP) for each participant (or set of participants) with Restricted-Use data.


2021-04-27

2024-04-08 Update to Restricted-Use Master Linkage Files (DS0002) to include variables for new files in the RUF collection (ICPSR 36231): Wave 7 Questionnaire data and weights (DS7001, DS7002, DS7111, DS7112, DS7121, DS7122, DS7211, DS7212, DS7221, DS7222, DS7331, DS7332, DS7711, DS7712, DS7721, and DS7722), Wave 7 State Identifier data (DS7401 and DS7402), Wave 7 Tobacco UPC data (DS7601), State Design data (DS2), and Wave 6 Ever/Never Reference data (DS6503). Updated BAP variables in the Restricted-Use Master Linkage Files (DS0002) to reflect current availability of biospecimens. Updated BAP variables in the Public-Use Master Linkage Files (DS0001) to reflect current availability of biospecimens.

2023-12-15 Update to Restricted-Use Master Linkage Files (DS0002) to include variables for new files in the BRUF collection (ICPSR 36840): Wave 4 biomarker data and weights for the Wave 1 biomarker core (DS4023, DS4024, and DS4038). Previously released variables were updated to reflect the addition of Wave 4 biomarker data (DS4054, DS4055, and DS4057) and Wave 5 biomarker data and weights (DS5042, DS5051, DS5053, DS5055, DS5056, DS5057, and DS5058) for the Wave 4 biomarker core.

2023-09-18 Update to Restricted-Use Master Linkage Files (DS0002) to include new variables that indicate the availability of different types of specimens through the BAP: red blood cells, buffy coat, and PAXgene tubes for Wave 1, Wave 2, Wave 3, Wave 4, and Wave 5. A variable was also added to indicate availability of Wave 1 buccal cells through the BAP. Previously released indicators for the BAP were also updated to reflect current availability of biospecimens. Update to the Public-Use Master Linkage Files (DS0001) to include variables for new files in the PUF collection (ICPSR 36498): Wave 6 Questionnaire data and weights (DS6001, DS6002, DS6111, DS6112, DS6121, DS6122, DS6211, DS6212, DS6221, DS6222, DS6711, DS6712, DS6721, and DS6722). Added new variables that indicate the availability of different types of specimens through the BAP: red blood cells, buffy coat, and PAXgene tubes for Wave 1, Wave 2, Wave 3, Wave 4, and Wave 5. A variable was also added to indicate availability of Wave 1 buccal cells through the BAP. Previously released indicators for the BAP were also updated to reflect current availability of biospecimens.

2023-04-27 Update to Restricted-Use Master Linkage Files (DS0002) to include variables for new files in the RUF collection (ICPSR 36231): Wave 1 Survey Research Derived Variables (SRDV-RUF) for Adult and Youth / Parent Participants (DS1901 and DS1902), Wave 2 Survey Research Derived Variables (SRDV-RUF) for Adult and Youth / Parent Participants (DS2901 and DS2902), and Wave 3 Survey Research Derived Variables (SRDV-RUF) for Adult and Youth / Parent Participants (DS3901 and DS3902).

2023-03-31 Update to Restricted-Use Master Linkage Files (DS0002) to include variables for new files in the RUF collection (ICPSR 36231): Wave 6 Questionnaire data and weights (DS6001, DS6002, DS6111, DS6112, DS6121, DS6122, DS6211, DS6212, DS6221, DS6222, DS6711, DS6712, DS6721, and DS6722), Wave 6 State Identifier data (DS6401 and DS6402), and Wave 6 Tobacco UPC data (DS6601). The BAP variables for Waves 1 to 5 were updated to reflect current availability of biospecimens. Updated BAP variables in the Public-Use Master Linkage Files (DS0001) to reflect current availability of biospecimens.

2022-12-16 Update to Restricted-Use Master Linkage Files (DS0002) to include variables for new files in the BRUF collection (ICPSR 36840): Wave 4 biomarker data and weights (DS4011, DS4034, DS4043, DS4051, DS4053, and DS4056) and Wave 5 biomarker data and weights (DS5023, DS5024, DS5035, and DS5038).

2022-10-07 Update to Public-Use Master Linkage Files (DS0001) to include variables for new files in the SCPUF collection (ICPSR 37786): Wave 5.5 Questionnaire data and weights (DS2001, DS2002, DS2111, DS2112, DS2121, DS2122, DS2221, and DS2222) and PATH-ATS data and weights (DS3001, DS3111, and DS3121). Also included is one new variable to reflect the addition of a new file in the PUF collection (ICPSR 36498): Wave 5 Ever/Never Reference (DS5503). The BAP variables for Waves 1 to 5 were updated to reflect current availability of biospecimens, including urine collected from youth in Waves 4 and 5. Updated BAP variables in the Restricted-Use Master Linkage Files (DS0002) to reflect current availability of biospecimens, including urine collected from youth in Waves 4 and 5.

2022-05-11 Update to Restricted-Use Master Linkage Files to include variables for new files in the BRUF collection (ICPSR 36840) including single-wave weights for the Wave 4 Biomarker Core.

2022-04-21 Update to Restricted-Use Master Linkage Files (DS0002) to include variables for new files in the SCRUF collection (ICPSR 37519): Wave 5.5 Questionnaire data and weights (DS2001, DS2002, DS2111, DS2112, DS2121, DS2122, DS2221, and DS2222), Wave 5.5 State Identifier data (DS2401 and DS2402), PATH-ATS data and weights (DS3001, DS3111, and DS3121), and PATH-ATS State Identifier data (DS3401). Also included is one new variable to reflect addition of a new file in the RUF collection (ICPSR 36231): Wave 5 Ever/Never Reference (DS5503). The BAP variables for Waves 1 to 4 were updated to reflect current availability of biospecimens. Updated BAP variables in the Public-Use Master Linkage Files (DS0001) to reflect current availability of biospecimens.

2021-12-16 Update to Restricted-Use Master Linkage Files to include variables for new files in the BRUF collection (ICPSR 36840): additional Wave 3 Urine Panel Assays and accompanying weights (DS3038, DS3023, and DS3024) and Wave 5 Urine Collection (DS5001), Urine Weights (DS5021 and DS5022), and Urine Panel Assays (DS5032, DS5033, DS5036, and DS5037).

2021-09-29 Update to Public-Use Master Linkage Files to include variables for Wave 5 (ICPSR 36498).

2021-06-03 Update to Restricted-Use Master Linkage Files to include new variables related to Biomarker Restricted-Use Files (ICPSR 36840): additional Wave 4 Urine Panel Assays (DS4035 and DS4037).

2021-04-27 ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:

  • Checked for undocumented or out-of-range codes.

There are no weights associated with the Master Linkage Files.


Notes

  • The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.

  • One or more files in this data collection have special restrictions. Restricted data files are not available for direct download from the website; click on the Restricted Data button to learn more.