Study Design:
Quality Rating:
Research Purpose:
  1. To evaluate the consistency of measurements taken over a 3-day period in healthy adult volunteers using a mouthpiece/noseclips, face mask, or ventilated canopy.
  2. To systematically evaluate the reactivity of the testing procedure and whether this reactivity varies as a function of the collection system used.
  3. To determine the degree of reliability of a single measurement of varying duration using a mouthpiece, face mask, and canopy data collection system.

To do this, 30 subjects were randomly assigned to either face mask, mouthpiece/nose clip, or ventilated canopy collection systems for 45 min a day over 3 days.

Inclusion Criteria:
  1. Understand and give written consent
  2. Healthy (screened for respiratory disorders, use of medications known to affect metabolic rate, and any medical condition thought to be associated with metabolic rate, such as diabetes or eating disorders)
  3. Adult volunteers
  4. Females
  5. Nonsmoking
  6. Weight falling between the 25th and 75th percentile norms
Exclusion Criteria:
  1. Refusal to consent
  2. Not meeting inclusion criteria
  3. Diseases in subjects that were excluded: diabetes, eating disorders
  4. Medications excluded: those known to affect metabolic rates
  5. Smokers
  6. Overweight or underweight
Description of Study Protocol:


Steady state: not discussed


  • Ht measured? not discussed
  • Wt measured? not discussed
  • Fat-free mass measured? not discussed


  • Monitored heart rate? not discussed
  • Body temperature? not discussed
  • Medications administered? Not discussed

Resting energy expenditure: 

  • IC type: subjects randomly assigned to one of 3 collection methods: 1) mouthpiece/noseclips, 2) face mask, 3) ventilated canopy
  • Equipment calibration: yes, prior to each test; in addition, leakage was carefully controlled in all 3 collection systems, and subjects were monitored throughout testing to ensure leakage had not occurred
  • Coefficient of variation using std gases: Yes
  • Rest before measure: 5 min prior to testing
  • Measurement length: 50 consecutive minutes a day for 3 consecutive days, at the same time of day and with the same assigned collection system
  • Steady state: not discussed
  • Fasting length: abstain from food and caffeinated beverages for 5 hr prior to testing
  • Exercise restrictions: instructed to avoid any strenuous, aerobic-type physical activity for at least 5 hr prior to testing
  • Room temp: thermally neutral
  • No. of measures within the measurement period: intraclass correlation coefficients (ICCs) were computed for each collection method on 10-, 20- and 40-min blocks following the 5-min adaptation period (not eliminated).
  • Were some measures eliminated? no
  • Were a set of measurements averaged? Yes; REE across the 3 days was averaged to evaluate whether REE measurements in the first 5 min differed from those in the subsequent 40-min period
  • Training of measurer? not mentioned
  • Subject training of measuring process? Yes; while resting before testing (5 min)

DIETARY:  not assessed

Statistical tests: 

A 3 (system) x 3 (time) x 2 (first 5 min vs. 45 subsequent min) repeated measures ANOVA was used. Reliabilities were determined by calculating intraclass correlation coefficients (ICCs). Each ICC was calculated by using a one-way ANOVA to estimate both the within-subject and between-subject variances, and was then computed as the ratio of these variance components. An ICC of 1.0 represents perfect reliability, and an ICC of zero or less indicates no reliability. An ICC of 0.7 is considered an acceptable level of reliability.

ICCs were computed for each collection method on 10-, 20- and 40-min blocks following the 5-min adaptation period.
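
The ICC calculation described above can be sketched as follows. The worksheet does not say which ICC form the authors used, so this sketch assumes the one-way random-effects form, ICC(1,1), built from the between-subject and within-subject mean squares of a one-way ANOVA. The function name `icc_oneway` and the sample values are hypothetical and are not data from the study.

```python
from statistics import mean


def icc_oneway(scores):
    """One-way random-effects ICC(1,1) (assumed form, not confirmed by the source).

    scores: list of per-subject lists, each holding k repeated measurements
            (for example, REE in kcal/d on each of 3 days).
    """
    n = len(scores)          # number of subjects
    k = len(scores[0])       # measurements per subject
    grand_mean = mean(x for row in scores for x in row)
    subject_means = [mean(row) for row in scores]

    # Sums of squares from a one-way ANOVA with subject as the factor
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    )

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    # Ratio of between-subject variance to total variance
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical REE values (kcal/d) for 3 subjects over 3 days
example = [[1250, 1260, 1245], [1400, 1390, 1410], [1320, 1335, 1310]]
print(round(icc_oneway(example), 3))
```

When every subject repeats the same value on each day, the within-subject mean square is zero and the function returns 1.0; when subjects do not differ from one another at all, the result is zero or negative, which matches the interpretation given above.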

Data Collection Summary:

Outcome(s) and other measures

  1. IC-measured oxygen consumption (VO2, ml/min), carbon dioxide production (VCO2, ml/min), respiratory quotient (RQ, unitless ratio of VCO2 to VO2), and resting energy expenditure (REE, kcal/d).
  2. Independent variables of gender, height, weight, and age were entered during calibration to permit accurate data calculations (but were not reported in the paper)
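
As an illustration of how these outcome measures relate to one another, the sketch below derives RQ and REE from VO2 and VCO2. The worksheet does not state which equation the metabolic cart used; the abbreviated Weir equation is assumed here because it is a common choice in indirect calorimetry. The function names and input values are hypothetical.

```python
def respiratory_quotient(vo2_ml_min, vco2_ml_min):
    """RQ is the unitless ratio of CO2 produced to O2 consumed."""
    return vco2_ml_min / vo2_ml_min


def ree_weir_kcal_per_day(vo2_ml_min, vco2_ml_min):
    """Abbreviated Weir equation (assumed; not confirmed by the source).

    kcal/min = 3.941 * VO2 (L/min) + 1.106 * VCO2 (L/min);
    the factor 1.44 converts ml/min inputs to kcal/d (1440 min/d / 1000 ml/L).
    """
    return (3.941 * vo2_ml_min + 1.106 * vco2_ml_min) * 1.44


# Hypothetical resting values for one subject
vo2, vco2 = 250.0, 200.0
print(respiratory_quotient(vo2, vco2))
print(round(ree_weir_kcal_per_day(vo2, vco2), 1))
```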

Blinding used: No
Description of Actual Data Sample:
  • N= 30 healthy volunteer adults
  • N = 30 females

No data provided on age, height, weight, ethnicity

Subjects randomly assigned (based on baseline predicted resting energy expenditure) to one of 3 metabolic collection procedures:

  1. Mouthpiece/noseclips (n = 10)
  2. Facemask (n= 10)
  3. Ventilated canopy (n = 10)
Summary of Results:


  • Not provided, but the inclusion requirement was a weight within the 25th to 75th percentile norms


  • Number of measurements - 5 min acclimation; 10-, 20- and 40-min measurements for 3 consecutive days
  • Measurement Length - 50 consecutive minutes a day for 3 consecutive days, at the same time of day and with the same assigned collection system
  • Length of measurement period - 10-, 20-, and 40-min periods following a 5 minute adaptation period
  • Steady state—not mentioned
  • RQ—measured by IC


  • Sleep or rest - 5 min prior to testing
  • Physical activity - instructed to avoid any strenuous, aerobic-type physical activity for at least 5 hr prior to testing
  • Food intake - abstain from food and caffeinated beverages for 5 hr prior to testing
  • Various times in the day - testing at the same time of day on 3 consecutive days


  • Circulatory hormones - not mentioned
  • Breathing ability - spontaneously breathing
  • Medical tests/procedures - IC
  • Chemicals (medications/drugs/herbs, caffeine, nicotine, alcohol)—abstain from caffeinated beverages at least 5 hrs before testing; Medications excluded: those known to affect metabolic rates
  • Results indicated no significant effects for collection procedure, day or interactions between method and day.
  • There were no significant differences between methods on the first 5 min or between methods on the last 40 min.
  • However, all 3 methods had significantly elevated REE measurements in the first 5 min of testing compared with the subsequent 40 min. (see below)

Collection method: mean REE* (kcal/d) ± SD

Mouthpiece/noseclips

  • 1st 5 min = 1521.71 ± 270.86
  • 40 min = 1250.86 ± 135.68
  • p<0.001
  • reduction = 270.86 kcal/d

Face mask

  • 1st 5 min = 1509.67 ± 171.85
  • 40 min = 1337.82 ± 182.85
  • p<0.05
  • reduction = 171.85 kcal/d

Canopy

  • 1st 5 min = 1674.97 ± 379.01
  • 40 min = 1295.96 ± 162.97
  • p<0.000
  • reduction = 379.01 kcal/d

*Mean REE = average REE over the 3-day measurement period

The face mask had the least drop in resting energy expenditure.
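
The comparison of reductions can be checked with a few lines of arithmetic. This sketch uses the mean REE values reported above; the first and third groups are assumed to be the mouthpiece/noseclips and canopy, following the order in which the methods are listed elsewhere in this worksheet. Differences recomputed from the rounded means agree with the reported reductions to within 0.01 kcal/d.

```python
# Mean REE (kcal/d): (first 5 min, subsequent 40 min), as reported above.
# Group labels for the first and third entries are an assumption.
mean_ree = {
    "mouthpiece/noseclips": (1521.71, 1250.86),
    "face mask": (1509.67, 1337.82),
    "canopy": (1674.97, 1295.96),
}

# Absolute drop from the first 5 min to the subsequent 40 min
reductions = {
    method: round(first - later, 2)
    for method, (first, later) in mean_ree.items()
}

# Drop expressed as a percentage of the first-5-min value
percent_drop = {
    method: round(100 * (first - later) / first, 1)
    for method, (first, later) in mean_ree.items()
}

smallest = min(reductions, key=reductions.get)
print(reductions)
print(percent_drop)
print(smallest)
```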

ICCs For Collection Method

Mouthpiece/noseclips

  • 10-min   = 0.75
  • 20-min   = 0.62
  • 40-min   = 0.40

Face mask

  • 10-min   = 0.43
  • 20-min   = 0.60
  • 40-min   = 0.79

Canopy

  • 10-min   = 0.44
  • 20-min   = 0.75
  • 40-min   = 0.79

As indicated above, the mouthpiece shows acceptable reliability (>0.7) for the 10-min period, but the reliability progressively decreased as the measurement duration increased.

The face mask showed poor reliability for the 10-min period, an increase in reliability with a 20-min measurement, and strong reliability with a 40-min measurement length.

For the canopy, the 10-min period showed poor reliability, but both the 20- and 40-min measurement periods were strongly reliable.
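
The reading of the ICC values above can be reproduced by comparing each coefficient with the 0.7 threshold named in the statistical methods. This is a minimal sketch; the assignment of the first and third ICC groups to the mouthpiece/noseclips and canopy follows the interpretation in the preceding paragraphs, and the helper name `acceptable_durations` is hypothetical.

```python
ACCEPTABLE_ICC = 0.7  # threshold stated in the statistical methods

# ICC by measurement duration (minutes) for each collection method
iccs = {
    "mouthpiece/noseclips": {10: 0.75, 20: 0.62, 40: 0.40},
    "face mask": {10: 0.43, 20: 0.60, 40: 0.79},
    "canopy": {10: 0.44, 20: 0.75, 40: 0.79},
}


def acceptable_durations(by_duration, threshold=ACCEPTABLE_ICC):
    """Return the durations (min) whose ICC meets the threshold."""
    return [d for d, icc in sorted(by_duration.items()) if icc >= threshold]


for method, by_duration in iccs.items():
    print(method, acceptable_durations(by_duration))
```

Under this threshold, only the 10-min block is acceptable for the mouthpiece/noseclips, only the 40-min block for the face mask, and both the 20- and 40-min blocks for the canopy.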
Author Conclusion:
  • The results of the current study indicate that there are no significant REE differences between data collected with mouthpiece/noseclips, a face mask, or a canopy collection system. Additionally, there were no differences between systems over time.
  • However, across all 3 days of measurement, the first 5 min of data collection on all 3 systems were more reactive (i.e., significantly higher) than the subsequent 40 min of assessment. Reliability analyses indicated that following a 5-min adjustment period, acceptable reliability coefficients were obtained after 20 min of continuous data collection in the ventilated canopy and after 40 min with the face mask. Reliability coefficients with the mouthpiece and noseclips steadily decreased as the length of the testing session increased.
  • The finding that the first 5 min of data collection were significantly higher than the subsequent 40 min despite the fact that the subjects rested prior to assessment and the procedure was explained to them prior to initiation, confirms earlier observations which indicated a need to acclimate to the testing procedure once the collection system was attached and also indicated that values obtained during the first 5 min were likely to be unstable and unrepresentative regardless of the method used. Thus, it is recommended that a 5-min acclimation period following attachment of the collection system be conducted in all clinical and research studies.
  • Results also indicated that reliability of both the face mask and the canopy increased as test length increased. In contrast, the data collected with the mouthpiece/noseclips method indicated that as test duration increased, reliability decreased. This is likely due to increasing discomfort on the part of the subject with the mouthpiece and noseclips as test length increased (acceptable reliability obtained after 10-min). Given that most clinical and research metabolic evaluations require relatively longer test durations, it is recommended that either a face mask or canopy be used for testing sessions lasting longer than 10 min.
  • Whereas the canopy system achieved acceptable levels of reproducibility following a 20-min test, it appears that both the face mask and canopy are reliable collection methods during longer tests (40 min).
  • On balance, the features of reliability, comfort, and nonobtrusiveness may make the canopy system a more advantageous method for measuring REE in research and clinical applications.
  • In summary, it appears that both the face mask and canopy data collection systems yield reproducible metabolic data, provided that acclimation to the procedure be conducted.
Funding Source:
Government: NHLBI, State of Tennessee
Reviewer Comments:


  • Good description of 3 metabolic collection systems; the calibration and evaluation of leakage throughout the testing sessions.


  • Limited generalizability; sample consisted of healthy females and generalizability to males and other populations is not known
  • Sample was restricted to nonobese and nonsmokers
  • Role of factors such as smoking, obesity (?)
  • Data on age, height, weight, ethnicity not reported (potential confounders?)
Quality Criteria Checklist: Primary Research
Relevance Questions
  1. Would implementing the studied intervention or procedure (if found successful) result in improved outcomes for the patients/clients/population group? (Not Applicable for some epidemiological studies) Yes
  2. Did the authors study an outcome (dependent variable) or topic that the patients/clients/population group would care about? Yes
  3. Is the focus of the intervention or procedure (independent variable) or topic of study a common issue of concern to dietetics practice? Yes
  4. Is the intervention or procedure feasible? (NA for some epidemiological studies) Yes
Validity Questions
1. Was the research question clearly stated? Yes
  1.1. Was (were) the specific intervention(s) or procedure(s) [independent variable(s)] identified? N/A
  1.2. Was (were) the outcome(s) [dependent variable(s)] clearly indicated? N/A
  1.3. Were the target population and setting specified? N/A
2. Was the selection of study subjects/patients free from bias? Yes
  2.1. Were inclusion/exclusion criteria specified (e.g., risk, point in disease progression, diagnostic or prognosis criteria), and with sufficient detail and without omitting criteria critical to the study? N/A
  2.2. Were criteria applied equally to all study groups? N/A
  2.3. Were health, demographics, and other characteristics of subjects described? N/A
  2.4. Were the subjects/patients a representative sample of the relevant population? N/A
3. Were study groups comparable? Yes
  3.1. Was the method of assigning subjects/patients to groups described and unbiased? (Method of randomization identified if RCT) N/A
  3.2. Were distribution of disease status, prognostic factors, and other factors (e.g., demographics) similar across study groups at baseline? N/A
  3.3. Were concurrent controls or comparisons used? (Concurrent preferred over historical control or comparison groups.) N/A
  3.4. If cohort study or cross-sectional study, were groups comparable on important confounding factors and/or were preexisting differences accounted for by using appropriate adjustments in statistical analysis? N/A
  3.5. If case control study, were potential confounding factors comparable for cases and controls? (If case series or trial with subjects serving as own control, this criterion is not applicable.) N/A
  3.6. If diagnostic test, was there an independent blind comparison with an appropriate reference standard (e.g., "gold standard")? N/A
4. Was method of handling withdrawals described? No
  4.1. Were follow-up methods described and the same for all groups? N/A
  4.2. Was the number, characteristics of withdrawals (i.e., dropouts, lost to follow up, attrition rate) and/or response rate (cross-sectional studies) described for each group? (Follow up goal for a strong study is 80%.) N/A
  4.3. Were all enrolled subjects/patients (in the original sample) accounted for? N/A
  4.4. Were reasons for withdrawals similar across groups? N/A
  4.5. If diagnostic test, was decision to perform reference test not dependent on results of test under study? N/A
5. Was blinding used to prevent introduction of bias? No
  5.1. In intervention study, were subjects, clinicians/practitioners, and investigators blinded to treatment group, as appropriate? N/A
  5.2. Were data collectors blinded for outcomes assessment? (If outcome is measured using an objective test, such as a lab value, this criterion is assumed to be met.) N/A
  5.3. In cohort study or cross-sectional study, were measurements of outcomes and risk factors blinded? N/A
  5.4. In case control study, was case definition explicit and case ascertainment not influenced by exposure status? N/A
  5.5. In diagnostic study, were test results blinded to patient history and other test results? N/A
6. Were intervention/therapeutic regimens/exposure factor or procedure and any comparison(s) described in detail? Were intervening factors described? Yes
  6.1. In RCT or other intervention trial, were protocols described for all regimens studied? N/A
  6.2. In observational study, were interventions, study settings, and clinicians/provider described? N/A
  6.3. Was the intensity and duration of the intervention or exposure factor sufficient to produce a meaningful effect? N/A
  6.4. Was the amount of exposure and, if relevant, subject/patient compliance measured? N/A
  6.5. Were co-interventions (e.g., ancillary treatments, other therapies) described? N/A
  6.6. Were extra or unplanned treatments described? N/A
  6.7. Was the information for 6.4, 6.5, and 6.6 assessed the same way for all groups? N/A
  6.8. In diagnostic study, were details of test administration and replication sufficient? N/A
7. Were outcomes clearly defined and the measurements valid and reliable? Yes
  7.1. Were primary and secondary endpoints described and relevant to the question? N/A
  7.2. Were nutrition measures appropriate to question and outcomes of concern? N/A
  7.3. Was the period of follow-up long enough for important outcome(s) to occur? N/A
  7.4. Were the observations and measurements based on standard, valid, and reliable data collection instruments/tests/procedures? N/A
  7.5. Was the measurement of effect at an appropriate level of precision? N/A
  7.6. Were other factors accounted for (measured) that could affect outcomes? N/A
  7.7. Were the measurements conducted consistently across groups? N/A
8. Was the statistical analysis appropriate for the study design and type of outcome indicators? Yes
  8.1. Were statistical analyses adequately described and the results reported appropriately? N/A
  8.2. Were correct statistical tests used and assumptions of test not violated? N/A
  8.3. Were statistics reported with levels of significance and/or confidence intervals? N/A
  8.4. Was "intent to treat" analysis of outcomes done (and as appropriate, was there an analysis of outcomes for those maximally exposed or a dose-response analysis)? N/A
  8.5. Were adequate adjustments made for effects of confounding factors that might have affected the outcomes (e.g., multivariate analyses)? N/A
  8.6. Was clinical significance as well as statistical significance reported? N/A
  8.7. If negative findings, was a power calculation reported to address type 2 error? N/A
9. Are conclusions supported by results with biases and limitations taken into consideration? Yes
  9.1. Is there a discussion of findings? N/A
  9.2. Are biases and study limitations identified and discussed? N/A
10. Is bias due to study's funding or sponsorship unlikely? Yes
  10.1. Were sources of funding and investigators' affiliations described? N/A
  10.2. Was the study free from apparent conflict of interest? N/A