Study Design:
Quality Rating:
Research Purpose:
  1. To evaluate the consistency of measurements taken over a 3-day period in healthy adult volunteers using a mouthpiece/noseclips, face mask, and ventilated canopy.
  2. To systematically evaluate the reactivity of the testing procedure and whether this reactivity varies as a function of the collection system used.
  3. To determine the degree of reliability of a single measurement of varying duration using a mouthpiece, face mask, and canopy data collection system.


  • Steady state: Not discussed
Inclusion Criteria:
  1. Understand and give written consent
  2. Healthy (screened for respiratory disorders, use of medications known to affect metabolic rate, and any medical condition thought to be associated with metabolic rate, such as diabetes or eating disorders)
  3. Adult volunteers
  4. Females
  5. Nonsmoking
  6. Weight falling between the 25th and 75th percentile norms.
Exclusion Criteria:
  1. Refusal to consent
  2. Not meeting inclusion criteria
  3. Diseases in subjects that were excluded: diabetes, eating disorders
  4. Medications excluded: those known to affect metabolic rates
  5. Smokers
  6. Overweight or underweight.
Description of Study Protocol:

Subjects were randomly assigned to either face mask, mouthpiece/nose clip, or ventilated canopy collection systems for 45 min a day over 3 days.


  • Ht measured? Likely
  • Wt measured? Likely
  • Fat-free mass measured? Not discussed


  • Monitored heart rate? Not discussed
  • Body temperature? Not discussed
  • Medications administered? Not discussed

Resting energy expenditure

  • IC type: subjects randomly assigned to one of 3 collection methods: 1) mouthpiece/noseclips, 2) face mask, 3) ventilated canopy
  • Calibration of equipment: Yes, prior to each test; in addition, leakage was carefully controlled in all 3 collection systems, and subjects were monitored throughout testing to ensure leakage had not occurred
  • Coefficient of variation using std gases: Yes
  • Rest before measure: 5 min prior to testing
  • Measurement length: 50 consecutive minutes a day for 3 consecutive days at the same time of day with the same collection system
  • Length of measurement period: 10-, 20-, and 40-min periods following a 5 minute adaptation period
  • Steady state: not discussed
  • Fasting length: abstain from food and caffeinated beverages for 5 hr prior to testing
  • Exercise restrictions: instructed to avoid any strenuous, aerobic-type physical activity for at least 5 hr prior to testing
  • Room temp: thermally neutral
  • No. of measures within the measurement period: 1
  • Were some measures eliminated? No
  • Were sets of measurements averaged? Yes, 10 min increments
  • Training of measurer? Not mentioned
  • Subject training of measuring process? Yes; while resting before testing (5 min)


  • Not assessed
Data Collection Summary:

Outcome(s) and other measures

  1. VO2, ml/min, VCO2, ml/min, RQ, l/min, REE, kcal/d
  2. Gender, ht, wt, and age (but not reported in paper)

Blinding used: No

Description of Actual Data Sample:
  • N=30 healthy volunteer adults
  • N=30 females

Statistical tests

A 3 (system) x 3 (time) x 2 (first 5 min vs. 45 subsequent min) repeated measures ANOVA was used. Intraclass correlation coefficients (ICCs) were calculated using a one-way ANOVA to estimate within-subject and between-subject variances; the ICC was then computed as the ratio of these variance components. An ICC of 1.0 represents perfect reliability, and an ICC of zero or less indicates no reliability. An ICC of 0.7 is considered an acceptable level of reliability.
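The ICC calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the "ratio of variance components" means between-subject variance divided by the sum of between-subject and within-subject variance (the usual one-way random-effects ICC), and the function name `icc_oneway` and the sample values are hypothetical.

```python
from statistics import mean


def icc_oneway(data):
    """One-way random-effects ICC (assumed form; see lead-in).

    data: list of rows, one row per subject, each holding the same
    number k of repeated measurements (e.g., REE on each test day).
    """
    n = len(data)
    k = len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 measurements")

    grand_mean = mean(x for row in data for x in row)
    subject_means = [mean(row) for row in data]

    # Mean squares from the one-way ANOVA with subject as the factor.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(data, subject_means) for x in row
    ) / (n * (k - 1))

    # Variance components; the between-subject estimate can be negative,
    # which is how an ICC of "zero or less" arises.
    var_between = (ms_between - ms_within) / k
    var_within = ms_within
    return var_between / (var_between + var_within)


# Made-up REE values (kcal/d) for 3 subjects on 3 days, for illustration only.
perfectly_repeatable = [[1200, 1200, 1200], [1400, 1400, 1400], [1600, 1600, 1600]]
no_subject_signal = [[1, 3], [3, 1]]
```

With these made-up inputs, the first table gives an ICC of 1.0 (no within-subject variation) and the second gives a negative ICC (all variation is within subjects).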

Summary of Results:


Not provided; however, the inclusion criteria required weight to fall between the 25th and 75th percentile norms.

Results indicated no significant effects for collection procedure, day or interactions between method and day.

There were no significant differences between methods on the first 5 min or between methods on the last 40 min.

However, all 3 methods had significantly elevated REE measurements in the first 5 min of testing compared with the subsequent 40 min. (see below)

Collection method* | Mean REE (kcal/d) ± SD, first 5 min | Mean REE (kcal/d) ± SD, subsequent 40 min | P | Reduction
Mouthpiece | 1521.71 ± 270.86 | 1250.86 ± 135.68 | p<0.001 | 270.86 kcal/d
Face mask | 1509.67 ± 171.85 | 1337.82 ± 182.85 | p<0.05 | 171.85 kcal/d
Canopy | 1674.97 ± 379.01 | 1295.96 ± 162.97 | p<0.000 | 379.01 kcal/d

*mean REE=average REE over 3 d measurement period
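The Reduction column is the difference between the first-5-min mean and the subsequent-40-min mean. The short check below recomputes it from the tabulated means; the variable names are illustrative, and `Decimal` is used only to avoid floating-point rounding noise. Note that the recomputed mouthpiece difference is 270.85 kcal/d, whereas the table lists 270.86, which may reflect rounding of the underlying means.

```python
from decimal import Decimal

# Mean REE (kcal/d) as tabulated: (first 5 min, subsequent 40 min).
mean_ree = {
    "Mouthpiece": (Decimal("1521.71"), Decimal("1250.86")),
    "Face mask": (Decimal("1509.67"), Decimal("1337.82")),
    "Canopy": (Decimal("1674.97"), Decimal("1295.96")),
}

# Reduction = first-5-min mean minus subsequent-40-min mean.
reductions = {
    method: first - later for method, (first, later) in mean_ree.items()
}

# Method with the smallest drop in REE after the first 5 min.
smallest_drop = min(reductions, key=reductions.get)

for method, drop in reductions.items():
    print(f"{method}: {drop} kcal/d")
print("Smallest drop:", smallest_drop)
```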

The face mask showed the smallest drop in resting energy expenditure.

ICCs For Collection Method

Collection method | 10-min | 20-min | 40-min
Mouthpiece/noseclips | 0.75 | 0.62 | 0.40
Face mask | 0.43 | 0.60 | 0.79
Canopy | 0.44 | 0.75 | 0.79

As indicated above, the mouthpiece shows acceptable reliability (>0.7) for the 10-min period, but the reliability progressively decreased as the measurement duration increased.

The face mask shows poor reliability for the 10-min, an increase in reliability with a 20-min measurement, and strong reliability with a 40-min measurement length.

For the canopy, the 10-min period showed poor reliability, but both the 20- and 40-min measurement periods were strongly reliable.
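The interpretation above amounts to comparing each ICC with the 0.7 acceptability threshold. A small sketch of that comparison follows; the helper name `acceptable_durations` is hypothetical, and treating a value exactly equal to 0.7 as acceptable is an assumption (no tabulated value sits on the boundary).

```python
ICC_THRESHOLD = 0.7  # acceptable reliability level stated in the worksheet

# ICCs by collection method and measurement duration (min), as tabulated.
icc_by_method = {
    "Mouthpiece/noseclips": {10: 0.75, 20: 0.62, 40: 0.40},
    "Face mask": {10: 0.43, 20: 0.60, 40: 0.79},
    "Canopy": {10: 0.44, 20: 0.75, 40: 0.79},
}


def acceptable_durations(iccs, threshold=ICC_THRESHOLD):
    """Return the durations (min) whose ICC meets the threshold.

    Assumption: a value equal to the threshold counts as acceptable.
    """
    return [minutes for minutes, icc in sorted(iccs.items()) if icc >= threshold]


acceptable = {
    method: acceptable_durations(iccs) for method, iccs in icc_by_method.items()
}

for method, durations in acceptable.items():
    print(method, durations)
```

This reproduces the pattern described in the text: only the 10-min period is acceptable for the mouthpiece, only the 40-min period for the face mask, and both the 20- and 40-min periods for the canopy.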

Author Conclusion:

As stated by the author in body of report:

  • “The results of the current study indicate that there are no significant REE differences between data collected with mouthpiece/noseclips, a face mask, or a canopy collection system. Additionally, there were no differences between systems over time.”
  • “However, across all 3 days of measurement, the first 5 min of data collection on all 3 systems were more reactive (i.e., significantly higher) than the subsequent 40 min of assessment. Reliability analyses indicated that following a 5-min adjustment period, acceptable reliability coefficients were obtained after 20 min of continuous data collection in the ventilated canopy and after 40 min with the face mask. Reliability coefficients with the mouthpiece and noseclips steadily decreased as the length of the testing session increased.”
  • “The finding that the first 5 min of data collection were significantly higher than the subsequent 40 min despite the fact that the subjects rested prior to assessment and the procedure was explained to them prior to initiation, confirms earlier observations which indicated a need to acclimate to the testing procedure once the collection system was attached and also indicated that values obtained during the first 5 min were likely to be unstable and unrepresentative regardless of the method used.”
  • “It is recommended that a 5-min acclimation period following attachment of the collection system be conducted in all clinical and research studies.”
  • “Results also indicated that reliability of both the face mask and the canopy increased as test length increased. In contrast, the data collected with the mouthpiece/noseclips method indicated that as test duration increased, reliability decreased....likely due to increasing discomfort on the part of the subject with the mouthpiece and noseclips as test length increased (acceptable reliability obtained after 10-min). Given that most clinical and research metabolic evaluations require relatively longer test durations, it is recommended that either a face mask or canopy be used for testing sessions lasting longer than 10 min.”
  • “Whereas the canopy system achieved acceptable levels of reproducibility following a 20-min test, it appears that both the face mask and canopy are reliable collection methods during longer tests (40 min).”
  • “On balance, the features of reliability, comfort, and nonobtrusiveness may make the canopy system a more advantageous method for measuring REE in research and clinical applications.”
  • “In summary, it appears that both the face mask and canopy data collection systems yield reproducible metabolic data, provided that acclimation to the procedure be conducted.”
Funding Source:
Government: NHLBI, State of Tennessee
Reviewer Comments:


  • Good description of the 3 metabolic collection systems and of the calibration and leakage evaluation carried out throughout the testing sessions.


  • Limited generalizability; sample consisted of healthy females and generalizability to males and other populations is not known
  • Sample was restricted to nonobese and nonsmokers
  • Role of factors such as smoking, obesity (?)
  • Data on age, height, weight, and ethnicity not reported (potential confounders?)
Quality Criteria Checklist: Primary Research
Relevance Questions
  1. Would implementing the studied intervention or procedure (if found successful) result in improved outcomes for the patients/clients/population group? (Not Applicable for some epidemiological studies) Yes
  2. Did the authors study an outcome (dependent variable) or topic that the patients/clients/population group would care about? Yes
  3. Is the focus of the intervention or procedure (independent variable) or topic of study a common issue of concern to dietetics practice? Yes
  4. Is the intervention or procedure feasible? (NA for some epidemiological studies) Yes
Validity Questions
1. Was the research question clearly stated? Yes
  1.1. Was (were) the specific intervention(s) or procedure(s) [independent variable(s)] identified? N/A
  1.2. Was (were) the outcome(s) [dependent variable(s)] clearly indicated? N/A
  1.3. Were the target population and setting specified? N/A
2. Was the selection of study subjects/patients free from bias? Yes
  2.1. Were inclusion/exclusion criteria specified (e.g., risk, point in disease progression, diagnostic or prognosis criteria), and with sufficient detail and without omitting criteria critical to the study? N/A
  2.2. Were criteria applied equally to all study groups? N/A
  2.3. Were health, demographics, and other characteristics of subjects described? N/A
  2.4. Were the subjects/patients a representative sample of the relevant population? N/A
3. Were study groups comparable? No
  3.1. Was the method of assigning subjects/patients to groups described and unbiased? (Method of randomization identified if RCT) N/A
  3.2. Were distribution of disease status, prognostic factors, and other factors (e.g., demographics) similar across study groups at baseline? N/A
  3.3. Were concurrent controls or comparisons used? (Concurrent preferred over historical control or comparison groups.) N/A
  3.4. If cohort study or cross-sectional study, were groups comparable on important confounding factors and/or were preexisting differences accounted for by using appropriate adjustments in statistical analysis? N/A
  3.5. If case control study, were potential confounding factors comparable for cases and controls? (If case series or trial with subjects serving as own control, this criterion is not applicable.) N/A
  3.6. If diagnostic test, was there an independent blind comparison with an appropriate reference standard (e.g., "gold standard")? N/A
4. Was method of handling withdrawals described? No
  4.1. Were follow-up methods described and the same for all groups? N/A
  4.2. Was the number, characteristics of withdrawals (i.e., dropouts, lost to follow up, attrition rate) and/or response rate (cross-sectional studies) described for each group? (Follow up goal for a strong study is 80%.) N/A
  4.3. Were all enrolled subjects/patients (in the original sample) accounted for? N/A
  4.4. Were reasons for withdrawals similar across groups? N/A
  4.5. If diagnostic test, was decision to perform reference test not dependent on results of test under study? N/A
5. Was blinding used to prevent introduction of bias? No
  5.1. In intervention study, were subjects, clinicians/practitioners, and investigators blinded to treatment group, as appropriate? N/A
  5.2. Were data collectors blinded for outcomes assessment? (If outcome is measured using an objective test, such as a lab value, this criterion is assumed to be met.) N/A
  5.3. In cohort study or cross-sectional study, were measurements of outcomes and risk factors blinded? N/A
  5.4. In case control study, was case definition explicit and case ascertainment not influenced by exposure status? N/A
  5.5. In diagnostic study, were test results blinded to patient history and other test results? N/A
6. Were intervention/therapeutic regimens/exposure factor or procedure and any comparison(s) described in detail? Were intervening factors described? Yes
  6.1. In RCT or other intervention trial, were protocols described for all regimens studied? N/A
  6.2. In observational study, were interventions, study settings, and clinicians/provider described? N/A
  6.3. Was the intensity and duration of the intervention or exposure factor sufficient to produce a meaningful effect? N/A
  6.4. Was the amount of exposure and, if relevant, subject/patient compliance measured? N/A
  6.5. Were co-interventions (e.g., ancillary treatments, other therapies) described? N/A
  6.6. Were extra or unplanned treatments described? N/A
  6.7. Was the information for 6.4, 6.5, and 6.6 assessed the same way for all groups? N/A
  6.8. In diagnostic study, were details of test administration and replication sufficient? N/A
7. Were outcomes clearly defined and the measurements valid and reliable? Yes
  7.1. Were primary and secondary endpoints described and relevant to the question? N/A
  7.2. Were nutrition measures appropriate to question and outcomes of concern? N/A
  7.3. Was the period of follow-up long enough for important outcome(s) to occur? N/A
  7.4. Were the observations and measurements based on standard, valid, and reliable data collection instruments/tests/procedures? N/A
  7.5. Was the measurement of effect at an appropriate level of precision? N/A
  7.6. Were other factors accounted for (measured) that could affect outcomes? N/A
  7.7. Were the measurements conducted consistently across groups? N/A
8. Was the statistical analysis appropriate for the study design and type of outcome indicators? Yes
  8.1. Were statistical analyses adequately described and the results reported appropriately? N/A
  8.2. Were correct statistical tests used and assumptions of test not violated? N/A
  8.3. Were statistics reported with levels of significance and/or confidence intervals? N/A
  8.4. Was "intent to treat" analysis of outcomes done (and as appropriate, was there an analysis of outcomes for those maximally exposed or a dose-response analysis)? N/A
  8.5. Were adequate adjustments made for effects of confounding factors that might have affected the outcomes (e.g., multivariate analyses)? N/A
  8.6. Was clinical significance as well as statistical significance reported? N/A
  8.7. If negative findings, was a power calculation reported to address type 2 error? N/A
9. Are conclusions supported by results with biases and limitations taken into consideration? Yes
  9.1. Is there a discussion of findings? N/A
  9.2. Are biases and study limitations identified and discussed? N/A
10. Is bias due to study's funding or sponsorship unlikely? Yes
  10.1. Were sources of funding and investigators' affiliations described? N/A
  10.2. Was the study free from apparent conflict of interest? N/A