EE: Measurement Interval (2005)


Delikanaki-Skaribas E. The role of sampling duration on basal metabolic rate measurement error. Thesis dissertation. 2001 (1a).

Study Design:
Retrospective Cohort Study
B
Quality Rating:
Neutral: See Quality Criteria Checklist below.
Research Purpose:
  • Estimate the reliability and measurement error associated with measuring BMR in elderly men
  • Examine the effects of sampling duration within a day and day-to-day variance on the accuracy of measuring BMR of elderly patients
  • Define the BMR measurement error associated with sampling duration that varies in time and can be generalized to include day-to-day measurement error.
Inclusion Criteria:
  • More than 55 years
  • Male
  • Medications allowed
  • Sampling duration of BMR for more than 12 minutes of continuous data
  • Perform both BMR tests within seven days for each patient
  • BMR performed early in the morning under standardized conditions
  • Sampling duration was recorded within 30-second intervals
  • Signed a consent form.
Exclusion Criteria:
  • Life expectancy less than 30 days (per admitting MD)
  • Anemia (HCT less than 25%), documented renal failure (serum creatinine higher than 3.0mg per dL) or liver failure
  • Malignancy, immunodeficiency (e.g., HIV infection, steroid treatment, plasma cell dyscrasia)
  • Known chronic infection requiring antibiotic treatment
  • Estimated length of stay less than six weeks or predicted non-compliance with study measures
  • Advanced CHF or malabsorption syndromes directly affecting nutritional state
  • Refusal to sign a consent form.
Description of Study Protocol:


Selected from database.


Non-concurrent cohort study (database study).

Statistical Analysis

  • Reliability and standard error of measurement (SEM) were calculated using generalizability (G) theory
    • G-theory differs from the classical theory because it partitions the undifferentiated error and identifies multiple sources of variability of an unlimited number of factors. Dimensions associated with the sources of error are called facets. In the present design, there are seven sources of variation: Among subjects, among intervals, among days, subjects X intervals, subjects X days, intervals X days, subjects X intervals X days. Variance components in the G-study were calculated from expected mean squares (EMS).
    • EMS = the value of the mean square that would be obtained, on average, by repeatedly analyzing samples from the same population and universe with the same design
  • Repeated-measures ANOVA was used to examine significant differences between days, among intervals and in the day-by-interval interaction; multivariate tests (MANOVA) were used to examine significant differences among intervals and in the interaction.
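The variance partition described above can be written out as a worked sketch. It assumes a fully crossed random-effects design (subjects p, intervals i, days d) with one observation per cell; the symbols n_p, n_i, n_d and MS are notation introduced here for illustration, and the thesis may have used a different estimation procedure.

```latex
% Observed score for subject p, interval i, day d
\begin{align*}
X_{pid} &= \mu + \nu_p + \nu_i + \nu_d + \nu_{pi} + \nu_{pd} + \nu_{id} + \nu_{pid,e} \\
\sigma^2(X_{pid}) &= \sigma^2_p + \sigma^2_i + \sigma^2_d
  + \sigma^2_{pi} + \sigma^2_{pd} + \sigma^2_{id} + \sigma^2_{pid,e}
\end{align*}

% Variance components solved from the expected mean squares (EMS)
\begin{align*}
\hat\sigma^2_{pid,e} &= MS_{pid} \\
\hat\sigma^2_{pi} &= (MS_{pi} - MS_{pid}) / n_d \\
\hat\sigma^2_{pd} &= (MS_{pd} - MS_{pid}) / n_i \\
\hat\sigma^2_{id} &= (MS_{id} - MS_{pid}) / n_p \\
\hat\sigma^2_{p} &= (MS_{p} - MS_{pi} - MS_{pd} + MS_{pid}) / (n_i n_d) \\
\hat\sigma^2_{i} &= (MS_{i} - MS_{pi} - MS_{id} + MS_{pid}) / (n_p n_d) \\
\hat\sigma^2_{d} &= (MS_{d} - MS_{pd} - MS_{id} + MS_{pid}) / (n_p n_i)
\end{align*}
```

Each of the seven terms corresponds to one of the sources of variation listed above; the last term confounds the three-way interaction with residual error because there is only one observation per cell.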


Data Collection Summary:

 Timing of Measurements

  • The measurement protocol was performed on two different days within a seven-day period
  • Body composition assessment was performed with dual energy X-ray absorptiometry (DEXA) within a week of BMR measurement.

Dependent Variables/Outcomes

  • Reliability, measurement error and variation in measured REE [VO2 (L per minute STPD), VCO2 (L per minute STPD; ml per kg per minute), respiratory exchange ratio (RER; VCO2/VO2), ventilation VE (L per minute STPD)]
  • BMR (kcal per day) and percent predicted BMR (kcal per day)
  • Resting energy expenditure:
    • IC type: Medical Graphics CardiO2 System; Breeze Ex Software system with three different mask sizes with use of sealing gel
    • Equipment calibration: Yes, with three-point technique
    • Coefficient of variation using standard gases: Yes
    • Rest before measure (state length of time rested if available): “Relax, minimize movements and breathe normally from mouth”
    • Measurement length: 30 to 40 minutes total test duration; a 12-minute interval (twenty-four 30-second intervals) was selected from each patient and analyzed
    • Machine measured length: 30-second interval
    • Steady state: Not specified; however, reports “sampling duration of the gas exchange was a 30-second interval to reduce large breath-by-breath variability due to tidal volume”
    • “Patients asked to minimize movements”
    • Fasting length: 12 hours
    • Exercise restrictions XX hr prior to test? Not applicable
    • Room temp: Equipment adjusted for temperature
    • No. of measures within the measurement period:
    • Were some measures eliminated?
    • Was a set of measurements averaged?
    • If average, identify length of each measure and number of measurements?
    • Coefficient of variation in subjects measures?
    • Training of measurer? “Two testers collected BMR using the same standardized testing procedures.”
    • Subject training of measuring process? Each volunteer received an explanation of the time commitment and procedures involved in the study, and these were explained again the day before the testing
    • Monitored heart rate? Not specified
    • Body temperature? Not specified
    • Medications administered? Not specified, but most likely yes, given the population. 

Independent Variables

Seven sources of variation: Among subjects, among intervals, among days, subjects X intervals, subjects X days, intervals X days, subjects X intervals X days.

Description of Actual Data Sample:
  • Final Sample: N=35 male long-term rehabilitation inpatients
  • Age: Mean ±SD 74.66±6.89 years (range 56 to 87 years).

Other Relevant Demographics

  • Admitting diagnoses included: BKA, DM, HTN, CVA, Dementia, CHF, CAD, Parkinson's, PVD, pressure ulcer, peptic ulcer, COPD, malnutrition, both BKA, depression, deconditioning, IDDM, fracture and osteoporosis
  • History of alcohol abuse was obtained in 31 of 35 subjects.



Characteristic                 Mean ±SD       Range
Weight, kg                     78.95±18.1     45.6 to 128.6
Height, m                      1.77±0.07      1.60 to 1.91
BMI                            25.18±5.36     15.75 to 39.15
Fat-free body mass, percent    69.89±9.13     43.88 to 85.83
Fat mass                       32.87±12.3     14.17 to 63.9





Summary of Results:

Measurement Process

  • The multivariate test showed no significant interaction between days and intervals (F=0.644; P=0.824) and the variation among intervals was within chance variance (F=0.928; P=0.579)
  • The estimated variance components from a G-study reflect the magnitude of error in generalizing from a person’s average score on a single day or interval to his interval score. The variance component for subjects accounted for 31% of the total variance, showing that patients differed in VO2 uptake at rest; about 2% of the total variance was associated with the random error variance from the interaction of subjects by days.

Number of Measurements

  • Increasing the measurement schedule to more than one day increases the reliability estimates and decreases the measurement error
  • A sampling duration of 12 minutes on a single day yields a G-coefficient of approximately 0.59; the same level of reliability would be obtained with a sampling duration of 0.5 minutes in a three-day measurement schedule
  • A 20-minute sampling duration on a single day has an expected measurement error of around 20ml per minute; about the same expected measurement error occurs with a sampling duration of about one minute in a two-day schedule or 30 seconds in a three-day schedule
  • Two days of 20 minutes duration increases reliability to 0.75 and reduces the SEM to 150kcal per day; three days of 20 minutes sampling increases reliability to 0.82 with a SEM of 122kcal per day.
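As a rough consistency check on the figures above, the sketch below tests whether the two-day and three-day SEM values follow from the single-day value when error variance is averaged over days. It assumes that the reported SEM is the square root of the same error variance that enters the G-coefficient and that this variance shrinks as 1/(number of days); both are simplifying assumptions, not statements from the thesis. The single-day values for a 20-minute duration (G-coefficient about 0.60, SEM about 211kcal per day) are taken from elsewhere in this summary, and the function names are hypothetical.

```python
import math

# Reported values for a 20-minute sampling duration, keyed by number of test days.
reported = {
    1: {"g": 0.60, "sem": 211.0},
    2: {"g": 0.75, "sem": 150.0},
    3: {"g": 0.82, "sem": 122.0},
}


def projected_sem(sem_one_day, n_days):
    """SEM after averaging n_days, assuming error variance shrinks as 1/n_days."""
    return sem_one_day / math.sqrt(n_days)


def implied_subject_sd(g, sem):
    """Between-subject SD implied by G = var_p / (var_p + sem**2)."""
    return sem * math.sqrt(g / (1.0 - g))


for n_days, row in reported.items():
    print(
        n_days,
        "projected SEM:", round(projected_sem(211.0, n_days), 1),
        "reported SEM:", row["sem"],
        "implied subject SD:", round(implied_subject_sd(row["g"], row["sem"]), 1),
    )
```

Under these assumptions the projected SEM values land within about 1kcal per day of the reported 150 and 122, and all three rows imply a between-subject SD of roughly 258 to 260kcal per day, so the reported reliability and SEM figures are mutually consistent.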

Length of Measurements

  • The reliability estimate for a single 30-second interval was 0.32 with a standard error of 53ml per minute (i.e., approximately 373kcal per day error). As the sampling intervals increased, the G-coefficients also increased and the standard error decreased. The slope of change in the SEM is very sharp from 30 seconds up to about five minutes and then levels off.
  • A two-minute sampling duration on a single day has an expected measurement error of around 30ml per minute
  • A 20-minute sampling duration is nearly as accurate as 40 to 60 minutes (i.e., SEM, 211kcal per day vs. 207kcal per day, respectively). Sampling durations longer than 20 minutes yield no substantial further reduction in measurement error.
  • RQ was measured but not reported.
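The conversion between a VO2 error in ml per minute and an energy error in kcal per day can be sketched as follows. The caloric equivalent of oxygen used here (4.89kcal per liter) is an assumption chosen because it reproduces the 53ml per minute to 373kcal per day pairing quoted above; the summary does not state which value or equation the thesis used, and the function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60
KCAL_PER_LITER_O2 = 4.89  # assumed caloric equivalent of oxygen; not stated in the summary


def vo2_error_to_kcal_per_day(ml_per_min, kcal_per_liter=KCAL_PER_LITER_O2):
    """Convert a VO2 measurement error (ml per minute) into kcal per day."""
    liters_per_day = ml_per_min / 1000.0 * MINUTES_PER_DAY
    return liters_per_day * kcal_per_liter


for error_ml in (53, 30, 20):
    print(error_ml, "ml per minute ->", round(vo2_error_to_kcal_per_day(error_ml)), "kcal per day")
```

Under the same assumption, 30ml per minute corresponds to about 211kcal per day and 20ml per minute to about 141kcal per day.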

Measurement Timing

  • Sleep or rest: The mean BMR for day one was 1,376.7kcal per day (±311.7) and for day two was 1,415.9kcal per day (±360.1)
  • Physical activity: Not discussed
  • Food intake: Not discussed
  • Various times in the day: Not discussed.

Individual Characteristics

  • There is a large random variation of the patients’ BMR (kcal per day) from one day to another; approximately half of the variation (47%) was associated with the three-way interaction among subjects, days and intervals, together with random sources of variation
  • The sharp decrease in SEM for a single day slows down after five minutes and reaches a plateau at about a 20-minute sampling duration. There is no additional benefit for measuring BMR longer than 20 minutes.
  • A sampling duration of 20 minutes for a single-day measurement yields a G-coefficient of about 0.60 and a SEM of about 211kcal per day. The same G-coefficient and SEM are expected with a sampling duration of one minute with a two-day BMR measurement and a sampling duration of 30 seconds with a three-day BMR measurement.
  • Circulatory hormones: Not discussed
  • Breathing ability: Not discussed
  • Medical tests/procedures: None
  • Chemicals (medications/drugs/herbs, caffeine, nicotine, alcohol): Identified if alcohol was used.
Author Conclusion:
  • The standard error of measurement and G-coefficient should lead the decision-making for optimal sampling duration and number of test days
  • Previous findings by Atkinson et al, 1998 noted that standard error of measurement is an absolute reliability statistic that can be directly applied to future individuals to estimate the measurement error under similar conditions; the benefit of SEM statistic is that, unlike a reliability coefficient, it is unaffected by the range of measurements
  • The minimum measurement error associated with a single day measurement BMR in this population is about 200kcal per day
  • Our results show that day-to-day variation was the major source of measurement error
  • In the present study, between subjects variance is 33% and the rest is due to intra-individual variance
  • Indirect calorimetry measurement error depends on the sampling duration and the number of days measured. In order to decrease this measurement error, BMR should be measured within a minimum sampling duration of 20 minutes for a single-day testing. Greater accuracy can be achieved by measuring BMR over several days. Increasing sampling duration up to 20 minutes decreases measurement error substantially and produces a more accurate estimate of the subject’s true BMR.
Funding Source:
Reviewer Comments:


  • Had a good sample size and age range; included patients with multiple diseases
  • Strong knowledge and use of statistics producing important and applicable nutrition-focused (i.e., errors in kcal amounts) data.
  • Multi-ethnic population. 


  • Generalizable to male population residing in Veterans Sub-acute/Transitional Care Setting. 
  • Did not describe subject drop-outs (e.g., reasons for non-compliance by demographic characteristics such as age or disease, or whether drop-outs were related to tolerance of the mask with use of a sealing gel)
  • Included all weight categories (i.e., underweight, normal weight, overweight and obese) in sample but did not discuss separately
  • Steady state was not directly defined, rather the issue was measured over time
  • Smoking in subjects was not identified
  • Statistical note: G-theory defines a two-facet design and identifies the intervals and the days as the two facets; used to estimate variance components that were associated with the various sources of variation
  • D-study estimated the reliability, given by the G-coefficient and the measurement error; determined how much the reliability of one-day BMR measurement was improved by forecasting G-coefficient or measurement error for different sampling intervals. 
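A D-study projection of the kind described in the statistical note can be sketched as below. The formula is the standard relative-error G-coefficient for a subjects x intervals x days random design. The variance shares are expressed in percent of total variance: the subjects (31) and three-way/residual (47) shares follow the percentages quoted in this summary, while the subjects-by-intervals (0) and subjects-by-days (20) shares are assumptions picked for illustration only and are not taken from the thesis. All names are hypothetical.

```python
# Variance shares in percent of total variance.
components = {
    "p": 31.0,    # subjects (share quoted in the summary)
    "pi": 0.0,    # subjects x intervals (assumed for illustration)
    "pd": 20.0,   # subjects x days (assumed for illustration)
    "pid": 47.0,  # three-way interaction plus residual (share quoted in the summary)
}


def relative_error_variance(c, n_intervals, n_days):
    """Error variance of a subject's mean over n_intervals and n_days."""
    return (
        c["pi"] / n_intervals
        + c["pd"] / n_days
        + c["pid"] / (n_intervals * n_days)
    )


def g_coefficient(c, n_intervals, n_days):
    """Subject variance divided by subject variance plus relative error variance."""
    return c["p"] / (c["p"] + relative_error_variance(c, n_intervals, n_days))


# Sampling duration in minutes and number of test days to forecast.
for minutes, days in [(0.5, 1), (12, 1), (20, 1), (20, 2), (20, 3)]:
    n_intervals = int(minutes * 2)  # number of 30-second intervals
    print(minutes, "min,", days, "day(s):", round(g_coefficient(components, n_intervals, days), 2))
```

With these illustrative shares the projected coefficients come out at roughly 0.32, 0.59, 0.59, 0.75 and 0.81, close to the values quoted in the Summary of Results; a different split of the error variance would give different forecasts.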

Further Review Comments

A sentence in the Discussion section, page 55: 

"The daily amount of calories that BMR measurement is likely to differ due to chance with a 95% confidence interval in an elderly individual, following similar testing procedures and equipment is 465kcal for a sampling duration of five minutes and decreases to about 421kcal for a 20-minute sampling duration."

cannot be verified using Table 10 or Figure 6. Therefore, this comment and the pertinent thesis information were submitted to an expert panel member. It was anticipated that the 465 is a typo and should read 455, and that the researcher doubled the kcal error to represent the 95% confidence interval. Additional comments were: “Applicability of results is limited to measurement realities, which include use of face masks across various weight classifications (i.e., inter-individual error variance related to air leaks occurring with a subject who has an emaciated face vs. not), inclusion of the first five minutes of acclimation, use of BMR conditions (i.e., 12-hour fast, six-hour rest, no recent movement) and a large age range considered to be elderly (i.e., 55 to 87 years).” The quality rating worksheet was reviewed and these limitations were reflected.
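The doubling interpretation can be checked with simple arithmetic. The sketch below uses the 20-minute single-day SEM of about 211kcal per day reported in the Summary of Results; the five-minute SEM is not quoted in this summary, so the values shown for it are only back-calculated from the two candidate figures (455 and 465) and are not thesis data.

```latex
\begin{align*}
2 \times \mathrm{SEM}_{20\,\mathrm{min}} &= 2 \times 211 = 422 \approx 421 \ \text{kcal per day} \\
\mathrm{SEM}_{5\,\mathrm{min}} &\approx 455 / 2 = 227.5 \ \text{kcal per day}
  \quad \text{(or } 465 / 2 = 232.5 \text{ if 465 is correct)}
\end{align*}
```

The 20-minute figure is consistent with a multiplier of about 2 applied to the SEM (a multiplier of 1.96 would give about 414kcal per day).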

“We cannot assume that her findings are generally applicable to RMR measures but limit our conclusions to her measurement realities:

  • Face masks ALWAYS leak, even with sealing gel, and thus are rarely used for research or clinical measures. The error variance will also not be equal across individuals, since an emaciated face may have a totally different "fit" than an obese one.
  • I believe that she includes the first five minutes of acclimation (often ignored by others as unreliable) in all her measures higher than five minutes. Extra minutes of measurement will "dilute" out this error over time.
  • She uses BMR conditions, 12-hour fast, six-hour rest, no recent movement
  • She has quite a range of body comp realities in these patients (BMI of 15.75 = severe malnutrition, BMI of 39.15 almost severe obesity). Since she measured LBM by DXA, most investigators would "normalize" the BMR to LBM to reduce the impact of metabolically active tissue on the measure. 
  • She also has quite the range in age, considering 55 to be elderly up to 87 years. This alone may add diversity to the measures."
Quality Criteria Checklist: Primary Research
Relevance Questions
  1. Would implementing the studied intervention or procedure (if found successful) result in improved outcomes for the patients/clients/population group? (Not Applicable for some epidemiological studies) Yes
  2. Did the authors study an outcome (dependent variable) or topic that the patients/clients/population group would care about? Yes
  3. Is the focus of the intervention or procedure (independent variable) or topic of study a common issue of concern to dietetics practice? Yes
  4. Is the intervention or procedure feasible? (NA for some epidemiological studies) Yes
Validity Questions
1. Was the research question clearly stated? Yes
  1.1. Was (were) the specific intervention(s) or procedure(s) [independent variable(s)] identified? N/A
  1.2. Was (were) the outcome(s) [dependent variable(s)] clearly indicated? N/A
  1.3. Were the target population and setting specified? N/A
2. Was the selection of study subjects/patients free from bias? Yes
  2.1. Were inclusion/exclusion criteria specified (e.g., risk, point in disease progression, diagnostic or prognosis criteria), and with sufficient detail and without omitting criteria critical to the study? N/A
  2.2. Were criteria applied equally to all study groups? N/A
  2.3. Were health, demographics, and other characteristics of subjects described? N/A
  2.4. Were the subjects/patients a representative sample of the relevant population? N/A
3. Were study groups comparable? Yes
  3.1. Was the method of assigning subjects/patients to groups described and unbiased? (Method of randomization identified if RCT) N/A
  3.2. Were distribution of disease status, prognostic factors, and other factors (e.g., demographics) similar across study groups at baseline? N/A
  3.3. Were concurrent controls or comparisons used? (Concurrent preferred over historical control or comparison groups.) N/A
  3.4. If cohort study or cross-sectional study, were groups comparable on important confounding factors and/or were preexisting differences accounted for by using appropriate adjustments in statistical analysis? N/A
  3.5. If case control study, were potential confounding factors comparable for cases and controls? (If case series or trial with subjects serving as own control, this criterion is not applicable.) N/A
  3.6. If diagnostic test, was there an independent blind comparison with an appropriate reference standard (e.g., "gold standard")? N/A
4. Was method of handling withdrawals described? Yes
  4.1. Were follow-up methods described and the same for all groups? N/A
  4.2. Was the number, characteristics of withdrawals (i.e., dropouts, lost to follow up, attrition rate) and/or response rate (cross-sectional studies) described for each group? (Follow up goal for a strong study is 80%.) N/A
  4.3. Were all enrolled subjects/patients (in the original sample) accounted for? N/A
  4.4. Were reasons for withdrawals similar across groups? N/A
  4.5. If diagnostic test, was decision to perform reference test not dependent on results of test under study? N/A
5. Was blinding used to prevent introduction of bias? Yes
  5.1. In intervention study, were subjects, clinicians/practitioners, and investigators blinded to treatment group, as appropriate? N/A
  5.2. Were data collectors blinded for outcomes assessment? (If outcome is measured using an objective test, such as a lab value, this criterion is assumed to be met.) N/A
  5.3. In cohort study or cross-sectional study, were measurements of outcomes and risk factors blinded? N/A
  5.4. In case control study, was case definition explicit and case ascertainment not influenced by exposure status? N/A
  5.5. In diagnostic study, were test results blinded to patient history and other test results? N/A
6. Were intervention/therapeutic regimens/exposure factor or procedure and any comparison(s) described in detail? Were intervening factors described? No
  6.1. In RCT or other intervention trial, were protocols described for all regimens studied? N/A
  6.2. In observational study, were interventions, study settings, and clinicians/provider described? N/A
  6.3. Was the intensity and duration of the intervention or exposure factor sufficient to produce a meaningful effect? N/A
  6.4. Was the amount of exposure and, if relevant, subject/patient compliance measured? N/A
  6.5. Were co-interventions (e.g., ancillary treatments, other therapies) described? N/A
  6.6. Were extra or unplanned treatments described? N/A
  6.7. Was the information for 6.4, 6.5, and 6.6 assessed the same way for all groups? N/A
  6.8. In diagnostic study, were details of test administration and replication sufficient? N/A
7. Were outcomes clearly defined and the measurements valid and reliable? Yes
  7.1. Were primary and secondary endpoints described and relevant to the question? N/A
  7.2. Were nutrition measures appropriate to question and outcomes of concern? N/A
  7.3. Was the period of follow-up long enough for important outcome(s) to occur? N/A
  7.4. Were the observations and measurements based on standard, valid, and reliable data collection instruments/tests/procedures? N/A
  7.5. Was the measurement of effect at an appropriate level of precision? N/A
  7.6. Were other factors accounted for (measured) that could affect outcomes? N/A
  7.7. Were the measurements conducted consistently across groups? N/A
8. Was the statistical analysis appropriate for the study design and type of outcome indicators? Yes
  8.1. Were statistical analyses adequately described and the results reported appropriately? N/A
  8.2. Were correct statistical tests used and assumptions of test not violated? N/A
  8.3. Were statistics reported with levels of significance and/or confidence intervals? N/A
  8.4. Was "intent to treat" analysis of outcomes done (and as appropriate, was there an analysis of outcomes for those maximally exposed or a dose-response analysis)? N/A
  8.5. Were adequate adjustments made for effects of confounding factors that might have affected the outcomes (e.g., multivariate analyses)? N/A
  8.6. Was clinical significance as well as statistical significance reported? N/A
  8.7. If negative findings, was a power calculation reported to address type 2 error? N/A
9. Are conclusions supported by results with biases and limitations taken into consideration? No
  9.1. Is there a discussion of findings? N/A
  9.2. Are biases and study limitations identified and discussed? N/A
10. Is bias due to study's funding or sponsorship unlikely? Yes
  10.1. Were sources of funding and investigators' affiliations described? N/A
  10.2. Was the study free from apparent conflict of interest? N/A