Voice Quality Metrics and Their Clinical Interpretation in Phonalyze Reports

Voice assessment has moved well beyond simply listening to a patient's voice. Speech-language pathologists now use acoustic analysis to quantify vocal function. In Phonalyze, these voice quality metrics supply objective data that complement traditional perceptual evaluation, giving a more complete picture of the voice.

Why Objective Voice Metrics Matter

  1. Quantitative insights into vocal fold function that supplement subjective listening
  2. Reliable, non-invasive data on vocal fold vibration patterns and overall voice quality
  3. Ability to distinguish healthy voices from dysphonic voices with greater precision
  4. Tracking capability for subtle changes over time, creating an objective index of therapy outcomes
  5. Evidence-based approach to voice therapy with metrics that correlate with perceptual improvements

Research shows that metrics such as Cepstral Peak Prominence (CPP) and Harmonics-to-Noise Ratio (HNR) correlate strongly with perceptual improvements following voice therapy. Incorporating these objective measurements helps clinicians reduce subjectivity in voice evaluation and strengthens evidence-based practice.

Voice Quality Metrics

Cepstral Measures: CPP and CPPS

Definition: Cepstral Peak Prominence (CPP) measures the height of the dominant cepstral peak above the regression line fitted to the cepstrum. It reflects the clarity of harmonic structure in the voice. Smoothed CPP (CPPS) averages the cepstrum across time and quefrency before the peak is measured, which damps rapid fluctuations; it is commonly reported for running speech.
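To make the definition concrete, here is a minimal Python sketch that computes CPP for a single cepstrum frame that has already been calculated elsewhere. It is an illustration under stated assumptions, not the Phonalyze or Praat implementation: the function name `cepstral_peak_prominence`, the 60-330 Hz search range, and the choice to fit the regression line over every supplied quefrency bin are all made up for this example.

```python
def cepstral_peak_prominence(quefrency_s, cepstrum_db,
                             f0_min_hz=60.0, f0_max_hz=330.0):
    """Return (cpp_db, peak_quefrency_s) for one cepstrum frame.

    quefrency_s: quefrency of each bin in seconds.
    cepstrum_db: cepstral magnitude of each bin in dB.
    The F0 search range is an assumption chosen for illustration.
    """
    n = len(quefrency_s)
    if n != len(cepstrum_db) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 bins")

    # Least-squares regression line through the whole frame.
    mean_q = sum(quefrency_s) / n
    mean_c = sum(cepstrum_db) / n
    sxx = sum((q - mean_q) ** 2 for q in quefrency_s)
    sxy = sum((q - mean_q) * (c - mean_c)
              for q, c in zip(quefrency_s, cepstrum_db))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_q

    # Dominant peak inside the quefrency window of plausible pitch periods.
    lowest_q, highest_q = 1.0 / f0_max_hz, 1.0 / f0_min_hz
    candidates = [(c, q) for q, c in zip(quefrency_s, cepstrum_db)
                  if lowest_q <= q <= highest_q]
    if not candidates:
        raise ValueError("no cepstral bins fall inside the F0 search range")
    peak_db, peak_q = max(candidates)

    # CPP is the peak height above the regression line at the same quefrency.
    return peak_db - (slope * peak_q + intercept), peak_q
```

Because the regression line is fitted to the same frame that contains the peak, a strong peak pulls the line up slightly. Analysis tools differ in how they fit and restrict this line, which is one reason CPP values are not directly comparable across programs.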

Healthy Range:
Sustained vowels: CPP values typically above 14 dB (when measured with Praat)
Running speech: CPPS values >= 4.0 dB indicate normal voice
Children generally have higher CPP/CPPS, reflecting stronger harmonics

Clinical Interpretation: Higher CPP/CPPS values indicate a more periodic signal and less dysphonia. Low CPP/CPPS values indicate a weak, noisy harmonic structure, which is often heard as a breathy or rough voice. Studies report CPP cutoffs in the range of roughly 9-14 dB, and values below the applicable cutoff are consistently associated with dysphonic voice quality.

Pitch (Fundamental Frequency)

Definition: Pitch metrics describe the fundamental frequency (F0) of voicing in Hertz. Reports typically include average (mean), middle value (median), and variability (range) of F0 over the speech sample.
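As a rough illustration of how these summary values can be derived from a frame-by-frame F0 track, consider the sketch below. The function name `summarize_f0` and the convention that unvoiced frames are stored as 0 or None are assumptions for this example, not a description of how Phonalyze stores its data.

```python
import math
import statistics


def summarize_f0(f0_hz):
    """Summarise an F0 track (Hz); unvoiced frames are assumed to be 0 or None."""
    voiced = [f for f in f0_hz if f and f > 0]
    if not voiced:
        raise ValueError("no voiced frames in the F0 track")
    lowest, highest = min(voiced), max(voiced)
    return {
        "mean_hz": statistics.fmean(voiced),
        "median_hz": statistics.median(voiced),
        "min_hz": lowest,
        "max_hz": highest,
        "range_hz": highest - lowest,
        # Semitones make ranges comparable across low and high voices.
        "range_semitones": 12 * math.log2(highest / lowest),
    }
```

For example, a track whose voiced frames span 100 Hz to 200 Hz has a range of 100 Hz, which is exactly one octave (12 semitones).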

Healthy Range:
Adult males: 85-155 Hz (typical mean around 120-130 Hz)
Adult females: 165-255 Hz (typical mean around 200-220 Hz)
Children: Higher than adults (often >250 Hz), declining to adult levels during adolescence
Older men: Often show slight pitch increases compared to younger men
Older women: Pitch tends to remain relatively stable with age

Clinical Interpretation: Mean and median pitch show the speaker's habitual pitch level. Pitch that deviates from age and gender expectations may indicate functional change or pathology. A reduced pitch range reflects limited vocal flexibility, which may signal vocal fold stiffness or structural abnormalities.


What It Identifies: Pitch measures can flag neurological disorders (monopitch in Parkinson’s), hormonal issues, or unhealthy vocal behaviors. They can also highlight strain/compensation in hyperfunctional voice or weakness in hypofunctional dysphonia.

Frequency Perturbation: Jitter

Definition: Jitter measures cycle-to-cycle variation in the period (and therefore the frequency) of vocal fold vibration. Local Jitter is the average absolute difference between consecutive periods, expressed as a percentage of the mean period. The smoothed variants RAP (Relative Average Perturbation) and PPQ5 (five-point Pitch Perturbation Quotient) compare each period with the average over 3 or 5 neighbouring cycles, respectively.
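The three jitter variants can be sketched in a few lines of Python. This is a minimal illustration that assumes the glottal periods have already been extracted as a list of durations in seconds; the function names are hypothetical, and the formulas follow the commonly published definitions rather than any confirmed Phonalyze implementation.

```python
def local_jitter_percent(periods_s):
    """Mean absolute difference of consecutive periods, relative to the mean period."""
    if len(periods_s) < 2:
        raise ValueError("need at least 2 periods")
    diffs = [abs(a - b) for a, b in zip(periods_s, periods_s[1:])]
    mean_period = sum(periods_s) / len(periods_s)
    return 100 * (sum(diffs) / len(diffs)) / mean_period


def period_perturbation_percent(periods_s, window):
    """RAP (window=3) or PPQ5 (window=5).

    Each period is compared with the average of itself and its
    neighbours inside a centred window of odd length.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number >= 3")
    if len(periods_s) < window:
        raise ValueError("need at least `window` periods")
    half = window // 2
    deviations = []
    for i in range(half, len(periods_s) - half):
        local_mean = sum(periods_s[i - half:i + half + 1]) / window
        deviations.append(abs(periods_s[i] - local_mean))
    mean_period = sum(periods_s) / len(periods_s)
    return 100 * (sum(deviations) / len(deviations)) / mean_period
```

With periods alternating between 10.0 ms and 10.2 ms, local jitter comes out near 2%, while RAP and PPQ5 are smaller because averaging over neighbours absorbs part of the alternation.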

Healthy Range:
Normal voices: Very low jitter, typically below 0.5% during sustained phonation
Children: Higher when very young, decreasing with age (boys: ~1.14% ages 4-6 to ~0.82% ages 10-12; girls: ~0.76% to 0.41% same ages)
Adults: Both males and females often below 0.5-1.0% in comfortable phonation
Older adults: Similar to younger adults, with minimal age effects

Clinical Interpretation: Jitter reflects vocal stability. Higher jitter indicates more irregular vibration, often perceived as roughness or aperiodicity. Abnormal jitter suggests organic or neuromuscular issues and can increase with vocal fatigue or misuse.

Figure: Jitter vs. shimmer (source: ResearchGate)

Amplitude Perturbation: Shimmer

Definition: Shimmer measures cycle-to-cycle variability in amplitude. Local Shimmer is the percent variation between adjacent peak amplitudes. APQ3 and APQ5 (Amplitude Perturbation Quotient) smooth shimmer across 3 or 5 cycles.
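The shimmer measures can be sketched from a list of per-cycle peak amplitudes as follows. The function names are hypothetical and the input is assumed to be strictly positive linear amplitudes; the dB variant is included because many tools report shimmer in both units.

```python
import math


def local_shimmer_percent(amplitudes):
    """Mean absolute difference of adjacent peak amplitudes, relative to the mean amplitude."""
    if len(amplitudes) < 2:
        raise ValueError("need at least 2 cycles")
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return 100 * (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))


def local_shimmer_db(amplitudes):
    """Mean absolute level difference between adjacent cycles, in dB."""
    if len(amplitudes) < 2:
        raise ValueError("need at least 2 cycles")
    steps = [abs(20 * math.log10(b / a)) for a, b in zip(amplitudes, amplitudes[1:])]
    return sum(steps) / len(steps)


def apq_percent(amplitudes, window=3):
    """APQ3 (window=3) or APQ5 (window=5): deviation from a centred moving average."""
    if window < 3 or window % 2 == 0 or len(amplitudes) < window:
        raise ValueError("window must be odd, >= 3, and no longer than the input")
    half = window // 2
    deviations = [
        abs(amplitudes[i] - sum(amplitudes[i - half:i + half + 1]) / window)
        for i in range(half, len(amplitudes) - half)
    ]
    return 100 * (sum(deviations) / len(deviations)) / (sum(amplitudes) / len(amplitudes))
```

For amplitudes alternating between 1.0 and 1.1, every adjacent step is about 0.83 dB, so the dB figure is easy to check by hand.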

Healthy Range:
Children: Varies by age and gender (boys: from ~0.5% at age 4-9 to ~2.1% at age 10-12; girls: ~0.15%)
Adults: Typically below 3-5%
Older adults: Similar to midlife norms

Clinical Interpretation: Shimmer reflects amplitude (loudness) stability. Increased shimmer often accompanies breathiness and air leakage, and listeners tend to perceive high shimmer as a breathy or hoarse voice.

What It Identifies: Increased shimmer links to breathy and hoarse qualities. Pathologies causing incomplete glottal closure (nodules, edema, weak fold tone) or irregular oscillation manifest as higher shimmer. Shimmer helps detect hyperfunction vs. hypofunction, with hypofunctional (breathy) voices often showing higher values.

Noise Measures: HNR and NHR

Definition: Harmonics-to-Noise Ratio (HNR) compares periodic (harmonic) energy with noise energy in the voice and is expressed in decibels (dB). Noise-to-Harmonics Ratio (NHR) is the inverse measure on a linear scale: it rises as vocal noise increases.
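If NHR is taken to be the plain reciprocal of the linear harmonics-to-noise energy ratio, the two measures convert into each other with a logarithm. The sketch below rests on that assumption; some analysis tools define NHR over restricted frequency bands, in which case this conversion is only approximate. The function names are hypothetical.

```python
import math


def hnr_db_from_nhr(nhr):
    """Convert a linear noise-to-harmonics energy ratio to HNR in dB."""
    if nhr <= 0:
        raise ValueError("NHR must be positive")
    return 10 * math.log10(1 / nhr)


def nhr_from_hnr_db(hnr_db):
    """Convert HNR in dB back to a linear noise-to-harmonics energy ratio."""
    return 10 ** (-hnr_db / 10)
```

Under this assumption an NHR of 0.10 corresponds to an HNR of 10 dB, and an HNR of 20 dB corresponds to an NHR of 0.01.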

Healthy Range:
Adults: HNR typically around 15-20 dB or higher; NHR correspondingly low (0.01-0.1)
Children: Slightly lower HNR than adults (mean NHR ~0.10-0.14, i.e., HNR ~9-10 dB)
Gender differences: Females often have slightly higher HNR than males
Older adults: Similar to middle-aged norms, though age-related atrophy can lower HNR slightly

Clinical Interpretation: HNR/NHR index breathiness and noise. High HNR indicates a clean, periodic voice, while low HNR signals turbulent airflow or additive noise (breathiness, hoarseness). Markedly low HNR (e.g., <10 dB) suggests a breathy or strained voice quality.

Voicing Continuity Measures

Definition: Voice breaks count the interruptions in phonation. The percentage of unvoiced frames is the share of analysis frames in which no voicing is detected. Both measures assess the continuity of phonation.
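A simple way to picture both measures is sketched below. The break criterion (a gap between consecutive glottal pulses longer than 1.25 divided by the pitch floor) is an assumption modelled on the rule described for Praat's voice report; the function names, the 75 Hz default pitch floor, and the convention that unvoiced frames are stored as 0 or None are likewise illustrative.

```python
def unvoiced_fraction(f0_frames):
    """Share of analysis frames with no detected voicing (F0 of 0 or None)."""
    if not f0_frames:
        raise ValueError("empty F0 track")
    unvoiced = sum(1 for f in f0_frames if not (f and f > 0))
    return unvoiced / len(f0_frames)


def count_voice_breaks(pulse_times_s, pitch_floor_hz=75.0, gap_factor=1.25):
    """Count gaps between consecutive pulses that exceed gap_factor / pitch_floor_hz."""
    longest_allowed_gap = gap_factor / pitch_floor_hz
    return sum(
        1
        for earlier, later in zip(pulse_times_s, pulse_times_s[1:])
        if later - earlier > longest_allowed_gap
    )
```

With a 75 Hz pitch floor, any silence between pulses longer than about 16.7 ms counts as one break, so a sustained vowel with steady pulses every 10 ms yields zero breaks.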

Healthy Range:
Sustained vowel: Voice break count should be zero or very low
Unvoiced frames: Essentially 0% in sustained phonation
Connected speech: Some voiced-unvoiced transitions expected for unvoiced consonants

Clinical Interpretation: Voice breaks indicate severe phonatory instability. Frequent breaks or high unvoiced fraction suggest a pathological inability to maintain voicing, as seen in creaky or diplophonic voices.

What It Identifies: High voice break counts identify disorders with interrupted phonation, such as spasmodic dysphonia, severe muscle tension dysphonia, and neurological disorders affecting phonation.

Intensity Metrics

Definition: Intensity metrics describe speech signal loudness, typically in decibels (dB). They include mean intensity (average loudness), minimum intensity, and maximum intensity.
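The sketch below shows one way such values can be obtained from raw samples. It assumes the samples are calibrated in pascals when dB SPL is wanted (reference pressure 20 micropascals); without calibration the result is only a relative level. The function names and the choice to average frame levels directly in dB are assumptions for illustration, since tools differ in how they average intensity.

```python
import math


def rms_level_db(samples, reference=1.0):
    """Level of a block of samples in dB relative to `reference`.

    Pass reference=2e-5 with samples in pascals to obtain dB SPL.
    """
    if not samples:
        raise ValueError("empty sample block")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(rms / reference)


def intensity_summary(frame_levels_db):
    """Mean, minimum, and maximum of per-frame levels (simple dB averaging)."""
    if not frame_levels_db:
        raise ValueError("no frames")
    return {
        "mean_db": sum(frame_levels_db) / len(frame_levels_db),
        "min_db": min(frame_levels_db),
        "max_db": max(frame_levels_db),
    }
```

A signal with an RMS pressure of 0.002 Pa, for instance, sits 40 dB above the 20 micropascal reference.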

Healthy Range:
Normal conversation: About 60-70 dB SPL
Sustained vowel: Mean intensity around 60-70 dB
Maximum intensity: Adult males up to ~80-85 dB; adult females around 75-80 dB
Children: Generally speak somewhat softer, with loudness increasing with age
Older adults: Comparable to younger adults, though some age-related hypophonia may occur

Clinical Interpretation: Intensity reflects vocal fold adduction and respiratory support. Low mean intensity may indicate glottal insufficiency or neurological weakness. Excessively high intensity could suggest hyperadduction or strain.

What It Identifies: Abnormal intensity measures detect breathy versus pressed voice quality. Tracking intensity over therapy can quantify gains in projection or reductions in strain.

Pulses

Definition: Pulses are individual glottal cycle markers detected in the signal. Periods are the time intervals between consecutive pulses (essentially instantaneous pitch).

Healthy Range:
Pulse count should correspond to duration divided by period (no missing pulses)
Period durations should be very regular with minimal variation
Variance of period should be very low in healthy phonation
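These checks can be sketched as follows. The code assumes pulse times are available in seconds and uses the median period as the reference cycle length, so that a single dropped pulse does not distort the expected count; both the function name and that design choice are assumptions for illustration.

```python
import statistics


def pulse_regularity(pulse_times_s):
    """Compare the observed pulse count with the count implied by the typical period."""
    if len(pulse_times_s) < 3:
        raise ValueError("need at least 3 pulses")
    periods = [later - earlier
               for earlier, later in zip(pulse_times_s, pulse_times_s[1:])]
    typical_period = statistics.median(periods)
    duration = pulse_times_s[-1] - pulse_times_s[0]
    # A train of N evenly spaced pulses spans N - 1 periods.
    expected_pulses = round(duration / typical_period) + 1
    return {
        "expected_pulses": expected_pulses,
        "missing_pulses": expected_pulses - len(pulse_times_s),
        # Coefficient of variation: period spread relative to the mean period.
        "period_cv": statistics.pstdev(periods) / statistics.fmean(periods),
    }
```

For pulses at 0, 10, 20, 40, and 50 ms, the typical period is 10 ms, six pulses are expected over the 50 ms span, and one is reported missing.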

Clinical Interpretation: Irregular or missing pulses reveal subharmonics or diplophonia. Highly variable period duration indicates unstable phonation.

What It Identifies: These measures catch voicing instabilities like diplophonia or double articulation problems that distort the pulse train.

Voice Quality Metrics Reference Table

| Voice Quality Metric | Definition | Healthy Ranges | Clinical Significance | Identifies |
| --- | --- | --- | --- | --- |
| CPP | Height of dominant cepstral peak above baseline | Adults: ≥14 dB; Children: higher than adults | High CPP = clear voice; Low CPP = dysphonia | Low CPP signals dysphonia (breathy/rough voice) |
| CPPS | CPP measured on running speech | Adults: ≥4.0 dB; Children: higher | As above; sensitive to voice quality in speech | Low CPPS (<4 dB) indicates dysphonia |
| Pitch (F0) | Average fundamental frequency | Children: >250 Hz; Adult male: 85-155 Hz; Adult female: 165-255 Hz | Habitual pitch level | Monopitch suggests neurological disorders; deviations suggest pathology |
| Pitch Range | Difference between highest and lowest F0 | Typical: 1.5-2 octaves | Indicates flexibility | Reduced: Parkinson’s, fatigue; Excessive: tension, dysphonia |
| Jitter | Cycle-to-cycle period variability | Children: 0.5-1.2%; Adults: <0.5% | High jitter = irregular frequency | Elevated jitter indicates aperiodicity/roughness |
| Shimmer | Cycle-to-cycle amplitude variability | Children: 0.1-2%; Adults: ≤3-5% | High shimmer = amplitude instability | Flags breathy/hoarse voice; low shimmer with high F0 may signal pressed phonation |
| HNR | Harmonic vs. noise energy ratio | Adults: 15-20+ dB; Children: 8-12 dB | High HNR = clear phonation; Low HNR = noisy voice | Low HNR (<10 dB) indicates turbulent/breathy voice |
| NHR | Noise-to-harmonics ratio | Adults: 0.01-0.05; Children: 0.10-0.14 | High NHR = increased turbulence | Signals breathiness or hoarseness |
| Voice Breaks | Interruptions in phonation | 0 (ideal) in sustained vowel | Even a few breaks indicate control loss | Identifies severe dysphonia |
| Intensity | Loudness measures (dB) | Conversation: 60-70 dB; Adults comfortable: 60-70 dB | Low = hypophonia; High = hyperfunction | Low intensity = possible fold bowing; high max = potential hyperadduction |
| Pulses/Periods | Count of glottal cycles and timing | Pulses ≈ duration × F0; periods steady | Irregular pulses indicate double pitches | Detects diplophonia |

Healthy vs. Pathological Voice Quality

Voice quality metrics help distinguish healthy voices from dysphonic ones. Typically, healthy voices demonstrate:

  • Low jitter and shimmer values
  • High HNR and CPP measurements
  • Stable voicing with few or no breaks
  • Appropriate pitch for age and gender

Pathological voices deviate from these norms in characteristic patterns:

Breathy voice:

  • Increased shimmer and NHR due to turbulent airflow
  • Reduced CPP/HNR (less harmonic energy)
  • Often lower intensity values

Rough voice:

  • High jitter and shimmer values
  • Low CPP reflecting aperiodic phonation
  • Potentially irregular pulse patterns

Strained/pressed voice:

  • May maintain relatively high HNR (less noise)
  • Often exhibits subtle irregularities (mildly elevated jitter)
  • Usually higher intensity values
  • Potentially elevated pitch

In clinical practice, combinations of metrics provide the most accurate assessment. For example, low CPP combined with high jitter strongly suggests significant dysphonia. By comparing a patient’s profile to healthy norms, clinicians can interpret the specific type of voice disorder present and develop targeted intervention plans.
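The idea of comparing a patient's profile with healthy norms can be sketched as a simple lookup. The reference values below are copied from the adult ranges quoted in this article and serve only to illustrate the comparison; the dictionary keys, the function name `flag_deviations`, and the decision to treat each range as a hard limit are assumptions, not a validated screening rule.

```python
# Adult reference limits taken from the ranges quoted in this article.
# "min" means values below the limit are flagged; "max" means values above it.
ADULT_REFERENCE = {
    "cpp_db": ("min", 14.0),
    "hnr_db": ("min", 15.0),
    "jitter_percent": ("max", 0.5),
    "shimmer_percent": ("max", 5.0),
    "voice_breaks": ("max", 0),
}


def flag_deviations(profile, reference=ADULT_REFERENCE):
    """Return the names of metrics in `profile` that fall outside `reference`."""
    flagged = []
    for name, (kind, limit) in reference.items():
        if name not in profile:
            continue  # metric not measured for this recording
        value = profile[name]
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            flagged.append(name)
    return flagged
```

A profile with a CPP of 9 dB and jitter of 1.8% would be flagged on both metrics, matching the low-CPP-plus-high-jitter pattern described above, while an HNR of 18 dB in the same profile would pass.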

Conclusion

Objective voice measures have become essential tools in the modern speech-language pathologist’s practice. By understanding metrics like CPP, jitter, shimmer, and HNR, clinicians can diagnose more accurately, develop tailored therapy plans, and measure outcomes quantitatively.

The voice analysis capabilities provided by tools like Phonalyze offer a data-driven window into voice function that complements traditional perceptual assessment. This combined approach supports comprehensive evaluation and treatment of voice disorders across all patient populations.

Ready to experience these insights firsthand? Try our demo to analyze your own voice and check your voice quality metrics using Phonalyze.
