Objective Voice Quality Metrics: CPP, Jitter, Shimmer & HNR Explained | Phonalyze

Phonalyze Clinical Team
Reviewed by Cognizn Speech Science Team · Updated June 2025
TL;DR
Objective voice quality metrics — including CPP, jitter, shimmer, HNR, and fundamental frequency — give speech-language pathologists quantitative data that perceptual listening alone cannot provide. Research shows CPP and CPPS are the strongest single acoustic predictors of dysphonia. This guide covers every major metric: its definition, healthy ranges by age and sex, and clinical interpretation. Phonalyze measures all of these remotely, without specialist hardware.

When a speech-language pathologist evaluates a patient’s voice, listening alone leaves too much to subjectivity. Objective voice quality metrics — measurable acoustic parameters derived from recorded voice samples — provide the quantitative foundation for accurate dysphonia diagnosis, evidence-based therapy planning, and reliable outcome tracking. This reference guide explains every major metric used in clinical voice assessment, with healthy ranges, clinical interpretation, and guidance on what each metric identifies.

CPP: Cepstral Peak Prominence, the strongest single acoustic predictor of dysphonia severity (Hillenbrand et al., 1994; Watts & Awan, 2011)
9+: Core acoustic parameters measured by Phonalyze in every remote voice session, with no specialist hardware required

Why Objective Voice Metrics Matter

Traditional voice evaluation relies heavily on perceptual assessment — a trained clinician listens to the patient and rates qualities like roughness, breathiness, and strain on standardized scales such as the CAPE-V (Consensus Auditory-Perceptual Evaluation of Voice) or GRBAS. While perceptual assessment is valuable, it carries inherent subjectivity: two clinicians may rate the same voice differently, and subtle changes over time can be missed without a numerical baseline.

Objective acoustic metrics solve this problem. By analyzing the acoustic signal of a patient’s voice, clinicians gain:

  • Quantitative data on vocal fold vibration patterns that supplements subjective listening
  • Reliable, non-invasive measurements that can be repeated consistently across sessions
  • The ability to distinguish healthy from dysphonic voices with greater precision
  • An objective index of therapy outcomes — tracking subtle improvements week to week
  • An evidence-based foundation for clinical decision-making and documentation

Figure: Objective voice assessment in Phonalyze (clinical dashboard with acoustic waveform, spectrogram, and voice quality metrics). A single recorded voice sample generates CPP, CPPS, jitter, shimmer, HNR, fundamental frequency, intensity, and voice break data, giving clinicians a complete acoustic picture without specialist hardware.

Research evidence: A landmark study by Watts & Awan (2011) published in the Journal of Speech, Language, and Hearing Research demonstrated that CPP (Cepstral Peak Prominence) strongly correlates with perceptual dysphonia ratings on the CAPE-V. Hillenbrand et al. (1994) had earlier established cepstral peak prominence as a robust acoustic correlate of breathy vocal quality. These findings have positioned CPP and CPPS as primary acoustic metrics in clinical voice practice.

CPP and CPPS (Cepstral Measures)

Cepstral measures are currently the most clinically validated objective metrics for detecting and quantifying dysphonia. They capture the overall periodicity of the voice signal in a single number — making them powerful screening and outcome tools.

Cepstral Measure
CPP — Cepstral Peak Prominence

CPP measures the height of the dominant cepstral peak above a regression line fitted to the cepstrum. It directly reflects the clarity and strength of the harmonic structure in the voice — a high CPP means the voice has strong, regular periodicity; a low CPP means the harmonics are weak or masked by noise.

Healthy range
Sustained vowels: ≥ 14 dB (measured with Praat)
Children: Higher than adults
Clinical threshold
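Reported cutoffs fall between about 9 and 14 dB depending on the analysis software and settings; CPP below the applicable cutoff consistently identifies dysphonic voice quality across multiple studies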
What it identifies
Breathy or rough voice quality; general dysphonia severity across all disorder types
Cepstral Measure
CPPS — Smoothed Cepstral Peak Prominence

CPPS is CPP computed after the cepstrum has been smoothed across time and quefrency. The smoothing removes rapid fluctuations, which makes the measure stable enough for running connected speech as well as sustained vowels. Measured on connected speech, it is sensitive to voice quality variation in naturalistic speech, the way a patient actually communicates in daily life.

Healthy range
Running speech: ≥ 4.0 dB
Children: Generally higher than adults
Clinical threshold
CPPS below 4.0 dB indicates dysphonia in conversational speech contexts
What it identifies
Dysphonia that may not appear during sustained phonation but emerges in connected speech
Clinical insight: CPP and CPPS are sensitive across all dysphonia types — breathy, rough, and strained. Unlike jitter and shimmer, which are unreliable in severely dysphonic voices, cepstral measures remain valid even when voice quality is markedly disordered. This makes them particularly valuable as primary metrics in both initial assessment and therapy outcome tracking.
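
The CPP definition above (height of the dominant cepstral peak above a regression line fitted to the cepstrum) can be sketched for a single analysis frame as follows. This is a minimal illustration, not the Praat or Phonalyze implementation: the function name `cpp_db`, the 60–330 Hz search range, the Hann window, and fitting the regression line only over the search band are all assumptions made for the example, and absolute values from this sketch are not comparable to the clinical cutoffs quoted in this guide.

```python
import numpy as np

def cpp_db(frame, sr, f0_min=60.0, f0_max=330.0):
    """Simplified cepstral peak prominence (dB) for one analysis frame."""
    windowed = frame * np.hanning(len(frame))
    # Log-magnitude spectrum in dB (small offset avoids log of zero).
    log_spec_db = 20.0 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)
    # Cepstrum: inverse transform of the log spectrum, again expressed in dB.
    cepstrum_db = 20.0 * np.log10(np.abs(np.fft.irfft(log_spec_db)) + 1e-12)
    quefrency = np.arange(len(cepstrum_db)) / sr  # seconds
    # Search only quefrencies that correspond to plausible pitch periods.
    band = (quefrency >= 1.0 / f0_max) & (quefrency <= 1.0 / f0_min)
    q, c = quefrency[band], cepstrum_db[band]
    # Straight regression line through the cepstrum in that band.
    slope, intercept = np.polyfit(q, c, 1)
    peak = int(np.argmax(c))
    # Prominence = peak height above the regression line at the peak quefrency.
    return float(c[peak] - (slope * q[peak] + intercept))

# Synthetic check: a harmonic-rich 120 Hz tone versus white noise.
sr = 16000
t = np.arange(2048) / sr
rng = np.random.default_rng(0)
noise = rng.standard_normal(t.size)
voiced = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 41)) + 0.01 * noise
cpp_voiced = cpp_db(voiced, sr)
cpp_noise = cpp_db(noise, sr)
print(f"harmonic tone: {cpp_voiced:.1f} dB, noise: {cpp_noise:.1f} dB")
```

The strongly periodic signal should yield the larger prominence, which mirrors the clinical reading: clear harmonic structure gives a high CPP, noise-dominated phonation a low one. A smoothed variant (CPPS) would average cepstra across neighbouring frames and quefrency bins before the peak search.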

Pitch (Fundamental Frequency — F0)

Figure: Normal vs. disordered pitch patterns. A healthy voice (left) shows natural F0 variation appropriate to speech context. A disordered voice (right) may show monopitch, abnormal pitch range, or irregular F0 contour, each pointing to different underlying pathology.
Pitch Measure
Fundamental Frequency (F0) — Mean, Median & Range

Fundamental frequency describes the rate of vocal fold vibration measured in Hertz. Clinical reports typically include mean F0 (average across the sample), median F0, and F0 range (difference between highest and lowest pitch). Together these describe habitual pitch level and vocal flexibility.

Healthy ranges
Adult males: 85–155 Hz (mean ~120–130 Hz)
Adult females: 165–255 Hz (mean ~200–220 Hz)
Children: >250 Hz, declining through adolescence
Normal pitch range
Typical speakers produce a range of 1.5–2 octaves. Reduced range suggests rigidity; excessive range may indicate tension or compensatory behavior.
What it identifies
Monopitch (Parkinson’s disease, depression); abnormal habitual pitch (puberphonia, hormonal changes, nodules); reduced flexibility
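
Because pitch range is quoted in octaves while F0 is measured in Hertz, the conversion is worth making explicit: one octave is a doubling of frequency, so the range in octaves is the base-2 logarithm of the ratio between the highest and lowest F0. The short sketch below illustrates this; the function names and the example values are hypothetical.

```python
import math

def pitch_range_octaves(f0_low_hz, f0_high_hz):
    """Pitch range in octaves between the lowest and highest F0."""
    return math.log2(f0_high_hz / f0_low_hz)

def pitch_range_semitones(f0_low_hz, f0_high_hz):
    """Same range in semitones (12 semitones per octave)."""
    return 12.0 * pitch_range_octaves(f0_low_hz, f0_high_hz)

# Example: a speaker whose F0 spans 100 Hz to 300 Hz.
octaves = pitch_range_octaves(100.0, 300.0)
print(f"{octaves:.2f} octaves, {pitch_range_semitones(100.0, 300.0):.1f} semitones")
```

A 100–300 Hz span works out to roughly 1.58 octaves, which sits inside the 1.5–2 octave range described above.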

Jitter (Frequency Perturbation)

Perturbation Measure
Jitter — Cycle-to-Cycle Frequency Variability

Jitter quantifies how much the duration of each glottal cycle varies from cycle to cycle. Local jitter is the average absolute difference between adjacent periods, expressed as a percentage of the mean period. Smoothed variants, RAP (Relative Average Perturbation, averaged over 3 cycles) and PPQ5 (five-point Pitch Perturbation Quotient, averaged over 5 cycles), reduce sensitivity to transient noise, making them more reliable in clinical recordings.

Healthy ranges
Adults (M & F): < 0.5–1.0% during comfortable phonation
Children (4–6 yrs): Boys ~1.14%, Girls ~0.76%
Children (10–12 yrs): Boys ~0.82%, Girls ~0.41%
Clinical interpretation
Higher jitter = more irregular frequency variation, perceived as roughness or aperiodicity. Abnormal jitter suggests organic or neuromuscular pathology. Increases with vocal fatigue and misuse.
What it identifies
Rough voice quality; organic disorders (nodules, polyps); neuromuscular disorders; vocal fatigue from hyperfunctional use
Reliability note: Jitter measures become unreliable when voice quality is severely dysphonic — because the algorithm cannot accurately track individual glottal cycles in highly aperiodic signals. In these cases, cepstral measures (CPP/CPPS) should be the primary metric, with jitter used as supplementary information (Maryn et al., 2010, Journal of Voice).
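
The three jitter variants described above can be sketched directly from a list of glottal period durations. The code assumes the periods have already been extracted by a pitch tracker; the function names are illustrative, and the results are returned as fractions (multiply by 100 for percent).

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference between adjacent periods / mean period."""
    p = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(p))) / np.mean(p))

def perturbation_quotient(periods, window):
    """Mean absolute deviation of each period from the average of the
    `window` periods centred on it, divided by the mean period.
    window=3 gives RAP, window=5 gives PPQ5."""
    p = np.asarray(periods, dtype=float)
    half = window // 2
    smoothed = np.convolve(p, np.ones(window) / window, mode="valid")
    centred = p[half : len(p) - half]
    return float(np.mean(np.abs(centred - smoothed)) / np.mean(p))

# Example: six period durations in seconds (about 100 Hz).
periods = [0.0100, 0.0101, 0.0099, 0.0100, 0.0102, 0.0098]
jit = local_jitter(periods)
rap = perturbation_quotient(periods, 3)
print(f"local jitter {100 * jit:.2f}%, RAP {100 * rap:.2f}%")
```

For these values local jitter is 2%, while RAP is smaller because each period is compared with a three-cycle average rather than with its immediate neighbour.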

Shimmer (Amplitude Perturbation)

Perturbation Measure
Shimmer — Cycle-to-Cycle Amplitude Variability

Shimmer measures how much the amplitude (loudness) of each glottal cycle varies from cycle to cycle. Local shimmer is the average absolute difference between the peak amplitudes of adjacent cycles, expressed as a percentage of the mean amplitude. APQ3 and APQ5 (Amplitude Perturbation Quotient) average shimmer across 3 or 5 cycles respectively, providing smoother estimates.

Healthy ranges
Adults: ≤ 3–5%
Children (4–9 yrs, boys): ~0.5%
Children (10–12 yrs, boys): ~2.1%
Girls: ~0.15% across age groups
Clinical interpretation
High shimmer indicates amplitude instability, perceived as breathiness or hoarseness. Often reflects incomplete glottal closure or irregular oscillation from structural lesions.
What it identifies
Breathy/hoarse quality; incomplete glottal closure (nodules, edema, bowed folds); hypofunctional vs. hyperfunctional dysphonia patterns
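
As a rough illustration of how shimmer is obtained from a waveform, the sketch below takes the peak amplitude of each glottal cycle and then computes local shimmer and an amplitude perturbation quotient. It assumes the cycle boundaries (sample indices) are already known from a pitch tracker; the function names and the synthetic test signal are hypothetical.

```python
import numpy as np

def cycle_peak_amplitudes(signal, cycle_starts):
    """Peak absolute amplitude inside each cycle [start, next start)."""
    return np.array([np.max(np.abs(signal[a:b]))
                     for a, b in zip(cycle_starts[:-1], cycle_starts[1:])])

def local_shimmer(amps):
    """Mean absolute difference between adjacent amplitudes / mean amplitude."""
    a = np.asarray(amps, dtype=float)
    return float(np.mean(np.abs(np.diff(a))) / np.mean(a))

def apq(amps, window):
    """Amplitude perturbation quotient over `window` cycles (3 or 5)."""
    a = np.asarray(amps, dtype=float)
    half = window // 2
    smoothed = np.convolve(a, np.ones(window) / window, mode="valid")
    return float(np.mean(np.abs(a[half : len(a) - half] - smoothed)) / np.mean(a))

# Synthetic vowel: 10 cycles of 80 samples whose amplitude alternates 1.0 / 0.9.
per_cycle = np.tile([1.0, 0.9], 5)
signal = np.repeat(per_cycle, 80) * np.sin(2 * np.pi * np.arange(800) / 80)
starts = np.arange(0, 801, 80)
amps = cycle_peak_amplitudes(signal, starts)
shim = local_shimmer(amps)
apq3 = apq(amps, 3)
print(f"local shimmer {100 * shim:.1f}%, APQ3 {100 * apq3:.1f}%")
```

The alternating amplitudes give a local shimmer of about 10.5%, well above the adult range quoted above, and a lower APQ3 because of the three-cycle averaging.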

HNR and NHR (Noise Measures)

Noise Measure
HNR — Harmonics-to-Noise Ratio

HNR expresses the ratio of periodic (harmonic) energy to aperiodic (noise) energy in the voice signal, measured in decibels (dB). A high HNR means the voice is dominated by regular harmonic vibration; a low HNR means turbulent airflow or noise predominates. Its inverse, NHR (Noise-to-Harmonics Ratio), rises as vocal noise increases — providing the same clinical information from the opposite direction.

Healthy ranges
Adults: HNR ≥ 15–20 dB; NHR 0.01–0.05
Children: HNR ~9–10 dB (NHR ~0.10–0.14)
Females often show slightly higher HNR than males
Clinical threshold
In adults, HNR below 10 dB strongly suggests breathy or hoarse voice quality. Markedly low HNR indicates significant turbulent airflow.
What it identifies
Breathiness; hoarseness; incomplete glottal closure; turbulent airflow from structural or functional voice disorders
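
The decibel figures above follow from a simple energy ratio, sketched below. The conversion between NHR and HNR treats NHR as the plain reciprocal of the harmonic-to-noise energy ratio, as this guide describes it; that is an assumption of the example, since analysis packages may compute NHR over specific frequency bands, in which case the conversion is only approximate. The function names are illustrative.

```python
import math

def hnr_db(harmonic_energy, noise_energy):
    """Harmonics-to-noise ratio in dB from the two energy components."""
    return 10.0 * math.log10(harmonic_energy / noise_energy)

def nhr(harmonic_energy, noise_energy):
    """Noise-to-harmonics ratio as a plain (unitless) ratio."""
    return noise_energy / harmonic_energy

def hnr_db_from_nhr(nhr_value):
    """HNR in dB implied by an NHR value, assuming NHR is the exact inverse."""
    return -10.0 * math.log10(nhr_value)

# A signal with 99% harmonic and 1% noise energy:
print(f"HNR = {hnr_db(99.0, 1.0):.2f} dB, NHR = {nhr(99.0, 1.0):.4f}")
# The adult NHR range 0.01-0.05 quoted above, expressed in dB:
print(f"{hnr_db_from_nhr(0.05):.1f} to {hnr_db_from_nhr(0.01):.1f} dB")
```

Under this simplification, an NHR of 0.01–0.05 corresponds to roughly 13–20 dB, which is broadly in line with the adult HNR range given above.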

Voicing Continuity and Voice Breaks

Continuity Measure
Voice Breaks & Unvoiced Frames

Voice break count tracks interruptions in phonation — moments where voicing stops and restarts unexpectedly during a sustained vowel or connected speech. The unvoiced frames percentage captures the proportion of analysis frames where no phonation is detected.

Healthy range
Sustained vowel: 0 voice breaks (ideal); unvoiced frames: essentially 0% during sustained phonation
Clinical interpretation
Even a single voice break during a sustained vowel is clinically significant. Frequent breaks indicate severe phonatory instability and loss of laryngeal motor control.
What it identifies
Spasmodic dysphonia (characteristic voice breaks); severe muscle tension dysphonia; diplophonia; neurological disorders affecting phonation
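
Both continuity measures can be derived from a frame-by-frame voicing decision. The sketch below assumes a boolean array with one entry per analysis frame (True where phonation was detected) and counts a voice break as any unvoiced gap between two voiced frames, ignoring silence before the first and after the last voiced frame. The function names and the `min_gap_frames` parameter are hypothetical, and analysis tools apply their own break criteria.

```python
import numpy as np

def unvoiced_fraction(voiced):
    """Proportion of analysis frames in which no phonation was detected."""
    v = np.asarray(voiced, dtype=bool)
    return float(1.0 - v.mean())

def count_voice_breaks(voiced, min_gap_frames=1):
    """Number of unvoiced gaps (at least min_gap_frames long) that lie
    between two voiced frames."""
    v = np.asarray(voiced, dtype=bool)
    idx = np.flatnonzero(v)
    if idx.size < 2:
        return 0
    gaps = np.diff(idx) - 1  # unvoiced frames between consecutive voiced frames
    return int(np.sum(gaps >= min_gap_frames))

# Example voicing track: leading silence, two internal gaps, trailing silence.
track = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
print(count_voice_breaks(track), count_voice_breaks(track, min_gap_frames=2))
print(f"{100 * unvoiced_fraction(track):.1f}% unvoiced frames")
```

Raising `min_gap_frames` is one way to avoid counting single-frame tracking dropouts as clinically meaningful breaks.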

Intensity Metrics

Intensity Measure
Mean, Minimum & Maximum Intensity (dB)

Intensity metrics describe the loudness of the speech signal in decibels (dB SPL). Mean intensity reflects average loudness during the sample; minimum and maximum capture the dynamic range of vocal output. These parameters reflect the combined effect of subglottal air pressure, vocal fold adduction, and vocal tract configuration.

Healthy ranges
Normal conversation & sustained vowel: 60–70 dB SPL
Maximum intensity: males ~80–85 dB; females ~75–80 dB
Clinical interpretation
Low mean intensity may indicate glottal insufficiency or neurological weakness (hypophonia in Parkinson’s disease). Excessively high intensity suggests hyperadduction or vocal strain.
What it identifies
Breathy vs. pressed voice; vocal fold bowing or weakness; hyperadduction; respiratory support deficits; Parkinson’s-related hypophonia
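
For reference, sound pressure level is 20·log10 of the RMS pressure relative to 20 µPa. The sketch below computes minimum, overall, and maximum intensity from a pressure signal. It assumes the samples are calibrated in pascals; recordings from an uncalibrated microphone only support relative comparisons, not absolute dB SPL. The function names and frame length are illustrative.

```python
import numpy as np

P_REF = 20e-6  # reference pressure in pascals for dB SPL

def spl_db(pressure_pa):
    """Sound pressure level (dB SPL) of a calibrated pressure signal."""
    rms = np.sqrt(np.mean(np.square(pressure_pa)))
    return float(20.0 * np.log10(rms / P_REF))

def intensity_summary(pressure_pa, frame_len):
    """Return (minimum frame level, overall level, maximum frame level)."""
    x = np.asarray(pressure_pa, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    levels = [spl_db(f) for f in frames]
    return min(levels), spl_db(x), max(levels)

# Synthetic example: half a second at 0.02 Pa RMS, then half a second at 0.002 Pa RMS.
sr = 8000
t = np.arange(sr) / sr
tone = np.sqrt(2) * np.sin(2 * np.pi * 100 * t)  # unit-RMS 100 Hz tone
signal = np.concatenate([0.02 * tone[:4000], 0.002 * tone[4000:]])
quietest, overall, loudest = intensity_summary(signal, frame_len=4000)
print(f"min {quietest:.1f}, overall {overall:.1f}, max {loudest:.1f} dB SPL")
```

An RMS pressure of 0.02 Pa corresponds to 60 dB SPL, the lower end of the conversational range given above. The overall level here is energy-based, so it sits closer to the louder segment than a plain average of the two dB values would.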

Pulses and Periods

Cycle Measure
Glottal Pulses & Period Duration

Pulses are individual glottal cycle markers detected in the acoustic signal. Periods are the time intervals between consecutive pulses; each period is the reciprocal of the instantaneous F0 at that cycle. Analysis of pulse regularity and period variance provides a fine-grained view of phonatory stability that complements the averaged metrics described above.

Healthy range
Pulse count ≈ recording duration × F0; period durations should be highly regular with very low variance
Clinical interpretation
Irregular or missing pulses reveal subharmonics or diplophonia. Highly variable period duration indicates unstable phonation.
What it identifies
Diplophonia (double pitch); double articulation problems; voicing instabilities that distort the glottal pulse train
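
The relationships above (periods as intervals between pulses, pulse count ≈ duration × F0, low period variance in stable phonation) can be checked with a few lines of code. The sketch assumes pulse times in seconds from a pitch tracker; the function name, the returned field names, and the 1.5 × median rule for flagging a possibly missing pulse are assumptions made for illustration.

```python
import numpy as np

def pulse_summary(pulse_times, duration_s):
    """Basic regularity statistics for a glottal pulse train."""
    times = np.asarray(pulse_times, dtype=float)
    periods = np.diff(times)              # seconds between consecutive pulses
    mean_f0 = 1.0 / periods.mean()        # Hz
    return {
        "n_pulses": int(times.size),
        "expected_pulses": float(duration_s * mean_f0),
        "period_cv": float(periods.std() / periods.mean()),
        # A period much longer than the median suggests a missing pulse.
        "long_periods": int(np.sum(periods > 1.5 * np.median(periods))),
    }

# One second of perfectly regular 100 Hz phonation, then the same with one pulse removed.
steady = np.arange(100) / 100.0
dropped = np.delete(steady, 50)
steady_stats = pulse_summary(steady, duration_s=1.0)
dropped_stats = pulse_summary(dropped, duration_s=1.0)
print(steady_stats)
print(dropped_stats)
```

In the regular train the coefficient of variation of the periods is essentially zero and the pulse count matches duration × F0; removing a single pulse produces one flagged long period and a visibly higher period variance.
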

Complete Voice Quality Metrics Reference Table

The table below provides a quick-reference summary of all core voice quality metrics used in clinical acoustic voice analysis. All ranges represent values for comfortable sustained phonation unless otherwise noted.

| Metric | Definition | Healthy range (adults) | Children | Clinical significance | Identifies |
| --- | --- | --- | --- | --- | --- |
| CPP | Height of dominant cepstral peak above baseline regression | ≥ 14 dB (sustained vowel) | Higher than adults | High = clear voice; Low = dysphonia | Dysphonia severity (breathy/rough voice) |
| CPPS | Smoothed CPP; range given for running connected speech | ≥ 4.0 dB | Generally higher | Sensitive to voice quality in naturalistic speech | Dysphonia in conversational speech contexts |
| Pitch (F0) | Average fundamental frequency of voicing | M: 85–155 Hz; F: 165–255 Hz | > 250 Hz, declining with age | Habitual pitch level; deviations suggest pathology | Monopitch (Parkinson’s); hormonal/structural changes |
| Pitch range | Difference between highest and lowest F0 in sample | 1.5–2 octaves | Variable | Reduced = rigidity; Excessive = tension | Parkinson’s; vocal fatigue; hyperfunctional patterns |
| Jitter | Cycle-to-cycle period (frequency) variability | < 0.5–1.0% | 4–6 yrs: ~0.76–1.14%; 10–12 yrs: ~0.41–0.82% | High jitter = irregular frequency, roughness | Aperiodicity; roughness; organic/neuromuscular disorders |
| Shimmer | Cycle-to-cycle amplitude variability | ≤ 3–5% | Boys 4–9 yrs: ~0.5%; 10–12 yrs: ~2.1% | High shimmer = amplitude instability | Breathiness/hoarseness; incomplete glottal closure |
| HNR | Harmonic vs. noise energy ratio (dB) | 15–20+ dB | ~9–10 dB | High = clear phonation; < 10 dB = significant noise | Turbulent/breathy voice; structural lesions |
| NHR | Inverse of HNR (noise-to-harmonics ratio) | 0.01–0.05 | ~0.10–0.14 | Rising NHR = increasing turbulence | Breathiness; hoarseness; additive noise |
| Voice breaks | Count of interruptions in sustained phonation | 0 (sustained vowel) | 0 (sustained vowel) | Any breaks = phonatory instability | Spasmodic dysphonia; severe MTD; neurological disorders |
| Intensity (mean) | Average loudness of speech signal (dB SPL) | 60–70 dB SPL | Softer; increases with age | Low = hypophonia; High = hyperadduction | Vocal fold bowing; Parkinson’s hypophonia; strain |
| Pulses / Periods | Glottal cycle markers and inter-cycle timing | Regular, count ≈ duration × F0 | Regular | Irregular/missing pulses = voicing instability | Diplophonia; double articulation; severe aperiodicity |

Healthy vs. Pathological Voice Profiles

In clinical practice, no single metric diagnoses a voice disorder. The most accurate interpretation comes from reading the pattern of multiple metrics together, as a voice profile. The two lists below contrast a healthy profile with the general pattern seen in dysphonic voices:

Healthy voice profile
  • CPP ≥ 14 dB (sustained); CPPS ≥ 4 dB (running)
  • Jitter < 0.5–1.0% in adults
  • Shimmer ≤ 3–5% in adults
  • HNR ≥ 15–20 dB
  • Zero or near-zero voice breaks
  • Pitch and intensity appropriate for age and sex
Dysphonic voice patterns
  • CPP below 9–14 dB threshold (all types)
  • Elevated jitter (roughness/aperiodicity)
  • Elevated shimmer (breathiness/hoarseness)
  • HNR below 10 dB (turbulent airflow)
  • Voice breaks present (spasmodic dysphonia, severe MTD)
  • Intensity abnormally low (hypophonia) or high (hyperfunction)

The three clinical dysphonia types each produce characteristic metric signatures:

Breathy voice
  • Increased shimmer and NHR (turbulent airflow)
  • Reduced CPP and HNR (less harmonic energy)
  • Often lower mean intensity
  • Causes: nodules, edema, bowed folds, incomplete closure
Rough voice
  • High jitter and shimmer (irregular vibration)
  • Low CPP (aperiodic phonation)
  • Irregular pulse patterns
  • Causes: nodules, polyps, laryngitis, neuromuscular disorders
Strained / pressed voice
  • Relatively maintained HNR (less noise)
  • Mildly elevated jitter; subtle CPP reduction
  • Higher mean intensity and elevated pitch
  • Causes: muscle tension dysphonia, hyperfunctional use
Clinical principle: Low CPP combined with high jitter strongly suggests significant dysphonia. Low CPP combined with high shimmer and low HNR points toward a breathy disorder pattern. By comparing a patient’s full metric profile to these characteristic signatures and to age- and sex-appropriate norms, clinicians can identify the likely disorder type and design targeted intervention — and then track objective improvement session by session.
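
To make the idea of reading a profile concrete, the sketch below encodes the adult reference values from this guide as simple checks and applies the two pattern rules from the clinical principle above. It is an illustration of the reasoning only, not a diagnostic tool and not Phonalyze's logic: the function name, the exact cutoffs chosen from the quoted ranges (CPP 14 dB, jitter 1.0%, shimmer 5%, HNR 15 dB), and the flag wording are all assumptions.

```python
def screen_profile(cpp_db, jitter_pct, shimmer_pct, hnr_db, voice_breaks=0):
    """Return a list of flags for an adult sustained-vowel metric profile."""
    low_cpp = cpp_db < 14.0          # healthy range quoted as >= 14 dB
    high_jitter = jitter_pct > 1.0   # upper end of the < 0.5-1.0% range
    high_shimmer = shimmer_pct > 5.0 # upper end of the <= 3-5% range
    low_hnr = hnr_db < 15.0          # healthy range quoted as >= 15-20 dB

    flags = []
    if low_cpp:
        flags.append("low CPP")
    if high_jitter:
        flags.append("elevated jitter")
    if high_shimmer:
        flags.append("elevated shimmer")
    if low_hnr:
        flags.append("low HNR")
    if voice_breaks > 0:
        flags.append("voice breaks present")

    # Pattern rules taken from the clinical principle in the text.
    if low_cpp and high_jitter:
        flags.append("pattern: significant dysphonia")
    if low_cpp and high_shimmer and low_hnr:
        flags.append("pattern: breathy")
    return flags

healthy = screen_profile(cpp_db=16.0, jitter_pct=0.4, shimmer_pct=2.5, hnr_db=21.0)
breathy = screen_profile(cpp_db=8.0, jitter_pct=0.6, shimmer_pct=7.0, hnr_db=9.0)
print(healthy)
print(breathy)
```

In practice the cutoffs would need to match the patient's age and sex and the analysis software in use, and the flags would support, not replace, perceptual and laryngeal examination.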

Frequently Asked Questions

What is a normal CPP value?
For sustained vowels measured with Praat, a CPP value of 14 dB or above indicates healthy voice quality. For running connected speech, CPPS values of 4.0 dB or higher are considered normal. CPP values below 9–14 dB are consistently associated with dysphonia across multiple peer-reviewed studies. Children typically show higher CPP values than adults due to stronger harmonic energy relative to noise.

What is the difference between jitter and shimmer?
Jitter measures cycle-to-cycle variability in frequency (pitch), that is, how much the duration of each glottal cycle changes from one cycle to the next. Shimmer measures cycle-to-cycle variability in amplitude (loudness). Both reflect vocal fold vibration stability but capture different dimensions of it. High jitter is perceived as roughness or aperiodicity; high shimmer is associated with breathiness or hoarseness. Normal adult jitter is below roughly 0.5–1.0% and normal adult shimmer is below 3–5% during sustained phonation.

What does a low HNR mean?
A low Harmonics-to-Noise Ratio (HNR) means the voice signal contains more noise energy relative to periodic harmonic energy. In healthy adults, HNR is typically 15–20 dB or higher. In adults, an HNR below 10 dB strongly indicates a breathy or hoarse voice quality, often associated with incomplete glottal closure, vocal nodules, Reinke’s edema, or other pathologies that allow turbulent airflow through the glottis during phonation.

Which acoustic metric is the best predictor of dysphonia?
Research consistently shows that CPP and CPPS are the strongest single acoustic predictors of dysphonia severity, correlating closely with perceptual ratings on the CAPE-V and GRBAS scales (Watts & Awan, 2011; Heman-Ackah et al., 2003). HNR, jitter, and shimmer provide complementary information about the specific dysphonia type; for example, high jitter indicates roughness while high shimmer indicates breathiness. Used together as a metric profile, these measures allow clinicians to characterize dysphonia type and severity with greater precision than perceptual assessment alone.

What is a normal fundamental frequency (F0)?
Normal fundamental frequency (F0) ranges are approximately 85–155 Hz for adult males (mean around 120–130 Hz) and 165–255 Hz for adult females (mean around 200–220 Hz). Children have higher F0 values, often above 250 Hz, which gradually decline to adult levels through adolescence. Significant deviations from age- and sex-appropriate norms may indicate vocal pathology, structural changes to the vocal folds, or neurological involvement.

Can Phonalyze measure these voice quality metrics remotely?
Yes. Phonalyze is a HIPAA-compliant, browser-based voice analysis platform that measures CPP, CPPS, jitter, shimmer, HNR, NHR, fundamental frequency, voice breaks, intensity, and glottal pulse analysis, all entirely remotely. Patients record voice samples from home via a secure SMS link. No app download or specialist recording hardware is required. Clinicians access complete acoustic reports and can compare metrics across sessions to track therapy outcomes objectively.

How do voice quality metrics change with therapy?
Effective voice therapy produces measurable improvements in acoustic metrics. CPP and CPPS values typically increase as voice quality improves, reflecting better harmonic clarity and more stable vibration. HNR rises as noise energy decreases relative to harmonics. Jitter and shimmer values tend to decrease toward normal ranges as vocal fold contact becomes more regular and complete. These objective metric changes closely track perceptual improvement ratings by clinicians, validating the therapy approach and providing motivating, visible evidence of progress for patients.

What does a high voice break count indicate?
A high voice break count (interruptions in phonation during sustained vowel production) indicates severe phonatory instability. In healthy adults, sustained phonation should produce zero or near-zero voice breaks. Even a single voice break during a sustained vowel is clinically significant. High break counts are characteristic of spasmodic dysphonia (where laryngeal spasms interrupt phonation), severe muscle tension dysphonia, and neurological disorders such as essential tremor or stroke affecting laryngeal motor control.


Clinical References & Sources

  1. Watts, C.R. & Awan, S.N. (2011). Use of spectral/cepstral analyses for differentiating normal from hypofunctional voices in sustained vowel and continuous speech contexts. Journal of Speech, Language, and Hearing Research, 54(6), 1525–1537. doi:10.1044/1092-4388
  2. Hillenbrand, J., Cleveland, R.A., & Erickson, R.L. (1994). Acoustic correlates of breathy vocal quality. Journal of Speech and Hearing Research, 37(4), 769–778. doi:10.1044/jshr.3704.769
  3. Maryn, Y., Roy, N., De Bodt, M., Van Cauwenberge, P., & Corthals, P. (2009). Acoustic measurement of overall voice quality: A meta-analysis. Journal of the Acoustical Society of America, 126(5), 2619–2634. doi:10.1121/1.3224706
  4. Heman-Ackah, Y.D., Michael, D.D., & Goding, G.S. (2003). The relationship between cepstral peak prominence and selected parameters of dysphonia. Journal of Voice, 16(1), 20–27. doi:10.1016/S0892-1997
  5. American Speech-Language-Hearing Association (ASHA). Voice Disorders — Clinical Portal. ASHA, 2023.
  6. Boersma, P. & Weenink, D. (2024). Praat: doing phonetics by computer [Computer program]. Version 6.4. Retrieved from praat.org
  7. Phonalyze Blog. Remote Voice Analysis Tool for Speech Pathologists. Phonalyze.com, 2025.