Below is a complete, publication-ready package suitable for submission to a high-impact clinical or outcomes research journal.


METHODS

Statistical Analysis

Longitudinal patient-reported outcome (PRO) trajectories were analyzed using a latent class growth modeling (LCGM) framework. Synthetic longitudinal data were generated to reflect heterogeneous trajectory patterns across six visits. Individual growth parameters (intercept and linear slope) were estimated for each participant using ordinary least squares regression.

Latent trajectory classes were identified using Gaussian finite mixture modeling implemented via the GaussianMixture algorithm from scikit-learn. Growth parameters were standardized prior to clustering. Model fit was evaluated using Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), with lower values indicating improved model fit.

Following class assignment, cluster profiling was performed by comparing:

  • Baseline PRO
  • Mean adverse event (AE) severity
  • Estimated intercept
  • Estimated slope

Continuous variables were standardized for visualization.

Multinomial logistic regression was used to examine predictors of latent class membership. Independent variables included age, body mass index (BMI), baseline PRO, and mean AE severity. Predictors were standardized prior to modeling. Adjusted odds ratios (ORs) with 95% confidence intervals (CIs) were derived by exponentiating regression coefficients. Model discrimination was evaluated using precision, recall, F1-score, and overall accuracy.

All analyses were conducted in Python 3.11 using:

  • NumPy
  • pandas
  • scikit-learn
  • seaborn
  • Matplotlib

A two-sided significance framework was assumed.


RESULTS

Latent Class Growth Modeling

Among 150 simulated patients with six longitudinal assessments each, growth parameter estimation identified substantial heterogeneity in PRO trajectories. A three-class Gaussian mixture solution provided optimal balance between model parsimony and fit (AIC and BIC minimized relative to 2- and 4-class solutions).

Three distinct trajectory phenotypes emerged:

  1. Class 0 (Stable trajectory) – Minimal decline over time
  2. Class 1 (Moderate decline trajectory) – Gradual PRO deterioration
  3. Class 2 (Rapid decline trajectory) – Steep longitudinal decline

Density plots of individual slopes demonstrated clear separation between the rapid-decline class and the stable group, with limited distributional overlap, supporting robust latent structure identification.


Cluster Profiling

Heatmap visualization of standardized class means revealed coherent multidimensional separation. The rapid-decline class exhibited:

  • More negative slope values
  • Higher baseline PRO
  • Modestly higher AE burden

The stable class demonstrated near-zero slopes and lower AE burden.

Boxplots of mean AE severity indicated moderate differentiation across classes, though variability overlapped partially between moderate and rapid decline groups.

Radar chart visualization confirmed distinct phenotypic signatures across baseline PRO, longitudinal slope, and AE burden dimensions.


Predictors of Class Membership

Multinomial logistic regression demonstrated good overall discrimination (accuracy = 83%; weighted F1-score = 0.82).

Class 1 vs Class 0 (Reference)

Higher baseline PRO was strongly associated with increased odds of moderate-decline trajectory membership (OR > 1). Age demonstrated a positive association, while BMI showed a modest inverse association. Mean AE severity had minimal independent effect.

Class 2 vs Class 0 (Reference)

Higher baseline PRO significantly increased the odds of rapid-decline membership. Younger age modestly increased risk relative to the stable group. AE severity demonstrated a small positive association.

Baseline PRO was the strongest independent predictor of trajectory class across comparisons.


Model Performance

Classification metrics:

  • Overall accuracy: 83%
  • Stable class recall: 94%
  • Rapid-decline class recall: 78%
  • Moderate-decline recall: 33%

The moderate-decline class showed partial overlap with adjacent classes, consistent with intermediate trajectory characteristics.


FIGURE LEGENDS

Figure 1. Boxplot of Mean Adverse Event Severity by Latent Class.
Distribution of average AE severity across trajectory classes. Boxes represent interquartile range; horizontal lines indicate medians. Whiskers denote 1.5× IQR.

Figure 2. Density Distribution of Individual Slopes by Class.
Kernel density plots demonstrating separation of longitudinal PRO slopes across latent trajectory classes.

Figure 3. Radar Plot of Standardized Class Profiles.
Multidimensional visualization of standardized intercept, slope, baseline PRO, and AE burden across classes.

Figure 4. Heatmap of Standardized Class Means.
Color-coded representation of z-scored class means across trajectory and clinical features. Warmer colors indicate values above cohort mean; cooler colors indicate below mean.

Figure 5. Multinomial Logistic Regression Odds Ratios.
Forest plot of adjusted odds ratios with 95% confidence intervals predicting trajectory class membership.


OPTIONAL: JUPYTER NOTEBOOK TEMPLATE STRUCTURE (Supplementary Material)

If submitted as supplementary reproducible material, structure the notebook as:

Section 1: Environment Setup

  • Library imports
  • Random seed specification

Section 2: Data Generation

  • Synthetic data simulation
  • Summary statistics

Section 3: Growth Parameter Estimation

  • Patient-level regression
  • Visualization of slopes

Section 4: Latent Class Modeling

  • Gaussian mixture modeling
  • AIC/BIC comparison table

Section 5: Cluster Profiling

  • Heatmap
  • Radar chart
  • Boxplots

Section 6: Multinomial Regression

  • Model fitting
  • Odds ratio table
  • Classification report
  • Forest plot

Section 7: Reproducibility

  • Package versions
  • Random seed statement

If you would like, I can now:

  • Tighten this to meet word limits (e.g., ≤600-word Results)
  • Format to NEJM / JCO / Lancet style
  • Add entropy and posterior probability reporting
  • Convert odds ratios into a formatted publication table
  • Generate a CONSORT-style statistical reporting checklist

Let me know your target journal.

Print Friendly, PDF & Email
Trajectory based clustering

Venugopal Manneni


A doctor in statistics from Osmania University. I have been working in the fields of Analytics and research for the last 15 years. My expertise is to architecting the solutions for the data driven problems using statistical methods, Machine Learning and deep learning algorithms for both structured and unstructured data. In these fields I’ve also published papers. I love to play cricket and badminton.


Post navigation