Interpreting Results¶
The pipeline produces diagnostic and summary plots organised into five categories. All per-scenario plots are generated from pre-computed per-replicate statistics and downsampled data (capped at 100,000 points) to avoid loading full phenotype files at plot time. Per-scenario plots are compiled into atlas PDFs.
Regenerating plots
Force-rebuild a scenario's atlas:
Reading the atlas¶
Start with these checks before diving into individual plot details:
| Check | What to look at | Why it matters |
|---|---|---|
| Pedigree structure | pedigree_counts.*, family_structure.* |
Confirms the simulated relationship mix, family sizes, and sampling/dropout shape |
| Liability structure | cross_trait.*, heritability.by_generation.*, additive_shared.by_generation.* |
Confirms the realized A/C/E components and cross-trait liability correlations |
| Phenotype separation | liability_violin.phenotype.*, liability_components.by_generation.* |
Shows whether affected individuals are enriched for higher liability as expected |
| Incidence and censoring | cumulative_incidence.*, censoring.*, censoring_cascade.* |
Separates true event-time behavior from age-window and death censoring effects |
| Familial signal | tetrachoric.phenotype.*, parent_offspring_liability.by_generation.* |
Shows whether observed familial correlations track the latent liability structure |
| Validation summaries | Folder-level validation plots | Compares observed simulation metrics to configured or theoretical expectations across scenarios |
The sections below describe each plot in detail. They are intentionally specific enough to support debugging and methods writeups; for routine quality control, the table above is usually the fastest path through the atlas.
Distribution Plots¶
Source: simace.plotting.plot_distributions.
Generated for Weibull frailty phenotypes.
Mortality rate by decade (death_age_distribution.png)¶
Two-panel figure (14 \(\times\) 5 in).
Left panel -- Mortality Rate by Decade.
- Bars: Per-decade mortality rate (deaths in decade / alive at start of decade), averaged across replicates. Semi-transparent blue fill with black edges.
- x-axis: Age decade labels (e.g., "0-10", "10-20", ...).
- y-axis: Mortality rate (proportion).
Right panel -- Cumulative Mortality by Decade.
- Bars: Cumulative mortality \(F(t) = 1 - \prod_{d \leq t}(1 - r_d)\), same style as left.
- Text annotations: Above each bar, survival probability \(S = 1 - F(t)\) formatted as "S=0.XX".
Age-at-onset and death-age histograms (trait_phenotype.png)¶
A \(2 \times 2\) grid (14 \(\times\) 10 in), rows = traits 1 and 2.
Left column -- Affected individuals.
- Histogram: Density histogram (50 bins) of observed time-to-event (
t_observed) for individuals with \(\delta = 1\). Orange fill (C3), black edges. - Title: "Trait \(k\): Age at Onset (affected)".
Right column -- Death-censored unaffected.
- Histogram: Density histogram of observed time for individuals with \(\delta = 0\) and death-censored = True. Blue fill (
C0), black edges. - Title: "Trait \(k\): Age at Death (death-censored, unaffected)".
Liability vs. age-at-onset (trait_regression.png)¶
Side-by-side joint plots (16 \(\times\) 7 in), one per trait. Each joint plot has:
- Central scatter: Liability (\(x\)) vs. observed age-at-onset (\(y\)) for affected individuals. Translucent blue points (\(\alpha = 0.05\), size 3, rasterised).
- Regression line: Orange (
C3) line from pre-computed slope and intercept, averaged across replicates. - \(R^2\) annotation: Upper-left corner, "R\(^2\) = X.XXXX".
- Marginal histograms: Top (liability) and right (age-at-onset), 50 bins, no edges.
- Panel title: "Trait \(k\)".
Cumulative incidence curves (cumulative_incidence.png)¶
Two-panel figure (14 \(\times\) 5 in), shared \(y\)-axis, one panel per trait.
- Blue solid line ("Observed"): Mean cumulative incidence from censored data across replicates, with shaded min-max band when \(>1\) replicate.
- Grey solid line ("True"): Mean cumulative incidence from uncensored event times, with shaded band.
- Grey dashed crosshairs: Horizontal line at half the observed lifetime prevalence; vertical line at the age when 50% of lifetime cases have occurred. An orange dot (
C3) marks the intersection. - Arrow annotation: "50% of cases by age \(X\)" pointing to the crosshair intersection.
- Text box (upper-left, white background): "Affected: \(X.X\)%", "True prev: \(X.X\)%", "Censored: \(X.X\)%".
- Legend: Lower-right, showing "True" (grey) and "Observed" (blue) line styles.
Censoring windows by generation (censoring_windows.png)¶
Grid of panels: rows = traits (\(\times 2\)), columns = generations (\(\times N\)). Panel size \(5N \times 8\) in.
- Grey solid line ("True"): True cumulative incidence within this generation, with grey fill to baseline.
- Blue solid line ("Observed"): Observed cumulative incidence, with blue fill. Min-max bands across replicates when \(>1\) rep.
- Text box (upper-left, white background): "Affected: \(X.X\)%", "Left-cens: \(X.X\)%", "Right-cens: \(X.X\)%", "Death-cens: \(X.X\)%".
- Column titles: "Gen \(g\)" with observation window annotation "[lo, hi]" when available.
- Row labels: "Trait \(k\) / Cumulative Incidence".
- Legend (top-right panel): Grey = True, Blue = Observed.
Liability Plots¶
Source: simace.plotting.plot_liability.
Generated for Weibull frailty phenotypes.
Cross-trait liability joint plots (liability_joint.png)¶
\(2 \times 2\) grid of joint plots (14 \(\times\) 12 in). Panels: Liability, \(A\) (Additive genetic), \(C\) (Common environment), \(E\) (Unique environment).
- Central scatter: Trait 1 (\(x\)) vs. Trait 2 (\(y\)) for the given component. Blue points (\(\alpha = 0.05\), size 3, rasterised).
- Marginal histograms: Top and right, 50 bins, semi-transparent.
- Pearson \(r\) annotation: Upper-left of scatter, "r = X.XXXX".
- Axis labels: "\(\text{Component}\) (Trait 1)" and "\(\text{Component}\) (Trait 2)".
Liability joint plots coloured by affection (liability_joint_affected.png)¶
Same \(2 \times 2\) layout as above, but with affection-status colouring based on trait 1:
- Blue points (\(\alpha = 0.03\)): Unaffected individuals.
- Orange points (\(\alpha = 0.15\),
C3): Affected individuals (trait 1). - Marginal histograms: Stacked blue (unaffected) and orange (affected).
- Figure legend (upper-right): Circle markers labelled "Unaffected" (blue) and "Affected (T1)" (orange).
- Pearson \(r\) annotation: Same as above.
Liability violin plots by affection status (liability_violin.png)¶
Single figure (8 \(\times\) 6 in) with split violins.
- Split violins: One per trait (\(x\)-axis). Left half = unaffected, right half = affected (coloured by seaborn hue).
- Diamond markers (\(\blacklozenge\), black): Mean liability for each group, positioned slightly left (unaffected) or right (affected) of centre.
- Text annotations: "\(\mu = X.XX\)" next to each diamond.
- Prevalence annotations: Below each trait, "Prevalence: \(X.X\)%" in italic.
- Legend: Seaborn-generated hue legend for Affected True/False.
Liability violin plots by generation (liability_violin_by_generation.png)¶
Grid: rows = traits (\(\times 2\)), columns = recorded generations. Size \(4N \times 8\) in.
- Split violins: Affected vs. unaffected within each generation, trimmed at data range (
cut=0). - Diamond markers and \(\mu\) annotations: Same as above, per panel.
- x-axis label per panel: "prev: \(X.X\)%" showing observed generation-specific prevalence.
- Column titles: "Gen \(g\) (oldest)" for the first column, "Gen \(g\) (youngest)" for the last.
- Legend: Upper-right panel only.
Joint affection heatmap (joint_affection.png)¶
Single figure (7 \(\times\) 6 in).
- \(2 \times 2\) heatmap (Blues colourmap): Rows = Trait 2 (Affected/Unaffected), Columns = Trait 1 (Affected/Unaffected).
- Cell annotations: Two-line labels showing proportion ("0.XX") and count ("(n=XXXXX)").
- Colour bar: Right side, labelled "Proportion".
- Title: "Joint Affected Status (Weibull) [\(\text{scenario}\)]" with subtitle showing:
r_tet = X.XXX: tetrachoric correlation on censored binary affected status.r_weibull = X.XXX: cross-trait liability correlation estimated from uncensored Weibull survival data (oracle reference).(stratified: X.XXX): generation-stratified Weibull estimate -- computes per-generation cross-trait correlations and combines via inverse-variance weighting (\(\hat{r} = \sum w_g r_g / \sum w_g\) where \(w_g = 1/\text{SE}_g^2\)). This reduces bias from heterogeneous censoring across generations.(naive: X.XXX): unweighted pooled censored Weibull estimate for comparison (biased when censoring varies by generation). All values averaged across replicates.
Correlation Plots¶
Source: simace.plotting.plot_correlations.
Generated for Weibull frailty phenotypes.
Tetrachoric correlations by relationship type (tetrachoric_sibling.png)¶
Two-panel figure (16 \(\times\) 6 in), one per trait. \(y\)-axis range: \([-0.1, 1.1]\).
- Coloured violins: Distribution of tetrachoric correlations (computed from censored binary affected status) across replicates, one per relationship type (as labelled on the x-axis). These represent the correlations estimable from observed data after censoring.
- Black dots: Individual per-replicate tetrachoric correlation estimates, jittered horizontally (\(\pm 0.08\)).
- Black dashed horizontal segments: Mean Pearson liability correlation for each pair type (averaged across replicates). This is the ground-truth correlation computed directly on the continuous latent liability values, serving as the theoretical reference.
- Green dash-dot horizontal segments (when available): Mean uncensored Weibull pairwise survival-time correlation. This shows what the correlation would be without censoring distortion.
- Pair count annotations: "N=\(X\)" above each violin (mean pairs per replicate).
- Legend (upper-right): "Liability r" (black dashed), "Weibull r (uncensored)" (green dash-dot, when present).
- x-axis labels: Relationship type names, rotated 15\(^\circ\).
- Interpretation: The gap between the violins and the black dashed lines reflects attenuation due to censoring and the dichotomization inherent in the tetrachoric estimate. The green dash-dot lines show the intermediate effect of censoring alone (before dichotomization).
Tetrachoric correlations by generation (tetrachoric_by_generation.png)¶
Grid: rows = traits (\(\times 2\)), columns = generations. Size \(5N \times 10\) in.
- Violins, dots, dashed segments, pair counts: Same encoding as above (violins = observed tetrachoric correlations, black dashed = true liability correlations, dots = per-replicate estimates), but computed within each generation separately.
- x-axis labels: Relationship type names, rotated 30\(^\circ\), smaller font.
- Column titles: "Gen \(g\)".
- Legend (upper-right of top-right panel): "Liability r" (black dashed).
Parent-offspring liability regressions (parent_offspring_liability.png)¶
Grid: rows = traits (\(\times 2\)), columns = last 3 non-founder generations. Size \(5N \times 8\) in.
- Scatter: Midparent liability (\(x\)) vs. offspring liability (\(y\)). Blue points (\(\alpha = 0.15\), size 3, rasterised).
- Orange regression line (
C3): Least-squares fit through the scatter, linewidth 2. - Text box (upper-left, white background): "r = X.XXX" (Pearson correlation, averaged across replicates) and "n = XXXXX" (mean pair count).
- Column titles: "Gen \(g\)".
- Row labels: "Trait \(k\) / Offspring Liability" (\(y\)-axis), "Midparent Liability" (\(x\)-axis, bottom row only).
Narrow-sense heritability by generation (heritability.by_generation.png)¶
\(1 \times 2\) figure (10 \(\times\) 5 in), one panel per trait.
- Blue dots: Per-replicate \(h^2 = \text{Var}(A) / (\text{Var}(A) + \text{Var}(C) + \text{Var}(E))\) for each generation, computed from per-generation variance components in
validation.yaml. - Orange dashed line: Configured heritability (\(A_k\) parameter), the expected \(h^2\).
- \(y\)-axis: \([0, 1]\), labelled \(h^2 = \text{Var}(A) / \text{Var}(L)\).
- \(x\)-axis: Generation number.
- Interpretation: Stable \(h^2\) across generations confirms that the ACE variance decomposition is maintained through the simulation. Founders (generation 1) have variance components set exactly; subsequent generations should converge to equivalent values under random mating.
Source: simace.plotting.plot_correlations.
Data from validation.yaml -> per_generation.
Broad-sense heritability by generation (broad_heritability.by_generation.png)¶
\(1 \times 2\) figure (10 \(\times\) 5 in), one panel per trait.
- Blue dots: Per-replicate \(H^2 = (\text{Var}(A) + \text{Var}(C)) / (\text{Var}(A) + \text{Var}(C) + \text{Var}(E))\) for each generation.
- Orange dashed line: Parametric value (\(A_k + C_k\)).
- \(y\)-axis: \([0, 1]\), labelled \(H^2 = (\text{Var}(A)+\text{Var}(C)) / \text{Var}(L)\).
- \(x\)-axis: Generation number.
- Interpretation: Comparing \(H^2\) with the narrow-sense \(h^2\) above isolates the contribution of shared environment to familial resemblance.
Source: simace.plotting.plot_correlations.
Data from validation.yaml -> per_generation.
Threshold Output¶
Every scenario also writes phenotype.simple_ltm.parquet, a binary
liability-threshold phenotype generated by simace.phenotyping.threshold.
This file is useful as a benchmark or export target when a pure threshold
phenotype is needed. The current simACE atlas is generated from
phenotype_stats.yaml and phenotype_samples.parquet, so separate Simple LTM
diagnostic plots are not part of the default scenario atlas.
Validation Summary Plots¶
Source: simace.plotting.plot_validation.
Generated from validation_summary.tsv across all scenarios in a folder. Seaborn "whitegrid" theme with "Set2" palette.
All strip plots share a common encoding:
- Blue dots (
C0, \(\alpha = 0.7\), jitter 0.15): Per-replicate observed values, one column per scenario on the \(x\)-axis. - Orange dash markers (
C1,_marker, size 200, linewidth 3): Expected value from configuration or theory, one per scenario. - \(x\)-axis labels: Rotated 45\(^\circ\) when \(>4\) scenarios.
Variance components (variance_components.png)¶
\(2 \times 3\) grid (width scales with scenario count, height 10 in). Rows = traits 1 and 2. Columns = \(A\), \(C\), \(E\).
- Blue dots: Observed founder-generation variance for each component.
- Orange dashes: Configured variance parameter.
- \(y\)-axis: "Variance Proportion".
Twin rate (twin_rate.png)¶
Single panel (width scales, height 5 in).
- Blue dots: Observed MZ twin individual rate per replicate.
- Orange dashes: Configured
p_mztwin. - \(y\)-axis: "Twin Rate".
- Title: "MZ Twin Rate: Observed vs Expected".
A-component correlations (correlations_A.png)¶
\(2 \times 2\) grid.
- Panel 1: MZ twin \(A_1\) correlation. Expected = 1.0.
- Panel 2: DZ (full-sibling) \(A_1\) correlation. Expected = 0.5.
- Panel 3: Half-sibling \(A_1\) correlation. Expected = 0.25.
- Panel 4: Midparent-offspring \(A_1\) \(R^2\). Expected = 0.5.
Each panel: blue dots + orange dashes + orange dashed horizontal reference line at the expected value.
Phenotype (liability) correlations (correlations_phenotype.png)¶
\(2 \times 2\) grid. Expected values are computed per-scenario from configured variance components:
- Panel 1: MZ twin liability\(_1\) correlation. Expected = \(A_1 + C_1\).
- Panel 2: DZ sibling liability\(_1\) correlation. Expected = \(0.5 A_1 + C_1\).
- Panel 3: Half-sibling liability\(_1\) correlation. Expected = \(0.25 A_1\).
- Panel 4: Midparent-offspring liability\(_1\) slope. Expected = \(A_1\).
Blue dots + orange dashes (scenario-specific expected, computed from config).
Heritability estimates (heritability_estimates.png)¶
\(2 \times 2\) grid. Rows = traits 1 and 2. Columns = estimation method.
- Left: Falconer's \(\hat{h}^2\) vs. configured \(A_k\).
- Right: Midparent-offspring liability slope vs. configured \(A_k\).
Blue dots + orange dashes.
Half-sibling proportions (half_sib_proportions.png)¶
\(1 \times 2\) figure.
- Left panel: Observed maternal half-sibling pair proportion. Orange dashes at expected \(1 - (1 - p_{\text{nonsocial}})^2\).
- Right panel: Proportion of offspring with at least one half-sibling (no expected reference).
Cross-trait correlations (cross_trait_correlations.png)¶
\(1 \times 3\) figure.
- Panel 1: Observed \(r_A\) vs. configured
rA. Blue dots + orange dashes. - Panel 2: Observed \(r_C\) vs. configured
rC. Blue dots + orange dashes. - Panel 3: Observed \(r_E\). No configured expected; orange dashed horizontal line at 0 (theoretical independence).
Family size (family_size.png)¶
Single panel (width scales, height 5 in).
- Blue dots (left-offset): Mean offspring per mother (among mothers with \(\geq 1\) child).
- Orange dots (
C3, right-offset): Mean offspring per father (among fathers with \(\geq 1\) child). - Orange dashes (centred): Expected ~2.0 (N / n_mothers for balanced sex ratio).
- Legend: "Mother" (blue circle), "Father" (orange circle), "Expected mean offspring" (orange dash).
Summary bias (summary_bias.png)¶
\(2 \times 3\) grid.
- Strip plots: Observed \(-\) expected for six metrics: \(A_1\) bias, \(C_1\) bias, \(E_1\) bias, twin rate bias, DZ \(A_1\) correlation bias (vs. 0.5), half-sibling \(A_1\) correlation bias (vs. 0.25).
- Red dashed horizontal line: Reference at 0 (no bias).
- \(y\)-axis: Bias (observed \(-\) expected).
Runtime (runtime.png)¶
Single log-log scatter (8 \(\times\) 6 in).
- Coloured dots: One per replicate, coloured by scenario (Set2 palette). \(x\) = population size \(N\), \(y\) = simulation wall-clock seconds.
- Axes: Log-scale with scalar (non-scientific) tick formatting.
- Legend: Scenario names.
Memory usage (memory.png)¶
Single log-log scatter (8 \(\times\) 6 in).
- Coloured dots: Same encoding as Runtime. \(y\) = peak resident set size (MB).
- Legend: Scenario names.