simace.analysis¶
validate¶
simace.analysis.validate
¶
ACE simulation validation.
Validates simulation outputs for structural integrity, statistical properties, and heritability estimates.
validate_structural
¶
Validate structural integrity of the pedigree.
Checks contiguous IDs, valid parent references, sex-parent consistency, and balanced sex ratio.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Pedigree DataFrame with columns id, sex, mother, father. |
| params | Scenario parameters; requires keys … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict of check-name to result dicts (keys: passed, details, …). |
Source code in simace/analysis/validate.py
validate_twins
¶
Validate MZ twin properties for two-trait simulation.
Checks bidirectional twin pointers, shared parents, identical A values
and sex for MZ pairs, and that the observed twin rate matches the
expected rate 2 * p_mztwin * eligible_fraction.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Pedigree DataFrame. |
| params | Scenario parameters; requires key … |
| df_indexed | Pedigree DataFrame indexed by … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict of check-name to result dicts. |
Source code in simace/analysis/validate.py
validate_half_sibs
¶
Validate half-sibling structure under the mating-pair model.
Reports observed counts and proportions of full-sib, maternal half-sib, and paternal half-sib pairs as informational checks. With a zero-truncated Poisson mating model, both maternal and paternal half-sibs arise naturally when individuals have multiple partners.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Pedigree DataFrame with columns id, mother, father, twin. |
| params | Scenario parameters; requires key … |
| sibling_pairs | Dict with keys … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict of check-name to result dicts. |
Source code in simace/analysis/validate.py
validate_consanguineous_matings
¶
Detect consanguineous matings and reconcile grandparent-link discrepancy.
When pair_partners() randomly pairs individuals, half-siblings (or
full siblings) may be matched. Their offspring have fewer than 4
distinct grandparents, which reduces the grandparent-grandchild pair
count relative to the naive expectation of 4 × n_eligible.
This check:
1. Identifies all mating pairs where partners share one or both parents.
2. Computes the expected and observed grandparent-grandchild pair counts.
3. Verifies that the discrepancy is fully explained by consanguineous matings.

| PARAMETER | DESCRIPTION |
|---|---|
| df | Pedigree DataFrame with columns id, mother, father. |
| params | Scenario parameters (accepted for API consistency). |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict of check-name to result dicts. |
Source code in simace/analysis/validate.py
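The three steps above can be sketched in plain Python. This is only an illustration, not simACE's implementation: the helper names are hypothetical, the pedigree is assumed to be a dict mapping id to (mother, father), and founders are assumed to carry -1 for both parents (the convention mentioned for compute_mean_family_size).

```python
def find_consanguineous_matings(pedigree):
    """Return (mother, father, n_shared_parents) for related mating pairs.

    `pedigree` maps id -> (mother, father); -1 marks a missing parent
    (assumed founder convention).
    """
    # Unique mating pairs are the (mother, father) tuples of non-founders.
    matings = {(m, f) for m, f in pedigree.values() if m != -1 and f != -1}
    flagged = []
    for mother, father in sorted(matings):
        mother_parents = set(pedigree.get(mother, (-1, -1))) - {-1}
        father_parents = set(pedigree.get(father, (-1, -1))) - {-1}
        shared = mother_parents & father_parents
        if shared:
            flagged.append((mother, father, len(shared)))
    return flagged


def count_grandparent_links(pedigree):
    """Count distinct (grandparent, grandchild) pairs in the pedigree."""
    total = 0
    for mother, father in pedigree.values():
        grandparents = set()
        for parent in (mother, father):
            if parent != -1:
                grandparents |= set(pedigree.get(parent, (-1, -1))) - {-1}
        total += len(grandparents)
    return total


# Toy pedigree: 4 and 5 are full sibs, 6 is a paternal half-sib of both.
toy = {
    1: (-1, -1), 2: (-1, -1), 3: (-1, -1),
    4: (1, 2), 5: (1, 2), 6: (3, 2),
    7: (4, 5),  # offspring of full sibs: 2 distinct grandparents
    8: (6, 5),  # offspring of half sibs: 3 distinct grandparents
}
flagged = find_consanguineous_matings(toy)
observed_links = count_grandparent_links(toy)
```

In the toy pedigree the naive expectation is 4 links for each of the two grandchildren; the shortfall equals the number of shared parents summed over the flagged matings, which is the reconciliation that step 3 describes.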
validate_statistical
¶
Validate statistical properties of variance components for two traits.
Checks founder variances for A, C, E against configured values, total variance close to 1.0, cross-trait correlations (rA, rC, rE), C sharing within households, and E independence between siblings.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Pedigree DataFrame with variance-component columns A1, C1, E1, A2, C2, E2. |
| params | Scenario parameters; requires keys … |
| df_indexed | Pedigree DataFrame indexed by … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict of check-name to result dicts. |
Source code in simace/analysis/validate.py
validate_heritability
¶
Validate heritability estimates for two-trait simulation.
Computes MZ twin and DZ sibling liability correlations, Falconer
heritability estimates h² = 2(r_MZ - r_DZ), and midparent-offspring
regressions, comparing each to expected values derived from the
configured A parameters.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Pedigree DataFrame. |
| params | Scenario parameters; requires keys … |
| df_indexed | Pedigree DataFrame indexed by … |
| sibling_pairs | Dict with keys … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict of check-name to result dicts, including MZ/DZ correlations, Falconer estimates, and parent-offspring regression slopes. |
Source code in simace/analysis/validate.py
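The Falconer estimate described above is simple enough to work through by hand. The sketch below is an illustration with hypothetical helper names; the expected correlations use the textbook ACE identities under random mating (r_MZ = A + C, r_DZ = A/2 + C), which are an assumption here and not taken from the simACE source.

```python
def falconer_h2(r_mz, r_dz):
    """Falconer heritability estimate h2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


def expected_twin_correlations(a, c):
    """Expected liability correlations under a standard ACE model.

    Assumes random mating and unit total variance: MZ pairs share all of A
    and C, DZ pairs (or full sibs) share half of A and all of C.
    """
    return {"r_mz": a + c, "r_dz": 0.5 * a + c}


expected = expected_twin_correlations(a=0.5, c=0.2)
h2 = falconer_h2(expected["r_mz"], expected["r_dz"])  # recovers A
```

With A = 0.5 and C = 0.2 the expected correlations are 0.7 and 0.45, and the Falconer estimate returns the configured A, which is the comparison the validator makes against observed values.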
compute_per_generation_stats
¶
Compute per-generation statistics for two traits.
For each generation, computes liability mean/variance/sd and per-component (A, C, E) mean/variance for both traits.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Pedigree DataFrame with columns id, A1, C1, E1, A2, C2, E2. |
| params | Scenario parameters; requires keys … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … summary statistics (n, liability mean/variance/sd, component mean/var). |
Source code in simace/analysis/validate.py
validate_population
¶
Validate population-level properties.
Checks that each generation has exactly N individuals, the number of
generations equals G_ped, and the mean offspring per mother is
approximately N / n_females (always ~2.0 for balanced sex ratios).
| PARAMETER | DESCRIPTION |
|---|---|
| df | Pedigree DataFrame with columns id and mother. |
| params | Scenario parameters; requires keys … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict of check-name to result dicts. |
Source code in simace/analysis/validate.py
compute_family_size_distribution
¶
Compute offspring count distributions per parent sex.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Pedigree DataFrame with columns mother and father. |
| params | Scenario parameters (unused but accepted for API consistency). |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict with keys … of summary statistics (mean, median, std, n_parents). Empty dict if no non-founders exist. |
Source code in simace/analysis/validate.py
validate_assortative_mating
¶
Validate mate correlation on liability when assortative mating is configured.
Extracts unique mating pairs from non-founders, computes Pearson
correlation of mother and father liability for each trait, and checks
against the configured assort1 / assort2 parameters.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Pedigree DataFrame. |
| params | Scenario parameters; uses keys … |
| df_indexed | Pedigree DataFrame indexed by … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict of check-name to result dicts. |
Source code in simace/analysis/validate.py
validate_effective_size
¶
Validate Ne observed-vs-expected for the eight estimators.
Each estimator entry in stats["effective_size"] (as written by
simace.analysis.stats.compute_effective_size) supplies an
expected field (None under non-standard configs) and a
scalar ne. A check passes when either expected is None
(vacuous) or abs(ne / expected − 1) < 0.20.

| PARAMETER | DESCRIPTION |
|---|---|
| stats | Loaded … |
| params | Per-rep params.yaml dict (unused, accepted for parity with other validators). |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed on estimator name with … |
Source code in simace/analysis/validate.py
run_validation
¶
Run all validation checks and return results.
Loads a pedigree and its parameters, then runs structural, twin, half-sibling, statistical, heritability, and population checks.
| PARAMETER | DESCRIPTION |
|---|---|
| pedigree_path | Path to the pedigree parquet file. |
| params_path | Path to the scenario parameters YAML file. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Nested dict with keys … and … |
Source code in simace/analysis/validate.py
cli
¶
Command-line interface for running validation.
Source code in simace/analysis/validate.py
stats¶
simace.analysis.stats
¶
Compute per-rep phenotype statistics for downstream plotting.
Reads a single phenotype.parquet and produces:
- phenotype_stats.yaml: scalar and array statistics
- phenotype_samples.parquet: downsampled rows for scatter/histogram plots

Public API is re-exported from focused sub-modules:
- .tetrachoric: tetrachoric primitives
- .correlations: pairwise relationship correlations, parent-offspring regressions, observed h² estimators, mate correlation
- .incidence: prevalence, mortality, cumulative incidence, regression, joint affection
- .censoring: censoring windows, confusion, cascade, person-years
- .pedigree: family size, parent presence
- .sampling: per-rep downsampling for plots
- .runner: main and cli entry point
compute_censoring_cascade
¶
Per-trait, per-generation decomposition of true cases by censoring fate.
Source code in simace/analysis/stats/censoring.py
compute_censoring_confusion
¶
Compute per-trait 2x2 confusion matrix: true affected vs. observed affected.
Only includes individuals from phenotyped generations (observation window > 0).
Source code in simace/analysis/stats/censoring.py
compute_censoring_windows
¶
Compute per-generation censoring breakdown and incidence curves.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with generation, event time, and censoring columns. |
| censor_age | Maximum observation age. |
| gen_censoring | Dict mapping generation to … |
| n_points | Number of age grid points for incidence curves. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] \| None | Dict with per-generation censoring stats, or None if no generation column. |
Source code in simace/analysis/stats/censoring.py
compute_person_years
¶
Compute person-years of follow-up, total and per-trait at-risk.
For each individual in generation g with observation window [lo, hi]:
- Total follow-up ends at min(death_age, hi).
- Trait-specific at-risk time ends at min(t_observed, death_age, hi).
Returns dict with total_person_years and per-trait person_years_at_risk.
Source code in simace/analysis/stats/censoring.py
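The two end-point rules can be illustrated with a small pure-Python sketch. Several details are assumptions made for the example and are not confirmed by the source: follow-up is taken to start at lo, negative durations are clipped to zero, and the record layout (generation, death_age, and a per-trait t_observed dict) is hypothetical.

```python
def person_years_sketch(individuals, windows):
    """Sum follow-up time in total and per trait.

    `individuals` is a list of dicts with keys "generation", "death_age",
    and "t_observed" (trait -> observed event time); `windows` maps
    generation -> (lo, hi). Both layouts are illustrative assumptions.
    """
    total = 0.0
    at_risk = {}
    for person in individuals:
        lo, hi = windows[person["generation"]]
        # Total follow-up ends at death or at the window's upper bound.
        end_total = min(person["death_age"], hi)
        total += max(0.0, end_total - lo)
        for trait, t_obs in person["t_observed"].items():
            # At-risk time additionally stops at the observed event.
            end_trait = min(t_obs, person["death_age"], hi)
            at_risk[trait] = at_risk.get(trait, 0.0) + max(0.0, end_trait - lo)
    return {"total_person_years": total, "person_years_at_risk": at_risk}


cohort = [
    {"generation": 0, "death_age": 70.0, "t_observed": {1: 50.0, 2: 90.0}},
    {"generation": 0, "death_age": 100.0, "t_observed": {1: 85.0, 2: 60.0}},
]
summary = person_years_sketch(cohort, {0: (0.0, 80.0)})
```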
compute_affected_correlations
¶
Compute Pearson correlations on binary affected status per pair type and trait.
This is the phi coefficient — Pearson r on {0, 1} data — and is the input
to observed-scale Falconer-style h² estimators (e.g. 2·(r_MZ − r_FS)).
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with … |
| seed | Random seed (unused, kept for API consistency). |
| pairs | Pre-extracted relationship pairs. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … None (if fewer than 10 pairs, or either side is constant). |
Source code in simace/analysis/stats/correlations.py
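Because the phi coefficient is just Pearson r applied to 0/1 data, it can be sketched without any dependencies. The function below is a hypothetical stand-in, not the library routine; it mirrors the documented None cases (fewer than 10 pairs, or a constant side).

```python
import math


def phi_coefficient(x, y, min_pairs=10):
    """Pearson r on two equal-length 0/1 sequences, or None if undefined."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < min_pairs:
        return None
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # one side is constant, so r is undefined
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Affected status of the two members of each of 10 pairs.
member_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
member_b = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
phi = phi_coefficient(member_a, member_b)
```

For this 2x2 table (2 concordant affected, 6 concordant unaffected, 1 discordant each way) the value agrees with the closed form (2·6 − 1·1) / sqrt(3·7·3·7) = 11/21.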
compute_cross_trait_tetrachoric
¶
Compute cross-trait tetrachoric correlations (trait 1 vs trait 2).
Includes same-person, same-person-by-generation, and cross-person (across relationship pair types) correlations.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with binary affection columns for both traits. |
| seed | Random seed (unused, kept for API consistency). |
| pairs | Pre-extracted relationship pairs. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict with keys … and … |
Source code in simace/analysis/stats/correlations.py
compute_liability_correlations
¶
Compute Pearson liability correlations per pair type and trait.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with liability columns. |
| seed | Random seed (unused, kept for API consistency). |
| pairs | Pre-extracted relationship pairs. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … |
Source code in simace/analysis/stats/correlations.py
compute_mate_correlation
¶
Compute 2x2 Pearson correlation matrix between mated pairs' liabilities.
Each unique (mother, father) pair is counted once (not weighted by offspring). Only non-founders are considered.
Source code in simace/analysis/stats/correlations.py
compute_observed_h2_estimators
¶
Derive five naive observed-scale h² estimators from precomputed correlations.
Reads from stats["affected_correlations"] (phi r per pair type) and
stats["parent_offspring_affected_corr"] (PO regression slope on binary).
Each estimator is a closed-form combination that, under a liability-threshold
model, is an unbiased estimator of h²_liab · z(K)²/(K(1−K)) — i.e. the
observed-scale h² — where K is the affected-status prevalence.
| PARAMETER | DESCRIPTION |
|---|---|
| stats | The in-progress stats dict with … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed … float or None: … |
Source code in simace/analysis/stats/correlations.py
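The scale factor z(K)²/(K(1−K)) quoted above can be evaluated with the standard library. In this sketch z(K) is read as the standard normal density at the liability threshold for prevalence K, which is the usual liability-threshold convention and an assumption about the source's notation; the function names are hypothetical.

```python
from statistics import NormalDist


def observed_scale_factor(prevalence):
    """Return z(K)^2 / (K * (1 - K)) for prevalence K in (0, 1)."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    std_normal = NormalDist()
    # Threshold t such that P(liability > t) = K.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)
    return z * z / (prevalence * (1.0 - prevalence))


def liability_to_observed_h2(h2_liab, prevalence):
    """Expected observed-scale h2 for a given liability-scale h2."""
    return h2_liab * observed_scale_factor(prevalence)


factor_common = observed_scale_factor(0.5)   # equals 2 / pi
factor_rare = observed_scale_factor(0.1)
```

The factor is largest at K = 0.5 and shrinks as the trait becomes rarer, so the same liability-scale h² corresponds to a smaller observed-scale value for rare traits.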
compute_parent_offspring_affected_corr
¶
Compute pooled midparent-offspring regression on binary affected status.
Regresses offspring.affected (0/1) on midparent affected status
(mother.affected + father.affected) / 2 (values in {0, 0.5, 1}),
pooled across every non-founder individual whose parents are both in the
DataFrame. The regression slope is the observed-scale PO heritability
estimator; under LTM it can be back-transformed to liability via
Dempster-Lerner.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed … None when fewer than 10 valid trios or midparent has zero variance. |
Source code in simace/analysis/stats/correlations.py
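A midparent-offspring regression on binary status reduces to an ordinary least-squares slope. The sketch below uses plain Python and a hypothetical trio layout of (mother_affected, father_affected, offspring_affected); it reproduces the documented None cases but is not the library code.

```python
def midparent_offspring_slope(trios, min_trios=10):
    """OLS slope of offspring status on midparent status, or None."""
    trios = list(trios)
    n = len(trios)
    if n < min_trios:
        return None
    midparent = [(m + f) / 2 for m, f, _ in trios]  # values in {0, 0.5, 1}
    offspring = [o for _, _, o in trios]
    mean_mid = sum(midparent) / n
    mean_off = sum(offspring) / n
    sxx = sum((x - mean_mid) ** 2 for x in midparent)
    if sxx == 0:
        return None  # midparent has zero variance
    sxy = sum((x - mean_mid) * (y - mean_off)
              for x, y in zip(midparent, offspring))
    return sxy / sxx


# 12 trios: offspring risk rises from 0 to 0.5 to 1 with midparent status.
trios = (
    [(0, 0, 0)] * 4
    + [(1, 0, 1), (0, 1, 1), (1, 0, 0), (0, 1, 0)]
    + [(1, 1, 1)] * 4
)
slope = midparent_offspring_slope(trios)
```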
compute_parent_offspring_corr
¶
Compute midparent-offspring liability regression per generation and trait.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with liability, generation, and parent columns. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … regression stats (slope, r, r2, intercept, stderr, pvalue, n_pairs). |
Source code in simace/analysis/stats/correlations.py
compute_parent_offspring_corr_by_sex
¶
Compute midparent-offspring regression partitioned by offspring sex.
Returns dict keyed by "female"/"male", each containing per-trait per-generation {slope, r, r2, intercept, stderr, pvalue, n_pairs}.
Source code in simace/analysis/stats/correlations.py
compute_tetrachoric
¶
Compute tetrachoric correlations per pair type and trait.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with binary affection columns. |
| seed | Random seed (unused, kept for API consistency). |
| pairs | Pre-extracted relationship pairs. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … |
Source code in simace/analysis/stats/correlations.py
compute_tetrachoric_by_generation
¶
Compute tetrachoric correlations stratified by generation.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with generation and affection columns. |
| seed | Random seed (unused, kept for API consistency). |
| pairs | Pre-extracted relationship pairs. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … |
Source code in simace/analysis/stats/correlations.py
compute_tetrachoric_by_sex
¶
Compute tetrachoric correlations for same-sex pairs only (FF and MM).
Returns dict keyed by "female"/"male", each containing per-trait per-pair-type {r, se, n_pairs, liability_r}.
Source code in simace/analysis/stats/correlations.py
compute_effective_size
¶
Run all eight Ne estimators on pedigree and serialize to dicts.
| PARAMETER | DESCRIPTION |
|---|---|
| pedigree | Either a pandas DataFrame with the standard pedigree columns or an already-built … |
| config | Per-rep params (e.g. loaded from … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, dict[str, Any]] | Dict keyed on estimator name; each value is the matching dataclass's … field (… |
Source code in simace/analysis/stats/effective_size.py
ne_v_expected_ztp
¶
Closed-form Ne_V expectation under simACE's mating model.
Under random mating with balanced 50/50 sex, ZTP(λ) mating counts per individual, and multinomial allocation of N offspring across the resulting matings, the per-individual total-offspring count has
E[k] = 2,
V(k) = 2 + 4 · Var[m] / E[m]²,
where m ~ ZTP(λ) with
E[m] = λ / (1 − e^(−λ)),
Var[m] = E[m] · (1 + λ) − E[m]².
Plugging into Ne_V = 2N / V(k) yields
Ne_V = N / (1 + 2 · Var[m] / E[m]²).
The formula is exact in the multinomial → Poisson per-mating
offspring limit (large M); finite-sample correction is
O(1 / number_of_matings).
Limits
- λ → 0⁺ (degenerate at m=1, monogamous): Ne_V = N.
- λ → ∞ (Poisson, no truncation): Ne_V = N.
- Default λ = 0.5: Ne_V ≈ 0.7349 · N.
Source code in simace/analysis/stats/effective_size.py
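The closed form above is easy to evaluate numerically. The sketch below computes the ratio Ne_V / N from the stated E[m] and Var[m] formulas; the function name is hypothetical and the real ne_v_expected_ztp signature may differ.

```python
import math


def ne_v_ratio_ztp(lam):
    """Return Ne_V / N for zero-truncated Poisson mating counts, m ~ ZTP(lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    # E[m] = lam / (1 - exp(-lam)); expm1 keeps precision for small lam.
    mean_m = lam / -math.expm1(-lam)
    var_m = mean_m * (1.0 + lam) - mean_m ** 2
    return 1.0 / (1.0 + 2.0 * var_m / mean_m ** 2)


ratio_default = ne_v_ratio_ztp(0.5)    # documented as roughly 0.7349
ratio_small = ne_v_ratio_ztp(1e-6)     # approaches 1 (monogamous limit)
ratio_large = ne_v_ratio_ztp(1000.0)   # approaches 1 (untruncated Poisson)
```

Multiplying the ratio by N gives the expected Ne_V, so a population of N = 10,000 at the default λ = 0.5 has an expectation of roughly 7,349.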
regression_estimator_regime_ok
¶
Whether the regression-based Ne estimators are reliable at this scale.
The slope estimate in Ne_I, Ne_C, and Ne_CT has variance
∝ 1/(N·G³); inverting the slope to get Ne incurs a Jensen bias
that scales as Ne_V² / (N · G²). We declare the regime
acceptable when the implied bias on Ne is below ~20 % of Ne_V,
which corresponds to N · G² ≥ 120 · Ne_V.
Returns False for g_ped < 2 (no slope possible) regardless
of N.
Source code in simace/analysis/stats/effective_size.py
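The stated threshold N · G² ≥ 120 · Ne_V translates directly into a predicate. This is a sketch under assumed argument names (n, g_ped, ne_v); the actual function's parameters are not shown in this page.

```python
def regression_regime_ok_sketch(n, g_ped, ne_v):
    """True when N * G^2 >= 120 * Ne_V and at least two generations exist."""
    if g_ped < 2:
        return False  # no slope can be fitted
    return n * g_ped ** 2 >= 120 * ne_v


n = 10_000
ne_v = 0.7349 * n  # expectation at the default mating lambda of 0.5
ok_at_6 = regression_regime_ok_sketch(n, 6, ne_v)
ok_at_10 = regression_regime_ok_sketch(n, 10, ne_v)
```

With Ne_V ≈ 0.7349 · N the condition reduces to G² ≥ 88.2, so under these assumptions the regression estimators would only get an expectation from about 10 generations onward, and not at G_ped = 6.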
theoretical_expectations
¶
Closed-form Ne expectations under standard random-mating assumptions.
Returns a per-estimator dict. Every entry is None when config
is missing, N is unknown, or the configuration includes a
non-standard knob (currently: nonzero assort1 / assort2).
Under random mating with 50/50 sex and ZTP(mating_lambda) family
allocation, the family-size variance correction reduces
Ne_V-family estimators below N per ne_v_expected_ztp.
Three estimators (Ne_V, Ne_iΔF, Ne_H) inherit that expectation
directly — their finite-sample bias is O(1/N) and negligible at
realistic simACE scales.
Three regression-based estimators (Ne_I, Ne_C, Ne_CT) carry a Jensen
bias on the inverted slope of order Ne_V² / (N · G²) that
typically dominates at simACE's default G_ped = 6. We return
their expectation only when
regression_estimator_regime_ok is satisfied, otherwise
None (validator passes vacuously).
Ne_sr stays at N (deterministic balanced sex ratio). Ne_LTC
under the Ne = 1/(2·Σc²) form is approximated as
Ne_V_expected / 2 — consistent with the WF limit where
Ne_V → N and Ne_LTC → N/2. In practice the observed
Ne_LTC is typically None (asymptote not reached within
G_ped generations under realized WF noise), so the validator
passes vacuously.
Source code in simace/analysis/stats/effective_size.py
compute_cumulative_incidence
¶
Compute observed and true cumulative incidence curves per trait.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with event time and affection columns. |
| censor_age | Maximum observation age for the x-axis grid. |
| n_points | Number of age grid points. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … |
Source code in simace/analysis/stats/incidence.py
compute_cumulative_incidence_by_sex
¶
Compute cumulative incidence curves split by sex (0=female, 1=male).
Source code in simace/analysis/stats/incidence.py
compute_cumulative_incidence_by_sex_generation
¶
Compute cumulative incidence curves split by sex and generation.
Source code in simace/analysis/stats/incidence.py
compute_joint_affection
¶
Compute 2x2 contingency table for trait1 x trait2 affection status.
Source code in simace/analysis/stats/incidence.py
compute_mortality
¶
Compute decade-binned mortality rates from death ages.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with … |
| censor_age | Maximum observation age. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict with … |
Source code in simace/analysis/stats/incidence.py
compute_prevalence
¶
Compute observed prevalence for each trait.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict with … (when … |
Source code in simace/analysis/stats/incidence.py
compute_regression
¶
Regress observed event time on liability for affected individuals.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with liability and observed-time columns. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … (slope, intercept, r, r2, stderr, pvalue, n) or None. |
Source code in simace/analysis/stats/incidence.py
compute_mean_family_size
¶
Compute mean realised family size (offspring per mating pair).
Uses non-founder individuals (mother != -1) grouped by (mother, father).
Source code in simace/analysis/stats/pedigree.py
compute_parent_status
¶
Count individuals by number of parents phenotyped and in pedigree.
Returns dict with 'phenotyped' and optionally 'in_pedigree', each mapping 0/1/2 → count of individuals with that many parents present.
Source code in simace/analysis/stats/pedigree.py
cli
¶
Command-line interface for phenotype statistics computation.
Source code in simace/analysis/stats/runner.py
main
¶
main(phenotype_path, censor_age, stats_output, samples_output, seed=42, gen_censoring=None, pedigree_path=None, max_degree=2, case_ascertainment_ratio=1.0, params=None)
Compute all stats for a single rep and write outputs.
Source code in simace/analysis/stats/runner.py
create_sample
¶
Downsample for scatter/histogram plots, preserving parent rows.
Source code in simace/analysis/stats/sampling.py
tetrachoric_corr
¶
Return the MLE tetrachoric correlation between two binary arrays.
| PARAMETER | DESCRIPTION |
|---|---|
| a | First binary array. |
| b | Second binary array, same length as a. |

| RETURNS | DESCRIPTION |
|---|---|
| float | Tetrachoric correlation coefficient. |
Source code in simace/analysis/stats/tetrachoric.py
tetrachoric_corr_se
¶
Estimate tetrachoric correlation and SE from two binary arrays via MLE.
Delegates the numerical work (Brent optimization + bivariate normal CDF)
to the numba-jitted _tetrachoric_core for speed.
Source code in simace/analysis/stats/tetrachoric.py
stats.tetrachoric¶
simace.analysis.stats.tetrachoric
¶
Tetrachoric correlation primitives.
Low-level helpers for tetrachoric MLE on binary arrays plus the
_tetrachoric_for_pairs pair-subset helper used across pairwise-correlation
computations.
tetrachoric_corr
¶
Return the MLE tetrachoric correlation between two binary arrays.
| PARAMETER | DESCRIPTION |
|---|---|
| a | First binary array. |
| b | Second binary array, same length as a. |

| RETURNS | DESCRIPTION |
|---|---|
| float | Tetrachoric correlation coefficient. |
Source code in simace/analysis/stats/tetrachoric.py
tetrachoric_corr_se
¶
Estimate tetrachoric correlation and SE from two binary arrays via MLE.
Delegates the numerical work (Brent optimization + bivariate normal CDF)
to the numba-jitted _tetrachoric_core for speed.
Source code in simace/analysis/stats/tetrachoric.py
stats.correlations¶
simace.analysis.stats.correlations
¶
Pairwise relationship correlations, parent-offspring regressions, and h² estimators.
Covers liability/affected pair correlations, tetrachoric correlations across pair types (overall, by generation, by sex, cross-trait), midparent-offspring regressions (overall, by sex, on affected status), the closed-form observed-scale h² estimators derived from those correlations, and the mate-pair correlation matrix.
compute_liability_correlations
¶
Compute Pearson liability correlations per pair type and trait.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with liability columns. |
| seed | Random seed (unused, kept for API consistency). |
| pairs | Pre-extracted relationship pairs. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … |
Source code in simace/analysis/stats/correlations.py
compute_affected_correlations
¶
Compute Pearson correlations on binary affected status per pair type and trait.
This is the phi coefficient — Pearson r on {0, 1} data — and is the input
to observed-scale Falconer-style h² estimators (e.g. 2·(r_MZ − r_FS)).
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with … |
| seed | Random seed (unused, kept for API consistency). |
| pairs | Pre-extracted relationship pairs. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … None (if fewer than 10 pairs, or either side is constant). |
Source code in simace/analysis/stats/correlations.py
compute_tetrachoric
¶
Compute tetrachoric correlations per pair type and trait.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with binary affection columns. |
| seed | Random seed (unused, kept for API consistency). |
| pairs | Pre-extracted relationship pairs. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … |
Source code in simace/analysis/stats/correlations.py
compute_tetrachoric_by_generation
¶
Compute tetrachoric correlations stratified by generation.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with generation and affection columns. |
| seed | Random seed (unused, kept for API consistency). |
| pairs | Pre-extracted relationship pairs. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … |
Source code in simace/analysis/stats/correlations.py
compute_cross_trait_tetrachoric
¶
Compute cross-trait tetrachoric correlations (trait 1 vs trait 2).
Includes same-person, same-person-by-generation, and cross-person (across relationship pair types) correlations.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with binary affection columns for both traits. |
| seed | Random seed (unused, kept for API consistency). |
| pairs | Pre-extracted relationship pairs. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict with keys … and … |
Source code in simace/analysis/stats/correlations.py
compute_tetrachoric_by_sex
¶
Compute tetrachoric correlations for same-sex pairs only (FF and MM).
Returns dict keyed by "female"/"male", each containing per-trait per-pair-type {r, se, n_pairs, liability_r}.
Source code in simace/analysis/stats/correlations.py
compute_parent_offspring_corr
¶
Compute midparent-offspring liability regression per generation and trait.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with liability, generation, and parent columns. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … regression stats (slope, r, r2, intercept, stderr, pvalue, n_pairs). |
Source code in simace/analysis/stats/correlations.py
compute_parent_offspring_affected_corr
¶
Compute pooled midparent-offspring regression on binary affected status.
Regresses offspring.affected (0/1) on midparent affected status
(mother.affected + father.affected) / 2 (values in {0, 0.5, 1}),
pooled across every non-founder individual whose parents are both in the
DataFrame. The regression slope is the observed-scale PO heritability
estimator; under LTM it can be back-transformed to liability via
Dempster-Lerner.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed … None when fewer than 10 valid trios or midparent has zero variance. |
Source code in simace/analysis/stats/correlations.py
compute_parent_offspring_corr_by_sex
¶
Compute midparent-offspring regression partitioned by offspring sex.
Returns dict keyed by "female"/"male", each containing per-trait per-generation {slope, r, r2, intercept, stderr, pvalue, n_pairs}.
Source code in simace/analysis/stats/correlations.py
compute_observed_h2_estimators
¶
Derive five naive observed-scale h² estimators from precomputed correlations.
Reads from stats["affected_correlations"] (phi r per pair type) and
stats["parent_offspring_affected_corr"] (PO regression slope on binary).
Each estimator is a closed-form combination that, under a liability-threshold
model, is an unbiased estimator of h²_liab · z(K)²/(K(1−K)) — i.e. the
observed-scale h² — where K is the affected-status prevalence.
| PARAMETER | DESCRIPTION |
|---|---|
| stats | The in-progress stats dict with … |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed … float or None: … |
Source code in simace/analysis/stats/correlations.py
compute_mate_correlation
¶
Compute 2x2 Pearson correlation matrix between mated pairs' liabilities.
Each unique (mother, father) pair is counted once (not weighted by offspring). Only non-founders are considered.
Source code in simace/analysis/stats/correlations.py
stats.incidence¶
simace.analysis.stats.incidence
¶
Prevalence, mortality, cumulative incidence, and joint-affection statistics.
compute_mortality
¶
Compute decade-binned mortality rates from death ages.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with … |
| censor_age | Maximum observation age. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict with … |
Source code in simace/analysis/stats/incidence.py
compute_cumulative_incidence
¶
Compute observed and true cumulative incidence curves per trait.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Phenotype DataFrame with event time and affection columns. |
| censor_age | Maximum observation age for the x-axis grid. |
| n_points | Number of age grid points. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dict keyed by … |
Source code in simace/analysis/stats/incidence.py
compute_regression
¶
Regress observed event time on liability for affected individuals.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
Phenotype DataFrame with liability and observed-time columns.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict[str, Any]
|
Dict keyed by |
dict[str, Any]
|
(slope, intercept, r, r2, stderr, pvalue, n) or None. |
Source code in simace/analysis/stats/incidence.py
compute_prevalence
¶
Compute observed prevalence for each trait.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
Phenotype DataFrame with
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict[str, Any]
|
Dict with |
dict[str, Any]
|
(when |
dict[str, Any]
|
|
Source code in simace/analysis/stats/incidence.py
compute_joint_affection
¶
Compute 2x2 contingency table for trait1 x trait2 affection status.
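The 2x2 table can be pictured with a small standard-library sketch. The helper name and the row/column ordering (rows: trait1 unaffected/affected; columns: trait2 unaffected/affected) are assumptions, not the layout simace necessarily returns.

```python
from collections import Counter


def joint_affection(affected1, affected2):
    """Hypothetical helper: counts[trait1 status][trait2 status]."""
    counts = Counter(zip(map(bool, affected1), map(bool, affected2)))
    return [[counts[(a, b)] for b in (False, True)] for a in (False, True)]


table = joint_affection([0, 1, 1, 0, 1], [0, 1, 0, 0, 1])
```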
Source code in simace/analysis/stats/incidence.py
compute_cumulative_incidence_by_sex
¶
Compute cumulative incidence curves split by sex (0=female, 1=male).
Source code in simace/analysis/stats/incidence.py
compute_cumulative_incidence_by_sex_generation
¶
Compute cumulative incidence curves split by sex and generation.
Source code in simace/analysis/stats/incidence.py
stats.censoring¶
simace.analysis.stats.censoring
¶
Censoring window, confusion-matrix, cascade, and person-years statistics.
compute_censoring_windows
¶
Compute per-generation censoring breakdown and incidence curves.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
Phenotype DataFrame with generation, event time, and censoring columns.
TYPE:
|
censor_age
|
Maximum observation age.
TYPE:
|
gen_censoring
|
Dict mapping generation to
TYPE:
|
n_points
|
Number of age grid points for incidence curves.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict[str, Any] | None
|
Dict with per-generation censoring stats, or None if no generation column. |
Source code in simace/analysis/stats/censoring.py
compute_censoring_confusion
¶
Compute per-trait 2x2 confusion matrix: true affected vs. observed affected.
Only includes individuals from phenotyped generations (observation window > 0).
Source code in simace/analysis/stats/censoring.py
compute_censoring_cascade
¶
Per-trait, per-generation decomposition of true cases by censoring fate.
Source code in simace/analysis/stats/censoring.py
compute_person_years
¶
Compute person-years of follow-up, total and per-trait at-risk.
For each individual in generation g with observation window [lo, hi]:
- Total follow-up ends at min(death_age, hi).
- Trait-specific at-risk time ends at min(t_observed, death_age, hi).
Returns dict with total_person_years and per-trait person_years_at_risk.
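The follow-up rules above can be sketched as follows. The record layout (a list of dicts with `generation`, `death_age`, and a per-trait `t_observed` mapping) is hypothetical, and the sketch assumes follow-up starts at `lo` and that time contributed is never negative.

```python
def person_years(people, windows):
    """Hypothetical helper mirroring the min(...) rules in the description."""
    total = 0.0
    at_risk = {}
    for p in people:
        lo, hi = windows[p["generation"]]
        # Total follow-up ends at death or at the end of the window.
        follow_up_end = min(p["death_age"], hi)
        total += max(0.0, follow_up_end - lo)
        for trait, t_obs in p["t_observed"].items():
            # At-risk time additionally stops at the observed event time.
            risk_end = min(t_obs, p["death_age"], hi)
            at_risk[trait] = at_risk.get(trait, 0.0) + max(0.0, risk_end - lo)
    return {"total_person_years": total, "person_years_at_risk": at_risk}


windows = {1: (0.0, 80.0), 2: (0.0, 40.0)}
people = [
    {"generation": 1, "death_age": 70.0,
     "t_observed": {"trait1": 50.0, "trait2": 90.0}},
    {"generation": 2, "death_age": 85.0,
     "t_observed": {"trait1": 30.0, "trait2": 60.0}},
]
py = person_years(people, windows)
```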
Source code in simace/analysis/stats/censoring.py
stats.pedigree¶
simace.analysis.stats.pedigree
¶
Pedigree-structure summaries: family size and parent presence.
compute_mean_family_size
¶
Compute mean realised family size (offspring per mating pair).
Uses non-founder individuals (mother != -1) grouped by (mother, father).
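A minimal sketch of the grouping rule, assuming pedigree rows are dicts with `mother` and `father` keys and `-1` marking founders. The helper name is hypothetical.

```python
from collections import Counter
from statistics import mean


def mean_family_size(rows):
    """Hypothetical helper: mean offspring count per (mother, father) pair."""
    sizes = Counter((r["mother"], r["father"]) for r in rows if r["mother"] != -1)
    return mean(sizes.values()) if sizes else None


rows = [
    {"id": 0, "mother": -1, "father": -1},
    {"id": 1, "mother": -1, "father": -1},
    {"id": 2, "mother": -1, "father": -1},
    {"id": 3, "mother": -1, "father": -1},
    {"id": 4, "mother": 0, "father": 1},
    {"id": 5, "mother": 0, "father": 1},
    {"id": 6, "mother": 0, "father": 1},
    {"id": 7, "mother": 2, "father": 3},
]
family_size = mean_family_size(rows)
```

Pairs with no offspring never appear in the non-founder rows, so this is the mean over realised families only.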
Source code in simace/analysis/stats/pedigree.py
compute_parent_status
¶
Count individuals by number of parents phenotyped and in pedigree.
Returns dict with 'phenotyped' and optionally 'in_pedigree', each mapping 0/1/2 → count of individuals with that many parents present.
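The 0/1/2 tally can be sketched as below, where `present_ids` stands for whichever set of ids is being checked (phenotyped individuals, or everyone in the pedigree). The helper name and row layout are hypothetical, and counting founders under 0 is an assumption.

```python
from collections import Counter


def parent_presence_counts(rows, present_ids):
    """Hypothetical helper: how many individuals have 0, 1, or 2 parents present."""
    counts = Counter({0: 0, 1: 0, 2: 0})
    for r in rows:
        n_present = sum(p in present_ids for p in (r["mother"], r["father"]))
        counts[n_present] += 1
    return dict(counts)


rows = [
    {"id": 0, "mother": -1, "father": -1},
    {"id": 1, "mother": -1, "father": -1},
    {"id": 2, "mother": 0, "father": 1},
    {"id": 3, "mother": 0, "father": 1},
    {"id": 4, "mother": 2, "father": 9},  # father 9 is not in the phenotyped set
]
phenotyped = parent_presence_counts(rows, present_ids={0, 1, 2, 3, 4})
```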
Source code in simace/analysis/stats/pedigree.py
stats.sampling¶
simace.analysis.stats.sampling
¶
Per-rep downsampling for scatter/histogram plot inputs.
create_sample
¶
Downsample for scatter/histogram plots, preserving parent rows.
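One possible reading of "preserving parent rows" is that the parents of every sampled individual are kept alongside the random sample, so parent-offspring plots remain possible. The sketch below illustrates that reading only; the helper name, row layout, and seed default are hypothetical and may not match `create_sample`.

```python
import random


def sample_with_parents(rows, n, seed=42):
    """Hypothetical helper: random sample of n rows plus the sampled rows' parents."""
    rng = random.Random(seed)
    by_id = {r["id"]: r for r in rows}
    chosen = set(rng.sample(sorted(by_id), min(n, len(by_id))))
    # Add parents that exist in the data; -1 (founder marker) is skipped.
    parents = {p for i in chosen
               for p in (by_id[i]["mother"], by_id[i]["father"]) if p in by_id}
    return [by_id[i] for i in sorted(chosen | parents)]


rows = [
    {"id": 0, "mother": -1, "father": -1},
    {"id": 1, "mother": -1, "father": -1},
    {"id": 2, "mother": 0, "father": 1},
    {"id": 3, "mother": 0, "father": 1},
    {"id": 4, "mother": 0, "father": 1},
]
sample = sample_with_parents(rows, n=2)
```

The result can hold more than `n` rows, since parents are added on top of the random draw.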
Source code in simace/analysis/stats/sampling.py
stats.runner¶
simace.analysis.stats.runner
¶
Orchestration entry point for per-rep phenotype statistics.
Reads a single phenotype.parquet, runs every stats computation, and writes
phenotype_stats.yaml plus phenotype_samples.parquet.
main
¶
main(phenotype_path, censor_age, stats_output, samples_output, seed=42, gen_censoring=None, pedigree_path=None, max_degree=2, case_ascertainment_ratio=1.0, params=None)
Compute all stats for a single rep and write outputs.
Source code in simace/analysis/stats/runner.py
cli
¶
Command-line interface for phenotype statistics computation.
Source code in simace/analysis/stats/runner.py
gather¶
simace.analysis.gather
¶
Gather validation results from all scenarios into a single TSV file.
extract_metrics
¶
Extract key metrics from a validation YAML file.
Source code in simace/analysis/gather.py
main
¶
Gather all validation results into a TSV file.
Source code in simace/analysis/gather.py
cli
¶
Command-line interface for gathering validation results.