simace.phenotyping¶

Phenotype models: frailty, threshold, cure, first-passage, and bimodal variants.

phenotype¶

simace.phenotyping.phenotype ¶

Phenotype simulation for two correlated traits.

Each trait is simulated by one of the four model families registered in simace.phenotyping.models:

frailty — proportional hazards frailty (baseline hazards live in simace.phenotyping.hazards).
cure_frailty — mixture cure model: threshold determines case status, frailty determines onset time among cases.
adult — ADuLT age-dependent liability threshold.
first_passage — inverse-Gaussian first-passage time.

Adding a new model is a single new file under simace/phenotyping/models/ plus one entry in simace/phenotyping/models/__init__.py's MODELS dict.

run_phenotype ¶

run_phenotype(pedigree, *, G_pheno, seed, standardize, phenotype_model1, phenotype_model2, beta1, beta_sex1, phenotype_params1, beta2, beta_sex2, phenotype_params2)

Simulate phenotype event times for two correlated traits.

Per-trait prevalence (for adult / cure_frailty) lives inside phenotype_params{N}; frailty / first_passage do not carry one.

PARAMETER	DESCRIPTION
`pedigree`	DataFrame with `liability1`, `liability2`, `generation`, `sex`, plus the genealogy columns preserved on output. TYPE: `DataFrame`
`G_pheno`	number of trailing generations to phenotype. TYPE: `int`
`seed`	RNG seed (trait 2 uses `seed + 100`). TYPE: `int`
`standardize`	global liability-standardization mode applied to threshold-style consumers (`threshold`, `adult.ltm`, `cure_frailty`'s threshold step). One of `"none" \| "global" \| "per_generation"`; legacy bools accepted (`True` → `"global"`, `False` → `"none"`). Hazard-bearing models (`frailty`, `cure_frailty`, `first_passage`, `adult.cox`) inherit this for their hazard step unless the per-trait `phenotype_params{N}["standardize_hazard"]` overrides it. TYPE: `StandardizeMode \| bool`
`phenotype_model1`	trait-1 model family (`frailty`, `cure_frailty`, `adult`, `first_passage`). TYPE: `str`
`phenotype_model2`	trait-2 model family (same options). TYPE: `str`
`beta1`	trait-1 liability → log-hazard slope. TYPE: `float`
`beta_sex1`	trait-1 sex → log-hazard slope. TYPE: `float`
`phenotype_params1`	trait-1 model-specific parameter dict (e.g. `{"distribution": "weibull", "scale": ..., "rho": ...}`; optionally `"standardize_hazard": "..."`). TYPE: `dict`
`beta2`	trait-2 liability → log-hazard slope. TYPE: `float`
`beta_sex2`	trait-2 sex → log-hazard slope. TYPE: `float`
`phenotype_params2`	trait-2 model-specific parameter dict. TYPE: `dict`

RETURNS	DESCRIPTION
`DataFrame`	Phenotype DataFrame with columns `t1`, `t2` (raw event times)
`DataFrame`	plus the preserved pedigree columns.

Source code in simace/phenotyping/phenotype.py

@stage(reads=PEDIGREE, writes=PHENOTYPE)
def run_phenotype(
    pedigree: pd.DataFrame,
    *,
    G_pheno: int,
    seed: int,
    standardize: StandardizeMode | bool,
    phenotype_model1: str,
    phenotype_model2: str,
    beta1: float,
    beta_sex1: float,
    phenotype_params1: dict,
    beta2: float,
    beta_sex2: float,
    phenotype_params2: dict,
) -> pd.DataFrame:
    """Simulate phenotype event times for two correlated traits.

    Per-trait prevalence (for adult / cure_frailty) lives inside
    ``phenotype_params{N}``; frailty / first_passage do not carry one.

    Args:
        pedigree: DataFrame with ``liability1``, ``liability2``, ``generation``,
            ``sex``, plus the genealogy columns preserved on output.
        G_pheno: number of trailing generations to phenotype.
        seed: RNG seed (trait 2 uses ``seed + 100``).
        standardize: global liability-standardization mode applied to
            threshold-style consumers (``threshold``, ``adult.ltm``,
            ``cure_frailty``'s threshold step). One of ``"none" | "global" |
            "per_generation"``; legacy bools accepted (``True`` → ``"global"``,
            ``False`` → ``"none"``). Hazard-bearing models (``frailty``,
            ``cure_frailty``, ``first_passage``, ``adult.cox``) inherit this
            for their hazard step unless the per-trait
            ``phenotype_params{N}["standardize_hazard"]`` overrides it.
        phenotype_model1: trait-1 model family (``frailty``, ``cure_frailty``,
            ``adult``, ``first_passage``).
        phenotype_model2: trait-2 model family (same options).
        beta1: trait-1 liability → log-hazard slope.
        beta_sex1: trait-1 sex → log-hazard slope.
        phenotype_params1: trait-1 model-specific parameter dict (e.g.
            ``{"distribution": "weibull", "scale": ..., "rho": ...}``;
            optionally ``"standardize_hazard": "..."``).
        beta2: trait-2 liability → log-hazard slope.
        beta_sex2: trait-2 sex → log-hazard slope.
        phenotype_params2: trait-2 model-specific parameter dict.

    Returns:
        Phenotype DataFrame with columns ``t1``, ``t2`` (raw event times)
        plus the preserved pedigree columns.
    """
    logger.info("Running phenotype simulation for %d individuals", len(pedigree))
    t0 = time.perf_counter()

    max_gen = pedigree["generation"].max()
    min_gen = max_gen - G_pheno + 1
    if min_gen < 0:
        raise ValueError(f"G_pheno ({G_pheno}) exceeds available generations ({max_gen + 1})")
    pedigree = pedigree[pedigree["generation"] >= min_gen].reset_index(drop=True)

    sex = pedigree["sex"].to_numpy() if "sex" in pedigree.columns else None
    generation = pedigree["generation"].to_numpy()
    t1 = _simulate_one_trait(
        pedigree,
        trait_num=1,
        model_name=phenotype_model1,
        phenotype_params=phenotype_params1,
        beta=beta1,
        beta_sex=beta_sex1,
        seed=seed,
        standardize=standardize,
        sex=sex,
        generation=generation,
    )
    t2 = _simulate_one_trait(
        pedigree,
        trait_num=2,
        model_name=phenotype_model2,
        phenotype_params=phenotype_params2,
        beta=beta2,
        beta_sex=beta_sex2,
        seed=seed + 100,
        standardize=standardize,
        sex=sex,
        generation=generation,
    )

    phenotype = pedigree.assign(t1=t1, t2=t2)

    logger.info(
        "Phenotype simulation complete in %.1fs: %d individuals",
        time.perf_counter() - t0,
        len(phenotype),
    )
    return phenotype

cli ¶

cli()

Command-line entry point for phenotype simulation.

Eager-registration scheme: every model's flag set is registered up front so --help shows them all in clearly-labeled per-family argument groups. Each model's from_cli rejects flags belonging to a different family when invoked alongside that model's selection.

Source code in simace/phenotyping/phenotype.py

def cli() -> None:
    """Command-line entry point for phenotype simulation.

    Eager-registration scheme: every model's flag set is registered up
    front so ``--help`` shows them all in clearly-labeled per-family
    argument groups. Each model's ``from_cli`` rejects flags belonging to
    a different family when invoked alongside that model's selection.
    """
    from simace.core.cli_base import add_logging_args, init_logging

    parser = argparse.ArgumentParser(
        description="Simulate phenotype event times for two correlated traits",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    add_logging_args(parser)
    parser.add_argument("--pedigree", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--G-pheno", type=int, default=3)
    parser.add_argument(
        "--standardize",
        choices=list(STANDARDIZE_CHOICES),
        default="global",
        help="Liability standardization mode (default: global)",
    )

    for trait in (1, 2):
        shared = parser.add_argument_group(f"Trait {trait} — shared")
        shared.add_argument(
            f"--phenotype-model{trait}",
            choices=sorted(MODELS),
            required=True,
            help=f"Phenotype model family for trait {trait}",
        )
        shared.add_argument(f"--beta{trait}", type=float, default=1.0)
        shared.add_argument(f"--beta-sex{trait}", type=float, default=0.0)
        for model_cls in MODELS.values():
            model_cls.add_cli_args(parser, trait)

    args = parser.parse_args()
    init_logging(args)

    kwargs: dict[str, Any] = {
        "G_pheno": args.G_pheno,
        "seed": args.seed,
        "standardize": args.standardize,
    }
    for trait in (1, 2):
        model_name = getattr(args, f"phenotype_model{trait}")
        model_cls = MODELS[model_name]
        instance = model_cls.from_cli(args, trait)
        kwargs[f"phenotype_model{trait}"] = model_name
        kwargs[f"phenotype_params{trait}"] = instance.to_params_dict()
        kwargs[f"beta{trait}"] = getattr(args, f"beta{trait}")
        kwargs[f"beta_sex{trait}"] = getattr(args, f"beta_sex{trait}")

    pedigree = pd.read_parquet(args.pedigree)
    phenotype = run_phenotype(pedigree, **kwargs)
    save_parquet(phenotype, args.output)

threshold¶

simace.phenotyping.threshold ¶

Liability threshold phenotype model for two correlated traits.

Converts liability to binary affected status using a probit threshold derived from prevalence: threshold = ndtri(1 - K). Standardization is controlled by the global standardize flag with three modes:

"global" (default) — z-score across the whole cohort once, then compare to ndtri(1 - K). Preserves prevalence at K only when the cohort liability is already centered with unit variance.
"per_generation" — z-score within each generation independently before comparison; preserves K per generation regardless of how liability variance drifts across generations.
"none" — compare raw liability to the N(0,1)-scale threshold; realised prevalence drifts with the liability variance.

For sex-specific prevalence dicts ({"female": K_f, "male": K_m}), liability is standardized once across both sexes (per the selected mode); sex-shifted liability means therefore translate into sex-specific realised prevalences within each generation.

No time-to-event or censoring -- purely binary outcome.

apply_threshold ¶

apply_threshold(liability, generation, prevalence, standardize='global')

Apply liability threshold model per generation.

The threshold is ndtri(1 - K) where K is the prevalence. Liability is standardized once according to standardize ("none", "global", or "per_generation"); the per-generation comparison then uses the already-standardized values. Bools accepted for back-compat (True → "global", False → "none").

PARAMETER	DESCRIPTION
`liability`	array of liability values TYPE: `ndarray`
`generation`	array of generation labels (same length as liability) TYPE: `ndarray`
`prevalence`	fraction affected per generation — either a single float applied to all generations, or a dict mapping generation number to prevalence (e.g. {0: 0.05, 1: 0.08, 2: 0.10}) TYPE: `float \| dict[int, float]`
`standardize`	standardization mode for liability before thresholding TYPE: `StandardizeMode \| bool` DEFAULT: `'global'`

RETURNS	DESCRIPTION
`affected`	boolean array (True = affected) TYPE: `ndarray`

RAISES	DESCRIPTION
`ValueError`	if any prevalence value is not in (0, 1), or if a dict is provided but is missing entries for observed generations

Source code in simace/phenotyping/threshold.py

def apply_threshold(
    liability: np.ndarray,
    generation: np.ndarray,
    prevalence: float | dict[int, float],
    standardize: StandardizeMode | bool = "global",
) -> np.ndarray:
    """Apply liability threshold model per generation.

    The threshold is ``ndtri(1 - K)`` where *K* is the prevalence.  Liability
    is standardized once according to *standardize* (``"none"``, ``"global"``,
    or ``"per_generation"``); the per-generation comparison then uses the
    already-standardized values.  Bools accepted for back-compat
    (``True`` → ``"global"``, ``False`` → ``"none"``).

    Args:
        liability: array of liability values
        generation: array of generation labels (same length as liability)
        prevalence: fraction affected per generation — either a single float
            applied to all generations, or a dict mapping generation number
            to prevalence (e.g. {0: 0.05, 1: 0.08, 2: 0.10})
        standardize: standardization mode for liability before thresholding

    Returns:
        affected: boolean array (True = affected)

    Raises:
        ValueError: if any prevalence value is not in (0, 1), or if a dict
            is provided but is missing entries for observed generations
    """
    mode = coerce_standardize_mode(standardize)
    L_all = standardize_liability(liability, mode, generation)
    return _apply_thresholds_to_standardized(L_all, generation, sex=None, prev_spec=prevalence)

run_threshold ¶

run_threshold(pedigree, *, phenotype_params1, phenotype_params2, G_pheno, standardize='global')

Orchestrate threshold phenotype from pedigree and per-trait params.

Prevalence is extracted from the per-trait phenotype_params{N} dicts (the canonical home for adult / cure_frailty model prevalence after PR3). Traits whose primary model doesn't carry one (frailty / first_passage) fall back to a documented default so the model-agnostic threshold path still has a prevalence to use.

PARAMETER	DESCRIPTION
`pedigree`	DataFrame with pedigree data. TYPE: `DataFrame`
`phenotype_params1`	trait-1 model-specific param dict. Its `"prevalence"` value (scalar, per-generation dict, or sex-specific dict; see :func:`apply_threshold`) is used as the threshold target. TYPE: `dict \| None`
`phenotype_params2`	trait-2 model-specific param dict (same shape options as `phenotype_params1`). TYPE: `dict \| None`
`G_pheno`	number of trailing generations to phenotype. TYPE: `int`
`standardize`	standardization mode for liability before thresholding; one of `"none"`, `"global"`, `"per_generation"`. Bools accepted for back-compat (`True` → `"global"`, `False` → `"none"`). TYPE: `StandardizeMode \| bool` DEFAULT: `'global'`

RETURNS	DESCRIPTION
`DataFrame`	phenotype DataFrame

Source code in simace/phenotyping/threshold.py

@stage(reads=PEDIGREE, writes=PHENOTYPE)
def run_threshold(
    pedigree: pd.DataFrame,
    *,
    phenotype_params1: dict | None,
    phenotype_params2: dict | None,
    G_pheno: int,
    standardize: StandardizeMode | bool = "global",
) -> pd.DataFrame:
    """Orchestrate threshold phenotype from pedigree and per-trait params.

    Prevalence is extracted from the per-trait ``phenotype_params{N}`` dicts
    (the canonical home for adult / cure_frailty model prevalence after PR3).
    Traits whose primary model doesn't carry one (frailty / first_passage)
    fall back to a documented default so the model-agnostic threshold path
    still has a prevalence to use.

    Args:
        pedigree: DataFrame with pedigree data.
        phenotype_params1: trait-1 model-specific param dict.  Its
            ``"prevalence"`` value (scalar, per-generation dict, or
            sex-specific dict; see :func:`apply_threshold`) is used as the
            threshold target.
        phenotype_params2: trait-2 model-specific param dict (same shape
            options as ``phenotype_params1``).
        G_pheno: number of trailing generations to phenotype.
        standardize: standardization mode for liability before thresholding;
            one of ``"none"``, ``"global"``, ``"per_generation"``.  Bools
            accepted for back-compat (``True`` → ``"global"``,
            ``False`` → ``"none"``).

    Returns:
        phenotype DataFrame
    """
    pp1 = dict(phenotype_params1 or {})
    pp2 = dict(phenotype_params2 or {})
    prevalence1 = pp1.get("prevalence", _DEFAULT_THRESHOLD_PREVALENCE[0])
    prevalence2 = pp2.get("prevalence", _DEFAULT_THRESHOLD_PREVALENCE[1])

    if isinstance(prevalence1, dict) or isinstance(prevalence2, dict):
        logger.info("Applying threshold model: prevalence1=%s, prevalence2=%s", prevalence1, prevalence2)
    else:
        logger.info("Applying threshold model: prevalence1=%.3f, prevalence2=%.3f", prevalence1, prevalence2)
    t0 = time.perf_counter()
    # Filter to last G_pheno generations
    max_gen = pedigree["generation"].max()
    min_pheno_gen = max_gen - G_pheno + 1
    assert min_pheno_gen >= 0, f"G_pheno ({G_pheno}) > available generations ({max_gen + 1})"
    pedigree = pedigree[pedigree["generation"] >= min_pheno_gen].reset_index(drop=True)

    generation = pedigree["generation"].values
    sex = pedigree["sex"].values

    affected1 = _apply_threshold_sex_aware(
        pedigree["liability1"].values,
        generation,
        sex,
        prevalence=prevalence1,
        standardize=standardize,
    )
    affected2 = _apply_threshold_sex_aware(
        pedigree["liability2"].values,
        generation,
        sex,
        prevalence=prevalence2,
        standardize=standardize,
    )

    nan_t = np.full(len(pedigree), np.nan, dtype=np.float64)
    phenotype = pedigree.assign(affected1=affected1, affected2=affected2, t1=nan_t, t2=nan_t)

    elapsed = time.perf_counter() - t0
    logger.info("Threshold model complete in %.1fs: %d individuals", elapsed, len(phenotype))

    return phenotype

cli ¶

cli()

Command-line interface for threshold phenotype simulation.

Source code in simace/phenotyping/threshold.py

def cli() -> None:
    """Command-line interface for threshold phenotype simulation."""
    from simace.core.cli_base import add_logging_args, init_logging

    parser = argparse.ArgumentParser(description="Apply liability threshold model")
    add_logging_args(parser)
    parser.add_argument("--pedigree", required=True, help="Input pedigree parquet")
    parser.add_argument("--output", required=True, help="Output phenotype parquet")
    parser.add_argument("--G-pheno", type=int, default=3, help="Number of generations to assign phenotypes")
    parser.add_argument("--prevalence1", type=float, default=0.1, help="Disease prevalence for trait 1")
    parser.add_argument("--prevalence2", type=float, default=0.1, help="Disease prevalence for trait 2")
    parser.add_argument(
        "--prevalence1-by-gen",
        type=str,
        default=None,
        help='Per-gen prevalence for trait 1 as JSON, e.g. \'{"0":0.05,"1":0.10}\'',
    )
    parser.add_argument(
        "--prevalence2-by-gen",
        type=str,
        default=None,
        help='Per-gen prevalence for trait 2 as JSON, e.g. \'{"0":0.05,"1":0.10}\'',
    )
    parser.add_argument(
        "--standardize",
        choices=list(STANDARDIZE_CHOICES),
        default="global",
        help="Liability standardization mode (default: global)",
    )
    args = parser.parse_args()

    init_logging(args)

    pedigree = pd.read_parquet(args.pedigree)
    phenotype = run_threshold(
        pedigree,
        G_pheno=args.G_pheno,
        phenotype_params1={"prevalence": _parse_prevalence_arg(args.prevalence1, args.prevalence1_by_gen)},
        phenotype_params2={"prevalence": _parse_prevalence_arg(args.prevalence2, args.prevalence2_by_gen)},
        standardize=args.standardize,
    )
    save_parquet(phenotype, args.output)

hazards¶

simace.phenotyping.hazards ¶

Baseline hazard registry for parametric survival models.

Each baseline hazard supplies a vectorized inverter that converts -log(U) draws (Exp(1)) and a liability array into event times under the proportional-hazards frailty model:

L            = additive genetic + shared + unique liability
z            = exp(scaled_beta * (L - mean))
S(t | z)     = exp(-H0(t) * z)
t            = H0^{-1}(-log(U) / z)

compute_event_times is the public dispatch over the registry. The numba kernels and per-distribution wrappers stay module-private; consumers should go through compute_event_times (or, for tooling that needs the dispatch table itself, BASELINE_HAZARDS).

Supported distributions and required params keys: "weibull" : {"scale": s, "rho": rho} "exponential" : {"rate": lam} | {"scale": s} "gompertz" : {"rate": b, "gamma": g} "lognormal" : {"mu": mu, "sigma": sigma} "loglogistic" : {"scale": alpha, "shape": k} "gamma" : {"shape": k, "scale": theta}

validate_hazard_params ¶

validate_hazard_params(distribution, hazard_params, model_name)

Validate distribution name and required hazard_params keys.

Exponential accepts either rate or scale as the canonical key; others must contain every key listed in BASELINE_PARAMS[distribution].

Source code in simace/phenotyping/hazards.py

def validate_hazard_params(
    distribution: str,
    hazard_params: dict[str, float],
    model_name: str,
) -> None:
    """Validate distribution name and required ``hazard_params`` keys.

    Exponential accepts either ``rate`` or ``scale`` as the canonical key;
    others must contain every key listed in ``BASELINE_PARAMS[distribution]``.
    """
    if distribution not in BASELINE_HAZARDS:
        raise ValueError(f"unknown {model_name} distribution {distribution!r}; valid: {sorted(BASELINE_HAZARDS)}")
    required = set(BASELINE_PARAMS[distribution])
    if distribution == "exponential" and "scale" in hazard_params:
        required = (required - {"rate"}) | {"scale"}
    missing = required - set(hazard_params)
    if missing:
        raise ValueError(
            f"{model_name} distribution {distribution!r} missing required hazard params: {sorted(missing)}"
        )

add_hazard_cli_args ¶

add_hazard_cli_args(parser, trait, *, name)

Register --{name}-distribution{trait} + one flag per HAZARD_FLAG_ROOTS.

name is the kebab-form model name (e.g. "frailty", "cure-frailty"). Returns the argument group so callers can attach model-specific flags (e.g. cure_frailty's --cure-frailty-prevalence{trait}).

Source code in simace/phenotyping/hazards.py

def add_hazard_cli_args(
    parser: argparse.ArgumentParser,
    trait: int,
    *,
    name: str,
) -> argparse._ArgumentGroup:
    """Register ``--{name}-distribution{trait}`` + one flag per HAZARD_FLAG_ROOTS.

    ``name`` is the kebab-form model name (e.g. ``"frailty"``, ``"cure-frailty"``).
    Returns the argument group so callers can attach model-specific flags
    (e.g. cure_frailty's ``--cure-frailty-prevalence{trait}``).
    """
    attr_name = name.replace("-", "_")
    group = parser.add_argument_group(f"Trait {trait} — {attr_name}")
    group.add_argument(
        f"--{name}-distribution{trait}",
        default=None,
        choices=sorted(BASELINE_HAZARDS),
        help=f"Baseline hazard for trait {trait} when phenotype-model{trait}={attr_name}",
    )
    for flag_root in HAZARD_FLAG_ROOTS:
        group.add_argument(f"--{name}-{flag_root}{trait}", type=float, default=None)
    return group

parse_hazard_cli ¶

parse_hazard_cli(args, trait, *, name)

Pull hazard-distribution + per-param values from argparse Namespace.

name is the kebab-form model name; the snake form is derived for attribute lookup. Returns (distribution, hazard_params). Raises if the distribution flag is missing or any required per-distribution param flag is unset.

Source code in simace/phenotyping/hazards.py

def parse_hazard_cli(
    args: argparse.Namespace,
    trait: int,
    *,
    name: str,
) -> tuple[str, dict[str, float]]:
    """Pull hazard-distribution + per-param values from argparse ``Namespace``.

    ``name`` is the kebab-form model name; the snake form is derived for attribute lookup.
    Returns ``(distribution, hazard_params)``.  Raises if the distribution flag is missing
    or any required per-distribution param flag is unset.
    """
    attr_name = name.replace("-", "_")
    distribution = getattr(args, f"{attr_name}_distribution{trait}")
    if distribution is None:
        raise ValueError(f"--{name}-distribution{trait} is required when --phenotype-model{trait}={attr_name}")
    hazard_params: dict[str, float] = {}
    for key in BASELINE_PARAMS[distribution]:
        val = getattr(args, f"{attr_name}_{key}{trait}", None)
        if val is None:
            raise ValueError(f"--{name}-{key}{trait} is required for --{name}-distribution{trait}={distribution}")
        hazard_params[key] = val
    return distribution, hazard_params

hazard_cli_flag_attrs ¶

hazard_cli_flag_attrs(trait, *, name)

Return the argparse attrs registered by add_hazard_cli_args.

Source code in simace/phenotyping/hazards.py

def hazard_cli_flag_attrs(trait: int, *, name: str) -> set[str]:
    """Return the argparse attrs registered by ``add_hazard_cli_args``."""
    attr_name = name.replace("-", "_")
    attrs = {f"{attr_name}_distribution{trait}"}
    attrs.update(f"{attr_name}_{root}{trait}" for root in HAZARD_FLAG_ROOTS)
    return attrs

add_standardize_hazard_cli_arg ¶

add_standardize_hazard_cli_arg(parser_or_group, trait, *, name)

Register --{name}-standardize-hazard{trait} flag.

The flag is namespaced per model so foreign-flag detection (:func:check_no_foreign_flags) cleanly distinguishes overrides intended for one model from those intended for another.

Source code in simace/phenotyping/hazards.py

def add_standardize_hazard_cli_arg(
    parser_or_group: argparse.ArgumentParser | argparse._ArgumentGroup,
    trait: int,
    *,
    name: str,
) -> None:
    """Register ``--{name}-standardize-hazard{trait}`` flag.

    The flag is namespaced per model so foreign-flag detection
    (:func:`check_no_foreign_flags`) cleanly distinguishes overrides
    intended for one model from those intended for another.
    """
    parser_or_group.add_argument(
        f"--{name}-standardize-hazard{trait}",
        default=None,
        choices=list(STANDARDIZE_CHOICES),
        help=(
            f"Per-trait override of the hazard-step standardization mode for trait {trait} "
            f"when phenotype-model{trait}={name.replace('-', '_')} (default: inherit from --standardize)"
        ),
    )

standardize_hazard_cli_attr ¶

standardize_hazard_cli_attr(trait, *, name)

Return the argparse attribute name for the standardize_hazard CLI flag.

Source code in simace/phenotyping/hazards.py

def standardize_hazard_cli_attr(trait: int, *, name: str) -> str:
    """Return the argparse attribute name for the standardize_hazard CLI flag."""
    attr_name = name.replace("-", "_")
    return f"{attr_name}_standardize_hazard{trait}"

coerce_standardize_mode ¶

coerce_standardize_mode(value)

Resolve a user-supplied standardize value to one of the canonical modes.

Accepts the legacy bool form (True → "global", False → "none") or one of the three string modes. Raises ValueError otherwise.

Source code in simace/phenotyping/hazards.py

def coerce_standardize_mode(value: object) -> StandardizeMode:
    """Resolve a user-supplied standardize value to one of the canonical modes.

    Accepts the legacy bool form (``True`` → ``"global"``, ``False`` → ``"none"``)
    or one of the three string modes. Raises ``ValueError`` otherwise.
    """
    if isinstance(value, bool):
        return "global" if value else "none"
    if isinstance(value, str) and value in _VALID_STD_MODES:
        return value  # type: ignore[return-value]
    raise ValueError(f"standardize must be one of {sorted(_VALID_STD_MODES)} or bool; got {value!r}")

resolve_hazard_mode ¶

resolve_hazard_mode(standardize, standardize_hazard)

Pick the mode used for hazard-step beta/L scaling.

standardize_hazard is the per-trait override stored inside phenotype_params{N}. None means "inherit from the global standardize flag". Bools accepted on either argument for the same legacy reason as coerce_standardize_mode.

Source code in simace/phenotyping/hazards.py

def resolve_hazard_mode(
    standardize: StandardizeMode | bool,
    standardize_hazard: StandardizeMode | bool | None,
) -> StandardizeMode:
    """Pick the mode used for hazard-step beta/L scaling.

    ``standardize_hazard`` is the per-trait override stored inside
    ``phenotype_params{N}``. ``None`` means "inherit from the global
    ``standardize`` flag". Bools accepted on either argument for the same
    legacy reason as ``coerce_standardize_mode``.
    """
    if standardize_hazard is None:
        return coerce_standardize_mode(standardize)
    return coerce_standardize_mode(standardize_hazard)

standardize_liability ¶

standardize_liability(liability, mode, generation=None)

Return liability transformed per mode.

Modes

"none" — returned unchanged.
"global" — (L - L.mean()) / L.std() across all rows.
"per_generation" — same z-score applied within each unique generation value; generation must be supplied.

A degenerate group (singleton or all-equal liability) gets L - mean rather than NaN; the caller's downstream comparison against an N(0,1) threshold then reduces to the centered raw value.

Source code in simace/phenotyping/hazards.py

def standardize_liability(
    liability: np.ndarray,
    mode: StandardizeMode | bool,
    generation: np.ndarray | None = None,
) -> np.ndarray:
    """Return liability transformed per ``mode``.

    Modes:
      * ``"none"`` — returned unchanged.
      * ``"global"`` — ``(L - L.mean()) / L.std()`` across all rows.
      * ``"per_generation"`` — same z-score applied within each unique
        ``generation`` value; ``generation`` must be supplied.

    A degenerate group (singleton or all-equal liability) gets
    ``L - mean`` rather than NaN; the caller's downstream comparison
    against an N(0,1) threshold then reduces to the centered raw value.
    """
    mode = coerce_standardize_mode(mode)
    if mode == "none":
        return liability
    if mode == "global":
        std = float(liability.std())
        mean = float(liability.mean())
        if std < 1e-12:
            return liability - mean
        return (liability - mean) / std
    if generation is None:
        raise ValueError("standardize_liability: generation is required for mode='per_generation'")
    out = np.asarray(liability, dtype=np.float64).copy()
    for g in np.unique(generation):
        mask = generation == g
        sub = liability[mask]
        m = float(sub.mean())
        s = float(sub.std())
        if s < 1e-12:
            out[mask] = sub - m
        else:
            out[mask] = (sub - m) / s
    return out

standardize_beta ¶

standardize_beta(liability, beta, mode, generation=None)

Return per-individual (mean, scaled_beta) arrays of length n.

Modes

"none" — mean[i] = 0, scaled_beta[i] = beta everywhere.
"global" — mean[i] = L.mean(), scaled_beta[i] = beta/L.std() broadcast to all rows (scaled_beta = 0 when std is degenerate).
"per_generation" — per individual filled from their generation's mean and beta / std_g; generation must be supplied.

Returning arrays (not scalars) lets callers slice mean[mask][0] / scaled_beta[mask][0] inside an iter_generation_groups loop and pass the per-group scalars to compute_event_times without branching.

Source code in simace/phenotyping/hazards.py

def standardize_beta(
    liability: np.ndarray,
    beta: float,
    mode: StandardizeMode | bool,
    generation: np.ndarray | None = None,
) -> tuple[np.ndarray, np.ndarray]:
    """Return per-individual ``(mean, scaled_beta)`` arrays of length ``n``.

    Modes:
      * ``"none"`` — ``mean[i] = 0``, ``scaled_beta[i] = beta`` everywhere.
      * ``"global"`` — ``mean[i] = L.mean()``, ``scaled_beta[i] = beta/L.std()``
        broadcast to all rows (``scaled_beta = 0`` when std is degenerate).
      * ``"per_generation"`` — per individual filled from their generation's
        mean and ``beta / std_g``; ``generation`` must be supplied.

    Returning arrays (not scalars) lets callers slice ``mean[mask][0]`` /
    ``scaled_beta[mask][0]`` inside an ``iter_generation_groups`` loop and
    pass the per-group scalars to ``compute_event_times`` without branching.
    """
    mode = coerce_standardize_mode(mode)
    n = len(liability)
    if mode == "none":
        return np.zeros(n, dtype=np.float64), np.full(n, beta, dtype=np.float64)
    if mode == "global":
        std = float(liability.std())
        mean = float(liability.mean())
        scaled = 0.0 if std < 1e-12 else beta / std
        return np.full(n, mean, dtype=np.float64), np.full(n, scaled, dtype=np.float64)
    if generation is None:
        raise ValueError("standardize_beta: generation is required for mode='per_generation'")
    mean_arr = np.zeros(n, dtype=np.float64)
    beta_arr = np.zeros(n, dtype=np.float64)
    for g in np.unique(generation):
        mask = generation == g
        sub = liability[mask]
        m = float(sub.mean())
        s = float(sub.std())
        mean_arr[mask] = m
        beta_arr[mask] = 0.0 if s < 1e-12 else beta / s
    return mean_arr, beta_arr

iter_generation_groups ¶

iter_generation_groups(mode, generation)

Yield boolean masks for the per-group loop pattern in model simulate().

Under "none" or "global" yields a single full-coverage mask, so the caller's loop runs once over all rows. Under "per_generation" yields one mask per unique value of generation.

Source code in simace/phenotyping/hazards.py

def iter_generation_groups(
    mode: StandardizeMode | bool,
    generation: np.ndarray,
) -> Iterator[np.ndarray]:
    """Yield boolean masks for the per-group loop pattern in model ``simulate()``.

    Under ``"none"`` or ``"global"`` yields a single full-coverage mask, so
    the caller's loop runs once over all rows. Under ``"per_generation"``
    yields one mask per unique value of ``generation``.
    """
    mode = coerce_standardize_mode(mode)
    if mode != "per_generation":
        yield np.ones(len(generation), dtype=bool)
        return
    for g in np.unique(generation):
        yield generation == g

compute_event_times ¶

compute_event_times(neg_log_u, liability, mean, scaled_beta, distribution, params)

Convert -log(U) draws to event times under the named baseline hazard.

PARAMETER	DESCRIPTION
`neg_log_u`	-log(U) draws, U ~ Uniform(0, 1], shape (n,). TYPE: `ndarray`
`liability`	quantitative liability, shape (n,). TYPE: `ndarray`
`mean`	liability mean (used when standardize=True; pass 0.0 otherwise). TYPE: `float`
`scaled_beta`	liability coefficient on log-hazard (already divided by std if standardize=True). TYPE: `float`
`distribution`	baseline hazard name; one of `BASELINE_HAZARDS` keys. TYPE: `str`
`params`	distribution-specific parameter dict; see `BASELINE_PARAMS`. TYPE: `dict[str, float]`

RETURNS	DESCRIPTION
`ndarray`	Event-time array, shape (n,), clamped to `[1e-10, 1e6]`.

RAISES	DESCRIPTION
`ValueError`	unknown `distribution` name.
`KeyError`	missing required parameter for the selected distribution.

Source code in simace/phenotyping/hazards.py

def compute_event_times(
    neg_log_u: np.ndarray,
    liability: np.ndarray,
    mean: float,
    scaled_beta: float,
    distribution: str,
    params: dict[str, float],
) -> np.ndarray:
    """Convert -log(U) draws to event times under the named baseline hazard.

    Args:
        neg_log_u:    -log(U) draws, U ~ Uniform(0, 1], shape (n,).
        liability:    quantitative liability, shape (n,).
        mean:         liability mean (used when standardize=True; pass 0.0 otherwise).
        scaled_beta:  liability coefficient on log-hazard (already divided by std
                      if standardize=True).
        distribution: baseline hazard name; one of ``BASELINE_HAZARDS`` keys.
        params:       distribution-specific parameter dict; see ``BASELINE_PARAMS``.

    Returns:
        Event-time array, shape (n,), clamped to ``[1e-10, 1e6]``.

    Raises:
        ValueError: unknown ``distribution`` name.
        KeyError:   missing required parameter for the selected distribution.
    """
    if distribution not in BASELINE_HAZARDS:
        raise ValueError(f"Unknown baseline hazard {distribution!r}; valid: {sorted(BASELINE_HAZARDS)}")
    return BASELINE_HAZARDS[distribution](neg_log_u, liability, mean, scaled_beta, params)

models¶

simace.phenotyping.models ¶

Phenotype model registry.

To add a new phenotype model:

Write a new module under simace/phenotyping/models/ defining a class that subclasses PhenotypeModel (see _base.py).
Implement the abstract methods (from_config, add_cli_args, from_cli, cli_flag_attrs, to_params_dict, simulate).
Import the class here and add it to the MODELS dict below.

That's the entire surface — there is no auto-discovery and no decorator. The hand-authored dict is the single source of truth for which model names the dispatcher accepts.

PhenotypeModel ¶

Bases: ABC

Abstract base class for phenotype model families.

Subclasses are frozen dataclasses; their fields are the model's typed parameters. Validation happens in __post_init__.

from_config `abstractmethod` `classmethod` ¶

from_config(params, trait_num)

Build an instance from the simulation parameter dict for trait trait_num.

Source code in simace/phenotyping/models/_base.py

@classmethod
@abstractmethod
def from_config(cls, params: dict[str, Any], trait_num: int) -> Self:
    """Build an instance from the simulation parameter dict for trait ``trait_num``."""

add_cli_args `abstractmethod` `classmethod` ¶

add_cli_args(parser, trait)

Register this model's CLI flags on parser for the given trait.

Source code in simace/phenotyping/models/_base.py

@classmethod
@abstractmethod
def add_cli_args(cls, parser: argparse.ArgumentParser, trait: int) -> None:
    """Register this model's CLI flags on ``parser`` for the given trait."""

from_cli `abstractmethod` `classmethod` ¶

from_cli(args, trait)

Build an instance from parsed CLI args for the given trait.

Must raise ValueError (with trait context) if flags belonging to another family are populated alongside this model's selection, or if a required flag for this model is missing.

Source code in simace/phenotyping/models/_base.py

@classmethod
@abstractmethod
def from_cli(cls, args: argparse.Namespace, trait: int) -> Self:
    """Build an instance from parsed CLI args for the given trait.

    Must raise ``ValueError`` (with trait context) if flags belonging
    to another family are populated alongside this model's selection,
    or if a required flag for this model is missing.
    """

cli_flag_attrs `abstractmethod` `classmethod` ¶

cli_flag_attrs(trait)

Return the argparse attribute names this model owns for the given trait.

Used by sibling models' from_cli to detect foreign-flag conflicts.

Source code in simace/phenotyping/models/_base.py

@classmethod
@abstractmethod
def cli_flag_attrs(cls, trait: int) -> set[str]:
    """Return the argparse attribute names this model owns for the given trait.

    Used by sibling models' ``from_cli`` to detect foreign-flag conflicts.
    """

to_params_dict `abstractmethod` ¶

to_params_dict()

Round-trip back to the per-trait phenotype_params{N} dict shape.

Source code in simace/phenotyping/models/_base.py

@abstractmethod
def to_params_dict(self) -> dict[str, Any]:
    """Round-trip back to the per-trait ``phenotype_params{N}`` dict shape."""

simulate `abstractmethod` ¶

simulate(liability, *, seed, standardize, sex, generation)

Simulate event times for one trait.

standardize is the global liability-standardization mode ("none" | "global" | "per_generation"); legacy True/False bools are accepted via the same shim used at config-load.

Concrete subclasses route standardize differently:

Threshold-style models (threshold, adult.ltm) consume it directly to standardize liability before the threshold comparison.
Hazard-bearing models (frailty, cure_frailty, first_passage, adult.cox) carry an additional per-trait field standardize_hazard (defaulting to None → inherit) and resolve it via :func:simace.phenotyping.hazards.resolve_hazard_mode for the hazard step. cure_frailty is the only model that honors both knobs (threshold step + hazard step).

Source code in simace/phenotyping/models/_base.py

@abstractmethod
def simulate(
    self,
    liability: np.ndarray,
    *,
    seed: int,
    standardize: StandardizeMode | bool,
    sex: np.ndarray | None,
    generation: np.ndarray,
) -> np.ndarray:
    """Simulate event times for one trait.

    ``standardize`` is the global liability-standardization mode
    (``"none" | "global" | "per_generation"``); legacy ``True``/``False``
    bools are accepted via the same shim used at config-load.

    Concrete subclasses route ``standardize`` differently:

    * Threshold-style models (``threshold``, ``adult.ltm``) consume it
      directly to standardize liability before the threshold comparison.
    * Hazard-bearing models (``frailty``, ``cure_frailty``,
      ``first_passage``, ``adult.cox``) carry an additional per-trait
      field ``standardize_hazard`` (defaulting to ``None`` → inherit) and
      resolve it via :func:`simace.phenotyping.hazards.resolve_hazard_mode`
      for the hazard step. ``cure_frailty`` is the only model that
      honors both knobs (threshold step + hazard step).
    """

AdultModel `dataclass` ¶

AdultModel(method, prevalence, cip_x0=50.0, cip_k=0.2, beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

ADuLT phenotype model.

PARAMETER	DESCRIPTION
`method`	`"ltm"` or `"cox"`. TYPE: `str`
`prevalence`	case fraction K (scalar, per-generation, or sex-specific). TYPE: `Any`
`cip_x0`	logistic CIP midpoint age (default 50.0). TYPE: `float` DEFAULT: `50.0`
`cip_k`	logistic CIP growth rate (default 0.2). TYPE: `float` DEFAULT: `0.2`
`beta`	liability scaling factor on the probit (ltm) or log-hazard (cox) scale (1.0 = no scaling). TYPE: `float` DEFAULT: `1.0`
`beta_sex`	sex coefficient (0.0 = no effect). TYPE: `float` DEFAULT: `0.0`
`standardize_hazard`	per-trait override for the hazard-step L standardization mode under `method="cox"`. `None` inherits from the global `standardize` flag. Must be `None` when `method="ltm"` (threshold-style models honor the global flag directly). TYPE: `StandardizeMode \| None` DEFAULT: `None`

CureFrailtyModel `dataclass` ¶

CureFrailtyModel(distribution, prevalence, hazard_params=dict(), beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

Mixture cure-frailty model.

PARAMETER	DESCRIPTION
`distribution`	baseline hazard for cases (see `simace.phenotyping.hazards`). TYPE: `str`
`hazard_params`	dict of distribution-specific parameter values. TYPE: `dict[str, float]` DEFAULT: `dict()`
`prevalence`	case fraction; scalar, per-generation dict, or sex-specific dict. TYPE: `Any`
`beta`	coefficient on liability in the log-hazard among cases. TYPE: `float` DEFAULT: `1.0`
`beta_sex`	coefficient on sex in the log-hazard (0 = no effect). TYPE: `float` DEFAULT: `0.0`
`standardize_hazard`	per-trait override for the case-onset hazard step. `None` inherits from the global `standardize` flag passed to `simulate`. This is the only model that honors both knobs: `standardize` controls the threshold step (case status), while `standardize_hazard` controls the hazard step (case onset). Setting them to different modes (e.g. `standardize='per_generation'` + `standardize_hazard='global'`) preserves per-gen prevalence while keeping a constant hazard slope across generations. TYPE: `StandardizeMode \| None` DEFAULT: `None`

FirstPassageModel `dataclass` ¶

FirstPassageModel(drift, shape, beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

First-passage-time phenotype model.

PARAMETER	DESCRIPTION
`drift`	drift rate μ; must be non-zero. Negative drift → toward boundary (everyone onsets). Positive drift → emergent cure fraction. TYPE: `float`
`shape`	y0² (initial distance from boundary, squared). TYPE: `float`
`beta`	coefficient on liability for log(y0); positive β → worse. TYPE: `float` DEFAULT: `1.0`
`beta_sex`	coefficient on sex for log(y0) (0 = no effect). TYPE: `float` DEFAULT: `0.0`

FrailtyModel `dataclass` ¶

FrailtyModel(distribution, hazard_params=dict(), beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

Frailty phenotype model.

PARAMETER	DESCRIPTION
`distribution`	baseline hazard name (see `simace.phenotyping.hazards`). TYPE: `str`
`hazard_params`	dict of distribution-specific parameter values. TYPE: `dict[str, float]` DEFAULT: `dict()`
`beta`	coefficient on liability in the log-hazard. TYPE: `float` DEFAULT: `1.0`
`beta_sex`	coefficient on sex in the log-hazard (0 = no effect). TYPE: `float` DEFAULT: `0.0`
`standardize_hazard`	per-trait override for the hazard-step standardization mode (`"none" \| "global" \| "per_generation"`). `None` inherits from the global `standardize` flag passed to `simulate`. TYPE: `StandardizeMode \| None` DEFAULT: `None`

models.frailty¶

simace.phenotyping.models.frailty ¶

Proportional-hazards frailty phenotype model.

Per-individual onset time is drawn from a parametric baseline hazard modulated by an exp(beta * L) frailty multiplier. Liability translates into earlier onset for higher beta * L.

FrailtyModel `dataclass` ¶

FrailtyModel(distribution, hazard_params=dict(), beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

Frailty phenotype model.

PARAMETER	DESCRIPTION
`distribution`	baseline hazard name (see `simace.phenotyping.hazards`). TYPE: `str`
`hazard_params`	dict of distribution-specific parameter values. TYPE: `dict[str, float]` DEFAULT: `dict()`
`beta`	coefficient on liability in the log-hazard. TYPE: `float` DEFAULT: `1.0`
`beta_sex`	coefficient on sex in the log-hazard (0 = no effect). TYPE: `float` DEFAULT: `0.0`
`standardize_hazard`	per-trait override for the hazard-step standardization mode (`"none" \| "global" \| "per_generation"`). `None` inherits from the global `standardize` flag passed to `simulate`. TYPE: `StandardizeMode \| None` DEFAULT: `None`

models.cure_frailty¶

simace.phenotyping.models.cure_frailty ¶

Mixture cure-frailty phenotype model.

Liability above a threshold sets case status (WHO has the disease); a proportional hazards frailty model with a configurable baseline hazard determines onset time among cases (WHEN). Controls are censored at 1e6.

CureFrailtyModel `dataclass` ¶

CureFrailtyModel(distribution, prevalence, hazard_params=dict(), beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

Mixture cure-frailty model.

PARAMETER	DESCRIPTION
`distribution`	baseline hazard for cases (see `simace.phenotyping.hazards`). TYPE: `str`
`hazard_params`	dict of distribution-specific parameter values. TYPE: `dict[str, float]` DEFAULT: `dict()`
`prevalence`	case fraction; scalar, per-generation dict, or sex-specific dict. TYPE: `Any`
`beta`	coefficient on liability in the log-hazard among cases. TYPE: `float` DEFAULT: `1.0`
`beta_sex`	coefficient on sex in the log-hazard (0 = no effect). TYPE: `float` DEFAULT: `0.0`
`standardize_hazard`	per-trait override for the case-onset hazard step. `None` inherits from the global `standardize` flag passed to `simulate`. This is the only model that honors both knobs: `standardize` controls the threshold step (case status), while `standardize_hazard` controls the hazard step (case onset). Setting them to different modes (e.g. `standardize='per_generation'` + `standardize_hazard='global'`) preserves per-gen prevalence while keeping a constant hazard slope across generations. TYPE: `StandardizeMode \| None` DEFAULT: `None`

models.adult¶

simace.phenotyping.models.adult ¶

ADuLT phenotype model.

Two sub-methods, selected by method:

ltm — liability threshold model: case status from raw liability vs. threshold; case onset age via logistic-CIP transform of the case CIR computed from a probit scaling of liability + sex.
cox — Weibull(shape=2) proportional hazards: case status by rank/(N+1) capped at K; case onset age via logistic CIP inverse.

Both share the CIP shape parameters cip_x0 and cip_k. Reference: Pedersen et al., Nat Commun 2023 (ADuLT).

Standardization routing is asymmetric across the two sub-methods:

method="ltm" is a threshold-on-liability model. It honors the global standardize flag for the liability z-score and rejects standardize_hazard in its params.
method="cox" is a hazard-on-liability model. It honors standardize_hazard (defaulting to the global standardize when unset) for the L scaling that feeds np.exp(beta * L).

Toggling method therefore silently changes which knob controls L scaling for that trait. The validation in __post_init__ surfaces this in the error message when the field is set on an LTM-method instance.

AdultModel `dataclass` ¶

AdultModel(method, prevalence, cip_x0=50.0, cip_k=0.2, beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

ADuLT phenotype model.

PARAMETER	DESCRIPTION
`method`	`"ltm"` or `"cox"`. TYPE: `str`
`prevalence`	case fraction K (scalar, per-generation, or sex-specific). TYPE: `Any`
`cip_x0`	logistic CIP midpoint age (default 50.0). TYPE: `float` DEFAULT: `50.0`
`cip_k`	logistic CIP growth rate (default 0.2). TYPE: `float` DEFAULT: `0.2`
`beta`	liability scaling factor on the probit (ltm) or log-hazard (cox) scale (1.0 = no scaling). TYPE: `float` DEFAULT: `1.0`
`beta_sex`	sex coefficient (0.0 = no effect). TYPE: `float` DEFAULT: `0.0`
`standardize_hazard`	per-trait override for the hazard-step L standardization mode under `method="cox"`. `None` inherits from the global `standardize` flag. Must be `None` when `method="ltm"` (threshold-style models honor the global flag directly). TYPE: `StandardizeMode \| None` DEFAULT: `None`

models.first_passage¶

simace.phenotyping.models.first_passage ¶

First-passage time phenotype model.

A latent health process Y(t) = y0 + driftt + W(t) starts at y0 = sqrt(shape) * exp(-scaled_beta(L - mean) - beta_sexsex). Disease onset is the first time Y(t) ≤ 0. When drift < 0 the process hits the boundary in finite time (everyone eventually onsets); when drift > 0 an emergent cure fraction P(never hit) = 1 - exp(-2y0*drift) arises.

References

Lee & Whitmore 2006 (threshold regression). Aalen & Gjessing 2001 (first-passage models in survival).

FirstPassageModel `dataclass` ¶

FirstPassageModel(drift, shape, beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

First-passage-time phenotype model.

PARAMETER	DESCRIPTION
`drift`	drift rate μ; must be non-zero. Negative drift → toward boundary (everyone onsets). Positive drift → emergent cure fraction. TYPE: `float`
`shape`	y0² (initial distance from boundary, squared). TYPE: `float`
`beta`	coefficient on liability for log(y0); positive β → worse. TYPE: `float` DEFAULT: `1.0`
`beta_sex`	coefficient on sex for log(y0) (0 = no effect). TYPE: `float` DEFAULT: `0.0`

bimodal_phenotype (prototype)¶

simace.phenotyping._prototypes.bimodal_phenotype ¶

Bimodal phenotype models (prototype).

Three models that produce bimodal age-of-onset distributions

mixture_cip: Two logistic CIP components, shared β
mixture_cure_frailty: Two cure-frailty hazards, shared β
two_threshold: Two liability thresholds with separate CIP per stratum

Standardization: each function accepts standardize as one of "none", "global", or a legacy bool (True → "global", False → "none"). "per_generation" is not supported here — these prototypes do not take a generation array.

phenotype_mixture_cip ¶

phenotype_mixture_cip(liability, prevalence, beta=1.0, pi=0.5, cip_x0_1=10.0, cip_k_1=0.3, cip_x0_2=40.0, cip_k_2=0.2, seed=42, standardize='global', sex=None, beta_sex=0.0)

Mixture of two logistic CIP curves with shared liability threshold.

Each case is randomly assigned to component 1 (probability π) or component 2 (probability 1−π). Both components share the same prevalence K and liability scaling (β, β_sex), but have different CIP midpoints (x₀) and steepness (k), producing bimodal onset.

PARAMETER	DESCRIPTION
`liability`	quantitative liability, shape (n,) TYPE: `ndarray`
`prevalence`	population prevalence K; scalar or per-individual array TYPE: `float \| ndarray`
`beta`	probit scaling factor for liability TYPE: `float` DEFAULT: `1.0`
`pi`	mixing weight for component 1 (0–1) TYPE: `float` DEFAULT: `0.5`
`cip_x0_1`	logistic CIP midpoint for component 1 (early) TYPE: `float` DEFAULT: `10.0`
`cip_k_1`	logistic CIP growth rate for component 1 TYPE: `float` DEFAULT: `0.3`
`cip_x0_2`	logistic CIP midpoint for component 2 (late) TYPE: `float` DEFAULT: `40.0`
`cip_k_2`	logistic CIP growth rate for component 2 TYPE: `float` DEFAULT: `0.2`
`seed`	random seed for component assignment TYPE: `int` DEFAULT: `42`
`standardize`	standardization mode for liability (`"none"`, `"global"`, or legacy bool). `"per_generation"` is not accepted. TYPE: `StandardizeMode \| bool` DEFAULT: `'global'`
`sex`	binary sex covariate (0/1), shape (n,), or None TYPE: `ndarray \| None` DEFAULT: `None`
`beta_sex`	probit-scale coefficient for sex TYPE: `float` DEFAULT: `0.0`

RETURNS	DESCRIPTION
`ndarray`	Array of simulated event times, shape (n,)

Source code in simace/phenotyping/_prototypes/bimodal_phenotype.py

def phenotype_mixture_cip(
    liability: np.ndarray,
    prevalence: float | np.ndarray,
    beta: float = 1.0,
    pi: float = 0.5,
    cip_x0_1: float = 10.0,
    cip_k_1: float = 0.3,
    cip_x0_2: float = 40.0,
    cip_k_2: float = 0.2,
    seed: int = 42,
    standardize: StandardizeMode | bool = "global",
    sex: np.ndarray | None = None,
    beta_sex: float = 0.0,
) -> np.ndarray:
    """Mixture of two logistic CIP curves with shared liability threshold.

    Each case is randomly assigned to component 1 (probability π) or
    component 2 (probability 1−π).  Both components share the same
    prevalence K and liability scaling (β, β_sex), but have different
    CIP midpoints (x₀) and steepness (k), producing bimodal onset.

    Args:
        liability:   quantitative liability, shape (n,)
        prevalence:  population prevalence K; scalar or per-individual array
        beta:        probit scaling factor for liability
        pi:          mixing weight for component 1 (0–1)
        cip_x0_1:    logistic CIP midpoint for component 1 (early)
        cip_k_1:     logistic CIP growth rate for component 1
        cip_x0_2:    logistic CIP midpoint for component 2 (late)
        cip_k_2:     logistic CIP growth rate for component 2
        seed:        random seed for component assignment
        standardize: standardization mode for liability (``"none"``, ``"global"``,
                     or legacy bool).  ``"per_generation"`` is not accepted.
        sex:         binary sex covariate (0/1), shape (n,), or None
        beta_sex:    probit-scale coefficient for sex

    Returns:
        Array of simulated event times, shape (n,)
    """
    mode = _resolve_mode(standardize)
    L = standardize_liability(liability, mode)

    _ndtri_vec = np.vectorize(_ndtri_approx)
    threshold = _ndtri_vec(1.0 - prevalence)
    is_case = threshold < L

    t = np.full(len(L), 1e6)
    n_cases = is_case.sum()
    if n_cases > 0:
        prev_case = prevalence[is_case] if isinstance(prevalence, np.ndarray) else prevalence
        L_eff = beta * L[is_case]
        if beta_sex != 0.0 and sex is not None:
            L_eff = L_eff + beta_sex * sex[is_case]

        cir = 0.5 * erfc(L_eff / np.sqrt(2.0))
        valid = cir < prev_case
        cir_clipped = np.clip(cir, 1e-10, np.asarray(prev_case) - 1e-10)

        # Component assignment
        rng = np.random.default_rng(seed)
        comp1 = rng.random(n_cases) < pi

        # Invert through each component's logistic CIP
        onset_1 = cip_x0_1 + (1.0 / cip_k_1) * np.log(cir_clipped / (prev_case - cir_clipped))
        onset_2 = cip_x0_2 + (1.0 / cip_k_2) * np.log(cir_clipped / (prev_case - cir_clipped))

        onset = np.where(comp1, onset_1, onset_2)
        onset[~valid] = 1e6
        t[is_case] = onset

    np.clip(t, 0.01, 1e6, out=t)
    return t

phenotype_mixture_cure_frailty ¶

phenotype_mixture_cure_frailty(liability, prevalence, beta, pi, baseline, hazard_params_1, hazard_params_2, seed, standardize='global', sex=None, beta_sex=0.0)

Mixture of two cure-frailty components with shared threshold.

Cases (L > threshold) are randomly assigned to component 1 (prob π) or component 2 (prob 1−π). Each component has its own baseline hazard parameters, producing different onset timing distributions.

PARAMETER	DESCRIPTION
`liability`	quantitative liability, shape (n,) TYPE: `ndarray`
`prevalence`	population prevalence K TYPE: `float \| ndarray`
`beta`	effect of liability on log-hazard among cases TYPE: `float`
`pi`	mixing weight for component 1 TYPE: `float`
`baseline`	baseline hazard model name (shared) TYPE: `str`
`hazard_params_1`	hazard parameters for component 1 TYPE: `dict[str, float]`
`hazard_params_2`	hazard parameters for component 2 TYPE: `dict[str, float]`
`seed`	random seed TYPE: `int`
`standardize`	standardization mode for liability (`"none"`, `"global"`, or legacy bool). `"per_generation"` is not accepted. TYPE: `StandardizeMode \| bool` DEFAULT: `'global'`
`sex`	binary sex covariate (0/1), shape (n,), or None TYPE: `ndarray \| None` DEFAULT: `None`
`beta_sex`	effect of sex on log-hazard TYPE: `float` DEFAULT: `0.0`

RETURNS	DESCRIPTION
`ndarray`	Array of simulated event times, shape (n,)

Source code in simace/phenotyping/_prototypes/bimodal_phenotype.py

def phenotype_mixture_cure_frailty(
    liability: np.ndarray,
    prevalence: float | np.ndarray,
    beta: float,
    pi: float,
    baseline: str,
    hazard_params_1: dict[str, float],
    hazard_params_2: dict[str, float],
    seed: int,
    standardize: StandardizeMode | bool = "global",
    sex: np.ndarray | None = None,
    beta_sex: float = 0.0,
) -> np.ndarray:
    """Mixture of two cure-frailty components with shared threshold.

    Cases (L > threshold) are randomly assigned to component 1 (prob π)
    or component 2 (prob 1−π).  Each component has its own baseline hazard
    parameters, producing different onset timing distributions.

    Args:
        liability:       quantitative liability, shape (n,)
        prevalence:      population prevalence K
        beta:            effect of liability on log-hazard among cases
        pi:              mixing weight for component 1
        baseline:        baseline hazard model name (shared)
        hazard_params_1: hazard parameters for component 1
        hazard_params_2: hazard parameters for component 2
        seed:            random seed
        standardize:     standardization mode for liability (``"none"``,
                         ``"global"``, or legacy bool).  ``"per_generation"``
                         is not accepted.
        sex:             binary sex covariate (0/1), shape (n,), or None
        beta_sex:        effect of sex on log-hazard

    Returns:
        Array of simulated event times, shape (n,)
    """
    mode = _resolve_mode(standardize)
    L = standardize_liability(liability, mode)
    mean_arr, beta_arr = standardize_beta(liability, beta, mode)
    mean = float(mean_arr[0])
    scaled_beta = float(beta_arr[0])

    _ndtri_vec = np.vectorize(_ndtri_approx)
    threshold = _ndtri_vec(1.0 - prevalence)
    is_case = threshold < L

    n = len(liability)
    t = np.full(n, 1e6)
    n_cases = is_case.sum()
    if n_cases > 0:
        rng = np.random.default_rng(seed)
        neg_log_u = rng.exponential(size=n_cases)
        comp1 = rng.random(n_cases) < pi

        if beta_sex != 0.0 and sex is not None:
            neg_log_u = neg_log_u / np.exp(beta_sex * sex[is_case])

        # Pass RAW liability to the hazard kernel (mirrors the cure_frailty
        # fix in models/cure_frailty.py): the kernel computes
        # ``z = exp(scaled_beta * (L - mean))`` and expects the raw L; passing
        # the standardized L_z would double-shift.
        L_cases = liability[is_case]
        t1 = BASELINE_HAZARDS[baseline](neg_log_u, L_cases, mean, scaled_beta, hazard_params_1)
        t2 = BASELINE_HAZARDS[baseline](neg_log_u, L_cases, mean, scaled_beta, hazard_params_2)

        t[is_case] = np.where(comp1, t1, t2)

    return t

phenotype_two_threshold ¶

phenotype_two_threshold(liability, prevalence_early, prevalence_late, beta=1.0, cip_x0_1=10.0, cip_k_1=0.3, cip_x0_2=40.0, cip_k_2=0.2, seed=42, standardize='global', sex=None, beta_sex=0.0)

Two-threshold liability model with separate CIP per stratum.

High-liability individuals (L > τ₁) are early-onset cases mapped through CIP₁. Moderate-liability individuals (τ₂ < L ≤ τ₁) are late-onset cases mapped through CIP₂. This preserves a single liability dimension while producing bimodal age-of-onset.

PARAMETER	DESCRIPTION
`liability`	quantitative liability, shape (n,) TYPE: `ndarray`
`prevalence_early`	K_early (fraction with L > τ₁) TYPE: `float \| ndarray`
`prevalence_late`	K_late (fraction with τ₂ < L ≤ τ₁); total prevalence = K_early + K_late TYPE: `float \| ndarray`
`beta`	probit scaling factor for liability TYPE: `float` DEFAULT: `1.0`
`cip_x0_1`	logistic CIP midpoint for early component TYPE: `float` DEFAULT: `10.0`
`cip_k_1`	logistic CIP growth rate for early component TYPE: `float` DEFAULT: `0.3`
`cip_x0_2`	logistic CIP midpoint for late component TYPE: `float` DEFAULT: `40.0`
`cip_k_2`	logistic CIP growth rate for late component TYPE: `float` DEFAULT: `0.2`
`seed`	unused (deterministic model) TYPE: `int` DEFAULT: `42`
`standardize`	standardization mode for liability (`"none"`, `"global"`, or legacy bool). `"per_generation"` is not accepted. TYPE: `StandardizeMode \| bool` DEFAULT: `'global'`
`sex`	binary sex covariate (0/1), shape (n,), or None TYPE: `ndarray \| None` DEFAULT: `None`
`beta_sex`	probit-scale coefficient for sex TYPE: `float` DEFAULT: `0.0`

RETURNS	DESCRIPTION
`ndarray`	Array of simulated event times, shape (n,)

Source code in simace/phenotyping/_prototypes/bimodal_phenotype.py

def phenotype_two_threshold(
    liability: np.ndarray,
    prevalence_early: float | np.ndarray,
    prevalence_late: float | np.ndarray,
    beta: float = 1.0,
    cip_x0_1: float = 10.0,
    cip_k_1: float = 0.3,
    cip_x0_2: float = 40.0,
    cip_k_2: float = 0.2,
    seed: int = 42,
    standardize: StandardizeMode | bool = "global",
    sex: np.ndarray | None = None,
    beta_sex: float = 0.0,
) -> np.ndarray:
    """Two-threshold liability model with separate CIP per stratum.

    High-liability individuals (L > τ₁) are early-onset cases mapped
    through CIP₁.  Moderate-liability individuals (τ₂ < L ≤ τ₁) are
    late-onset cases mapped through CIP₂.  This preserves a single
    liability dimension while producing bimodal age-of-onset.

    Args:
        liability:        quantitative liability, shape (n,)
        prevalence_early: K_early (fraction with L > τ₁)
        prevalence_late:  K_late (fraction with τ₂ < L ≤ τ₁);
                          total prevalence = K_early + K_late
        beta:             probit scaling factor for liability
        cip_x0_1:        logistic CIP midpoint for early component
        cip_k_1:         logistic CIP growth rate for early component
        cip_x0_2:        logistic CIP midpoint for late component
        cip_k_2:         logistic CIP growth rate for late component
        seed:             unused (deterministic model)
        standardize:      standardization mode for liability (``"none"``,
                          ``"global"``, or legacy bool).  ``"per_generation"``
                          is not accepted.
        sex:              binary sex covariate (0/1), shape (n,), or None
        beta_sex:         probit-scale coefficient for sex

    Returns:
        Array of simulated event times, shape (n,)
    """
    mode = _resolve_mode(standardize)
    L = standardize_liability(liability, mode)

    _ndtri_vec = np.vectorize(_ndtri_approx)
    K_total = np.asarray(prevalence_early) + np.asarray(prevalence_late)
    tau1 = _ndtri_vec(1.0 - prevalence_early)  # early threshold (higher)
    tau2 = _ndtri_vec(1.0 - K_total)  # late threshold (lower)

    early = tau1 < L
    late = (tau2 < L) & (~early)

    t = np.full(len(L), 1e6)

    for mask, prev_param, x0, k in [
        (early, prevalence_early, cip_x0_1, cip_k_1),
        (late, prevalence_late, cip_x0_2, cip_k_2),
    ]:
        n_grp = mask.sum()
        if n_grp == 0:
            continue
        prev_grp = prev_param[mask] if isinstance(prev_param, np.ndarray) else prev_param
        L_eff = beta * L[mask]
        if beta_sex != 0.0 and sex is not None:
            L_eff = L_eff + beta_sex * sex[mask]

        cir = 0.5 * erfc(L_eff / np.sqrt(2.0))
        valid = cir < prev_grp
        cir = np.clip(cir, 1e-10, np.asarray(prev_grp) - 1e-10)
        onset = x0 + (1.0 / k) * np.log(cir / (prev_grp - cir))
        onset[~valid] = 1e6
        t[mask] = onset

    np.clip(t, 0.01, 1e6, out=t)
    return t

simace.phenotyping¶

phenotype¶

simace.phenotyping.phenotype ¶

run_phenotype ¶

cli ¶

threshold¶

simace.phenotyping.threshold ¶

apply_threshold ¶

run_threshold ¶

cli ¶

hazards¶

simace.phenotyping.hazards ¶

validate_hazard_params ¶

add_hazard_cli_args ¶

parse_hazard_cli ¶

hazard_cli_flag_attrs ¶

add_standardize_hazard_cli_arg ¶

standardize_hazard_cli_attr ¶

coerce_standardize_mode ¶

resolve_hazard_mode ¶

standardize_liability ¶

standardize_beta ¶

iter_generation_groups ¶

compute_event_times ¶

models¶

simace.phenotyping.models ¶

PhenotypeModel ¶

from_config abstractmethod classmethod ¶

add_cli_args abstractmethod classmethod ¶

from_cli abstractmethod classmethod ¶

cli_flag_attrs abstractmethod classmethod ¶

to_params_dict abstractmethod ¶

simulate abstractmethod ¶

AdultModel dataclass ¶

CureFrailtyModel dataclass ¶

FirstPassageModel dataclass ¶

FrailtyModel dataclass ¶

models.frailty¶

simace.phenotyping.models.frailty ¶

FrailtyModel dataclass ¶

models.cure_frailty¶

simace.phenotyping.models.cure_frailty ¶

CureFrailtyModel dataclass ¶

models.adult¶

simace.phenotyping.models.adult ¶

AdultModel dataclass ¶

models.first_passage¶

simace.phenotyping.models.first_passage ¶

FirstPassageModel dataclass ¶

bimodal_phenotype (prototype)¶

simace.phenotyping._prototypes.bimodal_phenotype ¶

phenotype_mixture_cip ¶

phenotype_mixture_cure_frailty ¶

phenotype_two_threshold ¶

from_config `abstractmethod` `classmethod` ¶

add_cli_args `abstractmethod` `classmethod` ¶

from_cli `abstractmethod` `classmethod` ¶

cli_flag_attrs `abstractmethod` `classmethod` ¶

to_params_dict `abstractmethod` ¶

simulate `abstractmethod` ¶

AdultModel `dataclass` ¶

CureFrailtyModel `dataclass` ¶

FirstPassageModel `dataclass` ¶

FrailtyModel `dataclass` ¶

FrailtyModel `dataclass` ¶

CureFrailtyModel `dataclass` ¶

AdultModel `dataclass` ¶

FirstPassageModel `dataclass` ¶