Skip to content

simace.phenotyping

Phenotype models: frailty, threshold, cure, first-passage, and bimodal variants.

phenotype

simace.phenotyping.phenotype

Phenotype simulation for two correlated traits.

Each trait is simulated by one of the four model families registered in simace.phenotyping.models:

  • frailty — proportional hazards frailty (baseline hazards live in simace.phenotyping.hazards).
  • cure_frailty — mixture cure model: threshold determines case status, frailty determines onset time among cases.
  • adult — ADuLT age-dependent liability threshold.
  • first_passage — inverse-Gaussian first-passage time.

Adding a new model is a single new file under simace/phenotyping/models/ plus one entry in simace/phenotyping/models/__init__.py's MODELS dict.

run_phenotype

run_phenotype(pedigree, *, G_pheno, seed, standardize, phenotype_model1, phenotype_model2, beta1, beta_sex1, phenotype_params1, beta2, beta_sex2, phenotype_params2)

Simulate phenotype event times for two correlated traits.

Per-trait prevalence (for adult / cure_frailty) lives inside phenotype_params{N}; frailty / first_passage do not carry one.

PARAMETER DESCRIPTION
pedigree

DataFrame with liability1, liability2, generation, sex, plus the genealogy columns preserved on output.

TYPE: DataFrame

G_pheno

number of trailing generations to phenotype.

TYPE: int

seed

RNG seed (trait 2 uses seed + 100).

TYPE: int

standardize

global liability-standardization mode applied to threshold-style consumers (threshold, adult.ltm, cure_frailty's threshold step). One of "none" | "global" | "per_generation"; legacy bools accepted (True"global", False"none"). Hazard-bearing models (frailty, cure_frailty, first_passage, adult.cox) inherit this for their hazard step unless the per-trait phenotype_params{N}["standardize_hazard"] overrides it.

TYPE: StandardizeMode | bool

phenotype_model1

trait-1 model family (frailty, cure_frailty, adult, first_passage).

TYPE: str

phenotype_model2

trait-2 model family (same options).

TYPE: str

beta1

trait-1 liability → log-hazard slope.

TYPE: float

beta_sex1

trait-1 sex → log-hazard slope.

TYPE: float

phenotype_params1

trait-1 model-specific parameter dict (e.g. {"distribution": "weibull", "scale": ..., "rho": ...}; optionally "standardize_hazard": "...").

TYPE: dict

beta2

trait-2 liability → log-hazard slope.

TYPE: float

beta_sex2

trait-2 sex → log-hazard slope.

TYPE: float

phenotype_params2

trait-2 model-specific parameter dict.

TYPE: dict

RETURNS DESCRIPTION
DataFrame

Phenotype DataFrame with columns t1, t2 (raw event times)

DataFrame

plus the preserved pedigree columns.

Source code in simace/phenotyping/phenotype.py
@stage(reads=PEDIGREE, writes=PHENOTYPE)
def run_phenotype(
    pedigree: pd.DataFrame,
    *,
    G_pheno: int,
    seed: int,
    standardize: StandardizeMode | bool,
    phenotype_model1: str,
    phenotype_model2: str,
    beta1: float,
    beta_sex1: float,
    phenotype_params1: dict,
    beta2: float,
    beta_sex2: float,
    phenotype_params2: dict,
) -> pd.DataFrame:
    """Simulate phenotype event times for two correlated traits.

    Per-trait prevalence (for adult / cure_frailty) lives inside
    ``phenotype_params{N}``; frailty / first_passage do not carry one.

    Args:
        pedigree: DataFrame with ``liability1``, ``liability2``, ``generation``,
            ``sex``, plus the genealogy columns preserved on output.
        G_pheno: number of trailing generations to phenotype.
        seed: RNG seed (trait 2 uses ``seed + 100``).
        standardize: global liability-standardization mode applied to
            threshold-style consumers (``threshold``, ``adult.ltm``,
            ``cure_frailty``'s threshold step). One of ``"none" | "global" |
            "per_generation"``; legacy bools accepted (``True`` → ``"global"``,
            ``False`` → ``"none"``). Hazard-bearing models (``frailty``,
            ``cure_frailty``, ``first_passage``, ``adult.cox``) inherit this
            for their hazard step unless the per-trait
            ``phenotype_params{N}["standardize_hazard"]`` overrides it.
        phenotype_model1: trait-1 model family (``frailty``, ``cure_frailty``,
            ``adult``, ``first_passage``).
        phenotype_model2: trait-2 model family (same options).
        beta1: trait-1 liability → log-hazard slope.
        beta_sex1: trait-1 sex → log-hazard slope.
        phenotype_params1: trait-1 model-specific parameter dict (e.g.
            ``{"distribution": "weibull", "scale": ..., "rho": ...}``;
            optionally ``"standardize_hazard": "..."``).
        beta2: trait-2 liability → log-hazard slope.
        beta_sex2: trait-2 sex → log-hazard slope.
        phenotype_params2: trait-2 model-specific parameter dict.

    Returns:
        Phenotype DataFrame with columns ``t1``, ``t2`` (raw event times)
        plus the preserved pedigree columns.
    """
    logger.info("Running phenotype simulation for %d individuals", len(pedigree))
    t0 = time.perf_counter()

    max_gen = pedigree["generation"].max()
    min_gen = max_gen - G_pheno + 1
    if min_gen < 0:
        raise ValueError(f"G_pheno ({G_pheno}) exceeds available generations ({max_gen + 1})")
    pedigree = pedigree[pedigree["generation"] >= min_gen].reset_index(drop=True)

    sex = pedigree["sex"].to_numpy() if "sex" in pedigree.columns else None
    generation = pedigree["generation"].to_numpy()
    t1 = _simulate_one_trait(
        pedigree,
        trait_num=1,
        model_name=phenotype_model1,
        phenotype_params=phenotype_params1,
        beta=beta1,
        beta_sex=beta_sex1,
        seed=seed,
        standardize=standardize,
        sex=sex,
        generation=generation,
    )
    t2 = _simulate_one_trait(
        pedigree,
        trait_num=2,
        model_name=phenotype_model2,
        phenotype_params=phenotype_params2,
        beta=beta2,
        beta_sex=beta_sex2,
        seed=seed + 100,
        standardize=standardize,
        sex=sex,
        generation=generation,
    )

    phenotype = pedigree.assign(t1=t1, t2=t2)

    logger.info(
        "Phenotype simulation complete in %.1fs: %d individuals",
        time.perf_counter() - t0,
        len(phenotype),
    )
    return phenotype

cli

cli()

Command-line entry point for phenotype simulation.

Eager-registration scheme: every model's flag set is registered up front so --help shows them all in clearly-labeled per-family argument groups. Each model's from_cli rejects flags belonging to a different family when invoked alongside that model's selection.

Source code in simace/phenotyping/phenotype.py
def cli() -> None:
    """Command-line entry point for phenotype simulation.

    Eager-registration scheme: every model's flag set is registered up
    front so ``--help`` shows them all in clearly-labeled per-family
    argument groups. Each model's ``from_cli`` rejects flags belonging to
    a different family when invoked alongside that model's selection.
    """
    from simace.core.cli_base import add_logging_args, init_logging

    parser = argparse.ArgumentParser(
        description="Simulate phenotype event times for two correlated traits",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    add_logging_args(parser)
    parser.add_argument("--pedigree", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--G-pheno", type=int, default=3)
    parser.add_argument(
        "--standardize",
        choices=list(STANDARDIZE_CHOICES),
        default="global",
        help="Liability standardization mode (default: global)",
    )

    for trait in (1, 2):
        shared = parser.add_argument_group(f"Trait {trait} — shared")
        shared.add_argument(
            f"--phenotype-model{trait}",
            choices=sorted(MODELS),
            required=True,
            help=f"Phenotype model family for trait {trait}",
        )
        shared.add_argument(f"--beta{trait}", type=float, default=1.0)
        shared.add_argument(f"--beta-sex{trait}", type=float, default=0.0)
        for model_cls in MODELS.values():
            model_cls.add_cli_args(parser, trait)

    args = parser.parse_args()
    init_logging(args)

    kwargs: dict[str, Any] = {
        "G_pheno": args.G_pheno,
        "seed": args.seed,
        "standardize": args.standardize,
    }
    for trait in (1, 2):
        model_name = getattr(args, f"phenotype_model{trait}")
        model_cls = MODELS[model_name]
        instance = model_cls.from_cli(args, trait)
        kwargs[f"phenotype_model{trait}"] = model_name
        kwargs[f"phenotype_params{trait}"] = instance.to_params_dict()
        kwargs[f"beta{trait}"] = getattr(args, f"beta{trait}")
        kwargs[f"beta_sex{trait}"] = getattr(args, f"beta_sex{trait}")

    pedigree = pd.read_parquet(args.pedigree)
    phenotype = run_phenotype(pedigree, **kwargs)
    save_parquet(phenotype, args.output)

threshold

simace.phenotyping.threshold

Liability threshold phenotype model for two correlated traits.

Converts liability to binary affected status using a probit threshold derived from prevalence: threshold = ndtri(1 - K). Standardization is controlled by the global standardize flag with three modes:

  • "global" (default) — z-score across the whole cohort once, then compare to ndtri(1 - K). Preserves prevalence at K only when the cohort liability is already centered with unit variance.
  • "per_generation" — z-score within each generation independently before comparison; preserves K per generation regardless of how liability variance drifts across generations.
  • "none" — compare raw liability to the N(0,1)-scale threshold; realised prevalence drifts with the liability variance.

For sex-specific prevalence dicts ({"female": K_f, "male": K_m}), liability is standardized once across both sexes (per the selected mode); sex-shifted liability means therefore translate into sex-specific realised prevalences within each generation.

No time-to-event or censoring -- purely binary outcome.

apply_threshold

apply_threshold(liability, generation, prevalence, standardize='global')

Apply liability threshold model per generation.

The threshold is ndtri(1 - K) where K is the prevalence. Liability is standardized once according to standardize ("none", "global", or "per_generation"); the per-generation comparison then uses the already-standardized values. Bools accepted for back-compat (True"global", False"none").

PARAMETER DESCRIPTION
liability

array of liability values

TYPE: ndarray

generation

array of generation labels (same length as liability)

TYPE: ndarray

prevalence

fraction affected per generation — either a single float applied to all generations, or a dict mapping generation number to prevalence (e.g. {0: 0.05, 1: 0.08, 2: 0.10})

TYPE: float | dict[int, float]

standardize

standardization mode for liability before thresholding

TYPE: StandardizeMode | bool DEFAULT: 'global'

RETURNS DESCRIPTION
affected

boolean array (True = affected)

TYPE: ndarray

RAISES DESCRIPTION
ValueError

if any prevalence value is not in (0, 1), or if a dict is provided but is missing entries for observed generations

Source code in simace/phenotyping/threshold.py
def apply_threshold(
    liability: np.ndarray,
    generation: np.ndarray,
    prevalence: float | dict[int, float],
    standardize: StandardizeMode | bool = "global",
) -> np.ndarray:
    """Apply liability threshold model per generation.

    The threshold is ``ndtri(1 - K)`` where *K* is the prevalence.  Liability
    is standardized once according to *standardize* (``"none"``, ``"global"``,
    or ``"per_generation"``); the per-generation comparison then uses the
    already-standardized values.  Bools accepted for back-compat
    (``True`` → ``"global"``, ``False`` → ``"none"``).

    Args:
        liability: array of liability values
        generation: array of generation labels (same length as liability)
        prevalence: fraction affected per generation — either a single float
            applied to all generations, or a dict mapping generation number
            to prevalence (e.g. {0: 0.05, 1: 0.08, 2: 0.10})
        standardize: standardization mode for liability before thresholding

    Returns:
        affected: boolean array (True = affected)

    Raises:
        ValueError: if any prevalence value is not in (0, 1), or if a dict
            is provided but is missing entries for observed generations
    """
    mode = coerce_standardize_mode(standardize)
    L_all = standardize_liability(liability, mode, generation)
    return _apply_thresholds_to_standardized(L_all, generation, sex=None, prev_spec=prevalence)

run_threshold

run_threshold(pedigree, *, phenotype_params1, phenotype_params2, G_pheno, standardize='global')

Orchestrate threshold phenotype from pedigree and per-trait params.

Prevalence is extracted from the per-trait phenotype_params{N} dicts (the canonical home for adult / cure_frailty model prevalence after PR3). Traits whose primary model doesn't carry one (frailty / first_passage) fall back to a documented default so the model-agnostic threshold path still has a prevalence to use.

PARAMETER DESCRIPTION
pedigree

DataFrame with pedigree data.

TYPE: DataFrame

phenotype_params1

trait-1 model-specific param dict. Its "prevalence" value (scalar, per-generation dict, or sex-specific dict; see :func:apply_threshold) is used as the threshold target.

TYPE: dict | None

phenotype_params2

trait-2 model-specific param dict (same shape options as phenotype_params1).

TYPE: dict | None

G_pheno

number of trailing generations to phenotype.

TYPE: int

standardize

standardization mode for liability before thresholding; one of "none", "global", "per_generation". Bools accepted for back-compat (True"global", False"none").

TYPE: StandardizeMode | bool DEFAULT: 'global'

RETURNS DESCRIPTION
DataFrame

phenotype DataFrame

Source code in simace/phenotyping/threshold.py
@stage(reads=PEDIGREE, writes=PHENOTYPE)
def run_threshold(
    pedigree: pd.DataFrame,
    *,
    phenotype_params1: dict | None,
    phenotype_params2: dict | None,
    G_pheno: int,
    standardize: StandardizeMode | bool = "global",
) -> pd.DataFrame:
    """Orchestrate threshold phenotype from pedigree and per-trait params.

    Prevalence is extracted from the per-trait ``phenotype_params{N}`` dicts
    (the canonical home for adult / cure_frailty model prevalence after PR3).
    Traits whose primary model doesn't carry one (frailty / first_passage)
    fall back to a documented default so the model-agnostic threshold path
    still has a prevalence to use.

    Args:
        pedigree: DataFrame with pedigree data.
        phenotype_params1: trait-1 model-specific param dict.  Its
            ``"prevalence"`` value (scalar, per-generation dict, or
            sex-specific dict; see :func:`apply_threshold`) is used as the
            threshold target.
        phenotype_params2: trait-2 model-specific param dict (same shape
            options as ``phenotype_params1``).
        G_pheno: number of trailing generations to phenotype.
        standardize: standardization mode for liability before thresholding;
            one of ``"none"``, ``"global"``, ``"per_generation"``.  Bools
            accepted for back-compat (``True`` → ``"global"``,
            ``False`` → ``"none"``).

    Returns:
        phenotype DataFrame
    """
    pp1 = dict(phenotype_params1 or {})
    pp2 = dict(phenotype_params2 or {})
    prevalence1 = pp1.get("prevalence", _DEFAULT_THRESHOLD_PREVALENCE[0])
    prevalence2 = pp2.get("prevalence", _DEFAULT_THRESHOLD_PREVALENCE[1])

    if isinstance(prevalence1, dict) or isinstance(prevalence2, dict):
        logger.info("Applying threshold model: prevalence1=%s, prevalence2=%s", prevalence1, prevalence2)
    else:
        logger.info("Applying threshold model: prevalence1=%.3f, prevalence2=%.3f", prevalence1, prevalence2)
    t0 = time.perf_counter()
    # Filter to last G_pheno generations
    max_gen = pedigree["generation"].max()
    min_pheno_gen = max_gen - G_pheno + 1
    assert min_pheno_gen >= 0, f"G_pheno ({G_pheno}) > available generations ({max_gen + 1})"
    pedigree = pedigree[pedigree["generation"] >= min_pheno_gen].reset_index(drop=True)

    generation = pedigree["generation"].values
    sex = pedigree["sex"].values

    affected1 = _apply_threshold_sex_aware(
        pedigree["liability1"].values,
        generation,
        sex,
        prevalence=prevalence1,
        standardize=standardize,
    )
    affected2 = _apply_threshold_sex_aware(
        pedigree["liability2"].values,
        generation,
        sex,
        prevalence=prevalence2,
        standardize=standardize,
    )

    nan_t = np.full(len(pedigree), np.nan, dtype=np.float64)
    phenotype = pedigree.assign(affected1=affected1, affected2=affected2, t1=nan_t, t2=nan_t)

    elapsed = time.perf_counter() - t0
    logger.info("Threshold model complete in %.1fs: %d individuals", elapsed, len(phenotype))

    return phenotype

cli

cli()

Command-line interface for threshold phenotype simulation.

Source code in simace/phenotyping/threshold.py
def cli() -> None:
    """Command-line interface for threshold phenotype simulation."""
    from simace.core.cli_base import add_logging_args, init_logging

    parser = argparse.ArgumentParser(description="Apply liability threshold model")
    add_logging_args(parser)
    parser.add_argument("--pedigree", required=True, help="Input pedigree parquet")
    parser.add_argument("--output", required=True, help="Output phenotype parquet")
    parser.add_argument("--G-pheno", type=int, default=3, help="Number of generations to assign phenotypes")
    parser.add_argument("--prevalence1", type=float, default=0.1, help="Disease prevalence for trait 1")
    parser.add_argument("--prevalence2", type=float, default=0.1, help="Disease prevalence for trait 2")
    parser.add_argument(
        "--prevalence1-by-gen",
        type=str,
        default=None,
        help='Per-gen prevalence for trait 1 as JSON, e.g. \'{"0":0.05,"1":0.10}\'',
    )
    parser.add_argument(
        "--prevalence2-by-gen",
        type=str,
        default=None,
        help='Per-gen prevalence for trait 2 as JSON, e.g. \'{"0":0.05,"1":0.10}\'',
    )
    parser.add_argument(
        "--standardize",
        choices=list(STANDARDIZE_CHOICES),
        default="global",
        help="Liability standardization mode (default: global)",
    )
    args = parser.parse_args()

    init_logging(args)

    pedigree = pd.read_parquet(args.pedigree)
    phenotype = run_threshold(
        pedigree,
        G_pheno=args.G_pheno,
        phenotype_params1={"prevalence": _parse_prevalence_arg(args.prevalence1, args.prevalence1_by_gen)},
        phenotype_params2={"prevalence": _parse_prevalence_arg(args.prevalence2, args.prevalence2_by_gen)},
        standardize=args.standardize,
    )
    save_parquet(phenotype, args.output)

hazards

simace.phenotyping.hazards

Baseline hazard registry for parametric survival models.

Each baseline hazard supplies a vectorized inverter that converts -log(U) draws (Exp(1)) and a liability array into event times under the proportional-hazards frailty model:

L            = additive genetic + shared + unique liability
z            = exp(scaled_beta * (L - mean))
S(t | z)     = exp(-H0(t) * z)
t            = H0^{-1}(-log(U) / z)

compute_event_times is the public dispatch over the registry. The numba kernels and per-distribution wrappers stay module-private; consumers should go through compute_event_times (or, for tooling that needs the dispatch table itself, BASELINE_HAZARDS).

Supported distributions and required params keys: "weibull" : {"scale": s, "rho": rho} "exponential" : {"rate": lam} | {"scale": s} "gompertz" : {"rate": b, "gamma": g} "lognormal" : {"mu": mu, "sigma": sigma} "loglogistic" : {"scale": alpha, "shape": k} "gamma" : {"shape": k, "scale": theta}

validate_hazard_params

validate_hazard_params(distribution, hazard_params, model_name)

Validate distribution name and required hazard_params keys.

Exponential accepts either rate or scale as the canonical key; others must contain every key listed in BASELINE_PARAMS[distribution].

Source code in simace/phenotyping/hazards.py
def validate_hazard_params(
    distribution: str,
    hazard_params: dict[str, float],
    model_name: str,
) -> None:
    """Validate distribution name and required ``hazard_params`` keys.

    Exponential accepts either ``rate`` or ``scale`` as the canonical key;
    others must contain every key listed in ``BASELINE_PARAMS[distribution]``.
    """
    if distribution not in BASELINE_HAZARDS:
        raise ValueError(f"unknown {model_name} distribution {distribution!r}; valid: {sorted(BASELINE_HAZARDS)}")
    required = set(BASELINE_PARAMS[distribution])
    if distribution == "exponential" and "scale" in hazard_params:
        required = (required - {"rate"}) | {"scale"}
    missing = required - set(hazard_params)
    if missing:
        raise ValueError(
            f"{model_name} distribution {distribution!r} missing required hazard params: {sorted(missing)}"
        )

add_hazard_cli_args

add_hazard_cli_args(parser, trait, *, name)

Register --{name}-distribution{trait} + one flag per HAZARD_FLAG_ROOTS.

name is the kebab-form model name (e.g. "frailty", "cure-frailty"). Returns the argument group so callers can attach model-specific flags (e.g. cure_frailty's --cure-frailty-prevalence{trait}).

Source code in simace/phenotyping/hazards.py
def add_hazard_cli_args(
    parser: argparse.ArgumentParser,
    trait: int,
    *,
    name: str,
) -> argparse._ArgumentGroup:
    """Register ``--{name}-distribution{trait}`` + one flag per HAZARD_FLAG_ROOTS.

    ``name`` is the kebab-form model name (e.g. ``"frailty"``, ``"cure-frailty"``).
    Returns the argument group so callers can attach model-specific flags
    (e.g. cure_frailty's ``--cure-frailty-prevalence{trait}``).
    """
    attr_name = name.replace("-", "_")
    group = parser.add_argument_group(f"Trait {trait}{attr_name}")
    group.add_argument(
        f"--{name}-distribution{trait}",
        default=None,
        choices=sorted(BASELINE_HAZARDS),
        help=f"Baseline hazard for trait {trait} when phenotype-model{trait}={attr_name}",
    )
    for flag_root in HAZARD_FLAG_ROOTS:
        group.add_argument(f"--{name}-{flag_root}{trait}", type=float, default=None)
    return group

parse_hazard_cli

parse_hazard_cli(args, trait, *, name)

Pull hazard-distribution + per-param values from argparse Namespace.

name is the kebab-form model name; the snake form is derived for attribute lookup. Returns (distribution, hazard_params). Raises if the distribution flag is missing or any required per-distribution param flag is unset.

Source code in simace/phenotyping/hazards.py
def parse_hazard_cli(
    args: argparse.Namespace,
    trait: int,
    *,
    name: str,
) -> tuple[str, dict[str, float]]:
    """Pull hazard-distribution + per-param values from argparse ``Namespace``.

    ``name`` is the kebab-form model name; the snake form is derived for attribute lookup.
    Returns ``(distribution, hazard_params)``.  Raises if the distribution flag is missing
    or any required per-distribution param flag is unset.
    """
    attr_name = name.replace("-", "_")
    distribution = getattr(args, f"{attr_name}_distribution{trait}")
    if distribution is None:
        raise ValueError(f"--{name}-distribution{trait} is required when --phenotype-model{trait}={attr_name}")
    hazard_params: dict[str, float] = {}
    for key in BASELINE_PARAMS[distribution]:
        val = getattr(args, f"{attr_name}_{key}{trait}", None)
        if val is None:
            raise ValueError(f"--{name}-{key}{trait} is required for --{name}-distribution{trait}={distribution}")
        hazard_params[key] = val
    return distribution, hazard_params

hazard_cli_flag_attrs

hazard_cli_flag_attrs(trait, *, name)

Return the argparse attrs registered by add_hazard_cli_args.

Source code in simace/phenotyping/hazards.py
def hazard_cli_flag_attrs(trait: int, *, name: str) -> set[str]:
    """Return the argparse attrs registered by ``add_hazard_cli_args``."""
    attr_name = name.replace("-", "_")
    attrs = {f"{attr_name}_distribution{trait}"}
    attrs.update(f"{attr_name}_{root}{trait}" for root in HAZARD_FLAG_ROOTS)
    return attrs

add_standardize_hazard_cli_arg

add_standardize_hazard_cli_arg(parser_or_group, trait, *, name)

Register --{name}-standardize-hazard{trait} flag.

The flag is namespaced per model so foreign-flag detection (:func:check_no_foreign_flags) cleanly distinguishes overrides intended for one model from those intended for another.

Source code in simace/phenotyping/hazards.py
def add_standardize_hazard_cli_arg(
    parser_or_group: argparse.ArgumentParser | argparse._ArgumentGroup,
    trait: int,
    *,
    name: str,
) -> None:
    """Register ``--{name}-standardize-hazard{trait}`` flag.

    The flag is namespaced per model so foreign-flag detection
    (:func:`check_no_foreign_flags`) cleanly distinguishes overrides
    intended for one model from those intended for another.
    """
    parser_or_group.add_argument(
        f"--{name}-standardize-hazard{trait}",
        default=None,
        choices=list(STANDARDIZE_CHOICES),
        help=(
            f"Per-trait override of the hazard-step standardization mode for trait {trait} "
            f"when phenotype-model{trait}={name.replace('-', '_')} (default: inherit from --standardize)"
        ),
    )

standardize_hazard_cli_attr

standardize_hazard_cli_attr(trait, *, name)

Return the argparse attribute name for the standardize_hazard CLI flag.

Source code in simace/phenotyping/hazards.py
def standardize_hazard_cli_attr(trait: int, *, name: str) -> str:
    """Return the argparse attribute name for the standardize_hazard CLI flag."""
    attr_name = name.replace("-", "_")
    return f"{attr_name}_standardize_hazard{trait}"

coerce_standardize_mode

coerce_standardize_mode(value)

Resolve a user-supplied standardize value to one of the canonical modes.

Accepts the legacy bool form (True"global", False"none") or one of the three string modes. Raises ValueError otherwise.

Source code in simace/phenotyping/hazards.py
def coerce_standardize_mode(value: object) -> StandardizeMode:
    """Resolve a user-supplied standardize value to one of the canonical modes.

    Accepts the legacy bool form (``True`` → ``"global"``, ``False`` → ``"none"``)
    or one of the three string modes. Raises ``ValueError`` otherwise.
    """
    if isinstance(value, bool):
        return "global" if value else "none"
    if isinstance(value, str) and value in _VALID_STD_MODES:
        return value  # type: ignore[return-value]
    raise ValueError(f"standardize must be one of {sorted(_VALID_STD_MODES)} or bool; got {value!r}")

resolve_hazard_mode

resolve_hazard_mode(standardize, standardize_hazard)

Pick the mode used for hazard-step beta/L scaling.

standardize_hazard is the per-trait override stored inside phenotype_params{N}. None means "inherit from the global standardize flag". Bools accepted on either argument for the same legacy reason as coerce_standardize_mode.

Source code in simace/phenotyping/hazards.py
def resolve_hazard_mode(
    standardize: StandardizeMode | bool,
    standardize_hazard: StandardizeMode | bool | None,
) -> StandardizeMode:
    """Pick the mode used for hazard-step beta/L scaling.

    ``standardize_hazard`` is the per-trait override stored inside
    ``phenotype_params{N}``. ``None`` means "inherit from the global
    ``standardize`` flag". Bools accepted on either argument for the same
    legacy reason as ``coerce_standardize_mode``.
    """
    if standardize_hazard is None:
        return coerce_standardize_mode(standardize)
    return coerce_standardize_mode(standardize_hazard)

standardize_liability

standardize_liability(liability, mode, generation=None)

Return liability transformed per mode.

Modes
  • "none" — returned unchanged.
  • "global"(L - L.mean()) / L.std() across all rows.
  • "per_generation" — same z-score applied within each unique generation value; generation must be supplied.

A degenerate group (singleton or all-equal liability) gets L - mean rather than NaN; the caller's downstream comparison against an N(0,1) threshold then reduces to the centered raw value.

Source code in simace/phenotyping/hazards.py
def standardize_liability(
    liability: np.ndarray,
    mode: StandardizeMode | bool,
    generation: np.ndarray | None = None,
) -> np.ndarray:
    """Return liability transformed per ``mode``.

    Modes:
      * ``"none"`` — returned unchanged.
      * ``"global"`` — ``(L - L.mean()) / L.std()`` across all rows.
      * ``"per_generation"`` — same z-score applied within each unique
        ``generation`` value; ``generation`` must be supplied.

    A degenerate group (singleton or all-equal liability) gets
    ``L - mean`` rather than NaN; the caller's downstream comparison
    against an N(0,1) threshold then reduces to the centered raw value.
    """
    mode = coerce_standardize_mode(mode)
    if mode == "none":
        return liability
    if mode == "global":
        std = float(liability.std())
        mean = float(liability.mean())
        if std < 1e-12:
            return liability - mean
        return (liability - mean) / std
    if generation is None:
        raise ValueError("standardize_liability: generation is required for mode='per_generation'")
    out = np.asarray(liability, dtype=np.float64).copy()
    for g in np.unique(generation):
        mask = generation == g
        sub = liability[mask]
        m = float(sub.mean())
        s = float(sub.std())
        if s < 1e-12:
            out[mask] = sub - m
        else:
            out[mask] = (sub - m) / s
    return out

standardize_beta

standardize_beta(liability, beta, mode, generation=None)

Return per-individual (mean, scaled_beta) arrays of length n.

Modes
  • "none"mean[i] = 0, scaled_beta[i] = beta everywhere.
  • "global"mean[i] = L.mean(), scaled_beta[i] = beta/L.std() broadcast to all rows (scaled_beta = 0 when std is degenerate).
  • "per_generation" — per individual filled from their generation's mean and beta / std_g; generation must be supplied.

Returning arrays (not scalars) lets callers slice mean[mask][0] / scaled_beta[mask][0] inside an iter_generation_groups loop and pass the per-group scalars to compute_event_times without branching.

Source code in simace/phenotyping/hazards.py
def standardize_beta(
    liability: np.ndarray,
    beta: float,
    mode: StandardizeMode | bool,
    generation: np.ndarray | None = None,
) -> tuple[np.ndarray, np.ndarray]:
    """Return per-individual ``(mean, scaled_beta)`` arrays of length ``n``.

    Modes:
      * ``"none"`` — ``mean[i] = 0``, ``scaled_beta[i] = beta`` everywhere.
      * ``"global"`` — ``mean[i] = L.mean()``, ``scaled_beta[i] = beta/L.std()``
        broadcast to all rows (``scaled_beta = 0`` when std is degenerate).
      * ``"per_generation"`` — per individual filled from their generation's
        mean and ``beta / std_g``; ``generation`` must be supplied.

    Returning arrays (not scalars) lets callers slice ``mean[mask][0]`` /
    ``scaled_beta[mask][0]`` inside an ``iter_generation_groups`` loop and
    pass the per-group scalars to ``compute_event_times`` without branching.
    """
    mode = coerce_standardize_mode(mode)
    n = len(liability)
    if mode == "none":
        return np.zeros(n, dtype=np.float64), np.full(n, beta, dtype=np.float64)
    if mode == "global":
        std = float(liability.std())
        mean = float(liability.mean())
        scaled = 0.0 if std < 1e-12 else beta / std
        return np.full(n, mean, dtype=np.float64), np.full(n, scaled, dtype=np.float64)
    if generation is None:
        raise ValueError("standardize_beta: generation is required for mode='per_generation'")
    mean_arr = np.zeros(n, dtype=np.float64)
    beta_arr = np.zeros(n, dtype=np.float64)
    for g in np.unique(generation):
        mask = generation == g
        sub = liability[mask]
        m = float(sub.mean())
        s = float(sub.std())
        mean_arr[mask] = m
        beta_arr[mask] = 0.0 if s < 1e-12 else beta / s
    return mean_arr, beta_arr

iter_generation_groups

iter_generation_groups(mode, generation)

Yield boolean masks for the per-group loop pattern in model simulate().

Under "none" or "global" yields a single full-coverage mask, so the caller's loop runs once over all rows. Under "per_generation" yields one mask per unique value of generation.

Source code in simace/phenotyping/hazards.py
def iter_generation_groups(
    mode: StandardizeMode | bool,
    generation: np.ndarray,
) -> Iterator[np.ndarray]:
    """Yield boolean masks for the per-group loop pattern in model ``simulate()``.

    Under ``"none"`` or ``"global"`` yields a single full-coverage mask, so
    the caller's loop runs once over all rows. Under ``"per_generation"``
    yields one mask per unique value of ``generation``.
    """
    mode = coerce_standardize_mode(mode)
    if mode != "per_generation":
        yield np.ones(len(generation), dtype=bool)
        return
    for g in np.unique(generation):
        yield generation == g

compute_event_times

compute_event_times(neg_log_u, liability, mean, scaled_beta, distribution, params)

Convert -log(U) draws to event times under the named baseline hazard.

PARAMETER DESCRIPTION
neg_log_u

-log(U) draws, U ~ Uniform(0, 1], shape (n,).

TYPE: ndarray

liability

quantitative liability, shape (n,).

TYPE: ndarray

mean

liability mean (used when standardize=True; pass 0.0 otherwise).

TYPE: float

scaled_beta

liability coefficient on log-hazard (already divided by std if standardize=True).

TYPE: float

distribution

baseline hazard name; one of BASELINE_HAZARDS keys.

TYPE: str

params

distribution-specific parameter dict; see BASELINE_PARAMS.

TYPE: dict[str, float]

RETURNS DESCRIPTION
ndarray

Event-time array, shape (n,), clamped to [1e-10, 1e6].

RAISES DESCRIPTION
ValueError

unknown distribution name.

KeyError

missing required parameter for the selected distribution.

Source code in simace/phenotyping/hazards.py
def compute_event_times(
    neg_log_u: np.ndarray,
    liability: np.ndarray,
    mean: float,
    scaled_beta: float,
    distribution: str,
    params: dict[str, float],
) -> np.ndarray:
    """Convert -log(U) draws to event times under the named baseline hazard.

    Args:
        neg_log_u:    -log(U) draws, U ~ Uniform(0, 1], shape (n,).
        liability:    quantitative liability, shape (n,).
        mean:         liability mean (used when standardize=True; pass 0.0 otherwise).
        scaled_beta:  liability coefficient on log-hazard (already divided by std
                      if standardize=True).
        distribution: baseline hazard name; one of ``BASELINE_HAZARDS`` keys.
        params:       distribution-specific parameter dict; see ``BASELINE_PARAMS``.

    Returns:
        Event-time array, shape (n,), clamped to ``[1e-10, 1e6]``.

    Raises:
        ValueError: unknown ``distribution`` name.
        KeyError:   missing required parameter for the selected distribution.
    """
    if distribution not in BASELINE_HAZARDS:
        raise ValueError(f"Unknown baseline hazard {distribution!r}; valid: {sorted(BASELINE_HAZARDS)}")
    return BASELINE_HAZARDS[distribution](neg_log_u, liability, mean, scaled_beta, params)

models

simace.phenotyping.models

Phenotype model registry.

To add a new phenotype model:

  1. Write a new module under simace/phenotyping/models/ defining a class that subclasses PhenotypeModel (see _base.py).
  2. Implement the abstract methods (from_config, add_cli_args, from_cli, cli_flag_attrs, to_params_dict, simulate).
  3. Import the class here and add it to the MODELS dict below.

That's the entire surface — there is no auto-discovery and no decorator. The hand-authored dict is the single source of truth for which model names the dispatcher accepts.

PhenotypeModel

Bases: ABC

Abstract base class for phenotype model families.

Subclasses are frozen dataclasses; their fields are the model's typed parameters. Validation happens in __post_init__.

from_config abstractmethod classmethod

from_config(params, trait_num)

Build an instance from the simulation parameter dict for trait trait_num.

Source code in simace/phenotyping/models/_base.py
@classmethod
@abstractmethod
def from_config(cls, params: dict[str, Any], trait_num: int) -> Self:
    """Build an instance from the simulation parameter dict for trait ``trait_num``."""

add_cli_args abstractmethod classmethod

add_cli_args(parser, trait)

Register this model's CLI flags on parser for the given trait.

Source code in simace/phenotyping/models/_base.py
@classmethod
@abstractmethod
def add_cli_args(cls, parser: argparse.ArgumentParser, trait: int) -> None:
    """Register this model's CLI flags on ``parser`` for the given trait."""

from_cli abstractmethod classmethod

from_cli(args, trait)

Build an instance from parsed CLI args for the given trait.

Must raise ValueError (with trait context) if flags belonging to another family are populated alongside this model's selection, or if a required flag for this model is missing.

Source code in simace/phenotyping/models/_base.py
@classmethod
@abstractmethod
def from_cli(cls, args: argparse.Namespace, trait: int) -> Self:
    """Build an instance from parsed CLI args for the given trait.

    Must raise ``ValueError`` (with trait context) if flags belonging
    to another family are populated alongside this model's selection,
    or if a required flag for this model is missing.
    """

cli_flag_attrs abstractmethod classmethod

cli_flag_attrs(trait)

Return the argparse attribute names this model owns for the given trait.

Used by sibling models' from_cli to detect foreign-flag conflicts.

Source code in simace/phenotyping/models/_base.py
@classmethod
@abstractmethod
def cli_flag_attrs(cls, trait: int) -> set[str]:
    """Return the argparse attribute names this model owns for the given trait.

    Used by sibling models' ``from_cli`` to detect foreign-flag conflicts.
    """

to_params_dict abstractmethod

to_params_dict()

Round-trip back to the per-trait phenotype_params{N} dict shape.

Source code in simace/phenotyping/models/_base.py
@abstractmethod
def to_params_dict(self) -> dict[str, Any]:
    """Round-trip back to the per-trait ``phenotype_params{N}`` dict shape."""

simulate abstractmethod

simulate(liability, *, seed, standardize, sex, generation)

Simulate event times for one trait.

standardize is the global liability-standardization mode ("none" | "global" | "per_generation"); legacy True/False bools are accepted via the same shim used at config-load.

Concrete subclasses route standardize differently:

  • Threshold-style models (threshold, adult.ltm) consume it directly to standardize liability before the threshold comparison.
  • Hazard-bearing models (frailty, cure_frailty, first_passage, adult.cox) carry an additional per-trait field standardize_hazard (defaulting to None → inherit) and resolve it via :func:simace.phenotyping.hazards.resolve_hazard_mode for the hazard step. cure_frailty is the only model that honors both knobs (threshold step + hazard step).
Source code in simace/phenotyping/models/_base.py
@abstractmethod
def simulate(
    self,
    liability: np.ndarray,
    *,
    seed: int,
    standardize: StandardizeMode | bool,
    sex: np.ndarray | None,
    generation: np.ndarray,
) -> np.ndarray:
    """Simulate event times for one trait.

    ``standardize`` is the global liability-standardization mode
    (``"none" | "global" | "per_generation"``); legacy ``True``/``False``
    bools are accepted via the same shim used at config-load.

    Concrete subclasses route ``standardize`` differently:

    * Threshold-style models (``threshold``, ``adult.ltm``) consume it
      directly to standardize liability before the threshold comparison.
    * Hazard-bearing models (``frailty``, ``cure_frailty``,
      ``first_passage``, ``adult.cox``) carry an additional per-trait
      field ``standardize_hazard`` (defaulting to ``None`` → inherit) and
      resolve it via :func:`simace.phenotyping.hazards.resolve_hazard_mode`
      for the hazard step. ``cure_frailty`` is the only model that
      honors both knobs (threshold step + hazard step).
    """

AdultModel dataclass

AdultModel(method, prevalence, cip_x0=50.0, cip_k=0.2, beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

ADuLT phenotype model.

PARAMETER DESCRIPTION
method

"ltm" or "cox".

TYPE: str

prevalence

case fraction K (scalar, per-generation, or sex-specific).

TYPE: Any

cip_x0

logistic CIP midpoint age (default 50.0).

TYPE: float DEFAULT: 50.0

cip_k

logistic CIP growth rate (default 0.2).

TYPE: float DEFAULT: 0.2

beta

liability scaling factor on the probit (ltm) or log-hazard (cox) scale (1.0 = no scaling).

TYPE: float DEFAULT: 1.0

beta_sex

sex coefficient (0.0 = no effect).

TYPE: float DEFAULT: 0.0

standardize_hazard

per-trait override for the hazard-step L standardization mode under method="cox". None inherits from the global standardize flag. Must be None when method="ltm" (threshold-style models honor the global flag directly).

TYPE: StandardizeMode | None DEFAULT: None

CureFrailtyModel dataclass

CureFrailtyModel(distribution, prevalence, hazard_params=dict(), beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

Mixture cure-frailty model.

PARAMETER DESCRIPTION
distribution

baseline hazard for cases (see simace.phenotyping.hazards).

TYPE: str

hazard_params

dict of distribution-specific parameter values.

TYPE: dict[str, float] DEFAULT: dict()

prevalence

case fraction; scalar, per-generation dict, or sex-specific dict.

TYPE: Any

beta

coefficient on liability in the log-hazard among cases.

TYPE: float DEFAULT: 1.0

beta_sex

coefficient on sex in the log-hazard (0 = no effect).

TYPE: float DEFAULT: 0.0

standardize_hazard

per-trait override for the case-onset hazard step. None inherits from the global standardize flag passed to simulate. This is the only model that honors both knobs: standardize controls the threshold step (case status), while standardize_hazard controls the hazard step (case onset). Setting them to different modes (e.g. standardize='per_generation' + standardize_hazard='global') preserves per-gen prevalence while keeping a constant hazard slope across generations.

TYPE: StandardizeMode | None DEFAULT: None

FirstPassageModel dataclass

FirstPassageModel(drift, shape, beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

First-passage-time phenotype model.

PARAMETER DESCRIPTION
drift

drift rate μ; must be non-zero. Negative drift → toward boundary (everyone onsets). Positive drift → emergent cure fraction.

TYPE: float

shape

y0² (initial distance from boundary, squared).

TYPE: float

beta

coefficient on liability for log(y0); positive β → worse.

TYPE: float DEFAULT: 1.0

beta_sex

coefficient on sex for log(y0) (0 = no effect).

TYPE: float DEFAULT: 0.0

FrailtyModel dataclass

FrailtyModel(distribution, hazard_params=dict(), beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

Frailty phenotype model.

PARAMETER DESCRIPTION
distribution

baseline hazard name (see simace.phenotyping.hazards).

TYPE: str

hazard_params

dict of distribution-specific parameter values.

TYPE: dict[str, float] DEFAULT: dict()

beta

coefficient on liability in the log-hazard.

TYPE: float DEFAULT: 1.0

beta_sex

coefficient on sex in the log-hazard (0 = no effect).

TYPE: float DEFAULT: 0.0

standardize_hazard

per-trait override for the hazard-step standardization mode ("none" | "global" | "per_generation"). None inherits from the global standardize flag passed to simulate.

TYPE: StandardizeMode | None DEFAULT: None

models.frailty

simace.phenotyping.models.frailty

Proportional-hazards frailty phenotype model.

Per-individual onset time is drawn from a parametric baseline hazard modulated by an exp(beta * L) frailty multiplier. Liability translates into earlier onset for higher beta * L.

FrailtyModel dataclass

FrailtyModel(distribution, hazard_params=dict(), beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

Frailty phenotype model.

PARAMETER DESCRIPTION
distribution

baseline hazard name (see simace.phenotyping.hazards).

TYPE: str

hazard_params

dict of distribution-specific parameter values.

TYPE: dict[str, float] DEFAULT: dict()

beta

coefficient on liability in the log-hazard.

TYPE: float DEFAULT: 1.0

beta_sex

coefficient on sex in the log-hazard (0 = no effect).

TYPE: float DEFAULT: 0.0

standardize_hazard

per-trait override for the hazard-step standardization mode ("none" | "global" | "per_generation"). None inherits from the global standardize flag passed to simulate.

TYPE: StandardizeMode | None DEFAULT: None

models.cure_frailty

simace.phenotyping.models.cure_frailty

Mixture cure-frailty phenotype model.

Liability above a threshold sets case status (WHO has the disease); a proportional hazards frailty model with a configurable baseline hazard determines onset time among cases (WHEN). Controls are censored at 1e6.

CureFrailtyModel dataclass

CureFrailtyModel(distribution, prevalence, hazard_params=dict(), beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

Mixture cure-frailty model.

PARAMETER DESCRIPTION
distribution

baseline hazard for cases (see simace.phenotyping.hazards).

TYPE: str

hazard_params

dict of distribution-specific parameter values.

TYPE: dict[str, float] DEFAULT: dict()

prevalence

case fraction; scalar, per-generation dict, or sex-specific dict.

TYPE: Any

beta

coefficient on liability in the log-hazard among cases.

TYPE: float DEFAULT: 1.0

beta_sex

coefficient on sex in the log-hazard (0 = no effect).

TYPE: float DEFAULT: 0.0

standardize_hazard

per-trait override for the case-onset hazard step. None inherits from the global standardize flag passed to simulate. This is the only model that honors both knobs: standardize controls the threshold step (case status), while standardize_hazard controls the hazard step (case onset). Setting them to different modes (e.g. standardize='per_generation' + standardize_hazard='global') preserves per-gen prevalence while keeping a constant hazard slope across generations.

TYPE: StandardizeMode | None DEFAULT: None

models.adult

simace.phenotyping.models.adult

ADuLT phenotype model.

Two sub-methods, selected by method:

  • ltm — liability threshold model: case status from raw liability vs. threshold; case onset age via logistic-CIP transform of the case CIR computed from a probit scaling of liability + sex.
  • cox — Weibull(shape=2) proportional hazards: case status by rank/(N+1) capped at K; case onset age via logistic CIP inverse.

Both share the CIP shape parameters cip_x0 and cip_k. Reference: Pedersen et al., Nat Commun 2023 (ADuLT).

Standardization routing is asymmetric across the two sub-methods:

  • method="ltm" is a threshold-on-liability model. It honors the global standardize flag for the liability z-score and rejects standardize_hazard in its params.
  • method="cox" is a hazard-on-liability model. It honors standardize_hazard (defaulting to the global standardize when unset) for the L scaling that feeds np.exp(beta * L).

Toggling method therefore silently changes which knob controls L scaling for that trait. The validation in __post_init__ surfaces this in the error message when the field is set on an LTM-method instance.

AdultModel dataclass

AdultModel(method, prevalence, cip_x0=50.0, cip_k=0.2, beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

ADuLT phenotype model.

PARAMETER DESCRIPTION
method

"ltm" or "cox".

TYPE: str

prevalence

case fraction K (scalar, per-generation, or sex-specific).

TYPE: Any

cip_x0

logistic CIP midpoint age (default 50.0).

TYPE: float DEFAULT: 50.0

cip_k

logistic CIP growth rate (default 0.2).

TYPE: float DEFAULT: 0.2

beta

liability scaling factor on the probit (ltm) or log-hazard (cox) scale (1.0 = no scaling).

TYPE: float DEFAULT: 1.0

beta_sex

sex coefficient (0.0 = no effect).

TYPE: float DEFAULT: 0.0

standardize_hazard

per-trait override for the hazard-step L standardization mode under method="cox". None inherits from the global standardize flag. Must be None when method="ltm" (threshold-style models honor the global flag directly).

TYPE: StandardizeMode | None DEFAULT: None

models.first_passage

simace.phenotyping.models.first_passage

First-passage time phenotype model.

A latent health process Y(t) = y0 + driftt + W(t) starts at y0 = sqrt(shape) * exp(-scaled_beta(L - mean) - beta_sexsex). Disease onset is the first time Y(t) ≤ 0. When drift < 0 the process hits the boundary in finite time (everyone eventually onsets); when drift > 0 an emergent cure fraction P(never hit) = 1 - exp(-2y0*drift) arises.

References

Lee & Whitmore 2006 (threshold regression). Aalen & Gjessing 2001 (first-passage models in survival).

FirstPassageModel dataclass

FirstPassageModel(drift, shape, beta=1.0, beta_sex=0.0, standardize_hazard=None)

Bases: PhenotypeModel

First-passage-time phenotype model.

PARAMETER DESCRIPTION
drift

drift rate μ; must be non-zero. Negative drift → toward boundary (everyone onsets). Positive drift → emergent cure fraction.

TYPE: float

shape

y0² (initial distance from boundary, squared).

TYPE: float

beta

coefficient on liability for log(y0); positive β → worse.

TYPE: float DEFAULT: 1.0

beta_sex

coefficient on sex for log(y0) (0 = no effect).

TYPE: float DEFAULT: 0.0

bimodal_phenotype (prototype)

simace.phenotyping._prototypes.bimodal_phenotype

Bimodal phenotype models (prototype).

Three models that produce bimodal age-of-onset distributions
  1. mixture_cip: Two logistic CIP components, shared β
  2. mixture_cure_frailty: Two cure-frailty hazards, shared β
  3. two_threshold: Two liability thresholds with separate CIP per stratum

Standardization: each function accepts standardize as one of "none", "global", or a legacy bool (True"global", False"none"). "per_generation" is not supported here — these prototypes do not take a generation array.

phenotype_mixture_cip

phenotype_mixture_cip(liability, prevalence, beta=1.0, pi=0.5, cip_x0_1=10.0, cip_k_1=0.3, cip_x0_2=40.0, cip_k_2=0.2, seed=42, standardize='global', sex=None, beta_sex=0.0)

Mixture of two logistic CIP curves with shared liability threshold.

Each case is randomly assigned to component 1 (probability π) or component 2 (probability 1−π). Both components share the same prevalence K and liability scaling (β, β_sex), but have different CIP midpoints (x₀) and steepness (k), producing bimodal onset.

PARAMETER DESCRIPTION
liability

quantitative liability, shape (n,)

TYPE: ndarray

prevalence

population prevalence K; scalar or per-individual array

TYPE: float | ndarray

beta

probit scaling factor for liability

TYPE: float DEFAULT: 1.0

pi

mixing weight for component 1 (0–1)

TYPE: float DEFAULT: 0.5

cip_x0_1

logistic CIP midpoint for component 1 (early)

TYPE: float DEFAULT: 10.0

cip_k_1

logistic CIP growth rate for component 1

TYPE: float DEFAULT: 0.3

cip_x0_2

logistic CIP midpoint for component 2 (late)

TYPE: float DEFAULT: 40.0

cip_k_2

logistic CIP growth rate for component 2

TYPE: float DEFAULT: 0.2

seed

random seed for component assignment

TYPE: int DEFAULT: 42

standardize

standardization mode for liability ("none", "global", or legacy bool). "per_generation" is not accepted.

TYPE: StandardizeMode | bool DEFAULT: 'global'

sex

binary sex covariate (0/1), shape (n,), or None

TYPE: ndarray | None DEFAULT: None

beta_sex

probit-scale coefficient for sex

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
ndarray

Array of simulated event times, shape (n,)

Source code in simace/phenotyping/_prototypes/bimodal_phenotype.py
def phenotype_mixture_cip(
    liability: np.ndarray,
    prevalence: float | np.ndarray,
    beta: float = 1.0,
    pi: float = 0.5,
    cip_x0_1: float = 10.0,
    cip_k_1: float = 0.3,
    cip_x0_2: float = 40.0,
    cip_k_2: float = 0.2,
    seed: int = 42,
    standardize: StandardizeMode | bool = "global",
    sex: np.ndarray | None = None,
    beta_sex: float = 0.0,
) -> np.ndarray:
    """Mixture of two logistic CIP curves with shared liability threshold.

    Each case is randomly assigned to component 1 (probability π) or
    component 2 (probability 1−π).  Both components share the same
    prevalence K and liability scaling (β, β_sex), but have different
    CIP midpoints (x₀) and steepness (k), producing bimodal onset.

    Args:
        liability:   quantitative liability, shape (n,)
        prevalence:  population prevalence K; scalar or per-individual array
        beta:        probit scaling factor for liability
        pi:          mixing weight for component 1 (0–1)
        cip_x0_1:    logistic CIP midpoint for component 1 (early)
        cip_k_1:     logistic CIP growth rate for component 1
        cip_x0_2:    logistic CIP midpoint for component 2 (late)
        cip_k_2:     logistic CIP growth rate for component 2
        seed:        random seed for component assignment
        standardize: standardization mode for liability (``"none"``, ``"global"``,
                     or legacy bool).  ``"per_generation"`` is not accepted.
        sex:         binary sex covariate (0/1), shape (n,), or None
        beta_sex:    probit-scale coefficient for sex

    Returns:
        Array of simulated event times, shape (n,)
    """
    mode = _resolve_mode(standardize)
    L = standardize_liability(liability, mode)

    _ndtri_vec = np.vectorize(_ndtri_approx)
    threshold = _ndtri_vec(1.0 - prevalence)
    is_case = threshold < L

    t = np.full(len(L), 1e6)
    n_cases = is_case.sum()
    if n_cases > 0:
        prev_case = prevalence[is_case] if isinstance(prevalence, np.ndarray) else prevalence
        L_eff = beta * L[is_case]
        if beta_sex != 0.0 and sex is not None:
            L_eff = L_eff + beta_sex * sex[is_case]

        cir = 0.5 * erfc(L_eff / np.sqrt(2.0))
        valid = cir < prev_case
        cir_clipped = np.clip(cir, 1e-10, np.asarray(prev_case) - 1e-10)

        # Component assignment
        rng = np.random.default_rng(seed)
        comp1 = rng.random(n_cases) < pi

        # Invert through each component's logistic CIP
        onset_1 = cip_x0_1 + (1.0 / cip_k_1) * np.log(cir_clipped / (prev_case - cir_clipped))
        onset_2 = cip_x0_2 + (1.0 / cip_k_2) * np.log(cir_clipped / (prev_case - cir_clipped))

        onset = np.where(comp1, onset_1, onset_2)
        onset[~valid] = 1e6
        t[is_case] = onset

    np.clip(t, 0.01, 1e6, out=t)
    return t

phenotype_mixture_cure_frailty

phenotype_mixture_cure_frailty(liability, prevalence, beta, pi, baseline, hazard_params_1, hazard_params_2, seed, standardize='global', sex=None, beta_sex=0.0)

Mixture of two cure-frailty components with shared threshold.

Cases (L > threshold) are randomly assigned to component 1 (prob π) or component 2 (prob 1−π). Each component has its own baseline hazard parameters, producing different onset timing distributions.

PARAMETER DESCRIPTION
liability

quantitative liability, shape (n,)

TYPE: ndarray

prevalence

population prevalence K

TYPE: float | ndarray

beta

effect of liability on log-hazard among cases

TYPE: float

pi

mixing weight for component 1

TYPE: float

baseline

baseline hazard model name (shared)

TYPE: str

hazard_params_1

hazard parameters for component 1

TYPE: dict[str, float]

hazard_params_2

hazard parameters for component 2

TYPE: dict[str, float]

seed

random seed

TYPE: int

standardize

standardization mode for liability ("none", "global", or legacy bool). "per_generation" is not accepted.

TYPE: StandardizeMode | bool DEFAULT: 'global'

sex

binary sex covariate (0/1), shape (n,), or None

TYPE: ndarray | None DEFAULT: None

beta_sex

effect of sex on log-hazard

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
ndarray

Array of simulated event times, shape (n,)

Source code in simace/phenotyping/_prototypes/bimodal_phenotype.py
def phenotype_mixture_cure_frailty(
    liability: np.ndarray,
    prevalence: float | np.ndarray,
    beta: float,
    pi: float,
    baseline: str,
    hazard_params_1: dict[str, float],
    hazard_params_2: dict[str, float],
    seed: int,
    standardize: StandardizeMode | bool = "global",
    sex: np.ndarray | None = None,
    beta_sex: float = 0.0,
) -> np.ndarray:
    """Mixture of two cure-frailty components with shared threshold.

    Cases (L > threshold) are randomly assigned to component 1 (prob π)
    or component 2 (prob 1−π).  Each component has its own baseline hazard
    parameters, producing different onset timing distributions.

    Args:
        liability:       quantitative liability, shape (n,)
        prevalence:      population prevalence K
        beta:            effect of liability on log-hazard among cases
        pi:              mixing weight for component 1
        baseline:        baseline hazard model name (shared)
        hazard_params_1: hazard parameters for component 1
        hazard_params_2: hazard parameters for component 2
        seed:            random seed
        standardize:     standardization mode for liability (``"none"``,
                         ``"global"``, or legacy bool).  ``"per_generation"``
                         is not accepted.
        sex:             binary sex covariate (0/1), shape (n,), or None
        beta_sex:        effect of sex on log-hazard

    Returns:
        Array of simulated event times, shape (n,)
    """
    mode = _resolve_mode(standardize)
    L = standardize_liability(liability, mode)
    mean_arr, beta_arr = standardize_beta(liability, beta, mode)
    mean = float(mean_arr[0])
    scaled_beta = float(beta_arr[0])

    _ndtri_vec = np.vectorize(_ndtri_approx)
    threshold = _ndtri_vec(1.0 - prevalence)
    is_case = threshold < L

    n = len(liability)
    t = np.full(n, 1e6)
    n_cases = is_case.sum()
    if n_cases > 0:
        rng = np.random.default_rng(seed)
        neg_log_u = rng.exponential(size=n_cases)
        comp1 = rng.random(n_cases) < pi

        if beta_sex != 0.0 and sex is not None:
            neg_log_u = neg_log_u / np.exp(beta_sex * sex[is_case])

        # Pass RAW liability to the hazard kernel (mirrors the cure_frailty
        # fix in models/cure_frailty.py): the kernel computes
        # ``z = exp(scaled_beta * (L - mean))`` and expects the raw L; passing
        # the standardized L_z would double-shift.
        L_cases = liability[is_case]
        t1 = BASELINE_HAZARDS[baseline](neg_log_u, L_cases, mean, scaled_beta, hazard_params_1)
        t2 = BASELINE_HAZARDS[baseline](neg_log_u, L_cases, mean, scaled_beta, hazard_params_2)

        t[is_case] = np.where(comp1, t1, t2)

    return t

phenotype_two_threshold

phenotype_two_threshold(liability, prevalence_early, prevalence_late, beta=1.0, cip_x0_1=10.0, cip_k_1=0.3, cip_x0_2=40.0, cip_k_2=0.2, seed=42, standardize='global', sex=None, beta_sex=0.0)

Two-threshold liability model with separate CIP per stratum.

High-liability individuals (L > τ₁) are early-onset cases mapped through CIP₁. Moderate-liability individuals (τ₂ < L ≤ τ₁) are late-onset cases mapped through CIP₂. This preserves a single liability dimension while producing bimodal age-of-onset.

PARAMETER DESCRIPTION
liability

quantitative liability, shape (n,)

TYPE: ndarray

prevalence_early

K_early (fraction with L > τ₁)

TYPE: float | ndarray

prevalence_late

K_late (fraction with τ₂ < L ≤ τ₁); total prevalence = K_early + K_late

TYPE: float | ndarray

beta

probit scaling factor for liability

TYPE: float DEFAULT: 1.0

cip_x0_1

logistic CIP midpoint for early component

TYPE: float DEFAULT: 10.0

cip_k_1

logistic CIP growth rate for early component

TYPE: float DEFAULT: 0.3

cip_x0_2

logistic CIP midpoint for late component

TYPE: float DEFAULT: 40.0

cip_k_2

logistic CIP growth rate for late component

TYPE: float DEFAULT: 0.2

seed

unused (deterministic model)

TYPE: int DEFAULT: 42

standardize

standardization mode for liability ("none", "global", or legacy bool). "per_generation" is not accepted.

TYPE: StandardizeMode | bool DEFAULT: 'global'

sex

binary sex covariate (0/1), shape (n,), or None

TYPE: ndarray | None DEFAULT: None

beta_sex

probit-scale coefficient for sex

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
ndarray

Array of simulated event times, shape (n,)

Source code in simace/phenotyping/_prototypes/bimodal_phenotype.py
def phenotype_two_threshold(
    liability: np.ndarray,
    prevalence_early: float | np.ndarray,
    prevalence_late: float | np.ndarray,
    beta: float = 1.0,
    cip_x0_1: float = 10.0,
    cip_k_1: float = 0.3,
    cip_x0_2: float = 40.0,
    cip_k_2: float = 0.2,
    seed: int = 42,
    standardize: StandardizeMode | bool = "global",
    sex: np.ndarray | None = None,
    beta_sex: float = 0.0,
) -> np.ndarray:
    """Two-threshold liability model with separate CIP per stratum.

    High-liability individuals (L > τ₁) are early-onset cases mapped
    through CIP₁.  Moderate-liability individuals (τ₂ < L ≤ τ₁) are
    late-onset cases mapped through CIP₂.  This preserves a single
    liability dimension while producing bimodal age-of-onset.

    Args:
        liability:        quantitative liability, shape (n,)
        prevalence_early: K_early (fraction with L > τ₁)
        prevalence_late:  K_late (fraction with τ₂ < L ≤ τ₁);
                          total prevalence = K_early + K_late
        beta:             probit scaling factor for liability
        cip_x0_1:        logistic CIP midpoint for early component
        cip_k_1:         logistic CIP growth rate for early component
        cip_x0_2:        logistic CIP midpoint for late component
        cip_k_2:         logistic CIP growth rate for late component
        seed:             unused (deterministic model)
        standardize:      standardization mode for liability (``"none"``,
                          ``"global"``, or legacy bool).  ``"per_generation"``
                          is not accepted.
        sex:              binary sex covariate (0/1), shape (n,), or None
        beta_sex:         probit-scale coefficient for sex

    Returns:
        Array of simulated event times, shape (n,)
    """
    mode = _resolve_mode(standardize)
    L = standardize_liability(liability, mode)

    _ndtri_vec = np.vectorize(_ndtri_approx)
    K_total = np.asarray(prevalence_early) + np.asarray(prevalence_late)
    tau1 = _ndtri_vec(1.0 - prevalence_early)  # early threshold (higher)
    tau2 = _ndtri_vec(1.0 - K_total)  # late threshold (lower)

    early = tau1 < L
    late = (tau2 < L) & (~early)

    t = np.full(len(L), 1e6)

    for mask, prev_param, x0, k in [
        (early, prevalence_early, cip_x0_1, cip_k_1),
        (late, prevalence_late, cip_x0_2, cip_k_2),
    ]:
        n_grp = mask.sum()
        if n_grp == 0:
            continue
        prev_grp = prev_param[mask] if isinstance(prev_param, np.ndarray) else prev_param
        L_eff = beta * L[mask]
        if beta_sex != 0.0 and sex is not None:
            L_eff = L_eff + beta_sex * sex[mask]

        cir = 0.5 * erfc(L_eff / np.sqrt(2.0))
        valid = cir < prev_grp
        cir = np.clip(cir, 1e-10, np.asarray(prev_grp) - 1e-10)
        onset = x0 + (1.0 / k) * np.log(cir / (prev_grp - cir))
        onset[~valid] = 1e6
        t[mask] = onset

    np.clip(t, 0.01, 1e6, out=t)
    return t