meda package
- meda.logging_setup(log_file: str = 'meda.log', logger_name: str = 'MEDA') → None
Set up logging with a file handler.
- Parameters:
log_file (str) – The path to the log file. Defaults to ‘meda.log’.
logger_name (str) – The name of the logger. Defaults to ‘MEDA’.
- Returns:
None
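For orientation, attaching a file handler to a named logger with the standard library could look like the sketch below. This is not meda's implementation: the helper name `setup_file_logger`, the log level, and the message format are assumptions made for illustration only.

```python
import logging
import os


def setup_file_logger(log_file: str = "meda.log", logger_name: str = "MEDA") -> logging.Logger:
    """Hypothetical helper: attach a single file handler to a named logger."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)  # assumed level, not taken from meda

    # Reuse an existing handler for the same file so repeated calls
    # do not write every record twice.
    target = os.path.abspath(log_file)
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler) and handler.baseFilename == target:
            return logger

    handler = logging.FileHandler(log_file)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

After such a call, `logging.getLogger('MEDA').info(...)` writes to the configured file.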
meda.analysis module
- meda.analysis.lca(data: DataFrame, outcome: str | None = None, confounders: list | None = None, n_classes: list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], fixed_n_classes: int | None = None, show_metrics: bool = False, cv: int = 3, return_assignments: bool = False, show_polar_plot: bool = False, cmap: str = 'tab10', trained_model: StepMix | None = None, truncate_labels: bool = False, random_state: int = 42, n_steps: int = 1, measurement: str = 'bernoulli', structural: str = 'bernoulli', confounder_order: list | None = None, return_confounder_order: bool = False, output_folder: str | None = None, **kwargs)
Fits a Latent Class Analysis (LCA) model to the given data using StepMix. If no outcome or confounders are provided, an unsupervised approach is used. Optionally plots a polar plot of the latent class assignments with normalized prevalences.
- Parameters:
data (pd.DataFrame) – The input data containing the variables for LCA.
outcome (str, optional) – The name of the outcome variable column. Defaults to None.
confounders (list, optional) – A list of confounder column names to be used in the model. Defaults to None.
n_classes (list, optional) – The candidate numbers of latent classes to evaluate during tuning. Defaults to the integers 1 through 10.
fixed_n_classes (int, optional) – A fixed number of latent classes to use instead of tuning. Defaults to None.
show_metrics (bool, optional) – Whether to plot LCA metrics. Only applies when fixed_n_classes is None. Defaults to False.
cv (int, optional) – The number of cross-validation folds for hyperparameter tuning. Defaults to 3.
return_assignments (bool, optional) – Whether to return the latent class assignments for the observations. Defaults to False.
show_polar_plot (bool, optional) – Whether to plot a polar plot of the latent class assignments. Defaults to False.
cmap (str, optional) – The colormap to use for plotting clusters. Defaults to ‘tab10’.
trained_model (StepMix, optional) – A pre-trained StepMix model to use for predictions. If provided, no new model will be trained. Defaults to None.
truncate_labels (bool, optional) – Whether to truncate long labels in the polar plot. Defaults to False.
random_state (int, optional) – Random seed for reproducibility. Defaults to 42.
n_steps (int, optional) – The number of steps for the StepMix model. Defaults to 1.
measurement (str, optional) – Measurement model type. Defaults to ‘bernoulli’.
structural (str, optional) – Structural model type. Defaults to ‘bernoulli’.
confounder_order (list, optional) – A predefined order for confounders in the polar plot. Defaults to None.
return_confounder_order (bool, optional) – Whether to return the order of confounders used in the polar plot. Defaults to False.
output_folder (str, optional) – The folder to save the plots. Defaults to None.
**kwargs – Additional keyword arguments to pass to the StepMix model.
- Returns:
If neither return_assignments nor return_confounder_order is True, returns the fitted LCA model. If return_assignments is True, returns (model, assignments). If return_confounder_order is True, returns (model, sorted_confounder_names). If both are True, returns (model, assignments, sorted_confounder_names).
- Return type:
Union[StepMix, Tuple]
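Because the return shape depends on two flags, calling code may find it convenient to normalize the result to one fixed shape. The sketch below does that for the four cases described under Returns. The helper `unpack_lca_result` is hypothetical and not part of meda.

```python
from typing import Any, List, Optional, Tuple


def unpack_lca_result(
    result: Any,
    return_assignments: bool = False,
    return_confounder_order: bool = False,
) -> Tuple[Any, Optional[Any], Optional[List[str]]]:
    """Hypothetical helper: always return (model, assignments, confounder_order).

    Entries that were not requested from lca() are returned as None.
    """
    if not return_assignments and not return_confounder_order:
        # lca() returned the bare model.
        return result, None, None
    if return_assignments and return_confounder_order:
        model, assignments, order = result
        return model, assignments, order
    if return_assignments:
        model, assignments = result
        return model, assignments, None
    # Only return_confounder_order was set.
    model, order = result
    return model, None, order
```

The same flag values passed to `lca` must be passed to the helper, since a 2-tuple is ambiguous without them.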
Examples
>>> import pandas as pd
>>> from sklearn.datasets import make_blobs
>>> from meda.analysis import lca
>>> # generate synthetic data with 3 actual latent classes
>>> X, _ = make_blobs(n_samples=1000, centers=3, n_features=6, random_state=42)
>>> synthetic_data = pd.DataFrame(
...     X,
...     columns=['var_1', 'var_2', 'var_3', 'var_4', 'var_5', 'var_6']
... )
>>> synthetic_data = (synthetic_data > synthetic_data.median()).astype(int)
>>> # fit LCA model
>>> model = lca(data=synthetic_data, n_classes=[2, 3, 4, 5], show_polar_plot=True)
- meda.analysis.logit(data: DataFrame, outcome: str, confounders: list, categorical_vars: list | None = None, drop_first: bool = True, dropna: bool = False, show_results: bool = False, show_forest_plot: bool = False, reference_col: str | None = None, selected_confounders: list | None = None, confounder_names: dict | None = None, custom_colors: list | None = None, error_bar_colors: list | None = None) → Logit
Fits a statsmodels logistic regression model to the given data. Optionally plots a forest plot of the odds ratios.
- Parameters:
data (pd.DataFrame) – The input data containing the outcome variable and confounders.
outcome (str) – The name of the outcome variable column.
confounders (list) – A list of confounder column names to be used in the model.
dropna (bool, optional) – Whether to drop rows with missing values. Defaults to False.
categorical_vars (list, optional) – A list of categorical variable column names to be converted to dummy variables. Defaults to None.
drop_first (bool, optional) – Whether to drop one dummy column per categorical variable: the first level, or reference_col when it is given and applicable. Defaults to True.
show_results (bool, optional) – Whether to print the summary of the logistic regression results. Defaults to False.
show_forest_plot (bool, optional) – Whether to plot a forest plot of the odds ratios. Defaults to False.
reference_col (str, optional) – The reference column for adjusting odds ratios. Defaults to None.
selected_confounders (list, optional) – A list of selected confounders to be included in the forest plot. Defaults to None.
confounder_names (dict, optional) – A dictionary mapping original confounder names to display names in the forest plot. Defaults to None.
custom_colors (list, optional) – A list of custom colors for the points in the forest plot. Defaults to None.
error_bar_colors (list, optional) – A list of custom colors for the error bars in the forest plot. Defaults to None.
- Returns:
The fitted logistic regression model.
- Return type:
sm.Logit
Examples
>>> import pandas as pd
>>> from meda.analysis import logit
>>> data = pd.DataFrame({
...     'outcome': [1, 0, 1, 0, 1],
...     'confounder_1': [5, 3, 6, 2, 7],
...     'confounder_2': [1, 0, 1, 0, 1]
... })
>>> result = logit(
...     data=data,
...     outcome='outcome',
...     confounders=['confounder_1', 'confounder_2'],
...     show_forest_plot=True,
...     reference_col='confounder_1'
... )
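As background for the forest plot: an odds ratio is the exponential of a logistic regression coefficient, and a confidence interval on the odds-ratio scale is obtained by exponentiating the interval bounds of the coefficient. The sketch below uses a Wald interval with z = 1.96 (roughly 95%). Whether meda computes its intervals in exactly this way is an assumption, and `odds_ratio_with_ci` is a hypothetical helper, not a meda function.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(
    coef: float, std_err: float, z: float = 1.96
) -> Tuple[float, float, float]:
    """Hypothetical helper: return (odds_ratio, ci_lower, ci_upper).

    coef and std_err are a log-odds coefficient and its standard error,
    for example taken from a fitted logistic regression.
    """
    odds_ratio = math.exp(coef)
    ci_lower = math.exp(coef - z * std_err)  # lower Wald bound, exponentiated
    ci_upper = math.exp(coef + z * std_err)  # upper Wald bound, exponentiated
    return odds_ratio, ci_lower, ci_upper
```

A coefficient of 0 corresponds to an odds ratio of 1, which is why forest plots usually mark 1 as the line of no effect.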