scegot package

Submodules

scegot.scegot module

Module contents

scegot.is_notebook()

Check if the code is running in a Jupyter notebook or not.

Returns

bool

True if the code is running in a Jupyter notebook, False otherwise.
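How this check is implemented is not shown here. A common way to make this kind of decision is to look for the shell object that IPython injects at runtime, as in the minimal sketch below. The helper name looks_like_notebook is hypothetical and is not part of scegot.

```python
def looks_like_notebook():
    """Hypothetical helper (not scegot's code): guess whether a Jupyter kernel is running."""
    try:
        # get_ipython is only defined when the code runs under IPython.
        shell_name = get_ipython().__class__.__name__  # noqa: F821
    except NameError:
        return False  # plain Python interpreter
    # ZMQInteractiveShell is the shell class used by Jupyter kernels.
    return shell_name == "ZMQInteractiveShell"


print(looks_like_notebook())
```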

class scegot.scEGOT(X, day_names=None, verbose=True, adata_day_key=None)

Bases: object

animate_gene_expression(target_gene_name, mode='pca', interpolate_interval=11, n_samples=5000, x_range=None, y_range=None, c_range=None, x_label=None, y_label=None, cmap='gnuplot2', save=False, save_path=None)

Calculate interpolation between all timepoints and create animation colored by gene expression level.

Parameters

target_gene_namestr

Gene name to plot expression level.

mode{‘pca’, ‘umap’}, optional

The space to plot gene expression levels, by default “pca”

interpolate_intervalint, optional

Number of frames to interpolate between two timepoints, by default 11. This count includes the frames at both timepoints as well as the frames between them, so both ends are included.

n_samplesint, optional

Number of samples to generate, by default 5000

x_rangelist or tuple of float of shape (2,), optional

Range of the x-axis, by default None

y_rangelist or tuple of float of shape (2,), optional

Range of the y-axis, by default None

c_rangelist or tuple of float of shape (2,), optional

Range of the color bar, by default None

x_labelstr, optional

Label of the x-axis, by default None

y_labelstr, optional

Label of the y-axis, by default None

cmapstr, optional

String of the colormap, by default “gnuplot2”

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None. If None, the image will be saved as ‘./interpolate_video.gif’.

Raises

ValueError

When ‘mode’ is not ‘pca’ or ‘umap’.
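Because ‘interpolate_interval’ counts both end frames, the default of 11 gives 9 intermediate frames between two timepoints. The sketch below lists the interpolation ratios for one pair of timepoints, assuming the frames are evenly spaced (the spacing is an assumption, and frame_ratios is a hypothetical helper, not a scegot function).

```python
def frame_ratios(interpolate_interval=11):
    """Hypothetical helper: evenly spaced ratios from 0 to 1, both ends included."""
    if interpolate_interval < 2:
        raise ValueError("need at least the two end frames")
    last = interpolate_interval - 1
    return [i / last for i in range(interpolate_interval)]


ratios = frame_ratios(11)
print(len(ratios), ratios[0], ratios[-1])
```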

animatie_interpolated_distribution(x_range=None, y_range=None, interpolate_interval=11, cmap='gnuplot2', save=False, save_path=None)

Export an animation of the interpolated distribution between GMM models.

Parameters

x_rangelist or tuple of float of shape (2,), optional

Restrict the X axis range, by default None

y_rangelist or tuple of float of shape (2,), optional

Restrict the Y axis range, by default None

interpolate_intervalint, optional

The number of frames to interpolate between two timepoints, by default 11. This count includes the frames at both timepoints as well as the frames between them, so both ends are included.

cmapstr, optional

String of matplotlib colormap name, by default “gnuplot2”

savebool, optional

If True, save the output animation, by default False

save_pathstr, optional

Path to save the output animation, by default None. If None, the animation will be saved as ‘./cell_state_video.gif’.

apply_umap(n_neighbors, n_components=2, random_state=None, min_dist=0.1, umap_other_params={})

Fit UMAP on self.X_pca and return the transformed data.

Parameters

n_neighborsfloat

The size of local neighborhood used for manifold approximation. Passed to the ‘n_neighbors’ parameter of the UMAP class.

n_componentsint, optional

The dimension of the space to embed into, by default 2. Passed to the ‘n_components’ parameter of the UMAP class.

random_stateint, RandomState instance or None, optional

Fix the random seed for reproducibility, by default None. Passed to the ‘random_state’ parameter of the UMAP class.

min_distfloat, optional

The effective minimum distance between embedded points, by default 0.1. Passed to the ‘min_dist’ parameter of the UMAP class.

umap_other_paramsdict, optional

Other parameters for UMAP, by default {}

Returns

list of pd.DataFrame of shape (n_samples, n_components of UMAP)

UMAP-transformed data.

umap.umap_.UMAP

UMAP instance fitted to the input data.

bures_wasserstein_distance(m_0, m_1, sigma_0, sigma_1)
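For reference, the Bures-Wasserstein distance between two Gaussians with means m_0, m_1 and covariances sigma_0, sigma_1 is usually defined by d^2 = ||m_0 - m_1||^2 + tr(sigma_0 + sigma_1 - 2 (sigma_0^(1/2) sigma_1 sigma_0^(1/2))^(1/2)). The sketch below evaluates only the special case of diagonal covariances, where the trace term reduces to a sum of squared differences of standard deviations. The helper bw_distance_diag is hypothetical, and whether this method returns the distance or its square is not stated here.

```python
import math


def bw_distance_diag(m_0, m_1, var_0, var_1):
    """Hypothetical helper: Bures-Wasserstein distance for diagonal covariances.

    var_0 and var_1 hold the diagonal entries (variances) of each covariance.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(m_0, m_1))
    # For diagonal matrices the trace term is sum((sqrt(v0) - sqrt(v1))^2).
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_0, var_1))
    return math.sqrt(mean_term + cov_term)


d = bw_distance_diag([0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [1.0, 1.0])
print(d)
```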
calculate_cell_velocities()

Calculate cell velocities between each day.

Returns

pd.DataFrame

Cell velocities between each day. When the number of days is N and the numbers of cells in each day are M_1, M_2, …, M_N, the rows are ordered as follows: [day1_cell1 -> day1_cell2 -> … -> day1_cellM_1 -> day2_cell1 -> … -> day(N-1)_cellM_(N-1)]
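The row ordering can be made concrete with a small sketch. It assumes, as the ordering above suggests, that cells of the last day have no row because there is no later day to move toward. The helper velocity_row_labels and the label format are hypothetical.

```python
def velocity_row_labels(cells_per_day):
    """Hypothetical helper: row labels in the documented order, last day excluded."""
    labels = []
    for day, n_cells in enumerate(cells_per_day[:-1], start=1):
        labels.extend(f"day{day}_cell{cell}" for cell in range(1, n_cells + 1))
    return labels


# Three days with 3, 2 and 4 cells: only days 1 and 2 contribute rows.
labels = velocity_row_labels([3, 2, 4])
print(labels)
```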

calculate_grns(selected_clusters=None, alpha_range=(-2, 2), cv=3, ridge_cv_fit_intercept=False, ridge_fit_intercept=False)

Calculate gene regulatory networks (GRNs) between each day.

Parameters

selected_clusterslist of list of int of shape (n_days, 2), optional

Clusters to use when calculating GRNs, by default None. If None, all clusters will be used. The list should look like [[day1’s index, selected cluster number], [day2’s index, selected cluster number], …].

alpha_rangetuple or list of float of shape (2,), optional

Range of alpha values for Ridge regression, by default (-2, 2)

cvint, optional

Number of cross-validation folds, by default 3. This parameter is passed to RidgeCV’s ‘cv’ parameter.

ridge_cv_fit_interceptbool, optional

Whether to calculate the intercept in RidgeCV, by default False. This parameter is passed to RidgeCV’s ‘fit_intercept’ parameter.

ridge_fit_interceptbool, optional

Whether to calculate the intercept in Ridge, by default False. This parameter is passed to Ridge’s ‘fit_intercept’ parameter.

Returns

list of pd.DataFrame

Gene regulatory networks between each day. The rows and columns are gene names. Each element of the list corresponds to the GRN between day i and day i + 1.

list of RidgeCV objects

RidgeCV objects used to calculate GRNs. Each element of the list corresponds to the RidgeCV object between day i and day i + 1.
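The expected shape of ‘selected_clusters’ (one [day index, cluster number] pair per day) can be checked before calling the method. The sketch below is a hypothetical validator, not part of scegot, and it assumes that both the day index and the cluster number are zero-based.

```python
def check_selected_clusters(selected_clusters, n_components_list):
    """Hypothetical validator: one [day index, cluster number] pair per day."""
    if len(selected_clusters) != len(n_components_list):
        raise ValueError("expected one [day index, cluster number] pair per day")
    for day_index, cluster in selected_clusters:
        # Assumption: day indices and cluster numbers both start at 0.
        if not 0 <= day_index < len(n_components_list):
            raise ValueError(f"day index {day_index} is out of range")
        if not 0 <= cluster < n_components_list[day_index]:
            raise ValueError(f"cluster {cluster} does not exist on day {day_index}")
    return True


n_components_list = [1, 2, 4]  # GMM components per day (example values)
selected_clusters = [[0, 0], [1, 1], [2, 3]]
print(check_selected_clusters(selected_clusters, n_components_list))
```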

calculate_mut_st(gmm_source, gmm_target, t)
calculate_normalized_solutions(gmm_models, reg=0.01, numItermax=10000000000, method='sinkhorn_epsilon_scaling', tau=100000000.0, stopThr=1e-09, sinkhorn_other_params={})
calculate_solution(gmm_source, gmm_target, reg=0.01, numItermax=10000000000, method='sinkhorn_epsilon_scaling', tau=100000000.0, stopThr=1e-09, sinkhorn_other_params={})
calculate_solutions(gmm_models, reg=0.01, numItermax=10000000000, method='sinkhorn_epsilon_scaling', tau=100000000.0, stopThr=1e-09, sinkhorn_other_params={})
calculate_waddington_potential(n_neighbors=100, knn_other_params={})

Calculate Waddington potential of each sample.

Parameters

n_neighborsint, optional

Number of neighbors for each sample, by default 100. This parameter is passed to the ‘kneighbors_graph’ function.

knn_other_paramsdict, optional

Other parameters for ‘kneighbors_graph’ function, by default {}

Returns

np.ndarray of shape (sum of n_samples of each day - n_samples of the last day,)

Waddington potential of each sample.

egot(pi_0, pi_1, mu_0, mu_1, S_0, S_1, reg=0.01, numItermax=10000000000, method='sinkhorn_epsilon_scaling', tau=100000000.0, stopThr=1e-09, sinkhorn_other_params={})
fit_gmm(n_components_list, covariance_type='full', max_iter=2000, n_init=10, random_state=None, gmm_other_params={})
fit_predict_gmm(n_components_list, covariance_type='full', max_iter=2000, n_init=10, random_state=None, gmm_other_params={})

Fit GMM models with each day’s data and predict labels for them.

Parameters

n_components_listlist of int

Each element corresponds to the number of components of the GMM model for each day. Passed to the ‘n_components’ parameter of the GaussianMixture class.

covariance_type{‘full’, ‘tied’, ‘diag’, ‘spherical’}, optional

String describing the type of covariance parameters to use, by default “full”. Passed to the ‘covariance_type’ parameter of the GaussianMixture class.

max_iterint, optional

The number of EM iterations to perform, by default 2000. Passed to the ‘max_iter’ parameter of the GaussianMixture class.

n_initint, optional

The number of initializations to perform, by default 10. Passed to the ‘n_init’ parameter of the GaussianMixture class.

random_stateint, RandomState instance or None, optional

Controls the random seed given at each GMM model initialization, by default None. Passed to the ‘random_state’ parameter of the GaussianMixture class.

gmm_other_paramsdict, optional

Other parameters for GMM, by default {}

Returns

list of GaussianMixture instances

The length of the list is the same as the number of days. Each element is a GMM instance fitted to the corresponding day’s data.

list of np.ndarray

List of GMM labels. Each element is the predicted labels for the corresponding day’s data.

gaussian_mixture_density(mu, sigma, alpha, x)
generate_cluster_names_with_day(cluster_names=None)
get_gaussian_map(m_0, m_1, sigma_0, sigma_1, x)
get_gmm_means()
get_positive_gmm_mean_gene_values_per_cluster(gmm_means, cluster_names=None)
make_cell_state_graph(cluster_names, mode='pca', threshold=0.05)

Compute cell state graph and build a networkx graph object.

Parameters

cluster_names2D list of str

1st dimension is the number of days, 2nd dimension is the number of gmm components in each day. Can be generated by the ‘generate_cluster_names_with_day’ method.

mode{‘pca’, ‘umap’}, optional

The space to build the cell state graph, by default “pca”

thresholdfloat, optional

Threshold to filter edges, by default 0.05. Only edges with edge_weights greater than this threshold will be included.

Returns

nx.classes.digraph.DiGraph

Networkx graph object of the cell state graph

Raises

ValueError

When ‘mode’ is not ‘pca’ or ‘umap’.
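The effect of ‘threshold’ can be illustrated on a plain dictionary of edge weights. This is only a sketch of the filtering rule described above (strictly greater than the threshold). The helper keep_edges and the node names are hypothetical, and the real method builds a networkx graph instead.

```python
def keep_edges(edge_weights, threshold=0.05):
    """Hypothetical helper: keep edges whose weight is strictly above the threshold."""
    return {edge: w for edge, w in edge_weights.items() if w > threshold}


edge_weights = {
    ("day0_c0", "day1_c0"): 0.60,
    ("day0_c0", "day1_c1"): 0.05,  # equal to the threshold, so it is dropped
    ("day0_c1", "day1_c1"): 0.02,
}
kept = keep_edges(edge_weights)
print(kept)
```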

make_interpolation_data(gmm_source, gmm_target, t, columns=None, n_samples=2000, seed=0)

Make interpolation data between two timepoints.

Parameters

gmm_sourceGaussianMixture

GMM model of the source timepoint.

gmm_targetGaussianMixture

GMM model of the target timepoint.

tfloat

Interpolation ratio. 0 <= t <= 1. 0 is the source timepoint, 1 is the target timepoint. If you specify 0.5, the data will be interpolated halfway between the source and target timepoints.

columnslist of str, optional

Columns names of the output data, by default None

n_samplesint, optional

Number of samples to generate, by default 2000

seedint, optional

Random seed, by default 0

Returns

pd.DataFrame

Interpolated data between two timepoints.
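To give a feel for the role of ‘t’, the sketch below interpolates between two one-dimensional Gaussians by moving the mean and standard deviation linearly, which is the optimal-transport (displacement) interpolation for a single pair of 1D Gaussians. It is not necessarily what this method computes for full GMMs, and interpolate_gaussian_1d is a hypothetical helper.

```python
import random


def interpolate_gaussian_1d(mean_0, std_0, mean_1, std_1, t, n_samples=2000, seed=0):
    """Hypothetical helper: sample from the 1D Gaussian at interpolation ratio t."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must satisfy 0 <= t <= 1")
    # t = 0 reproduces the source, t = 1 the target.
    mean_t = (1.0 - t) * mean_0 + t * mean_1
    std_t = (1.0 - t) * std_0 + t * std_1
    rng = random.Random(seed)  # fixed seed for reproducible samples
    return [rng.gauss(mean_t, std_t) for _ in range(n_samples)]


samples = interpolate_gaussian_1d(0.0, 1.0, 10.0, 1.0, t=0.5)
print(len(samples), sum(samples) / len(samples))
```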

plot_cell_state_graph(G, cluster_names, tf_gene_names=None, tf_gene_pick_num=5, save=False, save_path=None)

Plot the cell state graph with the given graph object.

Parameters

Gnx.classes.digraph.DiGraph

Networkx graph object of the cell state graph.

cluster_nameslist of list of str

1st dimension is the number of days, 2nd dimension is the number of gmm components of each day. Can be generated by the ‘generate_cluster_names_with_day’ method.

tf_gene_nameslist of str, optional

List of transcription factor gene names to use, by default None. If None, all gene names (self.gene_names) will be used. You can pass any list of gene names you want to use, not limited to TF genes.

tf_gene_pick_numint, optional

The number of genes to show in each node and edge, by default 5

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None. If None, the image will be saved as ‘./cell_state_graph.png’.

plot_cell_velocity(velocities, mode='pca', color_points='gmm', size_points=30, cmap='tab20', cluster_names=None, save=False, save_path=None)

Plot cell velocities in 2D space.

Parameters

velocitiespd.DataFrame

Cell velocities calculated by ‘calculate_cell_velocities’ method.

mode{‘pca’, ‘umap’}, optional

The space to plot cell velocities, by default “pca”

color_points{‘gmm’, ‘day’}, optional

Color points by GMM clusters or days, by default “gmm”

size_pointsint, optional

Size of points, by default 30

cmapstr, optional

String of matplotlib colormap name, by default “tab20”

cluster_nameslist of str of shape (sum of gmm components), optional

List of gmm cluster names, by default None. Used when ‘color_points’ is ‘gmm’. You need to flatten the list of lists of gmm cluster names before passing it.

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None. If None, the image will be saved as ‘./cell_velocity.png’.

Raises

ValueError

This error is raised in the following cases:

- When ‘mode’ is not ‘pca’ or ‘umap’.
- When ‘color_points’ is not ‘gmm’ or ‘day’.
- When ‘color_points’ is ‘gmm’ and ‘cluster_names’ is None.
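Flattening the per-day list of cluster names, as required for ‘cluster_names’ here, can be done with the standard library. The names below are made-up examples, not output of scegot.

```python
from itertools import chain

# Hypothetical per-day cluster names (one inner list per day).
cluster_names = [["day0_c0"], ["day1_c0", "day1_c1"], ["day2_c0", "day2_c1"]]

# One flat list, in day order, with length equal to the sum of gmm components.
flat_cluster_names = list(chain.from_iterable(cluster_names))
print(flat_cluster_names)
```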

plot_fold_change(cluster_names, cluster1, cluster2, tf_gene_names=None, threshold=1.0, save=False, save_path=None)

Plot fold change between two clusters.

Parameters

cluster_nameslist of list of str

1st dimension is the number of days, 2nd dimension is the number of gmm components in each day. Can be generated by the ‘generate_cluster_names_with_day’ method.

cluster1str

Cluster name of denominator.

cluster2str

Cluster name of numerator.

tf_gene_nameslist of str, optional

List of transcription factor gene names to use, by default None. If None, all gene names (self.gene_names) will be used. You can pass any list of gene names you want to use, not limited to TF genes.

thresholdfloat, optional

Threshold to filter labels, by default 1.0. Only genes whose fold change is greater than this threshold will be labeled.

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None. If None, the image will be saved as ‘./fold_change.png’.

plot_gene_expression_2d(gene_name, mode='pca', col=None, save=False, save_path=None)

Plot gene expression levels in 2D space.

Parameters

gene_namestr

Gene name to plot expression level.

mode{‘pca’, ‘umap’}, optional

The space to plot gene expression levels, by default “pca”

collist or tuple of str of shape (2,), optional

X and Y axis labels, by default None. If None, the first two columns of the input data will be used.

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None. If None, the image will be saved as ‘./pathway_single_gene_2d.png’.

Raises

ValueError

When ‘mode’ is not ‘pca’ or ‘umap’.

plot_gene_expression_3d(gene_name, col=None, save=False, save_path=None)

Plot gene expression levels in 3D space.

Parameters

gene_namestr

Gene name to plot expression level.

collist or tuple of str of shape (3,), optional

X, Y, and Z axis labels, by default None. If None, the first three columns of the input data will be used.

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None. If None, the image will be saved as ‘./pathway_single_gene_3d.html’.

plot_gmm_predictions(mode='pca', figure_labels=None, x_range=None, y_range=None, figure_titles_without_gmm=None, figure_titles_with_gmm=None, plot_gmm_means=False, cmap='plasma', save=False, save_paths=None)

Plot GMM predictions. One image is output per day. Each image contains two subplots: the left one is drawn in a single color and the right one is colored by GMM labels.

Parameters

mode{‘pca’, ‘umap’}, optional

The space to plot the GMM predictions, by default “pca”

figure_labelslist or tuple of str of shape (2,), optional

X and Y axis labels, by default None. If None, the first two columns of the input data will be used.

x_rangelist or tuple of float of shape (2,), optional

Restrict the X axis range, by default None. If None, the range will be automatically determined to include all data points.

y_rangelist or tuple of float of shape (2,), optional

Restrict the Y axis range, by default None. If None, the range will be automatically determined to include all data points.

figure_titles_without_gmmlist or tuple of str of shape (n_days,), optional

List of figure titles of left subplots, by default None

figure_titles_with_gmmlist or tuple of str of shape (n_days,), optional

List of figure titles of right subplots, by default None

plot_gmm_meansbool, optional

If True, plot GMM mean points on the right subplots, by default False

cmapstr, optional

String of matplotlib colormap name, by default “plasma”

savebool, optional

If True, save the output images, by default False

save_pathslist or tuple of str of shape (n_days,), optional

List of paths to save the output images, by default None. If None, the images will be saved as ‘./GMM_preds_{i + 1}.png’.

Raises

ValueError

When ‘mode’ is not ‘pca’ or ‘umap’.
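The default naming pattern for ‘save_paths’ uses a one-based day counter. The sketch below spells out the resulting file names for a given number of days. default_gmm_save_paths is a hypothetical helper that mirrors the documented pattern, and it is assumed that i runs over the days starting from 0.

```python
def default_gmm_save_paths(n_days):
    """Hypothetical helper: documented default names, one image per day."""
    # Assumption: i is the zero-based day index, so file numbers start at 1.
    return [f"./GMM_preds_{i + 1}.png" for i in range(n_days)]


paths = default_gmm_save_paths(3)
print(paths)
```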

plot_grn_graph(grns, ridge_cvs, selected_genes, threshold=0.01, save=False, save_paths=None, save_format='png')

Plot gene regulatory networks (GRNs) between each day.

Parameters

grnslist of pd.DataFrame

Gene regulatory networks between each day. The rows and columns are gene names.

ridge_cvslist of RidgeCV objects

RidgeCV objects used to calculate GRNs.

selected_geneslist of str

Gene names to plot GRNs.

thresholdfloat, optional

Threshold to plot edges, by default 0.01. If the absolute value of the edge weight is less than this value, the edge will not be plotted.

savebool, optional

If True, save the output image, by default False

save_pathsstr, optional

Paths to save the output images, by default None

save_formatstr, optional

Format of the output images, by default “png”
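Unlike the cell state graph, the GRN threshold applies to the absolute value of the weight, so strong negative weights are kept. A minimal sketch of that rule on a dictionary of weights follows. The helper edges_to_plot and the gene names are hypothetical, and which of the row or column gene acts as the regulator is not specified here.

```python
def edges_to_plot(grn_weights, threshold=0.01):
    """Hypothetical helper: drop edges whose absolute weight is below the threshold."""
    return [
        (row_gene, col_gene, w)
        for (row_gene, col_gene), w in grn_weights.items()
        if abs(w) >= threshold
    ]


grn_weights = {
    ("GENE_A", "GENE_B"): 0.5,
    ("GENE_B", "GENE_C"): -0.2,   # negative but large in magnitude: kept
    ("GENE_C", "GENE_A"): 0.005,  # below the threshold: dropped
}
edges = edges_to_plot(grn_weights)
print(edges)
```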

plot_interpolation_of_cell_velocity(velocities, mode='pca', color_streams=False, color_points='gmm', cluster_names=None, x_range=None, y_range=None, cmap='gnuplot2', linspace_num=300, save=False, save_path=None)

Plot the interpolation of cell velocities. This method may be deprecated in the future because the ‘plot_cell_velocity’ method now supports plotting streamlines.

Parameters

velocitiespd.DataFrame

Cell velocities calculated by ‘calculate_cell_velocities’ method.

mode{‘pca’, ‘umap’}, optional

The space to plot cell velocities, by default “pca”

color_streamsbool, optional

If True, color the streamlines by the speed of the cell velocities, by default False

color_points{‘gmm’, ‘day’}, optional

Color points by GMM clusters or days, by default “gmm”

cluster_nameslist of str of shape (sum of gmm n_components), optional

List of gmm cluster names, by default None. Used when ‘color_points’ is ‘gmm’. You need to flatten the list of lists of gmm cluster names before passing it.

x_rangetuple or list of float of shape (2,), optional

Limit of the x-axis, by default None

y_rangetuple or list of float of shape (2,), optional

Limit of the y-axis, by default None

cmapstr, optional

String of matplotlib colormap name, by default “gnuplot2”

linspace_numint, optional

Number of points on each axis to interpolate, by default 300. In total, linspace_num * linspace_num points will be interpolated.

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None. If None, the image will be saved as ‘./interpolation_of_cell_velocity_gmm_clusters.png’.

Raises

ValueError

This error is raised in the following cases:

- When ‘mode’ is not ‘pca’ or ‘umap’.
- When ‘color_points’ is not ‘gmm’ or ‘day’.
- When ‘color_points’ is ‘gmm’ and ‘cluster_names’ is None.
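The cost of ‘linspace_num’ grows quadratically, since the grid has linspace_num points on each axis. The sketch below builds such a grid with the standard library to make the point count explicit. make_grid and linspace are hypothetical helpers, and the axis ranges are example values.

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop, both ends included (num >= 2)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def make_grid(x_range, y_range, linspace_num=300):
    """Hypothetical helper: all (x, y) grid points, linspace_num per axis."""
    xs = linspace(x_range[0], x_range[1], linspace_num)
    ys = linspace(y_range[0], y_range[1], linspace_num)
    return [(x, y) for y in ys for x in xs]


grid = make_grid((0, 1), (0, 2), linspace_num=3)
print(len(grid), grid[0], grid[-1])
```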

plot_pathway_gene_expressions(cluster_names, pathway_names, selected_genes, save=False, save_path=None)

Plot gene expression levels within a pathway.

Parameters

cluster_nameslist of list of str

1st dimension is the number of days, 2nd dimension is the number of gmm components in each day. Can be generated by the ‘generate_cluster_names_with_day’ method.

pathway_nameslist of str of shape (n_days,)

List of cluster names included in the pathway. Specify like [‘day0’s cluster name’, ‘day1’s cluster name’, …, ‘dayN’s cluster name’].

selected_geneslist of str

List of gene names whose gene expression changes you want to track. Using about 5 genes is recommended.

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None. If None, the image will be saved as ‘./pathway_gene_expressions.png’.
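A pathway is described by one cluster name per day, and each name has to come from that day's entry in ‘cluster_names’. The sketch below checks this relationship. check_pathway is a hypothetical helper and the cluster names are made-up examples.

```python
def check_pathway(cluster_names, pathway_names):
    """Hypothetical validator: one cluster per day, each taken from that day's names."""
    if len(pathway_names) != len(cluster_names):
        raise ValueError("pathway_names needs exactly one cluster name per day")
    for day, (name, names_of_day) in enumerate(zip(pathway_names, cluster_names)):
        if name not in names_of_day:
            raise ValueError(f"{name!r} is not a cluster of day index {day}")
    return True


cluster_names = [["day0_c0"], ["day1_c0", "day1_c1"], ["day2_c0", "day2_c1"]]
pathway_names = ["day0_c0", "day1_c1", "day2_c0"]
print(check_pathway(cluster_names, pathway_names))
```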

plot_pathway_mean_var(cluster_names, pathway_names, tf_gene_names=None, threshold=1.0, save=False, save_path=None)

Plot mean and variance of gene expression levels within a pathway.

Parameters

cluster_nameslist of list of str

1st dimension is the number of days, 2nd dimension is the number of gmm components in each day. Can be generated by the ‘generate_cluster_names_with_day’ method.

pathway_nameslist of str of shape (n_days,)

List of cluster names included in the pathway. Specify like [‘day0’s cluster name’, ‘day1’s cluster name’, …, ‘dayN’s cluster name’].

tf_gene_nameslist of str, optional

List of transcription factor gene names to use, by default None. If None, all gene names (self.gene_names) will be used. You can pass any list of gene names you want to use, not limited to TF genes.

thresholdfloat, optional

Threshold to filter labels, by default 1.0. Only genes whose variance is greater than this threshold will be labeled.

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None. If None, the image will be saved as ‘./pathway_mean_var.png’.

plot_pathway_single_gene_2d(gene_name, mode='pca', col=None, save=False, save_path=None)
plot_pathway_single_gene_3d(gene_name, col=None, save=False, save_path=None)
plot_simple_cell_state_graph(G, layout='normal', order=None, save=False, save_path=None)

Plot the cell state graph with the given graph object in a simple way.

Parameters

Gnx.classes.digraph.DiGraph

Networkx graph object of the cell state graph.

layout{‘normal’, ‘hierarchy’}, optional

The layout of the graph, by default “normal”. When ‘normal’, the graph is plotted with the same layout as the self.plot_cell_state_graph method. When ‘hierarchy’, the graph is plotted with the day on the x-axis and the cluster on the y-axis.

order{‘weight’, None}, optional

Order of nodes along the y-axis, by default None. This parameter is only used when ‘layout’ is ‘hierarchy’. When ‘weight’, the nodes are ordered by the size of the nodes. When None, the nodes are ordered by the cluster number.

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None. If None, the image will be saved as ‘./simple_cell_state_graph.png’.

Raises

ValueError

When ‘layout’ is not ‘normal’ or ‘hierarchy’, or ‘order’ is not None or ‘weight’.

plot_true_and_interpolation_distributions(interpolate_index, mode='pca', n_samples=2000, t=0.5, plot_source_and_target=True, alpha_true=0.5, x_col_name=None, y_col_name=None, x_range=None, y_range=None, save=False, save_path=None)

Compare the true and interpolation distributions by plotting them.

Parameters

interpolate_indexint

Index of the timepoint to interpolate. 1 <= interpolate_index <= n_days - 2

mode{‘pca’, ‘umap’}, optional

The space to plot gene expression levels, by default “pca”

n_samplesint, optional

Number of samples to generate, by default 2000

tfloat, optional

Interpolation ratio, by default 0.5. To interpolate halfway between the source and target timepoints, specify 0.5. The source timepoint is interpolate_index - 1 and the target timepoint is interpolate_index + 1.

plot_source_and_targetbool, optional

If True, plot the source and target timepoints, by default True

alpha_truefloat, optional

Transparency of the true data, by default 0.5

x_col_namestr, optional

Label of the x-axis, by default None

y_col_namestr, optional

Label of the y-axis, by default None

x_rangelist or tuple of float of shape (2,), optional

Range of the x-axis, by default None. If None, the range will be automatically determined based on the data.

y_rangelist or tuple of float of shape (2,), optional

Range of the y-axis, by default None. If None, the range will be automatically determined based on the data.

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None

Raises

ValueError

When ‘mode’ is not ‘pca’ or ‘umap’.
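The constraint on ‘interpolate_index’ follows from the need for a neighbor on each side: the timepoint being reconstructed is held out, and its two adjacent timepoints serve as source and target. A minimal sketch of that index arithmetic follows. source_and_target is a hypothetical helper, not a scegot function.

```python
def source_and_target(interpolate_index, n_days):
    """Hypothetical helper: indices of the timepoints on either side of the held-out one."""
    if not 1 <= interpolate_index <= n_days - 2:
        raise ValueError("interpolate_index must satisfy 1 <= index <= n_days - 2")
    return interpolate_index - 1, interpolate_index + 1


# With 5 timepoints, index 2 is compared against an interpolation of 1 and 3.
pair = source_and_target(2, n_days=5)
print(pair)
```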

plot_waddington_potential(waddington_potential, mode='pca', gene_name=None, save=False, save_path=None)

Plot Waddington potential in 3D space.

Parameters

waddington_potentialnp.ndarray

Waddington potential of each sample. This array should be calculated by ‘calculate_waddington_potential’ method.

mode{‘pca’, ‘umap’}, optional

The space to plot Waddington potential, by default “pca”

gene_namestr, optional

Gene name to color the points, by default None. If None, the points will be colored by Waddington potential. If specified, the points will be colored by the expression of the specified gene.

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None. If None, the image will be saved as ‘./waddington_potential.html’.

plot_waddington_potential_surface(waddington_potential, mode='pca', save=False, save_path=None)

Plot Waddington’s landscape in 3D space by using cellmap.

Parameters

waddington_potentialnp.ndarray

Waddington potential of each sample. This array should be calculated by the ‘calculate_waddington_potential’ method.

mode{‘pca’, ‘umap’}, optional

The space to plot Waddington potential, by default “pca”

savebool, optional

If True, save the output image, by default False

save_pathstr, optional

Path to save the output image, by default None. If None, the image will be saved as ‘./wadding_potential_surface.html’.

predict_gmm_label(X_item, gmm_model)
predict_gmm_labels(X, gmm_models)
preprocess(pca_n_components, recode_params={}, umi_target_sum=10000.0, pca_random_state=None, pca_other_params={}, apply_recode=True, apply_normalization_log1p=True, apply_normalization_umi=True, select_genes=True, n_select_genes=2000)

Preprocess the input data.

Apply scRECODE, normalize, select highly variable genes, and apply PCA.

Parameters

pca_n_componentsint

Number of components to keep in PCA. Passed to the ‘n_components’ parameter of the PCA class.

recode_paramsdict, optional

Parameters for scRECODE, by default {}

umi_target_sumint or float, optional

Target sum for UMI normalization, by default 1e4

pca_random_stateint, RandomState instance or None, optional

Pass an int for reproducible results, by default None. Passed to the ‘random_state’ parameter of the PCA class.

pca_other_paramsdict, optional

Parameters other than ‘n_components’ and ‘random_state’ for PCA, by default {}

apply_recodebool, optional

If True, apply scRECODE, by default True

apply_normalization_log1pbool, optional

If True, apply log1p normalization, by default True

apply_normalization_umibool, optional

If True, apply UMI normalization, by default True

select_genesbool, optional

If True, filter genes and select highly variable genes, by default True

n_select_genesint, optional

Number of highly variable genes to select, by default 2000. Used only when ‘select_genes’ is True.

Returns

list of pd.DataFrame of shape (n_samples, n_components of PCA)

Normalized, filtered, and PCA-transformed data.

sklearn.decomposition.PCA

PCA instance fitted to the input data.
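The two normalization switches can be illustrated on the counts of a single cell. The sketch below scales the counts so that they sum to ‘umi_target_sum’ and then applies log1p. This is the conventional order for single-cell data and is assumed here rather than taken from the implementation, and normalize_cell is a hypothetical helper.

```python
import math


def normalize_cell(counts, umi_target_sum=1e4):
    """Hypothetical helper: UMI normalization followed by log1p for one cell."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]  # an empty cell stays all zeros
    # Scale so the cell sums to umi_target_sum, then compress with log(1 + x).
    return [math.log1p(c / total * umi_target_sum) for c in counts]


values = normalize_cell([1, 1, 2], umi_target_sum=4)
print(values)
```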

replace_gmm_labels(converter)