scegot package
Submodules
scegot.scegot module
Module contents
- class scegot.CellStateGraph(G, scegot, threshold=0.05, mode='pca', cluster_names=None, node_ids=None, merge_clusters_by_name=False, x_reverse=False, y_reverse=False, require_parent=False)
Bases: object
- plot_cell_state_graph(layout='normal', y_position='name', cluster_names=None, gene_names=None, gene_pick_num=5, plot_title='Cell State Graph', save=False, save_path=None)
Plot the cell state graph with the given graph object.
Parameters
- layout{‘normal’, ‘hierarchy’}, optional
The layout of the graph, by default “normal”
When ‘normal’, the graph is plotted in PCA or UMAP space.
When ‘hierarchy’, the graph is plotted with the day on the x-axis and the cluster on the y-axis.
- y_positionstr or dict, optional
Determines the y-axis position of nodes when layout is ‘hierarchy’, by default “name”.
‘name’: Sort nodes alphabetically by name.
‘weight’: Sort nodes by their weight.
dict: A dictionary mapping node names to y-axis positions.
This parameter is ignored when layout is ‘normal’.
- cluster_nameslist of list of str, optional
Custom names for the clusters, by default None.
1st dimension is the number of days, 2nd dimension is the number of gmm components in each day. When the attribute merge_clusters_by_name is True, clusters to be merged must be given the same new name. If None, the attribute cluster_names is used.
- gene_nameslist of str, optional
List of gene names to use, by default None. If None, all gene names (self.scegot.gene_names) will be used. You can pass any list of gene names you want to use, not limited to TF genes.
- gene_pick_numint, optional
The number of genes to show in each node and edge, by default 5
- plot_titlestr, optional
Title of the plot, by default “Cell State Graph”
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./cell_state_graph.png’.
- plot_simple_cell_state_graph(layout='normal', y_position='name', cluster_names=None, node_weight_annotation=False, edge_weight_annotation=False, save=False, save_path=None)
Plot the cell state graph with the given graph object in a simple way.
Parameters
- layout{‘normal’, ‘hierarchy’}, optional
The layout of the graph, by default “normal”.
When “normal”, the graph is plotted in PCA or UMAP space.
When “hierarchy”, the graph is plotted with the day on the x-axis and the cluster on the y-axis.
- y_positionstr or dict, optional
Determines the y-axis position of nodes when layout is “hierarchy”, by default “name”.
“name”: Sort nodes alphabetically by name.
“weight”: Sort nodes by their weight.
dict: A dictionary mapping node names to y-axis positions.
This parameter is ignored when layout is “normal”.
- cluster_nameslist of list of str, optional
Custom names for the clusters, by default None.
1st dimension is the number of days, 2nd dimension is the number of gmm components in each day. When the attribute merge_clusters_by_name is True, clusters to be merged must be given the same new name. If None, the attribute cluster_names is used.
- node_weight_annotationbool, optional
If True, display the weight of each node, by default False.
- edge_weight_annotationbool, optional
If True, display the weight of each edge, by default False.
- savebool, optional
If True, save the output image, by default False.
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./simple_cell_state_graph.png’.
Raises
- ValueError
This error is raised in the following cases:
When ‘layout’ is not ‘normal’ or ‘hierarchy’.
When ‘y_position’ is a string but not ‘name’ or ‘weight’.
- TypeError
When ‘y_position’ is not a string or dict (if layout is ‘hierarchy’).
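The handling of y_position described above can be sketched in plain Python. This is only an illustration: order_nodes is a hypothetical helper, not part of scegot, and the ascending sort for ‘weight’ is an assumption, since the documentation does not state the direction.

```python
def order_nodes(nodes, weights, y_position="name"):
    """Map each node name to a y-axis position (hypothetical helper)."""
    if isinstance(y_position, dict):
        # A dict is used as explicit node -> position mapping.
        return {node: y_position[node] for node in nodes}
    if not isinstance(y_position, str):
        raise TypeError("y_position must be a str or dict")
    if y_position == "name":
        ordered = sorted(nodes)
    elif y_position == "weight":
        # Ascending order is an assumption made for this sketch.
        ordered = sorted(nodes, key=lambda node: weights[node])
    else:
        raise ValueError("y_position must be 'name' or 'weight'")
    return {node: rank for rank, node in enumerate(ordered)}


nodes = ["B", "A", "C"]
weights = {"A": 0.2, "B": 0.5, "C": 0.3}

by_name = order_nodes(nodes, weights, "name")
by_weight = order_nodes(nodes, weights, "weight")
explicit = order_nodes(nodes, weights, {"A": 2.0, "B": 0.0, "C": 1.0})
```

The error branches mirror the Raises section: an unknown string gives ValueError, and a value that is neither str nor dict gives TypeError.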
- reverse_graph(x=False, y=False)
Reverse the graph layout along the specified axes.
Parameters
- xbool, optional
If True, reverse the x-axis of the graph layout, by default False.
- ybool, optional
If True, reverse the y-axis of the graph layout, by default False.
- set_cluster_names(cluster_names)
Set new cluster names for the cell state graph.
Parameters
- cluster_nameslist of list of str
New names for the clusters. 1st dimension is the number of days, 2nd dimension is the number of gmm components in each day. Merged clusters must have the same name when ‘merge_clusters_by_name’ is True.
Returns
- list of list of str
The new cluster names.
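As an illustration of the naming rule, the sketch below groups cluster indices that share a name. The helper clusters_sharing_a_name is hypothetical and not part of scegot, and it assumes that merging happens within each day, which the documentation does not spell out.

```python
def clusters_sharing_a_name(cluster_names):
    """For each day, map each name to the indices of clusters carrying it."""
    per_day = []
    for names in cluster_names:
        groups = {}
        for index, name in enumerate(names):
            groups.setdefault(name, []).append(index)
        per_day.append(groups)
    return per_day


# Two days: three GMM components on day 0, two on day 1 (made-up names).
cluster_names = [["stem", "stem", "progenitor"], ["neuron", "glia"]]
groups = clusters_sharing_a_name(cluster_names)
```

Here the first two clusters of day 0 carry the same name, so they are the ones that would be merged when merge_clusters_by_name is True.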
- update_cluster_names(cluster_names_map, day=None)
Update cluster names for the cell state graph based on a mapping dictionary.
Parameters
- cluster_names_mapdict
A dictionary mapping old cluster names to new cluster names.
- dayint, optional
The specific day to update cluster names for, by default None. If None, update cluster names for all days.
Returns
- list of list of str
The updated cluster names.
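The effect of a name mapping on the nested cluster-name list can be sketched as follows. rename_clusters is a hypothetical stand-in, not the library's code. It assumes that day is a 0-based index and that names missing from the mapping are left unchanged.

```python
def rename_clusters(cluster_names, cluster_names_map, day=None):
    """Return a renamed copy of a days x clusters nested list."""
    renamed = []
    for index, names in enumerate(cluster_names):
        if day is None or index == day:
            # Unmapped names are kept as they are (an assumption).
            renamed.append([cluster_names_map.get(name, name) for name in names])
        else:
            renamed.append(list(names))
    return renamed


names = [["day0-0", "day0-1"], ["day1-0", "day1-1"]]
mapping = {"day0-0": "stem", "day1-1": "neuron"}

all_days = rename_clusters(names, mapping)
only_day1 = rename_clusters(names, mapping, day=1)
```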
- scegot.integrate_data(input_data_dict, adata_day_key=None, recode_params={}, recode_fit_transform_params={})
Integrate multiple datasets using iRECODE.
Parameters
- input_data_dictdict
A dictionary where keys are data names and values are input data (list of pd.DataFrame or AnnData).
- adata_day_keystr, optional
Name of the key in AnnData.obs for day names, by default None. Should be specified when values of input_data_dict are AnnData.
- recode_paramsdict, optional
Parameters passed to the screcode.RECODE constructor, by default {}.
- recode_fit_transform_paramsdict, optional
Parameters passed to RECODE.fit_transform(), by default {}.
Raises
- ValueError
When values of ‘input_data_dict’ are AnnData and ‘adata_day_key’ is not specified.
- TypeError
This error is raised in the following cases:
When input_data_dict is not a dict.
When a value of input_data_dict is neither a list of pd.DataFrame nor AnnData.
Returns
- (list of pd.DataFrame) or AnnData
Integrated data.
- scegot.is_notebook()
Check if the code is running in a Jupyter notebook or not.
Returns
- bool
True if the code is running in a Jupyter notebook, False otherwise.
- class scegot.scEGOT(X, day_names=None, verbose=True, adata_day_key=None)
Bases: object
- animate_gene_expression(target_gene_name, mode='pca', interpolate_interval=11, n_samples=5000, x_range=None, y_range=None, c_range=None, x_label=None, y_label=None, cmap='gnuplot2', save=False, save_path=None)
Calculate interpolation between all timepoints and create animation colored by gene expression level.
Parameters
- target_gene_namestr
Gene name to plot expression level.
- mode{‘pca’, ‘umap’}, optional
The space to plot gene expression levels, by default “pca”
- interpolate_intervalint, optional
Number of frames to interpolate between two timepoints, by default 11. This is the total number of frames, counting the frames at both timepoints and the frames between them; both ends are included.
- n_samplesint, optional
Number of samples to generate, by default 5000
- x_rangelist or tuple of float of shape (2,), optional
Range of the x-axis, by default None
- y_rangelist or tuple of float of shape (2,), optional
Range of the y-axis, by default None
- c_rangelist or tuple of float of shape (2,), optional
Range of the color bar, by default None
- x_labelstr, optional
Label of the x-axis, by default None
- y_labelstr, optional
Label of the y-axis, by default None
- cmapstr, optional
String of the colormap, by default “gnuplot2”
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./interpolate_video.gif’.
Raises
- ValueError
When ‘mode’ is not ‘pca’ or ‘umap’.
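To make the meaning of interpolate_interval concrete, the sketch below lists the interpolation ratios for one pair of timepoints. It assumes the frames are evenly spaced between 0 (source) and 1 (target), which the documentation implies but does not state. interpolation_ratios is a hypothetical helper, not a scegot function.

```python
def interpolation_ratios(interpolate_interval=11):
    """Evenly spaced ratios in [0, 1], with both endpoints included."""
    if interpolate_interval < 2:
        raise ValueError("at least the two endpoint frames are needed")
    steps = interpolate_interval - 1
    return [i / steps for i in range(interpolate_interval)]


ratios = interpolation_ratios(11)
```

With the default of 11, the two endpoint frames plus 9 frames in between give a step of 0.1.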
- animatie_interpolated_distribution(x_range=None, y_range=None, interpolate_interval=11, cmap='gnuplot2', save=False, save_path=None)
Export an animation of the interpolated distribution between GMM models.
Parameters
- x_rangelist or tuple of float of shape (2,), optional
Restrict the X axis range, by default None
- y_rangelist or tuple of float of shape (2,), optional
Restrict the Y axis range, by default None
- interpolate_intervalint, optional
The number of frames to interpolate between two timepoints, by default 11. This is the total number of frames, counting the frames at both timepoints and the frames between them; both ends are included.
- cmapstr, optional
String of matplotlib colormap name, by default “gnuplot2”
- savebool, optional
If True, save the output animation, by default False
- save_pathstr, optional
Path to save the output animation, by default None. If None, the animation will be saved as ‘./cell_state_video.gif’.
- apply_umap(n_neighbors, n_components=2, random_state=None, min_dist=0.1, umap_other_params={})
Fit UMAP to self.X_pca and return the transformed data.
Parameters
- n_neighborsfloat
The size of local neighborhood used for manifold approximation. Passed to the ‘n_neighbors’ parameter of the UMAP class.
- n_componentsint, optional
The dimension of the space to embed into, by default 2. Passed to the ‘n_components’ parameter of the UMAP class.
- random_stateint, RandomState instance or None, optional
Fix the random seed for reproducibility, by default None. Passed to the ‘random_state’ parameter of the UMAP class.
- min_distfloat, optional
The effective minimum distance between embedded points, by default 0.1. Passed to the ‘min_dist’ parameter of the UMAP class.
- umap_other_paramsdict, optional
Other parameters for UMAP, by default {}
Returns
- list of pd.DataFrame of shape (n_samples, n_components of UMAP)
UMAP-transformed data.
- umap.umap_.UMAP
UMAP instance fitted to the input data.
- bures_wasserstein_distance(m_0, m_1, sigma_0, sigma_1)
- calculate_cell_velocities()
Calculate cell velocities between each day.
Returns
- pd.DataFrame
Cell velocities between each day. The rows are ordered as follows: when the number of days is N and the number of cells in each day is M_1, M_2, …, M_N, [day1_cell1 -> day1_cell2 -> … -> day1_cellM_1 -> day2_cell1 -> … -> day(N-1)_cellM_(N-1)]
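The row ordering can be illustrated by generating labels in the same order. This sketch assumes that the last day contributes no rows, which is how the ordering above reads. The label strings and the helper velocity_row_labels are illustrative only; the actual DataFrame index may differ.

```python
def velocity_row_labels(cells_per_day):
    """cells_per_day is [M_1, ..., M_N]; the last day is skipped."""
    labels = []
    for day, n_cells in enumerate(cells_per_day[:-1], start=1):
        labels.extend(f"day{day}_cell{cell}" for cell in range(1, n_cells + 1))
    return labels


# Three days with 3, 2 and 4 cells (made-up sizes).
labels = velocity_row_labels([3, 2, 4])
```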
- calculate_grns(selected_clusters=None, alpha_range=(-2, 2), cv=3, ridge_cv_fit_intercept=False, ridge_fit_intercept=False)
Calculate gene regulatory networks (GRNs) between each day.
Parameters
- selected_clusterslist of list of int of shape (n_days, 2), optional
Specify the clusters to calculate GRNs, by default None. If None, all clusters will be used. The list should be like [[day1’s index, selected cluster number], [day2’s index, selected cluster number], …].
- alpha_rangetuple or list of float of shape (2,), optional
Range of alpha values for Ridge regression, by default (-2, 2).
- cvint, optional
Number of cross-validation folds, by default 3. This parameter is passed to RidgeCV’s ‘cv’ parameter.
- ridge_cv_fit_interceptbool, optional
Whether to calculate the intercept in RidgeCV, by default False. This parameter is passed to RidgeCV’s ‘fit_intercept’ parameter.
- ridge_fit_interceptbool, optional
Whether to calculate the intercept in Ridge, by default False. This parameter is passed to Ridge’s ‘fit_intercept’ parameter.
Returns
- list of pd.DataFrame
Gene regulatory networks between each day. The rows and columns are gene names. Each element of the list corresponds to the GRN between day i and day i + 1.
- list of RidgeCV objects
RidgeCV objects used to calculate GRNs. Each element of the list corresponds to the RidgeCV object between day i and day i + 1.
- calculate_mut_st(gmm_source, gmm_target, t)
- calculate_normalized_solutions(gmm_models, reg=0.01, numItermax=5000, method='sinkhorn_epsilon_scaling', tau=100000000.0, stopThr=1e-09, sinkhorn_other_params={})
- calculate_solution(gmm_source, gmm_target, reg=0.01, numItermax=5000, method='sinkhorn_epsilon_scaling', tau=100000000.0, stopThr=1e-09, sinkhorn_other_params={})
- calculate_solutions(gmm_models, reg=0.01, numItermax=5000, method='sinkhorn_epsilon_scaling', tau=100000000.0, stopThr=1e-09, sinkhorn_other_params={})
- calculate_waddington_potential(n_neighbors=100, knn_other_params={})
Calculate Waddington potential of each sample.
Parameters
- n_neighborsint, optional
Number of neighbors for each sample, by default 100. This parameter is passed to the ‘kneighbors_graph’ function.
- knn_other_paramsdict, optional
Other parameters for ‘kneighbors_graph’ function, by default {}
Returns
- np.ndarray of shape (sum of n_samples of each day - n_samples of the last day,)
Waddington potential of each sample.
- create_separated_data(data_names, min_cluster_size=2, return_cluster_names=False, cluster_names=None, original_covariances_weight=0)
Create separated data for each data name.
Parameters
- data_nameslist of str
List of prefixes to identify datasets. Cells with names starting with these strings will be extracted into separate scEGOT objects.
- min_cluster_sizeint, optional
Minimum number of cells required to retain a cluster, by default 2. Clusters smaller than this threshold will be removed in the separated objects.
- return_cluster_namesbool, optional
If True, also return the cluster names for each separated object, by default False.
- cluster_nameslist of list of str, optional
Custom names for the clusters in the original object, by default None. 1st dimension is the number of days, 2nd dimension is the number of gmm components in each day. If None, names are automatically generated by the generate_cluster_names_with_day() method.
- original_covariances_weightfloat, optional
Weight factor for blending the original GMM covariances with recalculated ones, by default 0. The new covariance is calculated as: new_cov = original_cov * weight + recalculated_cov * (1 - weight).
0.0: Use only covariances calculated from the separated data.
1.0: Use only covariances of the original object.
Returns
- dict
A dictionary where keys are data_names and values are the corresponding separated scEGOT objects.
- dict
A dictionary where keys are data_names and values are lists of cluster names. These names correspond to the cluster names of the original object. This will be returned only when ‘return_cluster_names’ is True.
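The covariance blending formula given for original_covariances_weight can be written out as a small sketch. blend_covariances is a hypothetical helper operating on nested lists, and the matrices are made-up values; the library itself presumably works on arrays.

```python
def blend_covariances(original_cov, recalculated_cov, weight=0.0):
    """new_cov = original_cov * weight + recalculated_cov * (1 - weight)."""
    return [
        [o * weight + r * (1 - weight) for o, r in zip(o_row, r_row)]
        for o_row, r_row in zip(original_cov, recalculated_cov)
    ]


original = [[2.0, 0.0], [0.0, 2.0]]
recalculated = [[1.0, 0.5], [0.5, 1.0]]

blended = blend_covariances(original, recalculated, weight=0.25)
only_recalculated = blend_covariances(original, recalculated, weight=0.0)
only_original = blend_covariances(original, recalculated, weight=1.0)
```

The two limiting cases match the description: a weight of 0.0 returns the recalculated covariances and a weight of 1.0 returns the original ones.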
- egot(pi_0, pi_1, mu_0, mu_1, S_0, S_1, reg=0.01, numItermax=5000, method='sinkhorn_epsilon_scaling', tau=100000000.0, stopThr=1e-09, sinkhorn_other_params={})
- fit_gmm(n_components_list, covariance_type='full', max_iter=2000, n_init=10, random_state=None, gmm_other_params={})
- fit_predict_gmm(n_components_list, covariance_type='full', max_iter=2000, n_init=10, random_state=None, gmm_other_params={})
Fit GMM models with each day’s data and predict labels for them.
Parameters
- n_components_listlist of int
Each element corresponds to the number of components of the GMM model for each day. Passed to the ‘n_components’ parameter of the GaussianMixture class.
- covariance_type{‘full’, ‘tied’, ‘diag’, ‘spherical’}, optional
String describing the type of covariance parameters to use, by default “full”. Passed to the ‘covariance_type’ parameter of the GaussianMixture class.
- max_iterint, optional
The number of EM iterations to perform, by default 2000. Passed to the ‘max_iter’ parameter of the GaussianMixture class.
- n_initint, optional
The number of initializations to perform, by default 10. Passed to the ‘n_init’ parameter of the GaussianMixture class.
- random_stateint, RandomState instance or None, optional
Controls the random seed given at each GMM model initialization, by default None. Passed to the ‘random_state’ parameter of the GaussianMixture class.
- gmm_other_paramsdict, optional
Other parameters for GMM, by default {}
Returns
- list of GaussianMixture instances
The length of the list is the same as the number of days. Each element is a GMM instance fitted to the corresponding day’s data.
- list of np.ndarray
List of GMM labels. Each element is the predicted labels for the corresponding day’s data.
- gaussian_mixture_density(mu, sigma, alpha, x)
- generate_cluster_names_with_day(cluster_names=None)
- get_gaussian_map(m_0, m_1, sigma_0, sigma_1, x)
- get_gmm_means()
- get_positive_gmm_mean_gene_values_per_cluster(gmm_means, cluster_names=None)
- make_cell_state_graph(cluster_names, mode='pca', threshold=0.05)
Warning
make_cell_state_graph() was deprecated in version 0.3.0 and will be removed in future versions. Use make_cell_state_graph_object() instead.
Compute cell state graph and build a networkx graph object.
Parameters
- cluster_names2D list of str
1st dimension is the number of days, 2nd dimension is the number of gmm components in each day. Can be generated by the ‘generate_cluster_names_with_day’ method.
- mode{‘pca’, ‘umap’}, optional
The space to build the cell state graph, by default “pca”
- thresholdfloat, optional
Threshold to filter edges, by default 0.05. Only edges with edge_weights greater than this threshold will be included.
Returns
- nx.classes.digraph.DiGraph
Networkx graph object of the cell state graph
Raises
- ValueError
When ‘mode’ is not ‘pca’ or ‘umap’.
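The threshold rule for edges can be sketched as a strict comparison. filter_edges and the edge dictionary below are hypothetical and only illustrate the rule; they are not how scegot stores or filters its graph.

```python
def filter_edges(edge_weights, threshold=0.05):
    """Keep only edges whose weight is strictly greater than the threshold."""
    return {edge: w for edge, w in edge_weights.items() if w > threshold}


# (source cluster, target cluster) -> edge weight, with made-up values.
edges = {
    ("day0_c0", "day1_c0"): 0.60,
    ("day0_c0", "day1_c1"): 0.05,
    ("day0_c1", "day1_c1"): 0.02,
}
kept = filter_edges(edges)
```

An edge whose weight equals the threshold is dropped here, following the wording "greater than".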
- make_cell_state_graph_object(cluster_names=None, mode='pca', threshold=0.05, merge_clusters_by_name=False, x_reverse=False, y_reverse=False, require_parent=False)
Compute cell state graph and build a CellStateGraph object.
Parameters
- cluster_names2D list of str, optional
Cluster names for each GMM cluster in each day. 1st dimension is the number of days, 2nd dimension is the number of gmm components in each day.
If merge_clusters_by_name is True, clusters with the same name will be merged.
If None, generated by the generate_cluster_names_with_day() method.
- mode{‘pca’, ‘umap’}, optional
The space to build the cell state graph, by default ‘pca’
- thresholdfloat, optional
Threshold to filter edges, by default 0.05. Only edges with edge_weights greater than this threshold will be included.
- merge_clusters_by_namebool, optional
If True, clusters with the same name will be merged, by default False
- x_reversebool, optional
If True, reverse the X axis direction, by default False
- y_reversebool, optional
If True, reverse the Y axis direction, by default False
- require_parentbool, optional
If True, ensure that each cluster in the target day has at least one incoming edge from the source day, by default False
Returns
- scegot.CellStateGraph
scegot.CellStateGraph object of the cell state graph
Raises
- ValueError
This error is raised in the following cases:
When ‘mode’ is not ‘pca’ or ‘umap’.
When the length of ‘cluster_names’ is not the same as the number of days.
When the length of the second dimension of ‘cluster_names’ is not the same as the number of GMM components in each day.
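The two shape conditions on cluster_names can be checked ahead of time with a sketch like the one below. validate_cluster_names is a hypothetical helper, and n_components_per_day stands for the list of GMM component counts (for example, the n_components_list passed when fitting). The error messages are illustrative.

```python
def validate_cluster_names(cluster_names, n_components_per_day):
    """Raise ValueError if the nested list does not match days x components."""
    if len(cluster_names) != len(n_components_per_day):
        raise ValueError("cluster_names must have one list per day")
    for day, (names, n_components) in enumerate(
        zip(cluster_names, n_components_per_day)
    ):
        if len(names) != n_components:
            raise ValueError(
                f"day {day}: expected {n_components} names, got {len(names)}"
            )


n_components_per_day = [1, 2]
good = [["A"], ["B", "C"]]
bad = [["A"], ["B"]]

validate_cluster_names(good, n_components_per_day)  # passes silently
```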
- make_interpolation_data(gmm_source, gmm_target, t, columns=None, n_samples=2000, seed=0)
Make interpolation data between two timepoints.
Parameters
- gmm_sourceGaussianMixture
GMM model of the source timepoint.
- gmm_targetGaussianMixture
GMM model of the target timepoint.
- tfloat
Interpolation ratio. 0 <= t <= 1. 0 is the source timepoint, 1 is the target timepoint. If you specify 0.5, the data will be interpolated halfway between the source and target timepoints.
- columnslist of str, optional
Columns names of the output data, by default None
- n_samplesint, optional
Number of samples to generate, by default 2000
- seedint, optional
Random seed, by default 0
Returns
- pd.DataFrame
Interpolated data between two timepoints.
- merge_cluster_names_by_pathway(last_day_cluster_names, n_merge_iter=None, merge_method='pattern', threshold=0.05, n_clusters_list=None, **kmeans_kwargs)
Merge cluster names based on cell state graph pathways.
Parameters
- last_day_cluster_nameslist of str
Cluster names for the last day. Clusters with the same name will be merged.
The length of the list should be equal to the number of clusters in the last day.
- n_merge_iterint, optional
Number of preceding days to trace back and merge cluster names, starting from the last day, by default (the number of days - 1).
Must be an integer in the range from 1 to (the number of days - 1).
- merge_method{‘pattern’, ‘kmeans’}, optional
Method to merge nodes, by default ‘pattern’.
‘pattern’: Merges nodes that share the same connection pattern to the next day’s nodes.
‘kmeans’: Merges nodes based on the edge weights to the next day’s nodes using K-Means.
- thresholdfloat, optional
Threshold to filter edges, by default 0.05. Edges with weights below this value are ignored.
This parameter is used only when merge_method is ‘pattern’.
- n_clusters_listlist of int, optional
List specifying the number of merged clusters for each day.
The length of the list must be equal to the number of days or (the number of days - 1). If None, defaults to the minimum of (original cluster count, 4) for each day.
This parameter is used only when merge_method is ‘kmeans’.
- **kmeans_kwargsdict
Arbitrary keyword arguments passed to sklearn.cluster.KMeans.
This parameter is used only when merge_method is ‘kmeans’.
Returns
- list of list of str
Merged cluster names for each day.
Raises
- ValueError
This error is raised in the following cases:
When ‘n_merge_iter’ is not an integer within the valid range (1 to number of days - 1).
When ‘merge_method’ is not one of ‘pattern’ or ‘kmeans’.
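The idea behind the ‘pattern’ method can be sketched for a single day: nodes are grouped when they connect to the same set of next-day nodes after weak edges are ignored. This is a simplified illustration under that reading, not the library's implementation; group_by_pattern and the weights are hypothetical.

```python
from collections import defaultdict


def group_by_pattern(edge_weights, threshold=0.05):
    """edge_weights maps source -> {target: weight} for one day to the next."""
    groups = defaultdict(list)
    for source, targets in edge_weights.items():
        # Edges with weights below the threshold are ignored.
        pattern = frozenset(t for t, w in targets.items() if w >= threshold)
        groups[pattern].append(source)
    return sorted(sorted(members) for members in groups.values())


weights = {
    "A": {"X": 0.90, "Y": 0.01},
    "B": {"X": 0.70, "Y": 0.02},
    "C": {"X": 0.30, "Y": 0.60},
}
merged = group_by_pattern(weights)
```

A and B both connect only to X once the weak edges to Y are dropped, so they fall into one group, while C keeps both targets and stays separate.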
- plot_cell_state_graph(G, cluster_names, tf_gene_names=None, tf_gene_pick_num=5, save=False, save_path=None)
Warning
scEGOT.plot_cell_state_graph() was deprecated in version 0.3.0 and will be removed in future versions. Use CellStateGraph.plot_cell_state_graph() instead.
Plot the cell state graph with the given graph object.
Parameters
- Gnx.classes.digraph.DiGraph
Networkx graph object of the cell state graph.
- cluster_nameslist of list of str
1st dimension is the number of days, 2nd dimension is the number of gmm components of each day. Can be generated by the ‘generate_cluster_names_with_day’ method.
- tf_gene_nameslist of str, optional
List of transcription factor gene names to use, by default None. If None, all gene names (self.gene_names) will be used. You can pass any list of gene names you want to use, not limited to TF genes.
- tf_gene_pick_numint, optional
The number of genes to show in each node and edge, by default 5
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./cell_state_graph.png’.
- plot_cell_velocity(velocities, mode='pca', color_points='gmm', size_points=30, cmap='tab20', cluster_names=None, save=False, save_path=None)
Plot cell velocities in 2D space.
Parameters
- velocitiespd.DataFrame
Cell velocities calculated by ‘calculate_cell_velocities’ method.
- mode{‘pca’, ‘umap’}, optional
The space to plot cell velocities, by default “pca”
- color_points{‘gmm’, ‘day’}, optional
Color points by GMM clusters or days, by default “gmm”
- size_pointsint, optional
Size of points, by default 30
- cmapstr, optional
String of matplotlib colormap name, by default “tab20”
- cluster_nameslist of str of shape (sum of gmm components), optional
List of gmm cluster names, by default None. Used when ‘color_points’ is ‘gmm’. You need to flatten the list of lists of gmm cluster names before passing it.
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./cell_velocity.png’.
Raises
- ValueError
This error is raised in the following cases:
When ‘mode’ is not ‘pca’ or ‘umap’.
When ‘color_points’ is not ‘gmm’ or ‘day’.
When ‘color_points’ is ‘gmm’ and ‘cluster_names’ is None.
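Flattening the nested cluster names, as required for the cluster_names parameter here, can be done with the standard library. The names below are made up for illustration.

```python
from itertools import chain

# Nested list: one inner list of GMM cluster names per day.
cluster_names = [["d0_a"], ["d1_a", "d1_b"], ["d2_a", "d2_b", "d2_c"]]

# Flat list of length "sum of gmm components", in day order.
flat_cluster_names = list(chain.from_iterable(cluster_names))
```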
- plot_fold_change(cluster_names, cluster1, cluster2, tf_gene_names=None, threshold=1.0, save=False, save_path=None)
Plot fold change between two clusters.
Parameters
- cluster_nameslist of list of str
1st dimension is the number of days, 2nd dimension is the number of gmm components in each day. Can be generated by the ‘generate_cluster_names_with_day’ method.
- cluster1str
Cluster name of denominator.
- cluster2str
Cluster name of numerator.
- tf_gene_nameslist of str, optional
List of transcription factor gene names to use, by default None. If None, all gene names (self.gene_names) will be used. You can pass any list of gene names you want to use, not limited to TF genes.
- thresholdfloat, optional
Threshold to filter labels, by default 1.0. Only genes with a fold change greater than this threshold will be labeled.
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./fold_change.png’.
- plot_gene_expression_2d(gene_name, mode='pca', col=None, save=False, save_path=None)
Plot gene expression levels in 2D space.
Parameters
- gene_namestr
Gene name to plot expression level.
- mode{‘pca’, ‘umap’}, optional
The space to plot gene expression levels, by default “pca”
- collist or tuple of str of shape (2,), optional
X and Y axis labels, by default None. If None, the first two columns of the input data will be used.
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./pathway_single_gene_2d.png’.
Raises
- ValueError
When ‘mode’ is not ‘pca’ or ‘umap’.
- plot_gene_expression_3d(gene_name, col=None, save=False, save_path=None)
Plot gene expression levels in 3D space.
Parameters
- gene_namestr
Gene name to plot expression level.
- collist or tuple of str of shape (3,), optional
X, Y, and Z axis labels, by default None. If None, the first three columns of the input data will be used.
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./pathway_single_gene_3d.html’.
- plot_gmm_predictions(mode='pca', figure_labels=None, x_range=None, y_range=None, figure_titles_without_gmm=None, figure_titles_with_gmm=None, plot_gmm_means=False, cmap='plasma', save=False, save_paths=None)
Plot GMM predictions. One image is output per day. Each image contains two subplots: the left one is drawn in a single color and the right one is colored by GMM labels.
Parameters
- mode{‘pca’, ‘umap’}, optional
The space to plot the GMM predictions, by default “pca”
- figure_labelslist or tuple of str of shape (2,), optional
X and Y axis labels, by default None. If None, the first two columns of the input data will be used.
- x_rangelist or tuple of float of shape (2,), optional
Restrict the X axis range, by default None. If None, the range will be automatically determined to include all data points.
- y_rangelist or tuple of float of shape (2,), optional
Restrict the Y axis range, by default None. If None, the range will be automatically determined to include all data points.
- figure_titles_without_gmmlist or tuple of str of shape (n_days,), optional
List of figure titles of left subplots, by default None
- figure_titles_with_gmmlist or tuple of str of shape (n_days,), optional
List of figure titles of right subplots, by default None
- plot_gmm_meansbool, optional
If True, plot GMM mean points on the right subplots, by default False
- cmapstr, optional
String of matplotlib colormap name, by default “plasma”
- savebool, optional
If True, save the output images, by default False
- save_pathslist or tuple of str of shape (n_days,), optional
List of paths to save the output images, by default None. If None, the images will be saved as ‘./GMM_preds_{i + 1}.png’.
Raises
- ValueError
When ‘mode’ is not ‘pca’ or ‘umap’.
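The default file naming described for save_paths can be sketched as below, assuming i runs over 0-based day indices. default_gmm_save_paths is a hypothetical helper that only reproduces the documented pattern.

```python
def default_gmm_save_paths(n_days):
    """One path per day, numbered from 1, following './GMM_preds_{i + 1}.png'."""
    return [f"./GMM_preds_{i + 1}.png" for i in range(n_days)]


paths = default_gmm_save_paths(3)
```

A custom save_paths list needs the same length, one entry per day.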
- plot_grn_graph(grns, ridge_cvs, selected_genes, threshold=0.01, save=False, save_paths=None, save_format='png')
Plot gene regulatory networks (GRNs) between each day.
Parameters
- grnslist of pd.DataFrame
Gene regulatory networks between each day. The rows and columns are gene names.
- ridge_cvslist of RidgeCV objects
RidgeCV objects used to calculate GRNs.
- selected_geneslist of str
Gene names to plot GRNs.
- thresholdfloat, optional
Threshold to plot edges, by default 0.01. If the absolute value of the edge weight is less than this value, the edge will not be plotted.
- savebool, optional
If True, save the output image, by default False
- save_pathsstr, optional
Paths to save the output images, by default None
- save_formatstr, optional
Format of the output images, by default “png”
- plot_interpolation_of_cell_velocity(velocities, mode='pca', color_streams=False, color_points='gmm', cluster_names=None, x_range=None, y_range=None, cmap='gnuplot2', linspace_num=300, save=False, save_path=None)
Warning
plot_interpolation_of_cell_velocity() was deprecated in version 0.3.0 and will be removed in future versions. Use plot_cell_velocity() instead.
Parameters
- velocitiespd.DataFrame
Cell velocities calculated by ‘calculate_cell_velocities’ method.
- mode{‘pca’, ‘umap’}, optional
The space to plot cell velocities, by default “pca”
- color_streamsbool, optional
If True, color the streamlines by the speed of the cell velocities, by default False
- color_points{‘gmm’, ‘day’}, optional
Color points by GMM clusters or days, by default “gmm”
- cluster_nameslist of str of shape (sum of gmm n_components), optional
List of gmm cluster names, by default None. Used when ‘color_points’ is ‘gmm’. You need to flatten the list of lists of gmm cluster names before passing it.
- x_rangetuple or list of float of shape (2,), optional
Limit of the x-axis, by default None
- y_rangetuple or list of float of shape (2,), optional
Limit of the y-axis, by default None
- cmapstr, optional
String of matplotlib colormap name, by default “gnuplot2”
- linspace_numint, optional
Number of points on each axis to interpolate, by default 300. linspace_num * linspace_num points will be interpolated.
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./interpolation_of_cell_velocity_gmm_clusters.png’.
Raises
- ValueError
This error is raised in the following cases:
When ‘mode’ is not ‘pca’ or ‘umap’.
When ‘color_points’ is not ‘gmm’ or ‘day’.
When ‘color_points’ is ‘gmm’ and ‘cluster_names’ is None.
- plot_pathway_gene_expressions(cluster_names, pathway_names, selected_genes, save=False, save_path=None)
Plot gene expression levels within a pathway.
Parameters
- cluster_nameslist of list of str
1st dimension is the number of days, 2nd dimension is the number of gmm components in each day. Can be generated by the ‘generate_cluster_names_with_day’ method.
- pathway_nameslist of str of shape (n_days,)
List of cluster names included in the pathway. Specify like [‘day0’s cluster name’, ‘day1’s cluster name’, …, ‘dayN’s cluster name’].
- selected_geneslist of str
List of gene names whose gene expression changes you want to track. Recommend using about 5 genes.
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./pathway_gene_expressions.png’.
- plot_pathway_mean_var(cluster_names, pathway_names, tf_gene_names=None, threshold=1.0, save=False, save_path=None)
Plot mean and variance of gene expression levels within a pathway.
Parameters
- cluster_nameslist of list of str
1st dimension is the number of days, 2nd dimension is the number of gmm components in each day. Can be generated by the ‘generate_cluster_names_with_day’ method.
- pathway_nameslist of str of shape (n_days,)
List of cluster names included in the pathway. Specify like [‘day0’s cluster name’, ‘day1’s cluster name’, …, ‘dayN’s cluster name’].
- tf_gene_nameslist of str, optional
List of transcription factor gene names to use, by default None. If None, all gene names (self.gene_names) will be used. You can pass any list of gene names you want to use, not limited to TF genes.
- thresholdfloat, optional
Threshold to filter labels, by default 1.0. Only genes with variance greater than this threshold will be labeled.
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./pathway_mean_var.png’.
- plot_pathway_single_gene_2d(gene_name, mode='pca', col=None, save=False, save_path=None)
- plot_pathway_single_gene_3d(gene_name, col=None, save=False, save_path=None)
- plot_simple_cell_state_graph(G, layout='normal', order=None, save=False, save_path=None)
Warning
scEGOT.plot_simple_cell_state_graph() was deprecated in version 0.3.0 and will be removed in future versions. Use CellStateGraph.plot_simple_cell_state_graph() instead.
Plot the cell state graph with the given graph object in a simple way.
Parameters
- Gnx.classes.digraph.DiGraph
Networkx graph object of the cell state graph.
- layout{‘normal’, ‘hierarchy’}, optional
The layout of the graph, by default “normal”.
When ‘normal’, the graph is plotted with the same layout as the self.plot_cell_state_graph method.
When ‘hierarchy’, the graph is plotted with the day on the x-axis and the cluster on the y-axis.
- order{‘weight’, None}, optional
Order of nodes along the y-axis, by default None.
This parameter is only used when ‘layout’ is ‘hierarchy’.
When ‘weight’, the nodes are ordered by the size of the nodes.
When None, the nodes are ordered by the cluster number.
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./simple_cell_state_graph.png’.
Raises
- ValueError
When ‘layout’ is not ‘normal’ or ‘hierarchy’, or ‘order’ is not None or ‘weight’.
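The accepted combinations of layout and order can be sketched as a small validation function. This is an illustration of the documented rules only, not scegot's implementation. The name check_layout_and_order is hypothetical.

```python
def check_layout_and_order(layout="normal", order=None):
    """Validate layout/order and return the order that will take effect.

    Hypothetical helper mirroring the documented rules; not part of scegot.
    """
    if layout not in ("normal", "hierarchy"):
        raise ValueError(f"layout must be 'normal' or 'hierarchy', got {layout!r}")
    if order not in (None, "weight"):
        raise ValueError(f"order must be None or 'weight', got {order!r}")
    # 'order' is only used by the hierarchy layout.
    return order if layout == "hierarchy" else None


effective_order = check_layout_and_order(layout="hierarchy", order="weight")
```

With layout='normal', a valid order value passes the check but has no effect on the plot.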
- plot_true_and_interpolation_distributions(interpolate_index, mode='pca', n_samples=2000, t=0.5, plot_source_and_target=True, alpha_true=0.5, x_col_name=None, y_col_name=None, x_range=None, y_range=None, save=False, save_path=None)
Compare the true and interpolation distributions by plotting them.
Parameters
- interpolate_indexint
Index of the timepoint to interpolate. Must satisfy 1 <= interpolate_index <= n_days - 2.
- mode{‘pca’, ‘umap’}, optional
The space to plot gene expression levels, by default “pca”
- n_samplesint, optional
Number of samples to generate, by default 2000
- tfloat, optional
Interpolation ratio, by default 0.5. To interpolate halfway between the source and target timepoints, specify 0.5. The source timepoint is interpolate_index - 1 and the target timepoint is interpolate_index + 1.
- plot_source_and_targetbool, optional
If True, plot the source and target timepoints, by default True
- alpha_truefloat, optional
Transparency of the true data, by default 0.5
- x_col_namestr, optional
Label of the x-axis, by default None
- y_col_namestr, optional
Label of the y-axis, by default None
- x_rangelist or tuple of float of shape (2,), optional
Range of the x-axis, by default None. If None, the range will be automatically determined based on the data.
- y_rangelist or tuple of float of shape (2,), optional
Range of the y-axis, by default None. If None, the range will be automatically determined based on the data.
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None
Raises
- ValueError
When ‘mode’ is not ‘pca’ or ‘umap’.
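The index arithmetic for interpolate_index can be sketched as follows. The helper interpolation_endpoints is hypothetical and covers only the documented constraints: the index must leave room for a neighbouring timepoint on each side, and the source and target are those two neighbours.

```python
def interpolation_endpoints(interpolate_index, n_days):
    """Return (source, target) timepoint indices for an interpolated day.

    Hypothetical helper for illustration; not part of scegot.
    """
    if not 1 <= interpolate_index <= n_days - 2:
        raise ValueError(
            f"interpolate_index must satisfy 1 <= index <= {n_days - 2}, "
            f"got {interpolate_index}"
        )
    # The interpolated day sits between its two neighbours.
    return interpolate_index - 1, interpolate_index + 1


# With 5 timepoints (indices 0..4), interpolating index 2 uses days 1 and 3.
source, target = interpolation_endpoints(2, n_days=5)
```

The first and last timepoints cannot be interpolated because each lacks a neighbour on one side.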
- plot_waddington_potential(waddington_potential, mode='pca', gene_name=None, save=False, save_path=None)
Plot Waddington potential in 3D space.
Parameters
- waddington_potentialnp.ndarray
Waddington potential of each sample. This array should be calculated by ‘calculate_waddington_potential’ method.
- mode{‘pca’, ‘umap’}, optional
The space to plot Waddington potential, by default “pca”
- gene_namestr, optional
Gene name to color the points, by default None. If None, the points will be colored by Waddington potential. If specified, the points will be colored by the expression of the specified gene.
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./waddington_potential.html’.
- plot_waddington_potential_surface(waddington_potential, mode='pca', save=False, save_path=None)
Plot Waddington’s landscape in 3D space by using cellmap.
Parameters
- waddington_potentialnp.ndarray
Waddington potential of each sample. This array should be calculated by ‘calculate_waddington_potential’ method.
- mode{‘pca’, ‘umap’}, optional
The space to plot Waddington potential, by default “pca”
- savebool, optional
If True, save the output image, by default False
- save_pathstr, optional
Path to save the output image, by default None. If None, the image will be saved as ‘./wadding_potential_surface.html’.
- predict_gmm_label(X_item, gmm_model)
- predict_gmm_labels(X, gmm_models)
- preprocess(pca_n_components, recode_params={}, umi_target_sum=10000.0, pca_random_state=None, pca_other_params={}, apply_recode=True, apply_normalization_log1p=True, apply_normalization_umi=True, select_genes=True, n_select_genes=2000, hvg_method='dispersion')
Preprocess the input data.
Apply scRECODE, normalize, select highly variable genes, and apply PCA.
Parameters
- pca_n_componentsint
Number of components to keep in PCA. Passed to the ‘n_components’ parameter of the PCA class.
- recode_paramsdict, optional
Parameters for scRECODE, by default {}
- umi_target_sumint or float, optional
Target sum for UMI normalization, by default 1e4
- pca_random_stateint, RandomState instance or None, optional
Random state for PCA, by default None. Pass an int for reproducible results. Passed to the ‘random_state’ parameter of the PCA class.
- pca_other_paramsdict, optional
Parameters other than ‘n_components’ and ‘random_state’ for PCA, by default {}
- apply_recodebool, optional
If True, apply scRECODE, by default True
- apply_normalization_log1pbool, optional
If True, apply log1p normalization, by default True
- apply_normalization_umibool, optional
If True, apply UMI normalization, by default True
- select_genesbool, optional
If True, filter genes and select highly variable genes, by default True
- n_select_genesint, optional
Number of highly variable genes to select, by default 2000. Used only when ‘select_genes’ is True.
- hvg_method{‘dispersion’, ‘RECODE’}, optional
Method to select highly variable genes, by default ‘dispersion’.
‘dispersion’: select genes based on dispersion.
‘RECODE’: select genes based on scRECODE.
Raises
- ValueError
If ‘hvg_method’ is not ‘dispersion’ or ‘RECODE’.
Returns
- list of pd.DataFrame of shape (n_samples, n_components of PCA)
Normalized, filtered, and PCA-transformed data.
- sklearn.decomposition.PCA
PCA instance fitted to the input data.
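The two normalization steps (apply_normalization_umi and apply_normalization_log1p) can be illustrated in plain Python. This is a sketch under stated assumptions, not scegot's code. It assumes that UMI normalization rescales each cell (row) so that its counts sum to umi_target_sum and that log1p is applied afterwards. The function normalize_counts and the toy count matrix are hypothetical, and the scRECODE, gene selection, and PCA steps are left out.

```python
import math


def normalize_counts(counts, umi_target_sum=1e4):
    """Scale each cell to umi_target_sum total counts, then apply log1p.

    Hypothetical helper; assumes per-cell scaling followed by log1p.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Cells with no counts are left at zero to avoid division by zero.
        scale = umi_target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized


# Toy matrix: 2 cells (rows) x 2 genes (columns).
counts = [[1, 3], [0, 5]]
normalized = normalize_counts(counts)
```

Under this assumption, undoing the log1p step with math.expm1 recovers values that sum to umi_target_sum for every cell with nonzero counts.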
- replace_gmm_labels(converter)