Quick Start

In [1]:

Copied!

import cafe

cafe.logger.setLevel("INFO")
cafe.settings.backend = "python_function" # python_function, conda, cafe_docker
cafe.settings.data_dir = "/root/PyCode/scRNA/data" # NOTE: set data directory here, or use environment variable CAFE_DATA_DIR
import cafe

cafe.logger.setLevel("INFO")
cafe.settings.backend = "python_function" # python_function, conda, cafe_docker
cafe.settings.data_dir = "/root/PyCode/scRNA/data" # NOTE: set data directory here, or use environment variable CAFE_DATA_DIR

INFO                                                                                                                                                                                                    
             ██████╗ █████╗ ███████╗███████╗                                                                                                                                                            
           ██╔════╝██╔══██╗██╔════╝██╔════╝                                                                                                                                                             
           ██║     ███████║█████╗  █████╗                                                                                                                                                               
           ██║     ██╔══██║██╔══╝  ██╔══╝                                                                                                                                                               
           ╚██████╗██║  ██║██║     ███████╗                                                                                                                                                             
             ╚═════╝╚═╝  ╚═╝╚═╝     ╚══════╝                                                                                                                                                            
                                                                                                                                                                                                        
INFO      Version: 0.1.3

Read FateAnnData¶

In [2]:

Copied!

fadata = cafe.data.read_pancreas() # pancrease gloden dataset from scvelo.
fadata
fadata = cafe.data.read_pancreas() # pancrease gloden dataset from scvelo.
fadata

Out[2]:

AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'raw_index'
    var: 'highly_variable_genes'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'id', 'cafe', 'filename'
    obsm: 'X_pca', 'X_umap'
    layers: 'spliced', 'unspliced', 'expression', 'counts'
    obsp: 'distances', 'connectivities'

Infer trajectory with scvelo/paga¶

In [3]:

Copied!





# scvelo method 
method_name="scvelo"
method = cafe.method.FateMethod(method_name=method_name)
method.infer_trajectory(fadata, id=method_name)
# scvelo method 
method_name="scvelo"
method = cafe.method.FateMethod(method_name=method_name)
method.infer_trajectory(fadata, id=method_name)

INFO     |- method backend loaded: FunctionBackend:'scvelo'                                                                                                                                             
Filtered out 20801 genes that are detected 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Extracted 2000 highly variable genes.
Logarithmized X.
computing neighbors
    finished (0:00:05) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:00) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing velocities
    finished (-1:59:59) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/16 cores)

  0%|          | 0/3696 [00:00<?, ?cells/s]

    finished (0:00:07) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
WARNING  |- Both milestone_percentages and progressions are given, will only use progressions                                                                                                           
computing velocity embedding
    finished (0:00:00) --> added
    'velocity_umap', embedded velocity vectors (adata.obsm)
WARNING: The neighbor graph has an unexpected format (e.g. computed outside scvelo) 
or is corrupted (e.g. due to subsetting). Consider recomputing with `pp.neighbors`.
computing terminal states
    identified 2 regions of root cells and 1 region of end points .
    finished (0:00:00) --> added
    'root_cells', root cells of Markov diffusion process (adata.obs)
    'end_points', end points of Markov diffusion process (adata.obs)
running PAGA using priors: ['velocity_pseudotime']
    finished (0:00:00) --> added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns)
    'paga/transitions_confidence', velocity transitions (adata.uns)
WARNING  |- Both milestone_percentages and progressions are given, will only use progressions                                                                                                           
INFO     |- method infer trajectory successfully: FunctionBackend:'scvelo'

In [4]:

Copied!





# palantir method
method_name="palantir"
method = cafe.method.FateMethod(method_name=method_name)
method.infer_trajectory(fadata, id=method_name)
# palantir method
method_name="palantir"
method = cafe.method.FateMethod(method_name=method_name)
method.infer_trajectory(fadata, id=method_name)

INFO     |- method backend loaded: FunctionBackend:'palantir'                                                                                                                                           
repreprocess finish
Sampling and flocking waypoints...
Time for determining waypoints: 0.005299703280131022 minutes
Determining pseudotime...
Shortest path distances using 30-nearest neighbor graph...
Time for shortest paths: 0.3544146696726481 minutes
Iteratively refining the pseudotime...
Correlation at iteration 1: 1.0000
Entropy and branch probabilities...
Markov chain construction...
Identification of terminal states...
Computing fundamental matrix and absorption probabilities...
Project results to all cells...
palantir execute finish
INFO     |- method infer trajectory successfully: FunctionBackend:'palantir'

Trajectory visualization¶

Evaluate trajectory result by visualization. scVelo and Palantir trajectory can be compared with the reference(ref) trajectory as golden standard. scVelo is better.

In [5]:

Copied!

model_name_list = fadata.get_all_model_name()
cafe.plot.plot_trajectory(fadata, model_name=model_name_list)
model_name_list = fadata.get_all_model_name()
cafe.plot.plot_trajectory(fadata, model_name=model_name_list)

Out[5]:

array([[<Axes: title={'center': 'ref(clusters)'}, xlabel='X_umap1', ylabel='X_umap2'>,
        <Axes: title={'center': 'scvelo(clusters)'}, xlabel='X_umap1', ylabel='X_umap2'>,
        <Axes: title={'center': 'palantir(clusters)'}, xlabel='X_umap1', ylabel='X_umap2'>]],
      dtype=object)

No description has been provided for this image

In [6]:

Copied!

# TODO: cellxgene-cafe web application for interactive visualization and exploration of trajectory inference results.
# fadata.launch_cellxgene()
# TODO: cellxgene-cafe web application for interactive visualization and exploration of trajectory inference results.
# fadata.launch_cellxgene()

Metric evaluation¶

Evaluate trajectory result by metrics. Pseudotime metric("pseudotime_correlation") is closely between scVelo and Palantir. Other metrics(him, F1_branches, correlation) show that scvelo is better.

In [7]:

Copied!

metrics = ["pseudotime_correlation", "him", "F1_branches", "correlation"] # "velocity_cbdir" need recalculate neighbors.
metric_df = cafe.metric.calculate_metrics(fadata, now_models=model_name_list[1:], metrics=metrics)
metric_df
metrics = ["pseudotime_correlation", "him", "F1_branches", "correlation"] # "velocity_cbdir" need recalculate neighbors.
metric_df = cafe.metric.calculate_metrics(fadata, now_models=model_name_list[1:], metrics=metrics)
metric_df

Out[7]:

	pseudotime_correlation	him	F1_branches	correlation
scvelo	0.99	0.96	1.00	0.88
palantir	0.98	0.27	0.19	0.67

Print trajectory data¶

In [8]:

Copied!

fadata.print_trajectory_data()
fadata.print_trajectory_data()

cafe
├── prior_information
│   ├── cluster → str: 'clusters'
│   ├── basis → str: 'X_umap'
│   └── start_cell → str: 'cell_1103'
├── model_name → str: 'palantir'
├── trajectory_history_dict
│   ├── ref
│   │   ├── milestone_wrapper → MilestoneWrapper: <cafe.data.fate_milestone_wrapper.MilestoneWrapper
│   │   ├── raw_wrapper_dict
│   │   ├── trajectory_embedding
│   │   │   └── X_umap
│   │   │       ├── wp_segments → DataFrame shape=(212, 9)
│   │   │       └── milestone_positions → DataFrame shape=(14, 9)
│   │   ├── waypoint_wrapper → WaypointWrapper: <cafe.data.fate_waypoint_wrapper.WaypointWrapper o
│   │   └── pseudotime_from_Ductal → list len=3696
│   ├── scvelo
│   │   ├── raw_wrapper_dict
│   │   │   ├── wrapper_type → str: 'velocity'
│   │   │   ├── velocity → ndarray shape=(3696, 2000)
│   │   │   ├── velocity_graph → csr_matrix shape=(3696, 3696)
│   │   │   ├── velocity_graph_neg → csr_matrix shape=(3696, 3696)
│   │   │   ├── neighbors
│   │   │   │   ├── distances → csr_matrix shape=(3696, 3696)
│   │   │   │   └── connectivities → csr_matrix shape=(3696, 3696)
│   │   │   ├── obs_index → Index shape=(3696,)
│   │   │   ├── var_index → Index shape=(2000,)
│   │   │   └── velocity_umap → ndarray shape=(3696, 2)
│   │   ├── milestone_wrapper → MilestoneWrapper: <cafe.data.fate_milestone_wrapper.MilestoneWrapper
│   │   ├── trajectory_embedding
│   │   │   └── X_umap
│   │   │       ├── wp_segments → DataFrame shape=(210, 9)
│   │   │       └── milestone_positions → DataFrame shape=(14, 9)
│   │   ├── waypoint_wrapper → WaypointWrapper: <cafe.data.fate_waypoint_wrapper.WaypointWrapper o
│   │   └── pseudotime_from_Ductal → list len=3696
│   └── palantir
│       ├── raw_wrapper_dict
│       │   ├── pseudotime → Series shape=(3696,)
│       │   └── wrapper_type → str: 'linear'
│       ├── milestone_wrapper → MilestoneWrapper: <cafe.data.fate_milestone_wrapper.MilestoneWrapper
│       ├── trajectory_embedding
│       │   └── X_umap
│       │       ├── wp_segments → DataFrame shape=(201, 9)
│       │       └── milestone_positions → DataFrame shape=(2, 9)
│       ├── waypoint_wrapper → WaypointWrapper: <cafe.data.fate_waypoint_wrapper.WaypointWrapper o
│       └── pseudotime_from_milestone_begin → list len=3696
└── wrapper_type → str: 'linear'

Write h5ad file¶

In [9]:

Copied!

fadata.write_h5ad("pancrease_fadata.h5ad")
fadata.write_h5ad("pancrease_fadata.h5ad")