Single Cell Preprocessing
- biokowloon.sc.log2_transform(df) → DataFrame
Apply a log2 transform to expression values after adding 1, i.e. log2(x + 1)
- Parameters:
df – a dataframe with expression values, cells by genes
- Returns:
a dataframe with log2 transformed values
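
The transform described above can be sketched with plain NumPy and pandas. This is a minimal illustration, not the library's implementation: the helper name `log2_plus_one` and the toy data are hypothetical, and the sketch assumes the "plus 1" is added to every value before taking log2.

```python
import numpy as np
import pandas as pd


def log2_plus_one(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: element-wise log2(x + 1) on a cells-by-genes table."""
    return np.log2(df + 1)


# Toy cells-by-genes expression table (made-up values for illustration only).
cells_by_genes = pd.DataFrame(
    {"GeneA": [0.0, 3.0], "GeneB": [1.0, 7.0]},
    index=["cell1", "cell2"],
)
logged = log2_plus_one(cells_by_genes)
```

Adding 1 first keeps zero counts finite: a value of 0 maps to log2(1) = 0 rather than negative infinity.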
- biokowloon.sc.log_exp2cpm(exp_df: DataFrame | array, log_base=2, correct=1) → DataFrame | array
Convert log-transformed expression values, e.g. log2(CPM + 1), back to non-log space (CPM / TPM)
- Parameters:
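exp_df – log-transformed expression values, samples by genes
log_base – the base of the log transform that was applied; default is 2
correct – the pseudocount that was added before the log transform to avoid log(0); default is 1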
- Returns:
counts per million (CPM) or transcripts per million (TPM)
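
Undoing a log(x + correct) transform amounts to raising the base to the logged value and subtracting the pseudocount. The sketch below assumes that is what the conversion does; the helper name `inverse_log` is hypothetical and only mirrors the documented parameters.

```python
import numpy as np
import pandas as pd


def inverse_log(exp_df, log_base=2, correct=1):
    """Hypothetical stand-in: map log_base(x + correct) back to x."""
    # float() avoids integer-power errors when the input holds negative integers.
    return np.power(float(log_base), exp_df) - correct


# Round trip on a toy samples-by-genes table (made-up values).
original = pd.DataFrame(
    {"GeneA": [0.0, 3.0], "GeneB": [1.0, 7.0]},
    index=["sample1", "sample2"],
)
restored = inverse_log(np.log2(original + 1))

# The same helper accepts a plain NumPy array and another base.
restored_base10 = inverse_log(np.array([1.0, 2.0]), log_base=10)
```

Note that `log_base` and `correct` must match the values used in the forward transform, otherwise the restored values are on the wrong scale.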
- biokowloon.sc.non_log2cpm(exp_df, sum_exp=1000000.0) → DataFrame
Normalize gene expression to CPM / TPM for non-log space
- Parameters:
exp_df – gene expression profile in non-log space, samples by genes
sum_exp – the total that each sample's gene expression values sum to after normalization; default is 1e6
- Returns:
counts per million (CPM) or transcripts per million (TPM)
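
As a rough sketch of this kind of normalization, each sample's row can be divided by its own total and rescaled so that it sums to `sum_exp`. The helper name `scale_to_cpm` and the toy data are hypothetical, and the library's handling of edge cases (for example all-zero samples) is not documented here.

```python
import numpy as np
import pandas as pd


def scale_to_cpm(exp_df: pd.DataFrame, sum_exp: float = 1e6) -> pd.DataFrame:
    """Hypothetical stand-in: rescale each row (sample) to sum to sum_exp."""
    row_totals = exp_df.sum(axis=1)
    # axis=0 aligns the totals with the row index, so each row is divided by its own total.
    return exp_df.div(row_totals, axis=0) * sum_exp


# Toy samples-by-genes counts in non-log space (made-up values).
counts = pd.DataFrame(
    {"GeneA": [10.0, 5.0], "GeneB": [30.0, 5.0], "GeneC": [60.0, 10.0]},
    index=["sample1", "sample2"],
)
cpm = scale_to_cpm(counts)
```

In this sketch a sample whose counts are all zero would yield NaN values, because its row total is zero.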
- biokowloon.sc.non_log2log_cpm(input_file_path: str | DataFrame, result_file_path: str | None = None, transpose: bool = True, correct: int = 1)
Convert non-log expression data to log2(CPM + 1) or log2(TPM + 1)
- Parameters:
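input_file_path – path to a non-log space expression file, or a DataFrame; genes by samples by default
result_file_path – path of the output file, samples by genes; default is None
transpose – set to True if the input is genes by samples, or False if it is already samples by genes
correct – pseudocount added before the log transform to avoid log(0); default is 1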
- Returns:
log2(CPM + 1) values, or saves the result to file; samples by genes if transpose is True, otherwise genes by samples
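
The conversion above chains three steps: orient the table as samples by genes, rescale each sample to one million, then take log2 after adding the pseudocount. The in-memory sketch below assumes exactly that order; the helper name `to_log_cpm` is hypothetical, and file reading and writing are left out.

```python
import numpy as np
import pandas as pd


def to_log_cpm(exp: pd.DataFrame, transpose: bool = True, correct: float = 1) -> pd.DataFrame:
    """Hypothetical stand-in: non-log expression -> log2(CPM + correct)."""
    # Step 1: work on a samples-by-genes layout.
    samples_by_genes = exp.T if transpose else exp
    # Step 2: rescale every sample (row) to sum to one million.
    cpm = samples_by_genes.div(samples_by_genes.sum(axis=1), axis=0) * 1e6
    # Step 3: add the pseudocount and log-transform.
    return np.log2(cpm + correct)


# Toy genes-by-samples input in non-log space (made-up values).
genes_by_samples = pd.DataFrame(
    {"s1": [1.0, 3.0], "s2": [2.0, 2.0]},
    index=["g1", "g2"],
)
log_cpm = to_log_cpm(genes_by_samples, transpose=True)
```

Normalizing before the log transform matters: the per-sample sum of one million is only meaningful in non-log space.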