Single Cell Preprocessing

biokowloon.sc.log2_transform(df) → DataFrame

Apply a log2 transform to expression values after adding 1, i.e. log2(x + 1)

Parameters:

df – a dataframe with expression values, cells by genes

Returns:

a dataframe with log2-transformed values
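
The transform described above can be sketched with NumPy and pandas. This is a minimal illustration, assuming the function computes log2(x + 1) element-wise; `log2_plus_one` is a hypothetical stand-in and not the library's implementation, and the toy values are made up.

```python
import numpy as np
import pandas as pd


def log2_plus_one(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: element-wise log2(x + 1)."""
    # Adding 1 first keeps zero counts finite (log2(0 + 1) == 0).
    return np.log2(df + 1)


# Toy cells-by-genes matrix (illustrative values only).
expr = pd.DataFrame(
    {"geneA": [0, 1], "geneB": [3, 7]},
    index=["cell1", "cell2"],
)

logged = log2_plus_one(expr)
print(logged)
```

Zero counts map to 0 after the transform, which is why the pseudocount of 1 is a common choice.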

biokowloon.sc.log_exp2cpm(exp_df: DataFrame | array, log_base=2, correct=1) → DataFrame | array

Convert log-transformed expression values, e.g. log2(CPM + 1), back to non-log space (CPM / TPM)

Parameters:
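  • exp_df – log-transformed expression values, samples by genes

  • log_base – the base of the log transform that was applied to exp_df

  • correct – the pseudocount that was added before the log transform to avoid log(0); default is 1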

Returns:

counts per million (CPM) or transcripts per million (TPM)
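
The inverse mapping can be sketched as follows. This assumes the conversion is log_base ** x - correct, which is the algebraic inverse of log(x + correct); `log_to_linear` is a hypothetical name used for illustration, not the library function itself.

```python
import numpy as np
import pandas as pd


def log_to_linear(exp_df, log_base=2, correct=1):
    """Hypothetical stand-in: undo log_base(x + correct)."""
    # Raise the base to each value, then remove the pseudocount.
    return np.power(float(log_base), exp_df) - correct


# Toy samples-by-genes matrix already in log2(x + 1) space.
log_expr = pd.DataFrame(
    {"geneA": [0.0, 1.0], "geneB": [2.0, 3.0]},
    index=["sample1", "sample2"],
)

linear = log_to_linear(log_expr, log_base=2, correct=1)
print(linear)
```

The values of log_base and correct have to match those used for the forward transform, otherwise the recovered values will not be on the original scale.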

biokowloon.sc.non_log2cpm(exp_df, sum_exp=1000000.0) → DataFrame

Normalize gene expression to CPM / TPM for non-log space

Parameters:
  • exp_df – gene expression profile in non-log space, samples by genes

  • sum_exp – target sum of gene expression for each sample after normalization; default is 1e6

Returns:

counts per million (CPM) or transcripts per million (TPM)
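
As a rough sketch of this normalization, each sample (row) can be divided by its own total and multiplied by sum_exp. The helper `scale_to_cpm` is hypothetical and only illustrates the arithmetic; the library may handle edge cases such as all-zero samples differently.

```python
import numpy as np
import pandas as pd


def scale_to_cpm(exp_df: pd.DataFrame, sum_exp: float = 1e6) -> pd.DataFrame:
    """Hypothetical stand-in: rescale every row to sum to sum_exp."""
    row_totals = exp_df.sum(axis=1)
    # Divide each row by its own total, then scale to the target sum.
    return exp_df.div(row_totals, axis=0) * sum_exp


# Toy samples-by-genes count matrix (illustrative values only).
counts = pd.DataFrame(
    {"geneA": [10, 5], "geneB": [30, 5], "geneC": [60, 90]},
    index=["sample1", "sample2"],
)

cpm = scale_to_cpm(counts)
print(cpm)
print(cpm.sum(axis=1))
```

After scaling, every sample sums to sum_exp, so expression values are comparable across samples with different sequencing depth.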

biokowloon.sc.non_log2log_cpm(input_file_path: str | DataFrame, result_file_path: str | None = None, transpose: bool = True, correct: int = 1)

Convert non-log expression data to log2(CPM + 1) or log2(TPM + 1)

Parameters:
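  • input_file_path – path to a non-log space expression file, or a DataFrame; genes by samples by default

  • result_file_path – path of the output file, written as samples by genes; default is None

  • transpose – set to False if the input is already samples by genes; otherwise leave as True

  • correct – the pseudocount added before the log transform to avoid log(0); default is 1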

Returns:

log2(CPM + 1) values, or the result is saved to file; samples by genes if transpose is True, otherwise genes by samples
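
The overall conversion can be sketched on an in-memory DataFrame. This assumes the steps are: transpose to samples by genes, rescale each sample to one million, then apply log2(x + correct). The helper `to_log_cpm` is hypothetical, file reading and writing are left out, and the real function may order or implement these steps differently.

```python
import numpy as np
import pandas as pd


def to_log_cpm(exp_df: pd.DataFrame, transpose: bool = True,
               correct: int = 1, sum_exp: float = 1e6) -> pd.DataFrame:
    """Hypothetical stand-in: non-log expression -> log2(CPM + correct)."""
    # Bring the matrix to samples-by-genes orientation if needed.
    samples_by_genes = exp_df.T if transpose else exp_df
    # Rescale every sample (row) so its values sum to sum_exp.
    totals = samples_by_genes.sum(axis=1)
    cpm = samples_by_genes.div(totals, axis=0) * sum_exp
    # The pseudocount keeps zero expression finite after the log.
    return np.log2(cpm + correct)


# Toy genes-by-samples matrix (illustrative values only).
raw = pd.DataFrame(
    {"sample1": [10, 30, 60], "sample2": [5, 5, 90]},
    index=["geneA", "geneB", "geneC"],
)

log_cpm = to_log_cpm(raw, transpose=True)
print(log_cpm)
```

In this sketch the result is always samples by genes when the input is genes by samples and transpose is True, matching the orientation described for the output file.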