Lineage EM Algorithm for Inferring Latent States from Cellular Lineage Trees
A population of genetically identical cells is phenotypically heterogeneous. This heterogeneity is partially inherited across generations and can serve as a bet-hedging strategy that helps the population survive in fluctuating environments. A typical example of such a strategy is bacterial persistence. To understand these strategies, we need to identify the phenotype of each cell and how it is inherited. For this purpose, recent advances in single-cell analysis and microfluidic devices provide useful lineage data; however, such data contain the phenotypic information of each cell only implicitly. Several studies have attempted to overcome this difficulty by inferring the phenotypes from lineage data via latent-variable estimation. In such estimation, however, we must correct for the bias caused by population growth, which we call the survivorship bias. In this work, we characterize the survivorship bias and establish a method for correcting it. We then propose an expectation-maximization (EM)-type latent-variable estimation method, which we call the Lineage EM algorithm (LEM). LEM is bias-free and applicable to various kinds of lineage data for characterizing the phenotypes of cells. Finally, we apply LEM to a synthetic lineage tree and to a real lineage tree of E. coli to validate its performance.
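The survivorship bias can be illustrated with a toy simulation. The sketch below is not LEM and is not the correction derived in this work; it assumes a hypothetical two-phenotype model (fast cells divide 1 time unit after birth, slow cells 2 time units after birth, each daughter independently switches phenotype with a fixed probability, and no cell dies). It then compares two estimates of how often a cell is in the fast state, computed from the cells alive at an observation time T. Counting every surviving cell equally (a "retrospective" estimate) over-represents fast lineages, simply because they left more descendants. Weighting each surviving cell by 2^-D, where D is its number of ancestral divisions, instead gives the statistics of a single lineage traced by picking one daughter at random at each division (a "chronological" estimate). This weighting is used here only as one simple correction for illustration; the names `simulate_leaves`, `fast_fraction`, and `DIVISION_TIME` are hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical division times (arbitrary units) for two heritable phenotypes.
DIVISION_TIME = {"fast": 1.0, "slow": 2.0}


@dataclass
class Leaf:
    """A cell that is alive at the observation time T."""
    phenotype: str
    divisions: int  # number of divisions between the root and this cell


def simulate_leaves(T, switch_prob, rng, root="fast"):
    """Grow a lineage tree from one root cell up to time T and return its leaves.

    Assumed toy model (illustration only): a cell of phenotype s divides
    DIVISION_TIME[s] after its birth; each daughter independently switches
    to the other phenotype with probability switch_prob; no cell dies.
    """
    leaves = []
    stack = [(root, 0.0, 0)]  # (phenotype, birth time, divisions so far)
    while stack:
        phenotype, birth, depth = stack.pop()
        division = birth + DIVISION_TIME[phenotype]
        if division > T:
            # The cell has not divided yet at time T, so it is observed.
            leaves.append(Leaf(phenotype, depth))
            continue
        for _ in range(2):
            child = phenotype
            if rng.random() < switch_prob:
                child = "slow" if phenotype == "fast" else "fast"
            stack.append((child, division, depth + 1))
    return leaves


def fast_fraction(leaves):
    """Return (retrospective, chronological) estimates of P(phenotype == 'fast')."""
    # Retrospective: each surviving cell counts equally, so lineages that
    # divided more often contribute more cells.
    retro = sum(leaf.phenotype == "fast" for leaf in leaves) / len(leaves)
    # Chronological: weight each leaf by 2**-D, the probability of reaching it
    # by choosing one daughter uniformly at random at every division.
    weights = [2.0 ** -leaf.divisions for leaf in leaves]
    chrono = sum(w for w, leaf in zip(weights, leaves) if leaf.phenotype == "fast")
    return retro, chrono / sum(weights)


if __name__ == "__main__":
    rng = random.Random(0)
    estimates = [fast_fraction(simulate_leaves(12.0, 0.2, rng)) for _ in range(100)]
    retro = sum(r for r, _ in estimates) / len(estimates)
    chrono = sum(c for _, c in estimates) / len(estimates)
    print(f"retrospective {retro:.3f}  chronological {chrono:.3f}")
```

Under these assumptions, the retrospective average should come out noticeably larger than the chronological one, because fast cells are over-represented among the survivors. Without death, the 2^-D weights of all leaves of one tree sum to 1, which is why they can be read as lineage probabilities.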