Part I – Summary
The authors aimed to generate a platform that would allow identification of the genetic basis of variation in cell behavior in iPSC lines from healthy donors. They examined a panel of 110 human iPSC lines from HipSci.
They analyzed a set of phenotypes: cell and nucleus morphology, intensity of DAPI staining and EdU incorporation, and number of single and clumped cells and number of cells per clump. They seeded the cells on three different concentrations of fibronectin. To distinguish between extrinsic (environment, ECM), intrinsic (genetic), technical, or biological components, they applied a dimensionality reduction approach, Probabilistic Estimation of Expression Residuals (PEER), to the data generated from quantifying these phenotypes.
They identified 9 “synthetic PEER factors” that explain most of the variation in the data. The effect of fibronectin concentration was apparent in one factor, factor 1 (the one that contributed the most to the variation in the data), which they refer to as the “extrinsic factor”. The variance due to genotype was apparent in factor 9 (the factor that contributed the least to the variation in the data), the “intrinsic factor”.
They found that genes correlating only with the extrinsic factor were enriched in signaling pathways associated with cell surface receptors, proliferation, and negative regulation of differentiation. Genes correlating with the intrinsic factor were enriched in more dissimilar terms such as intracellular receptor signaling and transport and positive regulation of transcription. They found 4573 genes whose expression correlated with the extrinsic factor, the intrinsic factor, or both. They then filtered for genes that were associated with Ensembl identifiers and whose protein products were related to the cell behaviors explored here (ECM and cell junction proteins, cell adhesion and signaling molecules). Of these, 160 genes showed expression that correlated with at least one phenotypic feature. This shows a strong correlation between gene expression and the phenotypes explored here in a large panel of iPSC lines. This correlation, however, was not seen in cells whose phenotypic values were outliers.
They searched for SNVs in the 332 genes whose expression correlated with the extrinsic and intrinsic factors. They identified 28 SNVs that were classified as rare, deleterious, and destabilizing to protein structure. Of those, 18 occurred in cell lines that were outliers for one or more phenotypes. They then tested whether the presence of rare, deleterious, and destabilizing SNVs could predict outlier cell behavior in the complete collection of 700 lines. They looked at two SNVs in two cell lines. Only one line was an outlier when the cells were examined in the pluripotent state, but when induced to differentiate, both lines showed altered differentiation patterns.
Strengths:
- Well characterized, large dataset
- Builds on two previously generated, tested approaches -- dimensionality reduction approach, PEER, and protocol for high-content imaging of iPSC lines
- Generates widely applicable platform for studies involving human iPSC lines
Weaknesses:
- Figures need legends, axis labels, and more clarity for the reader
- Accessibility: some technical terms need a brief explanation
Part II – Detailed comments
1. Significance
Authors make the case for the importance of this study and its immediate applicability and usefulness to the field: “Now that the applications of human induced pluripotent stem cells (hiPSC) for disease modelling and drug discovery are well established, attention is turning to the creation of large cohorts of hiPSC from healthy donors to examine common genetic variants and their effects on gene expression and cellular phenotypes1-5”.
They also mentioned a previous study where they “demonstrate[d] a donor contribution of approximately 8-23% to the observed variation [in the phenotype]”. The current study goes beyond this in that they aim to identify the genetic variants that were responsible for that variation.
In addition, the authors mentioned a previous study, Carcamo-Orive et al. 2017, that also showed a strong correlation between gene expression and phenotypic features in a large cohort of iPSC lines. They point out that in this study, cells were not exposed to different environmental stimuli, something they do in the current study where the cells are exposed to different fibronectin concentrations.
2. Observation
The authors do not mention how the 7 disease cell lines (from individuals with Bardet-Biedl Syndrome) were used.
There is a slight discrepancy in the description in Figure 1. It says that the 110 iPSC lines come from 85 individuals; however, the paper describes the lines as coming from 65 healthy donors, 7 individuals with Bardet-Biedl Syndrome, and 3 non-HipSci control lines, which adds up to 75 rather than 85.
The filtering decisions the authors made to identify the genes correlating with extrinsic and intrinsic variation in phenotype follow a logical flow. After identifying the 4573 genes correlated with the extrinsic factor, the intrinsic factor, or both, and filtering them to obtain 3880 genes, they filtered further by the function of the protein product, selecting genes whose products fell into the following categories: ECM and cell junction proteins, and cell adhesion and cell signaling molecules. They then performed correlation analysis between the expression levels and the measured phenotypes and found 160 genes that showed a significant correlation with at least one phenotype in at least one fibronectin concentration. However, Supplementary Table 2 does not seem to be included.
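For reference, the expression-to-phenotype correlation step described above could be sketched as follows. This is a minimal illustration only: the manuscript's exact statistic is not restated here, so the use of Spearman rank correlation is an assumption, and the function names and toy values are hypothetical.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy data (not from the paper): one gene's expression and one
# phenotype (e.g. cell area) measured across six cell lines.
expression = [2.1, 3.4, 1.0, 5.2, 4.8, 2.9]
cell_area = [310.0, 420.0, 250.0, 500.0, 610.0, 405.0]
rho = spearman(expression, cell_area)
print(f"Spearman rho = {rho:.3f}")
```

In a real analysis this would be repeated per gene, per phenotype, and per fibronectin concentration, with a multiple-testing correction applied to the resulting p-values.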
It is also important that the authors looked at the outlier cells.
It would be helpful for the non-expert reader if the authors provided a brief explanation of some of the technical terms, such as automatic relevance detection and principal component analysis, just to give the reader an idea of when these would be used or what they can do to the data.
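As an illustration of the kind of brief explanation that would help: principal component analysis finds the direction along which a set of measurements varies most and reports how much of the total variance that direction captures. The sketch below is a generic pure-Python illustration using power iteration. It is not PEER and not the authors' code, and the function name and toy data are hypothetical.

```python
from math import sqrt

def first_principal_component(rows, iterations=200):
    """Return (unit direction, fraction of variance explained) for the first PC.

    rows: list of samples, each a list of feature values.
    Uses power iteration on the sample covariance matrix. The all-ones start
    vector is an arbitrary choice and would fail if it happened to be exactly
    orthogonal to the leading direction.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along v (v has unit length).
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_variance

# Hypothetical toy data: two phenotypic features measured on five cell lines.
toy = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
direction, explained = first_principal_component(toy)
print(direction, explained)
```

When two features rise and fall together, as in the toy data, a single component captures nearly all of the variance, which is the sense in which such methods reduce dimensionality.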
3. Interpretation
Among the genes correlated with extrinsic and intrinsic variation in cellular phenotype, the authors find that 160 showed significant correlation with at least one phenotypic feature. In addition, 54 of these genes showed significant correlation with multiple phenotypic features. The authors’ conclusion that “these results indicate a robust correlation between RNA expression and the phenotypic features in a large panel of iPSC lines, with specific RNAs associated with intrinsic or extrinsic factors” follows from these data.
They then look at what could be setting outlier cells apart, given that their variation does not show the correlation between gene expression and phenotypic features seen for the rest of the cells. They look at SNVs among the genes whose expression is correlated with extrinsic and intrinsic factors and find 28 that are rare, deleterious, and destabilizing. Of those, 18 occurred in cell lines that were outliers for one or more phenotypes. Here it is worth looking at the non-outlier cells to see if they had any of the other 10 SNVs and, if they did, checking in which genes these SNVs were present; perhaps those are genes whose expression had a lower correlation with the factors than the other genes. Another question worth answering is whether the 18 SNVs that occurred in the outlier cell lines were in the genes that showed the highest correlation with the factors.
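One way to make the comparison suggested above quantitative would be a one-sided Fisher's exact test on a 2x2 table of cell lines (outlier or not, carrying a flagged SNV or not). The sketch below is an assumption about how such a test could be set up, not something the authors report. The function name is hypothetical, and the counts in the usage line are placeholders, not values from the paper.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: outlier lines carrying a flagged SNV
    b: outlier lines without one
    c: non-outlier lines carrying a flagged SNV
    d: non-outlier lines without one

    Returns the probability, under the hypergeometric null of no association,
    of seeing at least `a` SNV-carrying lines among the outliers.
    """
    outliers = a + b
    carriers = a + c
    total = a + b + c + d
    denominator = comb(total, carriers)
    upper = min(outliers, carriers)
    numerator = sum(
        comb(outliers, k) * comb(total - outliers, carriers - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator

# Placeholder counts for illustration only (not taken from the manuscript).
p_value = fisher_exact_greater(8, 2, 4, 16)
print(f"one-sided p = {p_value:.4g}")
```

A small p-value would indicate that flagged SNVs are over-represented among outlier lines; the test needs counts of lines rather than counts of SNVs, which the manuscript would have to supply.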
Next, they test whether the presence of these SNVs could predict outlier cell behavior using the complete collection of 700 cell lines. They look at two SNVs in two cell lines. Only one line was an outlier when the cells were examined in the pluripotent state, but when induced to differentiate, both lines showed altered differentiation patterns. This leads the authors to conclude, in the discussion, that they have proof of principle that their approach can reveal SNVs that have lineage-specific effects. Proof of principle is the only conclusion that can be drawn from these data. This last section seems a bit underdeveloped. The authors could look at some of the non-outlier cell lines to see if they had any of the other 10 SNVs (discussed in the previous paragraph). If any did, the authors could examine those lines under differentiation conditions to see whether they show altered differentiation patterns. If so, this would provide more evidence that SNVs can provide additional information about the behavior of the cells. The authors could also look at a few other cell lines from the 700 to see whether those carrying any of the 28 SNVs are also outliers or, if not, whether they show altered differentiation patterns.
In addition, the authors could address whether passage number correlated with outlier cell behavior, since they looked at cells between passage 15 and 45.
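A simple check of this kind could be a permutation test comparing the mean passage number of outlier and non-outlier lines. The sketch below is only a suggestion of one possible approach: the function name, the choice of test, and the passage numbers in the usage example are all hypothetical and not drawn from the manuscript.

```python
import random

def permutation_p_value(outlier_passages, other_passages,
                        n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in mean passage number.

    Repeatedly shuffles the group labels and counts how often the shuffled
    difference in means is at least as large as the observed one.
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(outlier_passages) - mean(other_passages))
    pooled = list(outlier_passages) + list(other_passages)
    k = len(outlier_passages)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:k]) - mean(pooled[k:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical passage numbers within the 15 to 45 range used in the study.
outliers = [38, 41, 44, 36, 40]
others = [18, 22, 25, 30, 27, 21, 33]
print(permutation_p_value(outliers, others))
```

A small p-value here would suggest that passage number, rather than genotype, might account for some of the outlier behavior and would merit discussion.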
4. Clarity
The article was well organized and mostly easy to read. However, a brief explanation of some technical terms would go a long way toward making the article accessible to a wider audience (discussed in section 2 above).
The figures, especially those containing plots, need labels and legends that clearly explain what data are being depicted. This applies to the supplementary figures as well, which currently do not even have figure or table numbers, and Supplementary Table 2 seems to be missing. Figure 3 in the main article is very low resolution, and some parts, such as part d, are not distinguishable, making it difficult to interpret.