Molecular and Cytogenetic Markers and MRD in Pediatric AML and ALL
Leukemias - Cytogenetics and Molecular Markers in Diagnosis and Prognosis
Identifying Candidate Normal and Leukemic B Cell Progenitor Populations with Hierarchical Clustering of 6-Color Flow Cytometry Data - A Better View.
Karel Fišer, PhD1,*,
Tomáš Sieger, MSc2,* and
Josef H. Vormoor, MD1
1 Northern Institute for Cancer Research, Newcastle University, Newcastle upon Tyne, United Kingdom and 2 Praha, Czech Republic.
6-color flow cytometry allows multiparameter analysis of high numbers of single cells. It is an excellent tool for the characterization of a wide range of hematopoietic populations and for monitoring minimal residual disease. However, analysis of complex flow data is challenging. Gating populations on 28 two-parameter plots is extremely tedious and does not reflect the multidimensionality of the data. Here, we describe a novel approach that employs hierarchical clustering analysis (HCA) and support vector machine (SVM) learning to analyze flow data. This approach provides a new perspective on flow data and promises better identification of rare and novel subpopulations that escape classic analysis. Our aim was to identify normal and leukemic B cell progenitor/stem cell populations in normal (n=6) and ALL (n=10) bone marrow. Samples were labelled with fluorochrome-conjugated antibodies to 6 CD markers (CD10, CD19, CD22, CD34, CD38, CD117) and 10^4 to 10^6 events were acquired (FACSCanto, BD Biosciences). To analyze flow data with HCA we developed a new algorithm, better suited to the ellipsoid nature of cell populations than other current HCA metrics. Data exported from DiVa software were externally compensated and Hyperlog transformed to achieve a logarithmic-like scale that displayed zero and negative values. Normalized data were then subjected to HCA employing a scale-invariant Mahalanobis distance measurement for merging clusters. This reflects the extended ellipsoid shape of the populations (here: 8-dimensional ellipsoids). We developed a new adaptive linkage algorithm that smoothly shifts from Euclidean distance (when clusters are too small to compute Mahalanobis distance) to Mahalanobis distance measurement. This allowed us to build the hierarchy from single events, yet retain the advantage of Mahalanobis measurement for larger clusters. To build classifiers we used an SVM with a polynomial kernel. All work was carried out in MATLAB (MathWorks, Inc.).
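The shift from Euclidean to Mahalanobis distance described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB algorithm: the abstract does not state how the two distances are blended, so the linear weight `w`, the `full_size` threshold, and the function names are all hypothetical. The sketch also measures a point-to-cluster distance rather than a full cluster-to-cluster linkage.

```python
import math

def mean_vector(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def covariance(points, mean):
    """Sample covariance matrix; all zeros for a single-point cluster."""
    dim, n = len(mean), len(points)
    cov = [[0.0] * dim for _ in range(dim)]
    if n < 2:
        return cov
    for p in points:
        d = [p[i] - mean[i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in row] for row in cov]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def adaptive_distance(point, cluster, full_size=20):
    """Distance from point to the cluster mean under a blended metric.

    Assumption (not from the abstract): the weight w grows linearly from 0
    for a single event to 1 at full_size events, and the metric matrix is
    (1 - w) * I + w * cov. w = 0 gives Euclidean distance; w = 1 gives
    Mahalanobis distance, which requires a non-singular covariance.
    """
    mu = mean_vector(cluster)
    dim = len(mu)
    w = min(1.0, (len(cluster) - 1) / (full_size - 1))
    cov = covariance(cluster, mu)
    blended = [[w * cov[i][j] + ((1.0 - w) if i == j else 0.0)
                for j in range(dim)] for i in range(dim)]
    diff = [point[i] - mu[i] for i in range(dim)]
    z = solve(blended, diff)  # z = blended^{-1} * diff
    return math.sqrt(sum(d * v for d, v in zip(diff, z)))

# An elongated toy cluster: wide along x, narrow along y.
cluster = [(-4, -0.5), (-2, 0.5), (0, 0), (2, -0.5), (4, 0.5)]
along = adaptive_distance((2, 0), cluster, full_size=5)
across = adaptive_distance((0, 2), cluster, full_size=5)
print(along < across)
```

In this toy case both test points are equally far from the cluster mean in Euclidean terms, but the point lying along the cluster's long axis is judged closer once the covariance contributes, which is the behaviour the ellipsoid argument relies on.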
The resulting hierarchical tree, combined with a heatmap of CD marker expression, allows visualization of hierarchically clustered data with all 8 parameters displayed in a single plot(!), as compared to 28 traditional two-parameter plots. HCA has the big advantage of providing populations that are homogeneous in their expression pattern across all parameters (without the need for complex sub- or back-gating). We were able to identify populations corresponding to the different stages of B-cell development. In a normal control bone marrow we could detect the following candidate B-lineage progenitor populations: CD34+ CD117+ CD38+ CD10– CD22– CD19– (0.94% of total) progenitor/stem cells, CD34+ CD117– CD38+ CD10+ CD22+ CD19med (0.26% of total) pro-B cells, CD34– CD117– CD38+ CD10+ CD22+ CD19+ (2.77% of total) small pre-B cells (lower FSC values), CD34– CD117– CD38+ CD10+ CD22+ CD19+ (1.09% of total) large pre-B cells (higher FSC values) and CD34– CD117– CD38lo CD10– CD22+ CD19+ (5.94% of total) (immature) B cells. In 10 diagnostic or relapse samples HCA clearly identified the main leukemic population. HCA is able to visualize otherwise "hidden" populations. This was exemplified by a distinct CD38+ B-lin– population (most likely T cells) that overlapped with other populations in all 28 two-parameter plots. We have built a classifier able to find established populations across samples and in large datasets (10^6 events) for which HCA would be computationally too demanding. In summary, we show the advantages of using hierarchical clustering analysis for large, complex multiparameter flow cytometry datasets.
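The figure of 28 two-parameter plots follows from choosing 2 of 8 parameters. A short Python check is given here as an illustration; it assumes the two parameters beyond the 6 CD markers are forward and side scatter, which the abstract does not list explicitly.

```python
from itertools import combinations

markers = ["CD10", "CD19", "CD22", "CD34", "CD38", "CD117"]
scatter = ["FSC", "SSC"]  # assumption: the two non-antibody parameters
parameters = markers + scatter

# Every unordered pair of parameters corresponds to one two-parameter plot.
pairs = list(combinations(parameters, 2))
print(len(parameters), len(pairs))  # 8 28
```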
Disclosure: No relevant conflicts of interest to declare.