DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukemia Challenge

team21

Jose M. G. Vilar†‡
† Biophysics Unit (CSIC-UPV/EHU) and Department of Biochemistry and Molecular Biology, University of the Basque Country, Bilbao, Spain
‡ IKERBASQUE, Basque Foundation for Science, Bilbao, Spain


Below is a brief description of the approach of team21 (Jose M. G. Vilar), which ranked first (http://www.the-dream-project.org/result/classification-aml) among the best performers at the DREAM6 Molecular Classification of Acute Myeloid Leukemia Challenge. A description of the challenge can be found at http://www.the-dream-project.org/challenges/dream6flowcap2-molecular-classification-acute-myeloid-leukaemia-challenge.

Rationale

The approach uses relative entropies to evaluate if the distribution of the values of flow cytometry data for a given individual is closer to the overall distribution for AML or for Normal individuals in the space of values $\Gamma$. Explicitly, the relative entropy difference $\Delta S_i = S_{i,\mathrm{AML}} - S_{i,\mathrm{Normal}} = -\int P_i(\Gamma) \ln \frac{P_{\mathrm{Normal}}(\Gamma)}{P_{\mathrm{AML}}(\Gamma)} \, d\Gamma$ indicates that the individual looks like an AML patient for positive values and like a Normal subject for negative values.
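
The sign convention can be made explicit with a short derivation. It assumes that the relative entropy of individual $i$ with respect to a reference distribution $P_X$ is defined as $S_{i,X} = -\int P_i(\Gamma) \ln \frac{P_i(\Gamma)}{P_X(\Gamma)} \, d\Gamma$, that is, the negative of the Kullback-Leibler divergence. This definition is an assumption; the text above states only the final expression. Under it, the $\ln P_i$ terms cancel:

```latex
\begin{aligned}
\Delta S_i &= S_{i,\mathrm{AML}} - S_{i,\mathrm{Normal}} \\
  &= -\int P_i(\Gamma) \ln \frac{P_i(\Gamma)}{P_{\mathrm{AML}}(\Gamma)} \, d\Gamma
     + \int P_i(\Gamma) \ln \frac{P_i(\Gamma)}{P_{\mathrm{Normal}}(\Gamma)} \, d\Gamma \\
  &= -\int P_i(\Gamma) \ln \frac{P_{\mathrm{Normal}}(\Gamma)}{P_{\mathrm{AML}}(\Gamma)} \, d\Gamma .
\end{aligned}
```

Read this way, $\Delta S_i$ is the Kullback-Leibler divergence of $P_i$ from the Normal distribution minus its divergence from the AML distribution, so it is positive when $P_i$ lies closer to the AML distribution.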

Implementation

  1. Compute the 4-dimensional histograms $H_{i,j,k}(\Gamma)$ for the values of $\Gamma$ = ("FS Log", "SS Log", "FL3 Log", $j$) with $j \in$ {"FL1 Log", "FL2 Log", "FL4 Log", "FL5 Log"} for Tube $k \in \{1,2,3,4,5,6,7\}$ for all individuals $i$. For each individual there are 7 × 4 = 28 histograms.
  2. Compute $H_{\mathrm{AML},j,k}(\Gamma) = \sum_{i \in \mathrm{AML}} H_{i,j,k}(\Gamma)$ and $H_{\mathrm{Normal},j,k}(\Gamma) = \sum_{i \in \mathrm{Normal}} H_{i,j,k}(\Gamma)$ as the overall histograms for AML and Normal individuals.
  3. Normalize the histograms to obtain the probabilities $P_{i,j,k}(\Gamma)$, $P_{\mathrm{AML},j,k}(\Gamma)$, and $P_{\mathrm{Normal},j,k}(\Gamma)$.
  4. Compute the relative entropy differences $\Delta S_{i,j,k} = -\sum_{\Gamma} P_{i,j,k}(\Gamma) \ln \frac{P_{\mathrm{Normal},j,k}(\Gamma)}{P_{\mathrm{AML},j,k}(\Gamma)}$. The total relative entropy difference is defined as $\Delta S_i = \sum_{j,k} \Delta S_{i,j,k}$.
  5. The likelihood that an individual $i$ is AML positive is quantified as $L_i = 1 / (1 + e^{-\Delta S_i})$.
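
The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the code in the team's scripts: the number of bins, the common value range, the pseudocount used to avoid taking the logarithm of empty bins, and all function names are assumptions introduced here, since the description does not specify them.

```python
import numpy as np

BINS = 4                   # bins per axis (hypothetical choice)
VALUE_RANGE = (0.0, 1.0)   # shared range so all histograms use the same bin edges (assumption)

def histogram4d(events):
    """Step 1: 4-D histogram of an (n_events, 4) array of channel values."""
    counts, _ = np.histogramdd(events, bins=BINS, range=[VALUE_RANGE] * 4)
    return counts

def normalize(counts, pseudocount=0.0):
    """Step 3: counts -> probabilities. The pseudocount is an assumption, not from the text."""
    smoothed = counts + pseudocount
    return smoothed / smoothed.sum()

def delta_s(p_i, p_aml, p_normal):
    """Step 4 for one (j, k): -sum over bins of P_i * ln(P_Normal / P_AML)."""
    return float(-np.sum(p_i * np.log(p_normal / p_aml)))

def likelihood(total):
    """Step 5: logistic function of the total relative entropy difference."""
    return 1.0 / (1.0 + np.exp(-total))

def score(individual, aml_group, normal_group):
    """individual maps (j, k) -> event array; the groups map (j, k) -> list of event arrays."""
    total = 0.0
    for key, events in individual.items():
        h_aml = sum(histogram4d(e) for e in aml_group[key])        # step 2
        h_normal = sum(histogram4d(e) for e in normal_group[key])  # step 2
        total += delta_s(normalize(histogram4d(events)),
                         normalize(h_aml, pseudocount=1.0),
                         normalize(h_normal, pseudocount=1.0))
    return likelihood(total)

# Synthetic demo with a single (j, k) pair; real data would have 28 per individual.
rng = np.random.default_rng(0)
key = ("FL1 Log", 1)
aml = {key: [rng.uniform(0.4, 1.0, size=(500, 4)) for _ in range(3)]}
normal = {key: [rng.uniform(0.0, 0.6, size=(500, 4)) for _ in range(3)]}
l_aml_like = score({key: rng.uniform(0.4, 1.0, size=(500, 4))}, aml, normal)
l_normal_like = score({key: rng.uniform(0.0, 0.6, size=(500, 4))}, aml, normal)
print(l_aml_like, l_normal_like)
```

In this toy setup the individual drawn from the AML-like region receives a likelihood above 0.5 and the one drawn from the Normal-like region a likelihood below 0.5. Smoothing only the pooled histograms keeps the individual's empty bins at zero weight while keeping the logarithm finite.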

Code execution

Instructions to get the data for the challenge are available at http://www.the-dream-project.org and http://flowcap.flowsite.org

The two Python files can be downloaded below by right-clicking their names.

On a Unix/OS X machine, in a directory containing the files "series10createdist_tot.py", "series10usedist_tot.py", and "DREAM6AMLTrainingSet.csv", together with the directory "CSV" holding the files "0001.CSV", "0002.CSV"..., execute:

$ python -O -u series10createdist_tot.py && python -O -u series10usedist_tot.py
$ sort -n -k 3 DREAM6_AML_Predictions_team21u.txt | cut -f 1,2 > DREAM6_AML_Predictions_team21.txt
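
The second command orders the unsorted predictions numerically by their third column and keeps only the first two columns. For readers without the Unix tools, the same post-processing can be sketched in Python. The function name and the sample rows are hypothetical, and the sketch assumes tab-separated lines with a numeric third field, which the description does not state; rows with equal keys may also come out in a different order than with `sort`.

```python
def sort_and_cut(lines):
    """Sort tab-separated lines numerically by field 3, then keep fields 1 and 2."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    rows.sort(key=lambda fields: float(fields[2]))  # numeric sort on the third field
    return ["\t".join(fields[:2]) for fields in rows]

# Hypothetical sample rows, for illustration only.
sample = ["b\t0.9\t2\n", "a\t0.1\t1\n", "c\t0.5\t10\n"]
result = sort_and_cut(sample)
print("\n".join(result))
```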