Project topic
Project number
Department
Student names
Email
Supervisor names
Concept drift detection for disease development analysis
English abstract
Introduction: In this project, I present my own Python implementation of the concept drift detection and re-learning (CDDRL) algorithm, originally coded in MATLAB. I demonstrate this implementation in the identification of biomarkers and the early detection of Crohn's disease (CD), based on the electronic health records (EHRs) of 5,021 pre-diagnostic patients who were diagnosed with CD between 2005 and 2021, and of 9,896 controls.
Background: Longitudinal clinical data, such as the data contained in EHRs, are routinely collected by healthcare providers and can be viewed as a data stream. Analysis of EHR data is needed to detect biomarkers and to identify chronic diseases early. The CDDRL algorithm, which models the clinical domain using a Bayesian network (BN) that is re-learned over time, was developed to analyze data streams and meet these goals. However, because it is implemented in MATLAB, it cannot be used on the servers of healthcare providers that are reluctant to pay the license fee. Therefore, to apply the CDDRL algorithm to clinical data, in the first part of this project I converted the MATLAB implementation into Python. I then applied the CDDRL algorithm to analyze the development of CD in pre-diagnostic patients.
Method: Converting the code required an in-depth understanding of all algorithm modules, re-coding more than 1,300 lines of code, and a strict validation process against the MATLAB version. The research part of the project consisted of applying a pipeline to prepare the clinical data and then applying the CDDRL algorithm to these data. Finally, I examined and analyzed the BNs learned by the algorithm to study the development of CD in pre-diagnostic patients.
Results: All intermediate results of the Python implementation, as well as the predictability and explainability of CD, were similar to those of the MATLAB version.
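The abstract summarizes CDDRL only at a high level: a BN that models the clinical domain and is re-learned over time as the data stream changes. The following Python sketch illustrates that general detect-then-re-learn idea under simplifying assumptions; it is not the CDDRL algorithm or the project's implementation. It assumes a fixed BN structure given as (parent, child) edges over discrete variables with known domains, re-estimates only the conditional probability tables (CPTs), and flags drift when the largest total variation distance between the current model's CPTs and those learned on the newest time window exceeds a threshold. All function names (`estimate_cpt`, `learn_model`, `model_distance`, `detect_and_relearn`), the smoothing parameter `alpha`, and the threshold of 0.2 are hypothetical choices for illustration.

```python
from collections import Counter
from collections.abc import Hashable
from typing import Dict, List, Tuple

# Hypothetical types for this sketch (not from the project): a record maps a
# variable name to a discrete value; an edge is a (parent, child) pair in a
# fixed BN structure whose CPTs are re-estimated over time.
Record = Dict[str, Hashable]
Edge = Tuple[str, str]
CPT = Dict[Hashable, Dict[Hashable, float]]  # parent value -> P(child value | parent value)
Model = Dict[Edge, CPT]


def estimate_cpt(window: List[Record], parent: str, child: str,
                 domains: Dict[str, List[Hashable]], alpha: float = 1.0) -> CPT:
    """Estimate P(child | parent) from one window, with additive (Laplace) smoothing."""
    counts = {pv: Counter() for pv in domains[parent]}
    for record in window:
        counts[record[parent]][record[child]] += 1
    cpt: CPT = {}
    for pv, c in counts.items():
        total = sum(c.values()) + alpha * len(domains[child])
        cpt[pv] = {cv: (c[cv] + alpha) / total for cv in domains[child]}
    return cpt


def learn_model(window: List[Record], edges: List[Edge],
                domains: Dict[str, List[Hashable]]) -> Model:
    """Re-estimate every CPT of the fixed structure from the given records."""
    return {(p, ch): estimate_cpt(window, p, ch, domains) for p, ch in edges}


def total_variation(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Total variation distance between two distributions over the same domain."""
    return 0.5 * sum(abs(p[v] - q[v]) for v in p)


def model_distance(a: Model, b: Model) -> float:
    """Largest total variation distance over all edges and parent values."""
    return max(
        (total_variation(a[edge][pv], b[edge][pv]) for edge in a for pv in a[edge]),
        default=0.0,
    )


def detect_and_relearn(windows: List[List[Record]], edges: List[Edge],
                       domains: Dict[str, List[Hashable]],
                       threshold: float = 0.2) -> Tuple[Model, List[int]]:
    """Scan consecutive windows; on drift, discard older data and re-learn."""
    history = list(windows[0])
    model = learn_model(history, edges, domains)
    drift_points: List[int] = []
    for t, window in enumerate(windows[1:], start=1):
        # Model learned on the newest window only, compared with the current model.
        candidate = learn_model(window, edges, domains)
        if model_distance(model, candidate) > threshold:
            drift_points.append(t)
            history = list(window)   # drift: restart from the new window only
        else:
            history.extend(window)   # no drift: keep accumulating evidence
        model = learn_model(history, edges, domains)
    return model, drift_points


if __name__ == "__main__":
    # Synthetic toy stream (not project data): GI = prior GI diagnosis, CD = outcome.
    domains = {"GI": [0, 1], "CD": [0, 1]}
    edges = [("GI", "CD")]
    stable = [{"GI": 0, "CD": 0}] * 5 + [{"GI": 1, "CD": 0}] * 5
    shifted = [{"GI": 0, "CD": 0}] * 5 + [{"GI": 1, "CD": 1}] * 5
    model, drifts = detect_and_relearn([stable, stable, shifted], edges, domains)
    print("drift detected at windows:", drifts)
    print("P(CD=1 | GI=1):", round(model[("GI", "CD")][1][1], 3))
```

In this sketch, comparing the accumulated model with a model learned only on the newest window makes a sudden change in a conditional probability visible, while data from stable periods keep accumulating. A fuller BN re-learning scheme would also have to revisit the network structure and would likely rely on a statistically grounded drift test rather than a fixed threshold.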
Applying the CDDRL algorithm to the CD data, we found strong connections between diagnoses of gastrointestinal disorders and CD at least 7 years before diagnosis. Biomarkers (blood test results) whose appearance was found to be related to the disease long before diagnosis were the high-density lipoprotein (HDL) cholesterol, platelet volume, and mean corpuscular volume (MCV) tests.
Conclusion: The conversion of the CDDRL algorithm to Python provides an important tool that enables future studies of clinical data from pre-diagnostic patients with various diseases, helping to identify biomarkers and promote early detection. Following the analysis of CD, we plan to apply the CDDRL algorithm to other diseases, such as Parkinson's disease and lung cancer.