Project topic
Project number
Department
Student names
Email
Supervisor names
A Sparse Bayesian Approach to Organic Chemistry: Reaction Yields and Catalyst Selection
Abstract
Extracting and selecting molecular features plays a central role in explanatory and predictive statistical models in computational chemistry. The task is particularly challenging when few experiments can be run, so that only a small number of samples is available. This project implements a sparse Bayesian approach to feature selection and experimental design for several regression and classification problems in organic chemistry. The approach assumes a generative model for regression and classification in which each candidate feature enters the model through a random binary inclusion variable. The model parameters are inferred from the posterior distribution using Markov chain Monte Carlo (MCMC) sampling methods, such as Gibbs sampling and the Metropolis-adjusted Langevin algorithm (MALA). All algorithms were implemented from scratch in Python with NumPy to match the generative model precisely. The result is a visual output that shows how strongly each binary variable participates in the posterior distribution, which lets the researcher identify the variables that are significant for the phenomenon under investigation. Unlike standard regularization methods, the method can describe several alternative models for the same problem.
The method was applied in two studies in Dr. Millo's laboratory at Ben-Gurion University. The first study deals with deuteration, a type of chemical reaction. In an experiment conducted in the laboratory, 10 molecules and 17 features were examined, and in a published article two of these features were chosen to predict the reaction rate, a regression task. The second study deals with crystal polymorphism and is based on an existing database of about 80,000 records. The learning task is binary classification, and the aim is to learn which features signal the existence of a polymorphic crystal.
In both studies, the researchers had previously chosen features according to their professional judgment, combined with standard regularization methods and forward and backward regression. Our approach revealed three additional features that might be relevant to the first study. The results for the second study are still being analyzed. In both cases, the researchers have shown a willingness to increase the number of initial features so that the Bayesian approach can find additional relevant features, and further studies in Dr. Millo's lab are expected to use the method soon.
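To make the idea of binary inclusion variables concrete, the following is a minimal sketch of a spike-and-slab linear regression sampled with Gibbs updates. It is not the project's implementation. It assumes a known noise variance `sigma2`, a Gaussian slab with variance `tau2`, and a fixed prior inclusion probability `prior_incl`, and it integrates the coefficients out analytically instead of using MALA. The function names `log_marginal` and `inclusion_probabilities`, all parameter values, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_marginal(y, X, gamma, sigma2, tau2):
    """Log density of y with the regression coefficients integrated out.

    Assumed model: y ~ N(0, sigma2 * I + tau2 * X_g X_g^T), where X_g
    holds only the columns whose binary indicator gamma_j equals 1.
    """
    n = len(y)
    Xg = X[:, gamma.astype(bool)]
    cov = sigma2 * np.eye(n) + tau2 * (Xg @ Xg.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (logdet + quad + n * np.log(2.0 * np.pi))


def inclusion_probabilities(y, X, n_iter=300, burn_in=100,
                            sigma2=1.0, tau2=10.0, prior_incl=0.2, seed=0):
    """Gibbs sampler over the indicators; returns posterior inclusion frequencies."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=int)          # start with every feature excluded
    draws = np.zeros((n_iter, p), dtype=int)
    for t in range(n_iter):
        for j in range(p):
            # Unnormalized log posterior with feature j switched on, then off.
            gamma[j] = 1
            log_on = log_marginal(y, X, gamma, sigma2, tau2) + np.log(prior_incl)
            gamma[j] = 0
            log_off = log_marginal(y, X, gamma, sigma2, tau2) + np.log1p(-prior_incl)
            # Numerically stable form of 1 / (1 + exp(log_off - log_on)).
            prob_on = np.exp(-np.logaddexp(0.0, log_off - log_on))
            gamma[j] = int(rng.random() < prob_on)
        draws[t] = gamma
    return draws[burn_in:].mean(axis=0)


# Synthetic data (an assumption for illustration): only features 0 and 3 matter.
data_rng = np.random.default_rng(1)
n, p = 40, 8
X = data_rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * data_rng.standard_normal(n)

pip = inclusion_probabilities(y, X)
for j, prob in enumerate(pip):
    print(f"feature {j}: posterior inclusion probability {prob:.2f}")
```

Plotting `pip` as a bar chart would give a simple analogue of the visual output described in the abstract. Keeping the individual indicator draws, rather than only their mean, would also show which combinations of features appear together, which is one way to surface alternative models.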