נושא הפרוייקט

מספר פרוייקט מחלקה שמות סטודנטים אימייל שמות מנחים

Model-based features construction for Deep Reinforcement Learning

בניית קלטים ללמידת חיזוקים עמוקה מבוססת מודלים

תקציר בעיברית

כיום, בעיות מודרניות רבות נפתרות באמצעות שימוש למידת חיזוקים. ממערכות אוטונומיות ועד לצ'אט בוטים החדשניים ביותר.
בעבודה זו, נציג דרך חדשנית לשלב ידע מומחה בתהליך הלמידה של סוכני למידת חיזוקים. שילוב ידע המומחה ויצירתו בצורה טבעית יבוצע באמצעות פרדיגמת התכנות ההתנהגותי (Behavioral Programming). אנו נציג כיצד שימוש בפרדיגמה מאפשר את השילוב של פיצ'רים המיוצגים כתהליכים במערכת. על מנת להעריך את השיטה שלנו אנו ננתח את ההשפעות של השימוש בפרדיגמה על תהליך הלמידה. נתמקד בעיקר במשחקי קופסה, בהם נתרגם אסטרטגיות במשחקים לפיצ'רים בעלי מצב (stateful)
ההיפותזה שלנו היא שבעזרת שימוש בשיטה המוצגת, נוכל לשפר את תהליך הלמידה כולו על ידי הקטנת מספר האפוקים (epochs) הדרושים לאימון וכן ישתפר דיוק המודלים הנוצרים בתהליך האימון.
בנוסף, אני מאמינים כי שילוב האסטרטגיות והתהליכים הללו באמצעות הפרדיגמה ישפר ספציפית את היכולת לאמן סוכנים לפתור סביבות בעלות מסלולים ארוכים (long trajectories) ועם פרסים מפוזרים (sparse reward) ביעילות.

תקציר באנגלית

Many real-life problems are solved by state-of-the-art Reinforcement Learning (RL) solutions, from autonomous systems to artificial intelligence chatbots. 
In this work, we present a novel approach for incorporating domain knowledge to the learning process. Specifically, we present a method to express domain knowledge in a natural way using the Behavioral Programming paradigm. We show how the paradigm allows for incorporating features that represent processes in the system. To evaluate the approach, we analyze the effect of the approach on the learning process. We will focus on board games and translate game strategies into stateful features. Our working hypothesis is that the approach will improve the learning process, reducing the number of required epochs, and improve the accuracy of the resulting models. Furthermore, we believe that incorporating strategies and processes as domain features will improve the ability of trained agents in learning difficult environments with long trajectories where the reward is sparse.