Project title

Project number | Department | Student names | Email | Supervisor names

White-box gradient-based attacks on black-box models via a surrogate model

Abstract

Adversarial examples are created by adding a small amount of noise to an original sample, in such a way that the change is imperceptible to humans but causes a machine learning model to predict a different result. Today, most of the strongest adversarial attack algorithms require knowledge of the model's parameters (white-box attacks) and exploit the model's gradient. However, the most common and threatening attacks are those in which the attacker can only query the model (black-box attacks). Moreover, many models, such as random forests, are non-differentiable and therefore have no gradient to exploit. To better understand the security of black-box and non-differentiable models, we aim to apply existing gradient-based attacks against them. Our approach is to (1) train a deep neural network (DNN) to reproduce the decision surface of the black-box (BB) model (as opposed to merely copying the model's performance), (2) attack this surrogate DNN with a white-box (WB) attack, and then (3) use the resulting adversarial example against the BB model. Unlike other surrogate-model methods, which focus on the decision boundary itself, our method focuses on the fidelity of the decision surface.
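The three-step pipeline above can be sketched as follows. This is a minimal, dependency-free illustration, not the project's implementation, and it rests on several assumptions that are not from the source: the black-box model is a hypothetical ensemble of decision stumps (standing in for a random forest) that returns a class-1 score when queried; a logistic-regression surrogate stands in for the DNN to keep the sketch short; "reproducing the decision surface" is approximated by fitting the surrogate to the black box's soft scores rather than its hard labels; and FGSM (the fast gradient sign method) is used as the white-box attack. All names (`STUMPS`, `black_box_proba`, `train_surrogate`, `fgsm_on_surrogate`) are hypothetical.

```python
import math

# Hypothetical black-box model: an ensemble of decision stumps (non-differentiable).
# The attacker may only call black_box_proba / black_box_label, never inspect STUMPS.
STUMPS = [(0, -0.5), (0, 0.0), (0, 0.5), (1, -0.5), (1, 0.0), (1, 0.5)]


def black_box_proba(x):
    """Fraction of stumps voting for class 1 (a piecewise-constant score surface)."""
    votes = sum(1 for feature, threshold in STUMPS if x[feature] > threshold)
    return votes / len(STUMPS)


def black_box_label(x):
    return 1 if black_box_proba(x) > 0.5 else 0


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _sign(g):
    return (g > 0) - (g < 0)


def train_surrogate(points, lr=0.5, epochs=2000):
    """Step (1): fit a differentiable surrogate to the black box's soft scores.

    Assumption: matching queried scores (not only hard labels) is used here as a
    simple proxy for reproducing the decision surface.
    """
    targets = [black_box_proba(x) for x in points]  # query-only access
    w, b = [0.0, 0.0], 0.0
    n = len(points)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, t in zip(points, targets):
            # Cross-entropy with soft target t has gradient (p - t) w.r.t. the logit.
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - t
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def fgsm_on_surrogate(x, y, w, b, eps):
    """Step (2): white-box FGSM on the surrogate, x_adv = x + eps * sign(dL/dx)."""
    p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
    grad = [(p - y) * wi for wi in w]  # dL/dx for logistic cross-entropy
    return [xi + eps * _sign(gi) for xi, gi in zip(x, grad)]


# Attacker's query set: a 9x9 grid on [-1, 1]^2.
grid = [[i / 4 - 1, j / 4 - 1] for i in range(9) for j in range(9)]
w, b = train_surrogate(grid)

x = [0.6, 0.6]
y = black_box_label(x)  # original black-box prediction
x_adv = fgsm_on_surrogate(x, y, w, b, eps=0.7)

# Step (3): transfer the surrogate's adversarial example to the black box.
print("black box on x:    ", y)
print("black box on x_adv:", black_box_label(x_adv))
```

With these settings the stump ensemble labels x = (0.6, 0.6) as class 1 and the transferred point (-0.1, -0.1) as class 0. Fitting the soft scores is the design choice that mirrors the abstract's emphasis on surface fidelity: a surrogate trained only on hard labels receives the same target for every point on one side of the boundary, so it gets no information about how the black box's score varies away from the boundary.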