Project topic

Project number / Department / Student names / Email / Supervisor names

Using evolutionary algorithms to extract a black-box model with the fewest queries possible

Abstract (translated from Hebrew)

Model extraction is an attack in which an adversary tries to steal a machine learning model. The attack is usually carried out by sending queries to the target model (the model we want to steal) and then training a new model on the outputs it returns. These attacks are a particular threat to companies that offer machine learning as a service (MLaaS), where the public can query their proprietary models.
Existing model extraction attacks work by issuing millions to billions of queries that probe the model’s high-confidence regions. However, these attacks are impractical in many cases because the number of queries alone can be flagged as anomalous by the defender. Furthermore, models stolen with existing methods do not reach the same performance as the original model (which makes the motive for the attack questionable).
In this research, we propose a more efficient method for carrying out a model extraction attack. To make the attack more efficient, we propose using evolutionary algorithms so that the attack succeeds with fewer queries. We also propose offline post-processing strategies that improve the stolen model’s performance without any further queries to the victim’s model. For effectiveness, we propose querying the model’s boundary, as opposed to querying only high-confidence regions. We found that with this method the model can be stolen more accurately (high fidelity).

Abstract in English

Model extraction is an attack in which an adversary tries to steal a machine learning model. This is usually accomplished by querying the victim’s model and then training a new model on the responses. These attacks are particularly threatening to companies that offer machine learning as a service (MLaaS), where the public can query their proprietary models.
Existing model extraction attacks work by executing millions to billions of queries that probe the model’s high-confidence regions. However, these attacks are impractical in many cases, since the number of queries alone can be flagged as anomalous by the defender. Furthermore, models stolen using existing methods do not perform nearly as well as the original model (making the motive of the attack questionable).
In this research, we propose a more efficient and effective method for extracting models. For efficiency, we propose the use of evolutionary algorithms to produce the necessary responses in fewer queries. We also propose offline post-processing strategies that improve the stolen model’s performance without the need to query the victim’s model further. For effectiveness, we propose querying the model’s boundary as opposed to only querying the areas with high confidence. We found that doing so captures the victim’s model with higher fidelity.
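
To make the idea of boundary-seeking queries concrete, the following is a minimal sketch and not the method used in this project. Everything in it is an assumption made for illustration: the toy victim `victim_predict`, the fitness (distance of the returned confidence from 0.5), truncation selection with Gaussian mutation, the 1-nearest-neighbour surrogate, and all parameter values. It shows a fixed query budget being spent mostly near the victim’s decision boundary, followed by a fidelity measurement (label agreement between the surrogate and the victim).

```python
import math
import random


def victim_predict(x):
    """Hypothetical black-box victim: returns P(class 1) for a 2-D input."""
    return 1.0 / (1.0 + math.exp(-5.0 * (x[1] - math.sin(x[0]))))


def evolve_boundary_queries(query_fn, budget=200, pop_size=20, sigma=0.3, seed=0):
    """Spend `budget` queries, evolving inputs toward the decision boundary."""
    rng = random.Random(seed)
    dataset = []  # every (input, hard label) pair obtained from a query

    def query(x):
        p = query_fn(x)
        dataset.append((x, int(p >= 0.5)))
        return abs(p - 0.5)  # assumed fitness: 0 means on the boundary

    # Initial population: uniform random points, one query each.
    population = []
    for _ in range(min(pop_size, budget)):
        x = (rng.uniform(-2, 2), rng.uniform(-2, 2))
        population.append((query(x), x))

    while len(dataset) < budget:
        # Truncation selection: keep the half closest to the boundary.
        population.sort(key=lambda item: item[0])
        parents = population[: max(1, pop_size // 2)]
        children = []
        for _, parent in parents:
            if len(dataset) >= budget:
                break
            # Gaussian mutation; only the new child costs a query.
            child = (parent[0] + rng.gauss(0, sigma),
                     parent[1] + rng.gauss(0, sigma))
            children.append((query(child), child))
        population = parents + children
    return dataset


def surrogate_predict(dataset, x):
    """Stand-in surrogate: label of the nearest queried point (1-NN)."""
    nearest = min(
        dataset,
        key=lambda pair: (pair[0][0] - x[0]) ** 2 + (pair[0][1] - x[1]) ** 2,
    )
    return nearest[1]


def fidelity(dataset, query_fn, n_test=500, seed=1):
    """Fraction of random test inputs where surrogate and victim agree.

    Evaluation only: these victim calls are not part of the attack budget.
    """
    rng = random.Random(seed)
    agree = 0
    for _ in range(n_test):
        x = (rng.uniform(-2, 2), rng.uniform(-2, 2))
        agree += surrogate_predict(dataset, x) == int(query_fn(x) >= 0.5)
    return agree / n_test


stolen = evolve_boundary_queries(victim_predict, budget=200)
print("queries used:", len(stolen))
print("fidelity:", fidelity(stolen, victim_predict))
```

The sketch assumes the victim returns a confidence score. If only hard labels were available, the fitness would have to be replaced, for example by rewarding children whose label differs from their parent’s. The offline post-processing strategies mentioned in the abstract are not specified in this document and are therefore not shown.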