Project Topic

Project number, Department, Student names, Email, Supervisor names

Accelerating Neural Networks on a Low-Resource Processor

HW implementation of DNN in an embedded processor

Hebrew Abstract

In recent years, neural networks have become a central topic as part of the rapid development of artificial intelligence.

However, the high computational power required for machine-learning inference on Internet of Things (IoT) devices, which are built around tiny processors, prevents running the software directly on these devices.

The goal of this project is to efficiently map the computational load onto a combined hardware and software framework in order to accelerate machine-learning inference on edge devices, using the TensorFlow Lite for Microcontrollers (TFLM) software framework running on a low-power processor together with a hardware accelerator.

Our proposed method uses an FPGA board on which the acceleration layers are implemented. These hardware layer accelerators are integrated with the TFLM software framework. We then optimize the accelerators and move to running multiple cores in parallel.

By parallelizing the accelerators, we achieved a speedup that grows linearly with the number of accelerators (717x relative to software-only execution) while preserving the system's accuracy.
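One way to picture how work is divided among parallel accelerator cores is to split a convolution layer's output channels between them. Each output channel depends only on the input and its own filter, so channel ranges can be computed independently, and with an even split each of N cores handles roughly 1/N of the work. The C sketch below is a hypothetical illustration, not the project's FPGA design: the layer shape, the flat data layout, and the names `conv_core` and `conv_partitioned` are assumptions, `conv_core` merely stands in for one accelerator core, and the cores are simulated one after another rather than concurrently.

```c
#include <assert.h>

/* Hypothetical layer shape, chosen only for illustration. */
#define IN_H  6
#define IN_W  6
#define IN_C  2
#define K     3
#define OUT_C 5
#define OUT_H (IN_H - K + 1)
#define OUT_W (IN_W - K + 1)

/* Flat HWC layout for input/output; [OUT_C][K][K][IN_C] for weights. */
#define IN_IDX(y, x, c)       (((y) * IN_W + (x)) * IN_C + (c))
#define OUT_IDX(y, x, c)      (((y) * OUT_W + (x)) * OUT_C + (c))
#define W_IDX(oc, ky, kx, ic) ((((oc) * K + (ky)) * K + (kx)) * IN_C + (ic))

/* Stand-in for one (hypothetical) accelerator core: computes output
   channels [c_begin, c_end) of a stride-1, unpadded convolution. */
static void conv_core(const int *in, const int *w, int *out,
                      int c_begin, int c_end)
{
    for (int oc = c_begin; oc < c_end; ++oc)
        for (int y = 0; y < OUT_H; ++y)
            for (int x = 0; x < OUT_W; ++x) {
                int acc = 0;
                for (int ky = 0; ky < K; ++ky)
                    for (int kx = 0; kx < K; ++kx)
                        for (int ic = 0; ic < IN_C; ++ic)
                            acc += in[IN_IDX(y + ky, x + kx, ic)] *
                                   w[W_IDX(oc, ky, kx, ic)];
                out[OUT_IDX(y, x, oc)] = acc;
            }
}

/* Split the output channels as evenly as possible over n_cores; the first
   OUT_C % n_cores cores each take one extra channel. */
static void conv_partitioned(const int *in, const int *w, int *out, int n_cores)
{
    assert(n_cores >= 1 && n_cores <= OUT_C);
    int base = OUT_C / n_cores, extra = OUT_C % n_cores, begin = 0;
    for (int core = 0; core < n_cores; ++core) {
        int count = base + (core < extra ? 1 : 0);
        /* On real hardware each range would go to its own core and the
           cores would run concurrently; here they run sequentially. */
        conv_core(in, w, out, begin, begin + count);
        begin += count;
    }
    assert(begin == OUT_C); /* every channel assigned exactly once */
}
```

Calling `conv_partitioned` with any core count from 1 to `OUT_C` should give the same output as the single-core call; only the assignment of channels to cores changes. In practice, data-transfer and synchronization overheads can keep the measured speedup below the ideal linear one.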

We conclude from these results that accelerating neural-network layers offers significant benefits. Many further optimization opportunities remain, with a good chance of yielding even more cost-effective results.

English Abstract

In recent years, neural networks have become one of today's main fields of interest as part of the rapid development of AI.

However, the high computational demand of machine-learning inference on tiny IoT devices prevents direct software deployment on such devices.

Our objective is to efficiently map the computational load onto hardware and software resources in order to accelerate machine-learning inference. To do so, we use a modified version of TFLM (TensorFlow Lite for Microcontrollers) running on a microcontroller together with a custom hardware accelerator.

Our proposed method implements and optimizes hardware-accelerated layers on an FPGA and integrates them into the TFLM software platform.
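TFLM executes each layer of a model through an operator kernel, and it ships portable reference kernels alongside optional optimized implementations. The C sketch below shows, under stated assumptions, a generic way to route a layer to a hardware path while keeping a software fallback; it does not use the TFLM API. The names `dense_kernel`, `run_dense`, `dense_accel`, `accel_calls`, and `ACCEL_LANES` are hypothetical, and `dense_accel` only simulates an accelerator in software so the example runs anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ACCEL_LANES 4 /* hypothetical number of parallel MAC lanes */

/* Hypothetical kernel signature: y[o] = b[o] + sum_i w[o][i] * x[i]. */
typedef void (*dense_fn)(const int *x, const int *w, const int *b,
                         int *y, int n_in, int n_out);

typedef struct {
    dense_fn accel;     /* hardware path; may be NULL if not implemented */
    dense_fn reference; /* portable software path; always required */
} dense_kernel;

static int accel_calls = 0; /* counts accelerator invocations */

/* Portable software path, analogous to a reference kernel. */
static void dense_reference(const int *x, const int *w, const int *b,
                            int *y, int n_in, int n_out)
{
    for (int o = 0; o < n_out; ++o) {
        int acc = b[o];
        for (int i = 0; i < n_in; ++i)
            acc += w[o * n_in + i] * x[i];
        y[o] = acc;
    }
}

/* Software simulation of an accelerator that consumes ACCEL_LANES inputs
   per step, as a fixed-width MAC array would. A real driver would instead
   write operands to the FPGA, start it, and read back the results. */
static void dense_accel(const int *x, const int *w, const int *b,
                        int *y, int n_in, int n_out)
{
    ++accel_calls;
    for (int o = 0; o < n_out; ++o) {
        int acc = b[o];
        for (int i0 = 0; i0 < n_in; i0 += ACCEL_LANES) {
            int partial = 0;
            for (int l = 0; l < ACCEL_LANES && i0 + l < n_in; ++l)
                partial += w[o * n_in + i0 + l] * x[i0 + l];
            acc += partial;
        }
        y[o] = acc;
    }
}

/* Use the accelerator only when it is present and implements this layer. */
static void run_dense(const dense_kernel *k, bool accel_present,
                      const int *x, const int *w, const int *b,
                      int *y, int n_in, int n_out)
{
    assert(k->reference != NULL);
    dense_fn fn = (accel_present && k->accel != NULL) ? k->accel : k->reference;
    fn(x, w, b, y, n_in, n_out);
}
```

Keeping the reference path also gives a simple correctness check: with integer arithmetic and no overflow, both paths must produce identical outputs, which is one way to confirm that acceleration leaves the network's accuracy unchanged.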

We evaluate our implementation using the MNIST dataset as a benchmark. The results show a substantial speedup (717x) in CNN performance compared with software-only execution.

We conclude from the results that accelerating the CNN layers has significant benefits. There is considerable scope for further optimization toward an even more economical solution.