Student name: Reyhan Kevser Keser
Thesis title: Signal Processing Based Knowledge Distillation for Model Compression
Date / time: 04.07.2024 / 10:30
Location: Informatics Institute (Bilişim Enstitüsü), Room 411
Advisor: Prof. Dr. Behçet Uğur Töreyin
Abstract:
Knowledge distillation is an effective tool for model training that refers to the process of transferring knowledge between models. In this context, the model whose knowledge is acquired is called the teacher, and the model trained with the injected knowledge is called the student. Knowledge distillation can serve various aims, including improving model performance, speeding up the model, and reducing the number of model parameters. Furthermore, with the advent of diverse distillation schemes, it can be applied efficiently to a variety of scenarios and problems. Consequently, it has a wide range of application fields, including computer vision and natural language processing.
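As a point of reference for the teacher-student setup described above, the sketch below shows the classic soft-target distillation loss in the style of Hinton et al. The abstract does not state which loss the thesis uses. The temperature `temperature` and the weighting `alpha` are illustrative assumptions, as are all function names.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Hypothetical helper: weighted sum of hard-label cross-entropy and
    soft-target KL divergence (classic Hinton-style distillation).

    The KL term is scaled by temperature**2 so that its gradient magnitude
    stays comparable to the hard-label term as the temperature changes.
    """
    # Soft targets from the teacher and softened predictions from the student.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Ordinary cross-entropy against the ground-truth class at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * ce + (1 - alpha) * (temperature ** 2) * kl
```

When the student already matches the teacher exactly, the KL term vanishes and only the weighted hard-label term remains. A student that disagrees with the teacher pays an extra penalty.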
This thesis comprises a literature review and studies on various problems in knowledge distillation. The first problem we focus on is hint position selection, an essential element of hint distillation, which transfers features extracted from intermediate layers, namely hints. First, we demonstrate the importance of determining the hint positions. Then, we propose an efficient hint position selection methodology based on layer clustering. For this purpose, we use the k-means algorithm with specially designed metrics for layer comparison. We validate our approach through comprehensive experiments on two well-known image classification datasets, covering various teacher-student architecture pairs, hint types, and hint distillation methods. The results indicate that the proposed method achieves superior performance compared to the conventional approach.
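The abstract does not specify the layer-comparison metrics. The sketch below therefore illustrates only the general idea of clustering layers with k-means and taking one hint position per cluster, and it rests on two assumptions.

- Each layer is summarised by a hypothetical numeric descriptor vector, and these vectors are compared with plain Euclidean distance.
- Within each cluster, the layer nearest the cluster centroid is chosen as the hint position.

Both choices are illustrative stand-ins, not the thesis's actual metrics or selection rule.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (stand-in for the thesis's layer metrics)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with a deterministic initialisation."""
    n = len(points)
    # Deterministic init: k layers evenly spaced along the network depth.
    centroids = [list(points[i * n // k]) for i in range(k)]
    assign = None
    for _ in range(iters):
        # Assignment step: attach each layer to its nearest centroid.
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: move each centroid to the mean of its member layers.
        for c in range(k):
            members = [points[i] for i in range(n) if assign[i] == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

def select_hint_layers(layer_descriptors, num_hints):
    """Hypothetical rule: one hint per cluster, the layer closest to its centroid."""
    assign, centroids = kmeans(layer_descriptors, num_hints)
    hints = []
    for c in range(num_hints):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            hints.append(min(members, key=lambda i: sq_dist(layer_descriptors[i], centroids[c])))
    return sorted(hints)
```

For example, six layers whose hypothetical one-dimensional descriptors form two clear groups, `[[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]`, yield hint positions `[1, 4]` with `num_hints=2`, one from the middle of each group.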
Another problem addressed in this thesis is model stealing, which refers to acquiring the knowledge of a model that its owner wants to protect due to privacy concerns or commercial interests. Since knowledge distillation can be exploited for model stealing, the concept of the undistillable teacher has recently been introduced, which aims to protect a model from having its knowledge stolen via distillation. To contribute to this field, we propose an approach called averager student, whose goal is to distill the undistillable teacher. We evaluate the proposed approach with both undistillable and normal teachers. The results suggest that the proposed method outperforms competing methods that share the same goal.
The last problem we address is cross distillation, that is, distillation between teacher and student models that operate on different modalities. In this work, we introduce a cross distillation scheme that transfers compressed domain knowledge to the pixel domain. Furthermore, we employ hint distillation using our previously proposed hint selection method. We evaluate our approach on two computer vision tasks, namely object detection and recognition. The results demonstrate that, with the proposed approach, compressed domain knowledge can be efficiently exploited for a task in the pixel domain.
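In a cross distillation setting, the teacher's hint (here, from a compressed domain model) and the student's hint (from a pixel domain model) generally differ in size. A common way to compare them is the FitNets-style hint loss. A learned regressor maps the student feature to the teacher's dimensionality, and the squared error between the two is then penalised. The sketch below assumes a linear regressor and flattened feature vectors. The abstract does not say which hint loss the thesis uses, and the function names are hypothetical.

```python
def matvec(weights, vector):
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def hint_loss(student_hint, teacher_hint, regressor):
    """Hypothetical FitNets-style hint loss: mean squared error between the
    teacher's intermediate feature and the student's feature after a linear
    regressor projects it to the teacher's dimensionality."""
    projected = matvec(regressor, student_hint)
    if len(projected) != len(teacher_hint):
        raise ValueError("regressor output size must match the teacher hint size")
    return sum((p - t) ** 2 for p, t in zip(projected, teacher_hint)) / len(teacher_hint)
```

In training, the regressor weights would be learned jointly with the student. They are fixed here only to keep the example small. With `student_hint=[1.0, 2.0, 3.0]`, `teacher_hint=[2.0, 3.0]`, and a regressor that selects the last two student features, the loss is `0.0`.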