As large models advance, there’s growing demand to use knowledge distillation to produce smaller, more portable student models that match ...