
Knowledge Distillation

Knowledge distillation was first proposed by Hinton et al. in 2015, in the paper "Distilling the Knowledge in a Neural Network".

Knowledge Distillation (KD for short) is, as the name suggests, the process of extracting the knowledge ("Knowledge") contained in an already-trained model and distilling ("Distill") it into another model.

Q&A

Q: What are the Student model and the Teacher model in Hinton's KD? A: KD assumes two models, a large, cumbersome Teacher model and a compact Student model. The core idea is to first train the Teacher model, and then use the Teacher model's outputs (soft targets) to train the Student model, so that the knowledge captured by the Teacher model is transferred to the Student model.

Q: Why not just train the small model directly? A: The high-capacity Teacher model can extract far more structure from the data than a small model trained on hard labels alone; once the Teacher model is trained, that knowledge can be distilled into a student model that is small enough to deploy.

Background: why train a big model only to replace it?

The generalization ability of a model hinges on matching model capacity to the data: too little capacity leads to underfitting, too much capacity (relative to the data) leads to overfitting. In real-world AI applications this creates a mismatch between training and deployment:

At training time we want as much capacity as possible (low bias), often a very large model or even an ensemble of models (ensembling further reduces variance), so as to extract the structure hidden in large, highly redundant datasets.

At deployment time the constraints are reversed: inference speed, memory, and compute budgets are all tightly limited, so a large model or an ensemble is impractical to serve.

Knowledge distillation resolves this mismatch: first let a large model (or ensemble) soak up the knowledge in the data, then transfer that knowledge into a small model suitable for deployment.

Knowledge distillation uses the Teacher-Student paradigm: the complex, cumbersome model acts as the teacher and the small model acts as the student. Accordingly, training involves 2 roles:

Original model training: train the Teacher model, written Net-T below. No constraints are placed on the Teacher model's architecture or parameter count; the only requirement is that for any input X it produces an output Y that, passed through a softmax mapping, yields a probability for each class.

Reduced model training: train the Student model, written Net-S below, a model with a small parameter count and a relatively simple structure. Likewise, for any input X it produces an output Y that, after softmax, yields the class probabilities.

In both cases the final layer is the standard softmax, $q_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)}$, which maps the logits $z_i$ to class probabilities $q_i$.

Key points of knowledge distillation (why it works):

The most fundamental goal of machine learning is to train models that generalize well to unseen data (a basic point that is easily forgotten once we get absorbed in the details).

Since Net-T is already such a strongly generalizing model, when we use Net-T to distill Net-S we can have Net-S learn Net-T's behavior directly, so that Net-S inherits Net-T's generalization ability instead of merely fitting the training labels.

A simple and efficient way to transfer this generalization ability is to use the class probabilities output by Net-T's softmax layer as soft targets.

In conventional training, the ground-truth labels serve as hard targets (one-hot distributions) for the softmax output; in KD, Net-T's softmax output additionally serves as a soft target in the loss that guides the training of Net-S.

An example: in MNIST handwritten-digit recognition, one input "2" may look rather like a "3", so in the softmax output the probability of "3" is, say, 0.1 while the other negative labels get far smaller values; another "2" may look more like a "7", giving "7" a probability of 0.1 instead. The hard targets of the two "2"s are identical, yet their soft targets differ: the soft target carries strictly more information than the hard target, namely which other classes an input resembles.
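To make this concrete, here is a toy sketch in Python; the probability values are invented for illustration, not taken from a real Net-T:

```python
import numpy as np

num_classes = 10
hard_target = np.eye(num_classes)[2]   # one-hot "2": identical for every "2"

# Hypothetical Net-T soft targets for two different handwritten "2"s:
soft_2_like_3 = np.array([.001, .002, .88, .1,   .002, .002, .002, .005, .004, .002])
soft_2_like_7 = np.array([.001, .002, .88, .005, .002, .002, .002, .1,   .004, .002])

# Same hard target, different soft targets: the soft targets additionally
# encode which classes each input resembles ("3" vs. "7").
```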

This explains why a Net-S trained by distillation generalizes better than the same architecture trained on the same data with hard targets alone.

Using Net-T's ordinary softmax output directly as the soft target raises a problem, though: when the softmax distribution has low entropy, the probabilities of the negative labels are all close to 0 and contribute almost nothing to the loss. The fix is to introduce a "temperature" T into the softmax:

$$q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

The higher T is, the smoother the output distribution becomes and the larger its entropy; the information carried by the negative labels is relatively amplified, and training pays more attention to them.
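A minimal sketch of the temperature-scaled softmax (NumPy; the function name is my own):

```python
import numpy as np

def softmax_t(logits, T=1.0):
    """Temperature-scaled softmax: q_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                     # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([5.0, 2.0, -1.0])
print(softmax_t(logits, T=1.0))  # sharp: ~[0.95, 0.05, 0.00], negatives near 0
print(softmax_t(logits, T=5.0))  # softer: ~[0.54, 0.30, 0.16], negatives amplified
```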

The generic KD procedure is: first train Net-T; then, at a high temperature T, distill Net-T's knowledge into Net-S. The objective for training Net-S is a weighted sum of a distill loss (against the soft targets) and a student loss (against the hard targets): $L = \alpha L_{soft} + \beta L_{hard}$.

Net-T and Net-S are fed the same transfer set (which can simply reuse Net-T's training set). Net-T's softmax distribution (with high temperature) serves as the soft target, and $L_{soft}$ is the cross entropy between it and Net-S's softmax output at the same temperature T:

$$L_{soft} = -\sum_j p_j^T \log(q_j^T)$$

where $p_j^T = \frac{\exp(v_j/T)}{\sum_k \exp(v_k/T)}$ is Net-T's softened output (logits $v$) and $q_j^T = \frac{\exp(z_j/T)}{\sum_k \exp(z_k/T)}$ is Net-S's (logits $z$).

The second term is easy to motivate: Net-T has a nonzero error rate of its own, and including the ground truth reduces the chance that the Teacher's mistakes are propagated to Net-S. $L_{hard}$ is the ordinary cross entropy between Net-S's softmax output at T = 1 and the ground-truth label:

$$L_{hard} = -\sum_j c_j \log(q_j^1)$$

where $c_j \in \{0, 1\}$ is the one-hot ground truth.

Because the gradients produced by the soft targets are scaled by $1/T^2$ relative to those produced by the hard targets, the soft-target loss must be multiplied by $T^2$ when the two are used together, so that both contributions stay on the same order of magnitude.

(ps. In the paper, Hinton reports that the best results were generally obtained with a considerably lower weight on the hard-target loss.)
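Putting the pieces together, a minimal PyTorch sketch of the objective described above; the function name and the default values of T, alpha, and beta are my own choices, not prescribed by the paper:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9, beta=0.1):
    """Combined distillation objective: L = alpha * T^2 * L_soft + beta * L_hard."""
    # L_soft: cross entropy between Net-T's and Net-S's distributions,
    # both softened with the same temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_q = F.log_softmax(student_logits / T, dim=1)
    l_soft = -(soft_targets * log_q).sum(dim=1).mean()
    # L_hard: ordinary cross entropy with the ground-truth labels (T = 1).
    l_hard = F.cross_entropy(student_logits, labels)
    # Multiply L_soft by T^2 so its gradients match L_hard's scale;
    # beta < alpha reflects the note above about down-weighting the hard loss.
    return alpha * (T ** 2) * l_soft + beta * l_hard

# usage: logits have shape [batch, num_classes]; labels are class indices
# loss = kd_loss(net_s(x), net_t(x).detach(), y)
```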

Once Net-S has been trained, inference simply uses the ordinary softmax at temperature T = 1.

A special case: directly matching logits. Instead of using the softmax outputs as soft targets, one can use the inputs to the softmax, the logits, and minimize the squared difference between Net-T's and Net-S's logits. Hinton shows this is a special case of temperature-based distillation in the limit of large T.
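To see why, differentiate $L_{soft}$ with respect to Net-S's logits (a sketch of the argument from the paper, with N classes and zero-mean logits assumed):

$$\frac{\partial L_{soft}}{\partial z_i} = \frac{1}{T}\left(q_i^T - p_i^T\right) = \frac{1}{T}\left(\frac{e^{z_i/T}}{\sum_j e^{z_j/T}} - \frac{e^{v_i/T}}{\sum_j e^{v_j/T}}\right)$$

When T is large relative to the logits, $e^{x/T} \approx 1 + x/T$, and if the logits are zero-mean per example ($\sum_j z_j = \sum_j v_j = 0$),

$$\frac{\partial L_{soft}}{\partial z_i} \approx \frac{1}{T}\left(\frac{1 + z_i/T}{N} - \frac{1 + v_i/T}{N}\right) = \frac{1}{N T^2}(z_i - v_i)$$

which is exactly the gradient of $\frac{1}{2NT^2}\sum_i (z_i - v_i)^2$: in the high-temperature limit, distillation reduces to minimizing the squared difference of the logits.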

A question about the temperature: how should T be chosen? The temperature controls how much attention Net-S pays to the negative labels during training: at low temperatures their influence is suppressed (especially for labels well below the average), while at high temperatures their values are relatively amplified and Net-S attends to them more.

The negative labels do carry information, especially the ones noticeably above the average, but because of how Net-T is trained, the negative-label portion of its output is relatively noisy. Choosing the temperature is therefore empirical, essentially a trade-off between two things:

1) learning from the informative negative labels, which calls for a higher T; and 2) suppressing the noise in them, which calls for a lower T. The choice of T should also take Net-S's size into account: when Net-S is small, a relatively low temperature works well, since a small model cannot capture all the knowledge anyway and can afford to ignore some of the negative-label information.

Reference: nervanasystems.github.io/distiller/knowledge_distillation.html
