Simplifying micrograd and Optimizing the Code
Andrej Karpathy reduced his automatic differentiation library micrograd from 243 lines to 200. In the improved design, each operation expresses only the bare fundamentals of gradient chaining.
Original text
I spent more test time compute and realized that my micrograd can be dramatically simplified even further. You just return local gradients for each op and get backward() to do the multiply (chaining) with global gradient from loss. So each op just expresses the bare fundamentals of what it needs to: the forward computation and the backward gradients for it. Huge savings from 243 lines of code to just 200 (~18%). Also, the code now fits even more beautifully to 3 columns and happens to break just right:

Column 1: Dataset, Tokenizer, Autograd
Column 2: GPT model
Column 3: Training, Inference

Ok now surely we are done.
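The design described above can be sketched in a few lines of Python. This is not Karpathy's actual code, just an illustration of the idea: each op returns only its local gradients, and a single backward() pass does the chain-rule multiply with the global gradient flowing back from the loss.

```python
class Value:
    """Scalar autograd node. Each op stores only its local gradients;
    backward() does the chain-rule multiply with the global gradient."""

    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # input Values this op consumed
        self._local_grads = local_grads  # d(out)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # local gradient of addition is 1 w.r.t. both inputs
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # local gradient of multiplication is the opposite operand
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # topological order so each node's grad is complete before
        # it is propagated to its children
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            # chain rule: global grad times each op's local gradient
            for child, lg in zip(v._children, v._local_grads):
                child.grad += lg * v.grad

# usage: for out = a*b + a, d(out)/da = b + 1 = 4, d(out)/db = a = 2
a, b = Value(2.0), Value(3.0)
out = a * b + a
out.backward()
```

The point of the refactor is visible here: the ops (`__add__`, `__mul__`) no longer carry any backward logic of their own, only the forward computation and the local derivative values; all chaining lives in one place, inside backward().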