familiarcycle

More notes on GPT-2 fine-tuning

2/4/2020

I'm currently fine-tuning GPT-2 on the full 43MB catalog of Accidental Tech Podcast transcripts. It's running on a GCP TPUv2-8 (and previously on a v3-8).

Some notes and lessons learned: