New year, new blog, new decoders
This will be the last post on the Medium blog as I’m now set up on blog.georeactor.com. I’m hoping that the next post can be a new “ML Arxiv Haul”, with new formatting.
I added topics/tags (such as maps), each with its own RSS feed, so people can subscribe just to the Arxiv posts, etc.
I started a post about country codes, flags, and TLDs, which I’m pleased has already made an impact! I convinced Emojipedia to set aside a page for the Flag of Sark — emojipedia.org/flag-sark/
I also started making commits to decoder-ring, a library to control the output of text-generation models.
The idea came about six months ago, when I was preparing for ML Prague and had trouble demoing decoders. The text-generation function in transformers accepts parameters covering every possible decoder, then silently runs whichever one appears to fit. So if I want to do Typical Decoding, I pass a value for typical_p. But if I forget to also pass do_sample=True, or pass a value outside the valid range… [earlier this year] it would silently switch to another decoder. The function also accepts arbitrary **kwargs to pass on to models, so coding mistakes are silently ignored.
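Here is a minimal sketch of that failure mode, using GPT-2 only as a convenient small model; at the time, generate() would quietly fall back to greedy decoding, though more recent transformers versions may warn about unused flags:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Typical decoding should pick", return_tensors="pt")

# typical_p is set, but without do_sample=True the call falls back to
# greedy decoding, so typical_p has no effect on the output
greedy_by_accident = model.generate(**inputs, typical_p=0.2, max_new_tokens=20)

# sampling must be enabled explicitly for Typical Decoding to actually run
typical = model.generate(**inputs, typical_p=0.2, do_sample=True, max_new_tokens=20)

print(tokenizer.decode(greedy_by_accident[0], skip_special_tokens=True))
print(tokenizer.decode(typical[0], skip_special_tokens=True))
```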
Transformers has done a bit of a rewrite, but passed on most of my suggestions around logging the actual decoder or raising errors.
decoder-ring offers a chance to separate out text generation and explicitly set a decoder with its own parameters and error handling. I'd like to support several newer decoders (such as RankGen, contrastive decoding, or time control) without having to make the case that each one will be notable and necessary for every transformers user. In the next year I'd like to see text diffusion, reproducible runs, and some kind of decoder + probability visualization beyond the model's own next-token probabilities.
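To make the idea concrete, here is a hypothetical sketch of what an explicit decoder object could look like; this is not decoder-ring's actual API, just an illustration of validating parameters up front and making the decoder choice unambiguous:

```python
# Hypothetical illustration only; not decoder-ring's actual interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

class TypicalDecoder:
    """Typical Decoding as an explicit decoder object with its own validation."""

    def __init__(self, typical_p: float):
        if not 0.0 < typical_p <= 1.0:
            # fail loudly instead of silently switching to another decoder
            raise ValueError(f"typical_p must be in (0, 1], got {typical_p}")
        self.typical_p = typical_p

    def generate(self, model, tokenizer, prompt: str, max_new_tokens: int = 20):
        inputs = tokenizer(prompt, return_tensors="pt")
        # do_sample is always set here, so Typical Decoding actually runs
        return model.generate(
            **inputs,
            do_sample=True,
            typical_p=self.typical_p,
            max_new_tokens=max_new_tokens,
        )

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
decoder = TypicalDecoder(typical_p=0.2)
output = decoder.generate(model, tokenizer, "The new decoder")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```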