/vedantasp/blogs
Experiments.
I have been working with Niklas Nolte, exploring discrete diffusion and how it performs on chess problems. There is another paper whose existence helped me figure out roughly how the tokenisation would work, but it was not that simple.
To train on chess datasets, what we need to change is how the tokeniser interprets tokens. For chess, I had to feed it all the valid tokens possible in chess for it to work: if a large number of tokens get mapped to [UNK], the train/loss shoots up to NaN. It took me hours to figure out a minor bug because I kept overcomplicating it, when all that was needed was a long, hard look at the code.
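To make the [UNK] problem concrete, here is a minimal sketch of the idea, not the actual tokeniser from these experiments. It assumes moves are written as UCI strings (like `e2e4`), and the names `build_move_vocab`, `encode`, `unk_rate` and the `[PAD]`/`[MASK]` special tokens are all hypothetical, picked just for illustration. The vocabulary is a deliberate over-approximation: every from-square/to-square pair plus the promotion moves, so any legal move is covered even though some entries can never occur in a real game.

```python
FILES = "abcdefgh"
RANKS = "12345678"
# Assumed special tokens; only [UNK] is mentioned in the post.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[MASK]"]


def build_move_vocab():
    """Map every UCI-style move string to an integer id."""
    squares = [f + r for f in FILES for r in RANKS]
    # Every ordered pair of distinct squares: 64 * 63 = 4032 moves.
    moves = [a + b for a in squares for b in squares if a != b]
    # Promotions: a pawn steps from rank 7 to 8 (white) or 2 to 1 (black),
    # straight ahead or capturing onto a neighbouring file.
    for src_rank, dst_rank in (("7", "8"), ("2", "1")):
        for i, f in enumerate(FILES):
            for j in (i - 1, i, i + 1):
                if 0 <= j < 8:
                    for piece in "qrbn":
                        moves.append(f + src_rank + FILES[j] + dst_rank + piece)
    tokens = SPECIAL_TOKENS + moves
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(moves, vocab):
    """Turn move strings into ids, falling back to [UNK] for anything unseen."""
    unk = vocab["[UNK]"]
    return [vocab.get(m, unk) for m in moves]


def unk_rate(moves, vocab):
    """Fraction of moves that fall back to [UNK]."""
    ids = encode(moves, vocab)
    if not ids:
        return 0.0
    return ids.count(vocab["[UNK]"]) / len(ids)


vocab = build_move_vocab()
print(len(vocab))                                # 4211 = 3 special + 4032 + 176
print(unk_rate(["e2e4", "e7e5", "g1f3"], vocab))  # 0.0, all UCI moves are known
print(unk_rate(["e2e4", "Nf3"], vocab))           # 0.5, "Nf3" is SAN, not UCI
```

Running something like `unk_rate` over the dataset before training is a cheap sanity check: a non-zero rate usually means the data is in a different notation than the vocabulary expects.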
The train/loss, especially for the first few epochs, hovers around 0.5. I think I can get it lower by making some more changes to the tokenisation, since there are still some unknown tokens.
Figuring out what causes what can come after I at least run the basic experiments till 20. For now, it trains faster than my runs from last week, which were taking 2:50 hours for no reason.
19 Jan 2025