
I don't know if anyone still does it, but a few years ago there were a lot of papers suggesting more or less clever alternatives to ReLU as the activation function. There was also a whole zoo of optimizers proposed as alternatives to SGD.
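
For example, something in the spirit of Swish from Ramachandran et al.'s "Searching for Activation Functions" is only a few lines to try yourself. A minimal PyTorch sketch (my own toy version, not anyone's reference implementation):

    # Minimal sketch, assuming PyTorch is installed. Swish (x * sigmoid(x))
    # as a drop-in replacement for nn.ReLU().
    import torch
    import torch.nn as nn

    class Swish(nn.Module):
        def forward(self, x):
            return x * torch.sigmoid(x)

    # Use it like any other activation:
    model = nn.Sequential(
        nn.Linear(784, 256),
        Swish(),              # instead of nn.ReLU()
        nn.Linear(256, 10),
    )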

Those papers were within reach for me. Even if the math (or the colossal search effort) needed to find them was out of reach, implementing them wasn't.

There were some things besides optimizers and activation functions too. In particular I remember Dmitry Ulyanov's "Deep Image Prior" paper. He did publish code, but the thing he explored - using the implicit structure in a model architecture without training (or, training on just your input data!) - is actually dead simple to try yourself.
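
If you want to poke at that idea, the core loop is tiny. A minimal PyTorch sketch under simplifying assumptions (my own toy denoising version, not Ulyanov's code or architecture):

    # Minimal sketch of the Deep Image Prior idea, assuming PyTorch.
    # The network is NOT pretrained: fit a randomly initialized CNN to a
    # single noisy image and stop early; the architecture's bias toward
    # natural images does the denoising. Layer sizes are illustrative.
    import torch
    import torch.nn as nn

    noisy = torch.rand(1, 3, 256, 256)    # stand-in for your noisy image
    z = torch.randn(1, 32, 256, 256)      # fixed random input code

    net = nn.Sequential(
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
    )

    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for step in range(2000):              # stopping early is the key trick
        opt.zero_grad()
        loss = ((net(z) - noisy) ** 2).mean()
        loss.backward()
        opt.step()

    denoised = net(z).detach()            # output before it overfits the noise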

I'm sure if you just drink from the firehose of the arXiv AI/ML feeds, you'll find something that tickles your interest and that you can actually implement - or at least play with the published code.



