arent a lot of transformers built in a way where attention is only applied to previous tokens in sequence, even though its fully possible to apply it both ways?
That's the autoregressive aspect. The decoder aspect is that the last layer converts representations into output sequences (and the generation happens autoregressively, one at a time). Similarly at the last layer an encoder outputs a representation/embedding (while being able to attend to the entire sequence).
The term for models that look only at previous tokens in the sequence is auto-regressive.
Encoder and decoder has nothing to do with this.