Of course it can. A typical structure for a lossless compressor is a predictor feeding into an entropy encoder. Your predictor tells you how likely each next token is, and the entropy encoder uses that information to encode more likely tokens with fewer bits. As long as the compressor and decompressor make the same predictions, you’ll be able to reconstruct the original data exactly. And if the predictions are good, you’ll compress the data well.
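To make that concrete, here's a minimal sketch of an arithmetic encoder driven by an arbitrary predictor (the names `encode` and `predict` are mine, and exact fractions stand in for the fixed-precision renormalization tricks a real coder would use). The decoder, not shown, would run the same `predict` function and narrow the same intervals to recover the symbols exactly.

```python
from fractions import Fraction
from math import ceil

def encode(symbols, predict):
    """Toy arithmetic encoder. `predict(prefix)` returns a dict
    {symbol: Fraction} summing to 1; the decoder must use the same one."""
    low, high = Fraction(0), Fraction(1)
    for i, sym in enumerate(symbols):
        probs = predict(symbols[:i])
        width = high - low
        cum = Fraction(0)
        # Split [low, high) into sub-intervals sized by probability and
        # keep the one belonging to the symbol that actually occurred.
        for s in sorted(probs):
            if s == sym:
                low, high = low + width * cum, low + width * (cum + probs[s])
                break
            cum += probs[s]
    # Emit just enough bits to pin down a point inside the final interval;
    # a likely sequence leaves a wide interval and so needs fewer bits.
    nbits = 1
    while Fraction(1, 2 ** nbits) > (high - low) / 2:
        nbits += 1
    return ceil(low * 2 ** nbits), nbits

# Toy predictor that ignores context and heavily favours 0:
dist = {0: Fraction(9, 10), 1: Fraction(1, 10)}
code, nbits = encode([0, 0, 0, 0, 0, 0, 0, 1], lambda prefix: dist)
print(nbits)  # 6 bits for 8 symbols; the constant overhead shrinks for longer inputs
```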
An LLM (specifically the type people usually mean, like ChatGPT) uses a big transformer network to assign probabilities* to all possible next tokens. To generate text, you sample from that distribution (usually skewed toward the most likely tokens), then repeat. But you can just leave out that sampling step and use the transformer as your predictor for compression.
* Actually the log of the probability, and in modern RLHF-tuned models they don't quite represent probabilities anymore, but anyway.
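Here's a rough sketch of what that looks like with an off-the-shelf model via Hugging Face transformers ("gpt2" is just a small example model, and instead of running a full coder, the loop only tallies the roughly -log2(p) bits the entropy encoder would charge for each token). In practice you'd feed these distributions, quantized to integers, into an arithmetic coder like the one sketched above, and you'd have to make sure both sides compute bit-identical probabilities, which is the fiddly part with floating-point models.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def next_token_probs(token_ids):
    """Distribution over every vocabulary token, given the context so far."""
    context = token_ids or [tok.bos_token_id]
    logits = model(torch.tensor([context])).logits[0, -1]
    return torch.softmax(logits, dim=-1)  # softmax turns the logits back into probabilities

ids = tok.encode("The quick brown fox jumps over the lazy dog")
bits = 0.0
for i, t in enumerate(ids):
    p = next_token_probs(ids[:i])[t].item()
    bits += -math.log2(p)  # ideal cost of this token under the model
print(f"about {bits:.1f} bits for {len(ids)} tokens")
```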