Well, it is difficult, because a media player needs a correct video stack (which means X11, DRI, Mesa, OpenGL, etc.) and a correct audio stack (which means PulseAudio), and those are quite hard to ship in a cross-platform way.
If you look at the Snap packages of VLC, a LARGE part is not VLC at all, but this graphics stack.
Why is it so hard? Do you mean that it would be extremely difficult to build the large number of binaries required for all the different Linux distros?