It does matter, because many people have access to the servers, and they can cooperate with government agencies voluntarily or involuntarily.
For example, if you use a "voice assistant" from some major company, what prevents the company from voluntarily sharing all the records with the government for the sake of national security? What prevents one of its employees from secretly sharing the data under some legal obligation?
The location of the server is really important here.
The point of E2E is that it doesn't matter who owns the server. "End to End" means end to end, not 'end to middle then decrypted then to end'. If it's ever decrypted anywhere but an endpoint it's not E2E, by definition.
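To make the "end to middle then decrypted then to end" distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: `Relay`, `xor_bytes` and `send_e2e` are made-up names, and the XOR with a pre-shared random key is a toy stand-in for a real cipher, not something any actual product is known to use. It only shows the shape of the idea: the server in the middle stores and forwards bytes it cannot read.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR data with a same-length key (encrypts and decrypts)."""
    if len(data) != len(key):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


class Relay:
    """Hypothetical untrusted server: keeps a copy of everything it forwards."""

    def __init__(self) -> None:
        self.seen: list[bytes] = []

    def forward(self, blob: bytes) -> bytes:
        self.seen.append(blob)  # whoever has server access can read this copy
        return blob


def send_e2e(message: bytes, key: bytes, relay: Relay) -> bytes:
    """Encrypt at the sender, pass through the relay, decrypt at the receiver."""
    ciphertext = xor_bytes(message, key)   # sender endpoint
    delivered = relay.forward(ciphertext)  # the middle only handles ciphertext
    return xor_bytes(delivered, key)       # receiver endpoint


if __name__ == "__main__":
    message = b"lights off at 23:00"
    # Assumption: both endpoints already share this key; the relay never gets it.
    key = secrets.token_bytes(len(message))
    relay = Relay()
    print(send_e2e(message, key, relay))  # the receiver recovers the plaintext
    print(relay.seen[0])                  # the relay's copy is only ciphertext
```

If the relay held the key and decrypted before forwarding, its stored copy would be plaintext, and that is exactly the case that is not E2E. Note that even here the relay still learns the message length and when it was sent.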
In the voice assistant example, the server is one of the endpoints. The point they're making is that E2E doesn't really help there if you can't be sure who is looking over shoulders at the other end. If both ends are under your control or scrutiny, then it doesn't matter who's hiding in the clouds... except for metadata, of course.
A typical IoT device talks to a server, where the received data is often stored for future use. If it used E2E, the server wouldn't be necessary; it would make more sense to connect directly inside the home WiFi network, for example.