That only applies to adds, subtracts, and register moves. 16-bit Booleans, shifts/rotates, multiplies, and load/store still need to be done with multiple instructions.
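As a rough sketch of what "multiple instructions" means here, a 16-bit AND on an 8-bit core ends up as one byte-wide operation per half. The helper name `and16_bytewise` is made up for illustration; the compiler does this split for you, so this only spells out the idea in C.

```c
#include <stdint.h>

/* Hypothetical helper: a 16-bit AND written as two 8-bit ANDs, roughly
   mirroring the one-operation-per-byte split an 8-bit core needs. */
uint16_t and16_bytewise(uint16_t a, uint16_t b)
{
    uint8_t lo = (uint8_t)(a & 0xFF) & (uint8_t)(b & 0xFF); /* low byte  */
    uint8_t hi = (uint8_t)(a >> 8) & (uint8_t)(b >> 8);     /* high byte */
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```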
I did a little mucking about on some AVR code of mine. Sometimes going from a uint8_t to a uint16_t saves a couple of bytes. Sometimes it adds a dozen.
In one case, changing an index in a for loop to an int took the code from 34024 bytes to 34018 (saved six bytes). But changing uint8_t i, j, k; to uint16_t i, j, k; compiled to 34068 bytes, a gain of 44 bytes.
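If you want to try this kind of comparison yourself, something like the pair below is a minimal starting point. It is not the code from my project; the function names and the summing loop are invented for the example, and the only difference between the two is the width of the loop index.

```c
#include <stdint.h>

/* Hypothetical test case: same loop, 8-bit index. */
uint16_t sum_u8_index(const uint8_t *buf, uint8_t len)
{
    uint16_t total = 0;
    for (uint8_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}

/* Hypothetical test case: same loop, 16-bit index. */
uint16_t sum_u16_index(const uint8_t *buf, uint8_t len)
{
    uint16_t total = 0;
    for (uint16_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}
```

Build each variant with your usual avr-gcc flags and compare the sizes that avr-size reports. Which one comes out smaller will likely depend on the compiler version, the optimization level, and the surrounding code, so treat any single result as a data point rather than a rule.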