I linked to the paper because it wasn't clear if OP's goal was to increase the security with the cascaded hashes or not, and also because I personally found this result very surprising and interesting when I first learned about it.
If you have a moment can you please elaborate a little more on your second paragraph? Are you describing applying a similar method inside the compression function of the hash function? Any hash function? Where does the parallelism come in? Thank you!
As for the second paragraph: it is not parallelism in the traditional sense; I used the word as a shortcut for describing the design. To put it in concrete terms: if you implement a 128-bit block cipher as two independent 64-bit-wide SPNs, you will not get the same security level as with a single 128-bit-wide SPN, and the reason is obvious from the SPN diagram: nothing ever diffuses from one half into the other. (The compression function of a hash function based on the Merkle-Damgård construction is effectively a block cipher with a somewhat large key.)
In fact, the paper even mentions the inverse of this in its sketch of a wider RIPEMD, where it recommends some mixing between the separate streams (specifically, exchanging one word of the state) between rounds, which the authors view as enough to make their multi-collision attack infeasible.
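To make the diffusion point concrete, here is a toy sketch in Python. Everything in it is my own assumption for illustration, not anything from the paper: two 8-bit lanes stand in for the two 64-bit halves, the round function (key XOR, a 4-bit S-box, a 3-bit rotation) is made up, the round keys are arbitrary, and swapping the low nibbles of the lanes stands in for "exchanging one word of the state". It has no real security; it only shows whether the right-hand output can depend on the left-hand input.

```python
# A bijective 4-bit S-box (the one used by PRESENT); any bijection would do here.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

# Arbitrary (left, right) round keys for a two-round toy cipher.
KEYS = [(0x3A, 0xC5), (0x91, 0x6E)]


def round_lane(x, key):
    """One toy SPN round on an 8-bit lane: key XOR, two S-boxes, rotate left by 3."""
    x ^= key
    x = (SBOX[x >> 4] << 4) | SBOX[x & 0xF]
    return ((x << 3) | (x >> 5)) & 0xFF


def encrypt(left, right, keys, mix):
    """Run both lanes; if mix is True, swap their low nibbles between rounds."""
    for i, (key_left, key_right) in enumerate(keys):
        left, right = round_lane(left, key_left), round_lane(right, key_right)
        if mix and i < len(keys) - 1:
            left, right = ((left & 0xF0) | (right & 0x0F),
                           (right & 0xF0) | (left & 0x0F))
    return left, right


def right_outputs(mix):
    """Distinct right-lane outputs seen while only the left input varies."""
    return {encrypt(left, 0x00, KEYS, mix)[1] for left in range(256)}


print("independent lanes:", len(right_outputs(mix=False)))
print("with nibble swap: ", len(right_outputs(mix=True)))
```

With independent lanes the right output takes a single value no matter what the left input is, so the two halves can be attacked separately. With the swap, the right output starts to vary with the left input, which is the kind of cross-stream dependence the mixing is meant to create.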