What I find most interesting is the multiple training methods used to get the network to improve its performance. They name a few in the article:

- dual learning
- deliberation networks
- joint training
- agreement regularization

I haven't read the paper to see how these are combined, but it makes intuitive sense that using multiple training methods can lead to better performance: together they may search the weight space of the network more effectively.