NN are _just_ transfer functions. Look up tables. Or really dense maps. So f(g(x)) make total sense. But I dont think these are the interesting combinations. I think giving one NN the training experience of another, plus the feedback on "correct inference" will be when on NN trains its replacement.