This is absolutely not cheating; every hand-designed algorithm can access and co...

rightbyte · on July 10, 2020

It is not a pointer to an array I'm concerned about, but the "neighbour diff vector" (or whatever you should call it) that is provided by the "environment". See A.1.2.

Doing so many comparisons and storing them has a cost. Also, the model can't decide whether it is done, so at each step the "environment" has to iterate over the array to check whether it is sorted. Are they only counting function calls? I guess so. The paper is really hard to follow, and the pseudocode syntax is quite maddening.
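To make that cost concrete, here is a minimal sketch. It assumes the environment recomputes the neighbour-diff vector (one comparison per adjacent pair) before every agent step and derives the "is it sorted?" check from that same vector. `observe`, `run_episode` and the swap-the-first-out-of-order-pair policy are hypothetical stand-ins for illustration, not the setup described in A.1.2.

```python
from typing import List, Tuple


def observe(a: List[int]) -> Tuple[List[int], bool, int]:
    """Hypothetical environment step: build the neighbour-diff vector and
    check sortedness. Returns (diffs, is_sorted, comparisons_used)."""
    # One comparison per adjacent pair: the sign of a[i+1] - a[i].
    diffs = [(a[i + 1] > a[i]) - (a[i + 1] < a[i]) for i in range(len(a) - 1)]
    comparisons = max(len(a) - 1, 0)
    # Sortedness falls out of the same vector: no negative entries.
    is_sorted = all(d >= 0 for d in diffs)
    return diffs, is_sorted, comparisons


def run_episode(a: List[int]) -> Tuple[int, int]:
    """Run a simple swap-based policy standing in for the learned agent.
    Returns (agent_calls, environment_comparisons)."""
    a = list(a)
    agent_calls = 0
    env_comparisons = 0
    while True:
        # The environment pays n - 1 comparisons before every decision.
        diffs, done, c = observe(a)
        env_comparisons += c
        if done:
            break
        # Policy: swap the first out-of-order neighbour pair it is shown.
        i = diffs.index(-1)
        a[i], a[i + 1] = a[i + 1], a[i]
        agent_calls += 1
    return agent_calls, env_comparisons


print(run_episode([3, 2, 1]))  # → (3, 8)
```

Counting only the agent's calls reports 3 for this input, while the environment performed 8 comparisons to produce the observations behind those calls. In general this toy environment spends (calls + 1) × (n - 1) comparisons, so the per-step observation work never appears in a call count.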

That is, if I understand the paper correctly; of course, I could be wrong.

If so, the claim that "our approach can learn to outperform custom-written solutions for a variety of problems" is bogus.

YeGoblynQueenne · on July 10, 2020

>> Are they only counting function calls? I guess so.

Oh, that. Yes, it's true. They list "average episode lengths" in Tables 1-3, and those are the main support for their claim of efficiency. By "episode length" they mean the number of instructions or function calls made by the student agent during training, which they compare with the instructions/function calls made by the teacher agent. So there is no asymptotic analysis, just a count of the concrete operations performed to solve, e.g., a sorting task.
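The difference between a count of calls and an asymptotic bound can be illustrated with a toy sketch. The names are hypothetical and this is not the paper's code. `episode_length` counts only the agent's swap calls, while the environment-side scan that finds the next out-of-order pair goes uncounted. `average_episode_length` averages that count over random permutations at a few fixed sizes, which is the kind of number a table of average episode lengths reports.

```python
import random
from statistics import mean
from typing import List, Optional


def episode_length(a: List[int]) -> int:
    """Count the calls a swap-based agent makes before the array is sorted.
    Only agent calls are counted, mirroring an 'episode length' metric;
    the environment's scan between calls is invisible in this number."""
    a = list(a)
    calls = 0
    while True:
        # Environment-side scan (not counted): first out-of-order pair, if any.
        i: Optional[int] = next(
            (j for j in range(len(a) - 1) if a[j] > a[j + 1]), None
        )
        if i is None:
            return calls
        a[i], a[i + 1] = a[i + 1], a[i]
        calls += 1


def average_episode_length(n: int, episodes: int = 200, seed: int = 0) -> float:
    """Average episode length over random permutations of size n."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(episodes):
        a = list(range(n))
        rng.shuffle(a)
        lengths.append(episode_length(a))
    return mean(lengths)


for n in (4, 8, 16):
    print(n, average_episode_length(n))
```

For this policy the count equals the number of inversions, which averages n(n-1)/4 on a random permutation, but nothing in the averaged figures themselves states that growth rate or the uncounted scanning work.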