Nice breakdown. Seeing each core as its own machine passing messages through cache lines makes the usual atomic patterns much easier to reason about. This mindset also explains why misuse of atomics can kill performance when you forget how caches talk to each other. Good mental model, bookmarked for future reference.
"You know reordering must have happened." As it stands, no reordering is necessary: it could simply be that both writes took effect at thread 2 in order, but with the write to *ptr_to_shared_1 only taking effect after the first print statement had already run.
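To make that concrete, here is a rough sketch of the argument, not the article's code. It assumes thread 1 writes *ptr_to_shared_1 and then *ptr_to_shared_2, that both locations start at 0 and are set to 1, and that thread 2's two print statements each read one of them. The names `Loc`, `Outcome` and `outcomes_without_reordering` are made up for illustration. The function enumerates every interleaving that keeps both threads in program order, so anything it returns can be observed with no reordering at all.

```cpp
#include <array>
#include <set>
#include <utility>

// Hypothetical model, not the article's code: the two shared locations.
enum class Loc { Shared1, Shared2 };

// (value seen by the first print, value seen by the second print)
using Outcome = std::pair<int, int>;

// Assumption: both locations start at 0 and thread 1 writes 1 to each,
// in the order given by `writes`. Thread 2 reads in the order given by `reads`.
// Returns every outcome reachable while keeping BOTH threads in program order.
std::set<Outcome> outcomes_without_reordering(std::array<Loc, 2> writes,
                                              std::array<Loc, 2> reads) {
    std::set<Outcome> result;
    // before_first:  how many writes have taken effect before the first read.
    // before_second: how many have taken effect before the second read.
    // Writes never "un-happen", so before_second >= before_first.
    for (int before_first = 0; before_first <= 2; ++before_first) {
        for (int before_second = before_first; before_second <= 2; ++before_second) {
            // Value of `loc` once the first `applied` writes have taken effect.
            auto visible = [&](Loc loc, int applied) {
                for (int i = 0; i < applied; ++i) {
                    if (writes[i] == loc) return 1;
                }
                return 0;
            };
            result.insert({visible(reads[0], before_first),
                           visible(reads[1], before_second)});
        }
    }
    return result;
}
```

Under those assumptions, if the prints read shared_1 and then shared_2, the outcome (0, 1) (stale shared_1, fresh shared_2) is in the returned set, so seeing it proves nothing about reordering. If the prints read shared_2 and then shared_1, the outcome (1, 0) is not in the set, and that is the observation that would actually force the conclusion that something was reordered.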