Hacker News

The old .doc format was never "a dump of Word's working memory", implying a copy of raw bytes. It's rather Word's internal object graph serialized into COM Structured Storage (https://en.wikipedia.org/wiki/COM_Structured_Storage), which is basically a FAT-like filesystem inside a single file. This is convenient for the app because it gets an FS-like API, can serialize data into as many "files" as is convenient, and can dynamically update data inside without rewriting the entire container file every time (which, back when this was all designed in the late 80s to early 90s, would have been slow).

Thus the reason you could end up with old bits of a Word document sticking around inside the .doc is the same reason your FS has bits of deleted files: the space has been marked as free, and nothing else has overwritten it yet.
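To make the "marked as free, not overwritten" point concrete, here is a minimal sketch of a toy container. It is not the real Structured Storage layout: the `ToyContainer` class, its method names, and the 16-byte sector size are all made up for illustration.

```python
SECTOR = 16  # toy sector size; an assumption chosen only to keep the demo small


class ToyContainer:
    """A toy single-file 'filesystem': fixed-size sectors plus a directory."""

    def __init__(self):
        self.data = bytearray()  # raw bytes of the container file
        self.free = []           # indexes of sectors marked as free
        self.streams = {}        # stream name -> (sector indexes, byte length)

    def _alloc(self):
        # Reuse a free sector if there is one, otherwise grow the file.
        if self.free:
            return self.free.pop(0)
        self.data.extend(b"\0" * SECTOR)
        return len(self.data) // SECTOR - 1

    def delete(self, name):
        # Deleting only updates bookkeeping; the sector bytes are left alone.
        sectors, _ = self.streams.pop(name, ([], 0))
        self.free.extend(sectors)

    def write(self, name, payload):
        self.delete(name)
        sectors = []
        for off in range(0, len(payload), SECTOR):
            idx = self._alloc()
            chunk = payload[off:off + SECTOR]
            self.data[idx * SECTOR: idx * SECTOR + len(chunk)] = chunk
            sectors.append(idx)
        self.streams[name] = (sectors, len(payload))

    def read(self, name):
        sectors, length = self.streams[name]
        raw = b"".join(bytes(self.data[i * SECTOR:(i + 1) * SECTOR]) for i in sectors)
        return raw[:length]


c = ToyContainer()
c.write("draft", b"TOP SECRET SALARY FIGURES: 123456")  # spans three sectors
c.delete("draft")                                       # all three marked free
c.write("body", b"Hello")                               # reuses only the first one

print(c.read("body"))              # the live document
print(b"FIGURES" in bytes(c.data)) # stale bytes from the deleted stream remain
```

The API only ever shows the live "body" stream, but anyone reading the raw file still finds most of the deleted draft in the sectors nothing has reused yet.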

But none of this applies to images, so the explanation here ought to be different.



> The old .doc format was never "a dump of Word's working memory", implying a copy of raw bytes. It's rather Word's internal object graph serialized into COM Structured Storage

Probably the "dump of Word's working memory" part emanates from Word for DOS, which predates COM by on the order of a decade.


MacWord 3.0+ (and then WinWord 1.1+) had fast-save which leaned on the in-memory piece-table data structure to write to disk only the changes to the Word document.

see https://web.archive.org/web/20160308183811/http://1017.songt...
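For anyone who hasn't met a piece table, here is a rough sketch of the idea and of why a fast save built on it can leave deleted text in the file. This is a toy model, not Word's actual on-disk format; `PieceTable`, `fast_save` and `full_save` are names invented for this example.

```python
class PieceTable:
    """Toy piece table: the text is a list of spans into two buffers."""

    def __init__(self, original):
        self.original = original  # text as first loaded; never modified
        self.added = ""           # append-only buffer of text typed later
        self.pieces = [("orig", 0, len(original))] if original else []

    def text(self):
        bufs = {"orig": self.original, "add": self.added}
        return "".join(bufs[src][start:start + n] for src, start, n in self.pieces)

    def insert(self, pos, s):
        new_piece = ("add", len(self.added), len(s))
        self.added += s
        out, offset, placed = [], 0, False
        for src, start, n in self.pieces:
            if not placed and offset <= pos <= offset + n:
                left = pos - offset
                if left:
                    out.append((src, start, left))
                out.append(new_piece)
                if n - left:
                    out.append((src, start + left, n - left))
                placed = True
            else:
                out.append((src, start, n))
            offset += n
        if not placed:
            out.append(new_piece)
        self.pieces = out

    def delete(self, pos, count):
        out, offset, end = [], 0, pos + count
        for src, start, n in self.pieces:
            piece_end = offset + n
            if offset < pos:        # keep the part before the deleted range
                out.append((src, start, min(n, pos - offset)))
            if piece_end > end:     # keep the part after the deleted range
                skip = max(0, end - offset)
                out.append((src, start + skip, n - skip))
            offset = piece_end
        self.pieces = out


def fast_save(pt):
    # Keeps both buffers as they are and records only the new piece list.
    return {"orig": pt.original, "add": pt.added, "pieces": list(pt.pieces)}


def full_save(pt):
    # Flattens the document: only the visible text survives.
    flat = pt.text()
    return {"orig": flat, "add": "", "pieces": [("orig", 0, len(flat))]}


pt = PieceTable("Salary: 90000. Regards.")
pt.delete(0, 15)
pt.insert(0, "Hi. ")
print(pt.text())                        # what the user sees
print("90000" in fast_save(pt)["orig"]) # deleted text still in the saved buffers
print("90000" in full_save(pt)["orig"]) # gone after a full rewrite
```

Edits never touch the buffers, only the list of pieces, so writing out just the changes is cheap. The price is that "deleted" text stays in the buffers until something rewrites the whole file.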

The COM structured storage Office file format came from OLE2 (object linking and embedding - one of the mid-90s must-have features).


It's something people have been parroting ever since this Spolsky post from 2008: https://www.joelonsoftware.com/2008/02/19/why-are-the-micros...


This is also why "Save As..." with old Word versions would often produce files much smaller than "Save" would - it was writing a brand new, compact file which was effectively "garbage collected".
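A rough sketch of that "garbage collection", using a made-up model where the container is just a list of sectors plus a directory of which sectors each stream uses (the `compact` function and the data layout are hypothetical, not the real file format):

```python
def compact(sectors, directory):
    """Copy only the sectors still referenced by the directory.

    sectors:   list of bytes objects, one per sector, stale ones included
    directory: dict mapping stream name -> list of sector indexes
    """
    new_sectors, new_directory = [], {}
    for name, chain in directory.items():
        new_chain = []
        for idx in chain:
            new_chain.append(len(new_sectors))  # position in the new file
            new_sectors.append(sectors[idx])
        new_directory[name] = new_chain
    return new_sectors, new_directory


# A container after several in-place saves: sectors 1 and 2 are unreferenced.
sectors = [b"Hello, ", b"old draft text", b"more old text", b"world"]
directory = {"body": [0, 3]}

new_sectors, new_directory = compact(sectors, directory)
print(len(sectors), "->", len(new_sectors))  # fewer sectors in the new file
print(new_directory)
```

Plain "Save" keeps working inside the existing container, so unreferenced sectors pile up; writing a fresh file only copies what the directory can still reach.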


This stemmed from the fast save feature which was present from Word 97 through 2003. Earlier versions didn't work that way.


Wow, COM Structured Storage. That brings back memories, but not necessarily good ones. And as for working with OLE... ouch.


It was hard on the developer, but some of the features it enabled were very impressive, like the ability to arbitrarily embed documents into other documents in a way that allows composite rendering as a single piece, without the app managing either part being aware of the nature of the other. In fact, I'm not aware of any modern equivalent of this tech, not even on Windows (since Office stopped supporting embedding of its documents via OLE by third-party apps).



