The hard part of that approach is probably handling conflicting signals and deciding which one is the "source of truth" in adverse situations. There are real-world variables that will trip up any given strategy (even stationing a human at each doorway with a hand counter). Nothing can count perfectly 100% of the time, so the difficulty is in finding an affordable strategy that is exact or very close most of the time.
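To make the "conflicting signals" problem concrete, here is a minimal sketch of one possible way to pick a source of truth: a confidence-weighted vote. It is not how any particular product works. The sensor names, the `Reading` type, the `fuse` function, and the confidence values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor: str        # hypothetical sensor name
    count: int         # that sensor's occupancy estimate
    confidence: float  # 0.0 to 1.0, assumed to come from calibration


def fuse(readings):
    """Return (count, conflict): the count backed by the most total
    confidence, and whether the sensors disagreed at all."""
    if not readings:
        raise ValueError("need at least one reading")
    totals = {}
    for r in readings:
        # Sum the confidence behind each distinct count.
        totals[r.count] = totals.get(r.count, 0.0) + r.confidence
    # On an exact tie, the count that was seen first wins.
    best = max(totals, key=totals.get)
    return best, len(totals) > 1


readings = [
    Reading("door_beam", 3, 0.6),
    Reading("camera", 4, 0.5),
    Reading("thermal", 4, 0.3),
]
count, conflict = fuse(readings)
```

Here the two weaker sensors outvote the single stronger one, and the `conflict` flag records that the answer was contested. Whether that is the right outcome in a given adverse situation is exactly the hard part.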
Here's a neat demo of a multi-sensing device (not ours) which combines many signals to guess the activity taking place in a room: https://www.youtube.com/watch?v=aqbKrrru2co