To feel connected to another person in VR, the delay between you speaking and the other person hearing you must be as small as possible. Studies of sensitivity to audio delay (including our own) suggest that one-way latency needs to stay below about 150 milliseconds for two people to speak comfortably. Lower is always better, though, and something as demanding as playing music together requires less than 15 milliseconds or so.
As with HMD sensor-to-photon latency, there are lots of things that can go wrong, and timing measured inside the software isn't sufficient because it misses delays added outside the code being timed. To measure this latency with certainty, we use the setup shown above. A two-channel oscilloscope is driven by the microphone and the headphones (we split them off at the analog input and output to the PC). A metronome (like the one you use for piano practice) makes a regular 'click' sound that triggers the first scope channel, and the second channel shows the 'echo' of the sound coming back from the server some milliseconds later. The true end-to-end delay can then be read off the display as the time between the two spikes, as you can see in the picture above.
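If the two scope channels were captured as digital samples instead of being read by eye, the same measurement could be sketched roughly as below. This is only an illustration, not the tooling we use: the function names, the fixed amplitude threshold, and the synthetic traces are all assumptions made for the example.

```python
from typing import Optional, Sequence


def first_crossing(samples: Sequence[float], threshold: float,
                   start: int = 0) -> Optional[int]:
    """Return the index of the first sample at or above the threshold."""
    for i in range(start, len(samples)):
        if abs(samples[i]) >= threshold:
            return i
    return None


def echo_delay_ms(mic: Sequence[float], headphone: Sequence[float],
                  sample_rate_hz: float,
                  threshold: float = 0.5) -> Optional[float]:
    """Time between the click on the mic channel and its echo on the
    headphone channel, in milliseconds (hypothetical helper)."""
    click = first_crossing(mic, threshold)
    if click is None:
        return None
    # The echo cannot arrive before the click, so search from there.
    echo = first_crossing(headphone, threshold, start=click)
    if echo is None:
        return None
    return (echo - click) * 1000.0 / sample_rate_hz


# Synthetic traces: 10 kHz sampling, click at sample 100, echo at 1300.
sample_rate_hz = 10_000
mic = [0.0] * 2000
headphone = [0.0] * 2000
mic[100] = 1.0
headphone[1300] = 0.8

print(echo_delay_ms(mic, headphone, sample_rate_hz))  # 120.0
```

A fixed threshold is the simplest possible spike detector, roughly what the scope's trigger level does. Real captures with background noise would likely need something more robust.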