Today I fixed a strange problem where BT Business Cloud Voice phones had a delay of about 10 seconds when answering a call or dialing out. After answering a call, the line would stay silent for a few seconds before finally working. The phones are Yealink T46 handsets supplied by BT. The problem would probably have been fixed a lot quicker if BT had given us access to the phone admin accounts so we could look at the logs. We did have the admin passwords when the system was first installed, but they have since been changed.
It seems like the Yealink phones do a DNS lookup every time you make or answer a call. The 10 second delay happened because there was a problem with the primary DNS server, so the phone waited 10 seconds for that lookup to fail before trying the secondary DNS server.
No other devices on the network were showing any problems. I guess the PCs just queried both DNS servers and used whichever replied first, and they would also have cached the DNS responses. So I got stuck going through the BT firewall setup guide, thinking it was maybe related to the NAT UDP session timeout (even though I knew it was set correctly), and didn't think about DNS at all. I still don't know why the phones pause to do a DNS lookup every single time you answer or dial, since they have an open session to the server all the time anyway.
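To make the difference concrete, here is a rough model of the two behaviours described above: a client that tries its DNS servers one at a time (what the phones appear to do) and a client that asks all of them at once (what I am guessing the PCs do). It is only a sketch, not how any particular device is implemented. The server names and the 30 ms reply time are made up for illustration; the 10 second timeout is the delay we actually saw.

```python
def sequential_lookup(servers, timeout_ms):
    """Try each server in order; a dead server costs a full timeout."""
    waited_ms = 0
    for name, reply_ms in servers:
        if reply_ms is not None and reply_ms <= timeout_ms:
            return name, waited_ms + reply_ms
        waited_ms += timeout_ms  # no usable answer, move on to the next server
    return None, waited_ms


def parallel_lookup(servers, timeout_ms):
    """Ask every server at once and take whichever answers first."""
    answers = [
        (reply_ms, name)
        for name, reply_ms in servers
        if reply_ms is not None and reply_ms <= timeout_ms
    ]
    if not answers:
        return None, timeout_ms
    reply_ms, name = min(answers)
    return name, reply_ms


# Hypothetical setup: primary never replies (None), secondary replies in 30 ms.
servers = [("isp-primary", None), ("google-secondary", 30)]

print(sequential_lookup(servers, timeout_ms=10_000))  # ('google-secondary', 10030)
print(parallel_lookup(servers, timeout_ms=10_000))    # ('google-secondary', 30)
```

With a dead primary, the sequential client pays the full timeout on every uncached lookup, while the parallel client never notices. That would explain why only the phones looked broken.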
I eventually found out there was a DNS problem when the IoT EV chargers went offline. I guess their DNS lookups were cached, but once the cached data expired and they needed to do a lookup again, it failed. Luckily I have access to the EV charger logs, so I could see what was going wrong. Otherwise we might still be waiting for BT to get back to us (actually we are still waiting; they went quiet after we asked to see the phones' event logs).
There is nothing strange about the local network. It's just a single Draytek router doing DHCP, with the ISP's DNS server as the primary and Google's public DNS as the secondary. The ISP's DNS server has either gone wrong or they changed it without telling us, so 99% of lookups just time out.
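If I had tested each DNS server on its own from the start, I would have found this much sooner. Below is a minimal sketch of that check using only the Python standard library: it sends one A record query straight to a given server over UDP and reports how long the reply took, or None if nothing came back. The function names, the test hostname and the 3 second timeout are my own choices for illustration, not anything from the phones or the router.

```python
import secrets
import socket
import struct
import time


def build_query(hostname, query_id):
    """Build a minimal DNS query packet for an A record."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name is encoded as length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    return header + qname + struct.pack("!HH", 1, 1)


def time_dns_server(server_ip, hostname="example.com", timeout=3.0, port=53):
    """Return seconds taken for the server to reply, or None on failure."""
    query_id = secrets.randbelow(0x10000)
    packet = build_query(hostname, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(packet, (server_ip, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable servers
            return None
        elapsed = time.monotonic() - start
    # Only count a reply whose ID matches the query we sent.
    if len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == query_id:
        return elapsed
    return None


# Example (needs network access), one call per server handed out by DHCP:
#   print(time_dns_server("8.8.8.8"))            # Google's public DNS
#   print(time_dns_server("<ISP DNS address>"))  # placeholder, use your own
```

It only checks that a matching reply arrived, not what the reply says, but that is enough to spot a server that is timing out. Running it a few times against each server would show straight away if one of them is failing most lookups.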