Rick Guyton Posted September 26, 2016

We seem to have never-ending issues with BLF. I understand that some at 2600hz kind of write this off and think it's just for "old line key" users. And that is true to some extent. But nonetheless it is critical to the operation of some of our clients' businesses. I've had two clients call me today to tell me that they are having more BLF issues, so I believe there is a problem once again. The frustrating part to me is that I don't know how I'm supposed to even start diagnosing these issues. Can someone, anyone, please give me a concrete method to diagnose and resolve problems with BLF?

I know that there's the presence tab under the debugging tool. But I have no idea what BLF states 1, 2, 3, 4 or 5 mean or what PR stands for. Even if I did, though, what about situations where some phones show different lights than others? Obviously, some phones will be in line with what the server's status is, while others won't. What can cause that? What can be done to prevent it?
Logicwrath Posted September 26, 2016

I need to be careful as I am not a subject matter expert on this. However, this type of thing is likely diagnosed using a tool like Wireshark. If you have a good understanding of the handshake, you can identify problems in the SIP responses.

I suspect in some cases your firewall can have an effect on this as well. Some firewalls do better than others with VoIP and NAT. You may also want to look at your NAT timeout values in the firewall. I am not sure if BLF registrations and re-registration times are different from the handset's register timeout.

https://doc.pfsense.org/index.php/VoIP_Configuration

The pfSense article above mentions UDP timeout. If your stateful firewall is timing out the NAT connection and the phone is not re-registering in time, I could imagine how that would cause a problem.

You may consider flushing the connection tracking table or reviewing the timeout and registration settings.

Here is a screenshot of the default MikroTik "Connection Tracking" settings.

http://prntscr.com/cmnou5

Again, take this lightly as I am by no means an expert. However, this is the kind of stuff I would start looking at. If you were good at Wireshark, you may be able to confirm whether the issue is on one side or the other (client vs 2600), for example.
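To make the timeout comparison above concrete, here is a minimal Python sketch. It only encodes the rule of thumb that the phone has to send something through the firewall more often than the firewall's UDP timeout. The function name and all the numbers are made up for illustration; the real values have to come from your own firewall and phone settings.

```python
def nat_mapping_survives(send_intervals_s, nat_udp_timeout_s):
    """Return True if the phone sends traffic often enough to keep the
    firewall's UDP mapping open.

    send_intervals_s: intervals (seconds) at which the phone sends anything
    to the server, e.g. re-register, re-subscribe, keepalive.
    nat_udp_timeout_s: the firewall's UDP idle timeout (seconds).
    """
    shortest_gap = min(send_intervals_s)  # the most frequent sender wins
    return shortest_gap < nat_udp_timeout_s


# Hypothetical settings, not taken from any real device.
print(nat_mapping_survives([3600, 1800], 180))      # no keepalive
print(nat_mapping_survives([3600, 1800, 30], 180))  # 30 s keepalive added
```

If the first case describes a site, the mapping can expire between refreshes, and server-initiated packets such as BLF updates may be dropped until the phone next sends something.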
Logicwrath Posted September 26, 2016

Here is a 3CX article that talks about the BLF handshake process:

http://www.3cx.com/blog/voip-howto/busy-lamp-field/

Additionally, the Yealink handsets can create a pcap that you can open in Wireshark, so if you have a recurring issue on a specific phone you can turn that on and then open the pcap in Wireshark.

Wireshark has some built-in tools for checking the flow of SIP traffic. It looks kind of like this: http://prntscr.com/cmnugv

Again, I am not an expert.
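As a rough illustration of what to look for in such a capture: in the standard SIP event flow, the phone sends a SUBSCRIBE, the server answers with a 2xx, and the server then sends NOTIFY messages carrying the state. The Python sketch below assumes the SIP messages have already been exported from the pcap as plain text (that export step is not shown), and flags any subscription that never got a 2xx or never saw a NOTIFY. The sample messages, addresses and Call-IDs are invented, and compact header names are not handled.

```python
import re
from collections import defaultdict


def parse_sip(message):
    """Pull the start line and a few headers out of one raw SIP message."""
    lines = message.strip().splitlines()
    start = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    status = re.match(r"SIP/2\.0 (\d{3})", start)
    cseq_parts = headers.get("cseq", "").split()
    return {
        "method": None if status else start.split()[0],
        "status": int(status.group(1)) if status else None,
        "call_id": headers.get("call-id", ""),
        "cseq_method": cseq_parts[-1] if cseq_parts else "",
    }


def incomplete_subscriptions(messages):
    """Group messages by Call-ID and return the subscriptions that lack
    either a 2xx answer to SUBSCRIBE or any NOTIFY."""
    seen = defaultdict(lambda: {"subscribed": False, "accepted": False, "notified": False})
    for raw in messages:
        msg = parse_sip(raw)
        entry = seen[msg["call_id"]]
        if msg["method"] == "SUBSCRIBE":
            entry["subscribed"] = True
        elif msg["method"] == "NOTIFY":
            entry["notified"] = True
        elif msg["status"] is not None and msg["cseq_method"] == "SUBSCRIBE":
            if 200 <= msg["status"] < 300:
                entry["accepted"] = True
    return {
        call_id: entry
        for call_id, entry in seen.items()
        if entry["subscribed"] and not (entry["accepted"] and entry["notified"])
    }


# Invented sample: subscription "a" completes, subscription "b" gets no answer.
SAMPLE = [
    "SUBSCRIBE sip:101@example.invalid SIP/2.0\nCall-ID: a@phone\nCSeq: 1 SUBSCRIBE\nEvent: dialog\n\n",
    "SIP/2.0 200 OK\nCall-ID: a@phone\nCSeq: 1 SUBSCRIBE\n\n",
    "NOTIFY sip:phone@192.0.2.10 SIP/2.0\nCall-ID: a@phone\nCSeq: 1 NOTIFY\nEvent: dialog\n\n",
    "SUBSCRIBE sip:102@example.invalid SIP/2.0\nCall-ID: b@phone\nCSeq: 1 SUBSCRIBE\nEvent: dialog\n\n",
]

for call_id, entry in incomplete_subscriptions(SAMPLE).items():
    print(call_id, entry)
```

A subscription that shows up here on one phone but not on its neighbours would at least narrow down which side of the conversation to look at next.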
Rick Guyton (Author) Posted September 26, 2016

Hrm, I don't think it's router config. One of my clients is using a highly standardized router and firmware across all their locations. They have no issues at most locations, but one location has them regularly.
Rick Guyton (Author) Posted September 26, 2016

Thanks, I really liked that 3CX article. It'll tell me what I'm looking at in a PCAP. I'm off to get a PCAP now. Thanks!
Tuly Posted September 27, 2016

Oh yeah, BLF is one of the hardest issues to troubleshoot. It looks like a small thing, but it's really hard to track down. I've already given up on the idea of having no BLF issues at all. What we do instead is make a change in our Yealink global file (same on Grandstream), a change that requires a phone reboot, so that every second or third week all phones are rebooted. In our experience, by rebooting the phones once a month we have no BLF issues.

I don't believe this is a router UDP timeout issue. If it were, your client wouldn't have been able to receive calls.
esoare Posted September 27, 2016

I am coming to that conclusion as well... I have Yealinks at a customer site: 16 phones with ParkSlot BLFs *3(101-105), plus 2 attendant consoles with BLFs for each user's presence. Last time it happened (3 days ago) I added a phone for the customer and changed the combo keys on the attendant consoles to add that user's presence. Not sure if the timing was a coincidence, but I ended up "restarting" the phone from the Advanced Provisioner module, and everything came up to snuff. Can that global file be modified through the RPS configs?
Tuly Posted September 27, 2016

Not sure about RPS, we are not using it yet (but I'm sure you could).

On Yealink we change this line from 3 to 2 and back:

syslog.log_level = 3

And on Grandstream we change anything in this field:

<P146>GXP1</P146>

(Not every change in a provisioning file will trigger the phone to reboot, only specific ones like a network change or core setting changes like the syslog level.)

It's not a solution, just a workaround. It's so difficult to troubleshoot because I'm sure there's nothing wrong on the server side; if there were, it would still be wrong after a reboot of the phone. I'm putting more blame on the phone, but not total blame. If the phone had an issue with a UDP timeout, or any other NAT router setting, you shouldn't receive phone calls at that time, or BLF should not work even after a restart, or it should start working again when the phone resubscribes.
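For anyone who wants to script the Yealink part of that workaround, here is a minimal Python sketch that flips the `syslog.log_level` line between 3 and 2 in the text of a config file. The function name is made up, how the edited file gets back onto the provisioning server is left out, and whether a given firmware actually reboots on this change is taken from the post above, not verified here.

```python
import re

# Matches a line such as "syslog.log_level = 3", keeping any trailing "\r".
LEVEL_LINE = re.compile(
    r"^([ \t]*syslog\.log_level[ \t]*=[ \t]*)(\d+)([ \t]*\r?)$",
    re.MULTILINE,
)


def toggle_log_level(config_text, first="3", second="2"):
    """Swap the syslog level between `first` and `second`.

    Any other current value is reset to `first`. Raises ValueError if the
    config has no syslog.log_level line at all.
    """
    def swap(match):
        current = match.group(2)
        new_value = second if current == first else first
        return match.group(1) + new_value + match.group(3)

    new_text, count = LEVEL_LINE.subn(swap, config_text)
    if count == 0:
        raise ValueError("no syslog.log_level line found")
    return new_text


# Hypothetical fragment of a global config file.
config = "account.1.enable = 1\nsyslog.log_level = 3\n"
print(toggle_log_level(config))
```

Running it once changes 3 to 2, and running it again on the result changes it back, which matches the "from 3 to 2 and back" routine described above.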
Tuly Posted September 28, 2016

I just tested the Yealink RPS, and I cannot find a way to change any setting with RPS. There is no global file option there. If so, what is the purpose of the RPS? To save the 30 seconds it takes to put in the Kazoo provisioning server URL?
Anthony Goss Posted September 28, 2016

It allows you to ship phones directly to the site from your distributor. They plug it in, and boom, they are up. Zero Touch Provisioning. Also, if a phone gets messed up because a user was playing with it, just hold the OK button down and reset it to factory; it pulls the redirect from Yealink, boo ya. But Yealink's RPS has been known to get stupid and not respond to some firmware versions. The Polycom ZTP works much better. And DHCP options work best of all. Polycom and Yealink both have custom DHCP options.
Tuly Posted September 29, 2016

So does anyone know what the BLF numbers and PR stand for?
Tuly Posted September 29, 2016

OK, the BLF number means how many phones are subscribing to that BLF or park slot.
Rick Guyton (Author) Posted September 29, 2016

Yep, got word from 2600 the other day. The number is the number of phones subscribed to that entry. MWI is message waiting indicator, BLF is busy lamp field and PR is for presence. Basically, they are different status indicator protocols. There's no way of seeing what the status "should be" according to the servers.
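What a capture taken at an affected phone can show is the state the server last sent to that particular phone. Assuming the phones subscribe with `Event: dialog` and the NOTIFY body is dialog-info XML as defined in RFC 4235 (an assumption to confirm in your own capture), a minimal Python sketch for reading that body looks like this. The sample XML, the extension and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by RFC 4235 for dialog-info documents.
NS = {"di": "urn:ietf:params:xml:ns:dialog-info"}


def blf_states(notify_body):
    """Return the monitored entity and the list of dialog states found in
    one dialog-info NOTIFY body. An empty list means no active dialog."""
    root = ET.fromstring(notify_body.strip())
    states = [state.text for state in root.findall("di:dialog/di:state", NS)]
    return {"entity": root.get("entity"), "states": states}


# Invented example body; a real one comes out of the pcap.
SAMPLE_BODY = """
<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info"
             version="3" state="full" entity="sip:101@example.invalid">
  <dialog id="abc123" direction="recipient">
    <state>confirmed</state>
  </dialog>
</dialog-info>
"""

print(blf_states(SAMPLE_BODY))
```

Comparing the last state sent to a phone whose light is wrong with the last state sent to a phone whose light is right would show whether the two were told different things or the same thing.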
David Durik Posted September 30, 2016

Rick, we ran into BLF issues at a customer's site. It turns out that we were load balancing the calls across the Kazoo cluster out of multiple data centers. The replication delay, and the question of which cluster was handling each call, were causing havoc with BLF. Once we pointed the user to just one instance, with failover to the other in case of an emergency, the BLF issue cleared up. That is because the same system is now sending and clearing the BLF notifications. I can get you a more technical explanation from my NOC team if necessary. Hope that helps.
esoare Posted October 3, 2016

I wonder if that is what is happening on the hosted 2600hz side of things... David Durik, you're running your own cluster?
Rick Guyton (Author) Posted October 3, 2016

Hi David, thanks for your feedback. These particular clients are all on the same cluster. But you are definitely correct: BLF problems are vastly reduced by pointing all phones in a domain to the same cluster.