
BLF issues...


Rick Guyton


We seem to have never-ending issues with BLF. I understand that some at 2600hz tend to write this off as something just for "old line key" users, and that is true to some extent. Nonetheless, it is critical to the operation of some of our clients' businesses. I've had two clients call me today to tell me that they are having more BLF issues, so I believe there is a problem once again. The frustrating part is that I don't know how I'm supposed to even start diagnosing these issues. Can someone, anyone, please give me a concrete method to diagnose and resolve problems with BLF?

I know that there's the presence tab under the debugging tool, but I have no idea what BLF states 1, 2, 3, 4 or 5 mean or what PR stands for. Even if I did, though, what about situations where some phones show different lights than others? Obviously, some phones will be in line with the server's status while others won't. What can cause that? What can be done to prevent it?

I need to be careful as I am not a subject matter expert on this. However, this type of thing is likely diagnosed using a tool like Wireshark. If you have a good understanding of the handshake, you can identify problems in the SIP responses.

I suspect in some cases your firewall can have an effect on this as well. Some firewalls do better than others with VoIP & NAT. You may also want to look at your NAT timeout values in the firewall. I am not sure whether BLF subscription and re-subscription times are different from the handset's registration timeout.

https://doc.pfsense.org/index.php/VoIP_Configuration

The pfSense article above mentions UDP timeout. If your stateful firewall is timing out the NAT connection before the phone re-registers, I could imagine how that would cause a problem.

You may consider flushing the connection tracking table or reviewing the timeout and registration settings.
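
The timeout comparison above comes down to simple arithmetic: the firewall keeps a UDP NAT mapping open only while traffic recurs more often than its UDP timeout. Here is a rough Python sketch of that check. The function name, the safety margin, and all of the numbers are made up for illustration; they are not taken from any particular firewall, phone, or from Kazoo.

```python
def nat_mapping_survives(udp_timeout_s: int, refresh_interval_s: int, margin_s: int = 5) -> bool:
    """Return True if traffic recurs often enough to keep a UDP NAT mapping open.

    udp_timeout_s:      firewall's UDP connection-tracking timeout (assumed value)
    refresh_interval_s: how often the phone sends anything to the server
                        (keepalive, REGISTER refresh, or SUBSCRIBE refresh)
    margin_s:           hypothetical safety margin for jitter and retransmits
    """
    return refresh_interval_s + margin_s <= udp_timeout_s


# Hypothetical example values, not measured from a real deployment.
checks = {
    "keepalive every 30 s vs 60 s UDP timeout": nat_mapping_survives(60, 30),
    "SUBSCRIBE refresh every 1800 s vs 60 s UDP timeout": nat_mapping_survives(60, 1800),
}
for label, ok in checks.items():
    print(f"{label}: {'mapping stays open' if ok else 'mapping may expire'}")
```

If the only traffic on a mapping is an infrequent refresh, server-initiated messages such as NOTIFY could be dropped between refreshes, so the real values from the firewall and the phone config are the ones worth plugging in.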

Here is a screenshot of the default Mikrotik "Connection Tracking" settings.
http://prntscr.com/cmnou5

Again, take this lightly as I am by no means an expert; however, this is the kind of stuff I would start looking at. If you are good with Wireshark, you may be able to confirm whether the issue is on one side or the other (client vs. 2600hz), for example.

Here is a 3CX article that talks about the BLF handshake process:

http://www.3cx.com/blog/voip-howto/busy-lamp-field/

Additionally, Yealink handsets can create a pcap that you can open in Wireshark, so if you have a recurring issue on a specific phone you can turn the capture on and then inspect the resulting pcap.

Wireshark has some built-in tools for checking the flow of SIP traffic. It looks kind of like this: http://prntscr.com/cmnugv
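
When reading the NOTIFY messages in a capture, the lamp state is carried in the message body. The Python sketch below assumes the phone is subscribed with the dialog event package (RFC 4235, `application/dialog-info+xml`), which is common for BLF but is not confirmed anywhere in this thread for this platform. The sample body, the extension, and the function name are hypothetical, and the mapping from dialog state to lamp state is a typical convention, not a vendor specification.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by RFC 4235 for dialog-info documents.
NS = {"di": "urn:ietf:params:xml:ns:dialog-info"}

# Hypothetical NOTIFY body, as it might be copied out of a capture.
SAMPLE_NOTIFY_BODY = """<?xml version="1.0"?>
<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info"
             version="7" state="full" entity="sip:101@example.com">
  <dialog id="abc123" direction="recipient">
    <state>confirmed</state>
  </dialog>
</dialog-info>"""


def summarize_notify(body: str) -> dict:
    """Extract the watched entity, document version, and dialog states."""
    root = ET.fromstring(body)
    states = [
        dialog.findtext("di:state", default="", namespaces=NS)
        for dialog in root.findall("di:dialog", NS)
    ]
    # Assumed convention: confirmed = busy, early = ringing, anything else = idle.
    if "confirmed" in states:
        lamp = "busy"
    elif "early" in states:
        lamp = "ringing"
    else:
        lamp = "idle"
    return {
        "entity": root.get("entity"),
        "version": int(root.get("version", "0")),
        "states": states,
        "lamp": lamp,
    }


print(summarize_notify(SAMPLE_NOTIFY_BODY))
```

Comparing the last state a phone received in its own capture against what its lamp actually shows should indicate whether the phone never got the NOTIFY or got it and displayed the wrong thing.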

Again, I am not an expert.

Oh yeah, BLF is one of the hardest issues to troubleshoot. It looks like a small thing, but it's really hard to track down. I've already given up on the idea of having no BLF issues at all. What we do is make a change in our Yealink global file (same on Grandstream), one that requires a phone reboot, so that every second or third week all phones are rebooted. In our experience, by rebooting the phones once a month we have no BLF issues.

I don't believe this is a router UDP timeout issue. If it were, your client wouldn't have been able to receive calls.

I am coming to that conclusion as well...
I have Yealinks at a customer site: 16 phones with ParkSlot BLFs *3(101-105), plus 2 attendant consoles with BLFs for each user's presence.

The last time it happened (3 days ago), I had added a phone for the customer and changed the combo keys on the attendant consoles to add that user's presence. Not sure if the timing was a coincidence, but I ended up restarting the phone from the Advanced Provisioner module, and everything came up to snuff.

Can that global file be modified through the RPS configs?

Not sure about RPS; we are not using it yet (but I'm sure you could).

On Yealink we change this line from 3 to 2 and back:
syslog.log_level = 3


And on Grandstream we change anything in this field:
<P146>GXP1</P146>

(Not every change in a provisioning file will trigger the phone to reboot, only specific ones like a network change or core setting changes like the syslog level.)
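
The "3 to 2 and back" flip on the Yealink line could be scripted instead of edited by hand. This is only a sketch in Python: the `syslog.log_level` key comes from the post above, but the function name and the idea of editing the file contents as a string are assumptions, and whether a given phone actually reboots on this change depends on the model and firmware.

```python
import re

# Matches a line like "syslog.log_level = 3" and captures the key part and the value.
_LEVEL_LINE = re.compile(r"^(syslog\.log_level[ \t]*=[ \t]*)(\S+)", re.MULTILINE)


def toggle_syslog_level(config_text: str, first: str = "3", second: str = "2") -> str:
    """Swap syslog.log_level between two values; all other lines are left untouched."""

    def swap(match: re.Match) -> str:
        current = match.group(2)
        return match.group(1) + (second if current == first else first)

    new_text, count = _LEVEL_LINE.subn(swap, config_text)
    if count == 0:
        raise ValueError("syslog.log_level not found in config text")
    return new_text


# Hypothetical config fragment; only the syslog line is taken from the thread.
example = "#!version:1.0.0.1\nsyslog.log_level = 3\n"
print(toggle_syslog_level(example))
```

Running it on a schedule against the global file would alternate the value each time, which is the same trick described above with the manual step removed.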


It's not a solution, just a workaround. It's so difficult to troubleshoot because I'm fairly sure there's nothing wrong on the server side; if there were, it would still be wrong after a reboot of the phone. I'm putting more blame on the phone, but not total blame. If the phone had an issue with a UDP timeout or any other NAT router setting, you shouldn't receive phone calls at that time, or BLF should not work even after a restart, or it should start working again when the phone resubscribes.

RPS allows you to ship phones directly to the site from your distributor. They plug it in and, boom, they are up: zero-touch provisioning. Also, if a phone gets messed up because a user was playing with it, just hold the OK button down to reset it to factory and it pulls the redirect from Yealink again. But Yealink's RPS has been known to get stupid and not respond to some firmware versions. The Polycom ZTP works much better, and DHCP options work best of all. Polycom and Yealink both have custom DHCP options.

Yep, got word from 2600hz the other day. The number is the number of phones subscribed to that entry. MWI is message waiting indicator, BLF is busy lamp field, and PR is presence. They are basically different status indicator protocols. There's no way of seeing what the status "should be" according to the servers.

Rick, we ran into BLF issues at a customer's site. It turned out that we were load balancing the calls across the Kazoo cluster out of multiple data centers. The replication delay, combined with different clusters handling each call, was causing havoc with BLF. Once we pointed the user to just one instance, with failover to the other in case of an emergency, the BLF issue cleared up, because the same system is now both sending and clearing the BLF notifications. I can get you a more technical explanation from my NOC team if necessary. Hope that helps.
