Administrators Darren Schreiber Posted July 13, 2016
Hi everyone, I wanted to provide a quick update as we've been asked a few times for more transparency and information on our number management strategy.

Our goal is always to provide the best service and the best ease-of-use possible for running your VoIP services, while also providing good pricing to allow you to be competitive. It is my belief, having been in the industry a very long time now, that we're doing that. However, I don't think we always paint a great picture of what we're up to. So hopefully this post will help.

Over the past month, those of you who have numbers via us may have experienced issues with some of our providers. Namely, we've had two complete failures with Peerless Networks and one severe service degradation with bandwidth.com. Folks have been notably frustrated with this.

First, a bit of background. There has been some discussion, primarily about Peerless, a carrier most people don't know. Some people contacted us stating "I thought you only use Tier 1 Providers" and so on. I want to make sure everyone is aware that there's no strict definition of what a "Tier 1" provider is. The way we define it is generally someone who has their own SS7 interconnects and is a CLEC in multiple regions in the country (if not every region). This allows us to have a "buck stops here" attitude towards these carriers because they are ultimately responsible for the connection to the PSTN (and not reselling someone else's service).

A lot of people have heard of bandwidth.com in relation to Tier 1 carriers. bandwidth.com is well known because they offer retail, small business services as well as wholesale. We deal only with wholesale vendors in general. Peerless is another wholesale vendor (they power many minutes for T-Mobile, Skype, Google Voice, etc.) but they are lesser known because they don't deal with small contracts.
Level3 is another example of a Tier 1 carrier, as is O1, Onvoy, Windstream/Paetec, TNCI, Impact Telecom and a few others. We use ALL these carriers, and we always have.

We don't currently provide transparency into which carriers we use, namely because it's expensive to do so. Our attitude was previously that we were doing a favor to you by doing the work to deal with these guys. For example, two years ago one of our providers went bankrupt. We were able to seamlessly move everyone's numbers to a new carrier without anyone even having to deal with it. This is a huge time saver for you, but also for us because we didn't need to write letters and answer 200 questions while the clock was literally ticking to save everyone's phone numbers. There is a lot that goes on "behind the scenes" regarding contract negotiation, services overlaid on top, etc. that many of you don't see and that isn't quite as simple as just "signing up".

That said, it became clear after the last few issues that people want more transparency. I hear that and we will work toward that goal. Let me start by explaining a few items on the past few outages. Then I'll explain what we plan to do going forward.

For the Peerless issue:

Since the outage, we have been working with Peerless on this and we are going to be announcing a major routing change with them. It will impact all customers and will be scheduled.

Peerless had two failures with boards in their Sonus equipment. This honestly should not have been a big deal; this will happen. However, the big question is why there wasn't a successful failover to alternate equipment. We have been exploring that with Peerless. We've learned a bunch.

When we signed up with Peerless eons ago they provided us multiple IPs they route to/from. We have this same setup with every provider. We discussed their infrastructure at length with them (and other folks we know in the industry and trust) and audited their setup and did an extensive interop.
We tested features ranging from DTMF to failover. We believed all items were redundant, and were told the same.

In reality, Peerless had routed (either accidentally or simply through a misunderstanding, we're not sure) our numbers to what they refer to as a single trunk group. We were not aware of this, and the assignment of multiple IPs for routing gave us the impression the routes were to different equipment, so we did not dig in further. During our negotiations they told us multiple trunk groups were available, but somehow this wasn't configured. The last two outages are when we dug into this in detail with them and became aware of this. While we do in fact have multiple IPs in and out of Peerless, the routes were effectively hitting the same equipment after reaching Peerless. This is not a standard setup for Peerless and nobody is exactly sure why it was set up this way; it seems to have been an oversight in configuration of some sort. Unfortunately we can't see their internal configuration, so we found out about this the hard way (obviously).

This effectively means numbers were mapped to a single region and a single physical board. This is obviously unacceptable. Now that we've uncovered what happened we've been working to reconfigure the zones with them and they've been cooperative. Unfortunately, it does take some time. We are now working on properly reconfiguring our setup with them and hope to have this resolved sometime next week. Hope that helps shed light on the issue.

For the Bandwidth issue:

This is actually one of the reasons I don't like bandwidth.com. Despite their popularity among many of you guys because they have coverage in 50 states, an easy number management interface and fast porting, they have only three POPs. This means any transport issue in one of their three locations screws EVERYTHING up.
They don't have the ability to fail over the calls when that happens on a per-number or per-route basis - it's all or nothing - so we have to wait around for them to act. In some cases, we also have to reconfigure things (as was the case in this issue).

Last week, they had a severe BGP routing issue that crippled their audio quality. They also don't seem to monitor this, so this went on for several hours before they actually acted. Finally they adjusted their BGP routing, but not until they received a ton of complaints. Afterward they sent out a notice of the issue and its resolution.

As a general note, we don't use bandwidth.com for outbound calling, so only inbound calling was affected. This is because bandwidth.com, to save money on outbound termination (but still charge you), tries to keep calls on their network. This leads to DTMF issues because they don't always normalize the audio, and they also have a terrible LCR routine that has awful post-dial-delay issues, which we got tired of chasing down. They are unable to 503 calls to areas they don't cover, so we've abandoned them for termination (outbound).

Other Carriers

The reality is every single carrier has their strengths and weaknesses. Bandwidth.com seems to hate IA, VT and IL; they have constant outages there, but we never have issues in NYC or Texas, for example. We have issues in Sacramento with call completion with them, too. O1 is rock solid in California. Level3 is generally excellent but expensive, and when they go down, it's usually all over the place.

Ironically, when we have an outage with a carrier, most people's reaction is "I'm leaving!". They don't look at the overall stats. We've had many instances where Murphy's Law applies, and the week we finally port a customer away from Carrier X to Carrier Y, Carrier Y (who hasn't had an issue in ages) has an outage. Heh.
Not sure what to do about this.

So, alas, just try to keep in mind that there is no perfect solution to inbound origination redundancy right now. Our theory is that by spreading numbers around various Tier 1 providers, when there is an outage it's at least not everything. In addition, we've started embracing the strategy of having half a client's numbers on one carrier, half on another. This mitigates the perception to a customer that "everything is down" which, believe it or not, can be helpful, even though they're obviously still mad when there's an outage.

But I Thought Everything Was Redundant?

Wholesale VoIP services are still a box, plugged into the phone network, in a datacenter somewhere. Based generally on population you can put more boxes in for more redundancy. But at some point the cost is prohibitive and it's not competitive. So in rural IA it doesn't make sense to put multiple boxes and multiple OC3s if only 100 people are actually using them - the cost would be like $1/minute instead of $0.01/minute. Nobody would pay that.

So ALL VoIP carriers assess their redundancy based on geographic location, population, and how many customers they expect to have there. EVERY SINGLE ONE does this - there is NO carrier who just blindly puts the same amount of redundancy in every spot. It doesn't make sense to do so.

That said, some carriers piggyback off bigger providers' SS7 -> IP solutions so they can have better uptime without providing the equipment. They then get their pricing down in volume, though it still may be higher. Unfortunately this literally changes month to month based on demand and contracts, so it's tough to "chase" this. But understand that this is how things work. It's all about the $$ and volume in wholesale.

Going Forward

We've always been proud of buying our services based on quality, NOT price.
A lot of people compare the DID prices they get from other providers to ours but don't factor in the fact that we bundle in CNAM dips and taxes, and normalize our Tier and porting fees across all DID levels. However, as our service has grown, more and more people seem to be unable to do the math to compare things properly. In addition, due to outages and other reasons, people have started having preferences as to which providers they want. Some prefer bandwidth.com, some O1, some Level3, etc.

So, we're taking the following steps (and it will take a while to get there, but we're doing them):

* De-couple CNAM dip fees from the number fees. This will allow us to lower our DID prices so the perception is that they're more competitive with the general rates you get elsewhere (although we've already run the math, and this will actually be a rise in rates because the CNAM dips really do add up, but alas, can't win on perception, so we'll go this route - plus it's more transparent)
* Allow multi-tier pricing, per carrier, so that the pricing is less normalized and more based on the carrier and region where you're buying the number
* Allow you to choose the carrier, for both purchasing new numbers and porting
* Allow you to see which services are available on each carrier. When we finally roll out SMS, it will be tied only to specific carriers, so you can choose numbers with SMS or without, which will impact carrier and pricing selections

We have a ways to go before we have all the above but we are working on it.

I hope this provides more transparency into what we're up to. You are always welcome to move to the Bring-Your-Own-Carrier model as well.
Please note that while the grass is always greener on the other side, you take on a lot of responsibility with BYOC - including that we charge you support credits for any setup or debugging time spent working with your carrier, regardless of fault, because we are unable to manage individual people's carriers "en masse" and thus it drives up our labor costs. We're still working to automate our systems so you can download PCAPs on your own and don't have to contact us in the first place, but that's not in place yet.

Please also note, if you subscribe to our bring-your-own-carrier service, we'll be enhancing that with some new features, too. That said, you may have noticed that you can use the Inbound Caller ID Name feature on that service and it's not charging you. This is actually a bug; the system isn't currently smart enough to NOT allow this. We're changing that as well, so CNAM dips will be charged to bring-your-own-carrier customers too. You can of course turn this off if you want to do this elsewhere.

I hope this helps clarify our position on the number management services.
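
For readers who want to picture the "half a client's numbers on one carrier, half on another" strategy from the post above, here is a minimal sketch. It is an illustration only, not how 2600hz actually assigns numbers: the carrier names `carrier_a` and `carrier_b`, the function names, and the fictional 555 numbers are all assumptions made up for this example.

```python
from collections import defaultdict


def split_numbers(numbers, carriers):
    """Assign numbers to carriers round-robin so no single carrier holds them all."""
    if not carriers:
        raise ValueError("at least one carrier is required")
    assignment = defaultdict(list)
    # Sort first so the assignment is deterministic for a given set of numbers.
    for index, number in enumerate(sorted(numbers)):
        assignment[carriers[index % len(carriers)]].append(number)
    return dict(assignment)


def reachable_during_outage(assignment, failed_carrier):
    """Return the numbers that still receive inbound calls if one carrier is down."""
    return [
        number
        for carrier, numbers in assignment.items()
        if carrier != failed_carrier
        for number in numbers
    ]


# Hypothetical client with four DIDs, split across two placeholder carriers.
client_dids = ["+14155550100", "+14155550101", "+14155550102", "+14155550103"]
plan = split_numbers(client_dids, ["carrier_a", "carrier_b"])
still_up = reachable_during_outage(plan, "carrier_a")
```

With two carriers, an outage at either one leaves half of the client's numbers reachable, which is the "at least it's not everything" effect the post describes.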
Rick Guyton Posted July 15, 2016
Hey Darren, on the CNAM dip bit, I'm assuming you cache responses so that you don't have to dip every single time. Is that correct? If so, will the billing system be able to differentiate between cached dips and non-cached dips so as not to charge for the former? Also, will a reseller be able to define the max lifetime of a cached CNAM on their respective systems so that we can balance responsiveness to changes vs. cost?
Administrators Darren Schreiber Posted July 18, 2016
Hey Rick, we do have the ability to cache and add a tag to the CDR if it's cached. Some of our upstream carriers forbid caching from a contractual standpoint, but if it were to happen, it's a configurable cache time and size (if the size is exceeded, items will leave the cache). Initially the CNAM config options will be system-wide and not something you administer. Honestly I'd be surprised if "tweaking" the settings does anything for your bill anyway; you'd have to have a lot of call volume for it to make a huge difference.

Note that CNAM will not be available with Bring-Your-Own-Carrier as we have no technology to configure and manage external providers for CNAM at this time. We are debating whether we can charge for it as an overlay service or if it will just be turned off altogether for people who BYOC and they can do it upstream via their provider.
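
To make the caching behavior discussed above concrete, here is a rough sketch of a CNAM cache with a configurable lifetime and size that tags each CDR as cached or not. Everything in it is an assumption for illustration: the class and field names (`CnamCache`, `cnam_cached`), the least-recently-used eviction policy, and the fake upstream lookup are invented here and are not the platform's actual code or CDR schema.

```python
import time
from collections import OrderedDict


class CnamCache:
    """Cache CNAM lookups with a time-to-live and a maximum number of entries."""

    def __init__(self, lookup, ttl_seconds=3600, max_entries=1000, clock=time.monotonic):
        self._lookup = lookup          # function performing the real (billable) dip
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._entries = OrderedDict()  # number -> (name, expires_at)

    def resolve(self, number):
        """Return (name, cached); cached is True when no upstream dip was made."""
        now = self._clock()
        entry = self._entries.get(number)
        if entry is not None and entry[1] > now:
            self._entries.move_to_end(number)  # mark as recently used
            return entry[0], True
        name = self._lookup(number)
        self._entries[number] = (name, now + self._ttl)
        self._entries.move_to_end(number)
        # Assumed policy: evict least recently used entries once the size is exceeded.
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)
        return name, False


def build_cdr(cache, caller_number):
    """Build a minimal CDR-like record that tags whether the name came from cache."""
    name, cached = cache.resolve(caller_number)
    return {"caller_id_number": caller_number, "caller_id_name": name, "cnam_cached": cached}


dips = []  # records every simulated upstream dip


def fake_upstream_dip(number):
    dips.append(number)
    return "EXAMPLE CALLER"


cache = CnamCache(fake_upstream_dip, ttl_seconds=60, max_entries=2)
first = build_cdr(cache, "+14155550100")
second = build_cdr(cache, "+14155550100")
billable_dips = sum(1 for cdr in (first, second) if not cdr["cnam_cached"])
```

A billing job could then count only records where `cnam_cached` is false, which is one way the cached/non-cached distinction Rick asks about could be drawn.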
FASTDEVICE Posted July 31, 2016
@Darren, I'm running into an interesting issue with SMS providers and the choice of underlying carrier that 2600hz assigns. The most predominant provider enabling SMS service on landline DIDs is Zipwhip, and they are rejecting anyone using Bandwidth.com. It's not Zipwhip per se, but Bandwidth that is causing the problem. Whenever Zipwhip converts a landline DID to their service, within weeks Bandwidth reverts the SMS service back to themselves. Therefore, Zipwhip has given up and refuses to enable anyone on Bandwidth for obvious reasons. In fact, if Bandwidth doesn't change their policy, I'm inclined to ask not to use them going forward. I believe it's small things like this where the option to select the carrier makes sense.
Karl Stallknecht Posted July 31, 2016
We also use ZipWhip and ran into the same issue. We use Peerless for our customers who want to use ZipWhip and have had zero problems so far (knock on wood). Everyone else is on Bandwidth though unless they want SMS/MMS.
FASTDEVICE Posted July 31, 2016
@Karl, do you request existing clients be transferred to Peerless? And, if so, does the transfer go smoothly?
Karl Stallknecht Posted August 1, 2016
Yep, just ask 2600hz to port the DID to Peerless. Never had any issues with this. It's all seamless and ZipWhip can have it set up literally minutes after the port completes.
Rick Guyton Posted August 12, 2016
Hey Darren, can we please get an update on the redundancy issue with Peerless? Has this been resolved?
Karl Stallknecht Posted August 12, 2016
I think 2600hz's official response is that they won't guarantee the reliability and it's best effort: you're taking a risk that they won't provide support for. At least that's what I last heard and I assume nothing has changed... From our own experience the Peerless solution has worked just fine with no problems.
Rick Guyton Posted August 15, 2016
I'm confused. Peerless is one of their providers... Why wouldn't they fully support them? Darren was talking above about why Peerless failed the way it did and said they were resolving it. I was just wanting a status update on that.
Karl Stallknecht Posted August 15, 2016
Oh, sorry, I thought you were talking about SMS. SMS through a company like ZipWhip and using Peerless for voice is the thing that isn't supported.
Rick Guyton Posted August 15, 2016
LOL, ok. Thanks for the clarification. I was raging for a minute there, I'll admit. :)
Rick Guyton Posted August 23, 2016
So..... about that redundancy issue with Peerless... Resolved???
Administrators Darren Schreiber Posted August 23, 2016
Yes, the initial failure issue should be resolved. HOWEVER, we're retooling our routing strategy with ALL our carriers and I was hoping to update this thread all at once with that info once it's finalized.

So, stay tuned. More to come.
Rick Guyton Posted January 11, 2017
Hey Darren, any update on this? I know a few months ago you were still working on it. Curious where it's gotten.
Administrators Darren Schreiber Posted January 11, 2017
All Peerless items should now be redundant. HOWEVER, we're working on rolling out Geo-IP based routing with them and all our other carriers, which further enhances this, so I was waiting until that was done for a full update.