Presenting kazoo-ansible (Ansible Roles for Kazoo)


Tom

I have developed an MIT-licensed set of Ansible roles for Kazoo called kazoo-ansible. The goal of kazoo-ansible is to provide the entire community with a way to easily manage a repeatable installation of a Kazoo cluster, so we can spend less time setting up and maintaining our Kazoo installations and more time on the unique offerings of our VOIP endeavors.

kazoo-ansible has the following features:

  • Automatically clusters CouchDB, Freeswitch, Kamailio, and Kazoo
  • Let's Encrypt TLS certificate generation for Monster UI, including support for multiple Monster UI hosts
  • Uses CouchDB instead of BigCouch
  • Splits up roles for CouchDB, Freeswitch, Kamailio, Kazoo, Monster UI, and RabbitMQ to allow lots of cluster customization
  • Publishes roles to Ansible Galaxy to allow easy integration into custom playbooks
  • Easy to install using included bootstrap scripts

I have added installation documentation to make the installation more intuitive.

kazoo-ansible is currently in a pre-release state. It is feature-complete and working for my needs; however, I'd really like to get feedback from the community before I release 1.0.0.

Check out kazoo-ansible on GitHub: https://github.com/kazoo-ansible/kazoo-ansible

 

Edited by Tom (see edit history)

  • 2600Hz Employees

This looks really nice!

I would consider adding, either on the main README or in an INSTALL or similar file, copy/paste-able shell commands for each of the installation steps. What is intuitive and obvious to you may not be to others, and having an example shell session will bring in more feedback as people try setting this up for themselves. For instance, if I wanted to play with this, I would need to dig up how to add the user for SSH and sudo, how to run the Kazoo commands, etc.

If there are more hooks into Kazoo that would make life easier for setting this up, do let us know! Anything we can do to facilitate easier setup and maintenance, we'd like to know about.

Thanks again for this work and sharing it with the community.

 


Thanks for the suggestion to improve the documentation. I always find documentation to be more difficult than the actual programming :) 

Kazoo actually made it very easy to automate this because I was able to use the sup command to perform the clustering. Some of the sup commands are a bit easier for humans to read than machines, but a quick regex was able to fix that for me. Clustering CouchDB was actually the only hard part.
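
To illustrate the kind of regex cleanup described above, here is a minimal Python sketch that pulls Erlang-style node names (name@host) out of free-form, human-oriented text. The sample text, the pattern, and the function name extract_node_names are assumptions made for illustration; this is not real sup output and not the code used in kazoo-ansible.

```python
import re

# Matches Erlang-style node names such as kazoo_apps@host.example.
# The pattern is an assumption for illustration, not taken from kazoo-ansible.
NODE_NAME = re.compile(r"[\w-]+@[\w-]+(?:\.[\w-]+)*")


def extract_node_names(text):
    """Return the unique node names found in text, in order of appearance."""
    seen = []
    for match in NODE_NAME.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


# Hypothetical, human-oriented output; real sup output will differ.
sample = """
Node: freeswitch@fs1.example.com (connected)
Node: freeswitch@fs2.example.com (connected)
Node: freeswitch@fs1.example.com (connected)
"""
print(extract_node_names(sample))
```

Returning a plain list keeps the result easy to compare against the hosts a playbook expects to see.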


I am currently using your playbooks. I had to disable the firewalld settings because I got a CouchDB access error:

fatal: [10.0.1.1]: FAILED! => {"changed": false, "failed": true, "msg": "<urlopen error [Errno 111] Connection refused>"}
fatal: [10.0.1.2]: FAILED! => {"changed": false, "failed": true, "msg": "<urlopen error [Errno 111] Connection refused>"}
 

Could you elaborate on that?

Also, could you tell me more about the role of the kazoo domain variable in group vars?

 


The common role should add an exception so that each node in the cluster can communicate. What does /etc/firewalld/zones/kazoo-zone.xml look like on one of the servers on your cluster? Does it match the IP address that will be resolved for the server names you used in /etc/ansible/hosts?
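
The comparison asked about above can be sketched in Python: read the source addresses from a firewalld zone file and report any host whose resolved IP is not covered. This assumes the zone lists cluster nodes as <source address="..."/> elements (standard firewalld zone syntax, though the exact contents of kazoo-zone.xml are not shown in this thread). The function names and the sample zone are hypothetical.

```python
import ipaddress
import socket
import xml.etree.ElementTree as ET


def zone_sources(zone_xml):
    """Return the networks listed in <source address="..."/> elements."""
    root = ET.fromstring(zone_xml)
    return [
        ipaddress.ip_network(src.get("address"), strict=False)
        for src in root.findall("source")
        if src.get("address")
    ]


def hosts_not_covered(hostnames, networks, resolve=socket.gethostbyname):
    """Return (hostname, ip) pairs whose resolved IP is in none of the networks."""
    missing = []
    for name in hostnames:
        ip = ipaddress.ip_address(resolve(name))
        if not any(ip in net for net in networks):
            missing.append((name, str(ip)))
    return missing


# Hypothetical zone contents; compare against your own kazoo-zone.xml.
sample_zone = """<zone>
  <source address="10.0.1.1"/>
  <source address="10.0.1.2"/>
</zone>"""

networks = zone_sources(sample_zone)
# IP literals resolve to themselves, so this runs without DNS.
print(hosts_not_covered(["10.0.1.1", "10.0.1.3"], networks))
```

On a real server you would pass the contents of /etc/firewalld/zones/kazoo-zone.xml and the host names from /etc/ansible/hosts instead of the sample values.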

The kazoo domain variable is used to create an Nginx configuration for the domain where Monster UI will be hosted. For example, if your domain is monsterui.example.com, the kazoo domain variable would be monsterui.example.com.


OK, so the zones file looks fine; all the proper IPs are there. Even after removing the firewalld commands, I still got the error:

TASK [kazoo-ansible.couchdb : Cluster CouchDB] *********************************************************************************************
fatal: [10.0.1.1]: FAILED! => {"changed": false, "failed": true, "msg": "<urlopen error [Errno 111] Connection refused>"}
fatal: [10.0.1.2]: FAILED! => {"changed": false, "failed": true, "msg": "<urlopen error [Errno 111] Connection refused>"}

However, after waiting a bit and running it again, it went through.

Maybe CouchDB needs some time to start up? (My machine is a dual Xeon E5 with SSDs.)

 


Same kind of error with freeswitch.

RUNNING HANDLER [kazoo-ansible.freeswitch : Gracefully Restart FreeSwitch] *****************************************************************
fatal: [10.0.1.1]: FAILED! => {"changed": true, "cmd": "fs_cli -x 'fsctl shutdown asap restart'", "delta": "0:00:00.010998", "end": "2017-10-06 17:01:25.362917", "failed": true, "rc": 255, "start": "2017-10-06 17:01:25.351919", "stderr": "[ERROR] fs_cli.c:1659 main() Error Connecting [Socket Connection Error]", "stderr_lines": ["[ERROR] fs_cli.c:1659 main() Error Connecting [Socket Connection Error]"], "stdout": "", "stdout_lines": []}
fatal: [10.0.1.2]: FAILED! => {"changed": true, "cmd": "fs_cli -x 'fsctl shutdown asap restart'", "delta": "0:00:00.014279", "end": "2017-10-06 17:01:25.404357", "failed": true, "rc": 255, "start": "2017-10-06 17:01:25.390078", "stderr": "[ERROR] fs_cli.c:1659 main() Error Connecting [Socket Connection Error]", "stderr_lines": ["[ERROR] fs_cli.c:1659 main() Error Connecting [Socket Connection Error]"], "stdout": "", "stdout_lines": []}
        to retry, use: --limit @/root/kazoo-ansible/site.retry
I ran it again and the error disappeared.


It does sound like the Ansible script is running quicker than the components can come online, but that's surprising because your server is really good :).
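
One common way to handle this kind of race is to poll the service port until it accepts connections instead of assuming the service is already up. The Python sketch below shows the idea; it is not the fix that went into kazoo-ansible, and the function name wait_for_port and the timeout values are assumptions. Inside a playbook, Ansible's wait_for module covers the same idea.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True if the port accepted a connection, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the service is not ready yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # 5984 is the CouchDB port used in the curl commands in this thread.
    ready = wait_for_port("127.0.0.1", 5984, timeout=2.0)
    print("CouchDB port open" if ready else "CouchDB port still closed")
```

Polling with a deadline tends to be more robust than a fixed pause, because it returns as soon as the service is ready and still fails clearly when it never comes up.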

I tested this on a few VMs running on a laptop and Google Cloud. I'd like to think of how I might be able to replicate your test case. Were you able to complete the installation?


Still working on it.

Trying to get past this right now

TASK [kazoo-ansible.kazoo : Install Kazoo] *************************************************************************************************
failed: [10.0.1.2] (item=[u'kazoo-applications-4.1-34.el7.centos', u'kazoo-application-*-4.1-34.el7.centos']) => {"changed": true, "failed": true, "item": ["kazoo-applications-4.1-34.el7.centos", "kazoo-application-*-4.1-34.el7.centos"], "msg": "Error: Package: kazoo-applications-4.1-34.el7.centos.noarch (2600hz-stable)\n           Requires: kazoo-core = 4.1-34.el7.centos\n           Available: kazoo-core-4.0-0.el7.centos.x86_64 (2600hz-stable)\n               kazoo-core = 4.0-0.el7.centos\n           Available: kazoo-core-4.0-1.el7.centos.x86_64 (2600hz-stable)\n               kazoo-core = 4.0-1.el7.centos\n           Available: kazoo-core-4.0-2.el7.centos.x86_64 (2600hz-stable)\n               kazoo-core = 4.0-2.el7.centos\n           Available: kazoo-core-4.0-3.el7.centos.x86_64 (2600hz-stable)\n               kazoo-core = 4.0-3.el7.centos\n           Available: kazoo-core-4.0-4.el7.centos.x86_64 (2600hz-stable)\n               kazoo-core = 4.0-4.el7.centos\n           Available: kazoo-core-4.0-5.el7.centos.x86_64 (2600hz-stable)\n               kazoo-core = 4.0-5.el7.centos\n           Available: kazoo-core-4.0-6.el7.centos.x86_64 (2600hz-stable)\n               kazoo-core = 4.0-6.el7.centos\n           Available: kazoo-core-4.0-7.el7.centos.x86_64 (2600hz-stable)\n               kazoo-core = 4.0-7.el7.centos\n           Available: kazoo-core-4.0-8.el7.centos.x86_64 (2600hz-stable)\n               kazoo-core = 4.0-8.el7.centos\n           Available: kazoo-core-4.0-9.el7.centos.x86_64 (2600hz-stable)\n               kazoo-core = 4.0-9.el7.centos\n           Available: kazoo-core-4.0-10.el7.centos.x86_64 (2600hz-stable)\n               kazoo-core = 4.0-10.el7.centos\n           Available: kazoo-core-4.0-11.el7.centos.x86_64 (2600hz-stable)\n               kazoo-core = 4.0-11.el7.centos\n

I ended up changing the play's command to yum install kazoo-applications kazoo-applications-*, removing the version and the el7.centos suffix.


I thought hard-coding the versioning would make it more stable, but it seems that the dependencies break when you aren't using the latest version. I will update the roles tonight to use the latest version, and you'll have to update them. I'll post back here when I've done so.

Thank you so much for testing this and helping me identify issues I haven't run into.


No problem! A good Ansible install will help all of us. Thanks for taking the time to make this. I am learning Ansible as I go and understanding the clustering aspects of Kazoo through your work!

I did get an error for SELinux, which I had disabled. I wonder if it's possible to tell Ansible not to do certain things, something like: ansible-playbook site.yml -nofirewall -noselinux

I also created a requirements.yml file with all the roles so I can download and install them using ansible-galaxy install -r requirements.yml

I usually leave my test environment unsecured so I can play around with different settings.

 

Got this error a few times. Just kept rerunning the script and it went ok after 2-3 tries

TASK [kazoo-ansible.kazoo : Cluster Freeswitch] ************************************************************************************************************************
fatal: [10.0.1.1]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Shared connection to 10.0.1.1 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n  File \"/tmp/ansible_4H0Vm1/ansible_module_freeswithcluster.py\", line 42, in <module>\r\n    main()\r\n  File \"/tmp/ansible_4H0Vm1/ansible_module_freeswithcluster.py\", line 39, in main\r\n    module.fail_json(msg=error)\r\n  File \"/tmp/ansible_4H0Vm1/ansible_modlib.zip/ansible/module_utils/basic.py\", line 1997, in fail_json\r\n  File \"/tmp/ansible_4H0Vm1/ansible_modlib.zip/ansible/module_utils/basic.py\", line 1977, in _return_formatted\r\n  File \"/tmp/ansible_4H0Vm1/ansible_modlib.zip/ansible/module_utils/basic.py\", line 414, in remove_values\r\n  File \"/tmp/ansible_4H0Vm1/ansible_modlib.zip/ansible/module_utils/basic.py\", line 414, in <genexpr>\r\n  File \"/tmp/ansible_4H0Vm1/ansible_modlib.zip/ansible/module_utils/basic.py\", line 425, in remove_values\r\nTypeError: Value of unknown type: <type 'exceptions.IOError'>, Command failed: {'EXIT',{noproc,{gen_server,call,[ecallmgr_fs_nodes,{connected_nodes,false}]}}}\r\n\r\n\r\n", "msg": "MODULE FAILURE", "rc": 0}
ok: [10.0.1.2]
 

Now I have everything installed.

I have two problems.

First, Monster UI is working over HTTPS, but Nginx is not redirecting HTTP to HTTPS for Monster UI. It turns out I have to enter my IP in the HTTP conf file for Nginx, since I don't have DNS.

Second, the sup command is not working.

Freeswitch is connected to both ecallmgr nodes. Kamailio sees both Freeswitch servers as dispatchers. CouchDB has all the required databases.

epmd: up and running on port 4369 with data:
name kazoo-rabbitmq at port 25672
name freeswitch at port 8031
name ecallmgr at port 11501
name kazoo_apps at port 11500
name couchdb at port 33480
 

 

But I get this error from the sup command:

Failed to connect to service kazoo_apps@xyz.pbx with cookie change_me
  Possible fixes:
    * Ensure the Kazoo service you are trying to connect to is running on the host
    * Ensure that you are using the same cookie as the Kazoo node, `sup -c <cookie>`
    * Verify that the hostname being used is a Kazoo node
kazoo_apps is not running!

/etc/kazoo/core/config.ini has the right cookie set for everything

requirements.yml


OK, so I have to run the sup command with -c "my_cookie" to make it work. It seems the default cookie did not get changed to the new one.

I am also having trouble logging in to CouchDB from the browser with the given user. I get the following error:

{gen_server,call, [config, {set,"couch_httpd_auth","secret", "c5f3fe1a02a91777514373a7527b45c5",true,nil}, 30000]}

 

Through the terminal, I also get the following errors from CouchDB:

[root@navy1 ~]# curl -X PUT http://couchdb:password@localhost:5984/test1234
{"error":"error","reason":"internal_server_error"}
[root@navy1 ~]# curl -X PUT http://couchdb:password@localhost:5984/test1234
{"error":"file_exists","reason":"The database could not be created, the file already exists."}
 

I am on a LAN without a DNS server, so that might explain some of this. I also ran this playbook multiple times, which might have broken some things. It's pretty late here, so that's it for today from me.

I really appreciate the CouchDB clustering and the rest of the clustering work. Reading through your playbooks deepened my understanding of Kazoo setup to another level in record time. This also gave me a reason to finally learn and start using Ansible!

Thanks for your efforts. Let's make this more robust!

Edited by Uzair Mahmud (see edit history)

I spun up a bunch of VMs for testing and only ran the script fresh on them.

OK, so I got rid of all the errors by adding pauses before those commands.

I also figured out how to use tags in Ansible to control which parts of the scripts to run and which parts to exclude.

One problem is that I can't log in to Fauxton over port 5984 or 5986. I get the following error:

CRASH REPORT Process config (<0.5859.0>) with 0 neighbors exited with reason: no match of right hand value {error,eacces} at config_writer:save_to_file/2(line:38) <= config:handle_call/3(line:242) <= gen_server:try_handle_call/4(line:629) <= gen_server:handle_msg/5(line:661) <= proc_lib:init_p_do_apply/3(line:240) at gen_server:terminate/7(line:826) <= proc_lib:init_p_do_apply/3(line:240); initial_call: {config,init,['Argument__1']}, ancestors: [config_sup,<0.87.0>], messages: [], links: [<0.88.0>], dictionary: [], trap_exit: false, status: running, heap_size: 17731, stack_size: 27, reductions: 76043

 

The final problem I am having is that the sup command is not picking up the cookie I specify in group_vars/all. It still tries to use the default cookie, change_me.

The rest is working extremely well.

Edited by Uzair Mahmud (see edit history)

I figured out the CouchDB login problem in Fauxton.

I had to run:

chown -R couchdb:couchdb /opt/couchdb

Something is resetting the permissions on the /opt/couchdb folder. It's probably just my setup that is hitting this.

Also, the sup problem goes away after a restart. User error on that one.

Edited by Uzair Mahmud (see edit history)

I was able to fix:

  • CouchDB clustering errors - Allow time for CouchDB to start before attempting to cluster it
  • FreeSwitch clustering errors - Wait 30 seconds after DB creation to give Kazoo time to fully start
  • Dependency errors - Do not hard-code Kazoo versions

CouchDB Login Problem:

  • I ran into this issue as well, and it seems to be Kazoo related. I changed the owner of the /opt/couchdb folder, and restarted kazoo. It appears that Kazoo is changing the ownership. We might need to ask the 2600hz team for help with this.

You can re-install all of the Ansible roles with --force. Let me know if you run into any more issues.

Edited by Tom (see edit history)

Thanks for the updates. I'm checking them out right now.

OK, so I changed the cookie to "change_me", and that's how I got the installation to work.

If I change the Erlang cookie to anything else, then install and reboot, I keep getting the error:

failed to connect to service kazoo_apps@xyz.pbx with cookie change_me
 

For CouchDB, the installation folder could be changed. That's the only other option for now.

 


Are you thinking that we should retarget the installation to /opt/kazoocouchdb or something like that?

I haven't been able to replicate the cookie issue on a brand new cluster I'm afraid. If I'm able to find out anything more, I'll let you know.


Thanks,

Another thing I noticed is that my Kamailio is registered in the ACL with a /32 subnet, while I am on a /8 subnet. I wonder if this can be extracted automatically from the network settings.

I will do more testing on my machines to figure out the sup cookie issue. As far as I know, you are supposed to change the cookie in config.ini in the Kazoo folder, and everything there is set to the right cookie.
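
One way to double-check that every section of config.ini carries the intended cookie is to parse the file and list the sections that differ. The Python sketch below assumes an INI layout with a cookie key per section; the sample contents, section names, and the function name mismatched_cookies are hypothetical, so compare against your own /etc/kazoo/core/config.ini.

```python
import configparser


def mismatched_cookies(ini_text, expected_cookie):
    """Return {section: cookie} for sections whose cookie differs from expected_cookie."""
    # strict=False tolerates repeated section names; interpolation=None
    # keeps literal % characters in values from raising errors.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    mismatches = {}
    for section in parser.sections():
        cookie = parser.get(section, "cookie", fallback=None)
        if cookie is not None and cookie.strip('"') != expected_cookie:
            mismatches[section] = cookie
    return mismatches


# Hypothetical sample, not a real Kazoo configuration.
sample_config = """
[kazoo_apps]
cookie = my_cookie

[ecallmgr]
cookie = change_me

[log]
console = notice
"""

print(mismatched_cookies(sample_config, "my_cookie"))
```

Sections without a cookie key are skipped, so only real disagreements are reported.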

Edited by Uzair Mahmud (see edit history)

Subnets:

/32 is the correct ACL, since /32 is a single IP address. Since the clustering is automated, we can ensure that only the exact Kamailio IP addresses are whitelisted.
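
The difference between a /32 and a /8 entry can be checked with Python's standard ipaddress module. The addresses below are only examples on the 10.0.0.0/8 range, and the function name is_allowed is hypothetical.

```python
import ipaddress


def is_allowed(ip, acl_entries):
    """Return True if ip falls inside any CIDR entry in acl_entries."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(entry) for entry in acl_entries)


# A /32 entry matches exactly one address.
print(ipaddress.ip_network("10.0.1.1/32").num_addresses)
# A /8 entry matches every address that shares the first octet.
print(ipaddress.ip_network("10.0.0.0/8").num_addresses)

print(is_allowed("10.0.1.1", ["10.0.1.1/32"]))  # the listed address itself
print(is_allowed("10.0.1.2", ["10.0.1.1/32"]))  # a neighbor on the same LAN
print(is_allowed("10.0.1.2", ["10.0.0.0/8"]))   # the same neighbor under a /8 entry
```

With a /32 entry only the listed address passes, while a /8 entry would admit every host in the 10.x.x.x range.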

Cookie:

The Kazoo role does change the cookie. I'm really confused why it's not working. You might have to restart kazoo-applications and kazoo-ecallmgr for the cookie file used by sup to actually be written.

Edited by Tom (see edit history)

2 hours ago, Sean Wysor said:

What is the owner being changed to? I do not know of anything offhand that sets permissions in opt outside of /opt/kazoo.

The kazoo user takes ownership of everything under /opt:

[tnewman@kazoo opt]$ ls -la /opt
total 0
drwxr-xr-x.  4 kazoo root     34 Oct 10 01:08 .
dr-xr-xr-x. 17 root  root    224 Oct  9 23:33 ..
drwxr-xr-x.  9 kazoo couchdb 122 Oct 10 01:06 couchdb
drwxr-xr-x.  8 kazoo daemon  107 Oct 10 01:11 kazoo

Here are the steps to verify this:

  1. sudo chown -R couchdb /opt/couchdb
  2. The permissions are now correct:
    [tnewman@kazoo opt]$ ls -la /opt
    total 0
    drwxr-xr-x.  4 kazoo   root     34 Oct 10 01:08 .
    dr-xr-xr-x. 17 root    root    224 Oct  9 23:33 ..
    drwxr-xr-x.  9 couchdb couchdb 122 Oct 10 01:06 couchdb
    drwxr-xr-x.  8 kazoo   daemon  107 Oct 10 01:11 kazoo

     

  3. sudo systemctl restart kazoo-applications
  4. The permissions are incorrect again:
    [tnewman@kazoo opt]$ ls -la /opt
    total 0
    drwxr-xr-x.  4 kazoo root     34 Oct 10 01:08 .
    dr-xr-xr-x. 17 root  root    224 Oct  9 23:33 ..
    drwxr-xr-x.  9 kazoo couchdb 122 Oct 10 01:06 couchdb
    drwxr-xr-x.  8 kazoo daemon  107 Oct 10 01:11 kazoo

     


On 10/9/2017 at 6:56 PM, Sean Wysor said:

What is the owner being changed to? I do not know of anything offhand that sets permissions in opt outside of /opt/kazoo.

The kazoo-applications and kazoo-ecallmgr startup scripts take recursive ownership of /opt/. I have submitted a pull request at https://github.com/2600hz/kazoo-configs-core/pull/8, and I hope someone can review it for me.
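
To spot this kind of ownership change without reading ls output by hand, a small Python sketch can walk a directory and list everything not owned by the expected user. The function names are hypothetical, and the demo runs against a temporary directory so it works on any Unix machine; on a Kazoo server you would point it at /opt/couchdb with the expected owner couchdb.

```python
import os
import pwd
import tempfile


def owner_name(path):
    """Return the user name that owns path (falls back to the numeric uid)."""
    uid = os.lstat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def paths_not_owned_by(root, expected_owner):
    """Walk root and return the directories and files not owned by expected_owner."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            if owner_name(path) != expected_owner:
                mismatched.append(path)
    return mismatched


# Demo on a throwaway directory owned by the current user.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    current_user = owner_name(tmp)
    print(paths_not_owned_by(tmp, current_user))
```

Running the check before and after restarting kazoo-applications would show whether the restart changed the ownership again.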

