BilalAbbasi Posted October 31, 2019

Hi Team,

I have successfully set up a kazoo-bigcouch cluster, and everything works well except that after some time I get a shard error and the whole cluster stops working. For example:

sup crossbar_maintenance create_account admin sip.mogility.cloud admin Lmkt@ptcl1234
Command failed: {'EXIT',{{badmatch,{error,<<"No DB shards could be opened.">>}},[{kz_json_schema,default_object,1,[{file,"src/kz_json_schema.erl"},{line,997}]},{kzd_accounts,new,0,[{file,"src/kzd_accounts.erl"},{line,120}]},{crossbar_maintenance,create_account,4,[{file,"src/crossbar_maintenance.erl"},{line,394}]},{sup,in_kazoo,4,[{file,"src/sup.erl"},{line,98}]},{rpc,'-handle_call_call/6-fun-0-',5,[{file,"rpc.erl"},{line,197}]}]}}

The error goes away when I restart every database node. Here are my database members (4 members of the couch database):

[root@db1zone1 ~]# curl 127.0.0.1:5984/_membership
{"all_nodes":["bigcouch@db1zone1.xxxx.cloud","bigcouch@db1zone2.xxxx.cloud","bigcouch@db2zone1.xxxx.cloud","bigcouch@db2zone2.xxxx.cloud"],"cluster_nodes":["bigcouch@db1zone1.xxxx.cloud","bigcouch@db1zone2.xxxx.cloud","bigcouch@db2zone1.xxxx.cloud","bigcouch@db2zone2.xxxxx.cloud"]}

And here is my cluster configuration in the local.ini file:

[cluster]
q=1
r=3
w=3
n=4

Can you please guide me on what the issue is here?

Regards
Abbasi
2600Hz Employees mc_ Posted October 31, 2019

Possibly permissions on the database files themselves (I think they're under /srv/dbs ?). Try accessing a single bigcouch node and see what it's logging.

Also, n=4 (or any even number) is problematic in a network split. It is generally good to have n be odd, which makes it more likely that one side of the split holds a majority.
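To make the point about even values of n concrete, here is a small sketch that lists the two-way splits in which neither side holds a strict majority. It is plain arithmetic only, not BigCouch code, and the function names are made up for illustration.

```python
from typing import List, Tuple


def has_majority(side: int, n: int) -> bool:
    """True if `side` nodes form a strict majority of an n-node cluster."""
    return 2 * side > n


def splits_without_majority(n: int) -> List[Tuple[int, int]]:
    """List every two-way split (a, n - a) where neither side has a majority."""
    return [
        (a, n - a)
        for a in range(1, n // 2 + 1)
        if not has_majority(a, n) and not has_majority(n - a, n)
    ]


# Compare the n=4 from the thread with the odd sizes on either side of it.
for n in (3, 4, 5):
    print(n, splits_without_majority(n))
```

For n=4 the only problematic split is 2/2; for n=3 and n=5 every two-way split leaves one side with a majority.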
BilalAbbasi Posted November 4, 2019

@mc_ Thanks a lot for your time. I have checked, and there were no permission issues with the dbs files (directories and subdirectories). The issue has not happened since last Friday (1st-Nov-19). If it comes back, I will look into the bigcouch logs and get back to you.

Regards
Abbasi
BilalAbbasi Posted November 11, 2019

@mc_ I got the issue again, and here is my bigcouch log entry:

[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.2254.0>] [ecf5d584] undefined - - 'GET' /faxes/_design/faxes/_view/schedule_accounts?group_level=1&group=true&reduce=true 500
[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.2254.0>] [ecf5d584] 45.76.23.242 127.0.0.1:15984 GET /faxes/_design/faxes/_view/schedule_accounts?group_level=1&group=true&reduce=true 500 ok 1
[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.285.0>] [c98bdde3] 52.40.138.157 144.202.79.50:5984 GET /_utils/database.html?accounts 304 ok 1
[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.2263.0>] [7fb8cfb1] undefined - - 'GET' / 200
[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.2263.0>] [7fb8cfb1] 45.76.23.242 undefined GET / 200 ok 1
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.280.0>] [331f2827] undefined - - 'GET' /_session 200
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.280.0>] [331f2827] 52.40.138.157 144.202.79.50:5984 GET /_session 200 ok 1
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.281.0>] [4eef4492] undefined - - 'GET' /_config/query_servers/ 200
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.281.0>] [4eef4492] 52.40.138.157 144.202.79.50:5984 GET /_config/query_servers/ 200 ok 0
[Mon, 11 Nov 2019 14:10:44 GMT] [error] [<0.282.0>] [c675e7a5] Uncaught error in HTTP request: {error, {internal_server_error, "No DB shards could be opened."}}
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.282.0>] [c675e7a5] Stacktrace: [{fabric_util,get_shard,3, [{file,"src/fabric_util.erl"}, {line,67}]}, {fabric,get_security,2, [{file,"src/fabric.erl"}, {line,138}]}, {chttpd_db,do_db_req,2, [{file,"src/chttpd_db.erl"}, {line,198}]}, {chttpd,handle_request,1, [{file,"src/chttpd.erl"}, {line,198}]}, {mochiweb_http,headers,5, [{file,"src/mochiweb_http.erl"}, {line,126}]}, {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"}, {line,227}]}]
[Mon, 11 Nov 2019 14:10:44 GMT] [error] [<0.285.0>] [1d214b0c] Uncaught error in HTTP request: {error, {internal_server_error, "No DB shards could be opened."}}
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.282.0>] [c675e7a5] undefined - - 'GET' /accounts/_all_docs?limit=11 500
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.285.0>] [1d214b0c] Stacktrace: [{fabric_util,get_shard,3, [{file,"src/fabric_util.erl"}, {line,67}]}, {fabric,get_security,2, [{file,"src/fabric.erl"}, {line,138}]}, {chttpd_db,do_db_req,2, [{file,"src/chttpd_db.erl"}, {line,198}]}, {chttpd,handle_request,1, [{file,"src/chttpd.erl"}, {line,198}]}, {mochiweb_http,headers,5, [{file,"src/mochiweb_http.erl"}, {line,126}]}, {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"}, {line,227}]}]

Regards
Abbasi
2600Hz Employees mc_ Posted November 12, 2019

Could be a ulimit issue? Check those settings and make sure you have enough file descriptors.
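One way to check the file descriptor limits mentioned above is a short script using Python's standard `resource` module (Unix only). This is only a sketch: it reports the limits of the process that runs it, so it would need to be run as the same user, and from the same kind of shell or service environment, as BigCouch to be meaningful. The `wanted` value is a hypothetical target, not a figure from this thread.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def soft_limit_is_low(soft, wanted):
    """True if the soft limit is finite and below the wanted descriptor count."""
    if soft == resource.RLIM_INFINITY:
        return False
    return soft < wanted


soft, hard = nofile_limits()
wanted = 65536  # hypothetical target, not a value from this thread
print(f"soft={soft} hard={hard} low={soft_limit_is_low(soft, wanted)}")
```

On Linux, the limits of an already running process can also be read from /proc/<pid>/limits, which avoids guessing at the environment the database was started in.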
BilalAbbasi Posted December 19, 2019

I just changed those limits; let's see how it goes.
naveed6865 Posted March 20, 2020

Hello,

I am also facing the same issue when creating an account. I get this error:

07:36:17.678 [error] |896ed795c4b53ea80b05793af26ae47a|kz_couch_db:50(<0.15686.2>) failed to create database account%2F05%2F6d%2F24901ce1d911ada86ef8b34206e8: {bad_response,{500,[{<<"X-Couch-Request-ID">>,<<"7bee89c5">>},{<<"Server">>,<<"CouchDB/1.1.1 (Erlang OTP/R15B03)">>},{<<"Date">>,<<"Fri, 20 Mar 2020 07:36:17 GMT">>},{<<"Content-Type">>,<<"application/json">>},{<<"Content-Length">>,<<"51">>},{<<"Cache-Control">>,<<"must-revalidate">>}],<<"{\"error\":\"error\",\"reason\":\"internal_server_error\"}\n">>}}
07:36:17.686 [critical] |896ed795c4b53ea80b05793af26ae47a|kz_couch_util:69(<0.15686.2>) response code 500 not expected : <<"{\"error\":\"internal_server_error\",\"reason\":\"No DB shards could be opened.\",\"stack\":[\"bad entry in stacktrace\",\"bad entry in stacktrace\",\"bad entry in stacktrace\",\"bad entry in stacktrace\",\"bad entry in stacktrace\",\"bad entry in stacktrace\"]}\n">>
07:36:17.686 [error] |896ed795c4b53ea80b05793af26ae47a|cb_accounts:1457(<0.15686.2>) failed to delete services: error: function_clause

Were you able to fix this? I have a single-node cluster. The issue started when I changed the hostname of the server; since then we have been unable to add more accounts. We changed back to the old hostname, but we are still stuck. We can create phones and users without any issue; only accounts can no longer be created.

Please advise.

Regards
Naveed
naveed6865 Posted March 20, 2020

OK, I managed to fix this: I removed the one bogus cluster node from the BigCouch DB, and after that it is working.

Thanks
Naveed
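A bogus node like the one removed here would be expected to show up as a difference between the two lists returned by /_membership, the endpoint queried in the first post. The sketch below compares them, assuming all_nodes lists the nodes this node currently knows about and cluster_nodes lists the nodes stored in the cluster configuration (that is how CouchDB documents these fields; BigCouch is assumed to match). The function name is made up for illustration, and the input is the response body exactly as pasted in the first post.

```python
import json


def membership_mismatch(membership):
    """Compare the two node lists of a /_membership response.

    Returns (configured_only, connected_only) as sorted lists.
    """
    connected = set(membership["all_nodes"])
    configured = set(membership["cluster_nodes"])
    return sorted(configured - connected), sorted(connected - configured)


# Response body copied from the first post (hostnames were already masked there).
raw = (
    '{"all_nodes":["bigcouch@db1zone1.xxxx.cloud","bigcouch@db1zone2.xxxx.cloud",'
    '"bigcouch@db2zone1.xxxx.cloud","bigcouch@db2zone2.xxxx.cloud"],'
    '"cluster_nodes":["bigcouch@db1zone1.xxxx.cloud","bigcouch@db1zone2.xxxx.cloud",'
    '"bigcouch@db2zone1.xxxx.cloud","bigcouch@db2zone2.xxxxx.cloud"]}'
)

configured_only, connected_only = membership_mismatch(json.loads(raw))
print("in cluster_nodes only:", configured_only)
print("in all_nodes only:", connected_only)
```

On the pasted output this flags the db2zone2 entries, whose masked hostnames differ by one character between the two lists. That may simply be an artifact of how the hostnames were masked before posting, so it is worth rerunning against the real response rather than reading anything into it.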
Joseph Watson Posted April 3, 2020

I was just going to tell you: this looks like it could be an issue making quorum. In order for new DBs to be written to the cluster, you need to have a quorum. If you have a node that's not responding correctly, it can cause issues with the data handlers waiting for a response.

Also, if you are doing cluster replication over the WAN, that could be an issue as well if the nodes are too far apart and have high RTT. If that's the case, you may want to think about configuring your nodes into zones, which is a short-term solution. Really, doing cluster replication over WAN/Internet is not a "great" idea.
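As a rough illustration of the quorum point, the sketch below checks whether a quorum can still be reached when some copies do not respond. It is a simplified model using only the n and w values from the [cluster] section in the first post; how BigCouch actually behaves when a quorum is missed may differ, and the function name is hypothetical.

```python
def quorum_met(n, quorum, unresponsive):
    """True if enough of the n copies still respond to reach the quorum."""
    if not 0 <= unresponsive <= n:
        raise ValueError("unresponsive must be between 0 and n")
    return n - unresponsive >= quorum


# Values from the first post: n=4 copies, w=3 acknowledgements required.
for down in range(0, 5):
    print(down, quorum_met(4, 3, down))
```

Under this model, n=4 with w=3 tolerates one unresponsive copy but not two.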
naveed6865 Posted April 3, 2020

Hello @Joseph Watson,

Thanks for the reply. Yes, I noticed that this issue seems to come up if any of the nodes doesn't respond to rabbitmq.

Is there any way we can actively move this data to some other cluster? If we simply back up and restore the CouchDB database to some other server, it gives this shard error. Somebody mentioned changing the shards manually, but if there are around 1000+ docs that need to change, how can we do that effectively?

Regards
Naveed
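On the question of changing shards in bulk, one possible approach is to script the rewrite rather than edit each document by hand. The sketch below only shows the per-document transformation. It assumes each shard map is a JSON document with a "by_node" object (node name to list of ranges) and a "by_range" object (range to list of node names); that layout is an assumption, so check it against a real document from your own nodes first. The node names and the sample document are hypothetical, and fetching and saving the documents is left out entirely.

```python
import copy


def rename_node(shard_map, old, new):
    """Return a copy of a shard-map document with node `old` renamed to `new`.

    Assumes "by_node" maps node -> ranges and "by_range" maps range -> nodes;
    every other field (for example a changelog) is copied unchanged.
    """
    doc = copy.deepcopy(shard_map)
    by_node = doc.get("by_node", {})
    if old in by_node:
        by_node[new] = by_node.pop(old)
    by_range = doc.get("by_range", {})
    for rng, nodes in by_range.items():
        by_range[rng] = [new if node == old else node for node in nodes]
    return doc


# Hypothetical single-shard document, for illustration only.
sample = {
    "_id": "accounts",
    "by_node": {"bigcouch@old.example": ["00000000-ffffffff"]},
    "by_range": {"00000000-ffffffff": ["bigcouch@old.example"]},
}
print(rename_node(sample, "bigcouch@old.example", "bigcouch@new.example"))
```

Because the function returns a new document and leaves its input alone, it can be run over an exported copy of all the documents and the result reviewed before anything is written back.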