
[Bigcouch Shard Error]



Hi Team,

I have successfully set up a kazoo-bigcouch cluster, and everything works fine except that after some time I get a shard error and the whole cluster stops working.

Example:

sup crossbar_maintenance create_account admin sip.mogility.cloud admin Lmkt@ptcl1234
Command failed: {'EXIT',{{badmatch,{error,<<"No DB shards could be opened.">>}},[{kz_json_schema,default_object,1,[{file,"src/kz_json_schema.erl"},{line,997}]},{kzd_accounts,new,0,[{file,"src/kzd_accounts.erl"},{line,120}]},{crossbar_maintenance,create_account,4,[{file,"src/crossbar_maintenance.erl"},{line,394}]},{sup,in_kazoo,4,[{file,"src/sup.erl"},{line,98}]},{rpc,'-handle_call_call/6-fun-0-',5,[{file,"rpc.erl"},{line,197}]}]}}

It goes away when I restart all of the database nodes. Here are my database members (the cluster has 4 BigCouch nodes):

[root@db1zone1 ~]# curl 127.0.0.1:5984/_membership  

{"all_nodes":["bigcouch@db1zone1.xxxx.cloud","bigcouch@db1zone2.xxxx.cloud","bigcouch@db2zone1.xxxx.cloud","bigcouch@db2zone2.xxxx.cloud"],"cluster_nodes":["bigcouch@db1zone1.xxxx.cloud","bigcouch@db1zone2.xxxx.cloud","bigcouch@db2zone1.xxxx.cloud","bigcouch@db2zone2.xxxxx.cloud"]}
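A quick way to see whether every configured member is actually connected is to compare the two lists in the `_membership` response. The sketch below assumes, as in CouchDB 2.x, that `all_nodes` lists the Erlang nodes the queried node is currently connected to (plus itself) and `cluster_nodes` lists the configured members; whether BigCouch behaves exactly the same is an assumption. The helper names `fetch_membership` and `disconnected_members` are hypothetical.

```python
import json
import urllib.request


def fetch_membership(base_url: str = "http://127.0.0.1:5984") -> dict:
    """Fetch /_membership from a single node (default port assumed to be 5984)."""
    with urllib.request.urlopen(f"{base_url}/_membership", timeout=5) as resp:
        return json.load(resp)


def disconnected_members(membership: dict) -> list:
    """Configured cluster members that the queried node is not connected to.

    Assumes all_nodes = currently connected nodes and
    cluster_nodes = configured members, as in CouchDB 2.x.
    """
    connected = set(membership.get("all_nodes", []))
    return sorted(n for n in membership.get("cluster_nodes", []) if n not in connected)


# Hypothetical response in which one configured member is unreachable.
sample = {
    "all_nodes": ["bigcouch@a.example", "bigcouch@b.example"],
    "cluster_nodes": ["bigcouch@a.example", "bigcouch@b.example", "bigcouch@c.example"],
}
print(disconnected_members(sample))  # ['bigcouch@c.example']
```

Calling `disconnected_members(fetch_membership(...))` against each node in turn, while the error is happening, would show whether some node has lost its connection to a peer.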

And here is the cluster configuration from my local.ini file:

[cluster]
q=1
r=3
w=3
n=4

Can you please help me figure out what the issue is here?

 

Regards

Abbasi

 


  • 2600Hz Employees

Possibly a permissions problem on the database files themselves (I think they're under /srv/dbs?). Try accessing a single BigCouch node and see what it's logging.

Also, n=4 (or any even number) is problematic in a network split. It is generally better for n to be odd, which makes it more likely that one side of the split keeps a majority.
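The arithmetic behind this: one side of a two-way split keeps a majority only if it holds strictly more than half of the n copies. The sketch below (hypothetical helper names) enumerates every two-way split and shows that only an even n can produce a tie in which neither side has a majority.

```python
def has_majority_side(n: int, left: int) -> bool:
    """True if either side of a two-way split of n copies holds a strict majority."""
    right = n - left
    majority = n // 2 + 1  # smallest count strictly greater than n / 2
    return left >= majority or right >= majority


def tie_splits(n: int) -> list:
    """All two-way splits of n copies in which neither side has a majority."""
    return [(left, n - left) for left in range(1, n) if not has_majority_side(n, left)]


for n in (3, 4, 5, 6):
    print(n, tie_splits(n))
```

Running it prints an empty list for n=3 and n=5, and `[(2, 2)]` and `[(3, 3)]` for n=4 and n=6.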


@mc_

Thanks a lot for your time. I checked, and there were no permission issues with the DB files (directories and subdirectories). The issue has not happened since last Friday (1 Nov 2019).

If the issue comes back, I will look into the BigCouch logs and get back to you.

 

Regards

Abbasi


@mc_

I got the issue again; here are my BigCouch log entries:

[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.2254.0>] [ecf5d584] undefined - - 'GET' /faxes/_design/faxes/_view/schedule_accounts?group_level=1&group=true&reduce=true 500
[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.2254.0>] [ecf5d584] 45.76.23.242 127.0.0.1:15984 GET /faxes/_design/faxes/_view/schedule_accounts?group_level=1&group=true&reduce=true 500 ok 1
[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.285.0>] [c98bdde3] 52.40.138.157 144.202.79.50:5984 GET /_utils/database.html?accounts 304 ok 1
[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.2263.0>] [7fb8cfb1] undefined - - 'GET' / 200
[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.2263.0>] [7fb8cfb1] 45.76.23.242 undefined GET / 200 ok 1
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.280.0>] [331f2827] undefined - - 'GET' /_session 200
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.280.0>] [331f2827] 52.40.138.157 144.202.79.50:5984 GET /_session 200 ok 1
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.281.0>] [4eef4492] undefined - - 'GET' /_config/query_servers/ 200
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.281.0>] [4eef4492] 52.40.138.157 144.202.79.50:5984 GET /_config/query_servers/ 200 ok 0

[Mon, 11 Nov 2019 14:10:44 GMT] [error] [<0.282.0>] [c675e7a5] Uncaught error in HTTP request: {error,
                                                                {internal_server_error,
                                                                 "No DB shards could be opened."}}
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.282.0>] [c675e7a5] Stacktrace: [{fabric_util,get_shard,3,
                                            [{file,"src/fabric_util.erl"},
                                             {line,67}]},
                                           {fabric,get_security,2,
                                            [{file,"src/fabric.erl"},
                                             {line,138}]},
                                           {chttpd_db,do_db_req,2,
                                            [{file,"src/chttpd_db.erl"},
                                             {line,198}]},
                                           {chttpd,handle_request,1,
                                            [{file,"src/chttpd.erl"},
                                             {line,198}]},
                                           {mochiweb_http,headers,5,
                                            [{file,"src/mochiweb_http.erl"},
                                             {line,126}]},
                                           {proc_lib,init_p_do_apply,3,
                                            [{file,"proc_lib.erl"},
                                             {line,227}]}]
[Mon, 11 Nov 2019 14:10:44 GMT] [error] [<0.285.0>] [1d214b0c] Uncaught error in HTTP request: {error,
                                                                {internal_server_error,
                                                                 "No DB shards could be opened."}}
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.282.0>] [c675e7a5] undefined - - 'GET' /accounts/_all_docs?limit=11 500
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.285.0>] [1d214b0c] Stacktrace: [{fabric_util,get_shard,3,
                                            [{file,"src/fabric_util.erl"},
                                             {line,67}]},
                                           {fabric,get_security,2,
                                            [{file,"src/fabric.erl"},
                                             {line,138}]},
                                           {chttpd_db,do_db_req,2,
                                            [{file,"src/chttpd_db.erl"},
                                             {line,198}]},
                                           {chttpd,handle_request,1,
                                            [{file,"src/chttpd.erl"},
                                             {line,198}]},
                                           {mochiweb_http,headers,5,
                                            [{file,"src/mochiweb_http.erl"},
                                             {line,126}]},
                                           {proc_lib,init_p_do_apply,3,
                                            [{file,"proc_lib.erl"},
                                             {line,227}]}]

 

 

Regards

Abbasi


  • 1 month later...
  • 3 months later...

Hello

I am also facing the same issue when creating an account. I get this error:

07:36:17.678 [error] |896ed795c4b53ea80b05793af26ae47a|kz_couch_db:50(<0.15686.2>) failed to create database account%2F05%2F6d%2F24901ce1d911ada86ef8b34206e8: {bad_response,{500,[{<<"X-Couch-Request-ID">>,<<"7bee89c5">>},{<<"Server">>,<<"CouchDB/1.1.1 (Erlang OTP/R15B03)">>},{<<"Date">>,<<"Fri, 20 Mar 2020 07:36:17 GMT">>},{<<"Content-Type">>,<<"application/json">>},{<<"Content-Length">>,<<"51">>},{<<"Cache-Control">>,<<"must-revalidate">>}],<<"{\"error\":\"error\",\"reason\":\"internal_server_error\"}\n">>}}
07:36:17.686 [critical] |896ed795c4b53ea80b05793af26ae47a|kz_couch_util:69(<0.15686.2>) response code 500 not expected : <<"{\"error\":\"internal_server_error\",\"reason\":\"No DB shards could be opened.\",\"stack\":[\"bad entry in stacktrace\",\"bad entry in stacktrace\",\"bad entry in stacktrace\",\"bad entry in stacktrace\",\"bad entry in stacktrace\",\"bad entry in stacktrace\"]}\n">>
07:36:17.686 [error] |896ed795c4b53ea80b05793af26ae47a|cb_accounts:1457(<0.15686.2>) failed to delete services: error: function_clause
 

Were you able to fix this? I have a single-node cluster. The issue started when I changed the server's hostname, and since then we have been unable to add more accounts. We changed back to the old hostname, but we are still stuck. We can create phones and users without any problem; only accounts can no longer be created.

Please advise

Regards

Naveed


  • 2 weeks later...

I was just going to tell you: this looks like it could be a problem reaching quorum. For new DBs to be written to the cluster, you need a quorum. If a node is not responding correctly, the data handlers can get stuck waiting for its response.
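A simplified model of the quorum requirement, assuming a request succeeds only when at least `w` (for writes) or `r` (for reads) of the `n` shard copies respond. Real BigCouch behaviour has more nuance, so treat this as a back-of-the-envelope check rather than a description of the actual code path. With four nodes and n=4, every node holds a copy of every shard, so the values from the `[cluster]` section earlier in this thread (n=4, r=3, w=3) leave room for only one unresponsive node. The helper names are hypothetical.

```python
def quorum_met(responding: int, required: int) -> bool:
    """A request succeeds in this model if enough copies respond."""
    return responding >= required


def max_failures(n: int, required: int) -> int:
    """How many of the n copies can be unavailable while quorum is still met."""
    if not 1 <= required <= n:
        raise ValueError("required must be between 1 and n")
    return n - required


n, r, w = 4, 3, 3  # values from the [cluster] section in the original post
for down in range(n + 1):
    up = n - down
    print(f"{down} down: reads ok={quorum_met(up, r)}, writes ok={quorum_met(up, w)}")
print("tolerated failures:", max_failures(n, w))  # 1
```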

Also, if you are doing cluster replication over the WAN, that could be an issue as well if the nodes are too far apart and have a high RTT. If that's the case, you may want to think about configuring your nodes into zones, which is a short-term solution. Really, doing cluster replication over the WAN/Internet is not a "great" idea.
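To see why high RTT hurts, consider a simplified model in which the coordinating node contacts all copies in parallel and answers once the `required`-th reply arrives, so the request takes roughly the `required`-th smallest round-trip time. The RTT figures below are made-up illustrations, not measurements, and `quorum_latency_ms` is a hypothetical helper.

```python
def quorum_latency_ms(rtts_ms: list, required: int) -> float:
    """Approximate latency of a request that waits for `required` replies.

    Assumes all copies are contacted in parallel, so the request finishes
    when the `required`-th fastest reply arrives.
    """
    if not 1 <= required <= len(rtts_ms):
        raise ValueError("required must be between 1 and the number of copies")
    return sorted(rtts_ms)[required - 1]


# Hypothetical RTTs from a coordinator to four copies: two local, two remote.
local_and_remote = [0.5, 0.7, 80.0, 95.0]
print(quorum_latency_ms(local_and_remote, 2))  # 0.7
print(quorum_latency_ms(local_and_remote, 3))  # 80.0
```

In this model, a quorum of 2 can be met inside one site, while a quorum of 3 forces every request to wait for at least one WAN round trip.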

 


Hello @Joseph Watson

Thanks for the reply. Yes, I noticed that this issue seems to appear when any of the nodes stops responding to RabbitMQ. Is there any way to move this data to another cluster while it is running? If we simply back up the CouchDB database and restore it on another server, we get this shard error. Somebody suggested editing the shard documents manually, but with around 1000+ docs to change, how can we do that efficiently?
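Since the trouble here began with a hostname change, one relevant detail is that BigCouch (like CouchDB 2.x) keeps a shard-map document per database in a node-local database (assumed to be `dbs`), and those documents name the nodes holding each shard range. The sketch below assumes that document layout (`by_node`, `by_range`, `changelog`) and rewrites a single document in memory; `rename_node` and the node names are hypothetical. Back up the shard-map database before changing anything.

```python
import copy


def rename_node(shard_doc: dict, old: str, new: str) -> dict:
    """Return a copy of a shard-map document with node `old` renamed to `new`.

    Assumes the CouchDB 2.x / BigCouch layout:
      by_node:   {node: [range, ...]}
      by_range:  {range: [node, ...]}
      changelog: [[action, range, node], ...]
    _id and _rev are kept, so the result can be saved as a normal update.
    """
    doc = copy.deepcopy(shard_doc)
    swap = lambda node: new if node == old else node
    if "by_node" in doc:
        doc["by_node"] = {swap(node): ranges for node, ranges in doc["by_node"].items()}
    if "by_range" in doc:
        doc["by_range"] = {
            rng: [swap(node) for node in nodes] for rng, nodes in doc["by_range"].items()
        }
    if "changelog" in doc:
        # The node name is assumed to be the third element of each entry.
        doc["changelog"] = [
            [swap(x) if i == 2 else x for i, x in enumerate(entry)]
            for entry in doc["changelog"]
        ]
    return doc


sample = {
    "_id": "accounts",
    "_rev": "1-abc",
    "changelog": [["add", "00000000-ffffffff", "bigcouch@old.example"]],
    "by_node": {"bigcouch@old.example": ["00000000-ffffffff"]},
    "by_range": {"00000000-ffffffff": ["bigcouch@old.example"]},
}
print(rename_node(sample, "bigcouch@old.example", "bigcouch@new.example")["by_node"])
```

For a thousand or more documents, one approach is to read them all with `_all_docs?include_docs=true`, apply the function to each, and write the results back in batches through `_bulk_docs`; both are standard CouchDB endpoints, but check how they behave against the node-local port of your BigCouch version first. The new name must match the node name the running node actually reports, for example in `_membership`.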

Regards

Naveed

