  • 0
BilalAbbasi

[Bigcouch Shard Error]

Question

Hi Team,

I have successfully set up a Kazoo BigCouch cluster, and everything works well, except that after some time I get a shard error and the whole cluster stops working.

Example:

sup crossbar_maintenance create_account admin sip.mogility.cloud admin Lmkt@ptcl1234
Command failed: {'EXIT',{{badmatch,{error,<<"No DB shards could be opened.">>}},[{kz_json_schema,default_object,1,[{file,"src/kz_json_schema.erl"},{line,997}]},{kzd_accounts,new,0,[{file,"src/kzd_accounts.erl"},{line,120}]},{crossbar_maintenance,create_account,4,[{file,"src/crossbar_maintenance.erl"},{line,394}]},{sup,in_kazoo,4,[{file,"src/sup.erl"},{line,98}]},{rpc,'-handle_call_call/6-fun-0-',5,[{file,"rpc.erl"},{line,197}]}]}}

The error goes away when I restart every database node. Here are my database members (four BigCouch nodes):

[root@db1zone1 ~]# curl 127.0.0.1:5984/_membership  

{"all_nodes":["bigcouch@db1zone1.xxxx.cloud","bigcouch@db1zone2.xxxx.cloud","bigcouch@db2zone1.xxxx.cloud","bigcouch@db2zone2.xxxx.cloud"],"cluster_nodes":["bigcouch@db1zone1.xxxx.cloud","bigcouch@db1zone2.xxxx.cloud","bigcouch@db2zone1.xxxx.cloud","bigcouch@db2zone2.xxxxx.cloud"]}
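A quick way to read a `_membership` response is to compare its two lists, since a node that appears in only one of them is worth a closer look. The sketch below is only an illustration: `membership_diff` is a hypothetical helper name, and the input is the response pasted above with the domain masked exactly as posted. In that pasted text the last `cluster_nodes` entry differs from its `all_nodes` counterpart by one masked character, which may simply be a slip made while masking the domain.

```python
import json


def membership_diff(body: str) -> dict:
    """Compare the two node lists in a /_membership response body."""
    data = json.loads(body)
    all_nodes = set(data["all_nodes"])
    cluster_nodes = set(data["cluster_nodes"])
    return {
        "only_in_all_nodes": sorted(all_nodes - cluster_nodes),
        "only_in_cluster_nodes": sorted(cluster_nodes - all_nodes),
    }


# The response from the post, copied as-is (domain masked by the poster).
posted = (
    '{"all_nodes":["bigcouch@db1zone1.xxxx.cloud","bigcouch@db1zone2.xxxx.cloud",'
    '"bigcouch@db2zone1.xxxx.cloud","bigcouch@db2zone2.xxxx.cloud"],'
    '"cluster_nodes":["bigcouch@db1zone1.xxxx.cloud","bigcouch@db1zone2.xxxx.cloud",'
    '"bigcouch@db2zone1.xxxx.cloud","bigcouch@db2zone2.xxxxx.cloud"]}'
)

if __name__ == "__main__":
    for label, nodes in membership_diff(posted).items():
        print(label, nodes)
```

Running the same comparison against the live output of every node would show whether the nodes agree with each other about who is in the cluster.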

And here is the cluster configuration in my local.ini file:

 

[cluster]
q=1
r=3
w=3
n=4
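As a sanity check on these values, here is a minimal sketch that parses the `[cluster]` section and reports how much node loss the quorum settings leave room for. It assumes the usual Dynamo-style meaning of the settings (`q` shards, `n` copies of each shard, `r` and `w` as read and write quorums); `check_cluster_settings` and the `node_count` argument are hypothetical names used only for illustration.

```python
import configparser


def check_cluster_settings(ini_text: str, node_count: int) -> list:
    """Return plain-language notes about the [cluster] quorum settings."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    cluster = parser["cluster"]
    r = cluster.getint("r")
    w = cluster.getint("w")
    n = cluster.getint("n")

    notes = []
    if n > node_count:
        notes.append(f"n={n} exceeds the {node_count} nodes available")
    if r > n:
        notes.append(f"r={r} can never be met with n={n}")
    if w > n:
        notes.append(f"w={w} can never be met with n={n}")
    # Assumption: a quorum of k out of n copies tolerates n - k missing copies.
    notes.append(f"reads need {r} of {n} copies: tolerates {n - r} unavailable")
    notes.append(f"writes need {w} of {n} copies: tolerates {n - w} unavailable")
    return notes


posted_ini = """
[cluster]
q=1
r=3
w=3
n=4
"""

if __name__ == "__main__":
    for note in check_cluster_settings(posted_ini, node_count=4):
        print(note)
```

Under that assumption, the posted settings leave room for only one unavailable copy before reads and writes can no longer reach quorum.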

 

Can you please guide me on what the issue is here?

 

Regards

Abbasi

 


9 answers to this question


  • 0

Possibly permissions on the database files themselves (I think they're under /srv/dbs?). Try accessing a single BigCouch node and see what it's logging.

Also, n=4 (or any even number) is problematic in a network split. It is generally good to have n be odd to make it more likely to have a majority on one side of the split.
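To make the even-versus-odd point concrete, here is a small sketch of the majority arithmetic. It is plain counting, not BigCouch code, and the function names are made up for the example.

```python
def majority(n: int) -> int:
    """Smallest count that is strictly more than half of n."""
    return n // 2 + 1


def side_with_majority(side_sizes):
    """Return the index of the partition side holding a majority, or None."""
    need = majority(sum(side_sizes))
    for index, size in enumerate(side_sizes):
        if size >= need:
            return index
    return None


if __name__ == "__main__":
    # Four copies split evenly: neither side reaches 3, so nobody has a majority.
    print(side_with_majority([2, 2]))
    # Five copies split 3/2: the first side still holds a majority.
    print(side_with_majority([3, 2]))
```

With four copies, a 2/2 split leaves both sides short of the three needed; with five, any two-way split leaves one side with at least three.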

  • 0

@mc_

Thanks a lot for your time. I have checked, and there are no permission issues with the dbs files (directories and subdirectories). The issue has not happened since last Friday (1 Nov 2019).

If the issue comes up again, I will look into the BigCouch logs and get back to you.

 

Regards

Abbasi

  • 0

@mc_

The issue happened again, and here is my BigCouch log entry:

[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.2254.0>] [ecf5d584] undefined - - 'GET' /faxes/_design/faxes/_view/schedule_accounts?group_level=1&group=true&reduce=true 500
[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.2254.0>] [ecf5d584] 45.76.23.242 127.0.0.1:15984 GET /faxes/_design/faxes/_view/schedule_accounts?group_level=1&group=true&reduce=true 500 ok 1
[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.285.0>] [c98bdde3] 52.40.138.157 144.202.79.50:5984 GET /_utils/database.html?accounts 304 ok 1
[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.2263.0>] [7fb8cfb1] undefined - - 'GET' / 200
[Mon, 11 Nov 2019 14:10:43 GMT] [info] [<0.2263.0>] [7fb8cfb1] 45.76.23.242 undefined GET / 200 ok 1
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.280.0>] [331f2827] undefined - - 'GET' /_session 200
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.280.0>] [331f2827] 52.40.138.157 144.202.79.50:5984 GET /_session 200 ok 1
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.281.0>] [4eef4492] undefined - - 'GET' /_config/query_servers/ 200
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.281.0>] [4eef4492] 52.40.138.157 144.202.79.50:5984 GET /_config/query_servers/ 200 ok 0
[Mon, 11 Nov 2019 14:10:44 GMT] [error] [<0.282.0>] [c675e7a5] Uncaught error in HTTP request: {error,
                                                                {internal_server_error,
                                                                 "No DB shards could be opened."}}
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.282.0>] [c675e7a5] Stacktrace: [{fabric_util,get_shard,3,
                                            [{file,"src/fabric_util.erl"},
                                             {line,67}]},
                                           {fabric,get_security,2,
                                            [{file,"src/fabric.erl"},
                                             {line,138}]},
                                           {chttpd_db,do_db_req,2,
                                            [{file,"src/chttpd_db.erl"},
                                             {line,198}]},
                                           {chttpd,handle_request,1,
                                            [{file,"src/chttpd.erl"},
                                             {line,198}]},
                                           {mochiweb_http,headers,5,
                                            [{file,"src/mochiweb_http.erl"},
                                             {line,126}]},
                                           {proc_lib,init_p_do_apply,3,
                                            [{file,"proc_lib.erl"},
                                             {line,227}]}]
[Mon, 11 Nov 2019 14:10:44 GMT] [error] [<0.285.0>] [1d214b0c] Uncaught error in HTTP request: {error,
                                                                {internal_server_error,
                                                                 "No DB shards could be opened."}}
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.282.0>] [c675e7a5] undefined - - 'GET' /accounts/_all_docs?limit=11 500
[Mon, 11 Nov 2019 14:10:44 GMT] [info] [<0.285.0>] [1d214b0c] Stacktrace: [{fabric_util,get_shard,3,
                                            [{file,"src/fabric_util.erl"},
                                             {line,67}]},
                                           {fabric,get_security,2,
                                            [{file,"src/fabric.erl"},
                                             {line,138}]},
                                           {chttpd_db,do_db_req,2,
                                            [{file,"src/chttpd_db.erl"},
                                             {line,198}]},
                                           {chttpd,handle_request,1,
                                            [{file,"src/chttpd.erl"},
                                             {line,198}]},
                                           {mochiweb_http,headers,5,
                                            [{file,"src/mochiweb_http.erl"},
                                             {line,126}]},
                                           {proc_lib,init_p_do_apply,3,
                                            [{file,"proc_lib.erl"},
                                             {line,227}]}]

 

 

Regards

Abbasi

  • 0

Hello

I am facing the same issue when creating an account. I get this error:

07:36:17.678 [error] |896ed795c4b53ea80b05793af26ae47a|kz_couch_db:50(<0.15686.2>) failed to create database account%2F05%2F6d%2F24901ce1d911ada86ef8b34206e8: {bad_response,{500,[{<<"X-Couch-Request-ID">>,<<"7bee89c5">>},{<<"Server">>,<<"CouchDB/1.1.1 (Erlang OTP/R15B03)">>},{<<"Date">>,<<"Fri, 20 Mar 2020 07:36:17 GMT">>},{<<"Content-Type">>,<<"application/json">>},{<<"Content-Length">>,<<"51">>},{<<"Cache-Control">>,<<"must-revalidate">>}],<<"{\"error\":\"error\",\"reason\":\"internal_server_error\"}\n">>}}
07:36:17.686 [critical] |896ed795c4b53ea80b05793af26ae47a|kz_couch_util:69(<0.15686.2>) response code 500 not expected : <<"{\"error\":\"internal_server_error\",\"reason\":\"No DB shards could be opened.\",\"stack\":[\"bad entry in stacktrace\",\"bad entry in stacktrace\",\"bad entry in stacktrace\",\"bad entry in stacktrace\",\"bad entry in stacktrace\",\"bad entry in stacktrace\"]}\n">>
07:36:17.686 [error] |896ed795c4b53ea80b05793af26ae47a|cb_accounts:1457(<0.15686.2>) failed to delete services: error: function_clause
 

Were you able to fix this? I have a single-node cluster. The issue started when I changed the server's hostname, and since then we have been unable to add accounts. We changed back to the old hostname, but we are still stuck. We can create phones and users without any problem; only accounts can no longer be created.

Please advise

Regards

Naveed

  • 0

I was just going to tell you: this looks like it could be an issue with making quorum. For new DBs to be written to the cluster, you need a quorum. A node that is not responding correctly can cause issues, because the data handlers wait for its response.

Also, if you are doing cluster replication over the WAN, that could be an issue as well if the nodes are too far apart and have high RTT. If that is the case, you may want to think about configuring your nodes into zones, which is a short-term solution. Doing cluster replication over a WAN or the Internet is really not a "great" idea.

 

  • 0

Hello @Joseph Watson

Thanks for the reply. Yes, I noticed that this issue seems to appear when any of the nodes does not respond to RabbitMQ. Is there any way we can actively move this data to another cluster? If we simply back up and restore the CouchDB database on another server, we get this shard error. Somebody mentioned changing the shards manually, but if around 1000+ docs need to change, how can we do that efficiently?

Regards

Naveed
