10.9. Troubleshooting

If an operation seems to hang, check the output of status ().

Check for the presence of the following conditions:

  • The cluster line shows 0% CPU, no message traffic, and an unchanging number of wired buffers. This is probably a bug. To reset, restart the cluster, or only the offending process if it can be identified. To restart a process, execute raw_exit (); over an SQL connection to that process.

  • The cluster line shows many threads waiting relative to the total number of threads. If CPU stays at 0% and the waiting count does not change, a transaction may be holding locks indefinitely. To clear it, execute txn_killall (1); on a node that has local threads waiting. Local waiting threads are listed in the Lock Status paragraph of status ('') when connected to the node in question.

  • The cluster line shows a changing number in the pfs field. The system is swapping, which slows it down.

  • The status () call itself hangs. Try another process of the cluster, and check that no temporary atomic activity, such as a long checkpoint, is in progress. A running checkpoint is indicated by the presence of the checkpoint_in_progress file in each server's working directory. If the situation persists, there is a bug.

  • To check the integrity of the database files, execute cl_exec ('backup ''/dev/null'''); If the call returns, the databases are OK. If a database is found to be corrupt, the corresponding server exits.
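
The checks above can be collected into one short session over an SQL connection. This is only a sketch that uses the calls named in this section. The order of the statements and the choice of node to connect to are assumptions, and each statement should be run only when the matching condition applies.

```sql
-- Inspect the cluster line and the Lock Status paragraph
-- on the node this connection is attached to.
status ('');

-- Only if threads stay waiting while CPU remains at 0%:
-- clear transactions that hold locks indefinitely.
-- Run this on a node that has local threads waiting.
txn_killall (1);

-- Check the integrity of the database files on all servers.
-- A server whose database is corrupt exits.
cl_exec ('backup ''/dev/null''');

-- Last resort, only for a process identified as stuck:
-- this exits the process this connection is attached to,
-- so the connection itself is lost.
raw_exit ();
```

Because raw_exit () terminates the connected process, it is placed last and is best issued from a separate connection to the offending process, not from the session used for monitoring.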