Known Replikwando Errors

From ISoft Wiki
Revision as of 12:13, 19 December 2016 by Broy (talk | contribs) (→‎Repushing: Added a case we recently investigated to the troubleshooting advice)

Errors in Replikwando

Replikwando is designed to tolerate unreliable internet connections, MySQL databases that shut down unexpectedly, and similar interruptions. Sometimes, however, a problem comes up that Replikwando cannot handle automatically. When that happens, it writes to its error logs so the administrator knows that something odd occurred and should probably be looked into.

This wiki page is meant to help explain what the different errors are, as well as what needs to be done to handle them (or whether they can be safely ignored).

Error Logs

FromConnectionError Log

This error log contains errors and warnings coming from the company's local database.

Lost Connection To The Database

  • Cause

Replikwando wasn't able to reach the local database. This can happen if the MySQL service is stopped or restarted, if the MySQL process was too busy to respond for a moment, or (when the MySQL server isn't hosted locally) if the network connection to the server host dropped for a moment.

  • Resolution

Replikwando is built to tolerate interruptions like this, so it's safe to ignore this error. Replikwando will just reconnect to the database when it comes back.

  • Similar Errors

MySQL Server Has Gone Away
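
If these entries show up more often than you would expect, a quick look at the server can hint at which of the causes above applies. This is only a sketch of two standard MySQL checks; what counts as a normal value is site-specific, and Replikwando does not require either check.

```sql
-- Seconds since the MySQL server last started. A small value means the
-- service was stopped or restarted recently.
SHOW GLOBAL STATUS LIKE 'Uptime';

-- Seconds the server waits before closing an idle connection. Idle
-- connections closed this way commonly produce "MySQL Server Has Gone Away".
SHOW VARIABLES LIKE 'wait_timeout';
```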

The Following Query Was Reported And Skipped

  • Cause

Someone enabled the skip-query interface for Replikwando and then told it to skip a query. The skipped query is written to the error log so that the administrator can go back and evaluate it later if needed.

  • Resolution

This error can be ignored, as it is purely for informational purposes.

ToConnectionError Log

This error log contains errors generated from interacting with the remote MySQL database.

Lost Connection To The Database

  • Cause

Replikwando wasn't able to reach the remote database. This can happen if the MySQL service is stopped or restarted, if the MySQL process was too busy to respond for a moment, or if the site's internet connection drops.

  • Resolution

Replikwando is built to tolerate interruptions like this, so it's safe to ignore this error. Replikwando will just reconnect to the database when it comes back.

  • Similar Errors

MySQL Server Has Gone Away

Duplicate Key

  • Cause

Replikwando tried to push a row to the remote database, but the remote database already had a row with the same key values. This normally happens when a query is sent and executed by the server, but the server's reply doesn't get back to Replikwando before it tries the query again. That means this error can occur if the remote server goes down or the internet connection is interrupted, even for a moment.

  • Resolution

It is safe to ignore this error. Newer versions no longer halt on it, because it indicates that the data is already on the remote server.
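
To see why a retried query produces this error without losing anything, the sequence can be reproduced by hand. This is an illustrative sketch only: demo_part is a made-up table, and it should be run in a scratch database, never in a client's live data.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE demo_part (id INT PRIMARY KEY, name VARCHAR(32));

-- First attempt: the server stores the row. Imagine the reply is lost
-- on the way back to the sender.
INSERT INTO demo_part (id, name) VALUES (1, 'alternator');

-- Retry of the same statement: MySQL rejects it with error 1062
-- (duplicate entry for the primary key). The row stored by the first
-- attempt is left untouched.
INSERT INTO demo_part (id, name) VALUES (1, 'alternator');

-- The data is present exactly once.
SELECT * FROM demo_part WHERE id = 1;

-- Clean up the scratch table.
DROP TABLE demo_part;
```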

There is no '*'@'%' registered

  • Cause

This error occurs when the remote server is asked to run a stored program (such as a procedure, function, or trigger), but the user who created it or is defined as its executor (its definer) no longer exists.

  • Resolution

You do not need to stop Replikwando to fix this issue. Dump the functions from the remote database into a text file, find-and-replace all instances of the obsoleted name with the name of a valid user, and re-insert them.
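
Before dumping anything, it can help to list which objects still reference the missing user. The queries below are a sketch using the standard information_schema tables; 'old_user@%' is a placeholder, so substitute the user and host named in the error message.

```sql
-- Stored procedures and functions whose definer is the missing account.
-- 'old_user@%' is a placeholder value.
SELECT ROUTINE_SCHEMA, ROUTINE_NAME, ROUTINE_TYPE, DEFINER
FROM information_schema.ROUTINES
WHERE DEFINER = 'old_user@%';

-- Triggers whose definer is the missing account.
SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER
FROM information_schema.TRIGGERS
WHERE DEFINER = 'old_user@%';
```

Every row returned is an object that needs its definer replaced when you re-insert the dump.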

QueueConnectionError Log

Cannot find file: <path><something.bin>

  • Cause

This message appears when Replikwando cannot locate the data file. The path in replication.ini is probably pointing to the wrong location.

  • Resolution

Open the company's SQLYOG and run this query:

SHOW VARIABLES LIKE '%data%';

The results list should show you the path to where MySQL looks for the data file. Edit replication.ini and set the binlog path to where SQLYOG says the data file is located.
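
The wildcard query above returns many unrelated variables. If the list is hard to read, the narrower queries below may help. This is a sketch, not a required step: which binlog-related variables exist depends on the MySQL version, so older servers may return fewer rows for the second query.

```sql
-- Directory where MySQL keeps its data files.
SHOW VARIABLES LIKE 'datadir';

-- Binary-log settings. On MySQL 5.6 and later the result includes
-- log_bin_basename, the path and base name of the binlog files.
SHOW VARIABLES LIKE 'log_bin%';
```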

Database Is Locked

  • Cause

This error occurs when one thread/process/person is writing into the SQLite database file while another is trying to use it. It means that whatever got the error wasn't able to query the table or write data into it. It's most common to see this error when an administrator is running queries against the SQLite database file while Replikwando is running (which is safe to do, but it can get annoying having to run your query more than once).

  • Resolution

It is safe to ignore this error. Replikwando does not halt on this error; it just tries to do what it was doing again, until it succeeds. This error should no longer occur (a spinlock was implemented to resolve it).
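
If you are the administrator running queries by hand against the SQLite file, you can ask SQLite to wait for the lock instead of failing right away. This is a sketch that assumes your SQLite client lets you run PRAGMA statements; the 5000 ms value is an arbitrary example, and the setting affects only your own connection, not Replikwando's.

```sql
-- Wait up to 5000 ms for a lock held by another connection to clear
-- before reporting "database is locked".
PRAGMA busy_timeout = 5000;
```

Run it once at the start of your session, then run your queries as usual.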

Failed to get the last binlog position

  • Cause

This warning occurs when Replikwando looks for where it left off last time in your binary log, but can't find an entry. This is NORMAL to see when you're starting up a new session (no LocalQueueFile.sqlite file). If you see it during normal execution, something bad happened to the binary logging.

  • Resolution

It is safe to ignore this error if you see it when running a new instance of Replikwando. If it happens during normal operation, check that:

    • MySQL server binary logging is enabled
    • The internal MySQL tables that handle logging aren't corrupted (use the query 'SHOW BINARY LOGS' to verify)
    • The binary log files haven't been moved or deleted
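
The checks above can be run from the company's SQLYOG with standard MySQL statements. A minimal sketch:

```sql
-- Value is ON when binary logging is enabled.
SHOW VARIABLES LIKE 'log_bin';

-- Lists the binary log files the server knows about, with their sizes.
SHOW BINARY LOGS;

-- Shows the binlog file and position the server is currently writing to.
SHOW MASTER STATUS;
```

If a file listed by SHOW BINARY LOGS is not actually on disk, the binary logs have probably been moved or deleted.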

Bad Behavior

Repushing

Replikwando looks like it's working, but only runs the same six "junk" queries over and over

  • Cause

Replikwando cannot open/read the binary log files.

  • Resolution

Verify that the binary log files can be read by the user Replikwando is running under. This can be done with the mysqlbinlog program (which Replikwando also uses for this purpose).

mysqlbinlog --base64-output=DECODE-ROWS -v <binary log file>

Repush finishes, but no record is left in the messages table, so it never resyncs

  • Cause

If errors occur during a repush, Replikwando refuses to notify HTP that the database should be synced, because syncing after an incomplete repush would lose data.

  • Resolution

Correct whatever is causing the errors during repush.

As of 1.8.0, the following error will show when this problem occurs: "Repush failed! No message sent to Heartbeat; Database will not sync to HTP!"

Tables and views push, but functions and procedures are missing

  • Cause

This issue occurs when Replikwando does not have permission to SELECT from mysql.proc. See this MySQL doc page for more information.

  • Resolution

Replikwando must be given the SELECT privilege on mysql.proc. This table lists the functions and procedures for every database on the server, but the Replikwando user cannot alter it unless you for some reason also grant it UPDATE. Just don't do that :)
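
A sketch of the grant, assuming a MySQL 5.x server (where mysql.proc exists). The account 'replikwando'@'%' is a placeholder; use the user and host that Replikwando actually connects with.

```sql
-- Placeholder account name; replace with Replikwando's real MySQL user.
GRANT SELECT ON mysql.proc TO 'replikwando'@'%';

-- Verify: the output should list SELECT on mysql.proc and should not
-- list UPDATE on it.
SHOW GRANTS FOR 'replikwando'@'%';
```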

Replikwando2.exe sits at "Waiting for Active Thread" and never begins a repush

  • Possible Cause
    • The client's database has crashed.
  • Resolution
    • Delete and re-create the client's database on our side.

Replikwando runs and seems to work right, but dies during repushes

  • Possible Cause
    • The MySQL server version is newer than the version of <replikwando_install_folder>/bin/mysqlbinlog.exe (e.g., the server is 5.6 but mysqlbinlog.exe is 5.5).
      • The ToConnection log will show that queries are getting mangled during the repush.
  • Resolution
    • Replace <replikwando_install_folder>/bin/mysqlbinlog.exe with a newer version.
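
To confirm a mismatch, compare the server's version with the one reported by the bundled binary. The query below gives the server side; for the other side, running mysqlbinlog.exe with its --version option from the Replikwando bin folder prints the tool's version.

```sql
-- Returns the version string of the MySQL server you are connected to.
SELECT VERSION();
```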

General

Replikwando2 (the gui version) 'works', but ReplikwandoService doesn't

  • Cause

'It doesn't work' isn't a working description of an error. This problem has several manifestations:

  • ReplikwandoService immediately exits without even getting started
  • ReplikwandoService starts, but won't repush
  • Repushing works, but keeping up-to-date does not
  • ReplikwandoService will not install

There are a whole bunch of things that could cause this.

  • The service might be running under a user that doesn't have permissions/access to necessary files
  • File permissions are screwed up
  • Files became corrupted after Replikwando2 got started
  • Magical kernel-level call interception (by antivirus software, for example)
  • Firewall exceptions exist for one executable but not the other
  • Resolution
    • Make sure the user account being used to run ReplikwandoService has access to all necessary files
    • Turn off antivirus (or, even better, add ReplikwandoService to the whitelist for the AV)
    • Make sure any running firewall program has entries for ReplikwandoService

Replikwando2 successfully re-pushes, but then pushes tons of junk data

  • Possible cause

Replikwando is not using the correct mysqlbinlog.exe file.

  • Resolution
  • Change replication.ini's [fromserver].mysqlbinpath to the same bin path that the MySQL instance is running from
    • Windows: This query will tell you where to look for the right bin folder:
      SHOW VARIABLES LIKE 'basedir';
      
    • Linux: The Replikwando install creates a bin/mysql/mysqlbinlog.exe entry that you should point to
  • Possible cause

The client's binlog file is corrupted.

  • Resolution
  • Run this query in their SQLYOG:
SHOW BINARY LOGS;

If you see any of the following, the binary log is corrupt:

  • Binlog names are inconsistent ('mysqlbinlog-000001.bin', 'servername-000001.bin' in the same list)
  • Binlog names repeat (the same name, such as 'mysqlbinlog-000001.bin', listed over and over with identical file sizes)

To repair a corrupt binary log:

  • First, make a backup of the client's entire ITrack database, in case something terrible happens.
  • Next, run this query in the client's SQLYOG (it deletes all existing binary log files and starts a new one):
RESET MASTER;

Replikwando should now be able to re-push the client's data successfully, with normal replication afterwards.
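
Before starting the re-push, it may be worth confirming that the reset took effect. A minimal sketch:

```sql
-- After RESET MASTER the list should contain a single, freshly created
-- binary log file instead of the inconsistent or repeated names.
SHOW BINARY LOGS;
```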

Server is full of mysqlbinlog.exe instances

  • Possible Cause
    • Replikwando has lost the handle to mysqlbinlog.exe.
      • It is unknown why this happens, but it is a system-level issue.
  • Resolution
    • Kill all but the most recent instance of mysqlbinlog.exe. Replikwando spawns these regularly, so if a mysqlbinlog.exe process has been around for a while and is not changing, it's probably safe to kill it.
      • Do NOT kill an active mysqlbinlog.exe! Replikwando is actively receiving data from it, and things can go horribly wrong if you kill it!
    • A more permanent resolution is to use a different spawning method (we currently use ::CreateProcess). I believe (but may be wrong) that this issue occurs because ::CreateProcess returns COULDNT_CREATE_PROCESS or something like that, even though the process is created. It could also be an error in how the closing of a Windows thread is handled. Both possibilities should be investigated.
    • Potentially resolved in 1.8.0.