MDEV-39101 Make BACKUP SERVER mutually exclusive with itself and BACKUP STAGE#4892
MDEV-39101 Make BACKUP SERVER mutually exclusive with itself and BACKUP STAGE#4892mariadb-andrzejjarzabek wants to merge 9 commits intoMariaDB:MDEV-14992from
Conversation
This introduces a basic driver Sql_cmd_backup, storage engine interfaces, and basic copying of InnoDB data files. On Windows, we pass a target directory name; elsewhere, we pass a target directory handle. TODO: Copy the InnoDB log and prevent concurrent log_t::set_archive(). TODO: Implement proper mutual exclusion with buf_page_t::flush(). TODO: Implement other storage engine interfaces. TODO: Implement locking to exclude concurrent BACKUP STAGE. innodb_log_recovery_start: Make settable, for incremental backup. Alas, the space.get_create_lsn() < checkpoint logic does not work yet.
Implement file-level InnoDB page locking for backup. buf_flush_list_space(): Check for concurrent backup before writing each page. This is inefficient, but this function may be invoked from multiple threads concurrently, and it cannot be changed easily, especially for fil_crypt_thread(). TODO: Implement finer-grained locking around copying page ranges.
|
Note - this solution uses MDL backup lock which makes it block on concurrent BACKUP SERVER and BACKUP STAGE as required. However BACKUP STAGE also upgrades that lock for various actions like flushing and DDL logging. As written, the various actions performed by BACKUP SERVER do not have access to the MDL request/ticket. If they should attempt to use the MDL lock themselves to perform the backup, we should be mindful of possible interactions. |
…UP STAGE BACKUP SERVER blocks when another BACKUp SERVER is in progress or a BACKUP STAGE has been started. Takes advantage of existing mechanism using MDL which BACKUP STAGE uses to block concurrent backups.
6b0f480 to
531a30e
Compare
|
|
dr-m
left a comment
There was a problem hiding this comment.
Thank you. In a few hours, I plan to update the MDEV-14992 branch soon with a minimal version of this initial part, without any DEBUG_SYNC instrumentation, and with a test that works with CMAKE_BUILD_TYPE=RelWithDebInfo.
As far as I understand from MDEV-36025, acquiring only MDL_BACKUP_START is not going to be sufficient. You should quickly find this out when implementing backup for ENGINE=Aria (MDEV-39092).
Before the backup_end step, we must acquire more locks. The locks can and should be released before the backup_finalize step, to avoid unnecessary waits.
| #include <my_global.h> | ||
|
|
||
| class THD; |
There was a problem hiding this comment.
Usually, my_global.h is supposed to be included first in every file that includes these files. You added an #include for this file to sql/sql_backup.cc, which used to do #include "my_global.h" as the very first thing. If we retain that order, no changes to the sql directory other than sql/sql_backup.cc will be necessary.
| inline void set_type(enum_mdl_type type_arg) | ||
| { | ||
| DBUG_ASSERT(ticket == NULL); | ||
| DBUG_ASSERT(ticket == nullptr); | ||
| type= type_arg; | ||
| } |
There was a problem hiding this comment.
It is shorter and more idiomatic to write !ticket.
There was a problem hiding this comment.
While I sympathise with nullptr, I'd avoid unrelated changes here and everywhere else.
There was a problem hiding this comment.
I'm happy to back out this change, however it seems to me a healthy principle for header files to include (even if transitively) all definitions and declarations they require, rather than count on files that include them to also include other files in specified order. If the convention is to include "my_global.h" before other includes, then headers relying on my_global.h should themselves include it? However since it's neither related not needed, I'm leaving it out.
| #include "debug_sync.h" | ||
| #include "mdl.h" | ||
| #include "my_global.h" |
There was a problem hiding this comment.
Had the #include been added after the my_global.h, then no changes to those files would be necessary.
sql/sql_backup.cc
Outdated
| bool fail = my_mkdir(target.str, 0755, MYF(MY_WME)); | ||
|
|
||
| #ifndef _WIN32 | ||
| int dir= open(target.str, O_DIRECTORY); | ||
| if (dir < 0) | ||
| int dir; | ||
| if(!fail) | ||
| { | ||
| my_error(EE_CANT_MKDIR, MYF(ME_BELL), target.str, errno); | ||
| return true; | ||
| dir = open(target.str, O_DIRECTORY); | ||
| if (dir < 0) | ||
| { | ||
| my_error(EE_CANT_MKDIR, MYF(ME_BELL), target.str, errno); | ||
| fail= true; | ||
| } | ||
| } | ||
| #endif | ||
|
|
||
| bool fail= plugin_foreach_with_mask(thd, backup_start, | ||
| MYSQL_STORAGE_ENGINE_PLUGIN, | ||
| PLUGIN_IS_DELETED|PLUGIN_IS_READY, | ||
| IF_WIN(const_cast<char*>(target.str), | ||
| reinterpret_cast<void*>(dir))); | ||
| if (!fail) | ||
| fail= plugin_foreach_with_mask(thd, backup_step, |
There was a problem hiding this comment.
This excessive reindentation exercise, which is violating some of our white space rules (there should be no space before assignment operators, and there should be space after if) would have been prevented by making use of a simple goto label:
if (my_mkdir(target.str, 0755, MYF(MY_WME)))
{
err_exit:
thd->mdl_context.release_lock(mdl_request.ticket);
return true;
}
// …
if (dir < 0)
{
my_error(EE_CANT_MKDIR, MYF(ME_BELL), target.str, errno);
goto err_exit;
}Simply, each subsequent return true will be replaced with goto err_exit.
| # Check that BACKUP SERVER blocks when another BACKUP SERVER is in progress | ||
|
|
||
| --source include/have_debug_sync.inc |
There was a problem hiding this comment.
This test is covering more than the scenario that is listed first. It’s customary to have a common setup section before any tests start. I think that a better name for this test would be main.backup_server_locking. I was thinking if --suite=backup would be a more suitable place for this test, but then I rememberered that some targets, such as AppVeyor, only cover --suite=main. It’s good to have wider coverage for this kind of a test.
Usually, if we include have_debug_sync.inc we also include have_debug.inc so that the test will be filtered out quicker when the tests are being run against a non-debug-instrumented server. It would be even better to remove the DEBUG_SYNC dependency or to make it optional. At least some of the locking can be demonstrated without using any DEBUG_SYNC.
| --connect (con1,localhost,root,,test) | ||
| --connect (con2,localhost,root,,test) | ||
|
|
||
| --connection con1 |
There was a problem hiding this comment.
connect implies a switch of the current connection. Hence, only the first of these three statements should be needed here. We’re also missing count_sessions.inc at the start of the test and wait_until_count_sessions.inc at the end of the test.
There was a problem hiding this comment.
count_sessions.inc is broken and we're gradually moving away from it. See e.g. 4654b4f, we're now fixing affected test rather than adding wait_until_count_sessions.inc everywhere.
One of the biggest problems was check testcase connection, which may affect value recorded by count_sessions.inc
| --connection con2 | ||
| --reap | ||
|
|
||
| --connection default | ||
|
|
||
| --disconnect con1 | ||
| --disconnect con2 |
There was a problem hiding this comment.
The connections should be closed as soon as possible, so that they can be closed faster on the server side and the test can complete faster, even after adding the missing wait_until_count_sessions.inc.
|
|
||
| --connection con2 | ||
| --replace_result $MYSQLTEST_VARDIR <MYSQLTEST_VARDIR> | ||
| --send_eval BACKUP SERVER TO '$MYSQLTEST_VARDIR/another_directory' |
There was a problem hiding this comment.
There is no rmdir for either other_directory or another_directory. Tests are not supposed to leave any garbage files or directories behind.
sql/sql_backup.cc
Outdated
| plugin_foreach_with_mask(thd, backup_finalize, MYSQL_STORAGE_ENGINE_PLUGIN, | ||
| PLUGIN_IS_DELETED|PLUGIN_IS_READY, nullptr); | ||
| plugin_foreach_with_mask(thd, backup_finalize, MYSQL_STORAGE_ENGINE_PLUGIN, | ||
| PLUGIN_IS_DELETED|PLUGIN_IS_READY, nullptr); | ||
| #ifndef _WIN32 | ||
| close(dir); | ||
| close(dir); | ||
| #endif | ||
|
|
||
| } | ||
| thd->mdl_context.release_lock(mdl_request.ticket); |
There was a problem hiding this comment.
The lock must be released before backup_finalize to avoid unnecessary conflicts. Quoting the MDEV-14992 description:
backup_finalizeafterBACKUP STAGE ENDreleased locks
InnoDB:
- copy and trim the last 0 to 2 “hot”
ib_*.logfiles- if
backup_starthad setinnodb_log_archive=ON:
- restore
innodb_log_archive, innodb_log_file_size- delete any
ib_*.logfiles that had been created outside the backup directoryRocksDB:
- in MDEV-38362, stream the files that were hard-linked in
backup_endThe streaming backup (MDEV-38362) would likely make use of a working directory for the backup, to facilitate efficient copying of the
innodb_log_archive=ONfiles as well as anyENGINE=RocksDBfiles.
During this stage, we are working on our private copy in dir, or deleting ib_*.log files from the server data directory. These operations are assumed to be invisible to any other threads. There is no need to block any operations on the server data directory at this final processing step.
|
I force-pushed #4817 and included the non-debug test |
BACKUP SERVER blocks when another BACKUp SERVER is in progress or a BACKUP STAGE has been started. Takes advantage of existing mechanism using MDL which BACKUP STAGE uses to block concurrent backups.