Commit Graph

84 Commits

Author SHA1 Message Date
Ben Johnson
1c16aae550 Revert sync lock removal
This commit reverts the removal of the SQLite write lock during
WAL sync (998e831c5c). The change
caused validation mismatch errors during the long-running test
although the restored database did not appear to be corrupted so
perhaps it's simply a locking issue during validation.
2021-03-07 07:30:25 -07:00
Ben Johnson
8947adc312 Expose additional DB configuration settings
This commit exposes the monitor interval, checkpoint interval,
minimum checkpoint page count, and maximum checkpoint page count
via the YAML configuration file.
2021-03-06 08:33:19 -07:00
Ben Johnson
998e831c5c Remove SQLite write lock during WAL sync
Originally, Litestream relied on a SQLite write lock to ensure
transactions were atomically replicated. However, this was changed
so that Litestream itself now validates the transaction boundaries.
As such, the write lock on the database is no longer needed. The
read lock is sufficient to prevent WAL rollover and the WAL is
append only so it is safe to read up to a known position calculated
via fstat().

WAL validation change was made in 031a526b9a

The locking code, however, was moved in this commit to the
post-checkpoint copy to ensure the end-of-file is not overwritten
by an aggressive writers.
2021-03-06 07:51:35 -07:00
Ben Johnson
a14a74d678 Fix release of non-OFD locks
This commit removes short-lived `os.Open()` calls on the database
file because this can cause locks to be released when `os.File.Close()`
is later called if the operating system does not support OFD
(Open File Descriptor) locks.
2021-02-28 06:44:02 -07:00
Ben Johnson
d802e15b4f Fix error handling when DB.init() fails
The `DB.init()` can fail temporarily for a variety of reasons such
as the database being locked. Previously, the DB would save the
`*sql.DB` connection even if a step failed and this prevented the
database from attempting initialization again. This change makes it
so that the connection is only saved if initialization is successful.
On failure, the initialization process will be retried on next sync.
2021-02-24 15:43:28 -07:00
Ben Johnson
37442babfb Revert validation mismatch temp file persistence
This commit reverts 4e469f8 which was used for debugging the validation
stall corruption issue. It can cause the disk to fill with temporary
files though so it is being reverted.
2021-02-09 06:44:42 -07:00
Ben Johnson
7f81890bae Fix shadow wal corruption on stalled validation
This commit fixes a timing bug that occurs in a specific scenario
where the shadow wal sync stalls because of an s3 validation and
the catch up write to the shadow wal is large enough to allow a
window between WAL reads and the final copy.

The file copy has been replaced by direct writes of the frame
buffer to the shadow to ensure that every validated byte is exactly
what is being written to the shadow wal. The one downside to this
change is that the frame buffer will grow with the transaction
size so it will use additional heap. This can be replaced by a
spill-to-disk implementation but this should work well in the
short term.
2021-02-06 07:28:15 -07:00
Ben Johnson
6fd11ccab5 Enforce max WAL index.
This commit sets a hard upper limit for the WAL index to (1<<31)-1.
The index is hex-encoded in file names as a 4-byte unsigned integer
so limit ensures all index values are below any upper limit and are
unaffected by any signed int limit.

A WAL file is typically at least 4MB so you would need to write
8 petabytes to reach this upper limit.
2021-02-02 15:11:50 -07:00
Ben Johnson
6c49fba592 Check checkpoint result during restore 2021-02-02 15:04:20 -07:00
Ben Johnson
f17768e830 Log WAL frame checksum mismatch
Currently, the WAL copy function can encounter a checksum mismatch in a
WAL frame and it will return an error. This can occur for partial writes
and is recovered from moments later. This commit changes the error to a
log write instead.
2021-01-31 08:52:12 -07:00
Ben Johnson
4e469f8b02 Persist primary/replica copies after validation mismatch
This commit changes `ValidateReplica()` to persist copies of the
primary & replica databases for inspection if a validation mismatch
occurs.
2021-01-31 08:47:06 -07:00
Ben Johnson
ad7bf7f974 Reduce logging output
Previously, there were excessive log messages for checkpoints and
retention. These have been removed or combined into a single log
message where appropriate.
2021-01-31 08:12:18 -07:00
Ben Johnson
39a6fabb9f Fix restore logging. 2021-01-26 17:01:00 -07:00
Ben Johnson
67eeb49101 Allow replica URL to be used for commands
This commit refactors the commands to allow a replica URL when
restoring a database. If the first CLI arg is a URL with a scheme,
the it is treated as a replica URL.
2021-01-26 16:33:16 -07:00
Ben Johnson
94411923a7 Fix unit test 2021-01-21 13:52:35 -07:00
Ben Johnson
e92db9ef4b Enforce stricter validation on restart.
Previously, the sync would validate the last page written to ensure
that replication picked up from the last position. However, a large
WAL file followed by a series of shorter checkpointed WAL files means
that the last page could be the same even if multiple checkpoints
have occurred.

To fix this, the WAL header must match the shadow WAL header when
starting litestream since there are no guarantees about checkpoints.
2021-01-21 13:44:05 -07:00
Ben Johnson
031a526b9a Only copy committed WAL pages 2021-01-21 12:44:11 -07:00
Ben Johnson
7fb98df240 cleanup 2021-01-18 15:58:49 -07:00
Ben Johnson
139d836d7a Fix file/dir mode 2021-01-18 15:23:28 -07:00
Ben Johnson
35d755e7f2 Remove debugging code 2021-01-18 10:33:30 -07:00
Ben Johnson
358dcd4650 Copy shadow WAL immediately after init 2021-01-18 10:01:16 -07:00
Ben Johnson
2ce4052300 Remove write lock during db checksum 2021-01-18 07:05:27 -07:00
Ben Johnson
3c4fd152c9 Add more checksum logging 2021-01-18 06:38:03 -07:00
Ben Johnson
d259d9b9e3 Fix checksum logging 2021-01-17 10:19:39 -07:00
Ben Johnson
90a1d959d4 Remove size from s3 filenames 2021-01-17 10:02:06 -07:00
Ben Johnson
04d75507e3 Fix checksum hex padding 2021-01-17 09:52:09 -07:00
Ben Johnson
4b65e6a88f Log validation position 2021-01-17 07:38:13 -07:00
Ben Johnson
07a65cbac7 Fix crc64 unit test 2021-01-16 10:04:03 -07:00
Ben Johnson
6ac6a8536d Obtain write lock during validation. 2021-01-16 09:27:43 -07:00
Ben Johnson
25fec29e1a Clear last position on replica sync error 2021-01-16 07:45:08 -07:00
Ben Johnson
cbc2dce6dc Add busy timeout 2021-01-16 07:33:32 -07:00
Ben Johnson
1b8cfc8a41 Add validation interval 2021-01-15 16:37:04 -07:00
Ben Johnson
b94ee366e5 Fix snapshot only restore 2021-01-15 13:12:15 -07:00
Ben Johnson
e1c9e09161 Update wal segment naming 2021-01-14 15:26:29 -07:00
Ben Johnson
1e4e9633cc Add s3 sync interval 2021-01-14 15:04:26 -07:00
Ben Johnson
a42f83f3cb Add LITESTREAM_CONFIG env var 2021-01-13 13:17:38 -07:00
Ben Johnson
57a02a8628 S3 replica 2021-01-13 10:14:54 -07:00
Ben Johnson
faa5765745 Add retention policy, remove WAL subdir 2021-01-12 15:22:37 -07:00
Ben Johnson
1fa1313b0b Add trace logging. 2021-01-11 11:04:29 -07:00
Ben Johnson
bcdb553267 Use database owner/group 2021-01-11 09:39:08 -07:00
Ben Johnson
60cb2c97ca Set default max checkpoint. 2021-01-10 09:46:27 -07:00
Ben Johnson
8871d75a8e Fix max checkpoint size check 2021-01-05 14:17:13 -07:00
Ben Johnson
c22eea13ad Add checkpoint tests 2021-01-05 14:07:17 -07:00
Ben Johnson
f4d055916a Add DB sync tests 2021-01-05 13:59:16 -07:00
Ben Johnson
979cabcdb9 Add some DB.Sync() tests 2021-01-01 10:02:03 -07:00
Ben Johnson
5134bc3328 Add test coverage for DB.CRC64 2021-01-01 09:26:23 -07:00
Ben Johnson
78d9de6512 Add DB path tests 2021-01-01 09:00:23 -07:00
Ben Johnson
065f641526 Change validation to use CRC-64 2021-01-01 08:24:11 -07:00
Ben Johnson
9d0e79c2cf Add db metrics 2020-12-31 16:30:56 -07:00
Ben Johnson
da5087c14c Fix vet issue 2020-12-31 10:56:29 -07:00