litestream

Author	SHA1	Message	Date
Ben Johnson	8589111717	Implement streaming WAL segment iterator. Currently, WALSegmentIterator implementations read to the end of their list of segments and return EOF. This commit adds the ability to push additional segments to in-process iterators and notify their callers that new segments are available. This is only implemented for the file-based iterator, but other segment iterators may get this implementation in the future or have a wrapping iterator provide a polling-based implementation.	2022-02-11 13:50:44 -07:00
Ben Johnson	006e4b7155	Update index & offset encoding. Previously, the index & offsets were encoded as 8-character hex strings; however, this limits the maximum value to a `uint32`. This is normally not an issue, but indices could exceed the maximum value of about 4 billion over time, and the offset could exceed it for an especially large WAL update. For safety, these encodings have been updated to 16-character hex strings, which can represent any `uint64`.	2022-02-08 13:14:49 -07:00
Ben Johnson	30a8d07a81	Add WAL overrun validation. Under high write load, write transactions from another process can overrun the WAL between the time Litestream performs a RESTART checkpoint and the time it obtains the write lock immediately afterward. This change validates that an overrun has not occurred and, if one has, starts a new generation.	2022-02-07 13:35:20 -07:00
Ben Johnson	76e53dc6ea	Remove built-in validation option. Previously, Litestream had a validator that worked most of the time but also produced some false positives. It is difficult to provide validation from within Litestream without controlling outside processes that can also affect the database. As such, validation has been moved out to the external CI test runner, which provides a more consistent validation process.	2022-02-06 11:37:06 -07:00
Ben Johnson	762c7ae531	Implement FileWatcher	2022-02-06 09:51:04 -07:00
Ben Johnson	4349398ff5	Remove shadow WAL iterator. This commit removes the shadow WAL iterator and replaces it with a fileWalSegmentIterator. This works because the shadow WAL now has the same structure as the replica WAL. It reduces duplicate code and will allow read replication to be daisy-chained in the future.	2022-01-31 16:09:02 -07:00
Ben Johnson	5d811f2e39	Fix golangci-lint issues	2022-01-31 09:21:20 -07:00
Ben Johnson	f6c859061b	Fix CodeQL warnings	2022-01-31 08:53:21 -07:00
Ben Johnson	dbdde21341	Use sqlite3_file_control(SQLITE_FCNTL_PERSIST_WAL) to persist WAL. Previously, Litestream would avoid closing the SQLite3 connection in order to ensure that the WAL file was not cleaned up by the database if it was the last connection. This commit instead uses a file control call to achieve the same result, which allows the database file to be closed normally in all cases.	2022-01-28 15:12:43 -07:00
Ben Johnson	84d08f547a	Add end-to-end replication/restore testing	2022-01-15 09:05:46 -07:00
Ben Johnson	3f0ec9fa9f	Refactor Restore(). This commit moves the complexity of downloading ordered WAL files in parallel into a type called `WALDownloader`. This makes it easier to test the restore separately from the download.	2022-01-04 15:03:59 -07:00
Ben Johnson	531e19ed6f	Refactor checksum calculation; improve test coverage	2021-12-12 10:25:20 -07:00
Ben Johnson	77274abf81	Refactor shadow WAL to use segments	2021-07-23 07:46:21 -06:00
Ben Johnson	fc897b481f	Group replica WAL segments by index. This commit changes the replica path format so that all segments within a single index are grouped in the same directory. This will eventually make it possible to seek to a record on file-based systems without having to iterate over the records. The DB shadow WAL will also be changed to this same format to support live replicas.	2021-06-14 15:24:05 -06:00
Ben Johnson	55c17b9d8e	Move WAL checksum validation message to trace logging. Checksum mismatches can occur regularly now that write locks have been removed during WAL sync. They do not pose any corruption risk but do sound alarming to end users, so the message has been moved to trace logging.	2021-06-06 09:12:29 -06:00
Ben Johnson	fb80bc10ae	Refactor replica system	2021-05-21 07:44:36 -06:00
Ben Johnson	331f6072bf	Fix snapshot-only restore. This commit fixes a bug introduced by parallel restore (`03831e2`) where snapshot-only restores were not handled correctly and Litestream would hang indefinitely. The restore now checks explicitly for snapshot-only restores and exits early, skipping WAL handling completely.	2021-04-24 07:48:25 -06:00
Ben Johnson	1d1fd6e686	Remove SQLite write lock during WAL sync (again). This commit reattempts the write lock removal that was previously tried in `998e831`. This change reduces the number of locks on the database, which should reduce the error messages that applications see when they do not have busy_timeout set. In addition to the lock removal, a passive checkpoint is issued immediately before the read lock is obtained to prevent additional checkpoints by the application itself. SQLite does not support checkpoints from within an active transaction, so the checkpoint cannot be issued afterward.	2021-04-22 16:35:04 -06:00
Ben Johnson	03831e2d06	Download WAL files in parallel during restore. This commit changes the restore to download multiple WAL files to the local disk in parallel while another goroutine applies those files in order. Downloading & applying WAL files serially reduces total throughput because WAL files are typically made up of multiple small files.	2021-04-21 16:07:29 -06:00
Ben Johnson	1c01af4e69	Fix snapshot selection during restore-by-index. This commit fixes a bug where restoring to a specific index incorrectly chose the latest snapshot instead of the latest snapshot that occurred before the given index.	2021-04-21 12:09:05 -06:00
Ben Johnson	84830bc4ad	Improve restoration logging. This commit logs the download of a WAL file separately from its application to the database to get more accurate timing measurements.	2021-04-18 09:33:53 -06:00
Ben Johnson	3ad157d841	Remove -dry-run flag in restore. This flag is being removed because it is not very useful in practice and it makes the restoration code more complicated.	2021-04-18 09:21:50 -06:00
Ben Johnson	247896b8b7	Remove reference to "wal" in first db init command. This commit changes the error message of the first SQL command executed during initialization. Previously, the error was wrapped with the message "enable wal" since the command enables WAL mode, but that can be confusing if the DB connection or file is invalid. Instead, the error is now returned as-is; its source can still be identified because it is the only unwrapped DB-related error.	2021-04-15 11:51:22 -06:00
Ben Johnson	462330ead6	Support ARM release builds	2021-04-10 08:39:10 -06:00
Ben Johnson	0529ce74b7	Sync on close. This commit changes the `replicate` command so that it performs a final DB sync & replica sync before it exits to ensure it has backed up all WAL frames at the time of exit.	2021-03-21 08:43:55 -06:00
Ben Johnson	aa54e4698d	Merge pull request #109 from benbjohnson/wal-mismatch-validation-info Add WAL validation debug information	2021-03-07 07:55:02 -07:00
Ben Johnson	0bd1b13b94	Add WAL validation debug information on error. This commit adds the WAL header and shadow path to "wal header mismatch" errors to help debug issues. The mismatch seems to happen more often than I would expect on restart. This error doesn't cause any corruption; it simply causes a generation to restart, which requires a snapshot.	2021-03-07 07:48:43 -07:00
Ben Johnson	1c16aae550	Revert sync lock removal. This commit reverts the removal of the SQLite write lock during WAL sync (`998e831c5c`). The change caused validation mismatch errors during the long-running test, although the restored database did not appear to be corrupted, so it may simply be a locking issue during validation.	2021-03-07 07:30:25 -07:00
Ben Johnson	8947adc312	Expose additional DB configuration settings. This commit exposes the monitor interval, checkpoint interval, minimum checkpoint page count, and maximum checkpoint page count via the YAML configuration file.	2021-03-06 08:33:19 -07:00
Ben Johnson	998e831c5c	Remove SQLite write lock during WAL sync. Originally, Litestream relied on a SQLite write lock to ensure transactions were atomically replicated. However, this was changed so that Litestream itself now validates the transaction boundaries. As such, the write lock on the database is no longer needed. The read lock is sufficient to prevent WAL rollover, and the WAL is append-only, so it is safe to read up to a known position calculated via fstat(). The WAL validation change was made in `031a526b9a`. The locking code, however, was moved in this commit to the post-checkpoint copy to ensure the end of the file is not overwritten by aggressive writers.	2021-03-06 07:51:35 -07:00
Ben Johnson	a14a74d678	Fix release of non-OFD locks. This commit removes short-lived `os.Open()` calls on the database file because closing them later with `os.File.Close()` can release the database locks if the operating system does not support OFD (open file description) locks.	2021-02-28 06:44:02 -07:00
Ben Johnson	d802e15b4f	Fix error handling when DB.init() fails. `DB.init()` can fail temporarily for a variety of reasons, such as the database being locked. Previously, the DB would save the `*sql.DB` connection even if a step failed, which prevented the database from attempting initialization again. This change makes it so that the connection is only saved if initialization is successful. On failure, initialization will be retried on the next sync.	2021-02-24 15:43:28 -07:00
Ben Johnson	37442babfb	Revert validation mismatch temp file persistence. This commit reverts `4e469f8`, which was used for debugging the validation stall corruption issue. That change can fill the disk with temporary files, so it is being reverted.	2021-02-09 06:44:42 -07:00
Ben Johnson	7f81890bae	Fix shadow WAL corruption on stalled validation. This commit fixes a timing bug that occurs when the shadow WAL sync stalls because of an S3 validation and the catch-up write to the shadow WAL is large enough to open a window between the WAL reads and the final copy. The file copy has been replaced by direct writes of the frame buffer to the shadow WAL to ensure that every validated byte is exactly what is written to the shadow WAL. The one downside of this change is that the frame buffer grows with the transaction size, so it uses additional heap. This could be replaced by a spill-to-disk implementation, but it should work well in the short term.	2021-02-06 07:28:15 -07:00
Ben Johnson	6fd11ccab5	Enforce max WAL index. This commit sets a hard upper limit of (1<<31)-1 on the WAL index. The index is hex-encoded in file names as a 4-byte unsigned integer, so this limit keeps every index value within that range and unaffected by any signed 32-bit int limit. A WAL file is typically at least 4MB, so you would need to write 8 petabytes to reach this upper limit.	2021-02-02 15:11:50 -07:00
Ben Johnson	6c49fba592	Check checkpoint result during restore	2021-02-02 15:04:20 -07:00
Ben Johnson	f17768e830	Log WAL frame checksum mismatch. Currently, the WAL copy function returns an error when it encounters a checksum mismatch in a WAL frame. This can occur for partial writes and is recovered from moments later. This commit logs the mismatch instead of returning an error.	2021-01-31 08:52:12 -07:00
Ben Johnson	4e469f8b02	Persist primary/replica copies after validation mismatch. This commit changes `ValidateReplica()` to persist copies of the primary & replica databases for inspection if a validation mismatch occurs.	2021-01-31 08:47:06 -07:00
Ben Johnson	ad7bf7f974	Reduce logging output. Previously, there were excessive log messages for checkpoints and retention. These have been removed or combined into a single log message where appropriate.	2021-01-31 08:12:18 -07:00
Ben Johnson	39a6fabb9f	Fix restore logging.	2021-01-26 17:01:00 -07:00
Ben Johnson	67eeb49101	Allow replica URL to be used for commands. This commit refactors the commands to allow a replica URL when restoring a database. If the first CLI argument is a URL with a scheme, it is treated as a replica URL.	2021-01-26 16:33:16 -07:00
Ben Johnson	94411923a7	Fix unit test	2021-01-21 13:52:35 -07:00
Ben Johnson	e92db9ef4b	Enforce stricter validation on restart. Previously, the sync would validate the last page written to ensure that replication picked up from the last position. However, if a large WAL file is followed by a series of shorter checkpointed WAL files, the last page could be the same even if multiple checkpoints have occurred. To fix this, the WAL header must now match the shadow WAL header when Litestream starts, since there are no guarantees about checkpoints.	2021-01-21 13:44:05 -07:00
Ben Johnson	031a526b9a	Only copy committed WAL pages	2021-01-21 12:44:11 -07:00
Ben Johnson	7fb98df240	cleanup	2021-01-18 15:58:49 -07:00
Ben Johnson	139d836d7a	Fix file/dir mode	2021-01-18 15:23:28 -07:00
Ben Johnson	35d755e7f2	Remove debugging code	2021-01-18 10:33:30 -07:00
Ben Johnson	358dcd4650	Copy shadow WAL immediately after init	2021-01-18 10:01:16 -07:00
Ben Johnson	2ce4052300	Remove write lock during db checksum	2021-01-18 07:05:27 -07:00
Ben Johnson	3c4fd152c9	Add more checksum logging	2021-01-18 06:38:03 -07:00
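The push-and-notify behavior described in `8589111717` (streaming WAL segment iterator) can be sketched in Go as a mutex-guarded list plus a notification channel. This is a minimal illustration, not Litestream's `WALSegmentIterator` interface: the type `segmentIterator` and its `Append`, `NotifyCh`, `Next`, and `Segment` methods are hypothetical, and segments are plain strings rather than WAL segment files. The channel has capacity 1 so repeated appends coalesce into a single pending wake-up, and a signal sent between a failed `Next()` and the wait is not lost.

```go
package main

import (
	"fmt"
	"sync"
)

// segmentIterator is a hypothetical in-process iterator over WAL segment
// names. More segments can be appended while a reader is iterating.
type segmentIterator struct {
	mu       sync.Mutex
	segments []string
	pos      int
	cur      string
	notify   chan struct{} // capacity 1 so repeated signals coalesce
}

func newSegmentIterator(initial ...string) *segmentIterator {
	return &segmentIterator{
		segments: append([]string(nil), initial...),
		notify:   make(chan struct{}, 1),
	}
}

// Append pushes new segments, then signals a waiting reader without blocking.
func (itr *segmentIterator) Append(names ...string) {
	itr.mu.Lock()
	itr.segments = append(itr.segments, names...)
	itr.mu.Unlock()

	select {
	case itr.notify <- struct{}{}:
	default: // a wake-up is already pending
	}
}

// NotifyCh receives a value after segments have been appended.
func (itr *segmentIterator) NotifyCh() <-chan struct{} { return itr.notify }

// Next advances to the next segment. It returns false once the reader has
// caught up; the caller can wait on NotifyCh and then call Next again.
func (itr *segmentIterator) Next() bool {
	itr.mu.Lock()
	defer itr.mu.Unlock()
	if itr.pos >= len(itr.segments) {
		return false
	}
	itr.cur = itr.segments[itr.pos]
	itr.pos++
	return true
}

// Segment returns the segment that the last successful Next moved to.
func (itr *segmentIterator) Segment() string {
	itr.mu.Lock()
	defer itr.mu.Unlock()
	return itr.cur
}

func main() {
	itr := newSegmentIterator("seg-0", "seg-1")

	// A producer pushes one more segment concurrently with the reader.
	go itr.Append("seg-2")

	var got []string
	for len(got) < 3 {
		for itr.Next() {
			got = append(got, itr.Segment())
		}
		if len(got) < 3 {
			<-itr.NotifyCh() // block until more segments arrive
		}
	}
	fmt.Println(got) // [seg-0 seg-1 seg-2]
}
```

A production iterator would also need a way to report errors and to signal that no more segments will arrive (for example on close); both are omitted here to keep the notification pattern visible.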


111 Commits