This commit refactors out the complexity of downloading ordered WAL
files in parallel to a type called `WALDownloader`. This makes it
easier to test the restore separately from the download.
This commit fixes an issue where the reference is taken
on the loop variable rather than the slice element when
computing the minimum snapshot within a generation so
it can cause the wrong snapshot to be chosen.
This commit changes the replica path format to group segments within
a single index in the same directory. This is to eventually add the
ability to seek to a record on file-based systems without having
to iterate over the records. The DB shadow WAL will also be changed
to this same format to support live replicas.
This commit removes short-lived `os.Open()` calls on the database
file because this can cause locks to be released when `os.File.Close()`
is later called if the operating system does not support OFD
(Open File Descriptor) locks.
This commit adds the ability to periodically perform snapshots on
an interval that is separate from retention. For example, this lets
you retain backups for 24 hours but you can snapshot your database
every six hours to improve recovery time.
This commit fixes a timing bug that occurs in a specific scenario
where the shadow wal sync stalls because of an s3 validation and
the catch up write to the shadow wal is large enough to allow a
window between WAL reads and the final copy.
The file copy has been replaced by direct writes of the frame
buffer to the shadow to ensure that every validated byte is exactly
what is being written to the shadow wal. The one downside to this
change is that the frame buffer will grow with the transaction
size so it will use additional heap. This can be replaced by a
spill-to-disk implementation but this should work well in the
short term.
This commit refactors the commands to allow a replica URL when
restoring a database. If the first CLI arg is a URL with a scheme,
the it is treated as a replica URL.
This was originally meant to add a TRUNCATE checkpoint before starting
a new generation, however, there is a write lock that blocks the
checkpoint and it's more complicated to roll it back and attempt the
truncation.