I recently noticed that the cost of ListBucket calls was increasing for an
application that was using Litestream. After investigating, I found that the
bucket had retained the entire history of data, while Litestream continually
logged that it was deleting the same data:
```
2022-10-30T12:00:27Z (s3): wal segmented deleted before 0792d3393bf79ced/00000233: n=1428
<snip>
2022-10-30T13:00:24Z (s3): wal segmented deleted before 0792d3393bf79ced/00000233: n=1428
```
This is occurring because DeleteObjects is a batch operation that returns
per-object deletion errors in its response body[1]. The S3 replica client
discards the response and only handles errors from the API call itself. I had
a misconfigured IAM policy that caused every delete to fail, but this never
actually bubbled up as a real error.
To fix this, I added a check of the response body to handle any errors the
operation encountered. Because this may include a large number of errors (in
this case, 1428 of them), the output is summarized to avoid an overly large
error message. Items that are not found do not return an error[2]; they are
still marked as deleted, so this change stays in line with the original
intent of this code.
1: https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html#API_DeleteObjects_Example_2
2: https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html
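In outline, the fix inspects the `Errors` field of the `DeleteObjectsOutput`.
Below is a minimal sketch against aws-sdk-go v1 (the SDK the S3 replica is
built on); the function name and summarized message format are illustrative,
not the exact code that landed:
```go
package replica

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// deleteObjects issues a batch delete and surfaces the per-object errors
// returned in the response body, which the call's own error value does
// not cover. The summary reports a count plus one representative error so
// a large batch (1428 failures above) doesn't produce an enormous message.
func deleteObjects(client *s3.S3, input *s3.DeleteObjectsInput) error {
	out, err := client.DeleteObjects(input)
	if err != nil {
		return err // the request itself failed
	}

	// Keys that no longer exist are reported under Deleted rather than
	// Errors, so missing objects still count as successfully removed.
	if n := len(out.Errors); n > 0 {
		e := out.Errors[0]
		return fmt.Errorf("cannot delete %d objects, e.g. %q: %s: %s",
			n, aws.StringValue(e.Key), aws.StringValue(e.Code), aws.StringValue(e.Message))
	}
	return nil
}
```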
The `monitor-delay-interval` option has been added to the DB config so that
users can change how long Litestream waits between WAL checks after a file
change notification occurs. This can be useful for batching changes into
larger shadow WAL files, or for reducing or eliminating the delay in
propagating changes during read replication.
Setting the interval to zero or less disables it.
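For illustration, a hypothetical `litestream.yml` fragment setting the new
option (the surrounding keys follow Litestream's usual `dbs`/`replicas`
layout; the paths and duration value are only examples):
```yaml
dbs:
  - path: /var/lib/app/db
    # Wait after each change notification before checking the WAL,
    # coalescing rapid writes into fewer, larger shadow WAL files.
    monitor-delay-interval: 500ms
    replicas:
      - url: s3://mybucket/db
```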
Currently, WALSegmentIterator implementations read to the end of their list
of segments and return EOF. This commit adds the ability to push additional
segments to in-process iterators and to notify their callers when new
segments are available. This is only implemented for the file-based iterator,
but other segment iterators may gain this implementation in the future, or a
wrapping iterator could provide a polling-based implementation.
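A sketch of the idea with illustrative names (`pushableIterator`,
`WALSegmentInfo`, `NotifyCh`), not Litestream's actual types: exhausting the
segment list no longer means EOF; callers can instead wait on a notification
channel for pushed segments.
```go
package wal

import "sync"

// WALSegmentInfo stands in for Litestream's segment metadata; the fields
// are illustrative only.
type WALSegmentInfo struct {
	Index  int
	Offset int64
}

// pushableIterator sketches a file-based iterator that accepts new
// segments after creation instead of permanently returning EOF once its
// initial list is exhausted.
type pushableIterator struct {
	mu       sync.Mutex
	segments []WALSegmentInfo
	notify   chan struct{}
}

func newPushableIterator(segments []WALSegmentInfo) *pushableIterator {
	return &pushableIterator{segments: segments, notify: make(chan struct{})}
}

// Push appends a segment and wakes every caller blocked on NotifyCh by
// closing the current channel and replacing it with a fresh one.
func (itr *pushableIterator) Push(seg WALSegmentInfo) {
	itr.mu.Lock()
	defer itr.mu.Unlock()
	itr.segments = append(itr.segments, seg)
	close(itr.notify)
	itr.notify = make(chan struct{})
}

// NotifyCh returns a channel that is closed when new segments arrive.
func (itr *pushableIterator) NotifyCh() <-chan struct{} {
	itr.mu.Lock()
	defer itr.mu.Unlock()
	return itr.notify
}

// Next pops the next segment; ok=false means the list is currently empty,
// not that the stream has ended, so callers may wait on NotifyCh and retry.
func (itr *pushableIterator) Next() (seg WALSegmentInfo, ok bool) {
	itr.mu.Lock()
	defer itr.mu.Unlock()
	if len(itr.segments) == 0 {
		return WALSegmentInfo{}, false
	}
	seg, itr.segments = itr.segments[0], itr.segments[1:]
	return seg, true
}
```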
Previously, the index and offset were encoded as 8-character hex strings,
which limits the maximum value to that of a `uint32`. This is normally not an
issue; however, an index could exceed the uint32 maximum of roughly 4.3
billion over time, and the offset could exceed it for an especially large WAL
update. For safety, these encodings have been widened to 16-character hex
strings.
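Concretely, the change amounts to widening the format width, as in this
sketch (variable name illustrative):
```go
package main

import "fmt"

func main() {
	var index uint64 = 0x233 // an example WAL index

	// Old encoding: 8 hex characters, capping values at 2^32-1
	// (4,294,967,295), the maximum of a uint32.
	fmt.Printf("%08x\n", index) // "00000233"

	// New encoding: 16 hex characters, covering the full uint64 range.
	fmt.Printf("%016x\n", index) // "0000000000000233"
}
```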
Under high write load, it is possible for write transactions from another
process to overrun the WAL in the window between Litestream performing a
RESTART checkpoint and obtaining the write lock immediately afterward. This
change adds validation that an overrun has not occurred and, if one has,
starts a new generation.
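One way to picture the check, as a sketch that assumes comparing WAL headers
is a sufficient fingerprint (the actual validation may differ): capture the
32-byte WAL header right after the RESTART checkpoint, re-read it once the
write lock is held, and start a new generation if they no longer match, since
SQLite changes the header's salt values whenever a writer restarts the WAL.
```go
package wal

import (
	"bytes"
	"os"
)

// readWALHeader returns the 32-byte header of a SQLite WAL file. The
// header includes two salt values that change whenever the WAL is
// restarted, so it fingerprints the current WAL "epoch".
func readWALHeader(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	hdr := make([]byte, 32)
	if _, err := f.ReadAt(hdr, 0); err != nil {
		return nil, err
	}
	return hdr, nil
}

// overrun reports whether another writer restarted the WAL between the
// two header reads, i.e. between the RESTART checkpoint and the moment
// the write lock was acquired.
func overrun(before, after []byte) bool {
	return !bytes.Equal(before, after)
}
```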
Previously, Litestream had a validator that worked most of the time but also
produced some false positives. It is difficult to validate from within
Litestream without controlling the outside processes that can also affect the
database. As such, validation has been moved out to the external CI test
runner, which provides a more consistent validation process.