* filer: propagate lazy metadata deletes to remote mounts
Delete operations now call the remote backend for mounted remote-only entries before removing filer metadata, keeping remote state aligned and preserving retry semantics on remote failures.
Made-with: Cursor
* filer: harden remote delete metadata recovery
Persist pending remote-delete metadata markers so local entry removal can be retried after failures, and return explicit errors when remote client resolution fails, preventing silent local-only deletes.
Made-with: Cursor
* filer: streamline remote delete client lookup and logging
Avoid a redundant mount trie traversal by resolving the remote client directly from the matched mount location, and add parity logging for successful remote directory deletions.
Made-with: Cursor
* filer: harden pending remote metadata deletion flow
Retry pending-marker writes before local delete, fail closed when marking cannot be persisted, and start remote pending reconciliation only after the filer store is initialised to avoid nil store access.
Made-with: Cursor
* filer: avoid lazy fetch in pending metadata reconciliation
Use a local-only entry lookup during pending remote metadata reconciliation so cache misses do not trigger remote lazy fetches.
Made-with: Cursor
* filer: serialise concurrent index read-modify-write in pending metadata deletion
Add remoteMetadataDeletionIndexMu to Filer and acquire it for the full
read→mutate→commit sequence in markRemoteMetadataDeletionPending and
clearRemoteMetadataDeletionPending, preventing concurrent goroutines
from overwriting each other's index updates.
Made-with: Cursor
* filer: start remote deletion reconciliation loop in NewFiler
Move the background goroutine for pending remote metadata deletion
reconciliation from SetStore (where it was gated by sync.Once) to
NewFiler alongside the existing loopProcessingDeletion goroutine.
The sync.Once approach was problematic: it buried a goroutine launch
as a side effect of a setter, was unrecoverable if the goroutine
panicked, could race with store initialisation, and coupled its
lifecycle to unrelated shutdown machinery. The existing nil-store
guard in reconcilePendingRemoteMetadataDeletions handles the window
before SetStore is called.
* filer: skip remote delete for replicated deletes from other filers
When isFromOtherCluster is true, the delete was already propagated to
the remote backend by the originating filer. Repeating the remote
delete on every replica doubles API calls, and a transient remote
failure on the replica would block local metadata cleanup — leaving
filers inconsistent.
* filer: skip pending marking for directory remote deletes
Directory remote deletes are idempotent and do not need the
pending/reconcile machinery that was designed for file deletes where
the local metadata delete might fail after the remote object is
already removed.
* filer: propagate remote deletes for children in recursive folder deletion
doBatchDeleteFolderMetaAndData iterated child files but only called
NotifyUpdateEvent and collected chunks — it never called
maybeDeleteFromRemote for individual children. This left orphaned
objects in the remote backend when a directory containing remote-only
files was recursively deleted.
Also fix isFromOtherCluster being hardcoded to false in the recursive
call to doBatchDeleteFolderMetaAndData for subdirectories.
* filer: simplify pending remote deletion tracking to single index key
Replace the double-bookkeeping scheme (individual KV entry per path +
newline-delimited index key) with a single index key that stores paths
directly. This removes the per-path KV writes/deletes, the base64
encoding round-trip, and the transaction overhead that was only needed
to keep the two representations in sync.
* filer: address review feedback on remote deletion flow
- Distinguish missing remote config from client initialization failure
in maybeDeleteFromRemote error messages.
- Use a detached context (30s timeout) for pending-mark and
pending-clear KV writes so they survive request cancellation after
the remote object has already been deleted.
- Emit NotifyUpdateEvent in reconcilePendingRemoteMetadataDeletions
after a successful retry deletion so downstream watchers and replicas
learn about the eventual metadata removal.
* filer: remove background reconciliation for pending remote deletions
The pending-mark/reconciliation machinery (KV index, mutex, background
loop, detached contexts) handled the narrow case where the remote
object was deleted but the subsequent local metadata delete failed.
The client already receives the error and can retry — on retry the
remote not-found is treated as success and the local delete proceeds
normally. The added complexity (and new edge cases around
NotifyUpdateEvent, multi-filer consistency during reconciliation, and
context lifetime) is not justified for a transient store failure the
caller already handles.
Remove: loopProcessingRemoteMetadataDeletionPending,
reconcilePendingRemoteMetadataDeletions, markRemoteMetadataDeletionPending,
clearRemoteMetadataDeletionPending, listPendingRemoteMetadataDeletionPaths,
encodePendingRemoteMetadataDeletionIndex, FindEntryLocal, and all
associated constants, fields, and test infrastructure.
* filer: fix test stubs and add early exit on child remote delete error
- Refactor stubFilerStore to release lock before invoking callbacks and
propagate callback errors, preventing potential deadlocks in tests
- Implement ListDirectoryPrefixedEntries with proper prefix filtering
instead of delegating to the unfiltered ListDirectoryEntries
- Add continue after setting err on child remote delete failure in
doBatchDeleteFolderMetaAndData to skip further processing of the
failed entry
* filer: propagate child remote delete error instead of silently continuing
Replace `continue` with early `break` when maybeDeleteFromRemote fails
for a child entry during recursive folder deletion. The previous
`continue` skipped the error check at the end of the loop body, so a
subsequent successful entry would overwrite err and the remote delete
error was silently lost. Now the loop breaks, the existing error check
returns the error, and NotifyUpdateEvent / chunk collection are
correctly skipped for the failed entry.
* filer: delete remote file when entry has Remote pointer, not only when remote-only
Replace IsInRemoteOnly() guard with entry.Remote == nil check in
maybeDeleteFromRemote. IsInRemoteOnly() requires zero local chunks and
RemoteSize > 0, which incorrectly skips remote deletion for cached
files (local chunks exist) and zero-byte remote objects (RemoteSize 0).
The correct condition is whether the entry has a remote backing object
at all.
---------
Co-authored-by: Chris Lu <chris.lu@gmail.com>