You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add a new capability for loader plugins to natively store state in the target system and expose it to Meltano. This would enable Meltano to retrieve state from loaders at the beginning of a run (similar to state backends) and allow loaders to continuously update state during the sync.
Background
Currently, state management in Meltano pipelines relies on state backends or the BookmarkWriter class (see src/meltano/core/plugin/singer/target.py). State messages from targets are captured and persisted externally. This proposal suggests allowing loaders to natively store state in the target system itself and expose a capability for Meltano to retrieve that state.
Proposed Solution
Loaders would expose a new capability (tentatively named store-target, target-state, or similar) that indicates they can:
Retrieve state: At the beginning of a run, dump the current state from the target system
Store state: During the sync, continuously update state in the target system as state messages are received
Meltano would:
Detect this capability in the loader plugin
At the beginning of a run, query the loader for existing state (similar to state backend retrieval)
Pass this state to the tap for incremental extraction
During the sync, the loader receives and stores state messages directly in the target system
On the next run, retrieve the updated state from the target
Benefits
Simplified architecture: State lives where the data lives (in the target system)
Improved reliability: State is atomically stored with data in the target
Better accuracy: State reflects what was actually loaded into the target
Reduced dependencies: Less reliance on separate state backend systems
Natural disaster recovery: State survives as long as the target data survives
Architecture
Current Flow
sequenceDiagram
participant Meltano
participant StateBackend
participant Tap
participant Target
Meltano->>StateBackend: Read previous state
StateBackend->>Meltano: Return state
Meltano->>Tap: Start with state
Tap->>Target: Stream records + STATE messages
Target->>Meltano: Echo STATE messages (stdout)
Meltano->>StateBackend: Save STATE continuously
Note over StateBackend: BookmarkWriter persists<br/>state during sync
Loading
Proposed Flow with store-target Capability
sequenceDiagram
participant Meltano
participant Loader
participant TargetSystem
participant Tap
Note over Meltano: Beginning of run
Meltano->>Loader: Query state (via capability)
Loader->>TargetSystem: Retrieve stored state
TargetSystem->>Loader: Return state
Loader->>Meltano: Dump state
Meltano->>Tap: Start with state
Note over Meltano: During sync
Tap->>Loader: Stream records + STATE messages
Loader->>TargetSystem: Store records + STATE continuously
Note over TargetSystem: State persisted<br/>with data
Note over Meltano: Next run
Meltano->>Loader: Query state again
Note over Loader: Cycle repeats
Loading
State Storage Architecture
graph TB
subgraph "Run N"
A[Meltano] -->|1. Query state| B[Loader Plugin]
B -->|2. Retrieve| C[Target System State Table]
C -->|3. Return state| B
B -->|4. Dump state| A
A -->|5. Pass state| D[Tap]
D -->|6. Records + STATE| B
B -->|7. Store continuously| C
end
subgraph "Run N+1"
A2[Meltano] -->|1. Query state| B2[Loader Plugin]
B2 -->|2. Retrieve updated| C
C -->|3. Return new state| B2
end
C -.->|State persists| C
Loading
State Update Pattern During Sync
sequenceDiagram
participant Tap
participant Loader
participant TargetDB
Note over Tap,TargetDB: Continuous state updates during sync
Tap->>Loader: RECORD (stream1, id=1)
Tap->>Loader: RECORD (stream1, id=2)
Tap->>Loader: STATE (stream1: bookmark=ts2)
Loader->>TargetDB: UPDATE _meltano_state<br/>SET state = {stream1: ts2}
Tap->>Loader: RECORD (stream2, id=5)
Tap->>Loader: STATE (stream2: bookmark=ts5)
Loader->>TargetDB: UPDATE _meltano_state<br/>SET state = {stream1: ts2, stream2: ts5}
Note over TargetDB: State updated atomically<br/>with data commits
Loading
Implementation Considerations
Capability Declaration
Loaders would declare the capability in their plugin definition:
capabilities:
- store-target # or target-state, native-state, etc.
State Retrieval Interface
At the beginning of a run, Meltano would query the loader for state. Potential approaches:
Summary
Add a new capability for loader plugins to natively store state in the target system and expose it to Meltano. This would enable Meltano to retrieve state from loaders at the beginning of a run (similar to state backends) and allow loaders to continuously update state during the sync.
Background
Currently, state management in Meltano pipelines relies on state backends or the
BookmarkWriterclass (seesrc/meltano/core/plugin/singer/target.py). State messages from targets are captured and persisted externally. This proposal suggests allowing loaders to natively store state in the target system itself and expose a capability for Meltano to retrieve that state.Proposed Solution
Loaders would expose a new capability (tentatively named
store-target,target-state, or similar) that indicates they can:Meltano would:
Benefits
Architecture
Current Flow
sequenceDiagram participant Meltano participant StateBackend participant Tap participant Target Meltano->>StateBackend: Read previous state StateBackend->>Meltano: Return state Meltano->>Tap: Start with state Tap->>Target: Stream records + STATE messages Target->>Meltano: Echo STATE messages (stdout) Meltano->>StateBackend: Save STATE continuously Note over StateBackend: BookmarkWriter persists<br/>state during syncProposed Flow with store-target Capability
sequenceDiagram participant Meltano participant Loader participant TargetSystem participant Tap Note over Meltano: Beginning of run Meltano->>Loader: Query state (via capability) Loader->>TargetSystem: Retrieve stored state TargetSystem->>Loader: Return state Loader->>Meltano: Dump state Meltano->>Tap: Start with state Note over Meltano: During sync Tap->>Loader: Stream records + STATE messages Loader->>TargetSystem: Store records + STATE continuously Note over TargetSystem: State persisted<br/>with data Note over Meltano: Next run Meltano->>Loader: Query state again Note over Loader: Cycle repeatsState Storage Architecture
graph TB subgraph "Run N" A[Meltano] -->|1. Query state| B[Loader Plugin] B -->|2. Retrieve| C[Target System State Table] C -->|3. Return state| B B -->|4. Dump state| A A -->|5. Pass state| D[Tap] D -->|6. Records + STATE| B B -->|7. Store continuously| C end subgraph "Run N+1" A2[Meltano] -->|1. Query state| B2[Loader Plugin] B2 -->|2. Retrieve updated| C C -->|3. Return new state| B2 end C -.->|State persists| CState Update Pattern During Sync
sequenceDiagram participant Tap participant Loader participant TargetDB Note over Tap,TargetDB: Continuous state updates during sync Tap->>Loader: RECORD (stream1, id=1) Tap->>Loader: RECORD (stream1, id=2) Tap->>Loader: STATE (stream1: bookmark=ts2) Loader->>TargetDB: UPDATE _meltano_state<br/>SET state = {stream1: ts2} Tap->>Loader: RECORD (stream2, id=5) Tap->>Loader: STATE (stream2: bookmark=ts5) Loader->>TargetDB: UPDATE _meltano_state<br/>SET state = {stream1: ts2, stream2: ts5} Note over TargetDB: State updated atomically<br/>with data commitsImplementation Considerations
Capability Declaration
Loaders would declare the capability in their plugin definition:
State Retrieval Interface
At the beginning of a run, Meltano would query the loader for state. Potential approaches:
State Storage in Target System
The actual storage mechanism would be loader-specific:
Database loaders: Dedicated state table
File-based loaders: Metadata file
Cloud storage loaders: Object metadata or dedicated state object
State Update During Sync
Similar to current
BookmarkWriterbehavior, but handled by the loader:Open Questions
Backward compatibility:
Multi-tap scenarios:
State initialization:
State migration:
Protocol standardization:
Naming: What should this capability be called?
store-targettarget-statenative-statestate-storageComparison with Current Approach
Related Work
src/meltano/core/plugin/singer/target.py(BookmarkWriter)Next Steps
Note: This is a proposal for discussion. Feedback, alternative approaches, and use case insights are welcome!