pg_durable: Microsoft Open Sources In-Database Durable Execution for PostgreSQL

Microsoft just open-sourced pg_durable, a PostgreSQL extension that brings durable workflow execution directly into the database engine. The project eliminates the need for separate orchestration services like Temporal or AWS Step Functions by managing workflow state inside PostgreSQL itself. With over 2,000 GitHub stars within weeks of release, developer interest is significant.

TL;DR: Microsoft has open-sourced pg_durable, a PostgreSQL extension providing in-database durable execution, eliminating the need for external workflow engines. According to GitHub documentation, the extension manages workflow state directly in PostgreSQL, reducing infrastructure overhead by consolidating orchestration logic alongside transactional data.

What Is pg_durable and Why Did Microsoft Open Source It?

pg_durable is an open-source PostgreSQL extension, released by Microsoft, that implements durable execution patterns natively within the database layer. The extension allows developers to define, execute, and monitor long-running workflows without deploying external orchestration infrastructure. Microsoft chose to open-source the project to foster community adoption and accelerate development of reliable workflow tooling in the PostgreSQL ecosystem.

The extension treats workflow state as a first-class database concern. Instead of relying on a separate workflow engine that communicates with PostgreSQL through network calls, pg_durable embeds the orchestration logic where the data already lives. This architectural decision reduces latency, removes the need to keep two systems in sync, and lowers operational overhead for teams already running PostgreSQL in production.

Microsoft’s motivation aligns with broader industry trends toward consolidating infrastructure. Running fewer distinct services means fewer failure points. The open-source approach also allows organizations to inspect, audit, and contribute to the codebase directly, building trust in the execution guarantees the extension provides.

The repository includes documentation, example workflows, and integration guides. Developers can define workflows using SQL functions and PL/pgSQL, leveraging existing PostgreSQL skills rather than learning a new domain-specific language. The learning curve is correspondingly shallow for teams already proficient with PostgreSQL administration and development.

How Does pg_durable Handle Durable Execution Inside PostgreSQL?

pg_durable implements durable execution by persisting workflow state to PostgreSQL tables after every step, ensuring that no progress is lost even if the database process restarts or a transaction fails midway. Each workflow instance is tracked as a row in a dedicated state table, with individual step outcomes recorded as the workflow progresses through its defined stages.

The extension uses PostgreSQL’s native transaction and write-ahead logging (WAL) mechanisms to guarantee durability. When a workflow step completes, the state change is committed within a database transaction. If a crash occurs before the next step begins, the workflow resumes from the last committed checkpoint upon recovery. For steps whose effects live entirely in the database, this gives exactly-once semantics without an external coordination service. A step that calls an external system may be replayed after a crash, so such steps should be idempotent.
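The checkpoint-and-resume behavior can be illustrated with a minimal sketch. It is not pg_durable's API: the `workflow_state` table, the `run_workflow` function, and the `crash_before` switch are hypothetical names, and an in-memory SQLite database stands in for PostgreSQL so the example runs anywhere.

```python
import sqlite3


def open_store():
    # Hypothetical state table; the extension's real schema is not shown here.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workflow_state ("
        "instance_id TEXT PRIMARY KEY, next_step INTEGER NOT NULL)"
    )
    return conn


def run_workflow(conn, instance_id, steps, crash_before=None):
    """Run steps in order, committing a checkpoint after each one.

    crash_before simulates a process crash just before that step index.
    """
    row = conn.execute(
        "SELECT next_step FROM workflow_state WHERE instance_id = ?",
        (instance_id,),
    ).fetchone()
    start = row[0] if row else 0  # resume from the last committed checkpoint
    for index in range(start, len(steps)):
        if crash_before == index:
            raise RuntimeError("simulated crash")
        steps[index]()
        # Commit the checkpoint; a later crash cannot undo it.
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO workflow_state VALUES (?, ?)",
                (instance_id, index + 1),
            )


log = []
steps = [
    lambda: log.append("reserve"),
    lambda: log.append("charge"),
    lambda: log.append("ship"),
]
conn = open_store()
try:
    run_workflow(conn, "order-1", steps, crash_before=2)
except RuntimeError:
    pass  # the first run dies before the third step
run_workflow(conn, "order-1", steps)  # the second run resumes at step index 2
```

After both runs, `log` holds each step once and in order. In this sketch the step body runs outside the checkpoint transaction; a step implemented in SQL could share the transaction with its checkpoint, which is what makes the database-only case exactly-once.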

pg_durable introduces several custom SQL functions for workflow lifecycle management. Developers call functions to create workflow instances, signal completion of individual steps, query current status, and handle failures or retries. The extension manages the underlying state transitions internally, abstracting away the complexity of durable execution from application code.

Key architectural components include a workflow registry (defining available workflow templates), an instance tracker (managing active executions), an event queue (handling external signals and timers), and a history log (recording every state transition for debugging and auditing). These components map directly to PostgreSQL tables and indexes, benefiting from the database engine’s mature storage and query optimization capabilities.

The extension supports workflow versioning, allowing developers to update workflow definitions without breaking in-flight instances. Old instances continue executing under their original definition, while new instances adopt the updated logic. This versioning model is critical for production systems where workflow schemas evolve over time.
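One way to picture this versioning model is a registry keyed by workflow name and version, with each instance pinned to the version that was current when it started. The `Registry` and `Instance` classes here are hypothetical illustrations, not the extension's own data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    workflow: str
    version: int  # pinned when the instance is created


class Registry:
    def __init__(self):
        self._definitions = {}  # (name, version) -> list of step names
        self._latest = {}       # name -> highest registered version

    def register(self, name, steps):
        version = self._latest.get(name, 0) + 1
        self._definitions[(name, version)] = list(steps)
        self._latest[name] = version
        return version

    def start(self, instance_id, name):
        # New instances always adopt the latest definition.
        return Instance(instance_id, name, self._latest[name])

    def steps_for(self, instance):
        # In-flight instances keep resolving against their pinned version.
        return self._definitions[(instance.workflow, instance.version)]


registry = Registry()
registry.register("onboarding", ["create_account", "verify_email"])
old = registry.start("u-1", "onboarding")
registry.register("onboarding", ["create_account", "verify_email", "setup_billing"])
new = registry.start("u-2", "onboarding")
```

Here `old` keeps its two-step definition while `new` picks up the three-step one, so an update never changes the path of an instance that is already running.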

What Problems Does In-Database Workflow Execution Solve?

Traditional workflow orchestration requires running a separate service alongside the database, introducing operational complexity, network latency, and potential consistency issues between the workflow engine’s state and the database’s transactional state. pg_durable eliminates these problems by co-locating workflow execution with the data it operates on, removing the need for distributed synchronization between two distinct systems.

Common pain points that pg_durable addresses include infrastructure sprawl (maintaining separate workflow engines), data consistency risks (workflow state diverging from database state during failures), increased latency (network round-trips between the orchestrator and database), and operational burden (monitoring and scaling two systems instead of one). By consolidating orchestration into PostgreSQL, teams reduce their operational surface area significantly.

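Consider a payment processing workflow that deducts funds, creates a transaction record, and sends a notification. With an external orchestrator, a failure between the deduction and record creation could leave the system in an inconsistent state. pg_durable commits the database steps and the workflow’s own state inside the same transactional boundary, so the deduction and the record either both persist or both roll back, which removes the partial completion that would otherwise require manual reconciliation. The notification is an external side effect and sits outside that boundary; it relies instead on the extension recording the pending call and replaying it after a failure.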
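The failure mode in the payment example can be reproduced in a few lines. This is a sketch under stated assumptions: SQLite stands in for PostgreSQL, and the `accounts` and `ledger` tables and the `charge` function are hypothetical. It shows only the general property that two writes in one transaction commit or roll back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE ledger (account_id TEXT NOT NULL, amount INTEGER NOT NULL);
    INSERT INTO accounts VALUES ('alice', 100);
    """
)


def charge(conn, account_id, amount, fail_after_deduct=False):
    """Deduct funds and write the ledger record in a single transaction."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
            if fail_after_deduct:
                raise RuntimeError("simulated failure between the two steps")
            conn.execute(
                "INSERT INTO ledger VALUES (?, ?)", (account_id, amount)
            )
        return True
    except RuntimeError:
        return False


def balance(conn, account_id):
    return conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()[0]


failed = charge(conn, "alice", 30, fail_after_deduct=True)
balance_after_failure = balance(conn, "alice")  # the deduction was rolled back
succeeded = charge(conn, "alice", 30)
balance_after_success = balance(conn, "alice")
```

The failed attempt leaves the balance untouched and the ledger empty; only the successful attempt changes both.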

The extension is particularly well-suited for organizations already heavily invested in PostgreSQL. Teams can leverage existing backup strategies, replication configurations, and monitoring tools for their workflow state without learning a new operational paradigm. The total cost of ownership decreases because fewer infrastructure components require maintenance, licensing, and expertise.

How Do You Install and Configure pg_durable?

pg_durable installs as a standard PostgreSQL extension: build it from source, then run CREATE EXTENSION. Building requires a PostgreSQL server running version 14 or later, a C compiler, and the PostgreSQL development headers. The extension compiles into a shared library that PostgreSQL loads at runtime.

The basic installation steps involve cloning the GitHub repository, running make and make install, and then executing CREATE EXTENSION pg_durable; in the target database. The extension creates its internal schema, including state tables, event queues, and helper functions, during the extension creation process. No external dependencies or additional services are required beyond PostgreSQL itself.

Configuration options allow administrators to tune workflow execution parameters. Settings include maximum retry counts for failed steps, timeout thresholds for long-running workflows, concurrency limits for parallel step execution, and retention policies for completed workflow history. These parameters are set through PostgreSQL’s standard configuration system, either in postgresql.conf or via ALTER SYSTEM commands.

After installation, developers register workflow definitions using SQL functions provided by the extension. A workflow definition specifies the sequence of steps, error handling rules, and timeout policies. Once registered, workflows can be instantiated by calling a creation function with the appropriate parameters. The extension handles scheduling, execution, and state persistence automatically.

Monitoring active workflows is done through SQL queries against the extension’s state tables. Administrators can view running instances, inspect step-level status, check error logs, and review historical execution data. Integration with standard PostgreSQL monitoring tools like pg_stat_statements is supported out of the box, providing visibility into workflow performance alongside general database metrics.

What Are the Performance Characteristics of pg_durable?

pg_durable achieves durable execution with low-millisecond latency for state transitions (typically under 2 ms) by keeping all workflow state inside PostgreSQL’s transaction engine. The extension leverages PostgreSQL’s native write-ahead logging (WAL) to guarantee durability without requiring a separate coordination service, which eliminates network round-trips that typically add 2–10 milliseconds in distributed orchestrators like Temporal. Microsoft’s benchmarks, shared in the project’s GitHub repository, demonstrate that pg_durable can sustain over 10,000 workflow step completions per second on a single PostgreSQL instance running on commodity hardware.

The architecture avoids the serialization overhead common in external workflow engines. Since state transitions happen within the same database session as business logic, pg_durable reduces the total number of network hops from application to persistence layer. Fewer hops means lower per-step latency, which is where the gap to external orchestrators comes from.

Latency for individual state transitions typically falls between 0.5 and 2 milliseconds, depending on the complexity of the workflow step and the underlying disk I/O performance. For workflows that require calling external APIs, pg_durable records the pending call state before executing the outbound request, ensuring that a crash mid-call does not result in lost progress. Upon recovery, the engine replays the step from the recorded state.
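The record-then-call pattern described here can be sketched as follows. The `calls` table and the `call_external` and `recover` functions are hypothetical, and SQLite again stands in for PostgreSQL; the point is only the ordering of writes around the outbound request.

```python
import sqlite3


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE calls (call_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def call_external(conn, call_id, send, crash_mid_call=False):
    # 1. Persist the intent before touching the network.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO calls VALUES (?, 'pending')", (call_id,)
        )
    if crash_mid_call:
        raise RuntimeError("simulated crash during the outbound call")
    # 2. Perform the outbound request.
    send(call_id)
    # 3. Record completion.
    with conn:
        conn.execute(
            "UPDATE calls SET status = 'done' WHERE call_id = ?", (call_id,)
        )


def recover(conn, send):
    """Replay every call that was recorded but never marked done."""
    pending = [
        row[0]
        for row in conn.execute(
            "SELECT call_id FROM calls WHERE status = 'pending' ORDER BY call_id"
        )
    ]
    for call_id in pending:
        call_external(conn, call_id, send)
    return pending


sent = []
conn = open_store()
try:
    call_external(conn, "notify-42", sent.append, crash_mid_call=True)
except RuntimeError:
    pass  # the crash happens after the intent is durable
replayed = recover(conn, sent.append)
```

If a crash lands after the request is sent but before completion is recorded, the replay sends it again. Passing the `call_id` to the receiver as an idempotency key is a common way to make that harmless.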

Throughput scales vertically with the PostgreSQL instance’s capacity. Users have reported that adding CPU cores and faster NVMe storage yields near-linear throughput improvements up to 32 cores. Beyond that point, connection contention and lock management become bottlenecks. The project documentation recommends connection pooling through PgBouncer for deployments expecting more than 500 concurrent workflow instances.

Memory overhead remains modest. Each active workflow consumes approximately 2–4 kilobytes of shared buffer space for its state record, meaning a PostgreSQL instance with 4 GB of shared buffers can manage hundreds of thousands of simultaneously active workflows without memory pressure.
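The arithmetic behind that estimate is short enough to check directly. The calculation assumes, unrealistically, that every byte of shared buffers holds workflow state, so the results are upper bounds rather than capacity figures.

```python
SHARED_BUFFERS_BYTES = 4 * 1024**3  # 4 GB of shared buffers


def max_state_records(bytes_per_workflow):
    """Upper bound on state records that fit if nothing else used the buffers."""
    return SHARED_BUFFERS_BYTES // bytes_per_workflow


at_4_kb = max_state_records(4 * 1024)  # 1,048,576 records
at_2_kb = max_state_records(2 * 1024)  # 2,097,152 records
```

Since shared buffers also cache business tables and indexes, only part of that space is available in practice, which is consistent with the more conservative "hundreds of thousands" figure.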

Metric                    | Typical Value      | Notes
State transition latency  | 0.5–2 ms           | Single instance, local SSD
Throughput                | 10,000+ steps/sec  | Single instance, commodity hardware
Memory per workflow       | 2–4 KB             | Shared buffer footprint
Max concurrent workflows  | 500,000+           | With 4 GB shared buffers
Recovery time             | < 5 seconds        | After clean shutdown

How Does pg_durable Compare to Temporal and DBOS?

Temporal is a standalone, distributed workflow orchestration platform that runs as a separate cluster of services, while pg_durable embeds the orchestration engine directly into PostgreSQL as an extension. This architectural difference has significant implications for operational complexity, latency characteristics, and deployment topology. Temporal requires operating a separate cluster with its own persistence layer (for example Cassandra, MySQL, or PostgreSQL), while pg_durable consolidates everything into the database you likely already run.

DBOS, developed by researchers at Stanford and MIT, takes a similar in-database approach but operates as a higher-level framework layered on top of PostgreSQL rather than as a native extension. pg_durable sits closer to the metal. It integrates with PostgreSQL’s internal transaction machinery, which gives it lower overhead per state transition compared to frameworks that operate at the application layer.

Temporal excels in multi-service, multi-team environments where workflows span dozens of microservices across different programming languages. Its gRPC-based API supports clients in Go, Java, Python, TypeScript, and PHP. pg_durable currently targets PostgreSQL-centric architectures where the primary programming language is one with strong PostgreSQL driver support, particularly Rust and Python.

In terms of raw throughput, Temporal clusters can horizontally scale to handle millions of concurrent workflows by adding more worker nodes. pg_durable’s throughput is bounded by a single PostgreSQL instance’s capacity, though PostgreSQL replication (streaming or logical) can distribute read-only workflow state queries across replicas. For write-heavy orchestration, vertical scaling is the primary path.

  • Deployment complexity: pg_durable requires installing an extension; Temporal requires provisioning a cluster
  • Latency per step: pg_durable achieves sub-2ms; Temporal typically sees 5–15ms due to network hops
  • Horizontal scalability: Temporal scales to millions of workflows; pg_durable scales vertically to hundreds of thousands
  • Language support: Temporal provides official SDKs for several languages, including Go, Java, Python, TypeScript, and PHP; pg_durable works with any language that has a PostgreSQL driver
  • Operational overhead: pg_durable adds near-zero ops burden; Temporal requires dedicated infrastructure management
  • Consistency model: pg_durable uses PostgreSQL’s ACID transactions; Temporal uses eventual consistency in some configurations
  • Observability: Temporal has a built-in web UI; pg_durable relies on PostgreSQL’s system catalogs and query tools
  • Maturity: Temporal is production-proven at scale at companies like Datadog and Snap; pg_durable is a newer Microsoft open-source project
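To see what the per-step latency difference means for a whole workflow, the quoted ranges can be multiplied out. This is back-of-the-envelope arithmetic using only the figures in this article (0.5–2 ms and 5–15 ms per step); the 20-step workflow is a hypothetical example, and time spent doing the actual work in each step is excluded.

```python
def orchestration_overhead_ms(step_count, per_step_range_ms):
    """Total state-transition overhead for sequential steps, as (low, high)."""
    low, high = per_step_range_ms
    return (step_count * low, step_count * high)


STEPS = 20  # hypothetical workflow length

pg_durable_overhead = orchestration_overhead_ms(STEPS, (0.5, 2.0))
temporal_overhead = orchestration_overhead_ms(STEPS, (5.0, 15.0))
# pg_durable_overhead == (10.0, 40.0) ms; temporal_overhead == (100.0, 300.0) ms
```

For short workflows the difference is small next to the time the steps themselves take, so it matters most for workflows with many fast, sequential steps.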

What Are the Limitations and Trade-offs of pg_durable?

The most significant limitation of pg_durable is its single-database constraint: all workflow state must reside in one PostgreSQL instance, so throughput is capped at what that single node can deliver and writes cannot be scaled out horizontally. Unlike Temporal, which distributes workflow execution across a cluster of machines, pg_durable cannot natively partition workflow state across multiple PostgreSQL instances for write scaling. This makes it unsuitable for workloads requiring millions of concurrent workflows distributed across geographic regions.

Another trade-off involves the programming model. pg_durable workflows are defined using SQL functions and procedural logic executed within the database session. This contrasts with Temporal’s approach, where workflows are written in general-purpose programming languages with full access to language-specific libraries and tooling. Developers accustomed to writing orchestration logic in TypeScript or Go may find SQL-based workflow definitions less ergonomic for complex branching and error handling patterns.

The extension’s maturity also presents a consideration. As a relatively new open-source project from Microsoft, pg_durable has a smaller community, fewer production deployments, and less battle-tested edge case handling compared to Temporal, which has been running in production at major companies since 2020. The documentation, while functional, does not yet cover the breadth of operational scenarios that Temporal’s documentation addresses.

  • No native multi-region or cross-database workflow support
  • Workflow logic expressed in SQL may feel unfamiliar to application developers
  • Smaller ecosystem of integrations and community-contributed connectors
  • Limited built-in observability compared to dedicated orchestration platforms
  • Vertical scaling ceiling determined by single PostgreSQL instance capacity
  • Fewer client libraries and SDKs available across programming languages
  • Upgrade path requires PostgreSQL extension management, which can be more involved than updating a standalone service
  • Testing workflow logic requires a running PostgreSQL instance with the extension installed

Which Production Workloads Fit pg_durable Best?

pg_durable fits production workloads that already rely heavily on PostgreSQL as their primary data store and where workflow orchestration needs are moderate in scale—typically under 100,000 concurrent workflow instances. Use cases include order processing pipelines, subscription lifecycle management, ETL orchestration, and multi-step API integration patterns where the business data and workflow state naturally coexist in the same database. Teams that want durable execution without introducing a new operational dependency find pg_durable particularly appealing.

Financial services companies processing trade settlements can benefit from pg_durable’s ACID-guaranteed state transitions, ensuring that no settlement step is lost even during database failover. SaaS platforms managing user onboarding flows—account creation, email verification, provisioning, billing setup—can encode these multi-step processes as pg_durable workflows, gaining automatic retry and recovery without standing up Temporal infrastructure.

IoT platforms that ingest device telemetry and trigger multi-step processing pipelines also fit well. Since the telemetry data often lands in PostgreSQL anyway, co-locating the orchestration logic eliminates a network hop and simplifies the architecture. Batch processing systems that chain SQL-heavy transformation steps together can use pg_durable to track progress and resume from failures without external job schedulers.

The common thread across these workloads is PostgreSQL centrality. When your database is already the system of record and your orchestration needs don’t require multi-service, multi-language coordination at massive scale, pg_durable provides a compelling simplification. It removes an entire layer of infrastructure.

  • Order processing with payment capture, inventory deduction, and shipping coordination
  • Subscription lifecycle: trial start, conversion, renewal, cancellation cascades
  • ETL pipelines with sequential transformation and validation steps
  • User onboarding flows spanning account creation through provisioning
  • Trade settlement processing requiring ACID-guaranteed state transitions
  • Document approval workflows with multi-step sign-off chains
  • IoT telemetry processing with triggered action sequences
  • Batch report generation with dependent calculation stages

How Does pg_durable Handle Failures and Recovery?

pg_durable handles failures by recording every state transition as a WAL-logged database transaction, which means PostgreSQL’s crash recovery mechanism automatically restores workflow state to the last committed step after any unexpected shutdown. When PostgreSQL restarts following a crash, the recovery process replays the WAL to bring the database to a consistent state, and pg_durable workflows resume from their last recorded step without manual intervention. This mechanism provides the same durability guarantees that PostgreSQL offers for all transactional data.

For partial failures—where an individual workflow step fails due to a transient error such as a network timeout or a temporary external service unavailability—pg_durable supports configurable retry policies. Developers can specify maximum retry counts, backoff intervals, and exponential backoff multipliers directly in the workflow definition. The extension tracks retry attempts in the workflow state table, ensuring that retries survive database restarts.
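A retry policy with these three parameters translates into a delay schedule like the one sketched here. The `RetryPolicy` class and its field names are hypothetical rather than the extension's configuration syntax, and the `max_interval_s` cap is an added assumption that the text does not mention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int
    initial_interval_s: float
    multiplier: float
    max_interval_s: float = 60.0  # assumed cap, not taken from the text

    def delay_before_retry(self, attempt):
        """Delay before retry number `attempt` (1-based), or None when exhausted."""
        if attempt > self.max_retries:
            return None
        delay = self.initial_interval_s * self.multiplier ** (attempt - 1)
        return min(delay, self.max_interval_s)


policy = RetryPolicy(max_retries=4, initial_interval_s=1.0, multiplier=2.0)
schedule = [policy.delay_before_retry(n) for n in range(1, 6)]
# schedule == [1.0, 2.0, 4.0, 8.0, None]
```

Persisting the attempt number alongside the workflow state, as the text describes, is what lets the schedule continue from the right place after a restart instead of starting over.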

Workflow-level timeouts prevent stuck workflows from consuming resources indefinitely. If a step does not complete within its configured timeout, pg_durable marks the step as timed out and triggers the workflow’s error handling logic. Administrators can query timed-out workflows using standard SQL against pg_durable’s system views.
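The timeout check itself amounts to comparing each running step's start time against its budget. In this sketch, `StepRun` and `mark_timed_out` are hypothetical names, and plain numbers stand in for timestamps so the example is deterministic.

```python
from dataclasses import dataclass


@dataclass
class StepRun:
    instance_id: str
    started_at: float  # seconds on some monotonic clock
    timeout_s: float
    status: str = "running"


def mark_timed_out(runs, now):
    """Flag running steps that exceeded their timeout; return their instance ids."""
    timed_out = []
    for run in runs:
        if run.status == "running" and now - run.started_at > run.timeout_s:
            run.status = "timed_out"
            timed_out.append(run.instance_id)
    return timed_out


runs = [
    StepRun("a", started_at=0.0, timeout_s=30.0),   # 60 s elapsed, over budget
    StepRun("b", started_at=50.0, timeout_s=30.0),  # 10 s elapsed, within budget
    StepRun("c", started_at=0.0, timeout_s=120.0),  # 60 s elapsed, within budget
]
expired = mark_timed_out(runs, now=60.0)
```

Only instance "a" is flagged; an error handler would then run for it while "b" and "c" continue.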

For planned maintenance, pg_durable supports graceful shutdown. Active workflows complete their current step before the extension signals readiness for shutdown. After maintenance completes and PostgreSQL restarts, workflows resume automatically. The project documentation reports recovery times under 5 seconds for clean shutdowns and under 30 seconds for crash recoveries on instances managing 50,000 active workflows.

Failure Scenario                 | Recovery Behavior                          | Typical Recovery Time
Database crash                   | WAL replay restores last committed state   | 10–30 seconds
Clean shutdown/restart           | Active steps complete, then resume         | < 5 seconds
Individual step failure          | Configurable retry with backoff            | Immediate (retry scheduled)
Step timeout                     | Marked as timed out, error handler invoked | Per configured timeout
Disk failure (with replication)  | Standby promotes, workflows resume         | Depends on replication setup

Frequently Asked Questions

Does pg_durable require a separate database or service?

No, pg_durable runs entirely inside PostgreSQL as a native extension, requiring no separate database, service cluster, or external dependency. All workflow state is stored in tables within your existing PostgreSQL database, and the engine operates as part of the database process itself. Installation follows the standard PostgreSQL extension pattern of running CREATE EXTENSION pg_durable after loading the shared library into the PostgreSQL instance.

How does pg_durable compare to Temporal for workflow orchestration?

pg_durable embeds orchestration inside PostgreSQL with sub-2ms latency per state transition, while Temporal runs as a separate cluster typically introducing 5–15ms latency per step due to network communication between workers and the Temporal server. Temporal supports horizontal scaling across multiple nodes and multi-language SDKs, making it better suited for large-scale, polyglot microservice environments. pg_durable trades horizontal scalability for operational simplicity, eliminating the need to manage a separate orchestration infrastructure.

Can pg_durable handle high-throughput transactional workloads?

Microsoft’s benchmarks show pg_durable sustaining over 10,000 workflow step completions per second on a single PostgreSQL instance with commodity hardware, which covers the throughput requirements of most mid-range applications. However, since all workflow state lives in one PostgreSQL instance, write throughput is ultimately bounded by that instance’s capacity and cannot be horizontally scaled across multiple database nodes. For workloads exceeding several hundred thousand concurrent workflows, a distributed orchestrator like Temporal may be more appropriate.

What PostgreSQL versions does pg_durable support?

pg_durable requires PostgreSQL 14 or later, as it relies on features introduced in that version including enhanced procedure execution and specific WAL improvements. The project’s GitHub repository provides pre-built binaries for PostgreSQL 14, 15, and 16 on Linux (Debian and RHEL-based distributions), with source compilation available for other platforms. The development team has indicated plans to support PostgreSQL 17 following its stable release.

Summary

pg_durable represents a pragmatic approach to durable execution by embedding workflow orchestration directly into PostgreSQL, eliminating the operational overhead of running a separate coordination service. Here are the key takeaways:

  • Sub-2ms latency per state transition by keeping workflow state inside PostgreSQL’s transaction engine, compared to 5–15ms for external orchestrators
  • 10,000+ steps per second sustained throughput on commodity hardware, suitable for most mid-range production workloads
  • Zero additional infrastructure required—no separate cluster, no new service to monitor, no additional deployment pipeline
  • PostgreSQL 14+ required, with the extension leveraging WAL-based crash recovery for automatic workflow resumption
  • Best suited for PostgreSQL-centric architectures where business data and workflow state naturally coexist in the same database

If your team already runs PostgreSQL and needs durable workflow execution without introducing Temporal’s operational complexity, pg_durable deserves a close look. Check out the pg_durable GitHub repository to explore the source code, benchmarks, and documentation.