Reliability technical appendix

Infrastructure details, RTO and RPO targets, and the specific architecture choices behind RentalTide's continuity guarantees

This page is for technical decision makers — IT, security, procurement — who want the architecture behind the reliability overview. It documents where RentalTide runs, what we replicate, how fast we recover, and what dependencies sit outside our direct control.


Hosting and regions

RentalTide is hosted entirely on Amazon Web Services. We do not operate any on-premise servers and do not store production data on employee devices.

Component | Primary region | Notes
Application servers (API + web) | us-east-1 | Containerised, autoscaled across three Availability Zones
Primary database (PostgreSQL Aurora) | us-east-1 | Multi-AZ, synchronous replication
Object storage (photos, attachments, PDFs) | us-east-1 | S3 with versioning and cross-region replication
Background jobs and scheduled tasks | us-east-1 | Lambda, run independently of the web tier
Static site delivery (booking widget, docs) | Global edge | CloudFront, served from 400+ points of presence worldwide
Backups (snapshots and continuous) | us-east-1 + us-west-2 | Cross-region replication for disaster recovery

The booking widget and customer-facing checkout pages are served from CloudFront's global edge, so a us-east-1 regional event does not take the booking widget offline immediately — pages still render from cached assets, and any submission queues up against the API layer.


Database resilience

The primary database is Aurora Serverless v2 PostgreSQL.

  • Synchronous replication across three Availability Zones. A write is acknowledged only when it is durable on storage in two zones. Loss of a single zone is invisible to the application.
  • Automatic failover within a region. If the writer node becomes unhealthy, Aurora promotes a reader to writer in under 60 seconds.
  • Continuous backup. Every write is shipped to S3 in near-real-time. Point-in-time restore is available for any second within the last seven days.
  • Daily automated snapshots. Retained for 35 days.
  • Cross-region snapshot replication. Daily snapshots are copied to us-west-2 for full-region disaster recovery.

For full-region recovery, the recovery point objective is bounded by replication lag to us-west-2, which is typically under five minutes.
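
The retention windows above determine which backup mechanism can serve a given restore. A minimal sketch of that decision follows, assuming only the seven-day point-in-time window and the 35-day snapshot retention stated in the list; the function name and return labels are hypothetical and are not part of RentalTide's tooling.

```python
from datetime import datetime, timedelta, timezone

# Windows taken from the list above; everything else here is illustrative.
PITR_WINDOW = timedelta(days=7)          # continuous backup, per-second granularity
SNAPSHOT_RETENTION = timedelta(days=35)  # daily automated snapshots


def choose_restore_source(target: datetime, now: datetime) -> str:
    """Return which backup mechanism could serve a restore to `target`."""
    if target > now:
        raise ValueError("cannot restore to a time in the future")
    age = now - target
    if age <= PITR_WINDOW:
        return "point-in-time"   # restore to the exact second
    if age <= SNAPSHOT_RETENTION:
        return "daily-snapshot"  # nearest retained daily snapshot
    return "unavailable"         # older than every retained backup


now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
print(choose_restore_source(now - timedelta(hours=3), now))  # point-in-time
print(choose_restore_source(now - timedelta(days=20), now))  # daily-snapshot
print(choose_restore_source(now - timedelta(days=60), now))  # unavailable
```

The sketch only classifies the request. An actual restore would go through Aurora's own restore operations.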


Application tier

The application tier runs as containers on AWS ECS Fargate behind an Application Load Balancer.

  • Autoscaled across three Availability Zones. A single AZ loss removes roughly one-third of capacity; autoscaling refills within minutes.
  • Health checks every 15 seconds. Failing instances are removed from the load balancer automatically.
  • Rolling deploys. New container versions are rolled out a fraction at a time. Health checks gate progression. A bad deploy stops itself before reaching full fleet.
  • Automatic rollback. Deploys that fail health checks revert to the previous task definition without manual intervention.
  • Zero-downtime deploys. New containers come up healthy before old containers are drained.
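
The interaction between rolling deploys, health-check gating, and automatic rollback can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not ECS's deployment logic: the fleet is a list of version strings, `is_healthy` is a hypothetical stand-in for the load balancer health check, and the overlap between new and draining containers is not modelled.

```python
from typing import Callable, List


def rolling_deploy(
    fleet: List[str],
    new_version: str,
    is_healthy: Callable[[str, int], bool],
    batch_size: int = 2,
) -> List[str]:
    """Replace tasks batch by batch; restore the old fleet if a batch fails.

    `fleet` holds the version running in each task slot, and
    `is_healthy(version, slot)` stands in for the health check.
    """
    previous = list(fleet)  # remembered so a rollback can restore it
    current = list(fleet)
    for start in range(0, len(current), batch_size):
        batch = range(start, min(start + batch_size, len(current)))
        for slot in batch:
            current[slot] = new_version
        # Health checks gate progression to the next batch.
        if not all(is_healthy(current[slot], slot) for slot in batch):
            return previous  # automatic rollback, no manual step
    return current


fleet = ["v1"] * 6
good = rolling_deploy(fleet, "v2", lambda version, slot: True)
bad = rolling_deploy(fleet, "v3", lambda version, slot: version != "v3")
print(good)  # ['v2', 'v2', 'v2', 'v2', 'v2', 'v2']
print(bad)   # ['v1', 'v1', 'v1', 'v1', 'v1', 'v1']
```

In the failing case the deploy stops after the first batch of two, so the unhealthy version never reaches the remaining four slots.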

Recovery time and recovery point objectives

Recovery time objective (RTO) is the maximum acceptable time from incident start to service restoration. Recovery point objective (RPO) is the maximum acceptable data loss measured in time.

Failure scenario | RTO target | RPO target | Mechanism
Single application instance crash | 60 seconds | Zero | Container restart, load balancer reroute
Single Availability Zone failure | 2 minutes | Zero | Multi-AZ Aurora and ECS distribute traffic to healthy AZs
Failed deploy | 5 minutes | Zero | Automated rollback to previous task definition
Aurora writer node failure | 90 seconds | Zero | Aurora failover promotes a reader
Application-level bug (data corruption) | 30 minutes | Bounded by detection time | Targeted SQL repair using journal_entries audit log
Catastrophic data corruption | 6 hours | 5 minutes | Point-in-time restore from continuous backup
Full region outage (us-east-1) | 4 hours | 5 minutes | Restore from cross-region snapshot in us-west-2

RTO and RPO targets are reviewed quarterly and tested via game days at least twice per year.
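
One way to compare a game-day measurement against these targets is sketched below. The numbers are copied from the table above; the scenario keys, the `Target` class, and `meets_target` are names made up for this illustration and do not describe RentalTide's internal tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    rto_seconds: int
    rpo_seconds: Optional[int]  # None: the table gives no fixed number


# Values from the RTO/RPO table; the dictionary keys are hypothetical.
TARGETS = {
    "instance_crash": Target(60, 0),
    "az_failure": Target(2 * 60, 0),
    "failed_deploy": Target(5 * 60, 0),
    "aurora_writer_failure": Target(90, 0),
    "application_bug": Target(30 * 60, None),  # RPO bounded by detection time
    "catastrophic_corruption": Target(6 * 3600, 5 * 60),
    "region_outage": Target(4 * 3600, 5 * 60),
}


def meets_target(scenario: str, recovery_seconds: float, data_loss_seconds: float) -> bool:
    """True if a measured recovery stayed within both RTO and RPO."""
    target = TARGETS[scenario]
    rto_ok = recovery_seconds <= target.rto_seconds
    rpo_ok = target.rpo_seconds is None or data_loss_seconds <= target.rpo_seconds
    return rto_ok and rpo_ok


print(meets_target("aurora_writer_failure", 48, 0))  # True
print(meets_target("region_outage", 5 * 3600, 120))  # False: 5 h against a 4 h RTO
```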


Audit and reconstruction

Beyond raw database backups, RentalTide maintains an append-only journal (journal_entries) for every financial mutation: payments, refunds, fees, taxes, AR adjustments. This is double-entry by design.

This means:

  • The cache columns on rental_bookings (amount paid, outstanding balance, amount refunded) can be reconstructed from the journal at any time.
  • A bug that corrupts cache fields without corrupting the journal is recoverable by re-running the cache sync from journal entries.
  • Forensic queries (who paid what, when, on which booking) survive any cache corruption.

We treat journal_entries as the source of truth. The reliability of that table is the reliability of your books.
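
To make the reconstruction idea concrete, here is a minimal sketch of rebuilding the cached amounts from journal entries. The entry shape is a deliberately simplified, hypothetical one (a booking id, a kind, and a positive amount in cents); the real journal_entries table is double-entry and its schema is not shown on this page. The formula for the outstanding balance is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class JournalEntry(NamedTuple):
    # Hypothetical, simplified shape; not the real journal_entries schema.
    booking_id: str
    kind: str          # "charge", "payment", or "refund"
    amount_cents: int  # always positive


def rebuild_cache(entries: Iterable[JournalEntry]) -> Dict[str, Dict[str, int]]:
    """Recompute per-booking cache values purely from journal entries."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"charge": 0, "payment": 0, "refund": 0}
    )
    for entry in entries:
        totals[entry.booking_id][entry.kind] += entry.amount_cents

    cache: Dict[str, Dict[str, int]] = {}
    for booking_id, t in totals.items():
        cache[booking_id] = {
            "amount_paid": t["payment"],
            "amount_refunded": t["refund"],
            # Assumption: outstanding = charged - (paid - refunded).
            "outstanding_balance": t["charge"] - (t["payment"] - t["refund"]),
        }
    return cache


journal = [
    JournalEntry("bk_1", "charge", 20000),
    JournalEntry("bk_1", "payment", 15000),
    JournalEntry("bk_1", "refund", 2000),
]
print(rebuild_cache(journal))
# {'bk_1': {'amount_paid': 15000, 'amount_refunded': 2000, 'outstanding_balance': 7000}}
```

Because the function reads only the journal, it produces the same result whatever state the cache columns are in, which is the property the bullets above rely on.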


External dependencies

RentalTide depends on a handful of external SaaS providers. A feature can be no more available than the providers it relies on, so we choose them carefully.

Provider | What we use it for | Their stated SLA | Failure behaviour
Stripe | Payments, terminals, Connect | 99.99% | Card capture and refunds fail with clear errors; no data loss
Auth0 | Staff authentication | 99.9% | Existing sessions stay valid; new sign-ins blocked
SendGrid | Transactional email | 99.95% | Emails queue; retried for 72 hours
Twilio | SMS and voice | 99.95% | SMS retries; voice routing degrades to next provider
AWS | Compute, storage, database | 99.99% per AZ | Multi-AZ design tolerates single-AZ loss invisibly
OpenAI / Anthropic | AI features (docs bot, phone, summaries) | 99.9% | AI features degrade gracefully; core ops unaffected

A failure of any single external provider does not take RentalTide down. The most common visible effect is one feature (typically payment capture or outbound email) showing transient errors while the provider recovers.
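
For a sense of scale, the stated SLAs can be multiplied together to give the availability a system would have if every provider were a hard dependency at once. The sketch below is a pessimistic bound, not a statement of RentalTide's availability: it assumes independent failures, treats the AWS per-AZ figure as a single number, and ignores the graceful degradation described in the table. The function names are illustrative.

```python
import math

# Stated SLAs from the provider table, as fractions.
SLAS = {
    "Stripe": 0.9999,
    "Auth0": 0.999,
    "SendGrid": 0.9995,
    "Twilio": 0.9995,
    "AWS": 0.9999,  # per-AZ figure, used here as one number (assumption)
    "OpenAI / Anthropic": 0.999,
}

HOURS_PER_YEAR = 365 * 24


def serial_availability(slas):
    """Availability if every provider had to be up at once (independent failures)."""
    return math.prod(slas)


def downtime_hours_per_year(availability):
    """Expected unavailable hours in a 365-day year."""
    return (1 - availability) * HOURS_PER_YEAR


bound = serial_availability(SLAS.values())
print(f"{bound:.4%}")
print(f"{downtime_hours_per_year(bound):.1f} hours/year")
```

The product works out to roughly 99.68%, or about 28 hours a year, which is why the failure behaviours in the table matter: most provider outages affect a single feature and do not count against the platform as a whole.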


Backups and retention

Asset | Frequency | Retention | Storage location
Aurora continuous backup | Real-time | 7 days | S3, us-east-1
Aurora daily snapshots | Daily, 04:00 UTC | 35 days | S3, us-east-1
Cross-region snapshot copy | Daily | 35 days | S3, us-west-2
S3 object versioning | Every write | Indefinite | S3, us-east-1
S3 cross-region replication | Every write | Indefinite | S3, us-west-2
Application logs | Real-time | 90 days | CloudWatch, us-east-1
Audit / journal entries | Real-time | Indefinite | Aurora journal_entries table

Backups are encrypted at rest with AWS-managed keys. Cross-region replication uses separate keys per region to limit the blast radius of a key compromise.


Security controls relevant to availability

Availability is also a security concern. We protect against:

  • DDoS — AWS Shield Standard is always on. CloudFront absorbs volumetric attacks at the edge.
  • Credential theft — Auth0 enforces MFA for staff accounts, and bearer tokens are short-lived (1 hour) with rotating refresh tokens.
  • Insider risk — production database access requires SSO and is audited. No engineer has standing access to customer data; access is just-in-time and logged.
  • Ransomware — backups are immutable for their retention period (S3 Object Lock on snapshots).

Status, communication, and runbooks

When something goes wrong, you will hear from us in these places:

  • status.rentaltide.com — public status page, updated within minutes of detection. This is the source of truth for the platform overall.
  • Your support channel — Slack (for partner-tier customers) or email, with location-specific impact and ETA.
  • Postmortems — every incident with more than 15 minutes of customer impact gets a written postmortem within 5 business days, in the format described in our internal COE template.

Internally, we maintain detailed runbooks for each failure mode under /docs/contingency/ in our codebase. These cover region failover, internet outage procedures, server crash response, and physical-disaster business continuity. They are not public, but the executive summaries are available on request for procurement and security reviews.


Game days and chaos testing

Twice a year (typically March and September, ahead of peak season), we run game-day exercises that test:

  • Aurora writer failover under load
  • ECS task termination across an entire AZ
  • Point-in-time database restore into a separate environment
  • Cross-region snapshot restore
  • Incident response paging, status page updates, and customer communication

Results from each game day are recorded internally and fed back into our recovery time objectives.


Reporting and questions

For SOC 2 reports, vendor security assessments, or anything that requires a signed document, reach out to your account contact or security@rentaltide.com. We will route the request to the right place.

For technical questions about a specific failure mode, reach out in your support channel — we are happy to walk through any of the above in more detail.
