How-To 12 min read

How to Export Intercom or HubSpot Conversation Data to PostgreSQL (and What to Do Once It Lands)

Support data is some of the most analytically valuable data in a company — and one of the hardest to model in a warehouse. Intercom and HubSpot both expose it through REST APIs, but the throttle math, the conversation-vs-parts join, and the contact-deduplication problem catch most first-time pipelines. This guide walks the four practical paths to PostgreSQL, the schemas that actually work for joining tickets to CSAT and revenue, and the rate-limit shape that decides whether a self-built ETL is worth your weekend.

Converge Converge Team

Why pipe Intercom or HubSpot data into Postgres in the first place?

The reason to pipe Intercom or HubSpot data into Postgres is that the in-product analytics in both tools answer roughly five questions well and break the moment you ask the sixth. Once you want to join Intercom or HubSpot conversation volume against revenue, segment CSAT by SKU, or attribute first-touch support contact to retention 90 days later, you need the raw rows in a Postgres database you control.

The native dashboards are tuned for support managers, not analysts. Intercom Reports surfaces median response time, CSAT, and resolution counts; HubSpot Service Hub gives you ticket volume by pipeline and SLA breach. Neither lets you cleanly cross-join with Stripe MRR, Segment events, or your product database. Once a CFO or a head of CX asks "which support tags correlate with churn in cohort N," you're in warehouse territory.

The four use cases that drive most pipelines, in roughly the order they show up:

  • Cross-channel reporting. Join support volume against marketing channel, product surface, or revenue band. Native tools can't.
  • Cohort retention analysis. Did customers who hit support in week 1 retain better than those who didn't? Requires conversation timestamps joined to subscription events.
  • BI tool consolidation. Looker, Metabase, Tableau, or Hex already point at your warehouse — adding another data source is cheaper than another BI seat.
  • Long-term retention. Intercom's UI returns the last ~90 days of conversations fast; older data still exists but searching it is slow. Postgres has no such ceiling.

If none of these apply, stop here — the native dashboards are fine and the ETL is overhead you don't need.

What does an Intercom-to-Postgres pipeline actually look like?

An Intercom-to-Postgres pipeline has three moving pieces: an extractor that paginates through Intercom's REST API (or consumes the streaming Webhook + initial backfill), a transformer that flattens the nested conversation_parts structure into normalized rows, and a loader that does upserts into Postgres on the natural keys. The hard part of the pipeline is the second piece — Intercom's conversation object embeds the message thread, and a naive load creates one wide row per thread instead of one row per message.

The endpoints that matter:

  • GET /conversations — list with cursor pagination. Returns conversation summaries (no parts).
  • GET /conversations/{id} — full conversation with conversation_parts.conversation_parts[] array. One API call per conversation if you want messages.
  • GET /contacts — paginated contact list. Custom attributes are top-level fields on the contact object, not nested.
  • GET /companies — for B2B accounts. Linked to contacts via company_id.
  • Webhook topics conversation.user.created, conversation.admin.replied, conversation.user.replied — for incremental sync after the initial backfill.

The fan-out is the trap. A workspace with 50,000 conversations needs 50,000 individual GETs to extract messages — at Intercom's default rate limit of 10,000 calls/minute for private apps (per Intercom's rate-limiting docs), the backfill takes roughly 5 minutes of pure API time, plus your network and Postgres write throughput. A workspace with 5 million historical conversations is a different conversation — that's an 8-hour backfill at sustained limit, and Intercom splits the 10,000/min window into ten 10-second buckets (1,666 calls per 10s), so bursty extractors hit the wall earlier than the per-minute number suggests.

The minimum viable extractor structure, in pseudocode:

  1. List conversations with per_page=150 and the starting_after cursor; persist cursor between runs.
  2. For each conversation, GET the full record and write to a raw landing table (raw_intercom_conversations) keyed on conversation ID.
  3. In a separate transform step, normalize: one row per conversation in conversations, one row per part in conversation_parts, one row per contact in contacts.
  4. For incremental runs, drive off the webhook stream into the raw table, then re-run the transform.

Keep raw and modeled tables separate. The first time Intercom adds a new field — and they do, every quarter — you'll regret schemas built directly on the API response.

What does a HubSpot-to-Postgres pipeline look like, and how is it different?

A HubSpot-to-Postgres pipeline looks structurally similar to the Intercom one but differs in two important ways. HubSpot's CRM API exposes Conversations as a top-level object (separate from Tickets), with Threads and Messages as nested resources. Unlike Intercom, HubSpot doesn't return the full thread in one call — you list threads, then list messages per thread, then optionally join to the Contact and Ticket objects. The HubSpot rate limit is also tighter than Intercom's: 100 requests per 10 seconds on Starter/Free plans, 190/10s on Professional or Enterprise (per HubSpot's API usage guidelines, updated March 2026).

The endpoint map looks like this:

  • GET /conversations/v3/conversations/threads — paginate threads. Returns thread IDs plus a small summary.
  • GET /conversations/v3/conversations/threads/{threadId}/messages — messages in a thread. One call per thread.
  • GET /crm/v3/objects/contacts — paginated contact extraction with the ?properties= query parameter to opt into custom fields.
  • GET /crm/v3/objects/tickets — Service Hub tickets. Related to conversations via the conversation_id association, not a foreign key column.
  • Webhook subscriptionsconversation.creation, conversation.propertyChange, conversation.deletion for incremental capture.

The associations model is the part that surprises ETL authors used to Intercom. In HubSpot, the link between a contact and a ticket is a separate API call: GET /crm/v4/objects/contacts/{contactId}/associations/tickets. Naive pipelines miss this and end up with orphan ticket rows that look like they came from no one. Use the Batch APIs (/crm/v3/objects/contacts/batch/read, up to 100 IDs per call) to bring association reads down to a manageable count.

The other practical difference: HubSpot's daily caps matter. Privately distributed apps get 250,000 calls/day on Starter, 625,000 on Professional, 1,000,000 on Enterprise. A nightly full re-extract of a 200,000-contact account fits comfortably on Professional but cuts close on Starter — design the pipeline as incremental from day one (Last-Modified-Date filter on contacts, webhook-driven for conversations) rather than expecting full refreshes to work forever.

What schemas should the destination tables actually use?

Two principles: keep a raw landing layer that mirrors the API response (JSONB columns are fine here) and a modeled layer with normalized rows you'd actually query. Don't try to query the raw layer directly — schema drift will burn you. Don't try to skip the raw layer — debugging a transform without the original payload is misery.

A minimum viable Intercom schema:

CREATE TABLE conversations (
  id              TEXT PRIMARY KEY,
  contact_id      TEXT NOT NULL,
  state           TEXT NOT NULL,           -- open, closed, snoozed
  priority        TEXT,
  assignee_id     TEXT,
  team_id         TEXT,
  source_type     TEXT,                    -- conversation, push, email, etc.
  source_subject  TEXT,
  created_at      TIMESTAMPTZ NOT NULL,
  updated_at      TIMESTAMPTZ NOT NULL,
  first_admin_reply_at  TIMESTAMPTZ,
  closed_at       TIMESTAMPTZ,
  csat_rating     SMALLINT,
  tags            TEXT[],                  -- denormalized for query convenience
  raw             JSONB NOT NULL           -- full API payload
);
CREATE INDEX conversations_contact_idx ON conversations (contact_id);
CREATE INDEX conversations_created_idx ON conversations (created_at);

CREATE TABLE conversation_parts (
  id              TEXT PRIMARY KEY,
  conversation_id TEXT NOT NULL REFERENCES conversations(id),
  part_type       TEXT NOT NULL,           -- comment, note, assignment, close
  author_type     TEXT NOT NULL,           -- user, admin, bot
  author_id       TEXT,
  body            TEXT,
  created_at      TIMESTAMPTZ NOT NULL,
  raw             JSONB NOT NULL
);
CREATE INDEX conversation_parts_conv_idx ON conversation_parts (conversation_id, created_at);

CREATE TABLE contacts (
  id              TEXT PRIMARY KEY,
  external_id     TEXT,                    -- your app's user ID
  email           TEXT,
  name            TEXT,
  role            TEXT,
  company_id      TEXT,
  custom_attrs    JSONB,
  created_at      TIMESTAMPTZ NOT NULL,
  last_seen_at    TIMESTAMPTZ,
  raw             JSONB NOT NULL
);
CREATE INDEX contacts_email_idx ON contacts (LOWER(email));
CREATE INDEX contacts_external_idx ON contacts (external_id);

The HubSpot analog is almost the same shape with different names:

CREATE TABLE hs_conversations (
  thread_id       TEXT PRIMARY KEY,
  channel         TEXT,                    -- email, live_chat, fb_messenger
  inbox_id        TEXT,
  status          TEXT,                    -- OPEN, CLOSED
  assigned_to     TEXT,
  associated_contact_id TEXT,
  associated_ticket_id  TEXT,
  created_at      TIMESTAMPTZ NOT NULL,
  latest_message_at  TIMESTAMPTZ,
  raw             JSONB NOT NULL
);

CREATE TABLE hs_messages (
  id              TEXT PRIMARY KEY,
  thread_id       TEXT NOT NULL REFERENCES hs_conversations(thread_id),
  sender_type     TEXT NOT NULL,           -- VISITOR, AGENT, BOT, SYSTEM
  sender_id       TEXT,
  text            TEXT,
  rich_text       TEXT,
  created_at      TIMESTAMPTZ NOT NULL,
  raw             JSONB NOT NULL
);
CREATE INDEX hs_messages_thread_idx ON hs_messages (thread_id, created_at);

CREATE TABLE hs_contacts (
  id              TEXT PRIMARY KEY,
  email           TEXT,
  firstname       TEXT,
  lastname        TEXT,
  lifecyclestage  TEXT,
  hs_lead_status  TEXT,
  raw             JSONB NOT NULL
);

If you load both sources into the same warehouse, the unifying layer is a support_conversations view that UNION ALLs the two with a source column ('intercom' or 'hubspot') and aliased columns. That's the table your BI dashboards actually query.

Should you build the ETL yourself or use a managed connector?

For most teams the answer is buy on day one and only consider building if the per-row pricing of managed tools becomes prohibitive at your volume — usually somewhere north of 10 million monthly synced rows. The connectors from Fivetran, Airbyte, Stitch, Hevo, and Estuary all maintain the schema-drift handling for you, and that's the part that quietly eats engineering hours.

ToolPricing modelIntercomHubSpotSelf-hostBest for
FivetranPer active row, tieredYesYesNoEnterprises with budget for a managed source-of-truth
Airbyte (Cloud)Per credit / per GBYesYesYes (OSS)Teams who want self-hosted or hybrid
StitchPer million rowsYesYesNoMid-market teams wanting Singer taps
HevoPer event tierYesYesNoNo-code teams, prebuilt transformations
Estuary FlowPer GB processedYesYesYes (private deployment)Real-time CDC use cases
DIY (Singer tap + custom)Engineering hourstap-intercom (OSS)tap-hubspot (OSS)YesTeams already running Meltano or Airflow

The under-appreciated middle option is self-hosting open-source Airbyte. The Intercom and HubSpot connectors are both maintained by the Airbyte team, schema discovery is handled, and the only ongoing cost is the EC2 or Fly instance you run it on. For a team that already has a data engineer and doesn't want a per-row bill, this is usually the sweet spot.

The reason to build from scratch is narrow: you need transforms that can't be expressed in dbt downstream (rare), you have PII redaction requirements that exclude shipping payloads to a SaaS extractor, or you're below the free tier of every managed tool and don't want a paid contract at all. For most teams, "buy now, replace if you outgrow it" is the correct call — the data lives in Postgres either way, so a future migration is a connector swap, not a re-architecture.

How do you handle the rate limits without losing data?

The way to handle Intercom and HubSpot rate limits without losing data is to read the response headers, respect the 10-second windows on both APIs, and design every extractor as a resumable job with a persisted cursor. Both Intercom and HubSpot return rate-limit metadata in headers (X-RateLimit-Remaining, X-RateLimit-Reset on Intercom; X-HubSpot-RateLimit-Remaining and X-HubSpot-RateLimit-Interval-Milliseconds on HubSpot) — use them, don't guess.

Concrete tactics that work in production:

  • Reactive throttling. Before each request, check the remaining count in the last response. If it's below 10% of the limit, sleep until the reset timestamp. This costs almost nothing and prevents the 429 storm.
  • Exponential backoff with jitter. On 429, sleep min(60s, 2^attempt + random(0, 1)s) and retry up to 5 times. The jitter matters when multiple worker pods hit the limit simultaneously — without it they all retry at the same instant and the storm repeats.
  • Batch the contact reads. HubSpot's POST /crm/v3/objects/contacts/batch/read takes up to 100 IDs per call. A naive per-contact extractor that calls the single-get endpoint 200,000 times is 200,000 rate-limit decrements; the batched version is 2,000.
  • Use webhooks for incrementals. Backfill historical data with the REST API, then subscribe to webhook topics for new conversations. This keeps steady-state API usage near zero and means full backfills are a one-time operation.
  • Persist cursors before processing the batch. If the loader crashes mid-batch you want to resume from the last successful page, not re-paginate from page zero. Write the cursor to a metadata table inside the same transaction as the row inserts.

The number that catches people: Intercom splits the 10,000/minute private-app limit into 10-second buckets of 1,666 calls (per their rate-limiting docs). An extractor that bursts 5,000 calls in 8 seconds is well under the per-minute budget but will hit a 429 immediately. Pace inside the 10-second window, not the minute.

What can you actually do with the data once it's in Postgres?

Once the data is in Postgres, the four most useful queries — in roughly decreasing order of "obviously useful" — are conversation volume joined to revenue, first-response time by channel, support contact frequency as a retention predictor, and tag frequency for root-cause analysis. None of these queries are possible inside Intercom or HubSpot themselves.

The conversation-volume-vs-revenue query, in raw SQL:

SELECT
  date_trunc('week', c.created_at) AS week,
  COUNT(DISTINCT c.id) AS conversations,
  COUNT(DISTINCT c.contact_id) AS unique_customers,
  SUM(s.mrr) FILTER (WHERE s.status = 'active') AS active_mrr
FROM conversations c
JOIN contacts ct ON c.contact_id = ct.id
LEFT JOIN stripe_subscriptions s
  ON LOWER(s.customer_email) = LOWER(ct.email)
WHERE c.created_at >= NOW() - INTERVAL '90 days'
GROUP BY 1
ORDER BY 1;

The retention predictor — "did week-1 support contact correlate with month-6 retention?":

WITH cohort AS (
  SELECT s.customer_email, s.created_at AS signup_at
  FROM stripe_subscriptions s
  WHERE s.created_at BETWEEN '2026-01-01' AND '2026-01-31'
),
contacted_in_week_1 AS (
  SELECT DISTINCT ct.email
  FROM conversations c
  JOIN contacts ct ON c.contact_id = ct.id
  JOIN cohort co ON LOWER(co.customer_email) = LOWER(ct.email)
  WHERE c.created_at BETWEEN co.signup_at AND co.signup_at + INTERVAL '7 days'
)
SELECT
  CASE WHEN cw.email IS NOT NULL THEN 'contacted' ELSE 'not contacted' END AS group_,
  COUNT(*) AS cohort_size,
  AVG(CASE WHEN s2.status = 'active' THEN 1.0 ELSE 0.0 END) AS retained_at_6mo
FROM cohort co
LEFT JOIN contacted_in_week_1 cw ON LOWER(co.customer_email) = cw.email
LEFT JOIN stripe_subscriptions s2 ON LOWER(s2.customer_email) = LOWER(co.customer_email)
GROUP BY 1;

Once these queries exist as views, point Metabase or Looker at Postgres and the dashboards build themselves. The hard work is already done — the support data is in a place where it can be joined to everything else you care about.

What are the common gotchas after the first sync?

The common gotchas after the first sync land in five buckets: contact deduplication, soft-deleted records, timezone handling on created_at, custom-attribute schema drift, and the GDPR / CCPA deletion-propagation requirement. Plan for all five gotchas up front because retrofitting them is painful.

  1. Contact deduplication. Both tools allow the same email to appear on multiple contacts (different sources, merge that never happened, anonymous-to-known promotion). Treat LOWER(email) as the natural key in your unified layer, not the source-system ID.
  2. Soft deletes. Intercom's archive endpoint flips archived: true but keeps the row; HubSpot returns archived contacts only when you set ?archived=true. A naive pipeline that ignores archived records ends up with stale contact data forever.
  3. Timezones. Both APIs return Unix timestamps in seconds (Intercom) or ISO-8601 in UTC (HubSpot). Store as TIMESTAMPTZ in Postgres and convert for display, never the other way around. The week-cohort joins fail silently if half your data is in local time.
  4. Custom-attribute schema drift. A new custom field added in Intercom or HubSpot doesn't update your destination schema. Keep custom attributes in JSONB rather than promoting them to columns until they prove durable.
  5. Right-to-erasure propagation. When a customer requests deletion, the source tool deletes its row but the warehouse copy stays unless you wire up a parallel deletion path. Subscribe to contact.deleted webhooks (Intercom) or use HubSpot's privacy/v3/deletes endpoint, and propagate to Postgres on the same day.

Once these are handled, the pipeline runs itself. Most teams check on it once a month and only intervene when a new field shows up in the source product.

Is there a path that skips the pipeline entirely?

Yes, and it's worth considering before you commit a quarter of engineering time. If you're piping data out of Intercom or HubSpot primarily because the per-seat pricing is forcing you to move support workflow elsewhere anyway, the cleanest move is to do the platform swap and the warehouse export in the same migration window — choose a support tool that ships a clean data export by default, and skip the connector entirely.

The export-quality bar matters here. Some tools export only CSV summaries, which loses the conversation-parts relationship; others expose a full REST API with the same shape as their internal data model, which is what you actually want for a warehouse. Converge, for example, includes a full data export (companies + customers + conversations + messages with platform metadata preserved) at the $49/month flat rate for up to 15 agents, which keeps the unified-inbox tooling and the warehouse export adjacent rather than requiring a separate pipeline product. Whether that swap makes sense depends on your existing investment in Intercom or HubSpot workflows — but it's the option most pipeline write-ups don't mention.

If the platform change isn't on the table, the connector path remains the right answer. Pick Airbyte (self-hosted) or Hevo (managed, no-code) as your starting point, follow the schema in this guide, respect the 10-second rate-limit windows, and you'll have working dashboards within a week.

Key Takeaways

  • Decide whether you actually need a warehouse pipeline — native dashboards answer most operational questions; only build the ETL when you need to join support data against revenue, retention, or product events.
  • Plan around Intercom's 10,000/minute private-app rate limit divided into 10-second buckets (1,666 calls per bucket), not the per-minute number — bursty extractors hit 429s well below the headline limit.
  • Respect HubSpot's tighter window — 100/10s on Starter, 190/10s on Professional, 110/10s on public OAuth apps — and use the batch read endpoints to cut association calls by 100x.
  • Keep a raw JSONB landing layer plus a normalized modeled layer; debugging a transform without the original payload wastes more time than the disk it costs.
  • Use Airbyte (self-hosted, OSS) or Hevo (managed, no-code) before writing a custom extractor — the schema-drift handling alone is worth the cost.
  • Treat the conversation_parts join as the structural challenge — one row per part in the destination, not one wide row per conversation, or every meaningful query becomes impossible.
  • Wire deletion propagation from day one — subscribe to contact.deleted webhooks and HubSpot's privacy/v3/deletes endpoint, then mirror the delete in Postgres on the same day for GDPR and CCPA compliance.

Frequently Asked Questions

Yes. Managed connectors from Fivetran, Stitch, Airbyte Cloud, Hevo, and Estuary Flow all support Intercom-to-Postgres as a prebuilt source-destination pair. Setup is OAuth + destination credentials + table mappings, usually under 30 minutes. Self-hosted Airbyte adds infrastructure work but removes the per-row bill.

At Intercom's default 10,000 calls/minute rate limit for private apps, a workspace with 50,000 conversations takes roughly 5–10 minutes of pure API time (mostly the per-conversation GET to fetch parts). A workspace with 5 million conversations is closer to 8–9 hours sustained. Plan the backfill as a one-time off-hours job and run incrementals (webhooks or last-modified filters) afterward.

Tickets are HubSpot's Service Hub ticket-tracking objects with statuses, pipelines, and SLAs. Conversations are the message threads themselves. A single conversation can be associated with a ticket but doesn't have to be — visitor chats that never escalated to support stay as conversations only. For a full warehouse, extract both and link them via the conversation_id association in the CRM associations API.

Mostly. Fivetran, Airbyte, and Hevo all detect new fields and either auto-add columns (Fivetran's default) or surface a schema-change event for you to approve (Airbyte's default). Custom-property changes on contacts and conversations are handled cleanly; structural changes (a new resource type) usually require a connector version update from the vendor. Keep custom attributes in JSONB on your end as a buffer.

Yes, with a separate tool — Hightouch, Census, Polytomic, or Grouparoo all do reverse ETL into both Intercom and HubSpot. The common use case is enriching contact records in the source CRM with computed segments or lifecycle stages calculated in the warehouse. The forward ETL (source to Postgres) and the reverse ETL (Postgres to source) are separate pipelines with separate rate-limit budgets — plan accordingly.

Ready to try Converge?

$49/month flat. Up to 15 agents. 7-day free trial, no credit card required.

Start Free Trial