API Reference

REST · Bearer auth · JSON

The WellMarked API extracts the main content from any webpage and returns clean, structured Markdown. All API access is over HTTPS to api.wellmarked.io.

Base URL: https://api.wellmarked.io

Official SDKs

Typed responses · polymorphic jobs · typed errors

Python (PyPI)

pip install wellmarked

JavaScript / TypeScript (npm)

npm install wellmarked

Authentication

Authenticate requests by including your API key in the Authorization header. Keys start with wm_.

Authorization Header

Authorization: Bearer wm_your_api_key_here

You can get your API key from the account dashboard. Keep your key secret — never expose it in client-side code.
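
As a rough sketch of what an authenticated call looks like without the SDK, the snippet below assembles (but does not send) a POST /extract request using only the Python standard library. The base URL, header format, and parameter names come from this page; the JSON request body with a Content-Type of application/json and the helper name build_extract_request are assumptions made for illustration.

```python
import json
import urllib.request

BASE_URL = "https://api.wellmarked.io"


def build_extract_request(api_key: str, url: str, render_js: bool = False) -> urllib.request.Request:
    """Assemble a POST /extract request (hypothetical helper, not part of the SDK)."""
    if not api_key.startswith("wm_"):
        raise ValueError("WellMarked API keys start with 'wm_'")
    body = json.dumps({"url": url, "render_js": render_js}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/extract",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API accepts a JSON request body.
            "Content-Type": "application/json",
        },
    )


req = build_extract_request("wm_your_api_key_here", "https://example.com/article")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would send it; in practice the official SDKs are likely the simpler route.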

POST

/extract

Extract clean Markdown from a URL.

Parameters

Parameter	Type	Required	Description
url	string	required	The URL to extract content from
render_js	boolean	optional	Use Playwright for JS-rendered pages (Pro+)

Request

python

# pip install wellmarked
from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    result = wm.extract("https://example.com/article")
    print(result.markdown)
    print(result.metadata.title, "by", result.metadata.author)

Response

200 OK

{
  "markdown": "## Article Title\n\nClean paragraph text...",
  "metadata": {
    "title": "Article Title",
    "author": "Jane Smith",
    "date": "2026-05-01",
    "url": "https://example.com/article",
    "retrieved_at": "2026-05-16T12:34:56+00:00"
  },
  "request_id": "b3d2f1a0-..."
}

metadata.retrieved_at is the ISO 8601 timestamp at which WellMarked actually fetched the page, which is useful for cache-freshness decisions in downstream pipelines. It is distinct from metadata.date, which is the article's published date and is often null. retrieved_at is returned on every extraction surface: single /extract, every /bulk item, and every /crawl page.
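
One way a downstream pipeline might use retrieved_at for a freshness check is sketched below. The function names and the idea of a caller-supplied maximum age are illustrative assumptions; only the timestamp formats are taken from the sample responses on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_retrieved_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, normalising a trailing 'Z' for older Pythons."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def is_stale(retrieved_at: str, max_age: timedelta, now: Optional[datetime] = None) -> bool:
    """Return True when the page was fetched longer ago than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - parse_retrieved_at(retrieved_at) > max_age


# Fixed "now" so the example is deterministic.
now = datetime(2026, 5, 16, 13, 0, 0, tzinfo=timezone.utc)
print(is_stale("2026-05-16T12:34:56+00:00", timedelta(hours=1), now))
print(is_stale("2026-01-15T10:30:00.812Z", timedelta(days=1), now))
```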

POST

/bulk

Pro+

Submit multiple URLs for concurrent extraction. Each URL in the request counts as one request toward your quota — the whole batch is reserved atomically. Processing is asynchronous; you receive a job_id to poll for results, and jobs are retained for 6 hours.

Plan Limits

Plan	Bulk Access	Max URLs/Request
Free	Not available	---
Pro	✓	50
Growth	✓	200
Enterprise	✓	Unlimited
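
If a URL list is longer than the plan's per-request cap, one option is to split it client-side before submitting. The sketch below uses the caps from the table above; the batches helper and the lowercase plan names are assumptions for illustration, not part of the SDK.

```python
from typing import Dict, Iterator, List, Optional

# Per-request caps from the Plan Limits table; None means unlimited.
MAX_URLS_PER_REQUEST: Dict[str, Optional[int]] = {
    "pro": 50,
    "growth": 200,
    "enterprise": None,
}


def batches(urls: List[str], plan: str) -> Iterator[List[str]]:
    """Yield slices of urls that each fit within the plan's bulk cap."""
    if plan not in MAX_URLS_PER_REQUEST:
        raise ValueError(f"plan {plan!r} has no bulk access")
    if not urls:
        return  # /bulk requires at least one URL, so yield nothing
    cap = MAX_URLS_PER_REQUEST[plan]
    if cap is None:
        yield list(urls)
        return
    for start in range(0, len(urls), cap):
        yield urls[start:start + cap]


urls = [f"https://example.com/article-{i}" for i in range(120)]
print([len(b) for b in batches(urls, "pro")])
```

Each yielded batch could then be passed to a separate bulk submission; note that every URL still counts toward the monthly quota.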

Parameters

Parameter	Type	Required	Description
urls	string[]	required	Array of URLs to extract (must contain at least one)
render_js	boolean	optional	Enable JS rendering for all URLs
webhook_url	string	optional	HTTPS URL we POST a signed `job.completed` notification to when the job finishes — no polling needed. See Webhooks.
webhook_include_results	boolean	optional	When `true`, inline the full `results` array in the webhook payload (capped at ~5 MB). Defaults to `false` (thin payload with `results_url`).

Request

python

from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    job = wm.bulk([
        "https://example.com/article-1",
        "https://example.com/article-2",
        "https://example.com/article-3",
    ])
    print(f"Queued job {job.job_id} with {job.total} URLs")

    job = wm.wait_for_job(job.job_id)        # blocks until status == "done"
    for item in job.results:
        if item.ok:
            print(item.metadata.title)
        else:
            print(f"{item.url} failed: {item.error}")

Response

200 OK

{
  "job_id": "d4e5f6a7-b8c9-4d0e-a1f2-3b4c5d6e7f8a",
  "status": "queued",
  "total": 3,
  "completed": 0,
  "results": [],
  "webhook_signing_secret": "whsec_..."
}

status is one of queued, processing, or done. Use the returned job_id with GET /bulk/{job_id} to poll results. Enterprise jobs are drained ahead of Pro, and Pro ahead of Free, when there's queue contention.

webhook_signing_secret is returned once — on the submission that first mints it for your account. Save it before discarding the response; subsequent submissions return null. If you lose it, mint a new one with POST /webhook/rotate.

GET

/bulk/{job_id}

Poll the status and results of a bulk extraction job. Returns 404 job_not_found if the job ID is unknown or its 6-hour retention window has passed, and 403 forbidden if the job belongs to another account.

Request

python

from wellmarked import WellMarked

job_id = "d4e5f6a7-b8c9-4d0e-a1f2-3b4c5d6e7f8a"  # ID returned by POST /bulk

with WellMarked(api_key="wm_...") as wm:
    # get_job is polymorphic — works for both bulk and crawl ids.
    job = wm.get_job(job_id)
    print(f"{job.status}: {job.completed}/{job.total}")
    # Or block until done in one call (polls every 2s up to a 5-minute timeout):
    job = wm.wait_for_job(job_id)

Response (job done)

200 OK

{
  "job_id": "d4e5f6a7-b8c9-4d0e-a1f2-3b4c5d6e7f8a",
  "status": "done",
  "total": 3,
  "completed": 3,
  "results": [
    {
      "url": "https://example.com/article-1",
      "markdown": "## Article 1\n\nExtracted markdown...",
      "metadata": {
        "title": "Article 1",
        "author": null,
        "date": null,
        "url": "https://example.com/article-1",
        "retrieved_at": "2026-01-15T10:30:00.812Z"
      },
      "error": null
    },
    {
      "url": "https://example.com/article-2",
      "markdown": null,
      "metadata": null,
      "error": "target_timeout"
    }
  ],
  "created_at": "2026-01-15T10:29:58Z",
  "finished_at": "2026-01-15T10:30:02Z"
}

Per-URL failures appear in-band with markdown / metadata set to null and an error code — they do not fail the whole job.
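
For clients working with the raw JSON rather than the SDK, the in-band failure convention can be handled with a simple partition. This is a minimal sketch over a trimmed copy of the sample results above; the helper name and the choice to re-queue only target_timeout failures are assumptions.

```python
from typing import Any, Dict, List, Tuple

Result = Dict[str, Any]

# Trimmed copy of the "results" array from the sample response.
results: List[Result] = [
    {
        "url": "https://example.com/article-1",
        "markdown": "## Article 1\n\nExtracted markdown...",
        "metadata": {"title": "Article 1"},
        "error": None,
    },
    {
        "url": "https://example.com/article-2",
        "markdown": None,
        "metadata": None,
        "error": "target_timeout",
    },
]


def split_results(items: List[Result]) -> Tuple[List[Result], List[Result]]:
    """Separate successful items (error is null) from failed ones."""
    succeeded = [r for r in items if r["error"] is None]
    failed = [r for r in items if r["error"] is not None]
    return succeeded, failed


succeeded, failed = split_results(results)
# Assumption: timeouts are worth retrying, other failures are not.
retry_urls = [r["url"] for r in failed if r["error"] == "target_timeout"]
print(len(succeeded), len(failed), retry_urls)
```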

POST

/crawl

Pro+

Crawl a site starting from a root URL — BFS through all same-site links and extract Markdown from every page reached. Returns a queued job; poll GET /crawl/{job_id} for progress and results. "Same site" is the registered domain (eTLD+1) of the root URL.

Plan	Crawl Access	Max Depth	Max Pages
Free	✗	—	—
Pro	✓	5	2,000
Growth	✓	10	10,000
Enterprise	✓	Unlimited	Unlimited

Each successfully extracted page consumes one request from your monthly quota — failed pages (timeouts, robots-disallowed, no-content) are not billed. If you run out of quota mid-crawl, the job stops and returns what it has with truncated_reason: "quota_exhausted". Robots.txt for the root host is honoured for every page.
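
To make the depth and page caps concrete, here is a toy breadth-first traversal over an in-memory link graph. It is a sketch of the general technique, not the service's crawler: the graph is assumed to contain only same-site links already (real eTLD+1 matching needs the Public Suffix List and is omitted), and the function name and return shape are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def bfs_crawl(
    links: Dict[str, List[str]], root: str, max_depth: int, max_pages: int
) -> Tuple[List[Tuple[str, int]], Optional[str]]:
    """Visit pages breadth-first; return (visited pages with depth, truncated_reason)."""
    seen = {root}
    queue = deque([(root, 0)])
    visited: List[Tuple[str, int]] = []
    while queue:
        if len(visited) >= max_pages:
            # Pages remain in the queue but the cap has been reached.
            return visited, "page_cap_reached"
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited, None


site = {
    "https://docs.example.com": ["https://docs.example.com/api", "https://docs.example.com/faq"],
    "https://docs.example.com/api": ["https://docs.example.com/api/errors"],
    "https://docs.example.com/api/errors": ["https://docs.example.com/api/errors/429"],
}
pages, reason = bfs_crawl(site, "https://docs.example.com", max_depth=2, max_pages=100)
print(pages, reason)
```

With max_depth=2 the page three links away from the root is never visited, which mirrors how the depth field in crawl results counts hops from the root URL.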

Request

python

from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    job = wm.crawl("https://docs.example.com", depth=2)
    print(f"Queued crawl {job.job_id}")

    # wait_for_job is polymorphic — works for both bulk and crawl ids.
    job = wm.wait_for_job(job.job_id)
    for page in job.results:
        if page.ok:
            print(f"depth={page.depth} {page.metadata.title}")

Response

200 OK

{
  "job_id": "9aaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
  "status": "queued",
  "total": 0,
  "completed": 0,
  "truncated": false,
  "truncated_reason": null,
  "results": [],
  "webhook_signing_secret": "whsec_..."
}

total starts at 0 and grows as the crawler discovers and processes pages — unlike /bulk, the page count isn't known up front. Use the returned job_id with GET /crawl/{job_id} to poll results — or pass webhook_url on submit and we'll POST you when it's done (see Webhooks). Same one-time webhook_signing_secret semantics as /bulk.

GET

/crawl/{job_id}

Poll the status and results of a crawl job. Same retention (6 hours) and auth model as the bulk endpoint — jobs you don't own return 403 forbidden, expired or unknown jobs return 404 job_not_found.

Request

python

from wellmarked import WellMarked, CrawlJob

job_id = "9aaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"  # ID returned by POST /crawl

with WellMarked(api_key="wm_...") as wm:
    # get_job is polymorphic — works for both bulk and crawl ids.
    job = wm.get_job(job_id)
    print(f"{job.status}: {job.completed} pages crawled")
    if isinstance(job, CrawlJob) and job.truncated:
        print(f"truncated: {job.truncated_reason}")

Response (job done)

200 OK

{
  "job_id": "9aaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
  "status": "done",
  "total": 47,
  "completed": 47,
  "truncated": false,
  "truncated_reason": null,
  "created_at": "2026-05-15T10:00:00Z",
  "finished_at": "2026-05-15T10:02:30Z",
  "results": [
    {
      "url": "https://docs.example.com",
      "depth": 0,
      "markdown": "## Welcome\n\n...",
      "metadata": { "title": "Welcome", "author": null, "date": null, "url": "https://docs.example.com", "retrieved_at": "2026-05-15T10:00:02Z" },
      "error": null
    },
    {
      "url": "https://docs.example.com/api",
      "depth": 1,
      "markdown": "## API Reference\n\n...",
      "metadata": { "title": "API Reference", "author": null, "date": null, "url": "https://docs.example.com/api", "retrieved_at": "2026-05-15T10:01:12Z" },
      "error": null
    }
  ]
}

truncated_reason is one of null (completed normally), "page_cap_reached" (hit your plan's page cap), or "quota_exhausted" (monthly quota ran out mid-crawl).

Webhooks

Pro+

Pass webhook_url to POST /bulk or POST /crawl and we'll POST a signed job.completed notification the moment the job is done — no polling needed.

1. Submit a job with a webhook

python

from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    job = wm.bulk(
        ["https://example.com/a", "https://example.com/b"],
        webhook_url="https://yourapp.com/hooks/wm",
    )
    # First submission with a webhook_url ever: store the secret.
    # Subsequent submissions return None for this field.
    # save_to_env is a placeholder for your own secret storage, not an SDK function.
    if job.webhook_signing_secret is not None:
        save_to_env("WELLMARKED_WEBHOOK_SECRET", job.webhook_signing_secret)

The very first submission with a webhook_url mints your account's HMAC signing secret and returns it as webhook_signing_secret on the response — one-time visibility, save it before discarding. Subsequent submissions return null for the field. Lost it? POST /webhook/rotate.

2. What we POST to your endpoint

Default ("thin") payload — metadata and a results_url you can GET with your normal API key:

POST to your webhook_url

{
  "event":        "job.completed",
  "job_id":       "1c4f9a02-b8c9-4d0e-a1f2-3b4c5d6e7f8a",
  "kind":         "bulk",
  "status":       "done",
  "total":        50,
  "completed":    48,
  "finished_at":  "2026-05-27T14:02:14.812Z",
  "results_url":  "https://api.wellmarked.io/bulk/1c4f9a02-b8c9-4d0e-a1f2-3b4c5d6e7f8a"
}

For crawl jobs ("kind": "crawl") the payload also carries truncated and truncated_reason. Pass webhook_include_results: true on submission to inline the full results array (capped at ~5 MB — over the cap the payload silently falls back to the thin shape with results_truncated_for_size: true).

3. Verify the signature

Each delivery carries these headers:

Header	Purpose
X-WellMarked-Delivery-Id	UUID, stable across retries. Use as your idempotency key.
X-WellMarked-Timestamp	Unix seconds when this attempt was signed. Reject if drift exceeds 5 minutes.
X-WellMarked-Signature	`v1,<base64 HMAC-SHA256>` over `${delivery_id}.${timestamp}.${raw_body_bytes}`. Key is `bytes.fromhex(secret.removeprefix("whsec_"))`.

Both SDKs ship a verifier; use it rather than reimplementing HMAC by hand. The JavaScript verifier works in Node 18.17+, Cloudflare Workers, Deno, Bun, and modern browsers.

python

# pip install wellmarked fastapi
import os

from fastapi import FastAPI, Request, Response
from wellmarked import verify_webhook, WebhookVerificationError, WellMarked

app = FastAPI()
SECRET = os.environ["WELLMARKED_WEBHOOK_SECRET"]
wm = WellMarked(api_key=os.environ["WELLMARKED_API_KEY"])

@app.post("/hooks/wm")
async def hook(request: Request):
    try:
        payload = verify_webhook(
            secret=SECRET,
            headers=request.headers,
            body=await request.body(),    # MUST be raw bytes
        )
    except WebhookVerificationError:
        return Response(status_code=401)

    # Default payload is "thin": metadata + results_url. Fetch results
    # with your normal API key against /bulk/{job_id} or /crawl/{job_id}.
    job = wm.get_job(payload["job_id"])
    for item in job.results:
        ...
    return Response(status_code=200)
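
For readers who want to see what such a verifier has to check, the following is a sketch of the signing scheme as described in the header table, written with the standard library only. It is not the SDK's implementation: the function names are hypothetical, and standard (not URL-safe) base64 and UTF-8 encoding of the delivery ID and timestamp are assumptions. Prefer the SDK verifier in production.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 5 * 60  # reject deliveries whose timestamp drifts more than 5 minutes


def sign(secret: str, delivery_id: str, timestamp: str, body: bytes) -> str:
    """Compute 'v1,<base64 HMAC-SHA256>' over delivery_id.timestamp.raw_body."""
    key = bytes.fromhex(secret.removeprefix("whsec_"))  # requires Python 3.9+
    signed = delivery_id.encode("utf-8") + b"." + timestamp.encode("utf-8") + b"." + body
    digest = hmac.new(key, signed, hashlib.sha256).digest()
    return "v1," + base64.b64encode(digest).decode("ascii")


def verify(
    secret: str,
    delivery_id: str,
    timestamp: str,
    signature: str,
    body: bytes,
    now: Optional[float] = None,
) -> bool:
    """Return True only if the timestamp is fresh and the signature matches."""
    now = time.time() if now is None else now
    try:
        signed_at = int(timestamp)
    except ValueError:
        return False
    if abs(now - signed_at) > TOLERANCE_SECONDS:
        return False
    expected = sign(secret, delivery_id, timestamp, body)
    # Constant-time comparison on bytes avoids timing leaks.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


secret = "whsec_" + "0f" * 32  # made-up secret for the demo
body = b'{"event": "job.completed"}'
header = sign(secret, "delivery-1", "1700000000", body)
print(verify(secret, "delivery-1", "1700000000", header, body, now=1700000100))
```

The body must be the raw bytes exactly as received; re-serialising parsed JSON will generally change the bytes and break the comparison.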

4. Delivery semantics

Your endpoint must respond 2xx within 10 seconds. Anything else (timeout, 4xx, 5xx, DNS) triggers a retry.
Retry schedule: 30s → 5m → 30m → 2h → 12h → 24h. That is seven attempts in total (the initial delivery plus six retries) spread over roughly 38.5 hours. After that the delivery is dead-lettered.
X-WellMarked-Delivery-Id is stable across retries — use it to deduplicate on your side. X-WellMarked-Timestamp and X-WellMarked-Signature are recomputed every attempt so the timestamp tolerance check stays valid.
Treat deliveries as at-least-once.
We do not follow redirects from your endpoint. A 3xx is treated as a non-2xx fail.
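
The retry arithmetic above can be checked with a few lines. The sketch assumes each delay is measured from the previous attempt, which the schedule implies but does not state outright.

```python
from datetime import timedelta
from itertools import accumulate

# Delays between consecutive attempts, from the retry schedule above.
RETRY_DELAYS = [
    timedelta(seconds=30),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=12),
    timedelta(hours=24),
]

attempts = 1 + len(RETRY_DELAYS)          # initial delivery plus six retries
total = sum(RETRY_DELAYS, timedelta())    # time from first to last attempt
offsets = [timedelta()] + list(accumulate(RETRY_DELAYS))  # when each attempt fires

print(attempts)  # 7
print(total)     # 1 day, 14:35:30
```

Because the last attempt can arrive more than a day and a half after the first, a deduplication store keyed on X-WellMarked-Delivery-Id would need to retain entries for at least that long.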

GET

/usage

Returns your usage for the current billing period. The counter resets on your monthly billing anchor day at 00:00 UTC — for paid users, that's whatever day of the month your subscription was created or last changed; for free-tier users without an active subscription, it's the 1st of each calendar month. Annual subscribers still get monthly resets (their billing period is yearly but quota is monthly). Bulk submissions count as len(urls) at submission time, regardless of how many individual extractions succeed.

Request

python

from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    u = wm.get_usage()
    print(f"{u.used}/{u.limit} ({u.remaining} left this period)")

Response

200 OK

{
  "plan": "pro",
  "period": "2026-05-16",
  "used": 1042,
  "limit": 10000,
  "remaining": 8958
}

period is the start date of the user's current monthly window (YYYY-MM-DD) for paid subscribers, or the calendar month (YYYY-MM) for free-tier users. Calling GET /usage does not count toward your monthly quota.
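
Because period comes in two shapes, client code that stores or compares it may want to normalise both to a date. A small sketch, with a hypothetical helper name, using the sample response above:

```python
from datetime import date


def parse_period(period: str) -> date:
    """Return the first day of the usage window for 'YYYY-MM-DD' or 'YYYY-MM'."""
    parts = [int(p) for p in period.split("-")]
    if len(parts) == 3:
        return date(parts[0], parts[1], parts[2])  # paid: billing anchor date
    if len(parts) == 2:
        return date(parts[0], parts[1], 1)         # free tier: calendar month
    raise ValueError(f"unexpected period format: {period!r}")


usage = {"plan": "pro", "period": "2026-05-16", "used": 1042, "limit": 10000, "remaining": 8958}
print(parse_period(usage["period"]), parse_period("2026-05"))
# In the sample response, used and remaining add up to limit.
print(usage["used"] + usage["remaining"] == usage["limit"])
```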

POST

/keys/rotate

Mint a new API key for the authenticated account. The previous key is invalidated immediately — the moment this call returns 200, the bearer token you sent stops working.

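No request body. This is the recommended path for agents and automated clients that need to replace a key (for example, after a suspected leak) without dropping into the cookie-authenticated dashboard. The call itself requires a working key, so a key that is lost outright can only be replaced from the dashboard. Does not count toward your monthly quota.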

Request

python

from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    rotated = wm.rotate_key()
    # The SDK automatically swaps to the new key for subsequent calls,
    # but persist this somewhere durable — the previous key is dead.
    print("Store this — it's the only time you'll see it:", rotated.api_key)

Response

200 OK

{
  "api_key": "wm_a1b2c3d4...x9y0z",
  "rotated_at": "2026-05-13T15:32:00.123456+00:00"
}

The raw api_key is returned once. Persist it before discarding the response — there is no recovery flow. If you lose a key, sign in to the dashboard to rotate again.

POST

/webhook/rotate

Mint a new webhook signing secret for the authenticated account. The previous secret is invalidated immediately — any deliveries already in the retry queue will be signed with the NEW secret on their next attempt.

Use this when you've lost the secret returned in an earlier /bulk or /crawl response, or when you suspect compromise. No request body. Does not count toward your monthly quota.

Request

python

from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    rotated = wm.rotate_webhook_secret()
    # Save this: it's the only time you'll see it.
    # The previous secret stops working immediately. In-flight retries
    # are re-signed with the new secret on their next attempt.
    # save_to_env is a placeholder for your own secret storage, not an SDK function.
    save_to_env("WELLMARKED_WEBHOOK_SECRET", rotated.webhook_signing_secret)

Response

200 OK

{
  "webhook_signing_secret": "whsec_...",
  "rotated_at": "2026-05-13T15:32:00.123456+00:00"
}

The raw secret is returned once. Persist it before discarding the response — recovery is only possible by rotating again. See Webhooks for the full signing scheme.

Error Codes

All error responses share the shape { "error": { "code": "...", "message": "...", "retry_after"?: number } }. retry_after (seconds) is present on 429.

Status	Code	Meaning
401	missing_api_key	No `Authorization: Bearer ...` header sent
401	invalid_api_key	Key format is bad or key not found
403	account_inactive	Account has been deactivated
403	plan_not_supported	Bulk or crawl requires Pro, Growth, or Enterprise plan
403	forbidden	Bulk/crawl job belongs to another account
404	job_not_found	Bulk/crawl job not found or expired (6h TTL)
422	no_content	Could not identify main content on page
422	target_timeout	Target URL timed out
422	js_rendering_disabled	`render_js=true` on a server without it enabled
422	bulk_cap_exceeded	URL count exceeds plan limit (Pro: 50, Growth: 200)
422	crawl_depth_exceeded	Requested crawl depth exceeds plan limit (Pro: 5, Growth: 10)
422	webhook_url_invalid	`webhook_url` is not `https://` or resolves to a private/loopback host
429	rate_limit_too_fast	Per-second rate limit hit — Free 5/s · Pro 20/s · Growth 100/s · Enterprise unlimited. Retry-After-Ms header has the precise back-off window.
429	rate_limit_exceeded	Monthly plan limit reached (or batch would exceed it)
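
A client that talks to the API without the SDK might turn a 429 into a wait time along these lines. This is a sketch under assumptions: the helper name is hypothetical, the millisecond header is preferred when present because the table calls it the precise window, and a 1-second fallback is an arbitrary choice.

```python
from typing import Any, Mapping, Optional


def backoff_seconds(
    status: int, body: Mapping[str, Any], headers: Mapping[str, str]
) -> Optional[float]:
    """Return how long to wait before retrying a 429, or None for other statuses."""
    if status != 429:
        return None
    # Header names are case-insensitive in HTTP, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "retry-after-ms" in lowered:
        return float(lowered["retry-after-ms"]) / 1000.0
    retry_after = body.get("error", {}).get("retry_after")
    if retry_after is not None:
        return float(retry_after)
    return 1.0  # assumption: conservative default when no hint is given


error_body = {"error": {"code": "rate_limit_too_fast", "message": "Slow down", "retry_after": 1}}
print(backoff_seconds(429, error_body, {"Retry-After-Ms": "250"}))
print(backoff_seconds(429, error_body, {}))
```

For rate_limit_exceeded the monthly quota is the limit, so a short sleep is unlikely to help; checking error.code before retrying is a reasonable refinement.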

Rate Limits

Rate limits are based on your plan's monthly request allocation. The counter resets on your monthly billing anchor day at 00:00 UTC — paid users on the day-of-month they subscribed (or last changed their subscription), free-tier users on the 1st of each calendar month. Annual subscribers still get monthly resets — their billing period is yearly but the quota window is one month long.

Plan	Requests/mo	Overage	JS Rendering
Free	1,000	---	No
Pro	10,000	$0.0035/req	Yes
Growth	40,000	$0.0020/req	Yes
Enterprise	250,000	$0.0012/req	Yes
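
As a worked example of the table above, the sketch below estimates an overage charge. It assumes overage is billed linearly per request beyond the monthly allocation, which the table suggests but this page does not spell out; the function and dictionary names are illustrative.

```python
from decimal import Decimal
from typing import Dict, Optional, Tuple

# (included requests per month, overage rate in USD per request or None)
PLANS: Dict[str, Tuple[int, Optional[Decimal]]] = {
    "free": (1_000, None),
    "pro": (10_000, Decimal("0.0035")),
    "growth": (40_000, Decimal("0.0020")),
    "enterprise": (250_000, Decimal("0.0012")),
}


def overage_cost(plan: str, used: int) -> Optional[Decimal]:
    """Return the estimated overage in USD, or None if the plan has no overage."""
    included, rate = PLANS[plan]
    if rate is None:
        return None
    # Decimal keeps the per-request rates exact, unlike binary floats.
    return max(used - included, 0) * rate


print(overage_cost("pro", 12_000))     # 2,000 extra requests at $0.0035 each
print(overage_cost("growth", 39_000))  # under the allocation, so zero
```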

Code Examples

Basic extraction

python

# pip install wellmarked
from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    result = wm.extract("https://example.com/article")
    print(result.markdown)
    print(result.metadata.title, "by", result.metadata.author)

With JS rendering (Pro+)

python

from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    result = wm.extract("https://spa-app.com/page", render_js=True)
    print(result.markdown)

Bulk extraction with polling (Pro+)

python

# pip install wellmarked
from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    # 1. Submit the batch
    job = wm.bulk([
        "https://example.com/article-1",
        "https://example.com/article-2",
        "https://example.com/article-3",
    ])
    print(f"Queued {job.total} URLs as {job.job_id}")

    # 2. Block until the worker finishes (default: poll every 2s, 5-minute cap)
    job = wm.wait_for_job(job.job_id)

    # 3. Inspect results — each item has `markdown` or an `error` code.
    for item in job.results:
        if item.ok:
            print(f"  {item.url} -> {len(item.markdown)} chars")
        else:
            print(f"  {item.url} -> ERROR {item.error}")

Submit bulk with a webhook (Pro+)

python

from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    job = wm.bulk(
        ["https://example.com/a", "https://example.com/b"],
        webhook_url="https://yourapp.com/hooks/wm",
    )
    # First submission with a webhook_url ever: store the secret.
    # Subsequent submissions return None for this field.
    # save_to_env is a placeholder for your own secret storage, not an SDK function.
    if job.webhook_signing_secret is not None:
        save_to_env("WELLMARKED_WEBHOOK_SECRET", job.webhook_signing_secret)

Receive and verify a webhook delivery

python

# pip install wellmarked fastapi
import os

from fastapi import FastAPI, Request, Response
from wellmarked import verify_webhook, WebhookVerificationError, WellMarked

app = FastAPI()
SECRET = os.environ["WELLMARKED_WEBHOOK_SECRET"]
wm = WellMarked(api_key=os.environ["WELLMARKED_API_KEY"])

@app.post("/hooks/wm")
async def hook(request: Request):
    try:
        payload = verify_webhook(
            secret=SECRET,
            headers=request.headers,
            body=await request.body(),    # MUST be raw bytes
        )
    except WebhookVerificationError:
        return Response(status_code=401)

    # Default payload is "thin": metadata + results_url. Fetch results
    # with your normal API key against /bulk/{job_id} or /crawl/{job_id}.
    job = wm.get_job(payload["job_id"])
    for item in job.results:
        ...
    return Response(status_code=200)

Check usage

python

from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    u = wm.get_usage()
    print(f"{u.used}/{u.limit} ({u.remaining} left this period)")