The WellMarked API extracts the main content from any webpage and returns clean, structured Markdown. All API access is over HTTPS to api.wellmarked.io.
Base URL: https://api.wellmarked.io
Authenticate requests by including your API key in the Authorization header. Keys start with wm_.
Authorization: Bearer wm_your_api_key_here
You can get your API key from the account dashboard. Keep your key secret — never expose it in client-side code.
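As a minimal sketch of the header format above, the helper below reads the key from an environment variable and builds the Authorization header. The function name is illustrative and not part of the SDK, and the WELLMARKED_API_KEY variable name is simply the one used in the webhook sample elsewhere in this reference.

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header for a WellMarked request."""
    # Fall back to the environment so the key never has to live in source code.
    key = api_key or os.environ.get("WELLMARKED_API_KEY", "")
    if not key.startswith("wm_"):
        raise ValueError("WellMarked API keys start with 'wm_'")
    return {"Authorization": f"Bearer {key}"}
```

Checking the wm_ prefix locally catches a mis-pasted key before any request is sent.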
The /extract endpoint extracts clean Markdown from a single URL.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | required | The URL to extract content from |
| render_js | boolean | optional | Use Playwright for JS-rendered pages (Pro+) |
# pip install wellmarked
from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    result = wm.extract("https://example.com/article")
    print(result.markdown)
    print(result.metadata.title, "by", result.metadata.author)

{
  "markdown": "## Article Title\n\nClean paragraph text...",
  "metadata": {
    "title": "Article Title",
    "author": "Jane Smith",
    "date": "2026-05-01",
    "url": "https://example.com/article",
    "retrieved_at": "2026-05-16T12:34:56+00:00"
  },
  "request_id": "b3d2f1a0-..."
}

metadata.retrieved_at is the ISO 8601 timestamp at which WellMarked actually fetched the page, which makes it useful for cache-freshness decisions in downstream pipelines. It is distinct from metadata.date, which is the article's published date and is often null. It is returned on every extraction surface: single /extract, every /bulk item, and every /crawl page.
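A cache-freshness check on retrieved_at might look like the sketch below. The function names and the max_age policy are illustrative assumptions; the only thing taken from this reference is that timestamps appear both with a +00:00 offset and with a trailing Z.

```python
from datetime import datetime, timedelta, timezone


def parse_retrieved_at(value):
    """Parse an ISO 8601 timestamp, accepting both 'Z' and '+00:00' suffixes."""
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def is_stale(retrieved_at, max_age, now=None):
    """Return True when the page was fetched more than max_age ago."""
    now = now or datetime.now(timezone.utc)
    return now - parse_retrieved_at(retrieved_at) > max_age
```

A pipeline could call is_stale(result["metadata"]["retrieved_at"], timedelta(hours=6)) before deciding whether to re-extract a page.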
POST /bulk submits multiple URLs for concurrent extraction. Each URL in the request counts as one request toward your quota, and the whole batch is reserved atomically. Processing is asynchronous: you receive a job_id to poll for results, and jobs are retained for 6 hours.
| Plan | Bulk Access | Max URLs/Request |
|---|---|---|
| Free | Not available | --- |
| Pro | ✓ | 50 |
| Growth | ✓ | 200 |
| Enterprise | ✓ | Unlimited |
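When a URL list is longer than the plan's cap, one option is to split it client-side into several submissions. The sketch below assumes the lowercase plan names seen in the usage response ("pro") and treats Enterprise as a single uncapped batch; the helper name is hypothetical.

```python
# Max URLs per request, copied from the table above. None means no cap.
BULK_CAPS = {"pro": 50, "growth": 200, "enterprise": None}


def chunk_for_plan(urls, plan):
    """Split urls into lists no longer than the plan's per-request cap."""
    if plan not in BULK_CAPS:
        raise ValueError(f"bulk extraction is not available on plan {plan!r}")
    cap = BULK_CAPS[plan]
    if cap is None:
        return [list(urls)] if urls else []
    return [urls[i:i + cap] for i in range(0, len(urls), cap)]
```

Each returned list can then be submitted as its own bulk job; every URL still counts toward the monthly quota.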
| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | string[] | required | Array of URLs to extract (must contain at least one) |
| render_js | boolean | optional | Enable JS rendering for all URLs |
| webhook_url | string | optional | HTTPS URL we POST a signed job.completed notification to when the job finishes — no polling needed. See Webhooks. |
| webhook_include_results | boolean | optional | When true, inline the full results array in the webhook payload (capped at ~5 MB). Defaults to false (thin payload with results_url). |
from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    job = wm.bulk([
        "https://example.com/article-1",
        "https://example.com/article-2",
        "https://example.com/article-3",
    ])
    print(f"Queued job {job.job_id} with {job.total} URLs")

    job = wm.wait_for_job(job.job_id)  # blocks until status == "done"
    for item in job.results:
        if item.ok:
            print(item.metadata.title)
        else:
            print(f"{item.url} failed: {item.error}")

{
  "job_id": "d4e5f6a7-b8c9-4d0e-a1f2-3b4c5d6e7f8a",
  "status": "queued",
  "total": 3,
  "completed": 0,
  "results": [],
  "webhook_signing_secret": "whsec_..."
}

status is one of queued, processing, or done. Use the returned job_id with GET /bulk/{job_id} to poll results. Enterprise jobs are drained ahead of Pro, and Pro ahead of Free, when there's queue contention.
webhook_signing_secret is returned once — on the submission that first mints it for your account. Save it before discarding the response; subsequent submissions return null. If you lose it, mint a new one with POST /webhook/rotate.
GET /bulk/{job_id} returns the status and results of a bulk extraction job. It returns 404 job_not_found if the job ID is unknown or its 6-hour retention window has passed, and 403 forbidden if the job belongs to another account.
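For clients that do not use the SDK, the polling described above can be sketched as a plain loop. Here fetch_job is a caller-supplied, hypothetical callable that performs the GET and returns the decoded JSON body; the 2-second interval and 5-minute cap are assumptions chosen to match the defaults the SDK sample attributes to wait_for_job.

```python
import time


def poll_until_done(fetch_job, job_id, interval=2.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_job(job_id) until the job's status is "done".

    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] == "done":
            return job
        if clock() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job['status']} after {timeout}s"
            )
        sleep(interval)
```

A 404 or 403 from the endpoint should surface as an exception from fetch_job rather than be retried, since neither condition resolves by waiting.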
from wellmarked import WellMarked

job_id = "d4e5f6a7-b8c9-4d0e-a1f2-3b4c5d6e7f8a"  # the id returned when the job was submitted

with WellMarked(api_key="wm_...") as wm:
    # get_job is polymorphic: it works for both bulk and crawl ids.
    job = wm.get_job(job_id)
    print(f"{job.status}: {job.completed}/{job.total}")

    # Or block until done in one call (polls every 2s up to a 5-minute timeout):
    job = wm.wait_for_job(job_id)

{
  "job_id": "d4e5f6a7-b8c9-4d0e-a1f2-3b4c5d6e7f8a",
  "status": "done",
  "total": 3,
  "completed": 3,
  "results": [
    {
      "url": "https://example.com/article-1",
      "markdown": "## Article 1\n\nExtracted markdown...",
      "metadata": {
        "title": "Article 1",
        "author": null,
        "date": null,
        "url": "https://example.com/article-1",
        "retrieved_at": "2026-01-15T10:30:00.812Z"
      },
      "error": null
    },
    {
      "url": "https://example.com/article-2",
      "markdown": null,
      "metadata": null,
      "error": "target_timeout"
    }
  ],
  "created_at": "2026-01-15T10:29:58Z",
  "finished_at": "2026-01-15T10:30:02Z"
}

Per-URL failures appear in-band with markdown / metadata set to null and an error code. They do not fail the whole job.
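Because failures arrive in-band, a consumer typically has to separate them from the successes itself. The helper below is a small sketch over the decoded JSON shape shown above; its name is illustrative.

```python
def split_results(job):
    """Separate a finished job's results into successes and failures."""
    succeeded = [r for r in job["results"] if r["error"] is None]
    # Map each failed URL to its error code, e.g. "target_timeout".
    failed = {r["url"]: r["error"] for r in job["results"] if r["error"] is not None}
    return succeeded, failed
```

Whether to resubmit the failed URLs is a policy choice; a timeout may be worth retrying, while a page with no identifiable content probably is not.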
POST /crawl crawls a site starting from a root URL: it follows all same-site links breadth-first (BFS) and extracts Markdown from every page reached. It returns a queued job; poll GET /crawl/{job_id} for progress and results. "Same site" is the registered domain (eTLD+1) of the root URL.
| Plan | Crawl Access | Max Depth | Max Pages |
|---|---|---|---|
| Free | ✗ | — | — |
| Pro | ✓ | 5 | 2,000 |
| Growth | ✓ | 10 | 10,000 |
| Enterprise | ✓ | Unlimited | Unlimited |
Each successfully extracted page consumes one request from your monthly quota — failed pages (timeouts, robots-disallowed, no-content) are not billed. If you run out of quota mid-crawl, the job stops and returns what it has with truncated_reason: "quota_exhausted". Robots.txt for the root host is honoured for every page.
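To illustrate how a depth limit and a page cap interact in a breadth-first crawl, here is a sketch over an in-memory link graph. It is not WellMarked's crawler: the links mapping stands in for fetching and link extraction, and it is assumed to contain only same-site targets. The root is counted as depth 0, as in the crawl responses in this reference.

```python
from collections import deque


def bfs_crawl(root, links, max_depth, max_pages):
    """Breadth-first walk over a link graph with depth and page limits.

    Returns the visited (url, depth) pairs and a flag that is True when
    the page cap stopped the walk while pages were still queued.
    """
    seen = {root}
    queue = deque([(root, 0)])
    visited = []
    while queue:
        if len(visited) >= max_pages:
            return visited, True
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited, False
```

Breadth-first order means shallow pages are always reached before deeper ones, so a truncated crawl keeps the pages closest to the root.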
from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    job = wm.crawl("https://docs.example.com", depth=2)
    print(f"Queued crawl {job.job_id}")

    # wait_for_job is polymorphic: it works for both bulk and crawl ids.
    job = wm.wait_for_job(job.job_id)
    for page in job.results:
        if page.ok:
            print(f"depth={page.depth} {page.metadata.title}")

{
  "job_id": "9aaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
  "status": "queued",
  "total": 0,
  "completed": 0,
  "truncated": false,
  "truncated_reason": null,
  "results": [],
  "webhook_signing_secret": "whsec_..."
}

total starts at 0 and grows as the crawler discovers and processes pages; unlike /bulk, the page count isn't known up front. Use the returned job_id with GET /crawl/{job_id} to poll results, or pass webhook_url on submit and we'll POST you when it's done (see Webhooks). Same one-time webhook_signing_secret semantics as /bulk.
GET /crawl/{job_id} returns the status and results of a crawl job. It has the same retention (6 hours) and auth model as the bulk endpoint: jobs you don't own return 403 forbidden, and expired or unknown jobs return 404 job_not_found.
from wellmarked import WellMarked, CrawlJob

job_id = "9aaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"  # the id returned when the crawl was submitted

with WellMarked(api_key="wm_...") as wm:
    # get_job is polymorphic: it works for both bulk and crawl ids.
    job = wm.get_job(job_id)
    print(f"{job.status}: {job.completed} pages crawled")
    if isinstance(job, CrawlJob) and job.truncated:
        print(f"truncated: {job.truncated_reason}")

{
  "job_id": "9aaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
  "status": "done",
  "total": 47,
  "completed": 47,
  "truncated": false,
  "truncated_reason": null,
  "created_at": "2026-05-15T10:00:00Z",
  "finished_at": "2026-05-15T10:02:30Z",
  "results": [
    {
      "url": "https://docs.example.com",
      "depth": 0,
      "markdown": "## Welcome\n\n...",
      "metadata": { "title": "Welcome", "author": null, "date": null, "url": "https://docs.example.com", "retrieved_at": "2026-05-15T10:00:02Z" },
      "error": null
    },
    {
      "url": "https://docs.example.com/api",
      "depth": 1,
      "markdown": "## API Reference\n\n...",
      "metadata": { "title": "API Reference", "author": null, "date": null, "url": "https://docs.example.com/api", "retrieved_at": "2026-05-15T10:01:12Z" },
      "error": null
    }
  ]
}

truncated_reason is one of null (completed normally), "page_cap_reached" (hit your plan's page cap), or "quota_exhausted" (monthly quota ran out mid-crawl).
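A consumer of the decoded crawl response might summarise it as in the sketch below, which counts successful pages per depth and turns truncated_reason into a readable note. The helper name and the wording of the notes are illustrative.

```python
from collections import Counter


def summarize_crawl(job):
    """Return successful pages per depth and a note on how the crawl ended."""
    per_depth = Counter(
        page["depth"] for page in job["results"] if page["error"] is None
    )
    notes = {
        None: "crawl completed normally",
        "page_cap_reached": "stopped at the plan's page cap",
        "quota_exhausted": "stopped because the monthly quota ran out",
    }
    reason = job["truncated_reason"]
    # Fall back to a generic note if a reason not listed above ever appears.
    return dict(per_depth), notes.get(reason, f"unrecognised reason: {reason}")
```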
Pass webhook_url to POST /bulk or POST /crawl and we'll POST a signed job.completed notification the moment the job is done — no polling needed.
from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    job = wm.bulk(
        ["https://example.com/a", "https://example.com/b"],
        webhook_url="https://yourapp.com/hooks/wm",
    )
    # The first-ever submission with a webhook_url returns the secret: store it.
    # Subsequent submissions return None for this field.
    # save_to_env stands in for your own secret-storage helper.
    if job.webhook_signing_secret is not None:
        save_to_env("WELLMARKED_WEBHOOK_SECRET", job.webhook_signing_secret)

The very first submission with a webhook_url mints your account's HMAC signing secret and returns it as webhook_signing_secret on the response. It is visible only once, so save it before discarding the response. Subsequent submissions return null for the field. Lost it? POST /webhook/rotate.
Default ("thin") payload — metadata and a results_url you can GET with your normal API key:
{
  "event": "job.completed",
  "job_id": "1c4f9a02-b8c9-4d0e-a1f2-3b4c5d6e7f8a",
  "kind": "bulk",
  "status": "done",
  "total": 50,
  "completed": 48,
  "finished_at": "2026-05-27T14:02:14.812Z",
  "results_url": "https://api.wellmarked.io/bulk/1c4f9a02-b8c9-4d0e-a1f2-3b4c5d6e7f8a"
}

For crawl jobs ("kind": "crawl") the payload also carries truncated and truncated_reason. Pass webhook_include_results: true on submission to inline the full results array (capped at ~5 MB; over the cap the payload silently falls back to the thin shape with results_truncated_for_size: true).
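Since a payload can be thin, inlined, or a thin fallback after the size cap, a handler needs to decide where the results live. The sketch below assumes that inlined results arrive under a "results" key; that key name is not confirmed by this reference, so treat it as a placeholder.

```python
def results_location(payload):
    """Decide where a job.completed payload's results live.

    Returns ("inline", results) when the array is embedded in the payload,
    or ("fetch", results_url) when it must be retrieved with the API key.
    """
    # ASSUMPTION: inlined results are delivered under a "results" key.
    oversized = payload.get("results_truncated_for_size", False)
    if "results" in payload and not oversized:
        return "inline", payload["results"]
    return "fetch", payload["results_url"]
```

Handling the fallback explicitly matters because it is silent: a handler that expects inline results would otherwise see none.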
Each delivery carries these headers:
| Header | Purpose |
|---|---|
| X-WellMarked-Delivery-Id | UUID, stable across retries. Use as your idempotency key. |
| X-WellMarked-Timestamp | Unix seconds when this attempt was signed. Reject if drift exceeds 5 minutes. |
| X-WellMarked-Signature | v1,<base64 HMAC-SHA256> over `${delivery_id}.${timestamp}.${raw_body_bytes}`. Key is bytes.fromhex(secret.removeprefix("whsec_")). |
Both SDKs ship a verifier; use it rather than reimplementing HMAC by hand. The JavaScript verifier works in Node 18.17+, Cloudflare Workers, Deno, Bun, and modern browsers.
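The SDK verifier remains the recommended path. Purely to make the scheme in the header table concrete, here is a standard-library sketch of what such a check involves. It assumes the standard Base64 alphabet, a UTF-8 encoded `delivery_id.timestamp.` prefix followed by the raw body bytes, and a plain dict with exact-case header names; none of those details are confirmed beyond the table, and the function names are illustrative.

```python
import base64
import hashlib
import hmac
import time


def sign(secret, delivery_id, timestamp, raw_body):
    """Compute the v1 signature value for one delivery attempt."""
    key = bytes.fromhex(secret.removeprefix("whsec_"))  # needs Python 3.9+
    message = f"{delivery_id}.{timestamp}.".encode() + raw_body
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return "v1," + base64.b64encode(digest).decode()


def verify(secret, headers, raw_body, tolerance=300, now=None):
    """Return True when the signature matches and the timestamp is recent."""
    timestamp = headers["X-WellMarked-Timestamp"]
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance:
        return False  # more than 5 minutes of drift
    expected = sign(secret, headers["X-WellMarked-Delivery-Id"], timestamp, raw_body)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(
        expected.encode(), headers["X-WellMarked-Signature"].encode()
    )
```

The body must be the exact bytes received; re-serialising parsed JSON would change the bytes and break the comparison.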
# pip install wellmarked
import os

from fastapi import FastAPI, Request, Response
from wellmarked import verify_webhook, WebhookVerificationError, WellMarked

app = FastAPI()
SECRET = os.environ["WELLMARKED_WEBHOOK_SECRET"]
wm = WellMarked(api_key=os.environ["WELLMARKED_API_KEY"])

@app.post("/hooks/wm")
async def hook(request: Request):
    try:
        payload = verify_webhook(
            secret=SECRET,
            headers=request.headers,
            body=await request.body(),  # MUST be raw bytes
        )
    except WebhookVerificationError:
        return Response(status_code=401)

    # Default payload is "thin": metadata + results_url. Fetch results
    # with your normal API key against /bulk/{job_id} or /crawl/{job_id}.
    job = wm.get_job(payload["job_id"])
    for item in job.results:
        ...
    return Response(status_code=200)

Failed deliveries are retried on the schedule 30s → 5m → 30m → 2h → 12h → 24h, which makes seven attempts in total over ~38 hours. After that the delivery is dead-lettered.

X-WellMarked-Delivery-Id is stable across retries, so use it to deduplicate on your side. X-WellMarked-Timestamp and X-WellMarked-Signature are recomputed every attempt so the timestamp tolerance check stays valid.

GET /usage returns your usage for the current billing period. The counter resets on your monthly billing anchor day at 00:00 UTC: for paid users, that's whatever day of the month your subscription was created or last changed; for free-tier users without an active subscription, it's the 1st of each calendar month. Annual subscribers still get monthly resets (their billing period is yearly but quota is monthly). Bulk submissions count as len(urls) at submission time, regardless of how many individual extractions succeed.
from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    u = wm.get_usage()
    print(f"{u.used}/{u.limit} ({u.remaining} left this period)")

{
  "plan": "pro",
  "period": "2026-05-16",
  "used": 1042,
  "limit": 10000,
  "remaining": 8958
}

period is the start date of your current monthly window (YYYY-MM-DD) for paid subscribers, or the calendar month (YYYY-MM) for free-tier users. Calling GET /usage does not count toward your monthly quota.
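Because a bulk submission is counted as len(urls) up front, it can help to compare a batch against remaining before submitting. The sketch below works on the decoded usage response; the helper name is illustrative, and holding back the excess is only one possible policy.

```python
def split_by_quota(usage, urls):
    """Split urls into the part that fits the remaining quota and the rest."""
    room = max(usage["remaining"], 0)
    return urls[:room], urls[room:]
```

The first list can be submitted now, and the second held until the counter resets.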
Mint a new API key for the authenticated account. The previous key is invalidated immediately — the moment this call returns 200, the bearer token you sent stops working.
No request body. Recommended path for agents and automated clients that need to recover from a lost key without dropping into the cookie-authenticated dashboard. Does not count toward your monthly quota.
from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    rotated = wm.rotate_key()
    # The SDK automatically swaps to the new key for subsequent calls,
    # but persist this somewhere durable, because the previous key is dead.
    print("Store this now; it's the only time you'll see it:", rotated.api_key)

{
  "api_key": "wm_a1b2c3d4...x9y0z",
  "rotated_at": "2026-05-13T15:32:00.123456+00:00"
}

The raw api_key is returned once. Persist it before discarding the response; there is no recovery flow. If you lose a key, sign in to the dashboard to rotate again.
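One way to persist a freshly rotated key before the response is discarded is an atomic file write, sketched below with the standard library only. Storing the key in a local file is an assumption made for illustration; a secrets manager may suit production better, and the helper name is hypothetical.

```python
import os
import tempfile


def persist_secret(path, value):
    """Write value to path atomically, readable only by the current user."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with owner-only permissions on POSIX systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(value)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Writing to a temporary file and renaming it means a crash mid-write cannot leave a half-written key behind.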
POST /webhook/rotate mints a new webhook signing secret for the authenticated account. The previous secret is invalidated immediately, and any deliveries already in the retry queue will be signed with the new secret on their next attempt.
Use this when you've lost the secret returned in an earlier /bulk or /crawl response, or when you suspect compromise. No request body. Does not count toward your monthly quota.
from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    rotated = wm.rotate_webhook_secret()
    # Save this now; it's the only time you'll see it.
    # The previous secret stops working immediately. In-flight retries
    # are re-signed with the new secret on their next attempt.
    # save_to_env stands in for your own secret-storage helper.
    save_to_env("WELLMARKED_WEBHOOK_SECRET", rotated.webhook_signing_secret)

{
  "webhook_signing_secret": "whsec_...",
  "rotated_at": "2026-05-13T15:32:00.123456+00:00"
}

The raw secret is returned once. Persist it before discarding the response; recovery is only possible by rotating again. See Webhooks for the full signing scheme.
All error responses share the shape { "error": { "code": "...", "message": "...", "retry_after"?: number } }. retry_after (seconds) is present on 429.
| Status | Code | Meaning |
|---|---|---|
| 401 | missing_api_key | No `Authorization: Bearer ...` header sent |
| 401 | invalid_api_key | Key format is bad or key not found |
| 403 | account_inactive | Account has been deactivated |
| 403 | plan_not_supported | Bulk or crawl requires Pro, Growth, or Enterprise plan |
| 403 | forbidden | Bulk/crawl job belongs to another account |
| 404 | job_not_found | Bulk/crawl job not found or expired (6h TTL) |
| 422 | no_content | Could not identify main content on page |
| 422 | target_timeout | Target URL timed out |
| 422 | js_rendering_disabled | `render_js=true` on a server without it enabled |
| 422 | bulk_cap_exceeded | URL count exceeds plan limit (Pro: 50, Growth: 200) |
| 422 | crawl_depth_exceeded | Requested crawl depth exceeds plan limit (Pro: 5, Growth: 10) |
| 422 | webhook_url_invalid | `webhook_url` is not `https://` or resolves to a private/loopback host |
| 429 | rate_limit_too_fast | Per-second rate limit hit — Free 5/s · Pro 20/s · Growth 100/s · Enterprise unlimited. Retry-After-Ms header has the precise back-off window. |
| 429 | rate_limit_exceeded | Monthly plan limit reached (or batch would exceed it) |
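A client might turn the error envelope into a typed value and a retry decision, as in the sketch below. The retry policy is an assumption for illustration: it retries only the per-second limit and target timeouts, and treats everything else, including the monthly limit, as not worth an immediate retry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    status: int
    code: str
    message: str
    retry_after: Optional[float] = None


def parse_error(status, body):
    """Turn a decoded error response body into an ApiError."""
    err = body["error"]
    return ApiError(status, err["code"], err["message"], err.get("retry_after"))


def retry_delay(error, default=1.0):
    """Seconds to wait before retrying, or None when a retry will not help."""
    # ASSUMPTION: only these two codes are treated as transient.
    if error.code == "rate_limit_too_fast":
        return error.retry_after if error.retry_after is not None else default
    if error.code == "target_timeout":
        return default
    return None
```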
Rate limits are based on your plan's monthly request allocation. The counter resets on your monthly billing anchor day at 00:00 UTC — paid users on the day-of-month they subscribed (or last changed their subscription), free-tier users on the 1st of each calendar month. Annual subscribers still get monthly resets — their billing period is yearly but the quota window is one month long.
| Plan | Requests/mo | Overage | JS Rendering |
|---|---|---|---|
| Free | 1,000 | --- | No |
| Pro | 10,000 | $0.0035/req | Yes |
| Growth | 40,000 | $0.0020/req | Yes |
| Enterprise | 250,000 | $0.0012/req | Yes |
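For budgeting, the table's figures can be turned into a rough overage estimate. The sketch assumes overage is charged per request beyond the monthly allocation at the listed rate; this reference does not spell out the billing mechanics, so treat the result as an estimate only. Decimal is used to avoid floating-point rounding on currency.

```python
from decimal import Decimal

# (included requests per month, overage rate in dollars), from the table above.
PLANS = {
    "pro": (10_000, Decimal("0.0035")),
    "growth": (40_000, Decimal("0.0020")),
    "enterprise": (250_000, Decimal("0.0012")),
}


def overage_cost(plan, requests_used):
    """Estimated dollar cost of requests beyond the plan's allocation."""
    included, rate = PLANS[plan]
    return max(requests_used - included, 0) * rate
```

Under that assumption, a Pro account that makes 12,000 requests in a month is 2,000 over its allocation, which comes to $7.00.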
To extract a JavaScript-rendered page, pass render_js=True (Pro and above):

from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    result = wm.extract("https://spa-app.com/page", render_js=True)
    print(result.markdown)