Mercury · Cited Sitemap

Crawl · $0.01 / call · Live · x402 · API key

GET /buy/sitemap

What it does

Domain/URL → a SIGNED snapshot of the site's PUBLISHED sitemap. The route discovers the sitemap from the Sitemap: lines in robots.txt, falling back to /sitemap.xml; parses <urlset> and <sitemapindex> documents (following up to 5 child sitemaps); and returns a deduped, bounded (≤2000) URL inventory with lastmod/changefreq/priority where declared. The receipt signs the DECLARED URL list, which is deterministic: the same sitemap bytes yield a byte-identical list. The optional ?fetch=N (N ≤ 10) adds a HARD-BOUNDED same-domain liveness probe (title + status + bytes per URL); that probe is the ONLY non-deterministic part and is NOT covered by the signature. The crawl is SSRF-guarded and bounded on every axis.

The goal it serves: map a site's link graph, sitemap and AI-crawler permissions so an agent can plan a crawl — and prove what the site actually published at the time.
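
The discovery order described above (robots.txt Sitemap: lines first, then the /sitemap.xml fallback) can be sketched roughly as follows. This is an illustration only, not Mercury's implementation: the helper name `sitemapCandidates` and its return shape are assumptions made for the example, and the real route also applies SSRF and size guards that are not shown here.

```javascript
// Sketch only: mirrors the documented discovery order, not Mercury's actual code.
// `sitemapCandidates` is a hypothetical helper name.
function sitemapCandidates(origin, robotsTxt) {
  const declared = [];
  for (const rawLine of (robotsTxt || "").split(/\r?\n/)) {
    // Drop trailing comments, then match "Sitemap: <url>" case-insensitively.
    const line = rawLine.split("#")[0].trim();
    const match = /^sitemap:\s*(\S+)/i.exec(line);
    if (!match) continue;
    try {
      // Resolve against the origin so relative values still become absolute URLs.
      declared.push(new URL(match[1], origin).href);
    } catch {
      // Ignore lines whose value cannot be parsed as a URL.
    }
  }
  const unique = [...new Set(declared)];
  if (unique.length > 0) {
    return { discoveredVia: "robots.txt", candidates: unique };
  }
  // Fallback: the conventional path at the site root.
  return {
    discoveredVia: "sitemap.xml",
    candidates: [new URL("/sitemap.xml", origin).href],
  };
}
```

The `discoveredVia` values reuse the enum from the output schema, so the sketch lines up with what the route reports.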

Schemas & output preview

Input schema — the exact request shape the route validates.

json · input schema
{
  "type": "object",
  "properties": {
    "url": {
      "type": "string",
      "maxLength": 2048,
      "description": "a domain (example.com) or any URL on the site — only its origin is used"
    },
    "fetch": {
      "type": "string",
      "maxLength": 3,
      "description": "optional N (0–10): also shallow-fetch the first N same-domain sitemap URLs and report each one's live title + HTTP status + byte size (liveness sample). NOT covered by the signed receipt (it can change between calls). Default 0 = off."
    },
    "limit": {
      "type": "string",
      "maxLength": 4,
      "description": "optional cap on URLs returned (1–2000); default returns all up to 2000"
    }
  },
  "required": [
    "url"
  ],
  "additionalProperties": false
}
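
As a rough illustration of the input schema, the sketch below builds a request URL and rejects values outside the documented bounds before any paid call is made. The helper name `buildSitemapUrl` is hypothetical, and the host is the one used in this page's call examples; the route performs its own validation regardless.

```javascript
// Hypothetical helper: builds a /buy/sitemap URL that satisfies the input schema.
// The host is taken from the call examples on this page.
const SITEMAP_ROUTE = "https://network.mercury-hq.com/buy/sitemap";

function buildSitemapUrl({ url, fetch: probeCount, limit } = {}) {
  if (typeof url !== "string" || url.length === 0 || url.length > 2048) {
    throw new RangeError("url is required and must be at most 2048 characters");
  }
  // URLSearchParams percent-encodes the target URL for us.
  const params = new URLSearchParams({ url });
  if (probeCount !== undefined) {
    if (!Number.isInteger(probeCount) || probeCount < 0 || probeCount > 10) {
      throw new RangeError("fetch must be an integer from 0 to 10");
    }
    // The schema types query values as strings.
    params.set("fetch", String(probeCount));
  }
  if (limit !== undefined) {
    if (!Number.isInteger(limit) || limit < 1 || limit > 2000) {
      throw new RangeError("limit must be an integer from 1 to 2000");
    }
    params.set("limit", String(limit));
  }
  return `${SITEMAP_ROUTE}?${params}`;
}
```

Validating locally is a design choice rather than a requirement: it simply avoids spending a call on a request the route would reject.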

Output schema — the exact response shape the handler returns.

json · output schema
{
  "type": "object",
  "properties": {
    "ok": {
      "type": "boolean",
      "description": "true on success; false on an honest failure (still delivered)"
    },
    "url": {
      "type": "string",
      "description": "the sitemap URL that was fetched + parsed"
    },
    "status": {
      "type": "integer",
      "description": "upstream HTTP status of the sitemap fetch"
    },
    "data": {
      "type": "object",
      "description": "the structured sitemap snapshot (the product the buyer consumes)",
      "properties": {
        "origin": {
          "type": "string",
          "description": "scheme://host the sitemap was discovered for"
        },
        "sitemapUrl": {
          "type": "string",
          "description": "the sitemap that was parsed"
        },
        "discoveredVia": {
          "type": "string",
          "enum": [
            "robots.txt",
            "sitemap.xml",
            "input"
          ],
          "description": "how the sitemap was located (robots Sitemap: line / conventional path / direct input)"
        },
        "robotsSitemaps": {
          "type": "array",
          "description": "all Sitemap: URLs robots.txt declared (may be > the one parsed)",
          "items": {
            "type": "string"
          }
        },
        "kind": {
          "type": "string",
          "enum": [
            "urlset",
            "sitemapindex",
            "unknown"
          ],
          "description": "which sitemap schema was parsed"
        },
        "childSitemaps": {
          "type": "array",
          "description": "child sitemaps followed from a <sitemapindex> (bounded to 5)",
          "items": {
            "type": "string"
          }
        },
        "total": {
          "type": "integer",
          "description": "number of unique page URLs returned"
        },
        "truncated": {
          "type": "boolean",
          "description": "true if the site declared more URLs than the 2000 cap / requested limit"
        },
        "urls": {
          "type": "array",
          "description": "the published URL inventory (deduped, origin-then-loc sorted)",
          "items": {
            "type": "object",
            "properties": {
              "loc": {
                "type": "string",
                "description": "absolute page URL"
              },
              "lastmod": {
                "type": "string",
                "description": "declared last-modified (only if present)"
              },
              "changefreq": {
                "type": "string",
                "description": "declared change frequency (only if present)"
              },
              "priority": {
                "type": "string",
                "description": "declared crawl priority (only if present)"
              }
            }
          }
        },
        "probe": {
          "type": "array",
          "description": "OPTIONAL liveness sample (?fetch=N): first N same-domain URLs shallow-fetched. NOT covered by the signed receipt — live status can change between calls.",
          "items": {
            "type": "object",
            "properties": {
              "url": {
                "type": "string"
              },
              "ok": {
                "type": "boolean"
              },
              "status": {
                "type": "integer"
              },
              "title": {
                "type": "string"
              },
              "bytes": {
                "type": "integer"
              },
              "error": {
                "type": "string",
                "description": "present only when that URL failed"
              }
            }
          }
        }
      }
    },
    "text": {
      "type": "string",
      "description": "canonical newline string the signed receipt covers: one URL per line, origin-then-loc sorted (the DECLARED inventory only — never the live-probe sample, so it is reproducible)"
    },
    "contentType": {
      "type": "string"
    },
    "fetchedAt": {
      "type": "string",
      "description": "ISO8601 fetch time (in the signed payload)"
    },
    "error": {
      "type": "string",
      "description": "present only when ok:false"
    }
  },
  "required": [
    "ok",
    "url"
  ],
  "additionalProperties": false
}

Output preview — a real example response, shown free (you only pay when you call the route).

json · output preview
{
  "ok": true,
  "url": "https://www.iana.org/sitemap.xml",
  "status": 200,
  "data": {
    "origin": "https://www.iana.org",
    "sitemapUrl": "https://www.iana.org/sitemap.xml",
    "discoveredVia": "sitemap.xml",
    "robotsSitemaps": [],
    "kind": "urlset",
    "childSitemaps": [],
    "total": 2,
    "truncated": false,
    "urls": [
      {
        "loc": "https://www.iana.org/about",
        "lastmod": "2024-01-01"
      },
      {
        "loc": "https://www.iana.org/domains",
        "changefreq": "weekly",
        "priority": "0.8"
      }
    ],
    "probe": []
  },
  "text": "https://www.iana.org/about\nhttps://www.iana.org/domains",
  "contentType": "application/xml",
  "fetchedAt": "2026-06-04T00:00:00.000Z"
}
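
A consumer of this response might summarise it along these lines. The sketch assumes the shape shown in the preview above; the helper name `summarizeSnapshot` is hypothetical, and the consistency check assumes that `text` is exactly the `loc` values joined by newlines with no trailing newline, which matches the preview but is not otherwise guaranteed here.

```javascript
// Hypothetical helper: summarises a parsed /buy/sitemap response body.
function summarizeSnapshot(out) {
  if (!out.ok) {
    // Failures are still delivered as a body; `error` carries the reason.
    return { ok: false, error: out.error ?? "unknown error" };
  }
  const urls = out.data?.urls ?? [];
  const locs = urls.map((entry) => entry.loc);
  return {
    ok: true,
    total: urls.length,
    // True when the site declared more URLs than the cap or requested limit.
    truncated: Boolean(out.data?.truncated),
    // Assumption: `text` is the `loc` values joined by "\n", as in the preview.
    textMatchesInventory: out.text === locs.join("\n"),
    // lastmod is present only when the site declared it.
    dated: urls
      .filter((entry) => entry.lastmod !== undefined)
      .map((entry) => entry.loc),
  };
}
```

Checking `truncated` before planning a crawl matters because a truncated inventory covers only part of what the site declared.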

Pay & call

Your agent calls the route; the 402 challenge carries the exact price ($0.01, USDC on Base mainnet); the x402 client settles via the CDP facilitator and retries. No key, no signup.

agent.mjs · x402
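import { wrapFetchWithPayment } from "x402-fetch";
import { privateKeyToAccount } from "viem/accounts";
// Assumption: PRIVATE_KEY is a placeholder env var with the 0x-prefixed key of a wallet holding a little USDC on Base
const account = privateKeyToAccount(process.env.PRIVATE_KEY);
const pay = wrapFetchWithPayment(fetch, account); // answers the 402 challenge, settles, then retries
const res = await pay("https://network.mercury-hq.com/buy/sitemap?url=https://example.com");
const out = await res.json(); // the result + `attestation` (the signed receipt)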

Prepaid alternative — the same route accepts an API key:

bash · API key
# Same route, prepaid API-key rail (Bearer mk_live_…) — get a key at https://network.mercury-hq.com/developers
curl -H "Authorization: Bearer mk_live_YOURKEY" "https://network.mercury-hq.com/buy/sitemap?url=https://example.com"

Every paid call returns an EIP-191 signed receipt; verify it free at /x402/verify.

Verify the receipt

Recover the signer from the EIP-191 signature over sha256(content)‖url‖status‖fetchedAt‖nonce and confirm it equals the pinned attestation key 0xACB40253BD71Bb9a5d491b2c6EFF755F2A33Fc75 (published at /.well-known/mercury-attestation). The receipt verifies offline, indefinitely, with no callback to Mercury. Verification is always free: POST the receipt to /x402/verify or run ecrecover yourself.

Fact | Value
Attestation signer (pinned) | 0xACB40253BD71Bb9a5d491b2c6EFF755F2A33Fc75
Key published at | /.well-known/mercury-attestation
Live verifier (free) | /x402/verify
Settlement | real USDC on Base mainnet (eip155:8453) via CDP; auditable on BaseScan
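
The two verification paths could be wired up roughly as below. This is a sketch under stated assumptions: the helper names are hypothetical, the host is taken from the call examples on this page, and the request body for /x402/verify is assumed to be the receipt as JSON, since the exact body shape is not documented here. Recovering the signer itself is left to an EIP-191 library of your choice, because the byte-level serialisation of the signed fields is also not specified on this page.

```javascript
// Pinned attestation signer, as published on this page.
const PINNED_SIGNER = "0xACB40253BD71Bb9a5d491b2c6EFF755F2A33Fc75";
// Host taken from the call examples; path from the verification section.
const VERIFY_URL = "https://network.mercury-hq.com/x402/verify";

// Compare a recovered signer to the pinned key, ignoring EIP-55 checksum casing.
function isPinnedSigner(address) {
  return (
    typeof address === "string" &&
    address.toLowerCase() === PINNED_SIGNER.toLowerCase()
  );
}

// Assumption: the verifier accepts the receipt as a JSON POST body.
// This only prepares the request; it does not send it.
function buildVerifyRequest(receipt) {
  return {
    url: VERIFY_URL,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(receipt),
    },
  };
}
```

With a paid response in hand, the prepared request can be passed straight to `fetch(url, init)`, using the `attestation` field of the response as the receipt.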

More: all services · /catalog · the headline web-fetch · agent twin of this page: GET /university/docs/cited-sitemap?format=md