Tiramisu

lurkers

lurkers gives agents a single front door to the web. Throw any URL at lurkers.fetch(...) and get back a typed Document with the content extracted, metadata parsed, and source-specific quirks handled.

Install

uv add lurkers

Quick start (Python)

import lurkers

doc = lurkers.fetch("https://example.com/article")
doc = lurkers.fetch("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
doc = lurkers.fetch("https://x.com/user/status/123")

# check how a URL routes, without fetching it
lurkers.detect_source_type("https://youtu.be/dQw4w9WgXcQ")  # → "youtube"

# RSS / Atom: fetches each entry; failed ones are skipped (skip_errors=False to raise)
docs = lurkers.feed("https://news.ycombinator.com/rss", limit=10)

# async siblings — afetch / afeed mirror fetch / feed
import asyncio
doc = asyncio.run(lurkers.afetch("https://example.com/article"))
docs = asyncio.run(lurkers.afeed("https://example.com/rss.xml"))
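
Because afetch is a coroutine, several URLs can be fetched concurrently rather than one after another. The sketch below is a minimal illustration and not part of lurkers: fetch_many, its max_concurrency parameter, and _fake_fetch are hypothetical names. The fetcher is passed in as an argument so the example runs offline; in real use you would pass lurkers.afetch, assuming it takes a single URL as in the quick start above.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def fetch_many(
    urls: list[str],
    fetch_one: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 5,
) -> list[Any]:
    """Fetch all URLs concurrently, at most max_concurrency at a time.

    Results come back in the same order as urls. A failed fetch is
    returned as its exception object instead of aborting the batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> Any:
        # The semaphore caps how many fetches are in flight at once.
        async with semaphore:
            return await fetch_one(url)

    return await asyncio.gather(
        *(guarded(u) for u in urls), return_exceptions=True
    )


async def _fake_fetch(url: str) -> str:
    # Hypothetical stand-in for lurkers.afetch so the sketch runs offline.
    await asyncio.sleep(0)
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return f"document for {url}"


results = asyncio.run(
    fetch_many(["https://a.example", "https://bad.example"], _fake_fetch)
)
```

Returning exceptions in place keeps one bad URL from cancelling the rest of the batch, which is similar in spirit to the skip-on-failure behaviour described for feed.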

Every fetch returns a Document:

class Document(BaseModel):
    source: str
    source_type: str                  # "html" | "youtube" | "twitter"
    title: str | None
    content: str                      # markdown / plain text
    fetched_at: datetime
    metadata: dict[str, Any]          # source-specific (video_id, author_handle, ...)
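
Since every source comes back in the same shape, downstream code can branch on source_type and read source-specific values from metadata. The following is a rough sketch under stated assumptions: DocumentStub is a local stand-in with the same fields as the model above (so the example runs without lurkers installed), describe is a hypothetical helper, and the video_id and author_handle keys come from the comment above but may not be present on every document.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class DocumentStub:
    """Stand-in with the same fields as the Document model above."""

    source: str
    source_type: str
    title: str | None
    content: str
    fetched_at: datetime
    metadata: dict[str, Any] = field(default_factory=dict)


def describe(doc: DocumentStub) -> str:
    """Build a one-line label, branching on source_type."""
    title = doc.title or "(untitled)"
    if doc.source_type == "youtube":
        # Assumed key; .get() guards against it being absent.
        return f"[video {doc.metadata.get('video_id', '?')}] {title}"
    if doc.source_type == "twitter":
        return f"[post by @{doc.metadata.get('author_handle', '?')}] {title}"
    return f"[page] {title} ({len(doc.content)} chars)"


video = DocumentStub(
    source="https://youtu.be/dQw4w9WgXcQ",
    source_type="youtube",
    title=None,
    content="",
    fetched_at=datetime.now(timezone.utc),
    metadata={"video_id": "dQw4w9WgXcQ"},
)
page = DocumentStub(
    source="https://example.com/article",
    source_type="html",
    title="Hello",
    content="abcde",
    fetched_at=datetime.now(timezone.utc),
)
labels = [describe(video), describe(page)]
```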

Quick start (CLI)

lurkers fetch <url>                   # JSON Document to stdout
lurkers fetch <url> --pretty          # indented JSON
lurkers feed <rss-url> --limit 10     # JSON array of Documents
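
Because the CLI writes JSON to stdout, its output can be consumed from any script. The sketch below shows one way to validate and parse a single Document payload in Python. parse_document_json is a hypothetical helper, the sample payload is hand-written rather than real lurkers output, and the ISO 8601 format for fetched_at is an assumption about how the field is serialized.

```python
import json
from datetime import datetime
from typing import Any

REQUIRED_FIELDS = {
    "source", "source_type", "title", "content", "fetched_at", "metadata",
}


def parse_document_json(raw: str) -> dict[str, Any]:
    """Parse one JSON Document and check that the expected fields exist."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Assumption: fetched_at is serialized as an ISO 8601 string.
    data["fetched_at"] = datetime.fromisoformat(data["fetched_at"])
    return data


# Hand-written sample payload, not real lurkers output.
sample = json.dumps(
    {
        "source": "https://example.com/article",
        "source_type": "html",
        "title": "Example",
        "content": "Hello, world.",
        "fetched_at": "2024-01-01T12:00:00+00:00",
        "metadata": {},
    }
)
parsed = parse_document_json(sample)
```

In real use the raw string would come from the CLI itself, for example via subprocess.run(["lurkers", "fetch", url], capture_output=True, text=True, check=True).stdout.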

Sources

| Source | Extractor | Auth |
| --- | --- | --- |
| HTML | trafilatura → markdown + metadata | none |
| YouTube | oEmbed (title / channel) + youtube-transcript-api | none |
| Twitter / X | fxtwitter public API | none |
| RSS / Atom | feed entries fed back through the unified fetch() | none |
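
The table implies a routing step: a URL is matched to a source type, and that type selects the extractor. The sketch below illustrates the idea with a plain hostname lookup. It is not the implementation behind lurkers.detect_source_type; guess_source_type and the host table are hypothetical, and the real library may recognize more hosts or use different rules.

```python
from urllib.parse import urlparse

# Hypothetical host table; anything not listed falls back to "html".
_HOST_TO_SOURCE_TYPE = {
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "twitter.com": "twitter",
    "x.com": "twitter",
}


def guess_source_type(url: str) -> str:
    """Map a URL to a source type using only its hostname."""
    host = (urlparse(url).hostname or "").lower()
    # Treat "www.youtube.com" and "youtube.com" as the same host.
    if host.startswith("www."):
        host = host[4:]
    return _HOST_TO_SOURCE_TYPE.get(host, "html")


routes = {
    url: guess_source_type(url)
    for url in (
        "https://example.com/article",
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        "https://youtu.be/dQw4w9WgXcQ",
        "https://x.com/user/status/123",
    )
}
```

Defaulting to "html" mirrors the role of the generic extractor in the table: any URL that no specific source claims is still handled as an ordinary web page.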

Source

github.com/tiramisu-sh/lurkers · Apache-2.0
