Citable summary
This page explains the source types, normalization rules, and quality controls behind crawling and collection sessions in pressor.ai.
Collection sources
Core inputs include news articles, public journalist pages, search results, monitored URLs, and user-defined keywords and domains.
Normalization rules
URLs, publication dates, titles, summaries, journalist names, and outlet names are normalized to reduce duplication and stabilize downstream analysis.
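As an illustration of the kind of normalization described above, here is a minimal Python sketch. It is not pressor.ai's actual implementation: the function names, the tracking-parameter list, the choice to strip a leading "www.", and the treatment of dates without a timezone as UTC are all assumptions made for this example.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameters to drop; the real list is not documented here.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def normalize_url(url: str) -> str:
    """Return a canonical form of a URL so equivalent links compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if host.startswith("www."):  # assumption: www and bare host are the same site
        host = host[4:]
    port = parts.port
    default_port = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not default_port:
        host = f"{host}:{port}"
    # Drop tracking parameters and sort the rest for a stable ordering.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_KEYS and not key.startswith(TRACKING_PREFIXES)
    )
    path = parts.path.rstrip("/") or "/"
    # The fragment is discarded because it does not identify a different page.
    return urlunsplit((scheme, host, path, urlencode(query), ""))


def normalize_text(value: str) -> str:
    """Collapse runs of whitespace in titles, summaries, and names."""
    return " ".join(value.split())


def normalize_date(value: str) -> str:
    """Convert an ISO 8601 timestamp to UTC; naive values are assumed to be UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()
```

With rules like these, two links to the same story that differ only in letter case, a trailing slash, or campaign parameters reduce to one key, which is what makes later duplicate detection reliable.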
Quality controls
Duplicate stories, journalist rows without a usable email address, invalid domains, and failed collection sessions are each tracked separately for review.
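The four review categories above could be sketched in Python as follows. This is a hypothetical example, not pressor.ai's code: the record fields ("url", "email", "status"), the bucket names, and the deliberately loose email and domain patterns are assumptions chosen for illustration, and story URLs are assumed to be normalized already.

```python
import re
from collections import defaultdict

# Loose, illustrative patterns; production checks would likely be stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def triage(stories, journalists, domains, sessions):
    """Sort problem records into separate review buckets."""
    review = defaultdict(list)

    # A story is a duplicate if its URL was already seen.
    seen_urls = set()
    for story in stories:
        if story["url"] in seen_urls:
            review["duplicate_stories"].append(story)
        else:
            seen_urls.add(story["url"])

    # A journalist row is flagged when the email is missing or malformed.
    for journalist in journalists:
        email = (journalist.get("email") or "").strip()
        if not EMAIL_RE.match(email):
            review["journalists_without_usable_email"].append(journalist)

    # User-defined domains are checked for basic hostname syntax only.
    for domain in domains:
        if not DOMAIN_RE.match(domain.strip().lower()):
            review["invalid_domains"].append(domain)

    # Sessions are flagged on an assumed "failed" status value.
    for session in sessions:
        if session.get("status") == "failed":
            review["failed_sessions"].append(session)

    return dict(review)
```

Keeping each category in its own bucket means a reviewer can, for example, retry failed sessions without wading through duplicate stories.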