Motivation
The Samurai Predictive Event Collector is a standalone data processing application. It is a pass-through layer with no storage of its own and serves as the data sink for the predictive event and property layers. It captures real-time and batch model predictions in a unified collection schema, formats and structures them as predictive events, and dispatches them in real time to the activation layer once they are clean and ready for ingestion.
The collector can halt events that do not meet the predictive event schema requirements, for example events that omit requested user identifiers that downstream activation processes need. This ensures that events are valid and compatible with their intended data destinations. However, the collector never stops an event based solely on the quality metrics or accuracy of the predictive model execution. As long as the prediction payload follows the predefined event schema, the collector passes it downstream and leaves it to the activation layer to determine whether the model meets the minimum performance and quality standards for the use case at hand.
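The gating rule described above can be illustrated with a minimal Python sketch. The field names (event_name, user_id, prediction, model_metrics) and the helper functions are hypothetical and chosen only for illustration; they are not the collector's actual schema or API. The sketch shows that the forwarding decision depends on structure alone and never reads the model metrics.

```python
from typing import Any, List, Mapping

# Hypothetical schema: these field names are illustrative assumptions,
# not the collector's real predictive event schema.
REQUIRED_FIELDS = ("event_name", "user_id", "prediction")


def schema_errors(event: Mapping[str, Any]) -> List[str]:
    """Return structural problems; an empty list means the event may proceed."""
    return [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if event.get(name) in (None, "")
    ]


def should_forward(event: Mapping[str, Any]) -> bool:
    """Gate on structure only; model quality metrics are deliberately ignored."""
    return not schema_errors(event)


# Structurally valid, but produced by a poorly performing model.
low_quality = {
    "event_name": "churn_score",
    "user_id": "u-123",
    "prediction": 0.91,
    "model_metrics": {"auc": 0.52},
}

# Produced by a strong model, but missing the user identifier.
missing_id = {
    "event_name": "churn_score",
    "prediction": 0.91,
    "model_metrics": {"auc": 0.97},
}

print(should_forward(low_quality))  # True
print(should_forward(missing_id))   # False
```

In this sketch the low-AUC event is forwarded and the high-AUC event without a user identifier is halted, which mirrors the division of responsibility between the collector and the activation layer.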
The rationale is to keep the collector from restricting the flow of event data: it processes every event that meets the minimum structural quality criteria. This approach allows you to warehouse all schema-valid predictive events, regardless of the predictive run's quality metrics, and so ensures comprehensive data coverage.