
Intro

This document covers the technical, implementation, and usage aspects of the Samurai Predictive Customer Data Infrastructure (pCDI), which enables the collection, modelling, and activation of predicted user events and properties. Samurai pCDI is containerised and modular, and it is designed to give you full control over your data. It comprises several components and layers of data processing and activation: the modelling layer (where batch and real-time prediction occurs), the predictive event collector layer (which processes predicted events in real time), and the predictive event activation layer (which activates predicted events in real time).

The Samurai Predictive CDI is a work in progress and will continually evolve. As a result, this documentation may occasionally lag behind the platform's actual state. Rest assured, we will inform you of any critical updates or breaking changes that you need to be aware of.

Key Assumptions

Samurai pCDI is built on three guiding principles: a rich predictive event payload, combined batch and real-time predictions, and a unified predictive event training schema.

Rich Predictive Event Payload

Our predictive models generate events with identity and predictive properties tailored to your specific downstream (also known as data activation) destinations. For advertising platforms, this may include attribution cookies or user agents, which are critical for associating predictive events with actual users. For activating predictive events and properties in CRM systems such as Pipedrive, predictive properties might include expected close dates, time horizons, or predicted deal values. For example, in our Pipedrive AI Booster consulting package, which relies heavily on Samurai pCDI on the backend, we use properties like predictive close date, predictive deal value, or the best-matching sales rep. The main aim is to ensure downstream systems can process both spot and predicted events seamlessly.
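To make the idea of destination-specific payloads concrete, here is a minimal Python sketch. It is not the actual Samurai pCDI payload format: the `PredictedEvent` class, the helper functions, and every field name (`predicted_close_date`, `attribution_cookie`, and so on) are hypothetical and chosen only to mirror the properties mentioned above.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class PredictedEvent:
    """Hypothetical predicted-event payload; all field names are illustrative."""
    name: str
    # Identity properties tie the prediction to a real user or record downstream.
    identity: Dict[str, str] = field(default_factory=dict)
    # Predictive properties carry the model output the destination can act on.
    properties: Dict[str, Any] = field(default_factory=dict)


def crm_deal_event(deal_id: str, close_date: str, deal_value: float, sales_rep: str) -> PredictedEvent:
    """Build a payload shaped for a CRM destination (assumed property names)."""
    return PredictedEvent(
        name="predicted_deal_won",
        identity={"deal_id": deal_id},
        properties={
            "predicted_close_date": close_date,
            "predicted_deal_value": deal_value,
            "best_matching_sales_rep": sales_rep,
        },
    )


def ad_conversion_event(attribution_cookie: str, user_agent: str, propensity: float) -> PredictedEvent:
    """Build a payload shaped for an advertising destination (assumed property names)."""
    return PredictedEvent(
        name="predicted_conversion",
        identity={"attribution_cookie": attribution_cookie, "user_agent": user_agent},
        properties={"conversion_propensity": propensity},
    )


crm_payload = asdict(crm_deal_event("deal-42", "2025-01-31", 1200.0, "rep-7"))
ad_payload = asdict(ad_conversion_event("cookie-abc", "Mozilla/5.0", 0.83))
```

The same event container carries different identity and predictive properties depending on the destination, which is the point of a rich payload: the downstream system receives what it needs to match and act on the prediction.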

Combined Batch and Real-time Predictions

Our approach to modelling predictive user activities integrates both real-time and batch model predictions. This dual approach captures immediate and aggregate behavioural patterns.

Real-time predictions are invoked on demand to estimate predictive properties and events based on current user data and model state. This is crucial for destinations that require immediate ingestion of predictive events. For example, real-time predictions allow us to identify a user's propensity to convert during a specific session. If this information is sent downstream through the real-time predictive event collector and activation engine, it can enable a quick upsell campaign that maximises cart value per session while the user is still engaged.
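The session-level flow described above can be sketched as follows. This is a toy illustration under stated assumptions: the logistic score stands in for the real model, and the feature names, weights, threshold, and event shape are all hypothetical rather than taken from Samurai pCDI.

```python
import math
from typing import Dict, Optional


def session_propensity(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Toy logistic score standing in for the real-time model (assumption)."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def maybe_emit_conversion_event(
    session_id: str,
    features: Dict[str, float],
    weights: Dict[str, float],
    bias: float,
    threshold: float = 0.7,
) -> Optional[dict]:
    """Return a predicted event if the session scores above the threshold, else None."""
    propensity = session_propensity(features, weights, bias)
    if propensity < threshold:
        return None
    return {
        "event": "predicted_conversion",  # hypothetical event name
        "session_id": session_id,
        "propensity": round(propensity, 4),
    }


# Hypothetical weights and session features, for illustration only.
weights = {"pages_viewed": 0.3, "cart_items": 0.8}
engaged = maybe_emit_conversion_event("s-1", {"pages_viewed": 5, "cart_items": 2}, weights, bias=-2.0)
browsing = maybe_emit_conversion_event("s-2", {"pages_viewed": 1}, weights, bias=-2.0)
```

Only sessions that clear the threshold produce an event, so the collector and activation layers would receive predictions worth acting on during the session rather than a score for every page view.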

Batch predictions, on the other hand, process user activities in aggregate. This method uncovers behavioural patterns that may only become apparent after a significant number of users exhibit similar behaviours. By periodically re-learning from the entire database of user activities, our system identifies new, valuable patterns that may not be evident in real-time analysis. If new patterns are found, the system triggers new events, continuously refining the predictive data. For example, when modelling short-term SaaS churn, a specific activity pattern may correlate with elevated churn only after many users exhibit it. Once this happens, we can retroactively trigger the predicted churn event for all relevant user identities, even those whose activity patterns did not previously indicate churn. Armed with this data, you can contact your prospective churners with reactivation campaigns or special offers.
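A minimal sketch of the retroactive triggering described above, assuming a simple support-and-rate rule in place of the real batch model. The function names, thresholds, and event shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def risky_patterns(
    user_patterns: Dict[str, Set[str]],
    churned: Set[str],
    min_users: int = 3,
    min_churn_rate: float = 0.6,
) -> Set[str]:
    """Patterns seen in enough users, with a high enough share of churners."""
    seen: Dict[str, int] = defaultdict(int)
    lost: Dict[str, int] = defaultdict(int)
    for user, patterns in user_patterns.items():
        for pattern in patterns:
            seen[pattern] += 1
            if user in churned:
                lost[pattern] += 1
    return {
        p for p in seen
        if seen[p] >= min_users and lost[p] / seen[p] >= min_churn_rate
    }


def retroactive_churn_events(
    user_patterns: Dict[str, Set[str]],
    churned: Set[str],
    already_flagged: Set[str],
) -> List[dict]:
    """Emit predicted churn events for active users who match a risky pattern."""
    risky = risky_patterns(user_patterns, churned)
    events = []
    for user, patterns in sorted(user_patterns.items()):
        if user in churned or user in already_flagged:
            continue  # skip users who already churned or were already flagged
        matched = sorted(patterns & risky)
        if matched:
            events.append({"event": "predicted_churn", "user_id": user, "patterns": matched})
    return events


# Toy data: pattern "a" becomes significant only once enough users exhibit it.
history = {"u1": {"a"}, "u2": {"a"}, "u3": {"a", "b"}, "u4": {"a"}, "u5": {"b"}}
events = retroactive_churn_events(history, churned={"u1", "u2", "u3"}, already_flagged=set())
```

In this toy data, pattern `a` appears in four users, three of whom churned, so the remaining user `u4` is flagged retroactively. Pattern `b` has too few users to count, so `u5` is left alone until a later batch run provides more evidence.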

Unified Predictive Event Training Schema

The modelling layer of the Samurai Predictive Infrastructure relies on a model architecture trained on a unified predictive event data model. All events involved in the event modelling phase must conform to this schema. We have developed a standard event model adapter that converts events from any data source into this format, making them ready for processing by the modelling layer. The adapter filters all incoming events and stores them in a structured format in the warehouse before they are fed into the predictive model.
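The adapter step can be pictured roughly as below. This is a sketch, not the real Samurai adapter or schema: the unified field names (`user_id`, `event_name`, `timestamp`), the `field_map` argument, and the rule of dropping records with missing required fields are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Hypothetical required fields of the unified event shape.
UNIFIED_FIELDS = ("user_id", "event_name", "timestamp")


def adapt_event(raw: Dict[str, Any], field_map: Dict[str, str], source: str) -> Optional[Dict[str, Any]]:
    """Map one raw source record to the unified shape, or return None to filter it out."""
    unified: Dict[str, Any] = {"source": source}
    for target in UNIFIED_FIELDS:
        value = raw.get(field_map.get(target, target))
        if value is None:
            return None  # a required field is missing, so the record is dropped
        unified[target] = value
    # Everything not consumed by the mapping is kept as free-form properties.
    used = {field_map.get(target, target) for target in UNIFIED_FIELDS}
    unified["properties"] = {k: v for k, v in raw.items() if k not in used}
    return unified


def adapt_stream(records: List[Dict[str, Any]], field_map: Dict[str, str], source: str) -> List[Dict[str, Any]]:
    """Adapt a batch of records, keeping only those that conform."""
    adapted = (adapt_event(record, field_map, source) for record in records)
    return [event for event in adapted if event is not None]


# Hypothetical raw records from a CRM-like source; the second lacks a timestamp.
crm_map = {"user_id": "uid", "event_name": "action", "timestamp": "ts"}
raw_records = [
    {"uid": "u1", "action": "deal_updated", "ts": 1700000000, "stage": "won"},
    {"uid": "u2", "action": "deal_added"},
]
rows = adapt_stream(raw_records, crm_map, source="crm")
```

Supporting a new source then amounts to supplying a new `field_map`, while the modelling layer only ever sees records in the one unified shape.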

As a result, you can train the Samurai predictive model on any data source, as long as your event stream conforms to the modelling event schema. Certain data sources, such as the Pipedrive webhook event stream and the GA4 event stream, are finely adapted and thoroughly tested against the model, since the model was initially built for these sources. The system is nonetheless flexible enough to accommodate other event data sources. We are also continuously experimenting with new data sources to expand the unified predictive event modelling schema.

Getting Started

To get started, contact us for a risk-free pilot deployment of Samurai's Predictive Event model together with the Samurai Predictive Property model. Once both are set up, we will launch a pilot instance of your Samurai Predictive Event Collector and connect it to the models to process your rich predictive user events.

Authors

Samurai Predictive CDI was created by the team at Samurai. Its initial version was based on the Master's thesis of Maciej Miętek, the company's founder, titled "Predicting User Activities in Customer Data Infrastructures". The thesis was successfully defended and received the highest possible grade.