1  Real-time Machine Learning: What and Why

This chapter covers


Real-time ML describes systems that use predictions from ML models, where each prediction is produced on demand.

ML is …

Real-time vs Online Machine Learning

The correct term is real-time, which is why we’ll use it. Online means something else…

So real-time ML is

In this chapter, we will start off with examples to show the relevance of real-time ML up close. Then..

1.1 Examples of real-time ML systems

We will lead with three distinct examples to show how real-time ML is used in modern systems.

1.1.1 Fraud detection in Credit Card Transactions

Fraud is a massive problem in any system that serves customers, such as a digital bank. It’s hard to deal with because one wants to stop fraudsters without harming legitimate users, which means telling fraudulent actions apart from legitimate ones.

This is the ideal use case for real-time ML, because ML is very good at detecting patterns and because we need a quick prediction: imagine having to wait a full day whenever you made a banking transaction because the ML engine runs only once daily.
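To make “on demand” concrete, here is a minimal sketch of a transaction being scored at the moment it is attempted. The feature names, the weights, and the 0.5 threshold are all hypothetical and hand-picked for illustration; a real model would learn its weights from labeled transactions.

```python
import math
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float           # purchase amount in the account currency
    account_age_days: int   # days since the account was opened
    is_new_device: bool     # first time this device is seen for the account


# Hypothetical weights, hand-picked for illustration only.
WEIGHTS = {"log_amount": 0.8, "is_young_account": 1.5, "is_new_device": 1.2}
BIAS = -7.0


def fraud_score(tx: Transaction) -> float:
    """Return a score in (0, 1); higher means more likely fraudulent."""
    features = {
        "log_amount": math.log1p(tx.amount),
        "is_young_account": 1.0 if tx.account_age_days < 7 else 0.0,
        "is_new_device": 1.0 if tx.is_new_device else 0.0,
    }
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


def authorize(tx: Transaction, threshold: float = 0.5) -> bool:
    """Score the transaction while the customer waits, then decide."""
    return fraud_score(tx) < threshold
```

The point is the call path, not the model: `authorize` runs inside the request, so the decision reflects the transaction being made right now instead of a score computed hours earlier.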

Figure 1.1: diagram of a mobile banking app connected to a cloud where the backend is… .

1.1.2 E-commerce “You may also like..”

In e-commerce, cross-selling means offering related or complementary products.

Figure 1.2: image of a generic product page on an e-commerce site showing red basketball shoes, with a section called “you may also like” that contains green variations of the shoes, a basketball, and tickets to a nearby basketball game

In this case, real-time ML is suitable because very short-term data (such as the pages the customer has just viewed) helps ML models suggest more relevant products, which results in more sales and higher profits.

Figure 1.3: the browser calls a backend to retrieve a URL. While building the page HTML, the backend calls a model that predicts the best set of 5 items to show in the “you may also like..” section.
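The flow in Figure 1.3 can be sketched as a function the backend calls while it builds the page. The co-view table below is a hypothetical stand-in for a trained recommendation model, and the item names are made up; the only point is that the ranking depends on what the customer viewed seconds ago.

```python
from collections import Counter

# Hypothetical co-view counts: CO_VIEWS[a][b] is how often item b was viewed
# in sessions that also viewed item a. A real system would learn this from logs.
CO_VIEWS = {
    "red-basketball-shoes": Counter(
        {"green-basketball-shoes": 40, "basketball": 25, "game-tickets": 10}
    ),
    "basketball": Counter({"ball-pump": 30, "green-basketball-shoes": 5}),
}


def you_may_also_like(recent_views: list[str], k: int = 5) -> list[str]:
    """Rank candidate items using the pages viewed in the current session."""
    scores: Counter = Counter()
    for item in recent_views:
        scores.update(CO_VIEWS.get(item, {}))
    for item in recent_views:  # do not recommend what was just viewed
        scores.pop(item, None)
    return [item for item, _ in scores.most_common(k)]
```

Calling `you_may_also_like(["red-basketball-shoes", "basketball"])` ranks items using both pages of the current session, which a nightly batch job could not have known about.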

1.1.3 AI Assistant Completion

Figure 1.4: browser sends a prompt, the backend generates an adequate completion to the user’s prompt and sends back the original text + the completion

The three examples above vary in complexity, but they are all use cases of real-time ML: flows with non-deterministic behavior built into the system.

1.2 What is real-time ML

TODO

1.3 What is not real-time ML

  • real-time systems with rule-based decisioning, even if complex

    EXAMPLE a complex human-crafted rule-based if-else system to decide whether one should grant a loan

    “Akshually… ML models are also if/else constructs…” “Yes, but models are learned automatically from data, not from human expertise.”

  • real-time systems that use ML predictions calculated in batch

    EXAMPLE a system that pre-calculates the default risk for every customer in a bank daily. Then, when a customer tries to take out a loan, the system retrieves the last known score for that customer.
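The difference between the batch pattern in the last example and real-time scoring can be sketched in a few lines. `risk_model` is a hypothetical stand-in for a trained default-risk model, and the field names are assumptions made for illustration.

```python
from datetime import date


def risk_model(features: dict) -> float:
    """Hypothetical stand-in for a trained default-risk model."""
    return min(1.0, features["debt"] / max(features["income"], 1.0))


def nightly_batch_job(customers: dict, run_date: date) -> dict:
    """Batch pattern: score every customer once and store the result."""
    return {
        customer_id: {"score": risk_model(features), "as_of": run_date}
        for customer_id, features in customers.items()
    }


def batch_lookup(score_table: dict, customer_id: str) -> float:
    """Request time, batch pattern: only retrieve the last known score."""
    return score_table[customer_id]["score"]


def realtime_score(current_features: dict) -> float:
    """Request time, real-time pattern: run the model on current data."""
    return risk_model(current_features)
```

Both request-time functions answer quickly, but only `realtime_score` invokes the model on demand. If a customer's debt changes after the nightly run, `batch_lookup` keeps returning the stale score until the next run.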

1.4 How real-time ML systems differ from traditional systems

TODO split into subsections: failure modes, different skills, stochastic decisions, data collection concerns, different monitoring

Why do we need to learn about real-time ML as an independent discipline?

Well, that looks just like any other system to me. Isn’t this just software engineering? What’s so different about ML-enabled RT software?

  • ML-enabled systems have different failure modes and they can fail silently
    • a model that outputs garbage is arguably worse than software that is out of service, because the system keeps running and keeps making bad business decisions.
  • ML-enabled systems need a very different monitoring setup from regular software
  • The performance of ML-enabled systems decays over time
  • ML-enabled systems are non-deterministic and data-dependent by construction, so the usual testing strategies don’t work on their own

1.4.1 ML models give approximate responses only

Because ML models give only approximate responses, the use cases they suit are necessarily different from those that require precise, deterministic responses.

ML systems are not meant to get the right response every time; they are built so that, on average, they get things right much more often than they get them wrong. Therein lies the art of ML.

1.4.2 Skills needed to train and maintain models are different

The people who build and maintain ML models understand some things in detail, but usually lack software engineering and architecture instincts. This adds challenges that need to be addressed, such as enabling analytical and research work without jeopardizing production code, and weighing what is theoretically desirable against what is possible in practice.

1.4.3 Real-time ML systems fail silently

In traditional software engineering, you can verify that the system is working as intended if there are no errors at compile/build time and no exceptions at runtime.

You can usually define precisely the paths the system will take (even more so if you use strongly/statically typed languages) and thoroughly test (unit, integration), ahead of time, that all behavior is covered.

of course, bugs happen often, due to

But the key point is that traditional software (with no ML/stochastic components) is either working or it isn’t (exceptions and runtime errors blowing up). There is no ambiguity, no in-betweenness.

Real-time ML systems also suffer from these “explicit” failure modes; after all, they must still make network calls, run functions, read/write to disk, etc.

But in addition to all of these, there are many more ways an ML system may fail. Most of them are silent failures, in the sense that there is no “exception”, “runtime error” or “stacktrace” to debug.

  • operational drift and training-serving skew

  • feedback loops caused by model retraining

  • adversarial attacks
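As one illustration of why such failures are silent, the sketch below compares the distribution of a single feature at training time with its distribution at serving time, using the Population Stability Index (PSI). Nothing raises an exception when the feature drifts; you only notice if you measure it. The bin edges and the smoothing constant are assumptions for illustration, and the 0.2 alert level mentioned afterwards is a common rule of thumb, not a universal standard.

```python
import math
from bisect import bisect_right


def bin_fractions(values, edges, eps=1e-6):
    """Fraction of values falling in each bin defined by sorted inner edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    # eps avoids log(0) and division by zero when a bin is empty
    return [max(c / total, eps) for c in counts]


def psi(expected, actual, edges):
    """Population Stability Index between two samples of one feature."""
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A PSI of 0 means the two samples fall into the bins in identical proportions. A monitoring job could compute `psi(training_sample, last_hour_sample, edges)` on a schedule and alert when it exceeds a chosen level such as 0.2.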

1.4.4 Real-time ML systems need different observability, monitoring and alerting

not only monitoring

1.4.5 The research/production split

In traditional services, there is no research/production split.

In ML services, the same code that runs in production — the model, feature engineering, and data preparation — also needs to run at research time, usually inside Jupyter Notebooks and similar environments. This creates a problem: the same exact code must work at research time and at production time, but production code has much higher robustness requirements than research code. We don’t want or need to bring research-time dependencies into production, because that only adds to the risk of running these systems in production.
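One possible way to keep research and production on the same code is to put feature logic in a small module with no research-time dependencies, and import it from both sides. The sketch below assumes a hypothetical `features.py` module and made-up field names; it is an illustration of the idea, not a prescribed layout.

```python
# features.py: the single definition used by notebooks and by the service.
# It depends only on the standard library, so importing it in production
# does not pull in any research-time packages.
from datetime import datetime


def build_features(raw: dict, now: datetime) -> dict:
    """Turn a raw event into model features.

    `now` is passed in explicitly so that replaying historical events at
    research time yields the same features production computed back then.
    """
    opened = datetime.fromisoformat(raw["account_opened_at"])
    return {
        "amount": float(raw["amount"]),
        "account_age_days": (now - opened).days,
    }
```

A notebook would call `build_features` over a historical dataset, while the service would call it once per request; neither side keeps its own copy of the logic.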

1.4.6 ML systems need to be tested in the production environment

ML systems depend on data distributions. You cannot test them in a “staging” environment the same way you can a traditional system.

A traditional, deterministic system can be fully simulated with a few dummy cases in a staging environment. This is useful for very high-risk situations, when you want to simulate the execution of the flow to cover failure modes that may have eluded unit and integration tests.

A real-time ML model (or any model, for that matter) cannot be “tested” like this. The only way to “test” a real-time ML system is to deploy it in “shadow mode”: in the production environment, but without its output being used to make any real decision. This is an important pattern we cover in ?sec-ch-the-first-deployment.
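A minimal sketch of the shadow-mode idea follows. The function and parameter names are hypothetical: `current_policy` stands for whatever makes the real decision today, `shadow_model` is the candidate being evaluated, and `shadow_log` is any append-only store.

```python
import logging

logger = logging.getLogger("shadow")


def handle_request(features, current_policy, shadow_model, shadow_log):
    """Decide with the existing policy; run the new model on the side."""
    decision = current_policy(features)
    try:
        shadow_prediction = shadow_model(features)
        shadow_log.append(
            {"features": features, "decision": decision, "shadow": shadow_prediction}
        )
    except Exception:
        # A failing shadow model must never affect the real decision.
        logger.exception("shadow model failed")
    return decision
```

The logged pairs can later be compared against real outcomes, so the model is evaluated on the production data distribution before it is trusted with any decision.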

1.5 When to use real-time ML

LINK to chapter 2 checklist

careful not to cannibalize ch 02

Real-time ML is rarely the first solution to a given problem, so you should only consider it when you already have a working system that falls short of where it should be.

Assuming you already have an “ML-amenable” use case, it depends on where you are coming from.

IF YOU’RE COMING FROM BATCH SCORING

  • Real-time ML can be cheaper than batch scoring because you only score what you actually need. EXAMPLE scoring millions of customers for their loan default risk daily, when only a few hundred will actually take out a loan on a given day.
  • With real-time ML you can use information about what’s happening right now.

EXAMPLE recommending films
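The cost argument in the loan example above comes down to simple arithmetic. The two figures below are assumptions chosen to match “millions” and “a few hundred”; they are not measurements.

```python
customers = 2_000_000        # assumed: everyone is scored by the daily batch
applications_per_day = 300   # assumed: scores that are actually used per day

batch_scores_per_day = customers
realtime_scores_per_day = applications_per_day

used_fraction = applications_per_day / customers
ratio = batch_scores_per_day / realtime_scores_per_day

print(f"share of batch scores actually used: {used_fraction:.4%}")
print(f"batch computes {ratio:,.0f}x more scores than real-time")
```

This only counts the number of scores. The cost of producing each score is usually higher in a real-time setup, so the saving has to be weighed against the extra serving infrastructure.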

IF YOU’RE COMING FROM HUMAN-CRAFTED RULES

  • Real-time ML is usually more accurate than human-made rule-based systems. Most rule-based systems can at least be converted into a simple decision tree. EXAMPLE grant loan
  • Human-made rule-based systems are easy for adversaries to learn and to game.

EXAMPLE fraud detection from business rules (large-ticket purchases made by accounts that have just been created on an e-commerce website).
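To see why such a rule is easy to game, consider a minimal sketch of it. The thresholds (500 in the shop currency, one day) are hypothetical values picked for illustration.

```python
def rule_flags_fraud(amount: float, account_age_days: int) -> bool:
    """Hand-written rule: large purchase from a just-created account."""
    return amount >= 500.0 and account_age_days < 1
```

A fraudster who learns the rule by trial and error only has to stay just outside it: a purchase of 499.99 on a new account, or a purchase of 800 after waiting one day, both pass unflagged.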

1.6 When not to use real-time ML

should this be here or in chapter 2

  • It’s more expensive and riskier, and sometimes the use case doesn’t justify the cost (e.g. too few instances being scored…)

  • Sometimes the use case is not that well defined (who will use the scores, what the decision layer policy will look like, etc.), so a batch model is a good MVP until the use case is solid enough to justify a RT ML model.

  • Sometimes it just doesn’t make sense because the customer doesn’t need a real-time answer. E.g. a decision on whether or not to lend 1 billion dollars to a big company takes weeks, maybe months. It makes little sense to have this be a RT ML flow.

  • Modeling teams are sometimes far away from “engineering” teams (sometimes in different business units), so it’s easier to have a flow where datasets get scored and then “passed on” to the engineering teams from time to time.

    • Differences in mindsets, jargon, incentives, etc.
    • Modeling teams don’t always want to cede “power” to engineering teams.

1.6.1 Summary