1  Real-time Machine Learning

TODO bullets

This book contains practical advice on how to use ML models together with real-time systems. In simple terms, this means connecting a previously trained ML model into regular software and performing inference in real time. See the two examples below.

Figure 1.1: example-real-time-credit-underwriting
Figure 1.2: example-chatgpt
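As a concrete sketch of the credit-underwriting case, the "regular software" below wraps a previously trained model and scores each request as it arrives. Everything here is hypothetical: the hardcoded coefficients stand in for a real trained model (which would normally be loaded from disk, e.g. with joblib or an ONNX runtime), and the feature names and 0.5 threshold are made up for illustration.

```python
import math

# Stand-in for a previously trained logistic model. In a real system these
# weights would come from a serialized model artifact, not source code.
WEIGHTS = {"income": -0.8, "debt_ratio": 1.5}  # hypothetical coefficients
BIAS = -0.5

def predict_default_probability(features: dict) -> float:
    """Run inference on a single applicant's features."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic link

def handle_credit_request(features: dict) -> str:
    """The 'regular software' part: turn a score into a decision in real time."""
    p = predict_default_probability(features)
    return "decline" if p > 0.5 else "approve"

print(handle_credit_request({"income": 1.0, "debt_ratio": 0.1}))  # → approve
```

The point of the sketch is the shape of the flow, not the model: the request handler blocks on a single inference call and must answer within the caller's latency budget.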

The correct term is real-time, which is why we’ll use it. Online means something else…

It could well be that in the future most systems will be RTML systems, but there will always be use cases that cannot be driven by ML because they don’t support probabilistic decisioning. This is in fact a major reason why RTML projects fail, and we’ll explain it in more detail in the next chapter.

link text in this chapter, we will…..

1.1 Traditional vs ML-enabled systems

Well, that looks just like any other system to me. Isn’t this just software engineering? What’s so different about ML-enabled RT software?

  • ML-enabled systems have different failure modes, and they can fail silently.
    • A model outputting garbage is arguably worse than software that’s out of service, because it will keep making bad business decisions.
  • ML-enabled systems need a very different monitoring setup from regular software.
  • The performance of ML-enabled systems decays over time.
  • ML-enabled systems are non-deterministic and data-dependent by construction, so the usual testing strategies don’t work.
  • The skills involved in training and maintaining an RT model are different from those in traditional software teams (math, statistics, etc.).
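To make the monitoring and silent-failure points concrete, one common approach is to compare the distribution of recent production scores against a baseline sample from training time, e.g. with a Population Stability Index (PSI). A minimal sketch follows; the bin count and the 0.1/0.25 thresholds are conventional rules of thumb, not prescriptions, and the sample data is fabricated.

```python
import math

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between baseline scores and recent
    production scores. Larger values mean a bigger distribution shift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values equal

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # add-one smoothing so empty bins don't blow up the log below
        return [(c + 1) / (len(xs) + bins) for c in counts]

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]        # scores seen at training time
drifted = [0.8 + i / 500 for i in range(100)]   # production scores bunched high
print(psi(baseline, baseline) < 0.1)   # stable: no alert
print(psi(baseline, drifted) > 0.25)   # large shift: alert-worthy
```

Note what this catches that uptime monitoring never will: the service stays "healthy" while its scores quietly drift away from what the model was trained on.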

TODO figure out how to add a link to deterministic heuristics here and how they are sometimes the signal

Understood, but where does RT enter the picture? Aren’t all applications of ML like this? No. See next.

1.2 Real-time vs Batch ML

explain the usual RTML flow

Well, isn’t all production ML like this? How else is it done?

  • “Ad-hoc” ML operation: input datasets are built and scored when needed, and the output is then manually sent to whichever team will use it to make decisions.

  • Batch production ML: input datasets are automatically built and scored every day, week, or month.
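To contrast with the real-time flow, a batch production job looks roughly like this: a scheduled run scores an entire input dataset at once and writes a file for downstream teams to consume. The CSV columns and the scoring rule below are made up for illustration; in practice `score()` would call the trained model’s predict function.

```python
import csv
import io

# Stand-in scoring function (in practice, the trained model's predict()).
def score(row: dict) -> float:
    return min(1.0, float(row["debt_ratio"]))  # made-up rule for illustration

def batch_score(input_csv: str) -> str:
    """Batch production ML: score every row in one scheduled run
    (daily/weekly/monthly) and emit a file for downstream teams."""
    reader = csv.DictReader(io.StringIO(input_csv))
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["customer_id", "score"])
    for row in reader:
        writer.writerow([row["customer_id"], f"{score(row):.2f}"])
    return out.getvalue()

print(batch_score("customer_id,debt_ratio\nA,0.30\nB,1.70\n"))
```

The defining difference from the request handler earlier in the chapter: nobody is waiting on the other end of a request, so latency per row is irrelevant and the whole dataset can be scored in one pass.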

Well, why doesn’t everyone just use real-time ML then? It looks much better.

  • It’s more expensive and riskier, and sometimes the use case doesn’t justify the cost (e.g. too few instances being scored…)

  • Sometimes the use case is not that well defined (who will use the scores, what the decision-layer policy will look like, etc.), so a batch model is a good MVP until the use case is solid enough to justify an RT ML model.

  • Sometimes it just doesn’t make sense because the customer doesn’t need a real-time answer. E.g. a decision on whether or not to lend 1 billion dollars to a big company takes weeks, maybe months, to make; it makes little sense for this to be an RT ML flow.

  • Modeling teams are sometimes far away from “engineering” teams (sometimes in different business units), so it’s easier to have a flow where datasets get scored and then “passed on” to the engineering teams from time to time.

    • Differences in mindsets, jargon, incentives, etc.
    • Modeling teams don’t always want to cede “power” to engineering teams.

1.2.1 Summary