
super{set} Lessons: When inference meets engineering

Othmane Rifki
October 30, 2022

SuperSummit


super{set} companies have been very early adopters of data engineering: leveraging software engineering to power data science workflows and solve business problems. Many of us have been on the front lines of the emerging roles that sit between an organization’s traditional software engineers and data scientists – namely the data engineers who optimize the retrieval and use of data to power ML models, and the machine learning engineers who ensure a scalable and flexible environment for ML model pipelines.

At super{summit} 2022, I organized a session for data and ML engineers to come together and share lessons learned from their particular workflow and business situation.

Our conversation distilled 3 ways that data science can benefit from engineering workflows to deliver business value:

  1. Managing the complexity of machine learning lifecycles at scale
  2. Creating business value by seeing models through to deployment and beyond
  3. Preserving data privacy to build trust with consumers

Let’s review!


Managing machine learning lifecycles at scale

Data science teams focus on building models to help businesses solve problems.

For example: identifying hate speech with deep learning models. The performance of those models is assessed with labeled datasets originating from client traffic or other sources. All this is quite manageable at a small scale, when there are only a handful of models to serve and customers that can be counted on one hand.

When data science models scale, things start to break.

The average super{set} company deals with a large number of models that need to be deployed on behalf of multiple clients in a myriad of production environments. Understanding and managing these models and their dependencies at scale, while also mitigating risks that may arise from decision automation (decision-making without human intervention), becomes critical to the success of business operations. Simply put: dollars and livelihoods are on the line as startups scale into meaningful businesses.

Data + ML engineering to the rescue!

Data engineers and ML engineers work together to:

  • Optimize the retrieval of data needed to train models
  • Integrate machine learning models into an organization’s applications and systems
  • Ensure a scalable and flexible machine learning model pipeline from design to serving to monitoring
  • Build robust automation to ease the continuous delivery of model updates while maintaining high quality


Introducing MLOps

super{set} brings more than just in-house expertise to managing lifecycles at scale – a practice known as machine learning operations, or “MLOps.”

One super{set} company, MarkovML, is entirely dedicated to solving the problem of MLOps! MarkovML helps organizations gain visibility into their end-to-end machine learning workflows to reach their business objectives.

As the team from MarkovML shared, MLOps is more than just streamlining the process of deploying, monitoring, and maintaining ML models – it’s about improving the entire lifecycle by providing valuable insights around:

  • The performance of the model
  • The relevance of the data used for training
  • Connecting performance and relevance to the target business value

Once again, it all comes down to solving business problems.

The burdensome, error-prone manual processes of keeping track of an organization’s data and models can be eliminated by automating the end-to-end machine learning workflow, freeing data science teams to focus on extracting insights tied to business objectives. MarkovML’s products make this simpler.


Data governance and model measurement workflow in MarkovML.


Creating business value means seeing models through to deployment and beyond…

Creating business value doesn’t stop with model creation. Each super{set} company made clear that the smooth deployment of new models into production is key to maximizing the value of the product offering.

Ketch, a company that enables organizations to build trust with their consumers via privacy controls and governance for data, shared the importance of ensuring that models developed in isolation in a dev environment are prepared for the production environment. For instance, when a model is developed using Python libraries and production is based on a Java runtime environment, conversion is required.

Data scientists can be well-served by using a model format such as ONNX, which is an open format built specifically to represent machine learning models. Look for model formats that are widely used, have built-in optimizations, and support a variety of machine learning frameworks, operating systems, and hardware platforms.


Post-deployment testing strategies

Deploying models into production is far from the final step in providing business value. A deployed model can start degrading in quality because a static model cannot keep up with new trends – change is the only constant.

My company, Spectrum Labs, is dedicated to protecting users from disruptive behaviors and promoting healthy exchange. We run sanity checks on our models prior to deployment and monitor their performance in production to detect degradation, which may trigger retraining of the model on more representative data.

A typical machine learning project lifecycle at Spectrum Labs.


There are two approaches we at Spectrum Labs take to evaluate and monitor model performance post-deployment:

  1. Regression tests via ground truth evaluation.
    These tests pull in data from live traffic through the following deployment process:
    1. Before deployment, the new model must pass a set of carefully curated data that previous models passed.
    2. Some time after deployment, new data is pulled from live traffic and labeled to obtain ground truth, which is used to verify that the model is not degrading relative to the metrics registered during the training phase.

  2. Smoke tests via drift detection, where the distributions of production data are monitored to make sure they don’t diverge in a statistically significant way from those observed during the training, testing, and development phases.
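The drift-detection idea above can be sketched with a two-sample Kolmogorov-Smirnov test. The feature values and significance threshold below are made up for illustration; they are not Spectrum Labs' production setup.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical reference feature (e.g. a model confidence score) captured
# during the training/testing phase.
reference = rng.normal(loc=0.0, scale=1.0, size=5000)

# Two live-traffic windows: one matching the reference, one shifted.
live_stable = rng.normal(loc=0.0, scale=1.0, size=5000)
live_shifted = rng.normal(loc=0.8, scale=1.0, size=5000)

def has_drifted(reference, live, alpha=0.01):
    """Flag drift when a two-sample Kolmogorov-Smirnov test says the live
    window diverges from the reference in a statistically significant way."""
    _, p_value = ks_2samp(reference, live)
    return p_value < alpha

print(has_drifted(reference, live_stable))   # stable window: no drift expected
print(has_drifted(reference, live_shifted))  # shifted window: drift expected
```

A drift flag like this would feed the retraining trigger described above, rather than replace labeled ground-truth evaluation.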

Sometimes, this feedback loop from the production environment back to prototyping and development is not simply about quality assurance – it is central to the product value proposition itself and how it solves a business problem.

For instance: Sturdy unifies customer feedback from a variety of sources into one channel and uses machine learning to identify signals in the data that impact revenue retention. Through automation, the signals enable Sturdy’s customers to drive critical business processes and to act on customer feedback as soon as data is received.

Sturdy puts all your customer conversations and feedback into a single dashboard.


Preserving data privacy to build trust with consumers

Models depend on data. The quality of the data used has the biggest impact on the performance of a model. In many cases, the business value is derived from data that originates from people.

As data scientists, software engineers, data engineers, and ML engineers, we are not the true owners of this data – just custodians of data that is truly owned by others. In these cases, the management of data and its privacy requires a set of controls to ensure that organizations deliver on their responsibilities to all of their stakeholders.

Of chief concern is understanding the following:

  1. The provenance of data used in training
  2. How data was collected
  3. How data was treated for bias

Beyond the ethical use of data is the secure use of data – data must be kept off local desktops and managed in a secure and traceable manner, with all personal information strictly removed.
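As a minimal, hypothetical sketch of the "personal information strictly removed" requirement, the snippet below redacts obvious identifiers with regular expressions. Production privacy tooling is far more thorough than two patterns; this only illustrates the idea.

```python
import re

# Hypothetical minimal redaction pass: strip obvious identifiers before
# data leaves a controlled environment.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b(?:\+?\d{1,2}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")

def redact(text: str) -> str:
    """Replace emails and phone-like numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(redact("Contact jane.doe@example.com or 555-123-4567."))
# -> Contact [EMAIL] or [PHONE].
```

Pattern-based redaction catches only well-formed identifiers; names, addresses, and free-text references need dedicated PII-detection tooling.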

Conveniently, super{set} once again has in-house expertise: Ketch offers an infrastructure for data privacy, compliance, and security, and Habu offers a secure data collaboration platform (“data clean room”) with comprehensive analytics.

Habu’s data clean room software allows for privacy-safe collaboration across multiple clients’ first-party data, yielding valuable insights from aggregated data outputs.


Final thoughts

Data and ML engineering is an emerging field. As data scientists, software engineers, data engineers, and ML engineers, it’s always helpful to compare notes and get up to speed on the best practices that peers in other organizations are applying to their products.

Only at super{set} will you get a community of data practitioners that are not just leveraging data to solve business problems, but also building businesses to solve data problems!