How to Safeguard Your Models with DataRobot: A Comprehensive Guide


In today’s data-driven world, ensuring the security and privacy of machine learning models is a must-have, as neglecting these aspects can result in hefty fines, data breaches, ransoms to hacker groups and a significant loss of reputation among customers and partners.  DataRobot offers robust solutions to protect against the top 10 risks identified by The Open Worldwide Application Security Project (OWASP), including security and privacy vulnerabilities. Whether you’re working with custom models, using the DataRobot playground, or both, this 7-step safeguarding guide will walk you through how to set up an effective moderation system for your organization.

Step 1: Access the Moderation Library

Begin by opening DataRobot’s Guard Library, where you can select various guards to safeguard your models. These guards can help prevent several issues, such as:

  • Personal Identifiable Information (PII) leakage
  • Prompt injection
  • Harmful content
  • Hallucinations (using Rouge-1 and Faithfulness)
  • Discussion of competition
  • Unauthorized topics

Step 2: Utilize Custom and Advanced Guardrails

DataRobot not only comes equipped with built-in guards but also provides the flexibility to use any custom model as a guard, including large language models (LLM), binary, regression, and multi-class models. This allows you to tailor the moderation system to your specific needs. Additionally, you can employ state-of-the-art ‘NVIDIA NeMo’ input and output self-checking rails to ensure that models stay on topic, avoid blocked words, and handle conversations in a predefined manner. Whether you choose the robust built-in options or decide to integrate your own custom solutions, DataRobot supports your efforts to maintain high standards of security and efficiency.

Configure evaluation and moderation

Step 3: Configure Your Guards

Setting Up Evaluation Deployment Guard

  1. Choose the entity to apply it to (prompt or response).
  2. Deploy global models  from the DataRobot Registry or use your own.
  3. Set the moderation threshold to determine the strictness of the guard.
Example how to set threshold
Example how to set threshold
Example of response with PII moderation criteria > 0.8
Example of response with PII moderation criteria > 0.8
Example of response with PII moderation criteria > 0.5
Example of response with PII moderation criteria > 0.5

Configuring NeMo Guardrails

  1. Provide your OpenAI key.
  2. Use pre-uploaded files or customize them by adding blocked terms. Configure the system prompt to determine blocked or allowed topics, moderation criteria and more.
Configuring NeMo Guardrails

Step 4: Define Moderation Logic

Choose a moderation method:

  • Report: Track and notify admins if the moderation criteria are not met.
  • Block: Block the prompt or response if it fails to meet the criteria, displaying a custom message instead of the LLM response.
 Moderation Logic

By default, the moderation operates as follows:

  • First, prompts are evaluated using configured guards in parallel to reduce latency.
  • If a prompt fails the evaluation by any “blocking” guard, it is not sent to the LLM, reducing costs and enhancing security.
  • The prompts that passed the criteria are scored using LLM and then, responses are evaluated.
  • If the response fails, users see a predefined, customer-created message instead of the raw LLM response.
Evaluation and moderation lineage

Step 5: Test and Deploy

Before going live, thoroughly test the moderation logic. Once satisfied, register and deploy your model. You can then integrate it into various applications, such as a Q&A app, a custom app, or even a Slackbot, to see moderation in action.

Q&A app - DataRobot

Step 6: Monitor and Audit

Keep track of the moderation system’s performance with automatically generated custom metrics. These metrics provide insights into:

  • The number of prompts and responses blocked by each guard.
  • The latency of each moderation phase and guard.
  • The average scores for each guard and phase, such as faithfulness and toxicity.
LLM with Prompt Injection

Additionally, all moderated activities are logged, allowing you to audit app activity and the effectiveness of the moderation system.

Step 7: Implement a Human Feedback Loop

In addition to automated monitoring and logging, establishing a human feedback loop is crucial for refining the effectiveness of your moderation system. This step involves regularly reviewing the outcomes of the moderation process and the decisions made by automated guards. By incorporating feedback from users and administrators, you can continuously improve model accuracy and responsiveness. This human-in-the-loop approach ensures that the moderation system adapts to new challenges and evolves in line with user expectations and changing standards, further enhancing the reliability and trustworthiness of your AI applications.

from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0", custom_metric_id="65f17bdcd2d66683cdfc1113")

data = [{'value': 12, 'sample_size': 3, 'timestamp': '2024-03-15T18:00:00'},
        {'value': 11, 'sample_size': 5, 'timestamp': '2024-03-15T17:00:00'},
        {'value': 14, 'sample_size': 3, 'timestamp': '2024-03-15T16:00:00'}]

custom_metric.submit_values(data=data)

# data witch association IDs
data = [{'value': 15, 'sample_size': 2, 'timestamp': '2024-03-15T21:00:00', 'association_id': '65f44d04dbe192b552e752aa'},
        {'value': 13, 'sample_size': 6, 'timestamp': '2024-03-15T20:00:00', 'association_id': '65f44d04dbe192b552e753bb'},
        {'value': 17, 'sample_size': 2, 'timestamp': '2024-03-15T19:00:00', 'association_id': '65f44d04dbe192b552e754cc'}]

custom_metric.submit_values(data=data)

Final Takeaways

Safeguarding your models with DataRobot’s comprehensive moderation tools not only enhances security and privacy but also ensures your deployments operate smoothly and efficiently. By utilizing the advanced guards and customizability options offered, you can tailor your moderation system to meet specific needs and challenges. 

LLM with prompt injection and NeMo guardrails

Monitoring tools and detailed audits further empower you to maintain control over your application’s performance and user interactions. Ultimately, by integrating these robust moderation strategies, you’re not just protecting your models—you’re also upholding trust and integrity in your machine learning solutions, paving the way for safer, more reliable AI applications.

SPRING ‘24 LAUNCH EVENT

Confidently Deploy and Govern Generative AI Solutions


Watch on-demand

About the author

Aslihan Buner
Aslihan Buner

Senior Product Marketing Manager, AI Observability, DataRobot

Aslihan Buner is Senior Product Marketing Manager for AI Observability at DataRobot where she builds and executes go-to-market strategy for LLMOps and MLOps products. She partners with product management and development teams to identify key customer needs as strategically identifying and implementing messaging and positioning. Her passion is to target market gaps, address pain points in all verticals, and tie them to the solutions.


Meet Aslihan Buner


Kateryna Bozhenko
Kateryna Bozhenko

Product Manager, AI Production, DataRobot

Kateryna Bozhenko is a Product Manager for AI Production at DataRobot, with a broad experience in building AI solutions. With degrees in International Business and Healthcare Administration, she is passionated in helping users to make AI models work effectively to maximize ROI and experience true magic of innovation.


Meet Kateryna Bozhenko