
trulens.feedback.templates.safety


Safety / moderation evaluation templates: harmfulness, toxicity, maliciousness, stereotypes, hate, criminality, etc.

Classes

Harmfulness

Bases: Moderation, WithPrompt

Examples of Harmfulness:
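Classes based on WithPrompt carry an LLM prompt template that is filled with the text to evaluate. The real template text lives in this module and is not reproduced here; the sketch below uses a hypothetical placeholder prompt purely to show the render step.

```python
from string import Template

# Hypothetical template in the style of a WithPrompt feedback class.
# The actual wording in trulens.feedback.templates.safety differs.
HARMFULNESS_PROMPT = Template(
    "Rate the harmfulness of the following submission on a scale of 0 to 10, "
    "where 0 is completely harmless and 10 is maximally harmful.\n\n"
    "Submission: $submission\n\n"
    "Score:"
)

def render_prompt(submission: str) -> str:
    """Fill the prompt template with the text to be evaluated."""
    return HARMFULNESS_PROMPT.substitute(submission=submission)
```

The rendered string would then be sent to an LLM, whose numeric answer becomes the feedback score.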

Insensitivity

Bases: Semantics, WithPrompt

Examples and categorization of racial insensitivity: https://sph.umn.edu/site/docs/hewg/microaggressions.pdf.

Maliciousness

Bases: Moderation, WithPrompt

Examples of maliciousness:

Hate

Bases: Moderation

Example metrics for (the absence of) hate:

  • openai package: openai.moderation category hate.
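The moderation-backed classes invert a category score into a feedback score, so that 1.0 means no detected content of that category (hence the "(not)" phrasing). A minimal sketch, using a locally constructed dict that mimics the shape of a moderation response's category scores (the values here are made up):

```python
def moderation_feedback(category_scores: dict[str, float], category: str) -> float:
    """Invert a moderation category score into a 0-1 feedback score:
    1.0 means no detected content of the given category."""
    return 1.0 - category_scores[category]

# Made-up scores in the shape of an OpenAI moderation response's
# category_scores field.
scores = {"hate": 0.02, "hate/threatening": 0.001}
```

The same inversion applies to every moderation category listed below.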

HateThreatening

Bases: Hate

Example metrics for (the absence of) threatening hate:

  • openai package: openai.moderation category hate/threatening.

SelfHarm

Bases: Moderation

Example metrics for (the absence of) self-harm:

  • openai package: openai.moderation category self-harm.

Sexual

Bases: Moderation

Example metrics for (the absence of) sexual content:

  • openai package: openai.moderation category sexual.

SexualMinors

Bases: Sexual

Example metrics for (the absence of) sexual content involving minors:

  • openai package: openai.moderation category sexual/minors.

Violence

Bases: Moderation

Example metrics for (the absence of) violence:

  • openai package: openai.moderation category violence.

GraphicViolence

Bases: Violence

Example metrics for (the absence of) graphic violence:

  • openai package: openai.moderation category violence/graphic.
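The moderation-backed classes above each correspond to one OpenAI moderation category. Collected as a plain mapping, taken directly from the category names listed in this page:

```python
# Class name -> OpenAI moderation category, as listed in the class
# descriptions above.
MODERATION_CATEGORY = {
    "Hate": "hate",
    "HateThreatening": "hate/threatening",
    "SelfHarm": "self-harm",
    "Sexual": "sexual",
    "SexualMinors": "sexual/minors",
    "Violence": "violence",
    "GraphicViolence": "violence/graphic",
}

def category_for(class_name: str) -> str:
    """Look up the moderation category a feedback class evaluates."""
    return MODERATION_CATEGORY[class_name]
```

Note the subclass pairs (HateThreatening under Hate, SexualMinors under Sexual, GraphicViolence under Violence) mirror the slash-separated subcategories.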