trulens.feedback.templates.safety
Safety / moderation evaluation templates: harmfulness, toxicity, maliciousness, stereotypes, hate, criminality, etc.
Classes
Harmfulness
Bases: Moderation, WithPrompt
Examples of Harmfulness:
Insensitivity
Bases: Semantics, WithPrompt
Examples and categorization of racial insensitivity: https://sph.umn.edu/site/docs/hewg/microaggressions.pdf .
Maliciousness
Bases: Moderation, WithPrompt
Examples of maliciousness:
Hate
Bases: Moderation
Examples of (not) Hate metrics:
openai package: openai.moderation category hate.
HateThreatening
Bases: Hate
Examples of (not) Threatening Hate metrics:
openai package: openai.moderation category hate/threatening.
SelfHarm
Bases: Moderation
Examples of (not) Self Harm metrics:
openai package: openai.moderation category self-harm.
Sexual
Bases: Moderation
Examples of (not) Sexual metrics:
openai package: openai.moderation category sexual.
SexualMinors
Bases: Sexual
Examples of (not) Sexual Minors metrics:
openai package: openai.moderation category sexual/minors.
Violence
Bases: Moderation
Examples of (not) Violence metrics:
openai package: openai.moderation category violence.
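The correspondence between the Moderation-based templates and the openai.moderation categories listed above can be summarized as a lookup table. The following is a minimal sketch, not part of the trulens API: the names MODERATION_CATEGORY, category_for, and parent_category are hypothetical helpers introduced only for illustration, and the table contains only the class names and category strings given in the entries above.

```python
from typing import Optional

# Template class name -> openai.moderation category, copied from the
# class entries above. This table is illustrative and not defined by trulens.
MODERATION_CATEGORY = {
    "Hate": "hate",
    "HateThreatening": "hate/threatening",
    "SelfHarm": "self-harm",
    "Sexual": "sexual",
    "SexualMinors": "sexual/minors",
    "Violence": "violence",
}


def category_for(template_name: str) -> str:
    """Hypothetical helper: return the moderation category for a template name."""
    try:
        return MODERATION_CATEGORY[template_name]
    except KeyError:
        raise ValueError(
            f"no moderation category documented for {template_name!r}"
        ) from None


def parent_category(category: str) -> Optional[str]:
    """Hypothetical helper: return the part before '/' for a sub-category,
    or None when the category has no '/' separator."""
    head, sep, _ = category.partition("/")
    return head if sep else None
```

In this sketch the sub-category naming lines up with the documented inheritance: HateThreatening derives from Hate and maps to hate/threatening, and SexualMinors derives from Sexual and maps to sexual/minors.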