OpenAI has launched a new moderation model, omni-moderation-latest, on the Moderation API. Built on GPT-4o, the new model can classify both text and image inputs, and it outperforms the previous model across the board, especially for non-English content. OpenAI upgraded the Moderation API model to make content moderation more robust.
Like the prior version, the model uses OpenAI's GPT-based classifiers to determine whether content should be flagged in categories such as hate, violence, and self-harm, and it adds support for additional harm categories.
It also enables more granular moderation decisions: probability scores are calibrated to reflect the likelihood that content actually falls into the category in question. The new moderation model is available for free to all developers through the Moderation API.
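A basic call to the Moderation API is a single POST to the `/v1/moderations` endpoint with the model name and the input to classify. The sketch below uses only the Python standard library; the endpoint and request shape follow the public Moderation API documentation, while the helper names (`build_request`, `flagged_categories`) and the sample text are illustrative.

```python
import json
import os
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_request(text, api_key):
    """Build a Moderation API request for a single text input."""
    body = json.dumps(
        {"model": "omni-moderation-latest", "input": text}
    ).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def flagged_categories(result):
    """Return the category names flagged in one moderation result.

    `result` is one entry of the response's "results" list, which
    maps category names to booleans under "categories".
    """
    return sorted(name for name, hit in result["categories"].items() if hit)


if __name__ == "__main__":
    # Requires a real API key; network call is kept out of import time.
    req = build_request("some user-submitted text", os.environ["OPENAI_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    print(flagged_categories(data["results"][0]))
```

The response also carries an overall `flagged` boolean per result, so a simple integration can gate content on that field alone and only inspect per-category details when it is true.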
Major features of the upgraded Moderation API
- Multimodal harm classification across six categories: the new model can assess the risk that an image, on its own or in combination with text, is harmful. This is supported today for the following categories: violence (violence and violence/graphic), self-harm (self-harm, self-harm/intent, and self-harm/instructions), and sexual (sexual, but not sexual/minors). All other categories are text-only today, with broader multimodal support planned for the future.
- Two new text-only harm categories: the new model can detect harm in two categories not covered by prior models: illicit, which covers instructions or advice on how to commit wrongdoing (for example, "how to shoplift"), and illicit/violent, which covers the same but where violence is involved.
- More accurate scores, especially for non-English content: on an internal multimodal eval across a test set of 40 languages, the new model improved 42% over the previous model and performed better in 98% of the languages tested. Accuracy in low-resource languages such as Khmer and Swati improved by 70%, while Telugu, Bengali, and Marathi improved 6.4x, 5.6x, and 4.6x respectively.
- Whereas the previous model performed best in English, the new model's performance in Spanish, German, Italian, Polish, Vietnamese, Portuguese, French, Chinese, Indonesian, and English now exceeds the previous model's performance in English.
- Calibrated scores: the new model's scores more accurately represent the probability that a given piece of content violates the relevant policies, and they will stay better aligned with the scores of future moderation models.
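The features above can be sketched in code. For multimodal classification, the Moderation API accepts an array of input parts mixing text and image URLs; because scores are calibrated as probabilities, a per-category threshold can be applied directly to `category_scores`. The part shapes follow the public API documentation, but the helper names and the threshold values below are illustrative assumptions, not recommended settings.

```python
def build_multimodal_input(text, image_url):
    """Combine text and an image URL into one moderation input,
    using the array-of-parts shape the Moderation API accepts."""
    return [
        {"type": "text", "text": text},
        {"type": "image_url", "image_url": {"url": image_url}},
    ]


def over_threshold(result, thresholds):
    """Return category names whose calibrated score meets its cutoff.

    `result` is one entry of the response's "results" list;
    `thresholds` maps category names to probability cutoffs.
    Categories absent from the result default to a score of 0.0.
    """
    scores = result["category_scores"]
    return sorted(
        name
        for name, cutoff in thresholds.items()
        if scores.get(name, 0.0) >= cutoff
    )


# Illustrative thresholds, including the two new text-only categories.
EXAMPLE_THRESHOLDS = {
    "illicit": 0.5,
    "illicit/violent": 0.3,
    "violence": 0.5,
    "violence/graphic": 0.5,
}
```

Because the scores are calibrated, a cutoff of 0.5 roughly means "more likely than not to violate the policy", and stricter categories can be given lower cutoffs, as in the hypothetical `illicit/violent` entry above.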
AI content moderation tools help enforce platform policies and assist human moderators in maintaining the well-being of digital platforms. As with the previous model, the new moderation model is available free of charge to all developers through the Moderation API, with rate limits that depend on usage tier. To get started, see the Moderation API guide.