OpenAI, the developer of GPT-4 and ChatGPT, has introduced a new method called ‘Rule-Based Rewards (RBR)’ for improving the efficiency and safety of language models. The key claim is that RBR can keep an AI system operating safely while relying far less on human data collection, using the AI itself to evaluate responses instead.
OpenAI shifts from the RLHF method to Rule-Based Rewards (RBR)
Until now, reinforcement learning from human feedback (RLHF) has typically been used to make sure that language models follow the provided instructions and stay safe. OpenAI’s research, however, presents RBRs as a more flexible alternative: RBRs use a set of well-defined rules to assess the model’s output and to steer its behavior so that it responds safely.
Previously, OpenAI relied on the RLHF method, in which reinforcement learning further trains language models based on human feedback; according to the research, RBR is more efficient and more versatile than RLHF alone at keeping models aligned with the given directions and safety requirements.
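As a rough illustration, an RBR-style setup can be thought of as adding a rule-derived score to the learned reward that RLHF-style training already uses. The sketch below shows how that combination might look under simple assumptions; the function names and the additive combination are illustrative, not OpenAI’s actual implementation.

```python
# Minimal sketch (assumed structure): a rule-based score is added to the
# learned reward-model score used in RLHF-style fine-tuning.
# All names here are illustrative, not OpenAI's implementation.

def rule_based_reward(prompt: str, response: str, rules) -> float:
    """Sum the scores of every rule that applies to this prompt/response pair."""
    return sum(rule.score(prompt, response) for rule in rules if rule.applies(prompt))

def total_reward(prompt: str, response: str, reward_model, rules) -> float:
    """Reward signal for the RL optimizer: learned preference score plus rule-based score."""
    return reward_model.score(prompt, response) + rule_based_reward(prompt, response, rules)
```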
The main reason for adopting the RBR method
RBR addresses the main drawbacks of human feedback, namely that it is costly, time-consuming, and prone to bias. In RBR, propositions such as ‘is judgmental,’ ‘does not include disallowed content,’ ‘follows the safety policies,’ and ‘includes a disclaimer’ are defined, and rules built from these propositions describe what a safe and appropriate AI reply looks like in each case, as sketched below.
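A minimal sketch of how such propositions could be expressed and scored follows. The proposition names, the keyword-based checks, and the weights are assumptions for illustration; in the actual approach a grader model would judge whether each proposition holds.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative sketch: each proposition is a yes/no statement about a response.
# Simple keyword checks stand in here for the grader model that would
# normally judge whether the proposition holds.

@dataclass
class Proposition:
    name: str
    check: Callable[[str, str], bool]  # (prompt, response) -> holds or not
    weight: float                      # positive = desired, negative = undesired

PROPOSITIONS = [
    Proposition("judgmental", lambda p, r: "you should be ashamed" in r.lower(), -1.0),
    Proposition("disallowed_content", lambda p, r: False, -2.0),  # placeholder check
    Proposition("includes_disclaimer", lambda p, r: "not a professional" in r.lower(), 0.5),
]

def rbr_score(prompt: str, response: str) -> float:
    """Sum the weights of all propositions that hold for this response."""
    return sum(p.weight for p in PROPOSITIONS if p.check(prompt, response))
```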
The desired model behavior when dealing with harmful or sensitive topics falls into three categories: Hard Refusal, Soft Refusal, and Comply. A Hard Refusal consists of a short apology followed by a statement that the model cannot carry out the request. A Soft Refusal may offer a different kind of answer, for example to a self-harm question, one that still includes an apology. Comply means the model answers the user’s request while still observing the relevant safety precautions, as illustrated in the sketch below.
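One way to picture these categories is as a mapping from each behavior type to the propositions an ideal completion should or should not satisfy. The category names come from the article; the specific propositions below are simplified assumptions.

```python
# Assumed, simplified mapping from behavior category to the propositions an
# ideal response should satisfy ("required") or avoid ("forbidden").

DESIRED_BEHAVIOR = {
    "hard_refusal": {
        "required": ["brief_apology", "states_inability_to_comply"],
        "forbidden": ["judgmental", "disallowed_content"],
    },
    "soft_refusal": {
        "required": ["apology", "empathetic_alternative_answer"],
        "forbidden": ["judgmental", "disallowed_content"],
    },
    "comply": {
        "required": ["answers_the_request", "follows_safety_precautions"],
        "forbidden": ["unnecessary_refusal"],
    },
}
```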
OpenAI said, ‘We plan to conduct further research to gain a more comprehensive understanding of the various RBR components, as well as human evaluation to validate the effectiveness of RBR in various applications, including in other areas beyond safety.’