Kyutai Labs on Wednesday unveiled Moshi AI, an artificial intelligence (AI) chatbot that responds verbally in real time. The French AI lab says that everything related to Moshi's audio language model was built from scratch.
It can also vary its pitch to convey emotion and respond in different tones and speaking styles. The model is openly available to the public free of charge.
At the moment, conversations with the model are capped at five minutes. Notably, OpenAI demonstrated similar speech capabilities at the launch of GPT-4o, but that voice feature has yet to be released.
Moshi to rival ChatGPT's AI features
Moshi's open-source approach could accelerate AI development and adoption, and pose a challenge to closed-source competitors such as the GPT-4 model.
A team of just eight people built it in the last six months, an impressive feat, and as developers get hold of Kyutai's technology, conversational AI may be on the verge of its next leap forward.
Moshi is an AI voice assistant built on the Helium 7B language model and designed to hold natural conversations with users, much like Amazon Alexa or Google Assistant.
It can also converse in different accents and offers 70 different emotional and speaking styles.
Kyutai's plans for Moshi AI
Kyutai's goal is for Moshi to hold interactive, realistic dialogues with users, much as Alexa or Google Assistant do, with the Helium 7B model powering those conversations.
In a demonstration video, Kyutai showed what Moshi can do. While walking through the application, the team conversed with Moshi, having it take on the role of a coach or a friend.