OpenAI said on Monday it had given its artificial intelligence (AI) program ChatGPT the ability to speak and see to make it “more intuitive.”
The interface that popularized generative AI (that is, AI capable of producing text, images and other content on demand) will soon be able to process requests that include images and hold spoken conversations with its users.
In a statement, OpenAI gives the example of a user who photographs a monument and then “converses with ChatGPT” about the building’s history, or who shows the program the inside of their refrigerator so that it can suggest a recipe.
Other possible uses, according to the company, include helping children with homework, for example by sending the program a photo of a math problem, or asking the ‘chatbot’ to tell children a bedtime story.
These new tools will be rolled out over the next two weeks to subscribers of ChatGPT Plus, the paid version of the ‘chatbot’, and to organizations that are clients of the service.
The company announced these features in March, when it introduced GPT-4, the latest version of the language model that underpins ChatGPT.
GPT-4 is multimodal, in the sense that it can handle information beyond text and computer code.
The success of ChatGPT since late 2022 has sparked a race into generative AI among tech giants including Google and Microsoft.
But the rapid development of these still lightly regulated programs raises many concerns, especially because they tend to “hallucinate,” that is, to invent answers.
“Models with vision pose new challenges,” including hallucinations, “because people can trust the program’s interpretation of images” in areas where decisions have serious consequences, OpenAI admitted in its statement.
The company said it had “tested the model” on topics such as extremism and scientific knowledge, but added that it relies on real-world use and user feedback to improve it.
It has also limited ChatGPT’s ability to “analyze people,” since the interface “is not always accurate and these systems must respect people’s privacy.”
Source: TSF