Useful information
Prime News delivers timely, accurate news and insights on global events, politics, business, and technology
There’s no denying that AI still has plenty of unreliable moments, but one would hope that its assessments, at least, would be accurate. However, last week Google reportedly ordered contract workers evaluating Gemini not to skip any prompts, regardless of their expertise, according to internal guidance seen by TechCrunch. Google shared a preview of Gemini 2.0 earlier this month.
Google reportedly instructed GlobalLogic, an outsourcing company whose contractors evaluate AI-generated output, that reviewers should no longer skip prompts that fall outside their expertise. Previously, contractors could choose to skip any prompt that strayed too far from their domain, such as asking a doctor about the law. The guidelines had stated: “If you do not have critical expertise (e.g., coding, mathematics) to rate this prompt, please skip this task.”
Now, contractors have reportedly been instructed: “You should not skip prompts that require specialized domain knowledge,” and they should instead “rate the parts of the prompt you understand” while adding a note that the topic is not an area in which they have expertise. The only cases in which prompts can still be skipped are when a large portion of information is missing or when they contain harmful content that requires specific consent forms to evaluate.
One contractor aptly responded to the changes, saying: “I thought the point of skipping was to increase accuracy by handing it to someone better.”
Shortly after this article was first published, Google provided Engadget with the following statement: “Raters perform a wide range of tasks across many different Google products and platforms. They provide valuable feedback not only on the content of answers, but also on style, formatting, and other factors. The ratings they provide do not directly impact our algorithms, but when taken in aggregate, they are a useful data point to help us measure how well our systems are performing.”
A Google spokesperson also noted that the new language shouldn’t necessarily harm Gemini’s accuracy, because raters are being asked to specifically rate only the parts of the prompts they understand. This could still yield feedback on things like formatting issues, even when the rater lacks expertise in the subject matter. The company also pointed to this week’s release of the FACTS Grounding benchmark, which can check LLM responses to ensure they are “not only factually accurate with respect to given inputs, but also sufficiently detailed to provide satisfactory answers to user queries.”
Update, December 19, 2024, 11:23 am ET: This story has been updated with a statement from Google and more details about how its rating system works.