AI Accuracy

The Need for Benchmarks to Advance AI-Enabled Player Risk Detection in Gambling

“90% accuracy?” No one knows whether the AI meant to prevent gambling addiction actually works

December 1, 2025

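Online gambling operators are racing to adopt artificial intelligence (AI) systems designed to prevent gambling addiction. But there is no way to verify whether these systems actually work, and that has sparked controversy. The U.S.…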

ThumbnailTruth: A Multi-Modal LLM Approach for Detecting Misleading YouTube Thumbnails Across Diverse Cultural Settings

Is the era of clickbait YouTube thumbnails ending? AI finds harmful thumbnails with 94% accuracy

September 10, 2025

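In a study on detecting fake thumbnails with artificial intelligence, Claude 3.5 Sonnet outperformed existing specialized systems. Here, a fake thumbnail is one that is exaggerated relative to the content or makes false promises…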

Students' Perceptions to a Large Language Model's Generated Feedback and Scores of Argumentation Essays

ChatGPT graded university students' assignments… Students ‘mostly satisfied’ with the AI's accurate conceptual feedback

August 22, 2025

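A research team led by Winter Allen of the Department of Physics and Astronomy at Purdue University in the United States ran an experiment using artificial intelligence in a physics class. In a first-year university physics course, scientific reasoning explanations written by students were handed to OpenAI's GPT-4o…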

Large language models without grounding recover non-sensorimotor but not sensorimotor features of human concepts

“Understands emotion, but not touch or smell”: study findings on how LLMs grasp the senses released

June 16, 2025

Even without sensorimotor experience, large language models (LLMs) such as GPT-4 and Gemini, when it comes to emotions and abstract concepts…

WEB-SHEPHERD: Advancing PRMs for Reinforcing Web Agents

A ‘smart robot’ that navigates websites like a human arrives… ‘WEB-SHEPHERD’, developed by Yonsei University

June 5, 2025

Accuracy 30 points higher than GPT-4o at one-tenth the cost. Researchers from Yonsei University and Carnegie Mellon University, in the field of web navigation, an innovative…

Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs

Telling AI to ‘explain briefly’ raises the wrong-answer rate by 20%… shocking study findings

May 12, 2025

One third of incidents in deployed AI applications are caused by hallucination… Experts…

Fact-checking with Generative AI: A Systematic Cross-Topic Examination of LLMs Capacity to Detect Veracity of Political Information

Accurate on COVID-19, weak on the economy? Comparing the fact-checking abilities of five major AI models

March 14, 2025

LLMs are strong at detecting false information, but their overall performance is…