A selection of the most important recent news, articles, and papers about AI.
News, Articles, and Analyses
AI and Games Conference
https://www.aiandgamesconference.com/
(Sunday, July 28, 2024) “Bringing together the leading experts in AI for the video games industry.”
GenAI Can’t Scale Without Responsible AI
https://www.bcg.com/ja-jp/publications/2024/genai-cant-scale-without-responsible-ai
Authors: Eric Jesse; Vanessa Lyon; Maria Gomez; and Krupa Narayana Swamy
(Wednesday, August 14, 2024) “GenAI agents need to handle tasks responsibly, accurately, and swiftly in multiple languages, addressing potentially millions of specifications across hundreds of thousands of products.”
A.I. Is Helping to Launch New Businesses – The New York Times
https://www.nytimes.com/2024/08/18/business/economy/ai-business-startups.html
(Sunday, August 18, 2024) “Entrepreneurs say use of artificial intelligence for a variety of tasks is accelerating the path to hiring and, ideally, profitability.”
AMD to Significantly Expand Data Center AI Systems Capabilities with Acquisition of Hyperscale Solutions Provider ZT Systems
(Monday, August 19, 2024) “AMD (NASDAQ: AMD) today announced the signing of a definitive agreement to acquire ZT Systems, a leading provider of AI infrastructure for the world’s largest hyperscale computing companies. The strategic transaction marks the next major step in AMD’s AI strategy to deliver leadership AI training and inferencing solutions based on innovating across silicon, software and systems.”
Embracing Gen AI at Work
https://hbr.org/2024/09/embracing-gen-ai-at-work
Authors: H. James Wilson and Paul R. Daugherty
(Sunday, September 01, 2024) “Today artificial intelligence can be harnessed by nearly anyone, using commands in everyday language instead of code. Soon it will transform more than 40% of all work activity, according to the authors’ research. In this new era of collaboration between humans and machines, the ability to leverage AI effectively will be critical to your professional success. This article describes the three kinds of ‘fusion skills’ you need to get the best results from gen AI. Intelligent interrogation involves instructing large language models to perform in ways that generate better outcomes—by, say, breaking processes down into steps or visualizing multiple potential paths to a solution. Judgment integration is about incorporating expert and ethical human discernment to make AI’s output more trustworthy, reliable, and accurate. It entails augmenting a model’s training sources with authoritative knowledge bases when necessary, keeping biases out of prompts, ensuring the privacy of any data used by the models, and scrutinizing suspect output. With reciprocal apprenticing, you tailor gen AI to your company’s specific business context by including rich organizational data and know-how into the commands you give it. As you become better at doing that, you yourself learn how to train the AI to tackle more-sophisticated challenges. The AI revolution is already here. Learning these three skills will prepare you to thrive in it.”
AI to reduce train delays, speed up NHS prescriptions and train construction workers gets £32 million boost – GOV.UK
“Government unveils AI projects to improve productivity and public services supported by a share of £32 million.”
Technical Papers, Articles, and Preprints
[2408.08781] Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions
https://arxiv.org/abs/2408.08781
Authors: Murugadoss, Bhuvanashree; Poelitz, Christian; Drosos, Ian; Le, Vu; McKenna, Nick; Negreanu, Carina Suzana; Parnin, Chris; and Sarkar, Advait
(Friday, August 16, 2024) “LLMs-as-a-judge is a recently popularized method which replaces human judgements in task evaluation (Zheng et al. 2024) with automatic evaluation using LLMs. Due to widespread use of RLHF (Reinforcement Learning from Human Feedback), state-of-the-art LLMs like GPT4 and Llama3 are expected to have strong alignment with human preferences when prompted for a quality judgement, such as the coherence of a text. While this seems beneficial, it is not clear whether the assessments by an LLM-as-a-judge constitute only an evaluation based on the instructions in the prompts, or reflect its preference for high-quality data similar to its fine-tune data. To investigate how much influence prompting the LLMs-as-a-judge has on the alignment of AI judgements to human judgements, we analyze prompts with increasing levels of instructions about the target quality of an evaluation, for several LLMs-as-a-judge. Further, we compare to a prompt-free method using model perplexity as a quality measure instead. We aggregate a taxonomy of quality criteria commonly used across state-of-the-art evaluations with LLMs and provide this as a rigorous benchmark of models as judges. Overall, we show that the LLMs-as-a-judge benefit only little from highly detailed instructions in prompts and that perplexity can sometimes align better with human judgements than prompting, especially on textual quality.”
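The prompt-free alternative the abstract mentions scores a text by the model’s perplexity: the exponential of the negative mean per-token log-probability, where lower values indicate the model found the text more predictable. A minimal sketch of the metric itself (the log-probability values below are purely illustrative, not from the paper):

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities:
    exp of the negative mean log-probability. Lower perplexity
    means the model judged the text more predictable."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical log-probs a language model might assign to two texts:
fluent = [-0.5, -0.8, -0.3, -0.6]    # high-probability (likely) tokens
garbled = [-3.2, -4.1, -2.8, -3.7]   # low-probability (surprising) tokens

# A fluent text scores a lower perplexity than a garbled one,
# which is why the metric can serve as a rough textual-quality proxy.
assert perplexity(fluent) < perplexity(garbled)
```

In practice the per-token log-probabilities would come from running the candidate text through a language model; the function above only shows how the score is aggregated.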