AI – Friday, July 26, 2024: Notable and Interesting News, Articles, and Papers

[Image: Advanced AI data center]

A selection of the most important recent news, articles, and papers about AI.

News, Articles, and Analyses

Meta announces Llama, an AI model to rival Anthropic, Google and OpenAI.

https://www.axios.com/2024/07/23/meta-releases-open-source-llama-ai-model

Author: Ina Fried

(Tuesday, July 23, 2024) “The model is designed to rival the biggest models from Anthropic, Google and OpenAI.”

FTC Issues Orders to Eight Companies Seeking Information on Surveillance Pricing | Federal Trade Commission

https://www.ftc.gov/news-events/news/press-releases/2024/07/ftc-issues-orders-eight-companies-seeking-information-surveillance-pricing

(Tuesday, July 23, 2024) “The Federal Trade Commission issued orders to eight companies offering surveillance pricing products and services that incorporate data about consumers’ characteristics and behavior.”

Introducing Llama 3.1: Our most capable models to date

https://ai.meta.com/blog/meta-llama-3-1/

“Bringing open intelligence to all, our latest models expand context length, add support across eight languages, and include Meta Llama 3.1 405B— the…”

Alphabet’s Strong Q2 2024 – The Futurum Group

https://futurumgroup.com/insights/alphabets-strong-q2-2024-highlighting-revenue-growth-and-ai-impact/

Author: Keith Kirkpatrick

“Alphabet’s Q2 2024 results highlight 14% revenue growth, driven by AI and cloud innovations, exceeding analyst expectations.”

AI Software & Services June 2024 Market Snapshot Report – The Futurum Group

https://futurumgroup.com/insights/artificial-intelligence-software-and-services-june-2024-monthly-market-snapshot-report/

Author: Keith Kirkpatrick

“We focus on enterprise AI news for June 2024, assessing product news, partnerships, research developments, and industry and market activity.”

Technical Papers and Preprints

[2407.16286] A deeper look at depth pruning of LLMs

https://arxiv.org/abs/2407.16286

Authors: Siddiqui, Shoaib Ahmed; Dong, Xin; Heinrich, Greg; Breuel, Thomas; Kautz, Jan; Krueger, David; Molchanov, Pavlo

(Tuesday, July 23, 2024) “Large Language Models (LLMs) are not only resource-intensive to train but even more costly to deploy in production. Therefore, recent work has attempted to prune blocks of LLMs based on cheap proxies for estimating block importance, effectively removing 10% of blocks in well-trained LLaMa-2 and Mistral 7b models without any significant degradation of downstream metrics. In this paper, we explore different block importance metrics by considering adaptive metrics such as Shapley value in addition to static ones explored in prior work. We show that adaptive metrics exhibit a trade-off in performance between tasks, i.e., improvement on one task may degrade performance on the other due to differences in the computed block influences. Furthermore, we extend this analysis from a complete block to individual self-attention and feed-forward layers, highlighting the propensity of the self-attention layers to be more amenable to pruning, even allowing removal of up to 33% of the self-attention layers without incurring any performance degradation on MMLU for Mistral 7b (a significant reduction in costly maintenance of the KV-cache). Finally, we look at simple performance recovery techniques to emulate the pruned layers by training lightweight additive bias or low-rank linear adapters. Performance recovery using emulated updates avoids performance degradation for the initial blocks (up to 5% absolute improvement on MMLU), which is either competitive or superior to the learning-based technique.”
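The "cheap proxies" the abstract refers to can be illustrated with a minimal sketch. One proxy used in prior depth-pruning work scores a block by the cosine similarity between its input and output: a block that barely transforms its input is a candidate for removal. The toy "blocks" below are hypothetical stand-ins for transformer layers, not the paper's actual method or code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (plain Python lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def block_importance(block, inputs):
    """Score a block by how much it transforms its inputs.

    High input/output cosine similarity means the block barely changes
    the representation, so it is a cheap candidate for pruning.
    Importance here = 1 - mean cosine similarity (a static proxy).
    """
    sims = [cosine(x, block(x)) for x in inputs]
    return 1.0 - sum(sims) / len(sims)

# Hypothetical stand-ins for transformer blocks.
near_identity = lambda x: [v * 1.01 for v in x]    # barely changes its input
strong_mix = lambda x: [x[-1] - x[0]] + x[:-1]     # reorders and recombines

inputs = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
blocks = {"near_identity": near_identity, "strong_mix": strong_mix}
scores = {name: block_importance(b, inputs) for name, b in blocks.items()}

# The near-identity block scores lowest and would be pruned first.
prune_first = min(scores, key=scores.get)
```

Adaptive metrics such as Shapley values instead score a block by its marginal contribution across many subsets of blocks, which is what introduces the per-task trade-offs the paper describes.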

[2407.17468] WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

https://arxiv.org/abs/2407.17468

Authors: Zhao, Wenting; Goyal, Tanya; Chiu, Yu Ying; Jiang, Liwei; Newman, Benjamin; Ravichander, Abhilasha; Chandu, Khyathi; Le Bras, Ronan; Cardie, Claire; Deng, Yuntian; Choi, Yejin

(Wednesday, July 24, 2024) “While hallucinations of large language models (LLMs) prevail as a major challenge, existing evaluation benchmarks on factuality do not cover the diverse domains of knowledge that the real-world users of LLMs seek information about. To bridge this gap, we introduce WildHallucinations, a benchmark that evaluates factuality. It does so by prompting LLMs to generate information about entities mined from user-chatbot conversations in the wild. These generations are then automatically fact-checked against a systematically curated knowledge source collected from web search. Notably, half of these real-world entities do not have associated Wikipedia pages. We evaluate 118,785 generations from 15 LLMs on 7,919 entities. We find that LLMs consistently hallucinate more on entities without Wikipedia pages and exhibit varying hallucination rates across different domains. Finally, given the same base models, adding a retrieval component only slightly reduces hallucinations but does not eliminate them.”
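The benchmark's core loop can be sketched in miniature: check each generated claim against a curated knowledge source and report the fraction left unsupported. This is a simplification under stated assumptions; the actual pipeline decomposes generations into claims and verifies them with model-based fact-checking against web-retrieved documents, not exact string matching, and the entity and facts below are invented for illustration.

```python
def fact_check(claims, knowledge):
    """Label each generated claim as supported (True) or not (False)
    against a curated knowledge source. Real systems use a model-based
    entailment check; exact membership stands in for it here."""
    return {claim: (claim in knowledge) for claim in claims}

def hallucination_rate(claims, knowledge):
    """Fraction of generated claims the knowledge source does not support."""
    labels = fact_check(claims, knowledge)
    unsupported = sum(1 for ok in labels.values() if not ok)
    return unsupported / len(claims)

# Hypothetical knowledge snippets retrieved by web search for one entity.
knowledge = {
    "Entity X was founded in 2015",
    "Entity X is headquartered in Austin",
}
# Hypothetical claims extracted from an LLM's generation about that entity.
generated = [
    "Entity X was founded in 2015",         # supported
    "Entity X is headquartered in Austin",  # supported
    "Entity X has 10,000 employees",        # unsupported -> hallucination
]
rate = hallucination_rate(generated, knowledge)  # 1 of 3 claims unsupported
```

Aggregating this rate per entity and per domain is what lets the paper compare models and show that entities without Wikipedia pages draw more hallucinations.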