AI – Sunday, January 5, 2025: Commentary with Notable and Interesting News, Articles, and Papers

Chester the Cat

Commentary and a selection of the most important recent news, articles, and papers about AI.

Today’s Brief Commentary

Welcome to my first newsletter of 2025 focused on topics related to AI. Over the last several weeks, I slowed down my publishing rate for this and the quantum editions to catch my breath. I did not especially reflect on 2024, since I see time and discovery as continuous rather than marked by calendars. Yes, I know business results are often reported quarterly and annually, but I prefer to think of the flow of ideas as ignoring all that.

This year, I will try to publish slightly more often with slightly less content per edition. The curation quality will remain the same, using my three criteria for possible inclusion: is a particular article or paper interesting, significant, or amusing?

I’m also reexamining the images I use for the newsletter editions. Most of mine were generated by OpenAI ChatGPT with DALL-E 3, but they have a certain sameness. This look is also present in many others’ posts. The first link below is a review of the best GenAI image generators. My best bet might be to simply post photos of our family cats since there seems to be no better SEO technique.

Finally, the last few weeks have been blemished for me by horribly inaccurate posts that people put up for social media clickbait. I imagine one of these posters using an LLM prompt like “Create a 500-word article combining blockchain and Google’s Willow chip news.” The posters themselves may be bots. I’ve responded to a few of these and usually regretted doing so. My new strategy is to block the poster and move on.

General News, Articles, and Analyses


Best AI image generators of 2025 | Tom’s Guide

https://www.tomsguide.com/best-picks/best-ai-image-generators

Author: Ryan Morrison

Date: Friday, November 8, 2024

Excerpt: In a rapidly evolving market, a good model isn’t enough anymore though. What makes a platform stand out is the additional features including tools like Ideogram’s Canvas, upscaling in Freepik or Midjourney’s Editor which also includes editing external images.

Working out which model to use, especially when you are parting with money to get the best capabilities, can be a challenge. Hopefully, this guide will help. As well as simply using the image generation platforms, I also put them to the test with a range of prompt styles, image types and criteria to see not only how they perform, but also how they compare to one another.

Semiconductor Chipsets and Infrastructure


These are the 10 hottest AI hardware companies to follow in 2025 | TechRadar

https://www.techradar.com/pro/these-are-the-10-hottest-ai-hardware-companies-to-follow-in-2025

Author: Wayne Williams

Date: Saturday, January 4, 2025

Commentary: Note the focus on three areas: edge, photonics, and AI accelerators.

Excerpt: Watch out Nvidia, these startups are looking to dent your dominance

Generative AI and Models


AI can now create a replica of your personality | MIT Technology Review

https://www.technologyreview.com/2024/11/20/1107100/ai-can-now-create-a-replica-of-your-personality/

Author: James O’Donnell

Date: Wednesday, November 20, 2024

Excerpt: A two-hour interview is enough to accurately capture your values and preferences, according to new research from Stanford and Google DeepMind.

Generative AI and Models | Technical


[2412.08821v2] Large Concept Models: Language Modeling in a Sentence Representation Space

https://arxiv.org/abs/2412.08821v2

Authors: LCM team; Barrault, Loïc; Duquenne, Paul-Ambroise; Elbayad, Maha; Kozhevnikov, Artyom; Alastruey, Belen; Andrews, Pierre; Coria, Mariano; Couairon, Guillaume; …; and Schwenk, Holger

Date: Wednesday, December 11, 2024

Commentary: There’s more to GenAI than LLMs. The middle “L” stands for “Language.” We also have Large Action Models (LAMs), Large Video Models (LVMs), Large Concept Models (LCMs), and several others. These get combined and intertwined to create multi-modal models. This paper by a team at Meta is about LCMs.

Excerpt: LLMs have revolutionized the field of artificial intelligence and have emerged as the de-facto tool for many tasks. The current established technology of LLMs is to process input and generate output at the token level. This is in sharp contrast to humans who operate at multiple levels of abstraction, well beyond single words, to analyze information and to generate creative content. In this paper, we present an attempt at an architecture which operates on an explicit higher-level semantic representation, which we name a concept. Concepts are language- and modality-agnostic and represent a higher level idea or action in a flow. Hence, we build a “Large Concept Model”. In this study, as proof of feasibility, we assume that a concept corresponds to a sentence, and use an existing sentence embedding space, SONAR, which supports up to 200 languages in both text and speech modalities. The Large Concept Model is trained to perform autoregressive sentence prediction in an embedding space. We explore multiple approaches, namely MSE regression, variants of diffusion-based generation, and models operating in a quantized SONAR space. These explorations are performed using 1.6B parameter models and training data in the order of 1.3T tokens. We then scale one architecture to a model size of 7B parameters and training data of about 2.7T tokens. We perform an experimental evaluation on several generative tasks, namely summarization and a new task of summary expansion. Finally, we show that our model exhibits impressive zero-shot generalization performance to many languages, outperforming existing LLMs of the same size. The training code of our models is freely available.
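A rough way to picture the base setup described in the abstract is a decoder that regresses the next sentence embedding instead of the next token. The sketch below is my own illustration, not Meta’s released code: the transformer size, the training loop, and the random placeholder embeddings are all assumptions for demonstration; only the MSE objective and the idea of autoregressive prediction in a SONAR-style embedding space come from the paper.

```python
# Toy sketch of the "Large Concept Model" training idea: autoregressive
# next-sentence-embedding prediction with an MSE loss. NOT Meta's code.
# Layer counts and the random "documents" are illustrative stand-ins for
# real SONAR sentence embeddings (which are 1024-dimensional).
import torch
import torch.nn as nn

EMBED_DIM = 1024  # assumed to match SONAR's sentence embedding size


class TinyConceptModel(nn.Module):
    def __init__(self, dim=EMBED_DIM, n_layers=4, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, batch_first=True
        )
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(dim, dim)  # regress the next concept embedding

    def forward(self, concepts):
        # concepts: (batch, n_sentences, dim) of sentence embeddings.
        # A causal mask keeps prediction autoregressive over sentences.
        n = concepts.size(1)
        mask = nn.Transformer.generate_square_subsequent_mask(n)
        hidden = self.backbone(concepts, mask=mask)
        return self.head(hidden)


model = TinyConceptModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Toy batch: 2 "documents" of 16 sentence embeddings each (random placeholders).
docs = torch.randn(2, 16, EMBED_DIM)
pred = model(docs[:, :-1, :])                       # predict embedding t+1 from <= t
loss = nn.functional.mse_loss(pred, docs[:, 1:, :])  # the paper's MSE-regression variant
loss.backward()
opt.step()
```

The paper also explores diffusion-based and quantized variants in place of the plain MSE head; the point of the sketch is only that the unit of prediction is a whole sentence embedding, not a token.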

[2412.14711] ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

https://arxiv.org/abs/2412.14711

Authors: Wang, Ziteng; Chen, Jianfei; and Zhu, Jun

Date: Thursday, December 19, 2024

Excerpt: Sparsely activated Mixture-of-Experts (MoE) models are widely adopted to scale up model capacity without increasing the computation budget. However, vanilla TopK routers are trained in a discontinuous, non-differentiable way, limiting their performance and scalability. To address this issue, we propose ReMoE, a fully differentiable MoE architecture that offers a simple yet effective drop-in replacement for the conventional TopK+Softmax routing, utilizing ReLU as the router instead. We further propose methods to regulate the router’s sparsity while balancing the load among experts. ReMoE’s continuous nature enables efficient dynamic allocation of computation across tokens and layers, while also exhibiting domain specialization. Our experiments demonstrate that ReMoE consistently outperforms vanilla TopK-routed MoE across various model sizes, expert counts, and levels of granularity. Furthermore, ReMoE exhibits superior scalability with respect to the number of experts, surpassing traditional MoE architectures. The implementation based on Megatron-LM is available at https://github.com/thu-ml/ReMoE.
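The key mechanism is easy to state: replace the TopK+Softmax gate with a plain ReLU, so each token’s routing weights are continuous, exactly zero for inactive experts, and differentiable wherever an expert is active. Below is a minimal sketch of that routing idea under assumed toy dimensions, with a naive dense loop over experts; the paper’s sparsity regularization, load balancing, and Megatron-LM implementation are omitted.

```python
# Toy sketch of ReLU routing as described in the ReMoE abstract: router logits
# pass through ReLU, so an expert with a non-positive logit gets exactly zero
# weight (and can be skipped), while gradients flow through active experts.
# Illustration only, not the paper's implementation; dimensions are assumptions.
import torch
import torch.nn as nn


class ReLURoutedMoE(nn.Module):
    def __init__(self, dim=256, hidden=512, n_experts=8):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (batch, tokens, dim)
        gates = torch.relu(self.router(x))      # (batch, tokens, n_experts), many zeros
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            g = gates[..., i : i + 1]           # (batch, tokens, 1)
            if (g > 0).any():                   # skip experts no token activated
                out = out + g * expert(x)       # weight each expert's output by its gate
        return out


moe = ReLURoutedMoE()
tokens = torch.randn(4, 32, 256)
y = moe(tokens)
print(y.shape)  # torch.Size([4, 32, 256])
```

Unlike a TopK router, where the arg-top-k selection is non-differentiable and fixes the number of experts per token, here the count of active experts is whatever the learned gates produce, which is what lets ReMoE allocate computation dynamically across tokens and layers.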