Commentary and a selection of the most important recent news, articles, and papers about AI.
Today’s Brief Commentary
Welcome to my first newsletter of 2025 focused on topics related to AI. Over the last several weeks, I slowed down my publishing rate for this and the quantum editions to catch my breath. I did not especially reflect on 2024, since I see time and discovery as continuous rather than marked by calendars. Yes, I know business results are often reported quarterly and annually, but I prefer to think of the flow of ideas as ignoring all that.
This year, I will try to publish slightly more often with slightly less content per edition. The curation standard will remain the same, based on my three criteria for possible inclusion: is a particular article or paper interesting, significant, or amusing?
I’m also reexamining the images I use for the newsletter editions. Most of mine were generated by OpenAI’s ChatGPT with DALL-E 3, but they have a certain sameness. This look is also present in many others’ posts. The first link below is a review of the best GenAI image generators. My best bet might be to simply post photos of our family cats, since there seems to be no better SEO technique.
Finally, the last few weeks have been blemished for me by horribly inaccurate posts that people put up for social media clickbait. I imagine one of these posters using an LLM prompt like “Create a 500-word article combining blockchain and Google’s Willow chip news.” The posters themselves may be bots. I’ve responded to a few of these and usually regretted doing so. My new strategy is to block the poster and move on.
General News, Articles, and Analyses
Best AI image generators of 2025 | Tom’s Guide
https://www.tomsguide.com/best-picks/best-ai-image-generators
Author: Ryan Morrison
Date: Friday, November 8, 2024
Excerpt: In a rapidly evolving market, a good model isn’t enough anymore though. What makes a platform stand out is the additional features including tools like Ideogram’s Canvas, upscaling in Freepik or Midjourney’s Editor which also includes editing external images.
Working out which model to use, especially when you are parting with money to get the best capabilities, can be a challenge. Hopefully, this guide will help. As well as simply using the image generation platforms, I also put them to the test with a range of prompt styles, image types and criteria to see not only how they perform, but also how they compare to one another.
Semiconductor Chipsets and Infrastructure
These are the 10 hottest AI hardware companies to follow in 2025 | TechRadar
https://www.techradar.com/pro/these-are-the-10-hottest-ai-hardware-companies-to-follow-in-2025
Author: Wayne Williams
Date: Saturday, January 4, 2025
Commentary: Note the focus on three areas: edge, photonics, and AI accelerators.
Excerpt: Watch out Nvidia, these startups are looking to dent your dominance
Generative AI and Models
AI can now create a replica of your personality | MIT Technology Review
https://www.technologyreview.com/2024/11/20/1107100/ai-can-now-create-a-replica-of-your-personality/
Author: James O’Donnell
Date: Saturday, November 30, 2024
Excerpt: A two-hour interview is enough to accurately capture your values and preferences, according to new research from Stanford and Google DeepMind.
Generative AI and Models | Technical
[2412.08821v2] Large Concept Models: Language Modeling in a Sentence Representation Space
https://arxiv.org/abs/2412.08821v2
Authors: LCM team; Barrault, Loïc; Duquenne, Paul-Ambroise; Elbayad, Maha; Kozhevnikov, Artyom; Alastruey, Belen; Andrews, Pierre; Coria, Mariano; Couairon, Guillaume; …; and Schwenk, Holger
Date: Wednesday, December 11, 2024
Commentary: There’s more to GenAI than LLMs. The middle “L” stands for “Language.” We also have Large Action Models (LAMs), Large Video Models (LVMs), Large Concept Models (LCMs), and several others. These get combined and intertwined to create multi-modal models. This paper by a team at Meta is about LCMs.
Excerpt: LLMs have revolutionized the field of artificial intelligence and have emerged as the de-facto tool for many tasks. The current established technology of LLMs is to process input and generate output at the token level. This is in sharp contrast to humans who operate at multiple levels of abstraction, well beyond single words, to analyze information and to generate creative content. In this paper, we present an attempt at an architecture which operates on an explicit higher-level semantic representation, which we name a concept. Concepts are language- and modality-agnostic and represent a higher level idea or action in a flow. Hence, we build a “Large Concept Model”. In this study, as proof of feasibility, we assume that a concept corresponds to a sentence, and use an existing sentence embedding space, SONAR, which supports up to 200 languages in both text and speech modalities. The Large Concept Model is trained to perform autoregressive sentence prediction in an embedding space. We explore multiple approaches, namely MSE regression, variants of diffusion-based generation, and models operating in a quantized SONAR space. These explorations are performed using 1.6B parameter models and training data in the order of 1.3T tokens. We then scale one architecture to a model size of 7B parameters and training data of about 2.7T tokens. We perform an experimental evaluation on several generative tasks, namely summarization and a new task of summary expansion. Finally, we show that our model exhibits impressive zero-shot generalization performance to many languages, outperforming existing LLMs of the same size. The training code of our models is freely available.
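To make the core idea concrete, here is a minimal sketch of the MSE-regression variant mentioned in the abstract: a small causal transformer that predicts the next sentence embedding from the preceding ones. The SONAR encoder, model sizes, and training data are not reproduced here; the embedding dimension, the tiny backbone, and the random tensors are illustrative placeholders, not the paper’s actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: autoregressive next-sentence-embedding prediction
# trained with MSE, as a stand-in for the paper's MSE-regression LCM variant.
embed_dim = 256  # placeholder; SONAR embeddings are larger in practice

encoder_layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(encoder_layer, num_layers=2)
head = nn.Linear(embed_dim, embed_dim)  # hidden state -> predicted next embedding


def predict_next(sentence_embeddings: torch.Tensor) -> torch.Tensor:
    """Given a (batch, seq, dim) tensor of sentence embeddings,
    predict the embedding of the next sentence."""
    causal_mask = nn.Transformer.generate_square_subsequent_mask(sentence_embeddings.size(1))
    hidden = backbone(sentence_embeddings, mask=causal_mask)
    return head(hidden[:, -1, :])  # prediction from the last position


# Toy training step: regress the prediction onto the true next embedding.
context = torch.randn(8, 5, embed_dim)   # 5 "sentences" of context per example
target = torch.randn(8, embed_dim)       # embedding of the 6th sentence
loss = nn.functional.mse_loss(predict_next(context), target)
loss.backward()
```

The point of the sketch is only that the model’s vocabulary is a continuous embedding space rather than discrete tokens, so generation becomes regression (or, in the paper’s other variants, diffusion or quantized prediction) over sentence-level representations.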
[2412.14711] ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
https://arxiv.org/abs/2412.14711
Authors: Wang, Ziteng; Chen, Jianfei; and Zhu, Jun
Date: Thursday, December 19, 2024
Excerpt: Sparsely activated Mixture-of-Experts (MoE) models are widely adopted to scale up model capacity without increasing the computation budget. However, vanilla TopK routers are trained in a discontinuous, non-differentiable way, limiting their performance and scalability. To address this issue, we propose ReMoE, a fully differentiable MoE architecture that offers a simple yet effective drop-in replacement for the conventional TopK+Softmax routing, utilizing ReLU as the router instead. We further propose methods to regulate the router’s sparsity while balancing the load among experts. ReMoE’s continuous nature enables efficient dynamic allocation of computation across tokens and layers, while also exhibiting domain specialization. Our experiments demonstrate that ReMoE consistently outperforms vanilla TopK-routed MoE across various model sizes, expert counts, and levels of granularity. Furthermore, ReMoE exhibits superior scalability with respect to the number of experts, surpassing traditional MoE architectures. The implementation based on Megatron-LM is available at https://github.com/thu-ml/ReMoE.
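The contrast the abstract draws is between a discontinuous TopK+Softmax router and a continuous ReLU router. Below is a minimal sketch of that difference under stated assumptions: the expert networks, dimensions, and the simple L1-style penalty used to encourage sparsity are illustrative placeholders, not the paper’s exact training recipe (see the Megatron-LM implementation linked above for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden, n_experts, k = 64, 8, 2
router = nn.Linear(hidden, n_experts)
experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(n_experts))


def combine(x, gates):
    """Weighted sum of expert outputs (dense for clarity; a real MoE layer
    dispatches tokens only to experts with nonzero gates)."""
    outs = torch.stack([e(x) for e in experts], dim=-1)  # (tokens, hidden, experts)
    return (outs * gates.unsqueeze(1)).sum(-1)


def topk_softmax_moe(x):
    """Conventional routing: keep the top-k logits, softmax over them.
    The hard top-k selection is the discontinuous step the paper criticizes."""
    logits = router(x)                                   # (tokens, experts)
    topk_vals, topk_idx = logits.topk(k, dim=-1)
    gates = torch.zeros_like(logits).scatter(-1, topk_idx, F.softmax(topk_vals, dim=-1))
    return combine(x, gates)


def relu_moe(x):
    """ReMoE-style routing sketch: ReLU gates are continuous and naturally
    sparse; a magnitude penalty (assumed L1 here) regulates how many stay active."""
    gates = F.relu(router(x))
    sparsity_penalty = gates.abs().mean()
    return combine(x, gates), sparsity_penalty


tokens = torch.randn(16, hidden)
y_topk = topk_softmax_moe(tokens)
y_relu, penalty = relu_moe(tokens)
```

Because the ReLU gates vary smoothly with the router logits, different tokens and layers can activate different numbers of experts, which is the dynamic allocation behavior the authors highlight.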