Radar Trends to Watch: May 2024


In the past month, we saw a blizzard of new language models. It’s almost hard to consider this news, though Microsoft’s open (but maybe not open source) Phi-3 is certainly worth a look. We’ve also seen promising work on reducing the resources required to do inference. While this may lead to larger models, it should also lead to reduced power use for small and midsized models.

AI

  • Microsoft’s Phi-3-mini is yet another freely available language model. It is small enough to run locally on phones and laptops. Its performance is similar to GPT-3.5 and Mixtral 8x7B.
  • Google’s Infini-attention is a new attention technique that allows large language models to work with effectively infinite context.
  • Companies are increasingly adding AI bots to their boards as observers. The bots help plan strategy, analyze financials, and report on compliance.
  • OutSystems offers a low-code toolkit for building AI agents, unsurprisingly named the AI Agent Builder.
  • Ethan Mollick’s Prompt Library is worth checking out. It collects most of the prompts from his book and his blog; most are licensed under Creative Commons, requiring only attribution. Anthropic has also published a prompt library for use with Claude, though the prompts will probably work with other LLMs as well.
  • There are many solutions for people who want to run large language models locally, ranging from desktop apps to APIs. Here’s a list. (For one way to call a locally served model, see the first sketch following this list.)
  • Meta has released the 8B and 70B versions of Llama 3. The largest versions are still to come. Early reports say that these smaller versions are impressive.
  • Mistral AI has announced Mixtral 8x22B, a larger version of its very impressive Mixtral 8x7B mixture-of-experts model.
  • Effort is a new method for doing LLM inference that reduces the amount of floating point computation needed without compromising the results. Effort has been implemented for Mistral but should work with other models.
  • MLCommons is developing an AI Safety Benchmark for testing AI chatbots against common kinds of abuse. They caution that the current version (v0.5) is only a proof of concept that shouldn’t be used to test production systems.
  • Representation fine-tuning (ReFT) is a new technique for fine-tuning language models. Rather than updating the model’s weights, it modifies hidden representations that are specific to the task you want the model to perform. It outperforms other fine-tuning techniques, in addition to being faster and more efficient.
  • AI systems can be more persuasive than humans, particularly if they have access to information about the person they are trying to persuade. This extreme form of microtargeting may mean that AI has discovered persuasive techniques that we don’t yet understand.
  • In one 24-hour period, there were three major language model releases: Gemini Pro 1.5, GPT-4 Turbo, and Mixtral 8x22B. Mixtral, the mixture-of-experts model mentioned above, is the most interesting of the three.
  • More models for creating music are popping up all over. There’s Sonauto (apparently not related to Suno; Sonauto uses a different kind of model) and Udio, in addition to Stable Audio and Google’s MusicLM.
  • An ethical application for deep fakes? Domestic Data Streamers creates synthetic images based on memories—for example, an important event that was never captured in a photo. Interestingly, older image models seem to produce more pleasing results than the latest models.
  • What happened after AlphaGo beat the world’s best Go player? Human Go players got better. Some of the improvement came from studying games played by AI; some of it came from increased creativity.
  • You should listen to Permission Is Hereby Granted, Suno’s setting of the MIT License to music as a piano ballad.
  • How does AI-based code completion work? GitHub isn’t saying much, but Sourcegraph has provided some details for its Cody assistant. And Cody is open source, so you can analyze the code.
  • Claude-llm-trainer is a Google Colab notebook that simplifies the process of training Meta’s Llama 2.
  • In one set of experiments, large language models proved better than “classical” models at financial time series forecasting.
  • More easy ways to run language models locally: The Opera browser now includes support for 150 language models. This feature is currently available only in the Developer stream.
  • JRsdr is an AI product that promises to automate all your corporate social media. Do you dare trust it?
  • LLMLingua-2 is a specialized model designed to compress prompts. Compression is useful for long prompts, such as those built by RAG, chain-of-thought, and similar techniques. Compression reduces the context required, in turn increasing performance and reducing cost. (For a toy illustration of the idea, see the second sketch following this list.)
  • OpenAI has shared some samples generated by Voice Engine, its (still unreleased) model for synthesizing human voices.
  • Things generative AI can’t do: create a plain white image. Perhaps it’s not surprising that this is difficult.
  • DeepMind has developed a large language model for checking the accuracy of an LLM’s output. Search-Augmented Factuality Evaluator (SAFE) appears to be more accurate than crowdsourced human raters and less expensive to operate. Code for SAFE is posted on GitHub.
  • While watermarks are often seen as a way to identify AI-generated text (and, in the EU, are required by law), it is relatively easy to discover a watermark and remove it, or to copy it for use on another document.
  • Particularly for vision models, being small isn’t necessarily a disadvantage. Small models trained on carefully curated data that’s relevant to the task at hand are less vulnerable to overfitting and other errors.
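
Following up on the item about running models locally: here is a minimal sketch of calling a locally served model over HTTP. It assumes Ollama is installed, the llama3 model has been pulled, and the server is listening on its default port; desktop apps and other local servers expose similar APIs.

    # Query a locally running LLM via Ollama's REST API.
    # Assumes: Ollama is installed, `ollama pull llama3` has been run,
    # and the server is listening on its default port (11434).
    import json
    import urllib.request

    def ask_local_model(prompt: str, model: str = "llama3") -> str:
        payload = json.dumps({
            "model": model,
            "prompt": prompt,
            "stream": False,  # return one JSON object rather than a stream
        }).encode("utf-8")
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

    print(ask_local_model("Explain mixture-of-experts models in two sentences."))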
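
And on LLMLingua-2: the sketch below is not LLMLingua’s actual API, just a toy illustration of the underlying idea, which is to score each token’s importance and drop the lowest-scoring tokens until the prompt fits a budget. LLMLingua-2 uses a trained classifier for the scoring; here, word rarity within the prompt stands in as a crude heuristic.

    # Toy prompt compression: keep the most informative words, in order.
    # LLMLingua-2 trains a token classifier to do the scoring; rarity
    # within the prompt is used here purely as an illustration.
    from collections import Counter

    def compress_prompt(prompt: str, rate: float = 0.5) -> str:
        words = prompt.split()
        counts = Counter(w.lower() for w in words)
        budget = max(1, int(len(words) * rate))
        # Rare words carry more information; common ones are dropped first.
        ranked = sorted(range(len(words)), key=lambda i: counts[words[i].lower()])
        keep = set(ranked[:budget])
        return " ".join(w for i, w in enumerate(words) if i in keep)

    long_prompt = ("You are a helpful assistant. You are given a question and "
                   "some retrieved context. Use the context to answer the "
                   "question as accurately and concisely as you can.")
    print(compress_prompt(long_prompt, rate=0.5))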

Programming

  • Martin Odersky, creator of the Scala programming language, has proposed “Lean Scala,” a simpler and more understandable way of writing Scala. Lean Scala is neither a new language nor a subset; it’s a programming style for Scala 3.
  • sotrace is a new tool for Linux developers that shows all the libraries your programs are linked to. It’s a great way to discover all of your supply chain dependencies. Try it; you’re likely to be surprised, particularly if you run it against a process ID rather than a binary executable.
  • Aider is a nice little tool that facilitates pair programming with GPT-3.5 or GPT-4. It can edit the files in your Git repo, committing changes with a generated descriptive message.
  • Another programming language worth a look: Vala. It is object-oriented, looks sort of like Java, compiles to native binaries, and can link to many C libraries.
  • Excellent advice from Anil Dash: make better documents. And along similar lines: write code that’s easy to read, from Gregor Hohpe.
  • According to Google, programmers working in Rust are roughly as effective as programmers working in Go and twice as effective as programmers working in C++.
  • Winglang is a programming language for DevOps; it represents a higher level of abstraction for deploying and managing applications in the cloud. It includes a complete toolchain for developers.
  • Keeping track of time has always been one of the most frustratingly complex parts of programming, particularly when you account for time zones. Now the Moon needs its own time zone, because for relativistic reasons, time runs slightly faster there. (A back-of-the-envelope calculation of the drift follows this list.)
  • The Linux Foundation has started the Valkey project, which will fork the Redis database under an open source license. Redis is a widely used in-memory key-value database. Like Terraform and others, it was recently relicensed under terms that aren’t acceptable to the open source community.
  • Redict is another fork of Redis, this time under the LGPL. It is distinct from Valkey, the fork launched by the Linux Foundation. Redict will focus on “stability and long-term maintenance” rather than innovation and new features.
  • “Ship it” culture is destructive. Take time to learn, understand, and document; it will pay off.
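
On lunar timekeeping: here’s a back-of-the-envelope calculation of the drift, using the roughly 58.7 microseconds per day by which lunar clocks have been reported to run fast relative to Earth. Treat that figure as approximate; the point is the order of magnitude.

    # How quickly does an Earth-synced clock drift on the Moon?
    # Assumes the widely reported ~58.7 microseconds/day of relativistic
    # speedup; the exact figure depends on where on the Moon you stand.
    DRIFT_PER_DAY_S = 58.7e-6

    drift_per_year_s = DRIFT_PER_DAY_S * 365.25
    days_to_one_second = 1.0 / DRIFT_PER_DAY_S

    print(f"Drift per year: {drift_per_year_s * 1e3:.1f} ms")
    print(f"Days until 1 s of drift: {days_to_one_second:,.0f} "
          f"(~{days_to_one_second / 365.25:.0f} years)")
    # Navigation is what makes this matter: light travels ~30 cm per
    # nanosecond, so positioning systems need far better than 1 s.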

Security

  • GitHub allows a comment to include an attached file, which is automatically uploaded to GitHub with an automatically generated URL that appears to belong to the repository. While this feature is useful for bug reporting, threat actors have used it to make malware appear to be hosted in trusted repos.
  • GPT-4 is capable of reading security advisories (CVEs) and exploiting the vulnerabilities. Other models don’t appear to have this ability, although the researchers haven’t yet been able to test Claude 3 and Gemini.
  • Users of the LastPass password manager have been targeted by relatively sophisticated phishing attacks. The attacks originated from the CryptoChameleon phishing toolkit.
  • Protobom is an open source tool that will make it easier for organizations to generate and use software bills of materials. Protobom was developed by the OpenSSF, CISA, and DHS.
  • Last month’s failed attack against xz Utils probably wasn’t an isolated incident. The OpenJS Foundation has reported similar incidents, though it hasn’t specified which projects were targeted.
  • System Package Data Exchange (SPDX) 3.0, previously known as Software Package Data Exchange, is a standard for tracking all supply chain dependencies, not just software. GitHub is adding support for generating SPDX data from its dependency graphs.
  • A malicious PowerShell script that has been used in a number of attacks is believed to have been generated by an AI. (The tell is that the script has a comment for every line of code.) There will be more…
  • Kobold Letters is a new email vulnerability, and it’s a real headache. A hostile actor can use CSS to modify an HTML-formatted email after it has been delivered, changing what the recipient sees depending on the context in which the email is viewed.
  • AI models can hallucinate package names when generating code, and these nonexistent names often find their way into software. After observing a hallucinated package name, an attacker can create malware with that name and upload it to the appropriate repository. The malware will then be loaded by software referencing the now-existent package. (A simple defensive check follows this list.)
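
A simple defense against the hallucinated-package attack described in the last item: before installing a dependency an LLM suggested, verify that it actually exists in the package index and has some history. The PyPI JSON endpoint used below is real; the vetting heuristic is only a starting point.

    # Check that a package name actually exists on PyPI before installing.
    # Guards against malware registered under names that LLMs hallucinate.
    import json
    import urllib.error
    import urllib.request

    def pypi_package_exists(name: str) -> bool:
        url = f"https://pypi.org/pypi/{name}/json"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                data = json.loads(resp.read())
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return False  # the package has never been registered
            raise
        releases = data.get("releases", {})
        # A brand-new package with a single upload deserves extra scrutiny.
        if len(releases) <= 1:
            print(f"warning: {name} exists but has {len(releases)} release(s)")
        return True

    for pkg in ["requests", "surely-not-a-real-package-xyz"]:
        print(pkg, "->", pypi_package_exists(pkg))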

Robotics

  • Boston Dynamics has revealed its new humanoid robot, a successor to Atlas. Unlike Atlas, which relied heavily on hydraulics, the new robot is all-electric and has joints that can rotate through 360 degrees.
  • A research robot now uses AI to generate facial expressions and respond appropriately to facial expressions in humans. It can even anticipate human expressions and act accordingly—for example, by smiling in anticipation of a human smile.

Quantum Computing

  • Has postquantum cryptography already been broken? We don’t know yet (nor do we have a working quantum computer). But a recent paper suggests some possible attacks against the current postquantum algorithms.
  • Microsoft and Quantinuum have succeeded in building error-corrected logical qubits: the error rate for logical qubits is lower than the error rate for the underlying uncorrected qubits. Although they can only create two logical qubits, this is a significant step forward. (An illustrative simulation of why error correction helps follows.)
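
Microsoft and Quantinuum used a real quantum error-correcting code on trapped-ion hardware; the classical toy below only illustrates the principle that an encoded (logical) error rate can fall below the physical one. It simulates a 3-bit repetition code under independent bit flips, where majority voting fails only if two or more bits flip.

    # Why error correction helps: 3-bit repetition code, random bit flips.
    # For physical error rate p < 0.5, the logical error rate
    # 3p^2 - 2p^3 is lower than p. (Real quantum codes must also correct
    # phase errors without reading the data; this toy ignores that.)
    import random

    def logical_error_rate(p: float, trials: int = 200_000) -> float:
        failures = 0
        for _ in range(trials):
            flips = sum(random.random() < p for _ in range(3))
            if flips >= 2:  # majority vote decodes to the wrong value
                failures += 1
        return failures / trials

    for p in [0.1, 0.05, 0.01]:
        print(f"p={p:.2f}: logical ~{logical_error_rate(p):.4f} "
              f"(theory {3 * p**2 - 2 * p**3:.4f})")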

