Kids are Learning AI and Small Language Models are Making Inroads
Acceptable performance, lower costs and wider accessibility are fueling growth in SLMs; seen as overcoming problems with LLMs that limit their use for “high-stakes” use cases
By John P. Desmond, Editor, AI in Business

Children are building small language models, perhaps portending a trend as they learn about AI and try it out.
“This new AI technology—it’s very interesting to learn how it works and understand it more,” stated Luca, a 10-year-old AI model maker, in a recent account in MIT Technology Review. He is trying out Little Language Models, a new application from Manuj and Shruti Dhariwal, two PhD researchers at MIT's Media Lab.
The kids work with dice to explore probability, which is analogous to how the weights in LLMs work: the results are mostly good, but not perfect. The teacher creates a data set of hands with different skin colors. When the probability is set to 100 percent white, a computer animation shows all white hands. The teacher then asks the students to adjust the weights to increase the percentage of other skin colors, and the animation produces hands of different colors.
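The classroom exercise amounts to sampling from a weighted distribution, which is also, loosely, how a language model picks its next token. A minimal sketch in Python (the category names and weight values here are illustrative, not taken from the Little Language Models app):

```python
import random

# Illustrative categories and weights the students might set.
# Putting all the weight on one entry reproduces the "100 percent
# white hands" animation; shifting weight toward the other entries
# makes the other outcomes start to appear.
skin_tones = ["light", "medium", "dark"]
weights = [0.2, 0.4, 0.4]  # non-negative; need not sum to 1


def draw_hand(tones, w):
    """Sample one outcome in proportion to its weight."""
    return random.choices(tones, weights=w, k=1)[0]


# Ten draws from the weighted distribution, like ten rolls of a loaded die.
samples = [draw_hand(skin_tones, weights) for _ in range(10)]
```

With weights of `[1.0, 0.0, 0.0]`, every draw returns `"light"`; spreading the weight out makes the sampling, like an LLM's output, probabilistic rather than fixed.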
The use of Little Language Models makes AI into something that the students can “grasp what’s going on,” stated Helen Mastico, a middle school librarian in Quincy, Massachusetts, who worked with eighth grade students on the program.
The students are prescient, because the big kids are also playing with small language models.
Powerful LLMs have drawbacks. Training them requires an enormous amount of data and models with billions or trillions of parameters, making training very resource-intensive and the computational power and energy consumption needed “staggering,” according to a recent account in VentureBeat. The high costs make it difficult for smaller organizations to engage in LLM development. OpenAI CEO Sam Altman was quoted as stating at an MIT event last year that the cost of training GPT-4 was at least $100 million.
The complexity of LLMs also poses a steep learning curve for developers, with a long cycle time from training to building and deploying models. The VentureBeat account quoted a paper from the University of Cambridge showing that companies can spend 90 days or longer to deploy a single machine learning model.
And the propensity of LLMs to hallucinate, generating output that seems plausible but is not true, can limit them to use cases that can tolerate that risk. Detecting and mitigating hallucinations is an ongoing challenge. “If you’re using this for a high-stakes problem, you don’t want to insult your customer, or get bad medical information, or use it to drive a car and take risks there. That’s still a problem,” stated Gary Marcus, cognitive scientist, member of the faculty at New York University, and author of the Marcus on AI newsletter on Substack.
SLMs Seen As Able to Address “99 percent” of Use Cases
Small language models (SLMs) are more streamlined versions of LLMs, with fewer parameters and simpler designs that require less data and shorter training times. SLMs are suited for specific applications, with a more focused scope. They can be fine-tuned for specific domains or tasks, such as sentiment analysis or domain-specific question answering. The smaller codebase makes SLMs easier to secure and thus attractive for handling sensitive data, such as in healthcare or finance. The reduced computation requirements of SLMs make them more practical to run locally on on-premise servers, thus avoiding some cloud computing costs.
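A rough back-of-the-envelope calculation (an illustration, not from the article) shows why parameter count drives the hardware requirements. Holding a model's weights in memory at 16-bit precision takes about two bytes per parameter:

```python
def model_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the model weights.

    Assumes 16-bit (fp16/bf16) weights at 2 bytes per parameter;
    ignores activations, optimizer state, and other runtime overhead.
    """
    return num_params * bytes_per_param / 1e9


# A 7-billion-parameter SLM needs roughly 14 GB for its weights,
# in range for a single GPU or a well-equipped laptop.
slm_gb = model_memory_gb(7e9)

# A hypothetical trillion-parameter LLM needs roughly 2,000 GB,
# which only a multi-GPU cluster can hold.
llm_gb = model_memory_gb(1e12)
```

The gap of two orders of magnitude in memory alone is what makes local, on-premise deployment of SLMs practical where LLMs require the cloud.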
The CEO of HuggingFace, Clem Delangue, suggested in the VentureBeat account that up to 99 percent of use cases could be addressed using SLMs. As a result of a partnership between HuggingFace and Google announced earlier this year, the HuggingFace platform has been integrated into Vertex AI from Google. This gives developers the ability to quickly deploy models through the Google Vertex Model Garden.
In February, Google introduced Gemma, a series of small language models that can run on smartphones, tablets or laptops without special hardware or extensive optimization. CodeGemma is one example, a version focused on coding and mathematical reasoning.
Deploying SLMs at the edge makes real-time, personalized and secure applications more practical in finance, healthcare and automotive systems, the VentureBeat authors suggest. “By processing data locally and reducing reliance on cloud infrastructure, edge computing with SLMs enables faster response times, improved data privacy, and enhanced user experiences,” the authors stated.
The initial Gemma models were available in 2-billion-parameter and 7-billion-parameter versions; Google announced a 27-billion-parameter version in May. The Gemma models have been downloaded “millions of times” across the services where they are available, stated Josh Woodward, VP of Google Labs, in an account in TechCrunch prior to the latest release. He noted that Google optimized the 27-billion-parameter model to run on Nvidia’s next-gen GPUs, a single Google Cloud TPU host and the managed Vertex AI service.
Apple and Microsoft have also recently announced SLMs. Apple introduced its Apple Intelligence models with some 3-billion parameters in June at its Worldwide Developers Conference; Microsoft released the Phi-3 family of SLMs with models ranging between 3.8 billion and 14 billion parameters, according to an account in IEEE Spectrum.
Tests were devised to evaluate how well a model understands language by feeding it questions about mathematics, philosophy, law and more. Microsoft’s Phi-3-small, with 7 billion parameters, performed better than OpenAI’s GPT-3.5 in many of the benchmarks, IEEE Spectrum reported.
Training On Higher-Quality Data Found To Improve Model Performance
Being trained on higher-quality data can also improve a model’s performance, found language model researcher Aaron Mueller of Northeastern University, Boston. The Microsoft Phi models were trained on “textbook-quality” data that has a more consistent style and is easier to learn from than the mess of text from the internet that LLMs typically rely on, he stated. Apple has also trained its SLMs on “richer and more complex datasets.”
SLMs can run on the resources available on smartphones and laptops, without going to the cloud. In March, Google rolled out Gemini Nano to its Pixel line of smartphones. The capabilities of this SLM include summaries of audio recordings and replies to conversations without an Internet connection, the IEEE Spectrum authors reported.
Northeastern’s Mueller is happy that the smaller models cost far less than the hefty licensing fees charged by the big cloud operators; he sees this opening up AI development to a more diverse group, many capable of producing specialized applications.
The big model players are also facing some competition from open-source models, such as a new release called Molmo from the Allen Institute for AI (Ai2), a research nonprofit. The group reported Molmo’s performance can rival that of top models from OpenAI, Google and Anthropic, according to a September 25 account in MIT Technology Review.
The organization maintains that its biggest Molmo model, with 72 billion parameters, outperforms OpenAI’s GPT-4o, estimated to have over a trillion parameters, in tests measuring understanding of images, charts and documents.
Open-source AI development is now on a par with closed, proprietary models, maintains Ali Farhadi, the CEO of Ai2. Plus, open-source models have the advantage of allowing applications to be built on top of them.
The process of training LLMs on data sets of several trillion tokens scraped from the internet introduces a lot of noise into the training data, which can result in hallucinations, stated Ani Kembhavi, a senior director of research at Ai2. Molmo models, by contrast, have been trained on smaller, more curated data sets containing 600,000 images, and have between 1 billion and 72 billion parameters. The focus on high-quality data leads to good performance with fewer resources, Kembhavi maintains.
“We’re excited about enabling others and seeing what others would build with this,” stated Farhadi.
Read the source articles and information in MIT Technology Review (October 25 account), VentureBeat, TechCrunch, IEEE Spectrum and MIT Technology Review (September 25 account).