Large Language Models are Increasing in Size, With No End in Sight
As GPT-3 was superseded in 2021 by a number of larger language models, efforts arose to study the impact of LLMs, out of concern that their use poses unknown risks.
By John P. Desmond, Editor, AI in Business
Large language models (LLMs) are getting bigger.
OpenAI released the GPT-3 large language model in June 2020, the largest language model ever built at the time. In 2021, it was superseded in size by multiple models.
Developers of AI systems are interested in testing how GPT-3 can help them meet business objectives. Among its advantages for developers is its ability, owing to its sheer size, to generalize across language tasks it was not specifically trained on.
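In practice, that kind of generalization is usually exercised through "few-shot" prompting: the task is described with a handful of examples inside the prompt itself, and the model is asked to continue the text, with no additional training. The sketch below, in Python with examples invented purely for illustration, shows the shape of such a prompt.

```python
# Minimal sketch of few-shot prompting: a sentiment-labeling task is "taught"
# entirely through examples embedded in the prompt, not through fine-tuning.
# The reviews below are invented for illustration.
few_shot_prompt = """Label the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It broke after a week and support never answered.
Sentiment: Negative

Review: Setup took five minutes and everything just worked.
Sentiment:"""

# A sufficiently large model asked to continue this text is expected to answer
# "Positive", even though it was never fine-tuned on this labeling task.
print(few_shot_prompt)
```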
“We thought we needed a new idea, but we got there just by scale,” stated Jared Kaplan, a researcher at OpenAI and one of the designers of GPT-3, during a panel discussion in December at the NeurIPS AI conference, according to an account from the MIT Technology Review.
A pair of Microsoft researchers were quoted as stating, “We continue to see hyperscaling of AI models leading to better performance, with seemingly no end in sight,” in a blog post announcing the company's Megatron-Turing NLG model built in collaboration with the GPU chipmaker Nvidia.
The size of a model, which is a trained neural network, is measured by the number of parameters it has. These values are tuned over and over again as the model is trained, in pursuit of more accurate predictions. In general, the more parameters a model has, the more information it can glean from training data and thus the more accurate its predictions based on fresh data will be.
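As a rough illustration of what "number of parameters" means, the sketch below (assuming PyTorch, with a toy architecture invented for this example rather than an actual LLM design) counts the trainable weights of a small network; each of those values is what training repeatedly adjusts.

```python
import torch.nn as nn

# Toy network invented for illustration; real LLMs stack transformer layers,
# but "size" is measured the same way: count every trainable weight.
model = nn.Sequential(
    nn.Embedding(num_embeddings=50_000, embedding_dim=256),  # token embeddings
    nn.Linear(256, 1024),
    nn.ReLU(),
    nn.Linear(1024, 50_000),                                  # vocabulary logits
)

# Sum the element counts of all trainable tensors; the model is never run here.
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{num_params:,} trainable parameters")
```

By this count the toy model has roughly 64 million parameters; GPT-3's 175 billion is about 2,700 times larger.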
Competing Large Language Models
GPT-3 has 175 billion parameters, more than 100 times as many as its 1.5-billion-parameter predecessor GPT-2, according to the MIT Technology Review account. In September 2021, Israeli startup AI21 Labs released Jurassic-1, a large language model with 178 billion parameters. DeepMind in December released Gopher, a new model with 280 billion parameters. The Megatron-Turing NLG model has 530 billion parameters, and Google’s Switch-Transformer and GLaM models have one and 1.2 trillion parameters, respectively.
In China, tech giant Huawei released PanGu, a 200-billion-parameter language model, in 2021. The Beijing Academy of AI announced Wu Dao 2.0, with 1.75 trillion parameters. And the South Korean internet search firm Naver announced HyperCLOVA, a model with 204 billion parameters.
Many unknowns surround a more widespread use of LLMs. Their potential for bias was flagged by ex-Google ethicist Timnit Gebru. Google seemed uncomfortable with the public discussion of the issue; Gebru and her colleague at Google, Margaret Mitchell, have both since departed the company. Their concerns remain topical. The original GPT-3 team acknowledged in a paper that “Internet-trained models have internet-scale biases,” according to the MIT Technology Review account.
BigScience Initiative Researching Impact of LLMs
The concern about the impact of large language models has led to the formation of a consortium called the BigScience initiative, launched by AI company Hugging Face. Some 600 researchers from 50 countries and more than 250 institutions are volunteering time to help build and study an open source language model as a shared resource. The group is trying to answer the question of how and when LLMs should be developed and deployed to deliver benefits without harmful consequences, according to another MIT Technology Review account.
“We can’t really stop this craziness around large language models, where everybody wants to train them,” stated Thomas Wolf, the chief science officer at Hugging Face and co-leader of the initiative. “But what we can do is try to nudge this in a direction that is in the end more beneficial.”
Hugging Face describes itself as “on a journey to advance and democratize NLP [natural language processing] for everyone,” contributing technology in the process. Over 5,000 organizations are using its models, which are offered free on the site, along with inference APIs to help developers plug into its infrastructure and Transformers, the company’s natural language processing library. The About page stated, “We are on a mission to democratize good machine learning, one commit at a time.”
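As a minimal sketch of how a developer might use one of those freely offered models, assuming the Hugging Face transformers Python library is installed, the snippet below loads a small, openly hosted model (GPT-2) and generates a continuation of a prompt; it is an illustration of the library’s pipeline interface, not the setup the much larger models discussed above would require.

```python
from transformers import pipeline

# Download a small, openly hosted model (GPT-2) from the Hugging Face hub
# and wrap it in a ready-made text-generation pipeline.
generator = pipeline("text-generation", model="gpt2")

# Ask the model to continue a prompt; max_new_tokens caps the output length.
result = generator(
    "Large language models are getting bigger because",
    max_new_tokens=30,
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```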
The French government has given the consortium a grant to fund use of its Jean Zay supercomputer for the development effort. Jean Zay is a converged platform overseen by the French Ministry of Higher Education, Research and Innovation, located near Paris, France.
Gebru spoke at NeurIPS 2021, giving a talk entitled “Beyond Fairness in Machine Learning.” She suggested that if fake news, hate speech, and death threats are not moderated out of the data being scraped to create the LLM, the model can potentially parrot them back. Gebru has called LLMs “stochastic parrots.”
The consortium has formed a dozen working groups. One is focused on developing responsible ways to source training data, with the goals of avoiding toxic language and emphasizing permission-based data sharing.
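As a purely hypothetical sketch of the kind of filtering such a working group might consider, the snippet below keeps only documents whose license appears on a permission whitelist and whose text contains no blocked terms; the license names and blocklist entries are placeholders, not BigScience’s actual pipeline.

```python
# Hypothetical corpus filter: permission-based licensing plus a term blocklist.
# This is an illustration only, not BigScience's actual data-sourcing pipeline.
ALLOWED_LICENSES = {"cc0", "cc-by", "public-domain"}   # placeholder whitelist
BLOCKED_TERMS = {"blocked_term_1", "blocked_term_2"}   # placeholder blocklist

def keep_document(doc: dict) -> bool:
    """Keep a document only if its license permits reuse and no blocked term appears."""
    if doc.get("license", "").lower() not in ALLOWED_LICENSES:
        return False
    text = doc.get("text", "").lower()
    return not any(term in text for term in BLOCKED_TERMS)

corpus = [
    {"text": "An openly licensed article about gardening.", "license": "CC-BY"},
    {"text": "A scraped page with no stated usage permission.", "license": "unknown"},
]
filtered = [d for d in corpus if keep_document(d)]
print(f"kept {len(filtered)} of {len(corpus)} documents")
```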
The project will run through May 2022, with the researchers hoping then to offer tools and best practices for building and deploying LLMs in a responsible way.
“NLP is at a very important turning point,” stated Karën Fort, an associate professor at Sorbonne University in Paris. The BigScience project “allows the community to push the research forward and provide a hopeful alternative to the status quo within industry.”
In addition to Hugging Face, the consortium is being bootstrapped by GENCI, which coordinates French high-performance computing centers and is 49 percent owned by the French state, and IDRIS (the Institute for Development and Resources in Intensive Scientific Computing), a center of excellence in intensive numerical calculations.
DeepMind Researching Ethical and Social Risks of LLMs
The Google DeepMind unit in December 2021 released several research papers related to LLMs. The authors are Jack Rae, Geoffrey Irving and Laura Weidinger, all researchers at DeepMind. In a blog post, the authors described their rationale for releasing the papers together. It stated in part, “As part of a broader portfolio of AI research, we believe the development and study of more powerful language models – systems that predict and generate text – have tremendous potential for building advanced AI systems that can be used safely and efficiently to summarize information, provide expert advice and follow instructions via natural language.”
The authors added, “Developing beneficial language models requires research into their potential impacts, including the risks they pose. This includes collaboration between experts from varied backgrounds to thoughtfully anticipate and address the challenges that training algorithms on existing datasets can create.”
Among the papers is a discussion of the ethical and social risks of LLMs. The authors present a taxonomy of risks related to language models, categorized into six areas. They are: discrimination, exclusion and toxicity; information hazards; misinformation harms; malicious uses; human-computer interaction harms; and automation, access and environmental harms.
The authors proposed that the taxonomy “serves as a foundation for experts and wider public discourse to build a shared overview of ethical and social considerations on language models, make responsible decisions, and exchange approaches to dealing with the identified risks.”
The authors cite two areas in particular that require more work. First, current benchmarking tools are not sufficient for assessing some important risks, for example, when language models output misinformation and people trust it to be true. Second, “more work is needed on risk mitigations.” For example, LLMs “are known to reproduce harmful social stereotypes, but research on this problem is still in early stages.”
The challenges were outlined in another DeepMind blog post entitled “Challenges in Detoxifying Language Models.”
Learn more at MIT Technology Review and in a Google DeepMind blog post.