
Data Licensing Tips From an IP Lawyer, from a Marketing VP and from an AI Developer
When licensing data needed to train machine learning algorithms, marketers and AI developers need to be on top of contracts that spell out rights and restrictions for acceptable use of the data.
By John P. Desmond, Editor, AI in Business

Best practices in data licensing tend to cover similar territory, whether viewed from the perspective of a lawyer, a marketer or an AI developer.
Data is the fuel of AI software development. Historical and relevant data are assembled to train algorithms for machine learning models. Privacy and regulatory restrictions, along with the intellectual property rights of the data supplier, factor into the use rights.
“In the United States, contracts are the most important source of rights and restrictions for data use, laying out between companies what is acceptable data use and what isn’t,” stated Anna Remis, a partner in the Technology and IP Transactions practice at Sidley Austin in San Francisco, in a recent account from Law.com.
“If you use third-party data, the question of whether a data owner could assert rights in your technology always requires a review of contractual restrictions. Getting clear contractual permissions is a best practice,” Remis stated.
She had the following tips for developers of technology products and platforms who plan to license data:
Carefully define your contract. Note the technology each company is contributing and who owns improvements and derivative works. A third-party dataset could be considered intellectual property. A developer needs to know whether a developed product is considered an improvement or a joint development.
Check obligations to share insights back to the data licensor. “Data is a valuable form of currency, and these insights may be part of the price you pay for the data,” Remis stated. Any exclusivity required could encumber the technology.
Negotiate. To get rights to use data beyond the immediate services offered, developers need to negotiate the terms. If, for example, data is licensed for a pilot program, can the data be used beyond the pilot?
Demand For Data Seen Increasing As Google Shuts Down Cookies
Marketers who have relied on the tracking power of cookies via Google’s Chrome browser need to make other plans, since Google is phasing out third-party cookies on Chrome browsers this year. While it seemed possible that Google might develop alternate ways to track consumer identifiers, Google announced last fall that this would not be the case.
“We don’t believe these solutions will meet rising consumer expectations for privacy, nor will they stand up to rapidly evolving regulatory restrictions,” Google stated in a blog post in March 2021.
In describing “first-party relationships,” the Google post written by David Temkin, Director of Product Management, Ads Privacy and Trust for Google, stated, “Developing strong relationships with customers has always been critical for brands to build a successful business, and this becomes even more vital in a privacy-first world.”
The impact on marketers will be substantial. In 2019, Google Chrome made up more than 56 percent of the web browser market, and Chrome accounted for more than half of all global web traffic, according to Statista.
“Having access to only authenticated traffic has clients missing out on nearly 90 percent of consumers,” stated Keelia Schumacher, Senior Director of Sales with Data Axle, a provider of data and marketing services, in a recent account on the company’s blog. She suggested enhancing first-party data and trying multiple hashed emails per individual, to help ensure better targeting in a “consumer-first advertising world.”
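For readers unfamiliar with the term, a hashed email is a one-way digest of an email address that can be matched against another party's digests without exchanging the address itself. The sketch below is illustrative only and is not Data Axle's method: the function names, the choice of SHA-256, and the trim-and-lowercase normalization are assumptions, and the addresses are made up.

```python
import hashlib
from typing import Iterable, Set


def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of a normalized email address.

    Normalization here (trim whitespace, lowercase) is an assumption;
    a real data partner may specify different rules.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hashed_emails_for(emails: Iterable[str]) -> Set[str]:
    """Hash every non-empty address for one individual, dropping duplicates."""
    return {hash_email(e) for e in emails if e.strip()}


# Hypothetical addresses belonging to a single individual.
person_emails = [
    "Jane.Doe@example.com ",
    "jane.doe@example.com",   # same address after normalization
    "jdoe@work.example",
]

hashes = hashed_emails_for(person_emails)
print(len(hashes))  # prints 2
```

Keeping several hashes per individual gives more chances of a match when a consumer uses different addresses with different services.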
When selecting data sources, marketers need to consider data coverage, data privacy and simplicity of integration, suggested Doug Parsonage, VP of business development for Data Axle, in the same blog post. The data provider would ideally have a diverse set of sources to draw from, since “there is no single registry to lean on,” he stated.
Also, marketers need to carefully evaluate data quality and choose data partners with “exhaustive processes to verify the information they collect,” he stated.
Open Source Permissive Licenses Trending Up
Developers building AI systems that incorporate open source code should look to companies with “permissive licenses” that place minimal limitations on users, suggested Patricia Johnson, open source licensing and security expert with WhiteSource, in a recent blog post. WhiteSource offers tools for software composition analysis.
“Creators attach permissive licenses to their open source projects because they want to reach as wide an audience as possible,” Johnson stated. The alternative license, called “copyleft,” requiring the release of derivative works to the open source community, is falling out of favor.
WhiteSource collects information from its database of more than four million open source packages and 130 million open source files in 200 programming languages.
The primary permissive licenses, the MIT License and the Apache License 2.0, are increasing in use, while the primary copyleft licenses, the GNU series and Mozilla, are declining in use, WhiteSource has found.
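A developer who wants a rough view of how a project's dependencies split between the two families could tally them, as in the minimal sketch below. It is not WhiteSource's tooling: the mapping covers only a few SPDX license identifiers, it follows the article's grouping of the GNU and Mozilla licenses as copyleft, and the function and dependency names are hypothetical.

```python
from collections import Counter
from typing import Dict

# Partial, illustrative mapping from SPDX license identifiers to the
# two families discussed in the article. Not an exhaustive list.
LICENSE_FAMILY = {
    "MIT": "permissive",
    "Apache-2.0": "permissive",
    "GPL-2.0-only": "copyleft",
    "GPL-3.0-only": "copyleft",
    "LGPL-3.0-only": "copyleft",
    "AGPL-3.0-only": "copyleft",
    "MPL-2.0": "copyleft",
}


def tally_license_families(dependencies: Dict[str, str]) -> Counter:
    """Count dependencies per license family; unmapped licenses are 'unknown'."""
    counts = Counter()
    for spdx_id in dependencies.values():
        counts[LICENSE_FAMILY.get(spdx_id, "unknown")] += 1
    return counts


# Hypothetical dependency list: package name -> SPDX identifier.
deps = {
    "fast-tokenizer": "MIT",
    "model-server": "Apache-2.0",
    "stats-toolkit": "GPL-3.0-only",
    "vendor-sdk": "LicenseRef-Proprietary",
}

for family, count in sorted(tally_license_families(deps).items()):
    print(family, count)
```

Anything reported as "unknown" or "copyleft" is a prompt to read the license text or consult counsel, not a verdict on whether the dependency can be used.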
“The tension between creating a viable business model and maintaining a robust and successful open source project continues to grow,” Johnson stated. “We will continue to see open source projects struggling to find the balance between making a profit and being supportive members of the open source community.”
Read the source articles and information in Law.com, in a March 2021 blog post from Google, in a blog post from Data Axle and in a blog post from WhiteSource.