Open data: tech companies seek better licences

November 28 2019

With the growing popularity of data development in the open source space, counsel at Red Hat, Microsoft and Benevolent AI say a mechanism is needed to enable better collaboration

Tech companies have embraced open data to spur product and services development – particularly for those driven by hybrid cloud and artificial intelligence solutions – but they now need better licences to enable them to protect IP when they start and collaborate on different data projects.

Companies such as IBM are driving an AI and cloud-computing ‘revolution’ but need copious amounts of data to train and develop those technologies at speed. Open data, where firms access and develop data sets made available online, has become an attractive means to achieve that goal.

The concept is becoming particularly popular among driverless car developers. Waymo, for example, recently released its first data set to researchers.

But in-house counsel at firms such as Red Hat, Zoox, Benevolent AI and Microsoft – which are also investing heavily in machine learning and cloud computing – say that because open data is so new, they are still trying to figure out how their businesses can best work on various projects.

Open data initiatives are often covered by licences that, like open source software licensing, are not truly ‘open’ and can set out either fairly permissive or restrictive terms of use.

Businesses must therefore conduct due diligence to ensure they are only held to acceptable terms for a particular project, and cover their own data sets with licences that protect company interests and IP while encouraging further contributions from other parties.

“We focus a lot on open source, and open data sharing has come up a lot recently in the driverless car industry,” says Chris Nalevanko, general counsel and head of IP strategy at Zoox in California. “We want to make sure we are participating in the right things but also being guarded as to our core IP.”

Gareth Jones, vice president of IP at Benevolent AI in London, adds that ‘open data’ does not carry a single definition. “We use data from a huge variety of different sources, some of which is proprietary commercial data we pay for [to have] a licence to, and some of which is freely available – and all kinds in between along that spectrum,” he says.

Patrick McBride, senior director of IP at Red Hat in North Carolina, says that his business is still trying to figure out the open data landscape, in much the same way as it had to figure out the open source world years ago.

He says that not only does open data help enable hybrid cloud solutions – technology that Red Hat and its parent company IBM are pursuing with great energy – it also allows businesses to take advantage of recent advances in AI.

He adds that he is now trying to understand the role open data will play in Red Hat’s overall hybrid cloud strategy and how it will enable the business to continue to provide value to its customers and subscribers.

Data’s next top model

The problem is that many open data licences are difficult to understand, vary considerably between different organisations or projects and are often not fit for purpose in modern projects.

Several organisations, including Microsoft and the Linux Foundation, are now making efforts to standardise data licensing to make it easier to understand. In-house sources say they are keeping an eye on these efforts and hope to apply the resulting licences to their own work.

Erich Andersen, chief IP counsel at Microsoft in the US, says that when the company looked at the landscape it found too many different data use licences, as well as gaps in those existing licences. It responded by launching three potential licences on GitHub for community feedback.

“They are designed with an eye of standardising terms for the most common data uses. That is an initiative we will continue,” says Andersen.

Jones says the efforts by Microsoft and the Linux Foundation are a step in the right direction. The Open Knowledge Foundation set up some open data licences a few years ago, but those are less fit for purpose today because of the way people now use data in AI and machine learning models.

“Data licensing is incredibly complicated and time consuming, and a lot of the issues surround whether different organisations – particularly SMEs and academic institutions – have the resources available to help them understand it.

“Sometimes it is just a matter of ambiguity in these licences. If you have lots of people writing their own licences, suddenly everyone is using a different language and they are not always clear, particularly when the people providing the data do not understand the new technologies.”

McBride at Red Hat adds: “It is fairly obvious that one important thing to figure out is some mechanism for collaborating on the data itself and the AI products of that data – that is, an AI that knows how to react to certain situations and address what to look for in a particular data set when presented with new data because it has been trained on open data.

“There are efforts out there to come up with template licences for different environments. We are looking at those with interest and trying to work out which would work best with ours.”

Some say that data is the oil of the 21st century – everything runs on it in one form or another, and no one can get enough of it. Open data is the way forward, but the licensing infrastructure isn’t quite good enough to encourage wide-scale use just yet. Once it is, open data will offer a deep well of opportunity for tech companies.