“Wide angle photo of a cat zombie, walking dead style, digital art”
Open Source is dead.
Long live Open Source Software.
“Wide angle photo of a cat wearing a king’s crown and a red cape, game of thrones style, iron throne, 3d digital art”
The Airius Risk Maturity Knowledgebase is intended to give you a snapshot of the developments affecting information risk for June 2023.
The advent of artificial intelligence, and more specifically Large Language Models (LLMs), has changed how software is developed. These LLMs are only as capable as the material they are trained on. As a result, LLMs have started to specialize, focusing on research, natural language, conversation, contracts and, for this discussion, software development.
These models have used readily available data on the internet. They have also used structured data sources to aid in the learning and the indexing of data. As a result, mankind now has access to the aggregated knowledge of these machines, which have absorbed whatever can be found on the internet.
The problem lies with the use of everything accessible on the internet and whether training an LLM for private and commercial purposes constitutes “fair use”. We will discuss this in detail.
Image credit: Columbia Copyright Office, obtained from the Library of Congress (https://www.loc.gov/exhibits/bobhope/vaude.html); transferred from en.wikipedia to Commons by User:Dichter. Public domain, https://commons.wikimedia.org/w/index.php?curid=10858426
Note: 100% of the research of this project was done with the aid of Bing-GPT. Most of the images were generated with Bing’s version of Dall-E. All sources for research are cited in the references below.
Using technology to copy protected content (copyright or copyleft), and then using that inventory to let customers bypass existing license restrictions while earning money from it, undermines the fair use argument. Using AI to bypass restrictive open source licenses is theft.
A large language model (LLM) is a type of artificial intelligence (AI) algorithm that uses deep learning techniques and massively large data sets to understand, summarize, generate and predict new content. It consists of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning. LLMs emerged around 2018 and perform well at a wide variety of tasks.
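The “self-supervised” training mentioned above deserves a brief illustration: no human labeling is required, because the text itself supplies the prediction targets. The sketch below is a simplified, hypothetical illustration of how unlabeled text becomes training pairs; the function name and corpus are invented for this example and do not come from any real LLM framework.

```python
# Hypothetical sketch: turning raw, unlabeled text into supervised
# (context, next-token) training pairs. The text itself supplies the
# labels, which is what makes the setup "self-supervised".

def make_training_pairs(tokens):
    """Build (context, next-token) pairs from a token sequence."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[:i]   # everything seen so far
        target = tokens[i]     # the token the model must learn to predict
        pairs.append((context, target))
    return pairs

tokens = "open source software is not free labor".split()
for context, target in make_training_pairs(tokens)[:3]:
    print(context, "->", target)
```

A real LLM applies this same idea at the scale of trillions of tokens, adjusting billions of weights so the predicted next token matches the actual one.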
An incomplete list of current LLM projects (there are easily hundreds of well-developed projects):
Generative AI is a type of artificial intelligence (AI) system capable of generating text, images, or other media in response to prompts. Generative AI models learn the patterns and structure of their input training data, and then generate new data that has similar characteristics.
Generative AI builds on existing technologies, like large language models (LLMs) which are trained on large amounts of text and learn to predict the next word in a sentence. For example, “peanut butter and _” is more likely to be followed by “jelly” than “shoelace”. Generative AI can not only create new text but also images, videos, or audio.
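The “peanut butter and _” intuition above can be made concrete with a toy model. The following sketch is not a real LLM; it simply counts which words follow which in a tiny made-up corpus and predicts the most frequent continuation, which is the simplest possible version of next-word prediction.

```python
from collections import Counter, defaultdict

# Toy next-word predictor (not a real LLM): count bigrams in a tiny
# corpus, then predict the most frequent continuation of a word.
corpus = [
    "peanut butter and jelly",
    "peanut butter and jelly sandwich",
    "peanut butter and shoelace",  # the rare continuation
]

following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1

def predict_next(word):
    """Return the most common word observed after `word`, if any."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("and"))  # → jelly ("jelly" occurs twice, "shoelace" once)
```

An LLM replaces these raw counts with a neural network conditioned on the entire preceding context, but the objective is the same: score the likely continuations.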
Generative AI has potential applications across a wide range of industries, including art, writing, software development, healthcare, finance, gaming, marketing, and fashion. However, there are also concerns about the potential misuse of generative AI, such as in creating fake news or deepfakes which can be used to deceive or manipulate people.
Kate Downing explained in her blog:
“The crux of the USCO’s refusal to recognize any copyright interest in the images rests on the idea that Midjourney’s output is unpredictable and that the prompts users provide to it are mere suggestions, with too much “distance between what a user may direct Midjourney to create and the visual material Midjourney actually produces” such that “users lack sufficient control over generated images to be treated as the “mastermind” behind them.” Repeatedly, the USCO seems to argue that the final result has to reflect the artist’s “own original conception,” even going so far as to argue that the “process is not controlled by the user because it is not possible to predict what Midjourney will create ahead of time.”
The ownership of code generated by AI tools like GitHub Copilot is a topic of active debate and legal dispute. Lawsuits have been filed against Microsoft, GitHub and OpenAI alleging that the AI-powered coding assistant GitHub Copilot relies on “software piracy on an unprecedented scale”. The key question in the lawsuits is whether open-source code can be reproduced by AI without its attached licenses.
According to GitHub, the suggestions generated by Copilot and the code you write with its help belong to you and you are responsible for it. However, there have been instances where Copilot has been found to regurgitate long sections of licensed code without providing credit.
It’s a complex issue, and the legal landscape is still evolving. We recommend consulting a lawyer for specific guidance on this topic.
The question of whether works created by generative AI can be copyrighted is a complex one, and the legal landscape around this issue is still evolving. According to the U.S. Copyright Office, there is no copyright protection for works created by non-humans, including machines. However, some argue that AI-generated works should be eligible for copyright protection because they are the product of complex algorithms and programming.
Fair use is a legal doctrine that allows for the use of copyrighted material without permission under certain circumstances. It permits a party to use a copyrighted work without the copyright owner’s permission for purposes such as criticism, comment, news reporting, teaching, scholarship, or research.
Four factors must be considered in deciding whether a use constitutes fair use:
1. the purpose and character of the use;
2. the nature of the copyrighted work;
3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. the effect of the use upon the potential market for or value of the copyrighted work.
For the reasons outlined above, AI engines are not using research samplings of code in order to learn how code works. They grabbed ALL code and now offer a convenient interface to it, a way for users to unknowingly bypass license obligations while solving coding challenges. For a fee, customers get access to a stolen inventory of code offered by GitHub and Microsoft.
We are amazed at the number of submissions we have gotten to date, but even more so, we are incredibly grateful to over 150 core contributors who have devoted their time and resources to helping us provide up-to-date information. Send your stories and announcements to email@example.com.
The Risk Maturity Knowledgebase restarts an effort that we began in 2007. With hundreds of volunteers, interns and staff members at the time, along with over 60 weekly translations, our predecessor became the standard for GPL and open source security information.
Can you translate the blog? Please reach out.
If we can help you with risk management, SOC reporting, an emergency or you just need guidance with INFOSEC or IP issues, please reach out to us.
At Airius, we depend on our friends at A-Lign to provide auditors and experience with the SOC reporting and auditing process. We work closely with companies to get them through it.