Introduction
With the rapid growth of artificial intelligence, and especially of machine learning models that train on web data, the following issues arise:
- these models themselves train on (poorly) generated data over and over again,
- users don't know whether the content is generated or not,
- and search engines cannot decide the quality of content.
Currently, there is no standard way for website owners to express that AI models (partly) generated their content. This proposal seeks to address this issue by introducing a new HTML meta tag called ai-generated.
The Proposed Solution
I propose the introduction of an HTML meta tag named ai-generated. This tag would have a content attribute with the following possible values:
- all: The whole main content was generated by AI
- partially: The content was co-authored by AI
- none: None of the content was generated by AI
- unknown (internal value?): It is unknown whether the content was generated. This value should be assumed in case of an absence of the meta tag
The tag would appear in the <head> of an HTML document. For example:
<meta name="ai-generated" content="partially">
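A consumer of this tag could read it roughly as follows. This is a minimal sketch in Python using only the standard library's html.parser; the names AIGeneratedParser and read_ai_generated are hypothetical, and treating an unrecognized value like a missing tag is an assumption on my part, not something the proposal specifies. For brevity the sketch does not check that the tag actually sits inside <head>.

```python
from html.parser import HTMLParser

# The four values named in the proposal.
ALLOWED_VALUES = {"all", "partially", "none", "unknown"}


class AIGeneratedParser(HTMLParser):
    """Collects the content of the first <meta name="ai-generated"> tag."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching meta tag is considered.
        if tag != "meta" or self.value is not None:
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        if name == "ai-generated":
            self.value = (attributes.get("content") or "").strip().lower()


def read_ai_generated(html):
    """Return the declared value, or "unknown" if absent or unrecognized."""
    parser = AIGeneratedParser()
    parser.feed(html)
    # Assumption: anything outside the proposed values falls back to "unknown".
    if parser.value in ALLOWED_VALUES:
        return parser.value
    return "unknown"
```

Falling back to unknown keeps the behavior for pages without the tag identical to the behavior for pages with a mistyped value, which seems like the least surprising default.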
Use Cases
Below are some examples of when the ai-generated meta tag could be used:
1. Let search engines know the content was (partially) generated by AI
Websites use AI-generated content in different ways. In the future, search engines might be able to tell that content was generated by AI (because they generated it themselves), and could automatically de-rank websites that do not provide the meta tag.
2. Let users know the content was (partially) generated by AI
When browsers see this meta tag, they could visually indicate that parts of the website were authored by AI, telling the user to treat the information with caution.
3. Let AI know that this content was generated by AI
AI models that train on web data should be aware that the content was itself generated, and thus, that the information might be flawed.
Examples
Below are examples of how to use the ai-generated meta tag:
1. The whole (main) content was generated by AI (e.g., the main chunk of text content)
<meta name="ai-generated" content="all">
2. Only parts of the content were generated by AI
<meta name="ai-generated" content="partially">
3. Nothing on this website was generated by AI
<meta name="ai-generated" content="none">
Existing Solutions
We have two existing tags that could solve this problem, but we would have to standardize the use:
1. Meta Generator
<meta name="generator" content="Chat-GPT">
The meta generator tag indicates that the structure of the document has been generated by software. In my opinion, this tag is fine as it is, but it solves a different problem. It could, however, be used to indicate that the structure of a website was generated by AI.
2. Meta Author
This tag is more interesting as it does exactly what was proposed. But its use would have to be standardized in order to be useful:
The content was fully created by AI:
<meta name="author" content="AI">
The content was co-authored by AI:
<meta name="author" content="Me, AI">
The content was not created by AI:
<meta name="author" content="Me">
In my opinion, having a dedicated meta tag for ai-generated is the better solution.
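To show why the meta author approach would need standardizing, here is a minimal Python sketch of how a consumer might map the three author examples above onto all, partially, and none. The function name classify_authors is hypothetical, and the conventions it relies on (a comma-separated author list, and the literal token "AI" marking an AI author) are assumptions taken from the examples, not an existing standard.

```python
def classify_authors(author_content):
    """Map a meta author string to "all", "partially", "none", or "unknown"."""
    # Assumption: authors are separated by commas, as in "Me, AI".
    authors = [part.strip() for part in author_content.split(",") if part.strip()]
    if not authors:
        return "unknown"
    # Assumption: the literal token "AI" (any casing) denotes an AI author.
    ai_count = sum(1 for author in authors if author.upper() == "AI")
    if ai_count == len(authors):
        return "all"
    if ai_count > 0:
        return "partially"
    return "none"
```

Any author string that names a specific tool instead of the token "AI" would be classified as none by this sketch, which is exactly the kind of ambiguity a dedicated tag avoids.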
Other considerations
1. Why should an author use the tag?
Authors need incentives to use this tag. First of all, they contribute to the quality of AI-generated content, as AI models could then skip content that was itself generated. Second, we have to be able to identify the content that was generated. Adobe already tries this with Firefly, but we also need a mechanism for written content. So, in the future, search engines and other relevant players might penalize content that was generated and doesn't explicitly state so.
2. Schema.org
We could move the whole issue to Schema.org and call it a day. For example, if an ai-generated attribute were proposed to them, authors could indicate whether articles etc. were generated.
3. How to show which parts of content were generated by AI?
This is an unsolved problem. I am not a fan of creating a new attribute or even new tags, but currently, this might be the only way to solve it:
<span ai-generated="true">Foo</span>
Of course, this would be easier if we just used the Schema.org solution. Or maybe a combination of both.
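If the attribute approach were chosen, a consumer could collect the marked passages roughly like this. This is a minimal sketch in Python with the standard library's html.parser; the names MarkedTextParser and ai_generated_fragments are hypothetical, the ai-generated="true" attribute is only the idea floated above, and the sketch assumes every non-void element inside a marked region has an explicit closing tag.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class MarkedTextParser(HTMLParser):
    """Collects the text inside elements carrying ai-generated="true"."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # greater than 0 while inside a marked element
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth > 0:
            # Nested element inside a marked region.
            self.depth += 1
        elif dict(attrs).get("ai-generated") == "true":
            # Entering a new marked region; start a fresh fragment.
            self.depth = 1
            self.fragments.append("")

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.fragments[-1] += data


def ai_generated_fragments(html):
    """Return one text fragment per marked element, in document order."""
    parser = MarkedTextParser()
    parser.feed(html)
    parser.close()
    return parser.fragments
```

For the example above, ai_generated_fragments('<span ai-generated="true">Foo</span>') returns a list containing the single string "Foo".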
Conclusion
The proposed ai-generated meta tag provides a standard method for website owners to express that their content was (partially) generated by AI. It would promote transparency and respect for website users, contributing to a more ethical web environment for AI.
How to declare which parts of the website are generated remains unresolved and open to discussion.
Other
I copied some of the text from the issue that proposed the ai-consent meta tag, as the two proposals are very similar: #9334