
Proposal: Meta Tag for AI Generated Content #9479

@evayde


Introduction

With the rapid growth of artificial intelligence, and especially of machine learning models that train on web data, the following issues arise:

  • These models themselves train on (poorly) generated data over and over again.
  • Users don't know whether the content is generated or not.
  • Search engines cannot judge the quality of content.

Currently, there is no standard way for website owners to express that AI models (partly) generated their content. This proposal seeks to address this issue by introducing a new HTML meta tag called ai-generated.

The Proposed Solution

I propose the introduction of an HTML meta tag named ai-generated. This tag would have a content attribute with the following possible values:

  • all: The whole main content was generated by AI
  • partially: The content was co-authored by AI
  • none: None of the content was generated by AI
  • unknown (internal value?): It is unknown whether the content was generated. This value should be assumed when the meta tag is absent

The tag would appear in the <head> of an HTML document. For example:

<meta name="ai-generated" content="partially">
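A minimal sketch of how a consumer (a crawler, for instance) might read the tag, assuming the four values listed above. The names `AiGeneratedMetaParser` and `read_ai_generated` are hypothetical, and two behaviours are my own assumptions rather than part of the proposal: the first matching tag wins, and an unrecognized value is treated like a missing tag. The sketch also does not check that the tag sits inside <head>.

```python
from html.parser import HTMLParser

# The values named in the proposal; "unknown" is the fallback.
ALLOWED_VALUES = {"all", "partially", "none", "unknown"}


class AiGeneratedMetaParser(HTMLParser):
    """Records the content of the first <meta name="ai-generated"> tag."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        # Assumption: only the first matching tag counts.
        if tag != "meta" or self.value is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "ai-generated":
            self.value = (attributes.get("content") or "").strip().lower()


def read_ai_generated(html_text):
    """Return the declared value, or "unknown" if the tag is absent."""
    parser = AiGeneratedMetaParser()
    parser.feed(html_text)
    # Assumption: unrecognized values are handled like a missing tag.
    if parser.value in ALLOWED_VALUES:
        return parser.value
    return "unknown"
```

Calling `read_ai_generated` on a page without the tag yields "unknown", which matches the default described above.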

Use Cases

Below are some examples of when the ai-generated meta tag could be used:

1. Let search engines know the content was (partially) generated by AI

Websites use AI-generated content in different ways. In the future, search engines might be able to tell that content was generated by AI (for example, because they generated it themselves) and could automatically de-rank websites that do not provide the meta tag.

2. Let users know the content was (partially) generated by AI

When browsers see this meta tag, they could visually indicate that parts of the website were authored by AI, telling the user to treat the information with caution.

3. Let AI know that this content was generated by AI

AI models that train on web data should be aware that the content was itself generated by AI and that, as a result, the information might be flawed.
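As a rough illustration of this use case, a training pipeline could filter pages by the declared value. This is only a sketch under stated assumptions: `keep_for_training` and its two flags are hypothetical names, and the default policy (drop "all", drop "partially" unless opted in, keep "unknown") is one possible choice, not something this proposal prescribes.

```python
def keep_for_training(ai_generated_value, allow_partial=False, allow_unknown=True):
    """Decide whether a page should enter a training corpus.

    ai_generated_value is the content of the ai-generated meta tag,
    or None when the page does not carry the tag.
    """
    value = (ai_generated_value or "unknown").strip().lower()
    if value == "none":
        return True           # declared as not AI-generated
    if value == "partially":
        return allow_partial  # policy choice (assumption)
    if value == "all":
        return False          # fully generated content is skipped
    # "unknown" and any unrecognized value fall through to this policy flag.
    return allow_unknown
```

Because a missing tag means "unknown", setting `allow_unknown=False` would exclude every page that has not adopted the tag, so that flag defaults to True here.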

Examples

Below are examples of how to use the ai-generated meta tag:

1. The whole (main) content was generated by AI (e.g., the main chunk of text content)
<meta name="ai-generated" content="all">

2. Only parts of the content were generated by AI
<meta name="ai-generated" content="partially">

3. Nothing on this website was generated by AI
<meta name="ai-generated" content="none">

Existing Solutions

We have two existing tags that could solve this problem, but we would have to standardize their use:

1. Meta Generator
<meta name="generator" content="ChatGPT">

The meta generator tag indicates that the document was generated by a piece of software. In my opinion, this is useful but solves a different problem. It could, however, be used to indicate that the structure of a website was generated by AI.

2. Meta Author
This tag is more interesting as it does exactly what was proposed. But its use would have to be standardized in order to be useful:

The content was fully created by AI:
<meta name="author" content="AI">

The content was co-authored by AI:
<meta name="author" content="Me, AI">

The content was not created by AI:
<meta name="author" content="Me">
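To show what such a standardization would involve, here is a sketch that maps an author value onto the values proposed above. It assumes two conventions that do not exist today: authors are separated by commas, and the literal name "AI" marks a machine author. `classify_author` and `ai_names` are hypothetical names.

```python
def classify_author(author_content, ai_names=frozenset({"ai"})):
    """Map a meta author value to all / partially / none / unknown."""
    # Assumption: authors are comma-separated and compared case-insensitively.
    authors = [name.strip().lower() for name in author_content.split(",")]
    authors = [name for name in authors if name]
    if not authors:
        return "unknown"
    ai_count = sum(1 for name in authors if name in ai_names)
    if ai_count == 0:
        return "none"
    if ai_count == len(authors):
        return "all"
    return "partially"
```

With these conventions, "AI" maps to "all", "Me, AI" to "partially", and "Me" to "none". A human author who happens to be named "Ai" would be misclassified, which illustrates how fragile an overloaded author tag would be.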

In my opinion, having a dedicated meta tag for ai-generated is the better solution.

Other considerations

1. Why should an author use the tag?
Authors need incentives to use this tag. First, they contribute to the quality of future AI models, as those models could skip content that is marked as generated. Second, we need to be able to identify generated content. Adobe already attempts this with Firefly, but we also need a mechanism for written content. In the future, search engines and other relevant players might penalize content that was generated but does not explicitly say so.

2. Schema Org
We could move the whole issue to Schema.org and call it a day. For example, if we proposed an ai-generated property to them, authors could indicate whether articles etc. were generated.

3. How to show which parts of content were generated by AI?
This is an unsolved problem. I am not a fan of creating a new attribute or even new tags, but currently, this might be the only way to solve it:

<span ai-generated="true">Foo</span>

Of course, marking individual parts would be easier if we just used the Schema.org solution, or maybe a combination of both.
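If an attribute like the one in the span example were adopted, a consumer could collect the marked passages roughly as follows. This is a sketch only: the ai-generated attribute is hypothetical, as are the names `AiGeneratedTextCollector` and `extract_ai_generated_text`, and the depth tracking assumes well-formed markup in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Void elements never have an end tag, so they must not change the depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class AiGeneratedTextCollector(HTMLParser):
    """Collects the text inside elements marked ai-generated="true"."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._depth = 0  # nesting depth inside a marked element, 0 = outside

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self._depth > 0:
            self._depth += 1
        elif dict(attrs).get("ai-generated") == "true":
            self._depth = 1
            self.fragments.append("")  # start a new fragment

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.fragments[-1] += data


def extract_ai_generated_text(html_text):
    """Return one string per marked element, in document order."""
    collector = AiGeneratedTextCollector()
    collector.feed(html_text)
    collector.close()
    return collector.fragments
```

Text in nested elements counts as part of the enclosing marked element, so a marked span containing bold text yields a single fragment.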

Conclusion

The proposed ai-generated meta tag provides a standard method for website owners to express that their content was (partially) generated by AI. It would promote transparency and respect for website users, contributing to a more ethical web environment for AI.

How to declare which parts of the website are generated remains unresolved and open to discussion.

Other

I copied some of the text from issue #9334, which proposed the ai-consent meta tag, as the two proposals are very similar.


Labels: addition/proposal (new features or enhancements), needs implementer interest (moving the issue forward requires implementers to express interest)
