California Law Will Require AI Developers to Disclose Training Data

California has passed Assembly Bill 2013, requiring generative AI developers to publicly disclose details about their training data starting January 1, 2026. The Generative Artificial Intelligence Training Data Transparency Act is one of the most comprehensive U.S. rules on AI disclosure, and it could strengthen copyright lawsuits while raising compliance burdens for companies operating in the state.
What you should know: The law mandates detailed public disclosures about the datasets used to train AI models, including their sources, availability, and size, and whether they contain copyrighted material or personal information.
- Developers must publish information on their websites about data sources, whether datasets are publicly available or proprietary, their size and type, and the period during which the data was collected (a hypothetical structured sketch of these fields appears after this list).
- Bloomberg Law described AB 2013 as among the most comprehensive U.S. rules on AI disclosure, requiring companies to publish details about the data that trains their models.
- Compliance presents significant challenges, particularly for models that have evolved over time using data from diverse sources that may lack clear ownership records or licensing information.
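AB 2013 does not prescribe a publication format, but the required fields are concrete enough to sketch as structured data. The snippet below is a minimal, hypothetical representation of a single dataset disclosure; the class and field names are illustrative assumptions, not statutory terms.

```python
from dataclasses import dataclass

# Hypothetical record for one training-dataset disclosure in the spirit of
# AB 2013. All names here are illustrative assumptions, not statutory text.
@dataclass
class DatasetDisclosure:
    source: str                   # where the data came from, e.g. a URL or vendor
    publicly_available: bool      # public dataset vs. proprietary/licensed
    size_description: str         # approximate size, e.g. "1.5 TB of text"
    data_types: list[str]         # e.g. ["text", "images", "audio"]
    collection_period: str        # time span during which the data was collected
    contains_copyrighted: bool    # whether copyrighted material is included
    contains_personal_info: bool  # whether personal information is included

# One entry a developer might publish on its website:
example = DatasetDisclosure(
    source="https://example.com/open-web-crawl",  # hypothetical source
    publicly_available=True,
    size_description="approx. 1.5 TB of text",
    data_types=["text"],
    collection_period="2019-2023",
    contains_copyrighted=True,
    contains_personal_info=True,
)
```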
Why this matters: The disclosure requirements could make it easier to trace which datasets were used in training, potentially strengthening claims from copyright holders in ongoing litigation.
- Generative AI firms are already navigating lawsuits alleging that models were trained on copyrighted works without permission.
- Researchers argue that transparency could provide a foundation for independent audits and risk assessments of AI systems; a sketch of such an audit check follows this list.
- California’s regulatory approach often shapes national technology policy, from privacy rules to emissions standards, giving this law significance beyond state borders.
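If disclosures are published in a structured form like the sketch above, they become checkable with simple tooling. Below is a minimal audit check, written under the assumption of a JSON list of records using the hypothetical field names from the earlier sketch; flagging missing fields falls well short of a real audit, but it illustrates the mechanism.

```python
import json

# Field names mirror the hypothetical DatasetDisclosure sketch above;
# AB 2013 itself does not define a machine-readable schema.
REQUIRED_FIELDS = {
    "source", "publicly_available", "size_description", "data_types",
    "collection_period", "contains_copyrighted", "contains_personal_info",
}

def audit_disclosures(json_text: str) -> list[str]:
    """Return a finding for each disclosure record that omits required fields."""
    findings = []
    for i, record in enumerate(json.loads(json_text)):
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            findings.append(f"record {i}: missing {sorted(missing)}")
    return findings

# Usage with a deliberately incomplete record:
sample = '[{"source": "https://example.com/crawl", "publicly_available": true}]'
for finding in audit_disclosures(sample):
    print(finding)
```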
Industry pushback: Business and technology executives are expressing concerns about the law’s potential impact on innovation and development.
- According to The Wall Street Journal, executives warned the bill could have a “chilling effect” on development in California, with startups particularly exposed to compliance burdens.
- Some analysts argue California’s targeted strategy may prove more durable than broader regulatory approaches.
- Microsoft Chief Scientific Officer Eric Horvitz offered a contrasting view, suggesting that oversight “done properly” can accelerate AI advances by encouraging responsible data use and building public trust.
The big picture: California’s law signals that AI transparency may transition from voluntary best practice to mandatory requirement across industries.
- The broader policy debate centers on whether transparency alone will be sufficient for AI governance.
- Colorado has delayed its AI act implementation to June 2026, while financial institutions are independently moving toward clearer safeguards and responsible scaling practices.
- If the disclosure requirements prove workable, other states could follow suit, potentially creating a national standard for AI transparency.