New Research Shows Digitization Results In Routine Lock-Down Of Public Domain Books

from the what-about-our-rights? dept

The public domain is supposed to be what we receive in return for, and after the expiry of, time-limited, government-backed intellectual monopolies that are granted to creators. As Mike noted recently, that neat equation does not reflect today’s reality for copyright, where the situation is so complicated that it requires a 52-page handbook to determine whether or not something is in the public domain.

But the situation is actually far worse than that: the public is being denied access to many works that are unambiguously in the public domain because new restrictions are placed on them when they are digitized. That’s something that Techdirt has discussed before, but such stories have been largely anecdotal. Research from New Zealand provides us with more detailed information about what’s going on:

In order to establish the extent to which digitized public domain books are being restricted, a sample of 100 pre-1890 books was selected from the New Zealand National Bibliography (NZNB). This sample was chosen on the assumption that these works had entered the public domain under New Zealand copyright law. Each book in the sample was searched for within six online repositories: Google Books, Hathi Trust, Internet Archive, Early New Zealand Books (ENZB), New Zealand Electronic Text Collection (NZETC) and Project Gutenberg. In addition, Google and Bing searches were conducted for all sample books that could not be located within these repositories.

Here’s what the researchers discovered:

The findings of this research suggest that a high proportion of digitized public domain books are being restricted by online repositories. Out of a sample of 100 public domain books, only three are hosted by repositories that do not impose any form of usage restriction. Furthermore, 48 percent (24) of all digitized books [50 out of the 100 public domain sample] are hosted by a repository that restricts or blocks access, with the most restrictive repository limiting or blocking access to 91 percent (21) of sample books within its collection.

They also managed to pinpoint the key problem:

Almost all access restrictions applied to public domain books within the sample were the result of repositories using a process of estimation to assess copyright status. Within the sample, a one-minute search located accurate biographical information about authors two-thirds of the time. This task takes a fraction of the time required to digitize a book, which involves 30 minutes to scan 500 pages (Kelly, 2006).

The researchers propose the following solution:

Digitizers should incorporate the sourcing of copyright information within the overall process of digitization, and copyright estimation should only be used as an option of last resort. Furthermore, copyright estimation periods should better reflect statistical norms regarding the actual duration of copyright protection. The current estimation period of 140 years, used by Google Books and Hathi Trust, is far too conservative. If hosted under this policy, 47 percent of sample books would be restricted. This is despite the fact that all books with locatable biographical information were confirmed as being in the public domain for between 30 and 132 years.
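The recommendation quoted above amounts to a decision order: use the author's death date when it can be found, and fall back on estimation from the publication year only when it cannot. A minimal sketch in Python, under stated assumptions: the function name `assess_status`, the boundary conditions, and the 50-year term after the author's death are illustrative choices, not taken from the study or from any repository's actual policy. Only the 140-year estimation period comes from the quoted text.

```python
from typing import Optional

ESTIMATION_PERIOD = 140  # years since publication; the figure quoted in the study
ASSUMED_TERM = 50        # ASSUMPTION: years of protection after the author's death


def assess_status(pub_year: int, death_year: Optional[int], current_year: int,
                  term: int = ASSUMED_TERM,
                  estimation_period: int = ESTIMATION_PERIOD) -> str:
    """Classify a book, preferring biographical data over estimation."""
    if death_year is not None:
        # Death date known: apply the (assumed) term directly.
        if current_year > death_year + term:
            return "public domain (confirmed)"
        return "in copyright (confirmed)"
    # Last resort: estimate from the publication year alone.
    if current_year - pub_year >= estimation_period:
        return "public domain (estimated)"
    return "restricted (estimated)"


# The same hypothetical 1880 book, with and without a known death date.
print(assess_status(1880, 1900, 2014))  # public domain (confirmed)
print(assess_status(1880, None, 2014))  # restricted (estimated)
```

Under these assumptions, a one-minute search that turns up the death date clears the book immediately, while estimation alone would keep it restricted until 2020.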

This goes back to the problem of determining whether a work is in the public domain or not. Because that can be complex, those carrying out the digitization of works simply assume the worst, just to be on the safe side. That’s something that needs to change, otherwise we risk losing not just the benefits of digitized public domain works, but also our undoubted rights to access them freely.

Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+


Comments on “New Research Shows Digitization Results In Routine Lock-Down Of Public Domain Books”

24 Comments
Anonymous Coward says:

Why don’t people denounce copyright law?

Congress has the power to create copyright laws, not the responsibility. The laws aren’t necessary, effective, or proportionate. Enforcing them requires reduction of the common carrier principle and mass monitoring of who is doing what online.

I’d be curious to see a comparison of the percentage of voters who want marijuana legalized vs who want copyright law reformed. If people can spin that as pragmatic, it says something about our society when we can’t spin something that affects more than our private lives as pragmatic enough to get off our asses and tend to.

Anonymous Coward says:

Re: Re: Congress?

That’s actually part of the problem. Do the websites in the study need to conform to the copyright legislation of the country they are hosted in, or the country the requests come from, or both, or take the worst case scenario from around the world just to be safe?

I suspect host law is more likely to be involved than destination law, which means that the article isn’t, in fact, about New Zealand (law)… or perhaps it is for those sites which have local hosts in New Zealand (Google?).

See also https://www.techdirt.com/articles/20131231/23434825735/grinch-who-stole-public-domain.shtml

Anonymous Coward says:

As Mike noted recently, that neat equation does not reflect today’s reality for copyright, where the situation is so complicated that it requires a 52-page handbook to determine whether or not something is in the public domain.

This argument is so stupid. Neither you nor Mike actually try to figure out the public domain status of a given work. If you did, you’d see how simple it is to do. You don’t need all 52 pages for one work.

But the situation is actually far worse than that, because the public is being denied access to many works that are unambiguously in the public domain because of new restrictions being placed on them when they are digitized.

Even if a work is in the public domain, it can be locked up behind any paywall the owner of the COPY wants. Another stupid argument.

This goes back to the problem of determining whether a work is in the public domain or not. Because that can be complex, those carrying out the digitization of works simply assume the worst, just to be on the safe side.

Again, rather than alarmist bullshit, why don’t you walk us through the determination of the public domain status of a given work. The handbook is simple to apply. They even released an 8-page flow chart version, and you only need one page for a given work. One page.

That’s something that needs to change, otherwise we risk losing not just the benefits of digitized public domain works, but also our undoubted rights to access them freely.

“Undoubted rights”?? That’s hilarious. If I have a copy of a public domain work on my bookshelf or on my server, you have ZERO rights to access it. Terrible argument, Glyn.

G Thompson (profile) says:

Re: Re: Re: Re:

Ok I’ll bite.

My Brilliant Career (1901) – Miles Franklin, died 1954
Animal Farm (1945) – George Orwell, died 1950
The Great Gatsby (1925) – F Scott Fitzgerald, died 1940
Tender is the Night (1933) – F Scott Fitzgerald, died 1940
Lady Chatterley’s Lover (1928) – D H Lawrence, died 1930
Gone with the Wind (1936) – Margaret Mitchell, died 1949
Between the Acts (1941) – Virginia Woolf, died 1941

All were published with copyright notices except for the first, which had copyright at the time of creation under blanket copyright structures.

Whether renewed or not is irrelevant to the above due to the dates of death.

So come on.. you are so knowledgeable and have decided that you can determine copyright in a simplistic flowchart. Have a go at them, should be easy. Oh and remember the answer should be contextually based upon the article above too.

Sheogorath (profile) says:

Re: Re: Re:2 Re:

I can answer this one from a UK perspective:
My Brilliant Career (1901) – Miles Franklin, died 1954: Under copyright until 2025
Animal Farm (1945) – George Orwell, died 1950: Under copyright until 2021
The Great Gatsby (1925) – F Scott Fitzgerald, died 1940: Public Domain since 2011
Tender is the Night (1933) – F Scott Fitzgerald, died 1940: Public Domain since 2011
Lady Chatterley’s Lover (1928) – D H Lawrence, died 1930: Public Domain from 1981-1996, then since 2001
Gone with the Wind (1936) – Margaret Mitchell, died 1949: Under copyright until 2020
Between the Acts (1941) – Virginia Woolf, died 1941: Public Domain
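
The dates in this list are consistent with a "life plus 70 years" rule, with the term running to the end of the calendar year. A rough sketch in Python, assuming that rule only: the function name is hypothetical, and the calculation ignores term extensions and revivals, such as the one behind the 1981-1996 gap noted for Lady Chatterley’s Lover.

```python
def first_public_domain_year(death_year: int, term: int = 70) -> int:
    """First calendar year in which the work is free of copyright.

    ASSUMPTION: protection lasts until the end of the calendar year
    that falls `term` years after the author's death.
    """
    return death_year + term + 1


# Death years as given in the list above.
deaths = {
    "Miles Franklin": 1954,
    "George Orwell": 1950,
    "F Scott Fitzgerald": 1940,
    "D H Lawrence": 1930,
    "Margaret Mitchell": 1949,
    "Virginia Woolf": 1941,
}

for author, died in deaths.items():
    print(f"{author}: public domain from {first_public_domain_year(died)}")
```

Even this one-line rule needs the author's death date as input, which is exactly the information that estimation-based policies skip.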

PaulT (profile) says:

Re: Re: Re: Re:

Missed the point, didn’t you? Even if you’re given specifics, you still have to follow the steps contained in those 52 pages to determine copyright status. When the answer should actually just be “if the work is over X years old, it’s public domain”. Or, preferably “has the author got a current registration on file?”.

Anonymous Coward says:

Re: Re:

Since I disagree with the entire concept of copyright, I could care less how difficult it is to figure out.

But I agree that a company is not obligated to make their own copies of public domain works freely available to the public.

As long as no one gets any crazy ideas that there are any restrictions on what anyone can do, once they have access through a paywall or whatever, with the copies that appear on their own devices.

Anonymous Coward says:

Re: Re:

This argument is so stupid. Neither you nor Mike actually try to figure out the public domain status of a given work. If you did, you’d see how simple it is to do. You don’t need all 52 pages for one work.

That only applies when someone has read and understood the implications of all 52 pages. Until they have done that, they cannot answer the question: do any other pages in the book change anything I have read so far?

hutcheson says:

The rules (in the U.S.) are indeed horrifically complex, and include such facts as author’s citizenship at the time of creation (and the copyright laws in that jurisdiction), author date of death, location and date (including month) of first publication anywhere, location and date (including month) of first U.S. publication … and, as impossible as most of this is to find[*] there are additional, even-more-obscure details mentioned in the Stanford SUMMARY of copyright law that could impact the result.

How can you call something intellectual PROPERTY if nobody can know who it belongs to?

How can you call something INTELLECTUAL property if most of it, is, well, FORGOTTEN?

[*]Yes, I’m speaking from experience, researching a book by a citizen of the Austro-Hungarian empire who came to the U.S. as a teenager and remained there the rest of his life. How am I as a U.S. citizen supposed to know what the Austro-Hungarian empire’s copyright laws were–since the Empire didn’t exist or even have a unique successor on the date the book was written! And how can I know whether/when someone became a U.S. citizen?

1st Dread Pirate Roberts (profile) says:

Har!

Prior to copyright enactment in England, authors had full control of works, essentially forever. Copyright law was intended to force works into the public domain. If you wanted a continuing income stream, you needed to produce new works. You were granted a limited period during which to earn income from your works.

Copyright has been turned on its head. Thanks to that %$%*@ Sonny Bono, copyright lasts longer than the lifespan of almost the entire population. That’s like not having a copyright law at all.
