Why Is 'Self-Plagiarism' Even An Issue?

from the it's-called-reinforcement? dept

A recent report looked at how scientists respond when caught plagiarizing a research paper. The article and the responses are a bit amusing — but what struck me was the claim that the vast majority of “plagiarism” was actually “self-plagiarism.” In other words, the researcher was effectively reusing some bit of material he or she had published for something else. I’m sure some academics will be quick to explain why this is a horrible breach of academic protocol, but I’m having a very difficult time understanding how this makes any sense, whatsoever. Reusing concepts, ideas, data or anything else would seem to be an incredibly useful tool for the purposes of reinforcement, or even to build on those earlier works. Limiting that for some artificial standard just doesn’t seem to make much sense. There obviously may be cases where the first research journal to publish something gets the copyright on the content (an all-too-frequent occurrence, especially for publicly-funded research), but even then it’s not “plagiarism” so much as copyright infringement, potentially — and it seems ridiculous to not allow such reuse to go forward.

Comments on “Why Is 'Self-Plagiarism' Even An Issue?”

37 Comments
Albert Nonymous says:

Re: Re:

Because you actually have to do *new* research to warrant a new publication. This promotes the progress of science much more efficiently than doing a single piece of work and writing about it over and over again. Given all the arguments here at Techdirt about why perpetual copyright is bad, this should be clear as crystal. A musician benefiting perpetually from a single performance decades ago doesn’t promote the creation of new music either.

David T says:

A common practice

In academia, reusing portions of text from prior writings is a common practice. It makes no sense to rewrite your own text over and over again when you are trying to convey the exact same meaning.

Poaching someone else’s work, on the other hand, is a no-no. But it irons itself out as most publications/grants go through peer review (in the hard sciences, at least). The community within any given specialty is small and such malfeasance gets around. These are the same people who are reviewing your papers and sitting on your study sections for grant selection.

Albert Nonymous says:

Re: A common practice

In the article (which was about plagiarism in general), the bit about self-plagiarism refers explicitly to republishing the same *data*. That’s much more serious than reusing portions of text from the discussion portion of the paper (as mentioned in the article, there are only so many different ways to say the same thing).

As an aside, interestingly I received my graduate degree from the same institution that the authors of the report are from, which may explain why I understand and am sympathetic to their position.

Albert Nonymous says:

Missing the point (maybe several)

Scientific communication is a fairly precise process that follows established standards which evolved for good reason. In this case, the desire is to have the results for a particular experiment (or related set of experiments) published once and only once. Henceforth, that information can be pointed to by reference by anyone and there will be a single place to look for that information. That’s very useful and important. For those of you that work with or understand relational databases, the infrastructure for publishing *original research* is structured very much like a well designed database, and for much the same reasons. A piece of important data is stored in only one place, then is referred to by a unique identifier from then on. In the case of scientific publications it’s a unique reference that points back to exactly the article in question, just like a unique key in a database. The end result is efficient communication because scientists can unambiguously refer to a piece of research when communicating. That’s very, very important.
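The database analogy above can be sketched in a few lines of Python. This is only an illustration of the "store once, refer by key" idea, not a description of how any journal or index actually works; the names `Article`, `Literature`, `publish` and `cite` are hypothetical, and comparing results by simple equality is an assumption made to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    ref_id: str   # hypothetical unique reference, playing the role of a primary key
    title: str
    results: str  # stands in for the underlying data (an assumption of this sketch)


class Literature:
    """Toy 'scientific record': each set of results is stored exactly once."""

    def __init__(self):
        self._by_ref = {}

    def publish(self, article):
        # A reference must be unique, like a primary key.
        if article.ref_id in self._by_ref:
            raise ValueError(f"reference {article.ref_id} is already in use")
        # The same results may not enter the record a second time.
        for existing in self._by_ref.values():
            if existing.results == article.results:
                raise ValueError(f"results already published as {existing.ref_id}")
        self._by_ref[article.ref_id] = article

    def cite(self, ref_id):
        # Later work points back to the single stored copy by its key.
        return self._by_ref[ref_id]
```

In this toy model, a later paper that needs earlier results calls `cite` with the original reference instead of calling `publish` again with the same data, which is the behavior the comment argues for.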

“Self-plagiarism” is bad because it weakens this system by introducing ambiguity. The body of scientific literature is enormous, and growing almost exponentially. It’s hard enough to find information as it is. If it’s useful to republish existing data in a new context to better discuss or explain something, there is already a vehicle for that – it’s called a “review article”. People understand that this is different from “original research”, which is what the argument against self-plagiarism is referring to.

An additional point: A scientist’s productivity is measured by publication output and the relative importance of a particular piece of work (by how many people reference that work – think Google’s PageRank). The understanding is that a new article describing original research describes new work done since the last article on the subject was published. Republishing work already done can be construed as cheating, since the unscrupulous can pad their publication record through clever use of already published work, without actually doing new research. This means that scientists who are good at writing articles will have an advantage over scientists who do good and careful research. Science depends on the latter, not the former.

hegemon13 says:

Re: Missing the point (maybe several)

“A scientist’s productivity is measured by publication output and the relative importance of a particular piece of work…”

Maybe this is the problem, and not the fact that the scientists are good writers.

“Republishing work already done can be construed as cheating, since the unscrupulous can pad their publication record through clever use of already published work, without actually doing new research.”

Again, sounds more like a problem with how individuals are ranked than with the practice itself. If this silly, artificial ranking was not in place, or if all these really smart scientists actually read the work instead of automatically equating quantity with quality, the “cheating” would not benefit the cheater.

Albert Nonymous says:

Re: Re: Missing the point (maybe several)

It’s not a perfect system, but I challenge you to come up with a better one that doesn’t require much more effort to get the same results. It’s precisely because of the rigor with which original research is communicated that it works as a useful, though admittedly flawed metric. It’s not much different from any other field where success is measured by the number of successful projects completed. A published article of original research is the successful culmination of a lot of work that had to be done in order to produce the article – you can’t easily fake it. So it’s somewhat more efficient than actually going to each lab and verifying that a scientist is actually doing work. Even the latter happens in the form of grant-review committees that do visit and evaluate, so the system isn’t just based on publication alone. But the entire point of science is to contribute new knowledge to the existing pool, so until the work is published to the world, that hasn’t really happened. And so measuring success by publication record is a reasonable thing to do. The major flaw is that a single research paper doesn’t easily equate to the amount of work done, its difficulty or its significance. So it’s possible to publish a bunch of work that doesn’t really move the field forward and still look better than the scientist who is doing careful, painstaking work that might produce a breakthrough when complete. That’s the flaw in the system, not the fact that publication is the desired end product of the work.

Jonas says:

“Reusing concepts, ideas, data or anything else would seem to be an incredibly useful tool for the purposes of reinforcement, or even to build on those earlier works.”

Exactly, which is why you should give references even when you’re quoting yourself. Otherwise you’re not building on those earlier works or reinforcing anything.

For example, there may be references in the earlier work that the scientist in question does not think are relevant in the context of the later work, but that someone else using the text as a source would have found incredibly useful. If you’re not giving a reference, it’s much harder to find those references, not to mention that evaluating whether the conclusions (if any) seem to be valid is exponentially harder.

Aaron Martin-Colby (profile) says:

Missing the point (maybe several)

Albert, thanks so much for the response. I was hoping to read a measured explanation of the practice.

While I think the practice is defensible, I don’t think it’s entirely necessary.

For example, let’s say I write a good piece of connecting information in a paper I write about, say, fungus. I continue my research into fungus and come up with enough material to write another paper. I’m not “cheating,” I actually have enough to write my new paper.

But let’s say I need to use connecting information identical to my previous paper. If I wrote it correctly the first time, not only is it silly to write it again to convey the same information, I consider it inefficient.

If I wrote the information accurately and succinctly the first time, I should use it again because it was what I considered perfect. The point of writing is not to be unique, but to communicate the data effectively and as efficiently as possible. Being forced to re-write introduces inefficiency into the process.

That being said, the article talks about works “in which the text was, on average, 86.2% similar to previously-published work.” This is pretty obviously simply re-writing an article and re-submitting it. Moreover, they also mention that in a majority of the cases, the plagiarism was the smallest problem, with fudged, or outright faked, data being much more common.

I also think the self-plagiarists can be understood. The article itself mentions “publish-or-perish.” If I was in a researcher’s shoes, and my job was on the line, I’d do the same thing. I care about feeding my family more than scientific integrity.

Albert Nonymous says:

Re: Missing the point (maybe several)

Good questions/observations. 2 points.

1. Make the distinction between the data (most important), the discussion of conclusions drawn from the data (very important), exact descriptions of how the experiments were done so they can be reproduced (important) and the boilerplate text required to make an article a coherent stand-alone piece of work (not so important). Reproducing the latter parts of this list is more justifiable than reproducing the former. I don’t think the research in the original article made this distinction (but I haven’t read it so can’t say for certain). From the discussion, it seemed like they were measuring similarity overall rather than assigning variable importance to different parts of the text.

2. The nature of a published piece of original work is very much different from writing a conventional article or paper. Like I said, at its best the system works more like a database than a mere vehicle to disseminate information.

So even if a particular piece of writing is actually perfect, it’s still more useful to refer to it than reproduce it again. But as enumerated in the list above, some parts of a paper can be copied with less impact than others. In a series of subsequent publications produced by a lab through the course of research, for example, the introductory portion of a paper will be very similar from article to article, because it’s just there to give the reader context for the real information that’s about to be conveyed. So introductions tend to be copied over and tweaked rather than written from scratch each time. Self-plagiarism in this case is more a matter of laziness and expedience rather than questionable ethics, and so more forgivable and tolerated.

I would also note that copying from previous work introduces a bit of intellectual laziness that might not be insignificant. It takes away the incentive to think carefully about everything one last time, taking away the opportunity for new insights and epiphanies. Since the real work of science takes place in the mind, not at the lab bench, this could have more of a detrimental impact than is apparent at first glance. It’s not the doing of science that moves knowledge forward, it’s the thought that goes into it.

Hmm. Just noticed your last comment. That’s short-term thinking. Undermining scientific integrity weakens the entire system (there’s a very large element of trust in the integrity of one’s peers), making it more difficult to consistently feed the family over time. I’d say the promise of many meals in the future should outweigh the desire for one “right now”. Besides, “publish or perish” sounds more dire than it actually is. It affects more where your job is (a top-flight institution vs. a lesser one) rather than whether you have a job at all. It’s not a justification for slippery ethics.

Aaron Martin-Colby (profile) says:

Re: Re: Missing the point (maybe several)

Agreed on pretty much all counts.

The distinction of the data being sacrosanct in comparison to other aspects of the articles is well made and one not missed. As such, duplicate data is right out. But I stand by my efficiency argument. Luckily, I see no problem since if duplicate data is used, efficiency in research demands that one cites one’s own articles.

Also, about my final comment, I’m not saying that what these researchers did is ethically correct, but at the very least, and in particular circumstances, understandable.

While this system defines a researcher’s employment in the hierarchy of educational institutions and not necessarily whether he has a job at all, that job could be 300 miles away.

Uprooting my entire family is a pretty nasty option. And while, yes, in the long term this could do serious damage to my credibility, I might weigh the situation (low-likelihood of getting caught vs. high-likelihood of moving) and choose the less-than-ethical path. The promise of many meals in the future should outweigh the desire for one “right now,” but perhaps it doesn’t.

Anonymous Coward says:

Re: Re: Re: Missing the point (maybe several)

“While this system defines a researcher’s employment in the hierarchy of educational institutions and not necessarily whether he has a job at all, that job could be 300 miles away.

Uprooting my entire family is a pretty nasty option.”

You must not be living in the same world I’m living in. Relocation for work happens all the time. I’m sitting here writing this instead of working precisely because this happened to my spouse and someone had to fall on the sword. Having done this quite a few times already (and having grown up with a father in the military before that), I have to say it’s not as bad as people make it out to be.

Besides, people move easily when it involves an “opportunity” like getting a position at “Better Institution X”. So moving to take a job elsewhere because the research didn’t pan out isn’t as awful as you imply.

Anonymous Coward says:

Re: Missing the point (maybe several)

But let’s say I need to use connecting information identical to my previous paper. If I wrote it correctly the first time, not only is it silly to write it again to convey the same information, I consider it inefficient.

That’s nice that you consider it inefficient, but really you need to just cite your previous work – it’s even more efficient and it’s not plagiarism.

Matt Bennett says:

What you have here is a case of “research papers have to be about something”. So, these researchers wanted to research plagiarism. Turns out there isn’t that much (which would make the paper about nothing), but other researchers are lazy and reuse a lot of their previous work. Call it “self-plagiarism” and voila! These guys actually have something to write about.

yonatron (profile) says:

Mike: don't miss Albert Nonymous's first post.

This is the important bit. Publishing the same results multiple times is something that would be nearly impossible for a working scientist to do accidentally or unconsciously. And because the point of an individual publication is usually original research, it is in fact deceptive when someone does so. As folks have said, there are obviously times when you want to reiterate what you’ve found, which is when you cite yourself.

Now, on the other aspect of reusing text, I’d agree with the no-big-deal assessment. That is, if you’ve found a good succinct way to explain some finding you cite, and you end up citing it in multiple places, you should by all means “plagiarize” yourself. This should in fact be encouraged, and I think it’s a shame that people might consider reusing a few sentences of text in different works equivalent to presenting old research as though it’s new.

Mike (profile) says:

Re: Mike: don't miss Albert Nonymous's first post.

And because the point of an individual publication is usually original research, it is in fact deceptive when someone does so.

Why?

People keep saying this, but no one gives a good reason why it should be so. All I hear is “this is the way it’s done.” That, to me, is a good indication that something’s wrong with the system.

Albert Nonymous says:

Re: Re: Mike: don't miss Albert Nonymous's first post.

Which part of the quote is the “Why?” addressing? Why the point of publication is original research? Or why passing off old work as original work is deceptive?

If the former, keep in mind that publishing original research is just one form of publication that a researcher can engage in. Nothing prevents a researcher from writing a ‘review article’ which uses already published data, or a Scientific American article, or a chapter in a textbook, or a newspaper article, or a presentation, or a blog entry etc. Well maybe contractual restraints or something like that, but in general there is no ‘ban’ on reusing one’s work. In fact, when giving a talk (at a conference, say), people show old stuff all the time, along with a few new bits at the end to show progress made since last time.

The thing is, committing research results to the permanent scientific record is a formal act that means something specific. It’s not the first time the results have been seen by anyone or even seen print, it’s a distillation of lots of work that is supposed to stand alone as a self-contained piece of scientific effort. It is not merely a communication of research results, it’s a specific act that is an integral part of the scientific process itself.

I could turn to an analogy, I suppose. Why don’t musicians produce albums containing “all new songs” by just including a set of old songs, perhaps in a different order than the first time around? Because the “all new” means something specific (just as “original research” means something specific). Instead, they call them “Best of” or “Greatest Hits” or “Retrospective” and such.

I don’t know – maybe you’re confusing scientific publication with a blog or something.

Anonymous Coward says:

Re: Re: Mike: don't miss Albert Nonymous's first post.

The papers written by scientists act as a sort of grade as far as the university or research entity that employs the scientists is concerned. When a scientist or researcher takes information they have previously published, rehashes it without adding anything new, and then publishes it again, the researcher’s employer will likely become quite annoyed. It is only an original paper when there is new data or conclusions, not a rehash of something previously published – unless the paper specifically states it is a rehash, likely because the author has a new insight.

I find it strange that you would have problems with the criteria imposed on scientists and researchers and their publications. People plagiarizing their own works offer nothing new. They are essentially copying their old work and implying something is new. They are getting caught defrauding the people who read their articles, and if they are being rated by their employers on the basis of original publications, they are defrauding their employers. By using tools to determine whether a paper is new or a rehash of a previous publication, authors are being objectively and independently rated. I would think this is the sort of quality business model you would encourage, considering many of your posts.

Albert Nonymous says:

Re: Albert

Could you clarify your argument, please? (Or actually make one?). What’s your point? I’m not actually a practicing scientist at the moment, so I’m not defending the system because I benefit from the status quo. I do, however, do scientifically related work (data mining, analysis and such), so am actually more interested in the forest itself than individual trees. It’s only now that I understand why certain practices were important back when I was writing papers. There’s a saying in software engineering regarding clarity of code and comments (I’m paraphrasing from memory): “Code is written once, but read many times”. The same applies to the body of scientific literature. It’s not there to benefit the individual scientists, but to benefit all those who use it now or ever will. That’s why it’s important to do things right and why looking at the issue from the perspective of the individual contributor doesn’t give a clear idea of why certain practices are followed.

mechwarrior says:

There’s more to this story, probably. One could be a matter of attribution. Many sham papers often do not attribute their sources so as to hide whether they lied about the work they did.

Another could be that the portions copied were based on research that is not applicable to the current research. For example, say someone takes some information on the thermodynamics of plant cells, and tries to use that information to show that nickel-cadmium batteries can sustain higher energy-densities than lithium ion batteries. It’s an extreme example, but considering that a lot of this research isn’t exactly written for the reading of non-scientists, and that peer reviews can often be fooled, it’s completely possible, and has happened before.

This is more to do with fidelity and accuracy of information, as opposed to plainly reusing previous research. Most research papers have a bibliography that may span decades of research. As long as everything is attributed, then it should be fine. It’s when the research isn’t properly attributed, or is incorrectly correlated, that there is a problem.

DB says:

It isn't considered plagiarism

I earned my doctorate recently from one of the country’s top research institutions. I think you hit the nail on the head when you called it a copyright violation, rather than plagiarism. The university actively encouraged researchers to reuse their own material (if you’ve got something good, you want to tell as many people as possible). We were even permitted to copy and paste publications verbatim into our thesis. The caveat here is that you were required to get permission from the publisher before reproducing any figures or charts (the thesis is indeed considered a publication).

Aside from this, it is frowned upon to republish the same results in different journals, even though it’s definitely possible to get away with it given the vast number of journals with lax peer review standards. The university certainly wouldn’t consider it plagiarism, but it would reflect poorly on the lab and the principal investigator. If a high profile journal found out that this was happening, it would be less likely to publish articles from that author in the future.

Albert Nonymous says:

Re: It isn't considered plagiarism

“Aside from this, it is frowned upon to republish the same results in different journals, even though it’s definitely possible to get away with it given the vast number of journals with lax peer review standards.”

It’s only possible to get away with it if nobody looks. By definition, publication means putting things into the public record (the “public” in publication). Whoops – we’re discussing the outcome of someone looking.

The fact that people who call themselves scientists would be stupid enough to copy/plagiarize work that is destined to be public by definition is mind-blowing to me. That pretty much defines one’s credibility as a scientist in a single act. The inability to reason through this fairly simple situation certainly speaks volumes about the ability to reason through the complexities one encounters in research.

Luci says:

It’s an oxymoron. You can’t ‘self-plagiarize’ since plagiarism is ‘the unauthorized use or close imitation of the language and thoughts of another author and the representation of them as one’s own original work.’ As per the dictionary, of course. Since you cannot be both yourself and another author at the same time, you cannot plagiarize your own works.

Joe (profile) says:

research paper spam

From one outside the field, what these scientists are doing looks like research paper spam … repackaging old data and pawning it off to a publication as new research. Why? Because research scientists / university professors are under great pressure to publish. The number of times and how recently matters. These guys know what they did is wrong. It’s Cramer trying feebly to defend himself against Jon Stewart … ain’t gonna work.

Jesse says:

Self-plagiarism, in my opinion, is propaganda originating from ridiculously overzealous scientific publishers.

“‘Self-plagiarism’ is bad because it weakens this system by introducing ambiguity. The body of scientific literature is enormous, and growing almost exponentially. It’s hard enough to find information as it is.”

Rather than go crazy about self-plagiarism, how about working towards a better way to organize and distribute information? Each publisher releases works in different search tools, and most lock it down so that if you want to read it you have to pay $30 for 24 hours of viewing time. That is ridiculous. There is also a lot of information on the internet. Google does an excellent job of organizing information based on relevance to a search term. Why not make use of that, or something like it?

Jesse says:

“But they have the same requirement for any college class too. I could not take a paper I wrote in an English class and submit for another class even if it met every single requirement. I think that is really silly.”

I agree. If the teachers made the class so similar that assignments overlap so precisely, then why the hell shouldn’t you be allowed to hand in the same assignment? Or maybe the school wants to be paid twice for teaching you the same thing.
