Tempers flare when Professors Carlson and Lazzell, working independently, ironically set their time machines to identical coordinates.

Clarity: Strategies for revising scientific writing

Nathan Sheffield, PhD
www.databio.org/slides

Let's start with an example

Example 1

What makes this sentence unclear?
The assumptions that all sites evolve at one of two evolutionary rates (conserved and nonconserved), that these rates are uniform across the genome, that sites evolve independently conditional on whether they are in conserved or nonconserved regions, and that the phylogenetic models for conserved and nonconserved regions have the same branch-length proportions, base compositions, and substitution patterns, all represent oversimplifications of the complex process of sequence evolution in eukaryotic genomes.

Problems with this sentence:
  • Distance between subject and verb
  • Complex subject
  • Verbs vs. implied actions (nominalizations)
  • List precedes its context

Conciseness
↑ ↓
Clarity
↑ ↓
Cohesion

The Four Problems

Things that make scientific writing unclear
  1. Subjects and verbs too far apart
  2. Overabundance of nominalizations
  3. Poor flow (misplacement of old vs new information)
  4. Excessive or unnecessary use of passive voice


NOT the complexity of the topic!

The Four Problems

Subjects and verbs too far apart
  • Who did it, and what did they do? English readers expect doers to be near their actions.
  • Complex subjects (subjects modified with essential clauses) can violate this expectation.

The Four Problems

Subjects and verbs too far apart
Complex subject:
Analysis of peak composition of clusters containing more than 20 peaks (large clusters hereafter) to identify a minimal required set to determine the clusters identified mediator and cohesin subunits as the best individual features.
Simplified subject:
Mediator and cohesin subunits were identified as the best individual features by analysis of peak composition of clusters containing more than 20 peaks (large clusters hereafter) to identify a minimal required set to determine the clusters.

The Four Problems

Overabundance of nominalizations
  • English readers expect actions to be in verbs.
  • Nominalizations are actions that appear in parts of a sentence other than a verb (e.g. in nouns or adjectives).
  • Some nominalizations are clear, but many reduce clarity.

The Four Problems

Overabundance of nominalizations
Actions in Nominalizations:
The assumption that all RNAs are polyadenylated is an oversimplification of the transcription process.
Actions in Verbs:
The model oversimplifies the transcription process because it assumes that all RNAs are polyadenylated.
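
A rough way to spot candidate nominalizations in a draft is to flag long words that end in noun-forming suffixes. The Python sketch below assumes a small suffix list and a minimum word length; both are illustrative choices, not part of these slides, and the function name is hypothetical. It is a crude heuristic that misses many nominalizations and also flags clear ones such as "transcription", so treat each hit as a prompt to reread, not as an error.

```python
import re

# Assumed suffix list (illustrative only, not a linguistic definition).
NOMINAL_SUFFIXES = ("tion", "sion", "ment", "ance", "ence")


def flag_nominalizations(sentence: str) -> list[str]:
    """Return words longer than 6 letters that end in a noun-forming suffix."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return [
        w for w in words
        if len(w) > 6 and w.lower().endswith(NOMINAL_SUFFIXES)
    ]


nominalized = ("The assumption that all RNAs are polyadenylated is an "
               "oversimplification of the transcription process.")
verbal = ("The model oversimplifies the transcription process because it "
          "assumes that all RNAs are polyadenylated.")

print(flag_nominalizations(nominalized))
# → ['assumption', 'oversimplification', 'transcription']
print(flag_nominalizations(verbal))
# → ['transcription']
```

The revised sentence keeps only "transcription", a nominalization that names a familiar concept and does not hide the main action.
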

The Four Problems

Poor flow (lack of cohesion)
A cohesive sentence links with neighboring sentences by starting with familiar ideas and ending with new ideas.

old → new

Disrupt flow by:
  • Starting with unfamiliar ideas
  • Ending with backwards-linking ideas


Cohesion matters at both sentence-level and paragraph-level.


Tempers flare when Professors Carlson and Lazzell, working independently, ironically set their time machines to identical coordinates.

With their time machines ironically set to identical coordinates while working independently, Professors Carlson and Lazzell's tempers flare.

The Four Problems

Poor flow (lack of cohesion)
(in a paper about farmers...)
Farmers try to provide optimal growing conditions for crops by using soil additives to adjust soil pH. Garden lime, or agricultural limestone, is made from pulverized chalk, and can be used to raise the pH of the soil. Clay, which is a naturally acidic soil type, often requires addition of agricultural lime.

The Four Problems

Old information vs. new information (revised)
(in a paper about farmers...)
Farmers try to provide optimal growing conditions for crops by using soil additives to adjust soil pH. One way to raise the pH of the soil is an additive made from pulverized chalk called garden lime or agricultural limestone. Agricultural limestone is often added to naturally acidic soils, such as clay.

The Four Problems

Excessive or unnecessary use of passive voice

Active

I stole the money

Passive

The money was stolen by me
Passive voice has side effects:
  • It often increases length
  • It can eliminate the actor (causing ambiguity)
  • It reverses the order of the sentence (A-B vs. B-A)

The Four Problems

Excessive or unnecessary use of passive voice
  • Consider cohesion: Don’t choose passive voice simply out of habit. Do choose passive voice when it improves cohesion by putting familiar ideas first.
  • Most scientific journals encourage authors to use active voice for the sake of clarity, conciseness, and cohesion.
  • Passive voice is NOT inherently scientific!

Passive voice

What do the journals say?
Science
Use active voice when suitable, particularly when necessary for correct syntax (e.g., ‘To address this possibility, we constructed a λZap library...,’ not ‘To address this possibility, a λZap library was constructed...’).

Passive voice

What do the journals say?
Nature
Active voice has been Nature policy for as long as I can remember; it is enshrined in our style manual and is specifically recommended to all authors as part of our standard acceptance procedure... you will see papers in Nature in the passive voice, but you can be assured that this is at the author's insistence rather than Nature policy. - Maxine Clarke, editor

Revision techniques

Ways to improve clarity, conciseness, and cohesion
  • Omit needless words
  • Put actions in verbs
  • Use nominalizations to summarize
  • Place verbs near subjects
  • Put familiar information first

Revision techniques

Omit needless words
  1. It is absolutely vital that...
    → We must...
  2. At the same time...
    → Simultaneously/furthermore...
  3. There were five mice receiving antibiotics...
    → Five mice received antibiotics.

Revision techniques

Put actions in verbs
  1. We performed an analysis...
    → We analyzed
  2. The quantification of the atoms was done...
    → The atoms were quantified.
  3. The MS managed the measurement and identification of the proteins.
    → The MS measured and identified the proteins.

Revision techniques

Use summarizing nominalizations
Nominalizations are useful when they summarize the action of the previous sentence:
Our analysis using regression and k-means clustering revealed...
→ We analyzed the data with regression and k-means clustering. This analysis revealed...

Revision techniques

Use summarizing nominalizations
Complex subject:
Analysis of peak composition of clusters containing more than 20 peaks (large clusters hereafter) to identify a minimal required set to determine the clusters identified mediator and cohesin subunits as the best individual features.
Summarizing nominalization:
For large clusters (containing more than 20 peaks), we identified a minimal required set of peaks that determine the cluster. This analysis identified mediator and cohesin subunits as the best individual features.

Revision techniques

Place verbs near subjects
DNA in repeat regions or small microsatellites or with long stretches of the same base causes problems for next-gen sequencers.
→ DNA causes problems for next-gen sequencers when it is in repeat regions or small microsatellites or has long stretches of the same base.

Revision techniques

Put familiar information first
We searched the database of sequences to look for similar structures. A protein involved in the regulation of the BRCA1 gene in humans was found by the search.
→ We searched the database of sequences to look for similar structures. This search found a protein involved in the regulation of the BRCA1 gene in humans.

Now for some practice

Example 2 - What would you do?
This component will chiefly involve a description and quantitative analysis of the study’s data collection process.

We suggest: put actions in verbs
→ This component describes and quantitatively analyzes the data collection process.

The sentence is more concise (10 vs 16 words).
The meaning is clearer.
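
The word counts quoted in these practice examples are easy to check. The sketch below assumes that a "word" is any whitespace-separated token, so a hyphenated term counts as one word; the helper name is hypothetical.

```python
def word_count(sentence: str) -> int:
    """Count whitespace-separated tokens in a sentence."""
    return len(sentence.split())


original = ("This component will chiefly involve a description and "
            "quantitative analysis of the study's data collection process.")
revised = ("This component describes and quantitatively analyzes the data "
           "collection process.")

print(word_count(revised), "vs", word_count(original))
# → 10 vs 16
```
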

Example 3 - What would you do?
Detailed analyses of the evolutionary features of different types of regulatory elements are an important area for future research.

We suggest: put actions in verbs

Consider implied actions vs. verb.
→ Future research should analyze the evolutionary features of different types of regulatory elements.
The sentence is more concise (13 vs 19 words).
The subject is clearer.
The subject and verb are closer together.

Example 4 - What would you do?
Improvements are expected in the predictive power of all the scores being computed on multispecies alignments.

We suggest: use active voice

→ [We expect to] improve the predictive power of our multispecies alignment scores.
The sentence is more concise (12 vs 16 words).
Prepositions no longer disrupt flow.
The sentence is more direct.

Example 5 - What would you do?
Some astonishing questions about the nature of the universe have been raised by scientists studying the nature of black holes in space. The collapse of a dead star into a point perhaps no larger than a marble creates a black hole.

We suggest: put familiar info first, omit needless words

→ Scientists studying black holes have raised some astonishing questions about the universe. A black hole is created by the collapse of a dead star into a point perhaps no larger than a marble.
The link is clearer; these sentences are more cohesive.

Example 6 - What would you do?
The second reaction is really the end result of a very large number of reactions. It is also worth noting that these two reactions form a simple linear chain whereby the product of the first reaction is the reactant for the second.

We suggest: omit needless words

→ The second reaction is the result of numerous reactions. Moreover, these two reactions form a simple linear chain whereby the product of the first reaction is the reactant for the second.

More concise (31 vs. 42 words)

Example 7 - What would you do?
Significant positive correlations were evident between the substitution rate and a nucleosome score from resting human T-cells.

We suggest: Put actions in verbs

→ In resting human T-cells, the substitution rate correlated with a nucleosome score.

More concise (12 vs. 17 words)
The verb is "correlated" rather than the nebulous "were evident".

Example 1 (again, in context) - What would you do?
The model used by the software is a fairly rich probabilistic model, but it is clearly not realistic in several respects. The assumptions that all sites evolve at one of two evolutionary rates (conserved and nonconserved), that these rates are uniform across the genome, that sites evolve independently conditional on whether they are in conserved or nonconserved regions, and that the phylogenetic models for conserved and nonconserved regions have the same branch-length proportions, base compositions, and substitution patterns, all represent oversimplifications of the complex process of sequence evolution in eukaryotic genomes.

Example 1 (again, in context)

We suggest: Put verbs near subjects

The gist of the sentence: Certain assumptions oversimplify the complex process of sequence evolution in eukaryotic genomes.

Should the gist of the sentence go first or last? Before the list of assumptions or after it?

Example 1 (again, in context)

A possible revision

→ [Our model admittedly] oversimplifies the complex process of sequence evolution in eukaryotic genomes by assuming that: (1) all sites evolve at one of two evolutionary rates (conserved and nonconserved), (2) these rates are uniform across the genome, (3) sites evolve independently, conditional on whether they are in conserved or nonconserved regions, and (4) the phylogenetic models for conserved and nonconserved regions have the same branch-length proportions, base compositions, and substitution patterns.

Example 1 (again, in context)

Positive consequences


  • The most important action (oversimplify) is now a verb.
  • The verbs follow closely after the subjects.
  • The sentence is more cohesive: familiar information at the beginning links to the previous sentence.
  • The sentence contains cues for parsing information ("by", the numbered list, etc.).

By selecting the soda cracker over the graham
during snack time, kindergarten history
is made by Kevin Wakefield, Nov. 12, 1957.

Nov. 12, 1957: Kevin Wakefield, during snack
time, makes kindergarten history by selecting
the soda cracker over the graham.

References and further reading
  • The Duke Scientific Writing Resource
  • Style: Toward clarity and grace (1990), Joseph Williams
  • Expectations (2004) and The Sense of Structure (2004), George Gopen
  • How to write consistently boring scientific literature (2007), Kaj Sand-Jensen
  • The infectiousness of pompous prose (1992), Martin W. Gregory
  • How we write about biology (1991), Randy Moore
  • Writing intelligible English prose for biomedical journals (2007), John Ludbrook
  • Whose literature is science? (2003), Judith A. Swan
  • What is the scientific literature? (1986), John Maddox
  • Scientific literature: Clear as mud (2003), Jonathan Knight
  • The science of scientific writing (1990), George Gopen, Judith Swan
  • The readability of marketing journals: are award-winning articles better written? (2008), Sawyer, Laran, & Xu

Thanks for listening!

Slides at http://databio.org/slides/scientific_writing.html

Comics by Gary Larson (The Far Side)
Color by Gareth Wonfor
nsheff · databio.org · nsheffield@virginia.edu