Guesstimating the Size of the Global Array Synthesis Market

(Updated, Aug 31, for clarity.)

After chats with a variety of interested parties over the last couple of months, I decided it would be useful to try to sort out how much DNA is synthesized annually on arrays, in part to get a better handle on what sort of capacity it represents for DNA data storage. The publicly available numbers, as usual, are terrible, which is why the title of the post contains the word "guesstimating". Here goes.

First, why is this important? As the DNA synthesis industry grows, and the number of applications expands, new markets are emerging that use that DNA in different ways. Not all that DNA is produced using the same method, and the different methods are characterized by different costs, error rates, lengths, throughput, etc. (The Wikipedia entry on Oligonucleotide Synthesis is actually fairly reasonable, if you want to read more. See also Kosuri and Church, "Large-scale de novo DNA synthesis: technologies and applications".) If we are going to understand the state of the technology, and the economy built on that technology, then we need to be careful about measuring what the technology can do and how much it costs. Once we pin down what the world looks like today, we can start trying to make sensible projections, or even predictions, about the future.

While there is just one basic chemistry used to synthesize oligonucleotides, there are two physical formats that give you two very different products. Oligos synthesized on individual columns, which might be packed into 384 (or more) well plates, can be manipulated as individual sequences. You can use those individual sequences for any number of purposes, and if you want just one sequence at a time (for PCR or hybridization probes, gene therapy, etc), this is probably how you make it. You can build genes from column oligos by combining them pairwise, or in larger numbers, until you get the size construct you want (typically of order a thousand bases, or a kilobase [kB], at which point you start manipulating the kB fragments). I am not going to dwell on gene assembly and error correction strategies here; you can Google that.

The other physical format is array synthesis, in which synthesis takes place on a solid surface consisting of up to a million different addressable features, where light or charge is used to control which sequence is grown on which feature. Typically, all the oligos are removed from the array at once, which results in a mixed pool. You might insert this pool into a longer backbone sequence to construct a library of different genes that code for slightly different protein sequences, in order to screen those proteins for the characteristics you want. Or, if you are ambitious, you might use the entire pool of array oligos to directly assemble larger constructs such as genes. Again, see Google, Codon Devices, Gen9, Twist, etc. More relevant to my purpose here, a pool of array-synthesized oligos can be used as an extremely dense information storage medium. To get a sense of when that might be a viable commercial product, we need to have an idea of the throughput of the industry, and how far away from practical implementation we might be. 

Next, to recap, last year I made a stab at estimating the size of the gene synthesis market. Much of the industry revenue data came from a Frost & Sullivan report, commissioned by Genscript for its IPO prospectus. The report put the 2014 market for synthetic genes at only $137 million, from which I concluded that the total number of bases shipped as genes that year was 4.8 billion, or a bit less than a duplex human genome. Based on my conversations with people in the industry, I conclude that most of those genes were assembled from oligos synthesized on columns, with a modest, but growing, fraction from array oligos. (See "On DNA and Transistors", and preceding posts, for commentary on the gene synthesis industry and its future.)

The Frost & Sullivan report also claims that the 2014 market for single-stranded oligonucleotides was $241 million. The Genscript IPO prospectus does not specify whether this $241 million was from both array- and column-synthesized oligos, or not. But because Genscript only makes and uses column synthesis, I suspect it referred only to that synthesis format.  At ~$0.01 per base (give or take), this gives you about 24 billion bases synthesized on columns sold in 2014. You might wind up paying as much as $0.05 to $0.10 per base, depending on your specifications, which if prevalent would pull down the total global production volume. But I will stick with $0.01 per base for now. If you add the total number of bases sold as genes and the bases sold as oligos, you get to just shy of 30 billion bases (leaving aside for the moment the fact that an unknown fraction of the genes came from oligos synthesized on arrays).
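
The arithmetic in that paragraph is simple enough to sketch in a few lines of Python. This is only a rough check: the function name `bases_sold` is made up for illustration, the prices are the assumed averages discussed above rather than measured values, and the 4.8 billion gene bases is the figure quoted earlier in this post.

```python
def bases_sold(revenue_usd, price_per_base_usd):
    """Estimated bases shipped: revenue divided by an assumed average price."""
    return revenue_usd / price_per_base_usd

OLIGO_REVENUE_2014 = 241e6   # USD, Frost & Sullivan figure quoted above
GENE_BASES_2014 = 4.8e9      # bases shipped as genes, figure quoted earlier

# Assumed average prices per base; $0.01 is the working guess in the text.
for price in (0.01, 0.05, 0.10):
    estimate = bases_sold(OLIGO_REVENUE_2014, price)
    print(f"${price:.2f}/base -> {estimate:.2e} bases")

# Column oligos at $0.01/base plus the gene figure.
total = bases_sold(OLIGO_REVENUE_2014, 0.01) + GENE_BASES_2014
print(f"oligos at $0.01/base plus genes: {total:.2e} bases")
```

The loop makes the sensitivity visible: moving the assumed price from $0.01 to $0.10 per base cuts the estimated volume by a factor of ten.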

So, now, what about array synthesis? If you search the interwebs for information on the market for array synthesis, you get a mess of consulting and marketing research reports that cost between a few hundred and many thousands of dollars. I find this to be an unhelpful corpus of data and analysis, even when I have the report in hand, because most of the reports are terrible at describing sources and methods. However, as there is no other source of data, I will use a rough average of the market sizes from the abstracts of those reports to get started. Many of the reports claim that in 2016 the global market for oligo synthesis was ~$1.3 billion, and that this market will grow to $2.X billion by 2020 or so. Of the $1.3B 2016 revenues, the abstracts assert that approximately half was split evenly between "equipment and reagents". I will note here that this should already make the reader skeptical of the analyses, because who is selling ~$260M worth of synthesis "equipment"? And who is buying it? Seems fishy. But I can see ~$260M in reagents, in the form of various columns, reagents, and purification kit. This trade, after all, is what keeps outfits like Glenn Research and Trilink in business.

Forging ahead through swampy, uncertain data, that leaves us with ~$650M in raw oligos. Should we say this is inclusive or exclusive of the $241M figure from Frost & Sullivan? I am going to split the difference and call it $500M, since we are already well into hand waving territory by now, anyway. How many bases does this $500M buy?

Array oligos are a lot cheaper than column oligos. Kosuri and Church write that "oligos produced from microarrays are 2–4 orders of magnitude cheaper than column-based oligos, with costs ranging from $0.00001–0.001 per nucleotide, depending on length, scale and platform." Here we stumble a bit, because cost is not the same thing as price. As a consumer, or as someone interested in understanding how actually acquiring a product affects project development, I care about price. Without knowing a lot more about how this cost range is related to price, and the distribution of prices paid to acquire array oligos, it is hard to know what to do with the "cost" range. The simple average of the two ends of that range would be about $0.0005 per base, but I also happen to know that you can get oligos en masse for less than that. But I do not know what the true average price is. For the sake of expediency, I will call it $0.0001 per base for this exercise.

Combining the revenue estimate and the price gives us about 5E12 bases per year. From there, assuming roughly 100-mer oligos, you get to 5E10 different sequences. And adding in the number of features per array (between 100,000 and 1M), you get as many as 500,000 arrays run per year, or about 1370 per day. (It is not obvious that you should think of this as 1370 instruments running globally, and after seeing the Agilent oligo synthesis operation a few years ago, I suggest that you not do that.) If the true average price is closer to $0.00001 per base, then you can bump up the preceding numbers by an order of magnitude. But, to be conservative, I won't do that here. Also note that the ~30 billion bases synthesized on columns annually are not even a rounding error on the 5E12 synthesized on arrays.
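
For anyone who wants to vary the inputs, the chain of divisions above can be written as a short Python sketch. Every input here is one of the guesses from the text ($500M revenue, $0.0001 per base, 100-mers, 100,000 features per array), and the function name `array_throughput` is invented for this illustration.

```python
def array_throughput(revenue_usd, price_per_base_usd, oligo_length, features_per_array):
    """Chain the guesses: revenue -> bases -> sequences -> arrays."""
    bases = revenue_usd / price_per_base_usd
    sequences = bases / oligo_length
    arrays = sequences / features_per_array
    return {
        "bases_per_year": bases,
        "sequences_per_year": sequences,
        "arrays_per_year": arrays,
        "arrays_per_day": arrays / 365,
    }

# Working guesses from the text; none of these are measured values.
estimate = array_throughput(
    revenue_usd=500e6,
    price_per_base_usd=1e-4,
    oligo_length=100,
    features_per_array=100_000,  # low end of the 100,000 to 1M range
)
for name, value in estimate.items():
    print(f"{name}: {value:,.0f}")
```

Using the low end of the feature range gives the upper bound on arrays per year; passing `features_per_array=1_000_000` drops the array count by a factor of ten, and passing `price_per_base_usd=1e-5` raises every number by the same factor.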

Aside: None of these calculations delve into the mass (or the number of copies) per synthesized sequence. In principle, of course, you only need one perfect copy of each sequence, whether synthesized on columns or arrays, to use DNA in just about any application (except where you need to drive the equilibrium or reaction kinetics). Column synthesis gives you many more copies (i.e., more mass per sequence) than array synthesis. In principle, ignoring the efficiency of the chemical reactions, you could dial down the feature size on arrays until you were synthesizing just one copy per sequence. But then it would become exceedingly important to keep track of that one copy through successive fluidic operations, which sounds like a quite difficult prospect. So whatever the final form factor, an instrument needs to produce sufficient copies per sequence to be useful, but not so many that resources are wasted on unnecessary redundancy/degeneracy.

Just for shits and giggles, and because array synthesis could be important for assembling the hypothetical synthetic human genome, this all works out to be enough DNA to assemble 833 human duplex genomes per year, or 3 per day, in the absence of any other competing uses, of which there are obviously many. Also if you don't screw up and waste some of the DNA, which is inevitable. Finally, at a density of ~1 bit/base, this is enough to annually store about 5 Tbits (roughly 0.6 TB) of data, or the equivalent of one laptop hard drive.
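
A minimal sketch of that conversion, assuming a duplex human genome of about 6E9 bases (roughly 3E9 base pairs, counting both strands) and the 1 bit/base density used above. The variable names are illustrative only.

```python
BASES_PER_YEAR = 5e12        # array guesstimate from above
DUPLEX_GENOME_BASES = 6e9    # assumption: ~3e9 base pairs, both strands
BITS_PER_BASE = 1.0          # storage density assumed in the text

genomes_per_year = BASES_PER_YEAR / DUPLEX_GENOME_BASES
genomes_per_day = genomes_per_year / 365

bits_per_year = BASES_PER_YEAR * BITS_PER_BASE
terabytes_per_year = bits_per_year / 8 / 1e12   # 8 bits per byte

print(f"duplex genomes per year: {genomes_per_year:.0f}")
print(f"duplex genomes per day:  {genomes_per_day:.1f}")
print(f"storage per year: {bits_per_year:.1e} bits = {terabytes_per_year:.3f} TB")
```

Under these assumptions the daily figure lands between 2 and 3 genomes, and the bit-versus-byte convention matters: 5E12 bits is 5 terabits, or about 0.6 terabytes.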

And so, if you have access to the entire global supply of single stranded oligonucleotides, and you have an encoding/decoding and sequencing strategy that can handle significant variations in length and high error rates at scale, you can store enough HD movies and TV to capture most of the new, good stuff that HollyBollyWood churns out every year. Unless, of course, you also need to accommodate the tastes and habits of a tween daughter, in which case your storage budget is blown for now and evermore no matter how much capacity you have at hand. Not to mention your wallet. Hey, put down the screen and practice the clarinet already. Or clean up your room! Or go to the dojo! Yeesh! Kids these days! So many exclamations!

Where was I?

Now that we have some rough numbers in hand, we can try to say something about the future. Based on my experience working on the Microsoft/UW DNA data storage project, I have become convinced that this technology is coming, and it will be based on massive increases in the supply of synthetic DNA. To compete with an existing tape drive (see the last few 'graphs of this post), able to read and write ~2 Gbits a second, a putative DNA drive would need to be able to read and write ~2 GBases per second, or ~173 Tbits/day, or the equivalent of tens of thousands of human genomes a day, per instrument/device. Based on the guesstimate above, which gave a global throughput of just 3 human genomes per day, we are waaaay below that goal.
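
To put a number on that gap, here is a rough recomputation in Python. It assumes 1 bit per base, a 6E9-base duplex genome, and the 5E12 bases/year global array guesstimate from above; all three are assumptions, so treat the outputs as order-of-magnitude only.

```python
SECONDS_PER_DAY = 86_400

drive_bits_per_second = 2e9          # tape drive rate quoted above
bits_per_base = 1.0                  # assumed storage density
duplex_genome_bases = 6e9            # assumption: ~3e9 base pairs, both strands
global_array_bases_per_year = 5e12   # guesstimate from earlier in the post

# What a single DNA drive would have to read and write each day.
device_bases_per_day = drive_bits_per_second / bits_per_base * SECONDS_PER_DAY
device_genomes_per_day = device_bases_per_day / duplex_genome_bases

# What the whole array industry is estimated to produce each day.
global_bases_per_day = global_array_bases_per_year / 365
shortfall = device_bases_per_day / global_bases_per_day

print(f"one drive needs:   {device_bases_per_day:.2e} bases/day")
print(f"genome equivalent: {device_genomes_per_day:,.0f} duplex genomes/day")
print(f"global arrays:     {global_bases_per_day:.2e} bases/day")
print(f"shortfall factor:  {shortfall:,.0f}x")
```

By this arithmetic a single drive would need on the order of 1E14 bases per day, roughly four orders of magnitude more than the estimated output of the entire array industry.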

To be sure, there is probably some demand for a DNA storage technology that can work at lower throughputs: long term cold storage, government archives, film archives, etc. I suspect, however, that the many advantages of DNA data storage will attract an increasing share of the broader archival market once the basic technology is demonstrated on the market. I also suspect that developing the necessary instrumentation will require moving away from the existing chemistry to something new and different, perhaps enzymatically controlled synthesis, perhaps even with the aid of the still hypothetical DNA "synthase", which I first wrote about 17 years ago.

In any event, based on the limited numbers available today, it seems likely that the current oligo array industry has a long way to go before it can supply meaningful amounts of DNA for storage. It will be interesting to see how this all evolves.

A Few Thoughts and References Re Conservation and Synthetic Biology

Yesterday at Synthetic Biology 7.0 in Singapore, we had a good discussion about the intersection of conservation, biodiversity, and synthetic biology. I said I would post a few papers relevant to the discussion, which are below.

These papers are variously: the framing document for the original meeting at the University of Cambridge in 2013 (see also "Harry Potter and the Future of Nature"), sponsored by the Wildlife Conservation Society; follow-on discussions from meetings in San Francisco and Bellagio; and my own efforts to try to figure out how to quantify the economic impact of biotechnology (which is not small, especially when compared to much older industries) and the economic damage from invasive species and biodiversity loss (which is also not small, measured as either dollars or jobs lost). The final paper in this list is my first effort to link conservation and biodiversity with economic and physical security, which requires shifting our thinking from the national security of nation states and their political boundaries to the natural security of the systems and resources that those nation states rely on for continued existence.

"Is It Time for Synthetic Biodiversity Conservation?", Antoinette J. Piaggio, Gernot Segelbacher, Philip J. Seddon, Luke Alphey, Elizabeth L. Bennett, Robert H. Carlson, Robert M. Friedman, Dona Kanavy, Ryan Phelan, Kent H. Redford, Marina Rosales, Lydia Slobodian, Keith Wheeler, Trends in Ecology & Evolution, Volume 32, Issue 2, February 2017, Pages 97–107

Robert Carlson, "Estimating the biotech sector's contribution to the US economy", Nature Biotechnology, 34, 247–255 (2016), 10 March 2016

Kent H. Redford, William Adams, Rob Carlson, Bertina Ceccarelli, “Synthetic biology and the conservation of biodiversity”, Oryx, 48(3), 330–336, 2014.

"How will synthetic biology and conservation shape the future of nature?", Kent H. Redford, William Adams, Georgina Mace, Rob Carlson, Steve Sanderson, Framing Paper for International Meeting, Wildlife Conservation Society, April 2013.

"From national security to natural security", Robert Carlson, Bulletin of the Atomic Scientists, 11 Dec 2013.

On DNA and Transistors

Here is a short post to clarify some important differences between the economics of markets for DNA and for transistors. I keep getting asked related questions, so I decided to elaborate here.

But first, new cost curves for reading and writing DNA. The occasion is some new data gleaned from a somewhat out of the way source, the Genscript IPO Prospectus. It turns out that, while preparing their IPO docs, Genscript hired Frost & Sullivan to do a market survey across much of life sciences. The Prospectus then puts Genscript's revenues in the context of the global market for synthetic DNA, which together provide some nice anchors for discussing how things are changing (or not).

So, with no further ado, Frost & Sullivan found that the 2014 global market for oligos was $241 million, and the global market for genes was $137 million. (Note that I tweeted out larger estimates a few weeks ago when I had not yet read the whole document.) Genscript reports that they received $35 million in 2014 for gene synthesis, for 25.6% of the market, which they claim puts them in the pole position globally. Genscript further reports that the price for genes in 2014 was $0.34 per base pair. This sounds much too high to me, so it must be based on duplex synthesis, which would bring the linear per base cost down to $0.17 per base, which sounds much more reasonable to me because it is more consistent with what I hear on the street. (It may be that Gen9 is shipping genes at $0.07 per base, but I don't know anyone outside of academia who is paying that low a rate.) If you combine the price per base and the size of the market, you get about 1 billion bases worth of genes shipped in 2014 (so a million genes, give or take). This is consistent with Ginkgo's assertion that their 100 million base deal with Twist was the equivalent of 10% of the global gene market in 2015. For oligos, if you combine Genscript's reported average price per base, $0.05, with the market size you get about 4.8 billion bases worth of oligos shipped in 2014. Frost & Sullivan thinks that from 2015 to 2019 the oligo market CAGR will be 6.6% and the gene synthesis market will come in at 14.7%.
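
As a rough check, the market-size-over-price arithmetic looks like this in Python. The 1,000-base average gene length is an assumption introduced here to turn bases into a gene count, and the variable names are illustrative.

```python
GENE_MARKET_2014 = 137e6    # USD, Frost & Sullivan via the Genscript prospectus
OLIGO_MARKET_2014 = 241e6   # USD, same source
GENE_PRICE = 0.17           # USD per base, the per-strand reading of $0.34/bp
OLIGO_PRICE = 0.05          # USD per base, Genscript's reported average
AVG_GENE_LENGTH = 1000      # bases; assumption used only for the gene count

gene_bases = GENE_MARKET_2014 / GENE_PRICE
gene_count = gene_bases / AVG_GENE_LENGTH
oligo_bases = OLIGO_MARKET_2014 / OLIGO_PRICE

print(f"gene bases shipped:  {gene_bases:.2e}")
print(f"genes shipped:       {gene_count:,.0f}")
print(f"oligo bases shipped: {oligo_bases:.2e}")
```

The gene figure comes out a little under 1E9 bases, which is the "about 1 billion" quoted above; the oligo figure is the 4.8 billion.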

For the sequencing, I have capitulated and put the NextSeq $1000 human genome price point on the plot. This instrument is optimized to sequence human DNA, and I can testify personally that sequencing arbitrary DNA is more expensive because you have to work up your own processes and software. But I am tired of arguing with people. So use the plot with those caveats in mind.

NOTE: Replaces prior plot with an error in sequencing price.

What is most remarkable about these numbers is how small they are. The way I usually gather data for these curves is to chat with people in the industry, mine publications, and spot check price lists. All that led me to estimate that the gene synthesis market was about $350 million (and has been for years) and the oligo market was in the neighborhood of $700 million (and has been for years).

If the gene synthesis market is really only $137 million, with four or five companies vying for market share, then that is quite an eye opener. Even if that is off by a factor of two or three, getting closer to my estimate of $350 million, that just isn't a very big market to play in. A ~15% CAGR is nothing to sneeze at, usually, and that is a doubling rate of about 5 years. But the price of genes is now falling by 15% every 3-4 years (or only about 5% annually). So, for the overall dollar size of the market to grow at 15%, the number of genes shipped every year has to grow at close to 20% annually. That's about 200 million additional bases (or ~200,000 more genes) ordered in 2016 compared to 2015. That seems quite large to me. How many users can you think of who are ramping up their ability to design or use synthetic genes by 20% a year? Obviously Ginkgo, for one. As it happens, I do know of a small number of other such users, but added together they do not come close to constituting that 20% overall increase. All this suggests to me that the dollar value of the gene synthesis market will be hard pressed to keep up with Frost & Sullivan's estimate of 14.7% CAGR, at least in the near term. As usual, I will be happy to be wrong about this, and happy to celebrate faster growth in the industry. But bring me data.
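
The growth arithmetic in that paragraph can be sketched as follows. Revenue is price times volume, so the required volume growth is the revenue growth factor divided by the price change factor. The function names are made up for this illustration, and the 1E9-base starting volume is the rough 2014 gene figure from above.

```python
import math

def doubling_time_years(cagr):
    """Years for a quantity growing at a constant annual rate to double."""
    return math.log(2) / math.log(1 + cagr)

def required_volume_growth(revenue_growth, price_change):
    """Volume growth needed so that price * volume grows at revenue_growth."""
    return (1 + revenue_growth) / (1 + price_change) - 1

revenue_cagr = 0.147     # Frost & Sullivan gene market estimate
price_change = -0.05     # prices falling about 5% a year, as stated above
base_volume = 1e9        # bases per year, rough gene market volume

volume_growth = required_volume_growth(revenue_cagr, price_change)
extra_bases = base_volume * volume_growth

print(f"doubling time at 14.7% CAGR: {doubling_time_years(revenue_cagr):.1f} years")
print(f"required volume growth: {volume_growth:.1%}")
print(f"additional bases per year: {extra_bases:.2e}")
```

With these inputs the doubling time is about five years and the required volume growth is just over 20% a year, or roughly 200 million additional bases.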

People in the industry keep insisting that once the price of genes falls far enough, the ~$3 billion market for cloning will open up to synthetic DNA. I have been hearing that story for a decade. And then price isn't the only factor. To play in the cloning market, synthesis companies would actually have to be able to deliver genes and plasmids faster than cloning. Given that I'm hearing delivery times for synthetic genes are running at weeks, to months, to "we're working on it", I don't see people switching en masse to synthetic genes until the performance improves. If it costs more to have your staff waiting for genes to show up by FedEx than to have them bash the DNA by hand, they aren't going to order synthetic DNA.

And then what happens if the price of genes starts falling rapidly again? Or, forget rapidly, what about modestly? What if a new technology comes in and outcompetes standard phosphoramidite chemistry? The demand for synthetic DNA could accelerate and the total market size still might be stagnant, or even fall. It doesn't take much to turn this into a race to the bottom. For these and other reasons, I just don't see the gene synthesis market growing very quickly over the next 5 or so years.

Which brings me to transistors. The market for DNA is very unlike the market for transistors, because the role of DNA in product development and manufacturing is very unlike the role of transistors. Analogies are tremendously useful in thinking about the future of technologies, but only to a point; the unwary may miss differences that are just as important as the similarities.

For example, the computer in your pocket fits there because it contains orders of magnitude more transistors than a desktop machine did fifteen years ago. Next year, you will want even more transistors in your pocket, or on your wrist, which will give you access to even greater computational power in the cloud. Those transistors are manufactured in facilities now costing billions of dollars apiece, a trend driven by our evidently insatiable demand for more and more computational power and bandwidth access embedded in every product that we buy. Here is the important bit: the total market value for transistors has grown for decades precisely because the total number of transistors shipped has climbed even faster than the cost per transistor has fallen.

In contrast, biological manufacturing requires only one copy of the correct DNA sequence to produce billions in value. That DNA may code for just one protein used as a pharmaceutical, or it may code for an entire enzymatic pathway that can produce any molecule now derived from a barrel of petroleum. Prototyping that pathway will require many experiments, and therefore many different versions of genes and genetic pathways. Yet once the final sequence is identified and embedded within a production organism, that sequence will be copied as the organism grows and reproduces, terminating the need for synthetic DNA in manufacturing any given product. The industrial scaling of gene synthesis is completely different than that of semiconductors.

70 Years After Hiroshima: "No government is well aware of the economic importance of biotechnology"

I was recently interviewed by Le Monde for a series on the impact of Hiroshima on science and science policy, with a particular focus on biotechnology, synthetic biology, and biosecurity. Here is the story in French. Since the translation via Google is a bit cumbersome to read, below is the English original.

Question 1

On the 16th of July 1945, after the first large-scale nuclear test in New Mexico (called Trinity), the American physicist Kenneth Bainbridge, head of the shooting, told Robert Oppenheimer, head of the Manhattan Project, "Now we are all sons of bitches".

In your discipline, do you feel that the time when researchers might have the same revelation has been reached? Will it be soon?

I think this analogy does not apply to biotechnology. It is crucially important to distinguish between weapons developed in a time of war and the pursuit of science and technology in a time of peace. Over the last thirty years, biotechnology has emerged as a globally important technology because it is useful and beneficial. 

The development and maintenance of biological weapons is internationally outlawed, and has been for decades. The Trinity test, and more broadly the Manhattan Project, was a response to what the military and political leaders of the time considered an existential threat. These were actions taken in a time of world war. The scientists and engineers who developed the U.S. bombs were almost to a person ambivalent about their roles – most saw the downsides, yet were also convinced of their responsibility to fight against the Axis Powers. Developing nuclear weapons was seen as imperative for survival.

The scale of the Manhattan Project (both in personnel and as a fraction of GDP) was unprecedented, and remains so. In contrast to the exclusive governmental domain of nuclear weapons, biotechnology has been commercially developed largely with private funds. The resulting products – whether new drugs, new crop traits, or new materials – have clear beneficial value to our society.

Question 2

Do you have this feeling in other disciplines? Which ones? Why?

No. There is nothing in our experience like the Manhattan Project and nuclear weapons. It is easy to point to the participants’ regrets, and to the long aftereffects of dropping the bomb, as a way to generate debate about, and fear of, new technologies. The latest bugaboos are artificial intelligence and genetic engineering. But neither of these technologies – even if they can be said to qualify as mature technologies – is even remotely as impactful as nuclear weapons.

Question 3

What could be the impact of a "Hiroshima" in your discipline?

In biosecurity circles, you often hear discussion of what would happen if there were “an event”. It is often not clear what that event might be, but it is presumed to be bad. The putative event could be natural or it could be artificial. Perhaps the event might kill as many people as Hiroshima. (Though that would be hard, as even the most deadly organisms around today cannot wipe out populated cities in an instant.) Perhaps the event would be the intentional use of a biological weapon, and perhaps that weapon would be genetically modified in some way to enhance its capabilities. This would obviously be horrible. The impact would depend on where the weapon came from, and who used it. Was it the result of an ongoing state program? Was it a sample deployed, or stolen, from a discontinued program? Or was it built and used by a terrorist group? A state can be held accountable by many means, but we are finding it challenging to hold non-state groups to account. If the organism is genetically modified, it is possible that there will be pushback against the technology. But biotechnology is producing huge benefits today, and restrictions motivated by the response to an event would reduce those benefits. It is also very possible that biotechnology will be the primary means to provide remedies to bioweapons (probably vaccines or drugs), in which case an event might wind up pushing the technology even faster.

Question 4

After 1945, physicists, including Einstein, engaged in ethical reflection on their own work. Has your discipline done the same? Is it doing the same today?

Ethical reflection has been built into biotechnology from its origins. The early participants met at Asilomar to discuss the implications of their work. Today, students involved in the International Genetically Engineered Machine (iGEM) competition are required to complete a “policy and practices” (also referred to as “ethical, legal, and social implications” (ELSI)) examination of their project. This isn’t window dressing, by any means. Everyone takes it seriously.

Question 5

Do you think it would be necessary to raise public awareness about the issues related to your work?

Well, I’ve been writing and speaking about this issue for 15 years, trying to raise awareness of biotechnology and where it is headed. My book, “Biology is Technology”, was specifically aimed at encouraging public discussion. But we definitely need to work harder to understand the scope and impact of biotechnology on our lives. No government measures very well the size of the biotechnology industry – either in terms of revenues or in terms of benefits – so very few people understand how economically pervasive it is already. 

Question 6

What is, according to you, the degree of liberty of scientists in the face of the political and industrial powers that will exploit the results of scientific work?

Scientists face the same expectation of personal responsibility as every other member of the societies to which they belong. That’s pretty simple. And most scientists are motivated by ideals of truth, the pursuit of knowledge, and improving the human condition. That is one reason why most scientists publish their results for others to learn from. But it is less clear how to control scientific results after they are published. I would turn your question in another direction, and say politicians and industrialists should be responsible for how they use science, rather than putting this all on scientists. If you want to take this back to the bomb, the Manhattan Project was a massive military operation in a time of war, implemented by both government and the private sector. It relied on science, to be sure, but it was very much a political and industrial activity – you cannot divorce these two sides of the Project.

Question 7

Do you think about accurate measures [?] to prevent further Hiroshima?

I constantly think about how to prevent bad things from happening. We have to pay attention to how new technologies are developed and used. That is true of all technologies. For my part, I work domestically and internationally to make sure policy makers understand where biotechnology is headed and what it can do, and also to make sure it is not misused. 

But I think the question is rather off target. Bombing Hiroshima was a conscious decision made by an elected leader in a time of war. It was a very specific sort of event in a very specific context. We are not facing any sort of similar situation. If the intent of the question is to make an analogy to intentional use of biological weapons, these are already illegal, and nobody should be developing or storing them under any circumstances. The current international arms control regime is the way to deal with it. If the intent is to allude to the prevention of “bad stuff”, then this is something that every responsible citizen should be doing anyway. All we can do is pay attention and keep working to ensure that technologies are not used maliciously.

Planning for Toy Story and Synthetic Biology: It's All About Competition (Updated)

Here are updated cost and productivity curves for DNA sequencing and synthesis.  Reading and writing DNA is becoming ever cheaper and easier.  The Economist and others call these "Carlson Curves", a name I am ambivalent about but have come to accept if only for the good advertising.  I've been meaning to post updates for a few weeks; the appearance today of an opinion piece at Wired about Moore's Law serves as a catalyst to launch them into the world.  In particular, two points need some attention, the  notions that Moore's Law 1) is unplanned and unpredictable, and 2) somehow represents the maximum pace of technological innovation.

DNA Sequencing Productivity is Skyrocketing

First up: the productivity curve.  Readers new to these metrics might want to have a look at my first paper on the subject, "The Pace and Proliferation of Biological Technologies" (PDF) from 2003, which describes why I chose to compare the productivity enabled by commercially available sequencing and synthesis instruments to Moore's Law.  (Briefly, Moore's Law is a proxy for productivity; more transistors putatively means more stuff gets done.)  You have to choose some sort of metric when making comparisons across such widely different technologies, and, however much I hunt around for something better, productivity always emerges at the top.

It's been a few years since I updated this chart.  The primary reason for the delay is that, with the profusion of different sequencing platforms, it became somewhat difficult to compare productivity [bases/person/day] across platforms.  Fortunately, a number of papers have come out recently that either directly make that calculation or provide enough information for me to make an estimate.  (I will publish a full bibliography in a paper later this year.  For now, this blog post serves as the primary citation for the figure below.)

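[Figure: carlson_productivity_feb_2013.png, productivity of DNA sequencing and synthesis instruments in bases/person/day, compared with Moore's Law]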

Visual inspection reveals a number of interesting things.  First, the DNA synthesis productivity line stops in about 2008 because there have been no new instruments released publicly since then.  New synthesis and assembly technologies are under development by at least two firms, which have announced they will run centralized foundries and not sell instruments.  More on this later.

Second, it is clear that DNA sequencing platforms are improving very rapidly, now much faster than Moore's Law.  This is interesting in itself, but I point it out here because of the post today at Wired by Pixar co-founder Alvy Ray Smith, "How Pixar Used Moore's Law to Predict the Future".  Smith suggests that "Moore's Law reflects the top rate at which humans can innovate. If we could proceed faster, we would," and that "Hardly anyone can see across even the next crank of the Moore's Law clock."

Moore's Law is a Business Model and is All About Planning -- Theirs and Yours

As I have written previously, early on at Intel it was recognized that Moore's Law is a business model (see the Pace and Proliferation paper, my book, and a previous post, "The Origin of Moore's Law").  Moore's Law was always about economics and planning in a multi-billion dollar industry.  When I started writing about all this in 2000, a new chip fab cost about $1 billion.  Now, according to The Economist, Intel estimates a new chip fab costs about $10 billion.  (There is probably another Law to be named here, something about exponential increases in cost of semiconductor processing as an inverse function of feature size.  Update: This turns out to be Rock's Law.)  Nobody spends $10 billion without a great deal of planning, and in particular nobody borrows that much from banks or other financial institutions without demonstrating a long-term plan to pay off the loan.  Moreover, Intel has had to coordinate the manufacturing and delivery of very expensive, very complex semiconductor processing instruments made by other companies.  Thus Intel's planning cycle explicitly extends many years into the future; the company sees not just the next crank of the Moore's Law clock, but several cranks.  New technology has certainly been required to achieve these planning goals, but that is just part of the research, development, and design process for Intel.  What is clear from comments by Carver Mead and others is that even if the path was unclear at times, the industry was confident that it could get to the next crank of the clock.

Moore's Law served a second purpose for Intel, and one that is less well recognized but arguably more important; Moore's Law was a pace selected to enable Intel to win.  That is why Andy Grove ran around Intel pushing for financial scale (see "The Origin of Moore's Law").  I have more historical work to do here, but it is pretty clear that Intel successfully organized an entire industry to move at a pace only it could survive.  And only Intel did survive.  Yes, there are competitors in specialty chips and in memory or GPUs, but as far as high volume, general CPUs go, Intel is the last man standing.  Finally, and alas I don't have a source anywhere for this other than hearsay, Intel could have in fact gone faster than Moore's Law.  Here is the hearsay: Gordon Moore told Danny Hillis who told me that Intel could have gone faster.  (If anybody has a better source for that particular point, give me a yell on Twitter.)  The inescapable conclusion from all this is that the management of Intel made a very careful calculation.  They evaluated product roll-outs to consumers, the rate of new product adoption, the rate of semiconductor processing improvements, and the financial requirements for building the next chip fab line, and then set a pace that nobody else could match but that left Intel plenty of headroom for future products.  It was all about planning.

The reason I bother to point all this out is that Pixar was able to use Moore's Law to "predict the future" precisely because Intel meticulously planned that future.  (Calling Alan Kay: "The best way to predict the future is to invent it.")  Which brings us back to biology.  Whereas Moore's Law is all about Intel and photolithography, the reason that productivity in DNA sequencing is going through the roof is competition not just among companies but among technologies.  And we are only just getting started.  As Smith writes in his Wired piece, Moore's Law tells you that "Everything good about computers gets an order of magnitude better every five years."  Which is great: it enabled other industries and companies to plan in the same way Pixar did.  But Moore's Law doesn't tell you anything about any other technology, because Moore's Law was about building a monopoly atop an extremely narrow technology base.  In contrast, there are many different DNA sequencing technologies emerging because many different entrepreneurs and companies are inventing the future.
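For readers who want to put Smith's "order of magnitude every five years" on the same footing as a doubling time, here is a minimal sketch of the conversion.  It assumes steady exponential improvement, and the function name is mine, purely for illustration.

```python
import math

def doubling_time_years(fold_improvement: float, period_years: float) -> float:
    """Doubling time implied by a given fold improvement over a period.

    Assumes smooth exponential growth, which is an idealization.
    """
    return period_years * math.log(2) / math.log(fold_improvement)

# Smith's rule of thumb: 10x better every 5 years.
smith_doubling = doubling_time_years(10.0, 5.0)
print(f"{smith_doubling:.2f} years, or about {smith_doubling * 12:.0f} months")
```

That works out to a doubling roughly every 18 months.  Any technology improving "faster than Moore's Law" on the productivity chart is doubling in less time than that.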

The first consequence of all this competition and invention is that it makes my job of predicting the future very difficult.  This emphasizes the difference between Moore's Law and Carlson Curves (it still feels so weird to write my own name like that): whereas Intel and the semiconductor industry were meeting planning goals, I am simply keeping track of data.  There is no real industry-wide planning in DNA synthesis or sequencing, other than a race to get to the "$1000 genome" before the next guy.  (Yes, there is a vague road-mappy thing promoted by the NIH that accompanied some of its grant programs, but there is little if any coordination because there is intense competition.)

Biological Technologies are Hard to Predict in Part Because They Are Cheaper than Chips

Compared to other industries, the barrier to entry in biological technologies is pretty low.  Unlike chip fabs, there is nothing in biology that costs $10 billion commercially, nor even $1 billion.  (I have come to mostly disbelieve pharma industry claims that developing drugs is actually that expensive, but that is another story for another time.)  The Boeing 787 reportedly cost $32 billion to develop as of 2011, and that is on top of a century of multi-billion dollar aviation projects that had to come before the 787.

There are two kinds of costs that are important to distinguish here.  The first is the cost of developing and commercializing a particular product.  Based on the money reportedly raised and spent by Life, Illumina, Ion Torrent (before acquisition), Pacific Biosciences, Complete Genomics (before acquisition), and others, it looks like developing and marketing second-generation sequencing technology can cost upwards of about $100 million.  Even more money gets spent, and lost, in operations before anybody is in the black.  My intuition says that the development costs are probably falling as sequencing starts to rely more on other technology bases, for example semiconductor processing and sensor technology, but I don't know of any real data.  I would also guess that nanopore sequencing, should it actually become a commercial product this year, will have cost less to develop than other technologies, but, again, that is my intuition based on my time in clean rooms and at the wet bench.  I don't think there is great information yet here, so I will suspend discussion for the time being.

The second kind of cost to keep in mind is the use of new technologies to get something done.  Which brings in the cost curve.  Again, the forthcoming paper will contain appropriate references.

carlson_cost per_base_oct_2012.png

The cost per base of DNA sequencing has clearly plummeted lately.  I don't think there is much to be made of the apparent slow-down in the last couple of years.  The NIH version of this plot has more fine grained data, and it also directly compares the cost of sequencing with the cost per megabyte for memory, another form of Moore's Law.  Both my productivity plot above and the NIH plot show that sequencing has at times improved much faster than Moore's Law, and generally no slower.

If you ponder the various wiggles, it may be true that the fall in sequencing cost is returning to a slower pace after a period in which new technologies dramatically changed the market.  Time will tell.  (The wiggles certainly make prediction difficult.)  One feature of the rapid fall in sequencing costs is that it makes the slow-down in synthesis look smaller; see this earlier post for different scale plots and a discussion of the evaporating maximum profit margin for long, double-stranded synthetic DNA (the difference between the orange and yellow lines above).

Whereas competition among companies and technologies is driving down sequencing costs, the lack of competition among synthesis companies has contributed to a stagnation in price decreases.  I've covered this in previous posts (and in this Nature Biotech article), but it boils down to the fact that synthetic DNA has become a commodity produced using relatively old technology.

Where Are We Headed?

Now, after concluding that the structure of the industry makes it hard to prognosticate, I must of course prognosticate.  In DNA sequencing, all hell is breaking loose, and that is great for the user.  Whether instrument developers thrive is another matter entirely.  As usual with start-ups and disruptive technologies, surviving first contact with the market is all about execution.  I'll have an additional post soon on how DNA sequencing performance has changed over the years, and what the launch of nanopore sequencing might mean.

DNA synthesis may also see some change soon.  The industry as it exists today is based on chemistry that is several decades old.  The common implementation of that chemistry has heretofore set a floor on the cost of short and long synthetic DNA, and in particular the cost of synthetic genes.  However, at least two companies are claiming to have technology that facilitates busting through that cost floor by enabling the use of smaller amounts of poorer quality, and thus less expensive, synthetic DNA to build synthetic genes and chromosomes.

Gen9 is already on the market with synthetic genes selling for something like $.07 per base.  I am not aware of published cost estimates for production, other than the CEO claiming it will soon drop by orders of magnitude.  Cambrian Genomics has a related technology and its CEO suggests costs will immediately fall by 5 orders of magnitude.  Of course, neither company is likely to drop prices so far at the beginning, but rather will set prices to undercut existing companies and grab market share.  Assuming Gen9 and Cambrian don't collude on pricing, and assuming the technologies work as they expect, the existence of competition should lead to substantially lower prices on genes and chromosomes within the year.  We will have to see how things actually work in the market.  Finally, Synthetic Genomics has announced it will collaborate with IDT to sell synthetic genes, but as far as I am aware nothing new is actually shipping yet, nor have they announced pricing.

So, supposedly we are soon going to have lots more, lots cheaper DNA.  But you have to ask yourself who is going to use all this DNA, and for what.  The important business point here is that both Gen9 and Cambrian Genomics are working on the hypothesis that demand will increase markedly (by orders of magnitude) as the price falls.  Yet nobody can design a synthetic genetic circuit with more than a handful of components at the moment, which is something of a bottleneck on demand.  Another option is that customers will do less up-front predictive design and instead do more screening of variants.  This is how Amyris works -- despite their other difficulties, Amyris does have a truly impressive metabolic screening operation -- and there are several start-ups planning to provide similar (or even improved) high-throughput screening services for libraries of metabolic pathways.  I infer this is the strategy at Synthetic Genomics as well.  This all may work out well for both customers and DNA synthesis providers.  Again, I think people are working on an implicit hypothesis of radically increased demand, and it would be better to make the hypothesis explicit in part to identify the risk of getting it wrong.  As Naveen Jain says, successful entrepreneurs are good at eliminating risk, and I worry a bit that the new DNA synthesis companies are not paying enough attention to this point.

There are relatively simple scaling calculations that will determine the health of the industry.  Intel knew that it could grow financially in the context of exponentially falling transistor costs by shipping exponentially more transistors every quarter -- that is the business model of Moore's Law.  Customers and developers could plan product capabilities, just as Pixar did, knowing that Moore's Law was likely to hold for years to come.  But that was in the context of an effective pricing monopoly.  The question for synthetic gene companies is whether the market will grow fast enough to provide adequate revenues when prices fall due to competition.  To keep revenues up, they will then have to ship lots of bases, probably orders of magnitude more bases.  If prices don't fall, then something screwy is happening.  If prices do fall, they are likely to fall quickly as companies battle for market share.  It seems like another inevitable race to the bottom.  Probably good for the consumer; probably bad for the producer.
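The simplest version of that scaling calculation is just revenue = price x volume.  The sketch below is my own illustration, not anybody's actual business model: it asks how many times more bases a company must ship to hold revenue flat (or grow it) after the price per base falls by a given factor.  The price-drop factors are hypothetical, chosen to span the range the CEOs have been talking about.

```python
def volume_multiple_needed(price_drop_factor: float,
                           revenue_growth_factor: float = 1.0) -> float:
    """Factor by which bases shipped must grow to hit a revenue target.

    Revenue is modeled as price_per_base * bases_shipped, nothing more.
    price_drop_factor = old_price / new_price (e.g. 100 for a 100x cut).
    revenue_growth_factor = new_revenue / old_revenue (1.0 means flat).
    """
    if price_drop_factor <= 0 or revenue_growth_factor <= 0:
        raise ValueError("factors must be positive")
    return revenue_growth_factor * price_drop_factor

# Hypothetical price cuts of 1, 2, and 5 orders of magnitude.
for orders in (1, 2, 5):
    drop = 10.0 ** orders
    flat = volume_multiple_needed(drop)
    print(f"price falls {drop:,.0f}x -> ship {flat:,.0f}x more bases to stay flat")
```

There is no way around the arithmetic: every order of magnitude off the price requires an order of magnitude more demand just to stand still.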

(Updated)  Ultimately, for a new wave of DNA synthesis companies to be successful, they have to provide the customer something of value.  I suspect there will be plenty of academic customers for cheaper genes.  However, I am not so sure about commercial uptake.  Here's why: DNA is always going to be a small cost of developing a product, and it isn't obvious making that small cost even cheaper helps your average corporate lab.

In general, the R part of R&D only accounts for 1-10% of the cost of the final product.  The vast majority of development costs are in polishing up the product into something customers will actually buy.  If those costs are in the neighborhood of $50-100 million, then reducing the cost of synthetic DNA from $50,000 to $500 is nice, but the corporate scientist-customer is more worried about knocking a factor of two, or an order of magnitude, off the $50 million.  This means that in order to make a big impact (and presumably to increase demand adequately) radically cheaper DNA must be coupled to innovations that reduce the rest of the product development costs.  As suggested above, forward design of complex circuits is not going to be adequate innovation any time soon.  The way out here may be high-throughput screening operations that enable testing many variant pathways simultaneously.  But note that this is not just another hypothesis about how the immediate future of engineering biology will change, but another unacknowledged hypothesis.  It might turn out to be wrong.
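To see how small the DNA line item is, here is the arithmetic using the round numbers from the paragraph above.  The function name is mine and the figures are illustrative guesses, not measured data.

```python
def savings_fraction(old_cost: float, new_cost: float, total_budget: float) -> float:
    """Fraction of a total budget saved by cutting one line item."""
    return (old_cost - new_cost) / total_budget

dna_old, dna_new = 50_000.0, 500.0  # synthetic DNA cost before and after
for budget in (50e6, 100e6):        # development budgets of $50M and $100M
    frac = savings_fraction(dna_old, dna_new, budget)
    print(f"${budget:,.0f} budget: cheaper DNA saves {frac:.3%} of the total")

# For comparison: knocking a factor of two off a $50M budget saves half of it.
halving = savings_fraction(50e6, 25e6, 50e6)
print(f"halving development costs saves {halving:.0%}")
```

A hundredfold cut in DNA cost saves on the order of a tenth of a percent of the development budget, or less, which is why it does not by itself change the corporate customer's calculus.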

The upshot, just as I wrote in 2003, is that the market dynamics of biological technologies will remain difficult to predict precisely because of the diversity of technology and the difficulty of the tasks at hand.  We can plan on prices going down; how much, I wouldn't want to predict.