Communications of the ACM

BLOG@CACM

Why Scientists and Engineers Must Learn Programming

By Philip Guo
July 18, 2013

In recent years, there's been an admirable push to get more people to learn programming. But if I've never been exposed to programming, why should I invest all of the effort to learn? What's in it for me?

Pundits often give fuzzy responses, such as claiming that programming is the "literacy of the twenty-first century," that it helps you become a more empowered citizen, and that it enables you to create magical works of pure creativity.

Even though I agree with many of those claims, I'm not convinced that they're concrete enough to motivate someone to devote the thousands of hours necessary to get proficient at programming.

Instead of trying to convince everyone to learn programming, I have a more modest goal: encouraging scientists and engineers. Here's my value proposition to them:

If you're a scientist or engineer, programming can enable you to work 10 to 100 times faster and to come up with more creative solutions than your colleagues who don't know how to program.

Kevin's Story

Modern-day scientists and engineers are spending more and more of their work days in front of the computer. As an example, consider my friend Kevin, who works in oceanography and mechanical engineering. Whoa, sounds like he's probably spending all day out on high-tech boats rigging together mechanical devices like MacGyver and collecting data from underwater sensors, right? This must be his typical work day -- hard hats and heavy-duty work gloves.

Actually, Kevin spends less than 5% of his time out on the ocean; the other 95% of the time, he's sitting in front of the computer writing programs to clean up, transform, process, and extract insights from data collected out in the field. This is what Kevin looks like at work on most days:

The same story plays out for scientists and engineers in all sorts of fields: astronomers, biologists, physicists, aerospace engineers, economists, geneticists, ecologists, environmental engineers, neuroscientists ... the list goes on and on. Modern-day science and engineering is all about processing, analyzing, and extracting insights from data.

Three Reasons To Learn

Over the past few years, many scientists and engineers have ranted to me about how furious they are that nobody made them learn programming back in high school or college. They now realize how much more productive they could be at work if they had developed those skills earlier.

Based on these conversations, I've come up with three reasons why scientists and engineers must learn programming:

1. You can work 10 times faster by writing computer programs to automate tedious tasks (such as data cleaning and integration) that you would otherwise need to do by hand. If you know how to program, computer-related tasks that used to take you a week to finish will now take only a few hours. I can't think of any other skill that leads to an instant 10x productivity boost for scientists and engineers.
2. Programming allows you to discover more creative solutions than your colleagues who don't know how to program. It lets you go beyond simply using the tools and data sets that everyone else around you uses, to transcend the limitations that your peers are stuck with. For example, you'll be able to write programs to automatically acquire data from new sources, to clean, reformat, and integrate that data with your existing data, and to implement far more sophisticated analyses than your colleagues who can only use pre-existing tools. By doing so, you're more likely to make a creative innovation that your colleagues wouldn't even think of exploring due to lack of programming skill.
3. Finally, knowing how to program allows you to communicate effectively with programmers that your lab hires to do the heavy-duty coding. I don't expect you to become as adept as the professionals, but the more you know about programming, the more you'll be able to relate to them and to command their respect. If you can motivate programmers in your lab to spend more of their time helping you solve technical problems (e.g., by writing parallel programs that run on a compute cluster), you can work 100 times faster than if you had to attack those problems alone.
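
To make the first two reasons concrete, here is a minimal sketch of the kind of data cleaning and integration they describe. It is only an illustration under stated assumptions: the station names, column names, and readings are all invented, and a real project would read files from disk and would likely use a data-analysis library instead of the standard library alone.

```python
import csv
import io

# Hypothetical field data: every name, column, and value here is invented
# for illustration and does not come from a real survey.
RAW_READINGS = """station,depth_m,temp_c
A1, 10 ,12.5
A1,20,N/A
B2,10,13.1
b2 ,20,
B2,30,11.9
"""

RAW_STATIONS = """station,latitude,longitude
A1,36.80,-121.90
B2,36.70,-122.00
"""


def clean_readings(text):
    """Parse CSV text, normalize station IDs, and drop rows with no usable temperature."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        station = row["station"].strip().upper()
        try:
            temp = float(row["temp_c"])
        except (TypeError, ValueError):
            continue  # skip blanks and markers such as "N/A"
        cleaned.append(
            {"station": station, "depth_m": float(row["depth_m"]), "temp_c": temp}
        )
    return cleaned


def integrate(readings, stations_text):
    """Attach each station's coordinates to its readings (a simple join on station ID)."""
    stations = {
        row["station"]: row for row in csv.DictReader(io.StringIO(stations_text))
    }
    merged = []
    for reading in readings:
        site = stations.get(reading["station"])
        if site is None:
            continue  # no metadata for this station, so leave the reading out
        merged.append(
            {
                **reading,
                "latitude": float(site["latitude"]),
                "longitude": float(site["longitude"]),
            }
        )
    return merged


def mean_temp_by_station(rows):
    """Average the temperature readings for each station."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["station"], []).append(row["temp_c"])
    return {station: sum(temps) / len(temps) for station, temps in grouped.items()}


if __name__ == "__main__":
    rows = integrate(clean_readings(RAW_READINGS), RAW_STATIONS)
    for station, mean in sorted(mean_temp_by_station(rows).items()):
        print(f"{station}: {mean:.2f} C")
```

Once the cleaning rules are written down like this, applying them to the next batch of field data means rerunning the script, not repeating the edits by hand.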

Postscript

Readers have responded with two main classes of comments:

1. Scientists and engineers in lots of fields already learn some amount of programming (e.g., in Excel, MATLAB, Mathematica, LabVIEW).
2. We should strive to create end-user programming tools that make it easy enough for scientists and engineers to do what they need without even knowing that they're programming.

I agree with both of these points. But in the foreseeable future, I think that programming skill will always be positively correlated with creative productivity in many technical fields. Thus, for scientists and engineers who already know some amount of programming, learning more will always provide a competitive advantage over colleagues who aren't as adept. We might someday get to a future where programming as we know it will become as obsolete as calculating integrals by hand, but I doubt that's going to happen anytime soon.

Comments

Nicholas Murphy

July 18, 2013 04:28

FWIW, my research credo (as you may know, Philip) is that we have to go to them, not the other way around. Programming is always a good skill to have, but asking people with immense amounts of domain knowledge (that took years to acquire) to _also_ be proficient coders (another skill it takes a lot of time to learn to be competent at) is simply not feasible. Ideally we should get to a place where the UIs and tools make it so they don't even know they're programming.

Perhaps put another way, I don't believe there's something special about these domains that requires the ability to do general-purpose programming when many other domains have been having to manipulate data for a much longer time and yet don't require programming skills. We've managed to create special-purpose tools that are narrow enough to be relatively easy to use but broad enough to cover a wide number of problems. Think Excel, for instance. Even professions like financial analysts, at worst, have to contend with SQL. And even that, I'd argue, is asking too much.

All that said, perhaps we'd agree that, sure, maybe they do ultimately need to program, but there's still a huge mismatch between the programming tools they currently have and what they need. I'll agree they need to learn to "program" if you agree we potentially need to change what it means for them to "program." ;)

CACM Administrator

July 19, 2013 10:06

Hi Nick,

I'm totally with you. I'd love to get to a world where end-user programming tools for scientists are powerful and usable enough that they can do the kinds of sophisticated analyses that they need without even thinking that they're programming. But as it stands, the amount that can be accomplished by existing tools is fairly limited; thus, as a scientist, having some understanding of programming will always give you a comparative advantage vs. your peers who can use only, say, Excel. When programming skills are no longer a comparative advantage in science, that's one sign that the tools have gotten powerful enough. Maybe an analogy is that nowadays, being able to do surface integrals or geometric proofs or draw pretty scatterplots on paper no longer provides a comparative advantage, since computer-based tools make it easy enough to do so.

Mark Guzdial

July 19, 2013 10:27

Greg Wilson has been trying to define which parts of computing scientists and engineers need to be productive today. He has been evolving his Software Carpentry workshops (http://software-carpentry.org) to focus on what scientists and engineers need, not everything that computer scientists find valuable.

Anonymous

July 19, 2013 01:21

If I may be so bold as to suggest a fourth reason that it is required:
Programming requires you to break big problems down into their smallest discrete components, and then to solve the big problem by systematically solving those smaller components. This is also exactly the kind of thinking that is required 90%* of the time in science and engineering. Once programming has forced you to learn how to think this way, it is far easier to apply it to non-programming problems.

*in the 90/10 perspiration vs inspiration breakdown

CACM Administrator

July 19, 2013 03:42

Thanks for the pointer, Mark! I'm a big fan of Greg's Software Carpentry work and would love to see more efforts to specialize programming curricula for working scientists/engineers.

Many people in the CS Ed community have already done a great job at introducing programming to a variety of audiences -- e.g., elementary/middle school kids, high school robotics aficionados, aspiring digital artists -- and I'd love to see more outreach to the scientific/engineering communities.

--Philip Guo

Michael Monagan

July 23, 2013 02:13

I am reading Phil's statement and I'm skeptical about reason 3, which claims one could work 100 times faster using parallel computers. The computer may run 100 times faster but that won't save me 99% of my time. Usually, computing cycles are not a good measure of productivity. On the other hand, I don't think we need "thousands of hours" to become a programmer. I think it's in the "hundreds of hours". A typical course at a university is less than 40 hours of instruction, and less than 120 hours of assignments. Five programming courses (5 x 160 = 800 hours) are enough. That's one semester. That's enough to use Matlab / Maple / Mathematica / C++ proficiently.

Anonymous

July 23, 2013 02:53

Thanks for the article. I agree with what is said here.
However, my question would be, any recommendations about what language to learn?
I use MatLab on a daily basis and I have some proficiency on C and Java, enough to understand what is happening (still having issues with pointers hehehe).
What would be the best course of action, try to improve my proficiency on any of these languages or try to learn a new one like python?

Philip Guo

July 23, 2013 04:09

Hi Michael,

My (admittedly unscientific!) claim of being 100x faster using parallel computers is comparing someone who doesn't know how to program with someone who does know how to program and communicates their requests to a staff programmer, who writes the parallel code.

My thinking is that if you understand programming, you can spend, say, 10 hours communicating your requests to a programmer, who then goes off and writes the program for you. If you tried to do that task manually, it could take you well over 1000 hours (100x slower).

Also, *great* point about only needing hundreds of hours, rather than thousands, to achieve a good enough mastery of programming for research purposes.

To anonymous -- that's a hard question to answer (without sounding totally biased). For starters, I'd recommend looking at Software Carpentry: http://software-carpentry.org/

Anonymous

July 29, 2013 11:38

To Anonymous,

who writes "However, my question would be, any recommendations about what language to learn?
I use MatLab on a daily basis and I have some proficiency on C and Java, enough to understand what is happening (still having issues with pointers hehehe).
What would be the best course of action, try to improve my proficiency on any of these languages or try to learn a new one like python?" I'd say that it depends on what you are trying to do.

If most of the researchers you collaborate with are using a particular language or tool, I'd say stick with that and improve your proficiency. If there's a mix of tools, there may be value in learning programming techniques in a "simpler" language.

Make sure you understand how to apply testing in the context of a piece of code to solve a scientific problem. Master version control. Give your code to a colleague and see if they can explain it back to you. Then go on to learning how to optimise and parallelise code - it's far easier to do this when you have good, clean code to start with.

You might find the following pre-print interesting: http://arxiv.org/abs/1210.0530
Declaration of interest: both myself and Greg Wilson (mentioned above) are authors.

Neil

Martin Leisner

September 16, 2014 04:26

Philip,

I'm not sure I agree.

People who use computers should be conversant in what computers do.

But be able to deal with implementation? Depends on the scope. Even in interpreted languages you can spend a lot of time dealing with implementation.

I think it's laughable when everyone tries "to be a programmer". The range of programming ability is enormous. I think it's more important to "understand" algorithms and express them than to implement them.
