
Communications of the ACM

Kode Vicious

Think Before You Fork


Dear KV

Until recently I was working on a small open source project that had only three or four maintainers. I sent in a large and extensive patch, and got little or no comment. When I asked for commit access to the project repository I was refused, and was asked to go back and split up my patches into smaller sections for review. All of this back and forth happened over two months! Since the point of open source development and source-control systems is to commit and update early and often, you can imagine that I'm frustrated by this foot dragging on the part of the project's official maintainers. I have considered just forking the project, but I'm not sure if I want to go it alone. I really just want my patches integrated and my new features to be available to the community.

Frustrated and Forking

Dear Forking

Most people who work on open source projects—especially the smaller ones that are not supported by some corporate entity—are, as you probably know, volunteers. Volunteers don't have a great deal of time on their hands, unless they're very wealthy volunteers. Have you met a lot of wealthy volunteers? I thought not.

Submitting a large patch to an open source project with only a small number of possible reviewers is actually pretty rude. You are asking people who don't know you to trust you with their hard-won code base. Breaking down a large patch into small patches is just common courtesy to the maintainers, so they can understand your code in their few free moments and then integrate it without upsetting the quality of the larger code base. Having a patch rebuffed and then asking for commit access to their repo would be seen by most project maintainers as annoying, if not insulting, and a wise project maintainer would stay well away from a prospective committer who acted in this way.

Open source projects are as much social arrangements as they are technical undertakings. The people on the project have to trust each other, and trust is not earned by sending in a single large patch; it is earned over time, through repeated exposure to a person on the mailing lists, forums, and other nontechnical communication channels associated with the project.

A good project maintainer will lay out these kinds of guidelines in documentation on the project site, as well as communicating them in personal and group email messages. Certainly you should have taken the first rebuff not as an insult or as some sort of foot dragging but as an invitation to work with the project maintainer to get your code into the system. Breaking down your large patch into smaller, more easily reviewable chunks would simply have been common courtesy.

I have worked on too many projects where someone comes along saying he or she has solved a ton of problems with one big patch, and rarely have I seen the integration of such a patch go well. Often through a lack of testing, or simple shortsightedness as to the knock-on effects of the change, the time it takes to integrate the large patch far outstrips the amount of time it would take to integrate a larger number of smaller, and easier-to-understand, patches.
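
Version-control tools already help with the splitting. In git, for example, `git add -p` lets you stage individual hunks into separate commits. To illustrate only the mechanics, here is a minimal Python sketch that cuts a git-style multi-file unified diff into one patch per file. It assumes each file's section begins with a `diff --git a/<path> b/<path>` line and that paths contain no whitespace. `split_unified_diff` and the sample diff are hypothetical. A per-file split is only a crude first cut, because the splits reviewers actually want follow logical changes, not file boundaries.

```python
import re

# Matches the header git emits at the start of each file's section.
# Assumption: paths contain no whitespace.
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)")


def split_unified_diff(diff_text: str) -> dict[str, str]:
    """Return {path: patch_text}, one entry per file in a git-style diff."""
    patches: dict[str, str] = {}
    current_path = None
    current_lines: list[str] = []
    for line in diff_text.splitlines(keepends=True):
        match = DIFF_HEADER.match(line)
        if match:
            # A new file section starts: save the previous one, if any.
            if current_path is not None:
                patches[current_path] = "".join(current_lines)
            current_path = match.group(2)  # path on the "b/" (new) side
            current_lines = [line]
        elif current_path is not None:
            current_lines.append(line)
        # Lines before the first header (e.g., a commit message) are dropped.
    if current_path is not None:
        patches[current_path] = "".join(current_lines)
    return patches


# Hypothetical two-file diff used only for demonstration.
SAMPLE_DIFF = """\
diff --git a/src/parser.c b/src/parser.c
--- a/src/parser.c
+++ b/src/parser.c
@@ -1 +1 @@
-int depth;
+int depth = 0;
diff --git a/README b/README
--- a/README
+++ b/README
@@ -1 +1,2 @@
 Parser library
+See src/parser.c for details.
"""

if __name__ == "__main__":
    for path, patch in split_unified_diff(SAMPLE_DIFF).items():
        print(f"{path}: {len(patch.splitlines())} lines")
```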

As to whether or not one should fork a project, I can think of only one case where a fork is clearly called for: when a project has become unmaintained. In that case it is easy to see why you would create a new one if you could not get the previous maintainers simply to hand the old project over to you. Forking a project in other circumstances often makes you look like a petulant, spoiled child who is taking his toys and going home. Of course, I've worked with enough programmers to know that often that is exactly the case.

KV

Dear KV

I have finally tired of adding command-line switches to a program I have written, and now I am going to switch it to using a configuration file. I think a config file will allow me to provide a richer environment to my users, and I'm running out of both upper- and lowercase letters for command-line switches. One of the problems is that there doesn't seem to be a standard for how to write a configuration file parser, and each of the several I've seen has its warts. Do you have any advice on configuration file parsers?

Configed and Confused

Dear Configed

Name three letters that are not arguments to the ls command! Yes, it's an old joke, but it's funny because it's true.

Depending on which language you're programming in, there may already be a library to help you parse a configuration file. If there is a library, then I strongly suggest you use it. Configuration file parsers are another in a long list of technologies that programmers seem to delight in reimplementing, and, usually, they get it wrong.
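
If you happen to be working in Python, for instance, the standard library already ships `configparser` for INI-style files. The following is a minimal sketch that assumes that language and format. The section names, option names, and defaults are hypothetical stand-ins for whatever switches your program takes today. The `fallback=` argument supplies a default for any setting the user leaves out, much as a default value does for a command-line switch.

```python
import configparser

# Hypothetical settings file for a tool that previously took command-line switches.
EXAMPLE_CONFIG = """\
[output]
verbose = yes
format = json

[network]
timeout = 30
"""


def load_settings(text: str) -> dict:
    """Parse INI-style text, falling back to defaults for anything missing."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        # getboolean() accepts yes/no, true/false, on/off, and 1/0.
        "verbose": parser.getboolean("output", "verbose", fallback=False),
        "format": parser.get("output", "format", fallback="text"),
        "timeout": parser.getint("network", "timeout", fallback=10),
        # 'retries' is absent from the example, so its fallback is used.
        "retries": parser.getint("network", "retries", fallback=3),
    }


if __name__ == "__main__":
    print(load_settings(EXAMPLE_CONFIG))
    # {'verbose': True, 'format': 'json', 'timeout': 30, 'retries': 3}
```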


My favorite example of getting it wrong is when white space is made too significant. A configuration file parser that requires spaces in one place but forbids them in another, related place is an example of a piece of code that should be taken out, hanged, shot, burned, reconstituted, and then burned again (just for good measure). Although you may think this is a trivial complaint, I can assure you that people continue to write code this way: just last week I dealt with a system that does exactly the wrong thing. Alas, I did not select the product in question and was not allowed to rid myself of it, so I am now dealing with it. By dealing with it, I mean that I have made my displeasure quite clear. Of course, the person who did select the product sits quite close, so he has been subjected to occasional outbursts of cursing every time some bit of white space goes awry. I notice he recently brought in headphones. I think it's time to make sure I can send some live audio data to his computer as well.
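
For contrast with the parser described above, here is a small sketch showing that Python's standard-library `configparser` reads `port=8080`, `port = 8080`, and `port : 8080` identically. It assumes a Python tool and INI-style files. `VARIANTS` and `read_port` are illustrative names only. By default the parser accepts either `=` or `:` as the delimiter and strips white space around option names and values, so the spacing a user happens to type does not change the meaning.

```python
import configparser

# Three spellings of the same setting. A parser that is fussy about the
# spacing around the delimiter would treat them differently; this one doesn't.
VARIANTS = [
    "[server]\nport=8080\n",
    "[server]\nport = 8080\n",
    "[server]\nport   :   8080   \n",
]


def read_port(text: str) -> int:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # ConfigParser strips white space around option names and values
    # and accepts either '=' or ':' as the delimiter by default.
    return parser.getint("server", "port")


if __name__ == "__main__":
    print([read_port(v) for v in VARIANTS])  # [8080, 8080, 8080]
```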

KV

Dear KV

Every time I optimize something at work I put the results of my latest benchmarks on our internal wiki. The problem is that people seem to leap onto these measurements to show whatever it is they want to prove, or claim to have proven before, usually without understanding the preliminary nature of some of these measurements. I don't want to hide my work, but it seems that no matter what kind of disclaimer I add, people just ignore it and read into the numbers things that aren't really there.

Disclaimed and Dejected

Dear Disclaimed

Announcing the results of your work, as you have now seen, is actually a lot trickier than you might have thought at first. The problem is people. I know, the problem is always people. While the scientific world has a basic way of handling this problem—which is to get colleagues to review work marked preliminary as, well, preliminary—this kind of peer-review process is not common inside most companies or work groups. People in companies want results, and if they see something that looks like a result, they are going to jump on it, like a fox on a rabbit, often without checking to see if the rabbit is, in fact, alive.


While it is a fine idea to post preliminary results in an attempt to get some feedback, it is probably best to send them to a small number of people instead of placing them in an easy-to-access place that is open to any peeping Tom, Dick, or Mary. When people write scientific papers, they send the first draft off to a small group of trusted friends for feedback. Note that I said trusted, so you might want to make sure these are people you can really trust. If someone passes your results on without asking you, you should not send that person preliminary results again. Although you may not be writing a paper on your work, the same advice still applies: email a few people you trust to check your work before announcing to your group or company as a whole what you've come up with.

KV

Related articles
on queue.acm.org

A Plea to Software Vendors from Sysadmins—10 Do's and Don'ts
Thomas A. Limoncelli
http://queue.acm.org/detail.cfm?id=1921361

A Conversation with Steve Bourne, Eric Allman, and Bryan Cantrill
http://queue.acm.org/detail.cfm?id=1454460

There's No Such Thing as a Free (Software) Lunch
Jay Michaelson
http://queue.acm.org/detail.cfm?id=1005066

Author

George V. Neville-Neil is the proprietor of Neville-Neil Consulting and a member of the ACM Queue editorial board. He works on networking and operating systems code for fun and profit, teaches courses on various programming-related subjects, and encourages your comments, quips, and code snips pertaining to his Communications column.

Footnotes

DOI: http://doi.acm.org/10.1145/1953122.1953137


Copyright held by author.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2011 ACM, Inc.


 
