One thing that always fascinated me about Star Trek: The Next Generation is that there were no software developers. There were doctors, engineers, and scientists, but no one whose job was solely about creating software. Somehow, even the Klingon warrior Lieutenant Worf, perhaps better known for his menacing scowls and bat’leth blade skills than for his intellectual prowess, could easily program holodecks and photon torpedoes to do his bidding.
With the advent of Large Language Models (LLMs) and tools like GitHub CoPilot, I think we’re a step closer towards this kind of world, where software can be created by anyone. In fact, my colleague Brad Myers, who has investigated end-user programming for decades, believes that CoPilot is the biggest change to programming since Google search. I think this may be understating things. We’re going to see a wave of new kinds of software programming tools based on LLMs, and they will be transformative in how developers build and evaluate computer systems.
CoPilot is like a smart autocomplete for developers. You start typing in some code, and based on other code nearby and other comments or code you’ve recently entered, it offers suggestions for you. A common scenario is to type in a comment of what you’re trying to do, and in a few seconds CoPilot suggests several lines of code to do that task. Parth Thakkar has a nice analysis of how CoPilot works based on reverse engineering some of the code (see https://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals.html).
A common sentiment among people I’ve talked to about CoPilot is that it feels like it’s reading your mind. One time, I was trying to use a rather confusing API. I was staring rather quizzically at the function call, and CoPilot suggested a code comment that made me laugh out loud: "// TODO what does this mean?" For me, using CoPilot feels like having a superpower. About half the time, it correctly guesses what kind of code I wanted, and sometimes even suggests issues and corner cases I hadn’t considered.
There are also a lot of fun and silly behaviors I’ve found with CoPilot. I needed a list of URLs to test, and CoPilot started recommending YouTube video links, including one to the famous Nyan cat (https://www.youtube.com/watch?v=QH2-TGUlwu4 if you want to listen to some sugary music and see rainbow colors). A colleague of mine gave it a URL to a leaked version of TSA’s no-fly list and prompted CoPilot to fill out a no-fly list. It didn’t return names of individuals, but somewhat helpfully returned "guns, knives, explosives." You can also have some fun by putting in people’s names. I got "bill gates says we should do this", "richard stallman is a good test case for JS errors", and "jeff dean says that the browser is the best way to get the DOM".
While I’m not as willing as Matt Welsh to declare that the end of programming is nigh (see the January CACM article https://cacm.acm.org/magazines/2023/1/267976-the-end-of-programming/fulltext), we are still going to see significant changes to programming in the near future. Perhaps the surest change is that programming will shift from being mostly about writing code to being more about interactively guiding, reading, and reviewing code.
Reading and reviewing code is especially important, and is one of the reasons why it’s unlikely that computer science programs or programming in general will go away for at least a decade or two. While CoPilot can be amazing at times, it can also offer strange suggestions. Some examples I’ve seen are hard-coded directory names that had other people’s user names in them, and references to APIs or constants that don’t exist. Even if CoPilot improves from being 50% right to 95% right, it’s always those corner cases that get you. There are also still issues of code maintainability, configuration, compatibility, and versioning that cause lots of grief for developers.
As such, I wouldn’t recommend CoPilot for novice programmers. Novices just don’t have enough experience with understanding code in general, let alone reading and reviewing it. Also, the current version of CoPilot only suggests adding code, rather than removing, simplifying, or refactoring code. Novice developers would likely end up with lots of vestigial code that will just make the software a mess.
If I could offer a short-term wishlist for CoPilot and its future variants, the first item would be to help developers think more about security in their code. CoPilot actually can help you with security if you prompt it to. For example, I wanted to check if a URL was valid or not, and when I started with a comment about what I was trying to achieve, CoPilot usefully suggested some cases that I hadn’t considered. However, it only offers suggestions for security if explicitly prompted.
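To make the URL example a bit more concrete, here is a minimal Python sketch of the kind of checks a validity test might include. It is a hand-written illustration, not CoPilot's actual output, and both the function name `is_valid_url` and the specific rules (only http and https, no embedded credentials) are assumptions chosen for the example rather than a complete or authoritative policy.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only web URLs are acceptable.
ALLOWED_SCHEMES = {"http", "https"}


def is_valid_url(url: str) -> bool:
    """Return True if url looks like a well-formed http(s) URL."""
    # Reject non-strings, empty input, and anything containing whitespace.
    if not isinstance(url, str) or not url:
        return False
    if any(ch.isspace() for ch in url):
        return False
    try:
        parts = urlsplit(url)
        # Accessing .port raises ValueError for malformed or out-of-range ports.
        _ = parts.port
    except ValueError:
        return False
    # Corner case: schemes such as "javascript:" or "file:" parse fine
    # but are usually not what a caller expects.
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    # Corner case: "http://" parses but has no host.
    if not parts.hostname:
        return False
    # Corner case: credentials embedded in the URL (user:password@host).
    if parts.username or parts.password:
        return False
    return True
```

Whether each of these cases should be rejected depends on the application; the point is that they are easy to overlook unless someone, or something, brings them up.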
The second item on the wishlist would be suggestions for removing, simplifying, or refactoring code, as mentioned earlier. CoPilot currently only helps with immediate and proximate tasks on the timescale of seconds or minutes, but not on longer-term issues such as robustness or maintainability, which might be on the order of hours or days.
It would also be incredibly useful if CoPilot could offer options along with a set of tradeoffs, to help developers meaningfully choose between different constraints. For example, in Android, there are three general classes of identifiers: hardware IDs (like MAC address or IMEI), advertising IDs (sort of like browser cookies for Android), and application IDs (which can be created by developers in any way). Hardware IDs are discouraged due to privacy reasons, since they cannot be changed. However, an issue my colleagues and I have seen in past research is that developers tend to do a search and then copy-paste the first solution they find on StackOverflow, which often uses hardware identifiers and is suboptimal for privacy. Helping developers understand these tradeoffs not only leads to better applications, but also better educated developers.
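As a rough illustration of the application ID option, the following sketch generates a random identifier on first use and stores it in the app's own storage. It is written in plain Python rather than Android code, and the function names, the file name, and the `storage_dir` parameter are all hypothetical. Unlike a hardware ID, this kind of identifier reveals nothing about the device and can be reset at any time.

```python
import uuid
from pathlib import Path

# Hypothetical file name used only for this sketch.
ID_FILE_NAME = "app_instance_id.txt"


def get_app_scoped_id(storage_dir: Path) -> str:
    """Return this installation's random ID, creating it on first use."""
    id_file = storage_dir / ID_FILE_NAME
    if id_file.exists():
        return id_file.read_text(encoding="utf-8").strip()
    storage_dir.mkdir(parents=True, exist_ok=True)
    new_id = str(uuid.uuid4())  # random, not derived from any hardware value
    id_file.write_text(new_id, encoding="utf-8")
    return new_id


def reset_app_scoped_id(storage_dir: Path) -> str:
    """Discard the stored ID and return a freshly generated one."""
    (storage_dir / ID_FILE_NAME).unlink(missing_ok=True)
    return get_app_scoped_id(storage_dir)
```

The tradeoff is that such an ID does not survive a reinstall or follow the user across apps, which is often exactly the privacy property you want.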
In the long term, perhaps the most important issue with tools like CoPilot is having assurances or guarantees on the code it generates. We barely understand LLMs today, and struggle deeply with issues of bias and what people call hallucinated responses. There are currently no easy ways to specify hard constraints on behaviors, which makes it hard to understand and evaluate how well generated code complies with functional requirements, let alone non-functional requirements that cut across the entirety of an app such as safety, security, privacy, and usability.
One thing, however, is certain. GitHub CoPilot is just the beginning of a new frontier for software development. Buckle up your safety belts; it’s going to be an interesting ride!
Jason I. Hong is a professor in the School of Computer Science and the Human Computer Interaction Institute of Carnegie Mellon University.