Python is one of the most popular programming languages in existence. Easy to learn and easy to use, it has been around for years, so there is a large community of Python developers to support each other, and it has built up an ecosystem of libraries that allow users to drop in the functionalities they need. It does, however, come with downsides: its programs tend to run slowly, and because it is inefficient at running processes in parallel, it is not well suited to some of the latest artificial intelligence (AI) programming.
Hoping to overcome those difficulties, computer scientist Chris Lattner set out to create a new language, Mojo, which offers the ease of use of Python, but the performance of more complex languages such as C++ or Rust. He teamed up with Tim Davis, whom he had met when they both worked for Google, to form Modular in January 2022. The company, where Lattner is chief executive officer and Davis is president, provides support for companies working on AI and is developing Mojo.
A modern AI programming stack generally has Python on top, Lattner says, but because that is an inefficient language, it has C++ underneath to handle the implementation. The C++ then must communicate with performance accelerators or graphics processing units (GPUs), so developers add a platform such as Compute Unified Device Architecture (CUDA) to make efficient use of those GPUs. "Mojo came from the need to unify these three different parts of the stack so that we could build a unified solution that can scale up and down," Lattner says.
The result is a language with the same syntax as Python, so people used to programming in Python can adopt it with little difficulty, but which, by some measures, can run up to 35,000 times faster. For AI, Mojo is especially fast at performing the matrix multiplications used in many neural networks because it compiles the multiplication code to run directly on the GPU, bypassing CUDA.
Lattner is no stranger to developing programming languages. For his master's thesis at the University of Illinois at Urbana-Champaign, he and some colleagues created LLVM, a set of compiler and programming tools to optimize other programs. He also came up with the Swift programming language for Apple, which allows developers to write their own apps for Apple's iOS operating system.
Jeremy Howard, an honorary professor of computer science at the University of Queensland, Australia, and a co-founder of fast.ai, a company that provides free coding courses and a software library for deep learning applications, says something better than Python is needed for implementing neural networks, which handle a lot of data and therefore need to run fast. Generally speaking, programmers write such programs in languages such as C, C++, or Rust, which then run 100,000 to 1 million times faster than Python, says Howard, who is also an advisor to Modular. "Trouble is that now you've got to do a whole lot of things other than just thinking about how to implement your neural network. You have to think about things like allocating memory and freeing it again and dealing with string termination," he says. "If I want to write something in C, it's going to take maybe 10 times, maybe 100 times longer than writing in Python."
Additionally, GPUs and Tensor Processing Units (TPUs) can run C-based programs much faster than a Central Processing Unit (CPU) can. However, Howard says, it is more difficult to write C for a GPU or TPU than for a CPU. "So now we're talking another couple of orders of magnitude slower development time." While libraries can provide code to speed the development along, they are limited to operations other people already have created, which can stifle innovation, Howard argues.
Those are challenges enough for computer programmers, he says, but there needs to be a language that is usable by the general public, like Python. "Increasingly, code is not being written by computer programmers. It's being written by doctors and journalists and chemists and gamers," Howard says. "All data scientists write code, but very few data scientists would consider themselves professional computer programmers."
Mojo attempts to fill that need by aiming to be a superset of Python. A program written in Python can be copied into Mojo and will immediately run faster, the company says. The speedup comes from a variety of factors. For instance, Mojo, like other modern languages, supports threads, small tasks that can run simultaneously rather than in sequence. Instead of using an interpreter to execute code as Python does, Mojo uses a compiler to turn the code into machine code ahead of time. Mojo also gives developers the option of static typing, in which the type of each data element is declared in the code and checked before the program runs, reducing the number of errors.
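To make the interpreter point concrete, the following minimal Python sketch lists the bytecode instructions the standard CPython interpreter steps through for a one-line function. The function names add_one and bytecode_ops are hypothetical, chosen only for illustration, and the exact instruction names differ between Python versions.

```python
import dis


def add_one(x):
    """A trivial function; even this becomes several bytecode instructions."""
    return x + 1


def bytecode_ops(func):
    """Return the names of the bytecode instructions CPython executes for func."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


if __name__ == "__main__":
    # The interpreter dispatches these one at a time on every call;
    # a compiler would translate the function to machine code once.
    for name in bytecode_ops(add_one):
        print(name)
```

Each instruction in the list is decoded and dispatched by the interpreter on every call, which is the layer of overhead a compiled language avoids.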
One of the factors that slows down Python is its Global Interpreter Lock, which allows only one thread to execute at a time. That made sense when Python was created in the early 1990s, Howard says, because most people had only one CPU core to work with. While it is possible to create some parallel processes in Python, doing so is cumbersome, and because Python cannot use multiple threads efficiently, it cannot take full advantage of the available hardware. "A phone will have eight CPU cores in it. A modern desktop will have maybe 16. If you can only use one of those, that means you're getting 1/16 of the compute power of the system," Lattner says.
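The effect of the lock can be observed with a short experiment. The sketch below, which assumes a standard CPython build with the Global Interpreter Lock enabled, runs the same CPU-bound task first in sequence and then on four threads. The helper names (count_primes, run_serial, run_threaded, timed) are hypothetical. On such a build the threaded run would typically be no faster than the serial one, although actual timings depend on the machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_serial(limits):
    """Run every task one after another on the calling thread."""
    return [count_primes(limit) for limit in limits]


def run_threaded(limits, workers=4):
    """Run the tasks on a thread pool.

    The threads share one interpreter, so with the lock in place only
    one of them executes Python bytecode at any given moment.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    limits = [20_000] * 4
    _, serial_seconds = timed(run_serial, limits)
    _, threaded_seconds = timed(run_threaded, limits)
    print(f"serial:   {serial_seconds:.2f}s")
    print(f"threaded: {threaded_seconds:.2f}s")
```

Both versions return identical results; only the elapsed time shows whether the extra threads were able to use additional cores.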
Additionally, he says, "Using a compiler instead of an interpreter gets a whole level of overhead out of the way." That allows a program to run 10 to 20 times faster, without changing the code. Other changes allow programs to run hundreds or thousands of times faster than they do in Python. The company used Mojo to create a Mandelbrot set, a fractal shape that has the same geometry at different scales. While not a practical application, it represents a benchmark, and Mojo was able to create the set 35,000 times as fast as Python could.
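For readers unfamiliar with the benchmark, a Mandelbrot computation can be sketched in a few lines of plain Python. This is not Modular's benchmark code, which the article does not reproduce; it is a generic escape-time version, and the function names, grid bounds, and iteration limit are assumptions made for illustration.

```python
def escape_time(c, max_iter=200):
    """Count iterations of z = z*z + c before |z| exceeds 2.

    Returns max_iter if the point never escapes within the limit,
    which is treated as belonging to the set.
    """
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def mandelbrot_grid(width, height, max_iter=200):
    """Evaluate escape_time over the rectangle [-2, 1] x [-1, 1]."""
    rows = []
    for row in range(height):
        y = -1.0 + 2.0 * row / (height - 1)
        line = []
        for col in range(width):
            x = -2.0 + 3.0 * col / (width - 1)
            line.append(escape_time(complex(x, y), max_iter))
        rows.append(line)
    return rows


if __name__ == "__main__":
    # Crude text rendering: '#' marks points that never escaped.
    for line in mandelbrot_grid(60, 24, max_iter=50):
        print("".join("#" if n == 50 else "." for n in line))
```

The work is a tight arithmetic loop repeated independently for every point, which is why it rewards compilation, static types, and parallel execution so heavily.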
Because Python is dynamically typed, types are checked at run time rather than when the code is compiled, which makes the program slower. Mojo allows developers to continue using dynamic typing if they want to, but it also provides the option of static typing. "Static behavior is good because it leads to performance. Static behavior is also good because it leads to more correctness and safety guarantees," Lattner says.
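The correctness side of that argument can be illustrated in Python itself. In the sketch below, the function area is a hypothetical example whose type hints document the intended types, but standard Python does not enforce them when the program runs. One bad call silently returns a wrong value and another fails only once the offending line executes.

```python
def area(width: float, height: float) -> float:
    """Intended for numbers; the hints are not enforced at run time."""
    return width * height


def describe(width, height):
    """Return the result of area(), or the name of the error it raised."""
    try:
        return area(width, height)
    except TypeError as exc:
        return type(exc).__name__


if __name__ == "__main__":
    print(describe(3.0, 4.0))  # the intended use: two numbers
    print(describe(3, "4"))    # no error: int * str repeats the string
    print(describe(3.0, "4"))  # fails, but only when this line runs
```

A statically typed language, or an external type checker run over the annotated Python, would be expected to reject both bad calls before the program ever starts.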
One innovation he added is auto-tuning, in which the programmer provides a range of values for various aspects of the program. They might, for example, specify that a tile could have a size of 2, 4, 8, or 16, or that a particular function could be implemented with any of a variety of methods. The compiler then implements all the different combinations of those variables and runs them to see which one is fastest. That way, the program can be optimized automatically for the particular hardware on which it is to run.
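The idea can be sketched in Python, with the caveat that Mojo performs the search in its compiler while this illustration does it by hand at run time. The names tiled_sum_of_squares and autotune, the candidate tile sizes, and the workload are all assumptions made for the example.

```python
import time


def tiled_sum_of_squares(data, tile):
    """Process data in chunks of tile elements.

    The result does not depend on tile; only the running time does.
    """
    total = 0
    for start in range(0, len(data), tile):
        total += sum(x * x for x in data[start:start + tile])
    return total


def autotune(candidates, data, repeats=3):
    """Time each candidate tile size and return (fastest_tile, timings)."""
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            tiled_sum_of_squares(data, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    data = list(range(200_000))
    best_tile, timings = autotune([2, 4, 8, 16], data)
    for tile, seconds in timings.items():
        print(f"tile={tile:2d}: {seconds:.4f}s")
    print("fastest on this machine:", best_tile)
```

Because the winner is chosen by measurement, the same source could settle on different tile sizes on different hardware, which is the point of the technique.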
Guido van Rossum, the programmer who created Python and who was known as the language's "benevolent dictator for life" until he stepped back from that role in 2018, says he is interested to watch how Mojo develops and whether it can hit the lofty goals Lattner is setting for it. "If you hear Chris talk about it, Mojo is slated to become a complete superset of Python, where whenever you write just Python code, it will execute in Mojo exactly the same way as it executes in Python, but much faster." He is not yet sure whether Mojo can achieve that, but he emphasizes that the language is in its early stages and, as of July 2023, Mojo had not yet been made available for download.
Van Rossum thinks Mojo might prove more useful for experienced developers who already know how to write efficient code in C++ or Rust. "Someone who is a beginning Python user is not suddenly going to be able to write the type of Mojo code that executes much faster than it would in Python," he says.
In May 2023, Modular made Mojo accessible to some users in a Jupyter notebook, an interactive development environment that lets people experiment with the code. The company said it expected to allow downloads in the fall of 2023 (a locally installable version was released for Linux in September and for macOS in October), with a full release perhaps in the summer of 2024.
Lattner says there may be pieces of Python that do not work in Mojo, but they will be insignificant. He says Mojo relates to Python in the same way C++ relates to C, with additions such as classes and templates that turned C into a higher-level language. "There are programs you can write in C that do not work the same way or don't even compile in C++, but they're so minuscule that it doesn't matter. The same thing is true in Mojo," he says. "Our goal is to be as compatible as possible in all the cases that matter and make sure that we work with the existing ecosystem because we don't want to break Python, we want to make Python better."
Doug Meil, a software architect who has written about new programming languages, says Mojo is essentially Python++ for AI. "He's trying very hard to support Python and meet people where they are, which is I think remarkably pragmatic," Meil says. "They're not coming up with an entirely new syntax, and it's going to be way faster in scale across multiple hardware platforms. So that's really cool."
©2023 ACM 0001-0782/23/12