Communications of the ACM

Blogroll


How fast can you validate UTF-8 strings in JavaScript?
From Daniel Lemire's Blog

When you recover textual content from the disk or from the network, you may expect it to be a Unicode string in UTF-8. It is the most common format. Unfortunately...
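
To make the task concrete, here is a minimal scalar UTF-8 validator. It is sketched in C++ rather than JavaScript and is not the method from the post. It assumes the Unicode rules for well-formed UTF-8: no overlong encodings, no surrogate code points (U+D800 to U+DFFF), and nothing above U+10FFFF. The function name `is_valid_utf8` is hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical scalar validator: checks one UTF-8 sequence at a time.
// Rejects overlong forms, surrogates and code points above U+10FFFF.
bool is_valid_utf8(std::string_view s) {
  const auto* p = reinterpret_cast<const unsigned char*>(s.data());
  std::size_t n = s.size(), i = 0;
  while (i < n) {
    unsigned char b = p[i];
    if (b < 0x80) { i++; continue; }          // ASCII byte
    std::size_t len;
    unsigned char lo = 0x80, hi = 0xBF;       // allowed range of the 2nd byte
    if (b >= 0xC2 && b <= 0xDF) {
      len = 2;
    } else if (b >= 0xE0 && b <= 0xEF) {
      len = 3;
      if (b == 0xE0) lo = 0xA0;               // no overlong 3-byte forms
      if (b == 0xED) hi = 0x9F;               // no surrogates
    } else if (b >= 0xF0 && b <= 0xF4) {
      len = 4;
      if (b == 0xF0) lo = 0x90;               // no overlong 4-byte forms
      if (b == 0xF4) hi = 0x8F;               // nothing above U+10FFFF
    } else {
      return false;                           // 0x80..0xC1 or 0xF5..0xFF
    }
    if (n - i < len) return false;            // truncated sequence
    if (p[i + 1] < lo || p[i + 1] > hi) return false;
    for (std::size_t k = 2; k < len; k++)
      if ((p[i + k] & 0xC0) != 0x80) return false;  // continuation bytes
    i += len;
  }
  return true;
}
```

A byte-at-a-time loop like this is easy to audit; faster validators process many bytes at once, but they must accept and reject exactly the same inputs.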

Parsing 8-bit integers quickly
From Daniel Lemire's Blog

Suppose that you want to quickly parse 8-bit integers (0, 1, 2, …, 254, 255) from an ASCII/UTF-8 string. The problem comes up in the simdzone project led by Jeroen...
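
A straightforward scalar baseline might look like the following sketch. The function `parse_uint8` is hypothetical and is not the simdzone code. It assumes that leading zeros such as "007" are acceptable; a stricter parser would reject them.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical baseline: parse 1 to 3 ASCII digits into a uint8_t.
// Fails on empty input, non-digits, more than 3 digits, or values above 255.
// Assumption: leading zeros (e.g. "007") are accepted.
bool parse_uint8(std::string_view s, uint8_t& out) {
  if (s.empty() || s.size() > 3) return false;
  uint32_t value = 0;
  for (char c : s) {
    uint32_t digit = uint32_t(c) - '0';  // wraps to a huge value below '0'
    if (digit > 9) return false;
    value = value * 10 + digit;
  }
  if (value > 255) return false;
  out = uint8_t(value);
  return true;
}
```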

A simple WebSocket benchmark in Python
From Daniel Lemire's Blog

Modern web applications often use the http/https protocols. However, when the server and client need to talk to each other in a symmetrical fashion, the WebSocket...

A simple WebSocket benchmark in JavaScript: Node.js versus Bun
From Daniel Lemire's Blog

Conventional web applications use the http protocol (or the https variant). The http protocol is essentially asymmetrical: a client application such as a browser...

Science and Technology links (November 12 2023)
From Daniel Lemire's Blog

Vitamin K2 supplements might reduce the risk of myocardial infarction (heart attacks) and of all-cause death (Hasific et al. 2022). You find vitamin K2 in some...

Generating arrays at compile-time in C++ with lambdas
From Daniel Lemire's Blog

Suppose that you want to check whether a character in C++ belongs to a fixed set, such as '\0', '\x09', '\x0a', '\x0d', ' ', '#', '/', ':', '<', '>', '?', '@', '...
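
One way to build such a lookup table is to fill a 256-entry array inside an immediately invoked `constexpr` lambda (C++17 or later), so the work happens at compile time. This is a sketch, not necessarily the post's code: the names `special_table` and `is_special` are hypothetical, and only the characters visible in the excerpt are included because the rest of the set is elided.

```cpp
#include <array>
#include <cstdint>
#include <initializer_list>

// Hypothetical table built at compile time by an immediately invoked
// constexpr lambda. Only the characters shown in the excerpt are listed.
constexpr std::array<uint8_t, 256> special_table = []() {
  std::array<uint8_t, 256> table{};  // value-initialized: all zeros
  for (unsigned char c : {'\0', '\x09', '\x0a', '\x0d', ' ', '#', '/', ':',
                          '<', '>', '?', '@'})
    table[c] = 1;
  return table;
}();

// Membership test: a single table load.
constexpr bool is_special(char c) {
  return special_table[static_cast<unsigned char>(c)] != 0;
}

// Proves the table is available at compile time.
static_assert(is_special('#') && !is_special('a'));
```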

Appending to an std::string character-by-character: how does the capacity grow?
From Daniel Lemire's Blog

In C++, suppose that you append to a string one character at a time: while(my_string.size() <= 10'000'000) { my_string += "a"; } In theory, it might be possible...
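
A small experiment in this spirit records every distinct capacity the string passes through. The helper `capacity_history` is hypothetical. The growth policy is implementation-defined, so libstdc++, libc++ and Microsoft's standard library may report different sequences.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: append one character at a time until the size
// exceeds target_size, recording each distinct capacity observed.
std::vector<std::size_t> capacity_history(std::size_t target_size) {
  std::string my_string;
  std::vector<std::size_t> capacities{my_string.capacity()};
  while (my_string.size() <= target_size) {
    my_string += "a";
    if (my_string.capacity() != capacities.back())
      capacities.push_back(my_string.capacity());  // a reallocation happened
  }
  return capacities;
}
```

Printing the result of `capacity_history(10'000'000)` shows how many reallocations your standard library performs; the geometric growth used by mainstream implementations keeps that count in the tens rather than the millions.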

For processing strings, streams in C++ can be slow
From Daniel Lemire's Blog

The C++ library has long been organized around stream classes, at least when it comes to reading and parsing strings. But streams can be surprisingly slow. For...
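
For comparison, here is a sketch of two ways to parse whitespace-separated integers: one with `std::istringstream` and one with C++17 `std::from_chars`, which bypasses streams and locales entirely. The function names are hypothetical, no timing results are implied, and the `from_chars` version only treats spaces, tabs and newlines as separators and stops at the first malformed token.

```cpp
#include <charconv>
#include <cstdint>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

// Stream-based parsing: convenient, but goes through stream and locale machinery.
std::vector<int64_t> parse_with_stream(const std::string& input) {
  std::istringstream in(input);
  std::vector<int64_t> out;
  int64_t value;
  while (in >> value) out.push_back(value);
  return out;
}

// Stream-free alternative using std::from_chars (no locale involved).
std::vector<int64_t> parse_with_from_chars(const std::string& input) {
  std::vector<int64_t> out;
  const char* p = input.data();
  const char* end = p + input.size();
  while (p < end) {
    while (p < end && (*p == ' ' || *p == '\t' || *p == '\n')) p++;  // skip separators
    if (p == end) break;
    int64_t value;
    auto [next, ec] = std::from_chars(p, end, value);
    if (ec != std::errc()) break;  // stop at the first malformed token
    out.push_back(value);
    p = next;
  }
  return out;
}
```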

How many billions of transistors in your iPhone processor?
From Daniel Lemire's Blog

In about 10 years, Apple has multiplied the number of transistors in its mobile processors by 19. This corresponds roughly to a steady rate of improvement of 34%...
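
The 34% figure follows from taking the tenth root of 19, assuming exactly 10 years of steady compound growth:

$$
r = 19^{1/10} - 1 = e^{(\ln 19)/10} - 1 \approx e^{0.2944} - 1 \approx 0.342,
\qquad 1.34^{10} \approx 18.7 \approx 19 .
$$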

Randomness in programming (with Go code)
From Daniel Lemire's Blog

Computer software is typically deterministic on paper: if you run the same program twice with the same inputs, you should get the same outputs. In practice, the...
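
The same point can be made in C++ (the post itself uses Go): a pseudo-random generator seeded with a fixed value produces a reproducible sequence, whereas seeding from `std::random_device` generally does not. The helper `draw` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: draw `count` values from a 64-bit Mersenne Twister
// seeded with `seed`. The same seed always yields the same sequence.
std::vector<uint64_t> draw(uint64_t seed, std::size_t count) {
  std::mt19937_64 gen(seed);
  std::vector<uint64_t> values(count);
  for (auto& v : values) v = gen();
  return values;
}
```

The standard even pins down the output: the 10000th value produced by a `std::mt19937_64` constructed with its default seed (5489) is 9981545732273789042 on every conforming implementation.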

Parsing integers quickly with AVX-512
From Daniel Lemire's Blog

If I give a programmer a string such as "9223372036854775808" and I ask them to convert it to an integer, they might do the following in C++: std::string s = .....
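
Note that 9223372036854775808 is 2^63, one more than the largest `int64_t` (9223372036854775807), so a careful parser must report overflow. Here is a minimal sketch using C++17 `std::from_chars`; it is not the AVX-512 technique from the post, and `parse_integer` is a hypothetical helper.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse the whole string as a decimal integer of type T.
// Returns std::nullopt on malformed input, trailing characters, or overflow.
template <typename T>
std::optional<T> parse_integer(std::string_view s) {
  T value{};
  auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), value);
  if (ec != std::errc() || ptr != s.data() + s.size()) return std::nullopt;
  return value;
}
```

With this helper, `parse_integer<int64_t>("9223372036854775808")` returns `std::nullopt` (out of range), while `parse_integer<uint64_t>` accepts the same string.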

Transcoding Unicode strings at crazy speeds with AVX-512
From Daniel Lemire's Blog

In software, we store strings of text as arrays of bytes in memory using one of the Unicode Transformation Formats (UTF), the most popular being UTF-8 and UTF-16...
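
As a point of reference for what vectorized code has to compute, here is a scalar UTF-8 to UTF-16 transcoder sketch. It assumes the input has already been validated as UTF-8 and performs no error checking; code points above U+FFFF become surrogate pairs. The function name `utf8_to_utf16` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical scalar transcoder. Assumption: `in` is valid UTF-8.
std::u16string utf8_to_utf16(std::string_view in) {
  std::u16string out;
  out.reserve(in.size());
  const auto* p = reinterpret_cast<const unsigned char*>(in.data());
  std::size_t i = 0, n = in.size();
  while (i < n) {
    char32_t cp;
    unsigned char b = p[i];
    if (b < 0x80) {                      // 1 byte: 0xxxxxxx
      cp = b; i += 1;
    } else if (b < 0xE0) {               // 2 bytes: 110xxxxx 10xxxxxx
      cp = ((b & 0x1F) << 6) | (p[i + 1] & 0x3F); i += 2;
    } else if (b < 0xF0) {               // 3 bytes
      cp = ((b & 0x0F) << 12) | ((p[i + 1] & 0x3F) << 6) | (p[i + 2] & 0x3F);
      i += 3;
    } else {                             // 4 bytes
      cp = ((b & 0x07) << 18) | ((p[i + 1] & 0x3F) << 12) |
           ((p[i + 2] & 0x3F) << 6) | (p[i + 3] & 0x3F);
      i += 4;
    }
    if (cp < 0x10000) {
      out.push_back(char16_t(cp));
    } else {                             // encode as a surrogate pair
      cp -= 0x10000;
      out.push_back(char16_t(0xD800 + (cp >> 10)));
      out.push_back(char16_t(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}
```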

Science and Technology links (September 2 2023)
From Daniel Lemire's Blog

Physicists have published a paper with 5154 authors. The list of authors takes 24 of the paper's 33 pages. The lesson is that if someone tells you that they...

Transcoding Latin 1 strings to UTF-8 strings at 12 GB/s using AVX-512
From Daniel Lemire's Blog

Though most strings online today follow the Unicode standard (e.g., using UTF-8), the Latin 1 standard is still in widespread use inside some systems (such as browsers)...
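
The conversion itself is simple: each Latin 1 byte is the code point of the same value, so bytes below 0x80 are copied and the others become two-byte UTF-8 sequences. A scalar sketch follows; `latin1_to_utf8` is a hypothetical name and this is not the AVX-512 routine from the post.

```cpp
#include <string>
#include <string_view>

// Hypothetical scalar Latin 1 -> UTF-8 conversion.
// Bytes 0x00..0x7F are copied; 0x80..0xFF become 110000xx 10xxxxxx.
std::string latin1_to_utf8(std::string_view in) {
  std::string out;
  out.reserve(in.size() * 2);  // worst case: every byte doubles
  for (char ch : in) {
    unsigned char c = static_cast<unsigned char>(ch);
    if (c < 0x80) {
      out.push_back(char(c));
    } else {
      out.push_back(char(0xC0 | (c >> 6)));
      out.push_back(char(0x80 | (c & 0x3F)));
    }
  }
  return out;
}
```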

Transcoding UTF-8 strings to Latin 1 strings at 12 GB/s using AVX-512
From Daniel Lemire's Blog

Most strings online are Unicode strings in the UTF-8 format. Other systems (e.g., Java, Microsoft) might prefer UTF-16. However, Latin 1 is still a common encoding...
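
Going the other way, only code points U+0000 to U+00FF fit in Latin 1, so the only multi-byte UTF-8 sequences that can be converted begin with 0xC2 or 0xC3. A scalar sketch with a hypothetical `utf8_to_latin1` that returns `std::nullopt` for anything else:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical scalar UTF-8 -> Latin 1 conversion. Fails (std::nullopt) on
// code points above U+00FF, stray continuation bytes, or truncated input.
std::optional<std::string> utf8_to_latin1(std::string_view in) {
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); i++) {
    unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80) { out.push_back(char(c)); continue; }  // ASCII
    if ((c != 0xC2 && c != 0xC3) || i + 1 == in.size()) return std::nullopt;
    unsigned char next = static_cast<unsigned char>(in[++i]);
    if ((next & 0xC0) != 0x80) return std::nullopt;     // must be 10xxxxxx
    out.push_back(char(((c & 0x03) << 6) | (next & 0x3F)));
  }
  return out;
}
```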

Coding of domain names to wire format at gigabytes per second
From Daniel Lemire's Blog

When you enter the domain name lemire.me in your browser, it eventually gets encoded into a so-called wire format. The name lemire.me contains two labels, one of...
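
In the DNS wire format defined by RFC 1035, each label is preceded by a length byte and the name ends with a zero byte (the empty root label), so lemire.me becomes the 11 bytes `\x06lemire\x02me\x00`. A scalar sketch follows; `to_wire_format` is a hypothetical name, and it ignores the escape sequences (such as `\.`) that zone files allow.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical encoder: "lemire.me" -> "\x06lemire\x02me\x00".
// Enforces RFC 1035 limits: labels of 1..63 bytes, names of at most 255 bytes.
// A single trailing dot is accepted; escape sequences are not handled.
std::optional<std::string> to_wire_format(std::string_view name) {
  if (!name.empty() && name.back() == '.') name.remove_suffix(1);
  std::string out;
  while (!name.empty()) {
    std::size_t dot = name.find('.');
    std::string_view label = name.substr(0, dot);
    if (label.empty() || label.size() > 63) return std::nullopt;
    out.push_back(char(label.size()));  // length prefix
    out.append(label);
    if (dot == std::string_view::npos) break;
    name.remove_prefix(dot + 1);
    if (name.empty()) return std::nullopt;  // e.g. "a.." has an empty label
  }
  out.push_back('\0');  // root label terminates the name
  if (out.size() > 255) return std::nullopt;
  return out;
}
```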

Science and Technology links (August 6 2023)
From Daniel Lemire's Blog

In an extensive study, You et al. (2022) found that meat consumption was correlated with higher life expectancies: Meat intake is positively correlated with life...

Decoding base16 sequences quickly
From Daniel Lemire's Blog

We sometimes represent binary data using the hexadecimal notation. We use a base-16 representation where the first 10 digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and where...
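
A scalar base16 decoder needs only a digit-to-value mapping and one multiply-add per pair of digits. A sketch, assuming both uppercase and lowercase letters are accepted; `hex_value` and `decode_base16` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Map one hexadecimal digit (either case) to its value, or -1 if invalid.
int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical scalar decoder: two hex digits per output byte.
std::optional<std::vector<uint8_t>> decode_base16(std::string_view in) {
  if (in.size() % 2 != 0) return std::nullopt;  // digits must come in pairs
  std::vector<uint8_t> out(in.size() / 2);
  for (std::size_t i = 0; i < out.size(); i++) {
    int hi = hex_value(in[2 * i]), lo = hex_value(in[2 * i + 1]);
    if (hi < 0 || lo < 0) return std::nullopt;
    out[i] = uint8_t(hi * 16 + lo);
  }
  return out;
}
```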

Science and Technology links (July 23 2023)
From Daniel Lemire's Blog

People increasingly consume ultra processed foods. They include energy drinks, mass-produced packaged breads, margarines, cereal, energy bars, fruit yogurts, fruit...

Fast decoding of base32 strings
From Daniel Lemire's Blog

We often need to encode binary data into ASCII strings. The standards (e.g., email) to do so include base16, base32 and base64. There are some research papers on...