Communications of the ACM

Blogroll


Parsing integers quickly with AVX-512
From Daniel Lemire's Blog

If I give a programmer a string such as "9223372036854775808" and I ask them to convert it to an integer, they might do the following in C++: std::string s = .....

Transcoding Unicode strings at crazy speeds with AVX-512
From Daniel Lemire's Blog

In software, we store strings of text as arrays of bytes in memory using one of the Unicode Transformation Formats (UTF), the most popular being UTF-8 and UTF-16...

Science and Technology links (September 2 2023)
From Daniel Lemire's Blog

Physicists have published a paper with 5154 authors. The list of authors takes 24 pages out of the 33 pages. The lesson is that if someone tells you that they...

Transcoding Latin 1 strings to UTF-8 strings at 12 GB/s using AVX-512
From Daniel Lemire's Blog

Though most strings online today follow the Unicode standard (e.g., using UTF-8), the Latin 1 standard is still in widespread use inside some systems (such as browsers)...

Transcoding UTF-8 strings to Latin 1 strings at 12 GB/s using AVX-512
From Daniel Lemire's Blog

Most strings online are Unicode strings in the UTF-8 format. Other systems (e.g., Java, Microsoft) might prefer UTF-16. However, Latin 1 is still a common encoding...

Coding of domain names to wire format at gigabytes per second
From Daniel Lemire's Blog

When you enter in your browser the domain name lemire.me, it eventually gets encoded into a so-called wire format. The name lemire.me contains two labels, one of...
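
As a rough illustration of the wire format mentioned in this teaser, here is a minimal scalar sketch in C++. It assumes the usual DNS convention in which each label is preceded by a one-byte length and the name ends with a zero byte. The function name `name_to_wire` is hypothetical, and this is a plain baseline, not the fast method the post describes.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: encode a dotted name such as "lemire.me" into
// length-prefixed labels followed by a terminating zero byte.
// Returns std::nullopt for an empty label or a label longer than 63 bytes.
// The overall 255-byte limit on names is not checked in this sketch.
std::optional<std::vector<uint8_t>> name_to_wire(std::string_view name) {
  std::vector<uint8_t> out;
  std::size_t start = 0;
  while (start < name.size()) {
    std::size_t dot = name.find('.', start);
    std::size_t end = (dot == std::string_view::npos) ? name.size() : dot;
    std::size_t len = end - start;
    if (len == 0 || len > 63) return std::nullopt;
    out.push_back(static_cast<uint8_t>(len));          // length prefix
    out.insert(out.end(), name.begin() + start, name.begin() + end);
    start = end + 1;                                   // skip the dot
  }
  out.push_back(0);                                    // root label
  return out;
}
```

For "lemire.me", this sketch produces 11 bytes: the value 6, the six bytes of "lemire", the value 2, the two bytes of "me", and a final zero.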

Science and Technology links (August 6 2023)
From Daniel Lemire's Blog

In an extensive study, You et al. (2022) found that meat consumption was correlated with higher life expectancies: Meat intake is positively correlated with life...

Decoding base16 sequences quickly
From Daniel Lemire's Blog

We sometimes represent binary data using the hexadecimal notation. We use a base-16 representation where the first 10 digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and where...
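
For reference, a conventional scalar base16 decoder can be sketched in C++ as follows. It assumes the remaining six digits are the letters A to F in either case, and it rejects any other character. The names `hex_value` and `decode_base16` are hypothetical. This is a simple baseline for comparison, not the faster approach the post is about.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Value of one hexadecimal digit, or -1 if c is not a hex digit.
int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Decode pairs of hex digits into bytes.
// Returns std::nullopt on odd length or on any non-hex character.
std::optional<std::vector<uint8_t>> decode_base16(std::string_view s) {
  if (s.size() % 2 != 0) return std::nullopt;
  std::vector<uint8_t> out;
  out.reserve(s.size() / 2);
  for (std::size_t i = 0; i < s.size(); i += 2) {
    int hi = hex_value(s[i]);
    int lo = hex_value(s[i + 1]);
    if (hi < 0 || lo < 0) return std::nullopt;
    out.push_back(static_cast<uint8_t>(hi * 16 + lo));
  }
  return out;
}
```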

Science and Technology links (July 23 2023)
From Daniel Lemire's Blog

People increasingly consume ultra processed foods. They include energy drinks, mass-produced packaged breads, margarines, cereal, energy bars, fruit yogurts, fruit...

Fast decoding of base32 strings
From Daniel Lemire's Blog

We often need to encode binary data into ASCII strings. The standards (e.g., email) to do so include base16, base32 and base64. There are some research papers on...

Science and Technology links (July 16 2023)
From Daniel Lemire's Blog

Most people think that they are more intelligent than average. Lack of vitamin C may damage the arteries. Make sure you have enough! A difficult problem in software...

Recognizing string prefixes with SIMD instructions
From Daniel Lemire's Blog

Suppose that I give you a long list of string tokens (e.g., “A”, “A6”, “AAAA”, “AFSDB”, “APL”, “CAA”, “CDS”, “CDNSKEY”, “CERT”, “CH”, “CNAME”, “CS”, “CSYNC”, “DHC...

Stealth, not secrecy
From Daniel Lemire's Blog

The strategy for winning is simple: do good work and tell the world about it. In that order! This implies some level of stealth as you are doing the good work...

Packing a string of digits into an integer quickly
From Daniel Lemire's Blog

Suppose that I give you a short string of digits, containing possibly spaces or other characters (e.g., "20141103 012910"). We would like to pack the digits into...

Having fun with string literal suffixes in C++
From Daniel Lemire's Blog

The C++11 standard introduced user-defined string suffixes. It also added regular expressions to the C++ language as a standard feature. I wanted to have fun and...

Parsing time stamps faster with SIMD instructions
From Daniel Lemire's Blog

In software, it is common to represent time as a time-stamp string. It is usually specified by a time format string. Some standards use the format %Y%m%d%H%M%S...

Dynamic bit shuffle using AVX-512
From Daniel Lemire's Blog

Suppose that you want to reorder, arbitrarily, the bits in a 64-bit word. This question was raised on Twitter by @experquisite. Formally, you might want to provide...

Science and Technology links (June 25 2023)
From Daniel Lemire's Blog

Women in highly religious relationships report the highest levels of relationship quality. US politics is largely divided into two parties (Republicans and Democrats)...

Science and Technology links (June 11 2023)
From Daniel Lemire's Blog

Similar species can have vastly different lifespans. Researchers have been looking for the limiting factors that explain these differences. As we age, our genes...

Parsing IP addresses crazily fast
From Daniel Lemire's Blog

Most of us are familiar with IP addresses: they are strings of the form “ddd.ddd.ddd.ddd” where ddd is a decimal number of up to three digits in the range 0 to...
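
To make the format concrete, here is a minimal scalar parser sketched in C++. It assumes the usual IPv4 rule that each of the four fields is at most 255, and it packs the fields into a 32-bit integer with the first field in the most significant byte. The function name `parse_ipv4` is hypothetical, leading zeros are accepted for simplicity, and this is a straightforward baseline, not the fast technique the post presents.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: parse "ddd.ddd.ddd.ddd" into a 32-bit value.
// Returns std::nullopt if a field is missing, has more than three digits,
// exceeds 255, or if extra characters follow the fourth field.
std::optional<uint32_t> parse_ipv4(std::string_view s) {
  uint32_t result = 0;
  std::size_t pos = 0;
  for (int field = 0; field < 4; ++field) {
    if (field > 0) {
      // Fields after the first must be preceded by a dot.
      if (pos >= s.size() || s[pos] != '.') return std::nullopt;
      ++pos;
    }
    uint32_t value = 0;
    std::size_t digits = 0;
    while (pos < s.size() && s[pos] >= '0' && s[pos] <= '9' && digits < 3) {
      value = value * 10 + static_cast<uint32_t>(s[pos] - '0');
      ++pos;
      ++digits;
    }
    if (digits == 0 || value > 255) return std::nullopt;
    result = (result << 8) | value;
  }
  if (pos != s.size()) return std::nullopt;
  return result;
}
```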