Introduction

Compiled executables

This blog is about comparing the performance of popular compiled languages using two simple micro benchmarks.

I am writing a series of blogs on using different languages to access Oracle databases [eg Python, Node.js, Rust and Julia]. Eventully, I want to compare the performance of various languages accessing Oracle. I have already compared some popular language runtimes and Making Java faster for the same micro benchmarks.

Updated Results

Updated results

I updated the source code and benchmark results:

C# is now significantly faster when using Spans instead of arrays of char
- Substring processing decreased from 307 seconds to 1.4 seconds
- Span uses the stack rather than the managed heap for a speedup of about 200x for this workload
- Many thanks to Marina Sundström for suggesting using Span<T>
Go is now significantly faster when using ReadAt instread of arrays of char
- Substring processing decreased from 1.7 seconds to 30 milliseconds [56x speedup]
- Using Span in C# inspired me to try ReadAt in Go

The compiled languages covered in this blog are:

gcc 11.3 and clang 14.0
g++ 11.3 and clang++ 14.0
C# .NET SDK 6.0 and SDK 7.0.100
Go 1.19.3
Rust 1.64

This blog covers the following topics:

An overview of the compiled languages
Memory Management
The two micro benchmarks that I created
The results
My source code for all of those languages
How I did the builds and tests
How I calculated the results
Summary

This blog is not a tutorial on these computer languages. This is also not a blog on how to download and configure the language tool-chains.

Overview of popular compiled languages

Compiler overview

The following is my opinion, so act accordingly 😉

C [1972] is the oldest language in the group and was created to implement and port operating systems, compilers and utilites
- C is a simple language which is very fast and efficient
- C has seen widespread use in all aspects of computing
- C does not use garbage collection and without great care, developers can create memory leaks and corruptions
C++ [1985] was created to add abstractions to C via objects and multiple inherentence
- C++ is a complicated language that can be fast and efficient
- C++ has seen widespread use in all aspects of computing
- C++ does not use garbage collection and without great care, developers can create memory leaks and corruptions
C# [2000] was created by Microsoft, originally to compete with Java on Windows
- C# is a general purpose object oriented language that now runs on Linux, MacOS and Windows
- C# has seen widespread use in many aspects of computing
- C# uses garbage collection, but can still have issues with unsafe code and thread race conditions
Go [2009] was created by Google as a reaction to the complexity of C++ for cloud development and deployment
- Go is a general purpose language that creates static executables for simple provisioning in the cloud
- Go has been popular in cloud native computing via projects such as Kubernetes, Docker and Terraform
- Go uses garbage collection, but without great care from developers, can still result in data and/or thread race conditions
Rust [2010] was desigend to be fast and safe language for creating Mozilla plugins
- Rust is a fast and efficient, general purpose languge with a fussy/slow compiler which analyzes source code for most issues related to data and thread safety
- Rust is finding popularity in servers, utilities and Web Assembly and is being used by companies like Google, Microsoft, Amazon, Facebook and Discord
- Rust does not use garbage collection, but instead uses a compile time borrow checker

These languages were all designed with different goals and hence they all have different strength and weaknesses. I have a love/hate relationship with all of these languages.

Pick your poison

Trying to determine which is the ‘best language’ is pointless. Your projects, existing source code, experience, tool-chains and biases will determine which lanugage you use.

Memory management

Memory management used to be a hard choice:

Languages like C and C++ were fast, but memory management was manual and error prone
Lanugages like C# and Go were not as fast, but automatic garbage collection tended to make memory management safer

Rust is an example of a language which offers a third choice for memory management:

Fast and safe executable code without garbage collection
The cost is at compile time
- The compiler will spend time analyzing your code to check for memory and thread safety issues
- The Rust compiler will not allow unsafe code
- The Rust compiler increases the compile time for your code
- The Rust compiler increases your learing curve for the Rust language
This does not mean that Rust is the best choice for a compiled language, but it is a new option which changes the memory management calculus

My micro benchmarks

I am not trying to state that one language compiler is better than another. There are many factors that influence which compiler that you choose to use and performance is only one of them.

choosing the best compiler

I needed some trivial workloads, so I chose to use the same micro benchmarks that I used for my blog on Runtime Performance:

Calculate the Fibonacci sequence with an input of 1475, repeated one million times
Some trivial string processing with strings. ie creating, concatenating and using substrings for strings under 2000 characters with a huge number of iterations

How valid are these results

Micro benchmarks are, by definition, only relevant to the specific workload that they cover. These workloads do not try to cover everything, they only cover what I care about. The only workload that matters to you is your workload. So compare your own workloads with your favourite languages. I have found that string processing and simple maths are important to enable fast SQL database drivers, so that is what I tested.

Your milage will vary

Results

Micro benchmarks with compiler executables [smaller is better]

Results

This chart shows the total execution time of my micro benchmarks for simple math and string processing:

Clang 14.0 was faster than gcc 11.3 for this workload
- The optimization flags made a huge difference to the total execution time
Clang++ 14.0 [C++] was faster than g++ 11.3 [C++] for this workload
Clang++ 14.0 [C++] gave the same performance as Clang 14.0 [C] for this workloads when both used the same source code, ie both used arrays of char for the strings
Clang++ 14.0 was significantly slower when C++ strings were used instead of arrays of char
- The substring method was the bottleneck as a new string needed to be allocated in the loop
- When strncpy() was used for the substring operations in C++ it went significantly faster [Mixed]
Rust used strings allocated within loops and yet was just as fast as C using pre-allocated arrays of char for string processing
C# 6.0.403 was significantly faster than C# 7.0.100 for this workload
- 2.8 vs 3.7 seconds

Top 7 results

Interpreted languages like Java can be faster than compiled languages like C# for these micro benchmarks
Dynamic languages like JavaScript when run on the latest version of V8 continue to get more optimizations and go faster
C, Rust and C++ with optimal code and serious LLVM optimizations will tend to be orders of magnitude faster than the other languages

Making substrings faster

Substring optimization

My micro benchmarks stress simple math and string handling
The bottleneck in all of the languages is getting a substring in a loop
- This occurs in the j and k FOR loops in function long_strings
- I want to do a logical substring in these loops
- Is there a faster way of doing this in your favorite language?
  - Please add you code solution to the comments section
You want to avoid allocating objects in a loop as it is logically a slow operation
- I started with substring methods in each language and then optimized my code where possible
- This pathelogical example is hard for any language
  - Rust used substrings and did the benchmark in 3 milliseconds
  - Java used substrings and did the benchmark in 1.4 seconds
  - C# used Span<T> and did the benchmark in 1.4 seconds
  - JavaScript used substrings and did the benchmark in 2.1 seconds

My trivial source code

The Rust Main Function

Rust main

The fibonacci function has an input of 1475 and was called one million times
- Why 1475, to avoid numeric overflow in some of the other languages that I tested this workload against
- I am using the type double [or equivalent] for all languages to avoid numeric overflow for the large numbers from the Fibonacci sequence
Both the strings and long_strings methods are called with an input of 1475
The ‘slow’ Rust compiler took 181 milliseconds to compile this trivial program [ie 140 lines of code for main, fibonacci, strings and long_strings functions]

The Rust Fibonacci Function

Rust Fibonacci function

Why am I using an f64 [double] for the variables?

The values of the Fibonacci sequence rapidly get larger
I also implemented these micro benchmarks in many other languages
Some of these languages had issues with integer overflow for large values in the Fibonacci sequence
So I used the type double to be fair and consistent across all of the languages

I am not using recursion as it is against my religion.

The Rust Strings Function

Rust strings

This function does some trivial operations on strings
- The operations include constructors, append, length, substring and copy
There are three nested loops, so the operations in the inner-most loop are executed about 26 million times
- n = 1475
- The string length is 12 characters
- 1475 * 12 * 1475 = 26,107,500
In Rust, substrings uses two indexes [first .. last]
- Other languages tend to use [first .. length] for substring indexes

The Rust long_strings Function – Part 1

Rust long_strings - part 2

The logic for function long_strings was the same as for function strings, but there were significantly more string concatenation operations.

The fully appended string is 1965 bytes long
Why did I not create the strings outside the loops?
- I wanted to make the comparisons with other languages fair and consistent
- I am not trying to optimize the code for this workload, I am trying to see how common / ‘bad’ code performs
The number of iterations of the string and operations is significantly larger
This also creates the opportunity of garbage collection in other languages

The Rust long_strings Function – Part 2

Rust long_strings - part 2

The ‘j’ for loop iterates based on the length of the string, ie 1965 times
The ‘k’ for loop iterates n times, ie 1475
The outer ‘i’ for loop also iterates n times, ie 1475
1475 * 1965 * 1475 = 4,275,103,125 iterations
So there are 4.2 billion iterations of the ‘k’ loop which creates strings from substrings
In Rust, substrings uses two indexes [first .. last]
- Other languages tend to use [first .. length] for substring indexes

The C Main Function

C main

The fibonacci function has an input of 1475 and was called one million times
- Why 1475, to avoid numeric overflow in some of the other languages which I tested this workload against
- I am using the type double [or equivalent] for all languages to avoid numeric overflow for the large numbers from the Fibonacci sequence
Both the strings and long_strings methods are called with an input of 1475

The C Fibonacci Function

C Fibonacci

Why am I using a double for the variables?

The values of the Fibonacci sequence rapidly get larger
I also implemented these micro benchmarks in many other languages
Some of these languages had issues with integer overflow for large values in the Fibonacci sequence
So I used the type double to be fair and consistent across all of the languages

I am not using recursion as it is against my religion.

The C Strings Function

C strings

This function does some trivial operations on strings
- The operations include creation, append, length, substring and copy
There are three nested loops, so the operations in the inner-most loop are executed about 26 million times
- n = 1475
- The string length is 12 characters
- 1475 * 12 * 1475 = 26,107,500

The C long_strings Function – Part 1

C long_strings - part 1

The logic for function long_strings was the same as for function strings, but there were significantly more string concatenation operations.

The fully appended string is 1965 bytes long
The number of iterations of the string and operations is significantly larger

The C long_strings Function – Part 2

C long_strings - part 2

The ‘j’ for loop iterates based on the length of the string, ie 1965 times
The ‘k’ for loop iterates n times, ie 1475
The outer ‘i’ for loop also iterates n times, ie 1475
1475 * 1965 * 1475 = 4,275,103,125 iterations
So there are 4.2 billion iterations of the ‘k’ loop which creates strings from substrings

C++

The C++ Main Function

C++ main

The fibonacci function has an input of 1475 and was called one million times
- Why 1475, to avoid numeric overflow in some of the other languages that I tested this workload against
- I am using the type double [or equivalent] for all languages to avoid numeric overflow for the large numbers from the Fibonacci sequence
Both the strings and long_strings methods are called with an input of 1475

The C++ Fibonacci Function

C++ Fibonacci function

Why am I using a double for the variables?

The values of the Fibonacci sequence rapidly get larger
I also implemented these micro benchmarks in many other languages
Some of these languages had issues with integer overflow for large values in the Fibonacci sequence
So I used the type double to be fair and consistent across all of the languages

I am not using recursion as it is against my religion.

The C++ Strings Function

C++ strings

This function does some trivial operations on strings
- The operations include constructors, append, length, substring and copy
There are three nested loops, so the operations in the inner-most loop are executed about 26 million times
- n = 1475
- The string length is 12 characters
- 1475 * 12 * 1475 = 26,107,500
The C++ std:Strings substring method is really slow compared to C

The C++ long_strings Function – Part 1

C++ long_strings - part 1

The logic for function long_strings was the same as for function strings, but there were significantly more string concatenation operations.

The fully appended string is 1965 bytes long
The number of iterations of the string and operations is significantly larger

The C++ long_strings Function – Part 2

C++ long_strings - part 2

The ‘j’ for loop iterates based on the length of the string, ie 1965 times
The ‘k’ for loop iterates n times, ie 1475
The outer ‘i’ for loop also iterates n times, ie 1475
1475 * 1965 * 1475 = 4,275,103,125 iterations
So there are 4.2 billion iterations of the ‘k’ loop which creates strings from substrings
The C++ std:Strings substring method was about 200x slower than using strncpy in C with a pre-allocated array of char

The Go Main Function

Go Main

The fibonacci function has an input of 1475 and was called one million times
- Why 1475, to avoid numeric overflow in some of the other languages that I tested this workload against
- I am using the type double [or equivalent] for all languages to avoid numeric overflow for the large numbers from the Fibonacci sequence
Both the strings and long_strings methods are called with an input of 1475

The Go Fibonacci Function

Go Fibioacci

Why am I using a double [float64] for the variables?

The values of the Fibonacci sequence rapidly get larger
I also implemented these micro benchmarks in many other languages
Some of these languages had issues with integer overflow for large values in the Fibonacci sequence
So I used the type double to be fair and consistent across all of the languages

I am not using recursion as it is against my religion.

My Go Strings Function

Go strings

This function does some trivial operations on strings
- The operations include constructors, append, length, substring and copy
There are three nested loops, so the operations in the inner-most loop are executed about 26 million times
- n = 1475
- The string length is 12 characters
- 1475 * 12 * 1475 = 26,107,500
Using ReadAt was significantly faster than using substrings

The Go long_strings Function – Part 1

Go long_strings - part 1

The logic for function long_strings was the same as for function strings, but there were significantly more string concatenation operations.

The fully appended string is 1965 bytes long
The number of iterations of the string and operations is significantly larger
Using StringBuilder was faster than using string for the concatenation operations
The byte array buffer is needed for the ReadAt Method

The Go long_strings Function – Part 2

Go long_strings - part 2

The ‘j’ for loop iterates based on the length of the string, ie 1965 times
The ‘k’ for loop iterates n times, ie 1475
The outer ‘i’ for loop also iterates n times, ie 1475
1475 * 1965 * 1475 = 4,275,103,125 iterations
So there are 4.2 billion iterations of the ‘k’ loop which creates strings from substrings
Using ReadAt was faster than using a substring

The C# Main Function

C# Main

The fibonacci function has an input of 1475 and was called one million times
- Why 1475, to avoid numeric overflow in some of the other languages that I tested this workload against
- I am using the type double [or equivalent] for all languages to avoid numeric overflow for the large numbers from the Fibonacci sequence
Both the strings and long_strings methods are called with an input of 1475

The C# Fibonacci Function

C# Fibonacci function

Why am I using a double for the variables?

The values of the Fibonacci sequence rapidly get larger
I also implemented these micro benchmarks in many other languages
Some of these languages had issues with integer overflow for large values in the Fibonacci sequence
So I used the type double to be fair and consistent across all of the languages

I am not using recursion as it is against my religion.

The C# Strings Function

C# strings

This function does some trivial operations on strings
- The operations include constructors, append, length, substring and copy
There are three nested loops, so the operations in the inner-most loop are executed about 26 million times
- n = 1475
- The string length is 12 characters
- 1475 * 12 * 1475 = 26,107,500
The C# strings substring method is really slow compared to C
The C# Span technique method was significantly faster than using C# substrings

The C# long_strings Function – Part 1

C# long_strings - part 1

The logic for function long_strings was the same as for function strings, but there were significantly more string concatenation operations.

The fully appended string is 1965 bytes long
The number of iterations of the string and operations is significantly larger

The C# long_strings Function – Part 2

C# long_string - part 2

The ‘j’ for loop iterates based on the length of the string, ie 1965 times
The ‘k’ for loop iterates n times, ie 1475
The outer ‘i’ for loop also iterates n times, ie 1475
1475 * 1965 * 1475 = 4,275,103,125 iterations
So there are 4.2 billion iterations of the ‘k’ loop which creates strings from substrings
Using C# Span was significantly faster than using Substring or CopyTo for the concatenation operations

My environment

I repeated these tested on two different machines:

Oracle Linux 8.6 on Oracle Cloud. 4 OCPU with 128 GB RAM
Ubuntu 22.04 on Oracle Cloud. 4 OCPU with 128 GB RAM
As these were VMs, to avoid the risk of a noisy neighbor, I repeated the tests many times over three days
My micro benchmarks were not doing any disk nor network IO. Instead they were CPU bound for a single threaded workload.
As measured by ‘top‘, the VIRT and RSS memory was stable for the duration of the tests and there was 128 GB of RAM

How I built and ran each test

For C

clang fib.c -Oz -o fib
time ./fib

For C++

clang++ fib.cpp -O -o fib
time ./fib

For C#

dotnet publish –configuration Release –runtime linux-x64 –self-contained true
time bin/Release/net6.0/linux-x64/fibStr6

For Go

go build
time ./fibStr

For Rust

cargo build –release
time ./target/release/fib_str

How I calculated the results

On three different days, I did the following:

Run the tests for each runtime 10 times using the Linux time command until I got stable results
I eliminated the highest and lowest results
I took the average of the remaining eight results
- The Linux time command gives a resolution of 1 millisecond
- The fastest results for the three functions took 3 milliseconds
  - This meant that the cost of starting and stopping the C, C++ and Rust processes was a significant factor in the measurement
  - I did not care whether C, C++ or Rust was the fastest as they were all ‘fast enough’
  - I cared more about why my C# and Go code were so much slower
There was always some variation between the runs, however the relative performance was always the same

Summary

Based on my micro benchmarks, C, Rust and C++ gave the fastest performance for these micro benchmarks
C# and Go were significantly slower, likely becuase they use managed memory
Dynamic languages like JavaScript keep getting faster via clever optimizations

Disclaimer: These are my personal thoughts and do not represent Oracle’s official viewpoint in any way, shape, or form.

Executable performance

Introduction

Updated Results

Overview of popular compiled languages

Memory management

My micro benchmarks

How valid are these results

Results

Micro benchmarks with compiler executables [smaller is better]

Top 7 results

Making substrings faster

My trivial source code

The Rust Main Function

The Rust Fibonacci Function

The Rust Strings Function

The Rust long_strings Function – Part 1

The Rust long_strings Function – Part 2

The C Main Function

The C Fibonacci Function

The C Strings Function

The C long_strings Function – Part 1

The C long_strings Function – Part 2

The C++ Main Function

The C++ Fibonacci Function

The C++ Strings Function

The C++ long_strings Function – Part 1

The C++ long_strings Function – Part 2

The Go Main Function

The Go Fibonacci Function

My Go Strings Function

The Go long_strings Function – Part 1

The Go long_strings Function – Part 2

The C# Main Function

The C# Fibonacci Function

The C# Strings Function

The C# long_strings Function – Part 1

The C# long_strings Function – Part 2

My environment

How I built and ran each test

For C

For C++

For C#

For Go

For Rust

How I calculated the results

Summary

Authors

Doug Hood

Oracle AI Vector Search Product Manager

Executable performance - Part 2

Using Python with Oracle Databases