Introduction

Compiled executables

This blog is about comparing the performance of popular compiled languages using two simple micro benchmarks.

I am writing a series of blogs on using different languages to access Oracle databases [eg Python, Node.jsRust and Julia].  Eventully, I want to compare the performance of various languages accessing Oracle. I have already compared some popular language runtimes and Making Java faster for the same micro benchmarks.

 

 

Updated Results

Updated results

I updated the source code and benchmark results:

  • C# is now significantly faster when using Spans instead of arrays of char
    • Substring processing decreased from 307 seconds to 1.4 seconds
    • Span uses the stack rather than the managed heap for a speedup of about 200x for this workload
    • Many thanks to Marina Sundström for suggesting using Span<T>
  • Go is now significantly faster when using ReadAt instread of arrays of char
    • Substring processing decreased from 1.7 seconds to 30 milliseconds [56x speedup]
    • Using Span in C# inspired me to try ReadAt in Go

 

 

The compiled languages covered in this blog are:

 

This blog covers the following topics:

  • An overview of the compiled languages
  • Memory Management
  • The two micro benchmarks that I created
  • The results
  • My source code for all of those languages
  • How I did the builds and tests
  • How I calculated the results
  • Summary

 

This blog is not a tutorial on these computer languages.  This is also not a blog on how to download and configure the language tool-chains.

 

 

 

Overview of popular compiled languages

Compiler overview

The following is my opinion, so act accordingly 😉

  • C [1972] is the oldest language in the group and was created to implement and port operating systems, compilers and utilites
    • C is a simple language which is very fast and efficient
    • C has seen widespread use in all aspects of computing
    • C does not use garbage collection and without great care, developers can create memory leaks and corruptions
  • C++ [1985] was created to add abstractions to C via objects and multiple inherentence
    • C++ is a complicated language that can be fast and efficient
    • C++ has seen widespread use in all aspects of computing
    • C++ does not use garbage collection and without great care, developers can create memory leaks and corruptions
  • C# [2000] was created by Microsoft, originally to compete with Java on Windows
    • C# is a general purpose object oriented language that now runs on Linux, MacOS and Windows
    • C# has seen widespread use in many aspects of computing
    • C# uses garbage collection, but can still have issues with unsafe code and thread race conditions
  • Go [2009] was created by Google as a reaction to the complexity of C++ for cloud development and deployment
    • Go is a general purpose language that creates static executables for simple provisioning in the cloud
    • Go has been popular in cloud native computing via projects such as Kubernetes, Docker and Terraform
    • Go uses garbage collection, but without great care from developers, can still result in data and/or thread race conditions
  • Rust [2010] was desigend to be fast and safe language for creating Mozilla plugins
    • Rust is a fast and efficient, general purpose languge with a fussy/slow compiler which analyzes source code for most issues related to data and thread safety
    • Rust is finding popularity in servers, utilities and Web Assembly and is being used by companies like Google, Microsoft, Amazon, Facebook and Discord
    • Rust does not use garbage collection, but instead uses a compile time borrow checker

 

These languages were all designed with different goals and hence they all have different strength and weaknesses.  I have a love/hate relationship with all of these languages.

Pick your poison

Trying to determine which is the ‘best language’ is pointless. Your projects, existing source code, experience, tool-chains and biases will determine which lanugage you use.

 

 

 

Memory management

Memory management

Memory management used to be a hard choice:

  • Languages like C and C++ were fast, but memory management was manual and error prone
  • Lanugages like C# and Go were not as fast, but automatic garbage collection tended to make memory management safer

 

Rust is an example of a language which offers a third choice for memory management:

  • Fast and safe executable code without garbage collection
  • The cost is at compile time
    • The compiler will spend time analyzing your code to check for memory and thread safety issues
    • The Rust compiler will not allow unsafe code
    • The Rust compiler increases the compile time for your code
    • The Rust compiler increases your learing curve for the Rust language
  • This does not mean that Rust is the best choice for a compiled language, but it is a new option which changes the memory management calculus

 

 

 

My micro benchmarks

I am not trying to state that one language compiler is better than another.  There are many factors that influence which compiler that you choose to use and performance is only one of them.

choosing the best compiler

I needed some trivial workloads, so I chose to use the same micro benchmarks that I used for my blog on Runtime Performance:

  • Calculate the Fibonacci sequence with an input of 1475, repeated one million times
  • Some trivial string processing with strings. ie creating, concatenating and using substrings for strings under 2000 characters with a huge number of iterations

 

 

 

How valid are these results

Micro benchmarks are, by definition, only relevant to the specific workload that they cover.  These workloads do not try to cover everything, they only cover what I care about. The only workload that matters to you is your workload.  So compare your own workloads with your favourite languages.  I have found that string processing and simple maths are important to enable fast SQL database drivers, so that is what I tested.

Your milage will vary

 

 

 

Results

 

Micro benchmarks with compiler executables [smaller is better]

Results

This chart shows the total execution time of my micro benchmarks for simple math and string processing:

  • Clang 14.0 was faster than gcc 11.3 for this workload
    • The optimization flags made a huge difference to the total execution time
  • Clang++ 14.0 [C++] was faster than g++ 11.3 [C++] for this workload
  • Clang++ 14.0 [C++] gave the same performance as Clang 14.0 [C] for this workloads when both used the same source code, ie both used arrays of char for the strings
  • Clang++ 14.0 was significantly slower when C++ strings were used instead of arrays of char
    • The substring method was the bottleneck as a new string needed to be allocated in the loop
    • When strncpy() was used for the substring operations in C++ it went significantly faster [Mixed]
  • Rust used strings allocated within loops and yet was just as fast as C using pre-allocated arrays of char for string processing
  • C# 6.0.403 was significantly faster than C# 7.0.100 for this workload
    • 2.8 vs 3.7 seconds

 

 

Top 7 results

Top 7 results

  • Interpreted languages like Java can be faster than compiled languages like C# for these micro benchmarks
  • Dynamic languages like JavaScript when run on the latest version of V8 continue to get more optimizations and go faster
  • C, Rust and C++ with optimal code and serious LLVM optimizations will tend to be orders of magnitude faster than the other languages

 

 

 

Making substrings faster

Substring optimization

  • My micro benchmarks stress simple math and string handling
  • The bottleneck in all of the languages is getting a substring in a loop
    • This occurs in the j and k FOR loops in function long_strings
    • I want to do a logical substring in these loops
    • Is there a faster way of doing this in your favorite language?
      • Please add you code solution to the comments section
  • You want to avoid allocating objects in a loop as it is logically a slow operation
    • I started with substring methods in each language and then optimized my code where possible
    • This pathelogical example is hard for any language
      • Rust used substrings and did the benchmark in 3 milliseconds
      • Java used substrings and did the benchmark in 1.4 seconds
      • C# used Span<T> and did the benchmark in 1.4 seconds
      • JavaScript used substrings and did the benchmark in 2.1 seconds

 

 

 

My trivial source code

 

Rust

The Rust Main Function

Rust main

  • The fibonacci function has an input of 1475 and was called one million times
    • Why 1475, to avoid numeric overflow in some of the other languages that I tested this workload against
    • I am using the type double [or equivalent] for all languages to avoid numeric overflow for the large numbers from the Fibonacci sequence
  • Both the strings and long_strings methods are called with an input of 1475
  • The ‘slow’ Rust compiler took 181 milliseconds to compile this trivial program [ie 140 lines of code for main, fibonacci, strings and long_strings functions]

 

 

The Rust Fibonacci Function

Rust Fibonacci function

Why am I using an f64 [double] for the variables?

  • The values of the Fibonacci sequence rapidly get larger
  • I also implemented these micro benchmarks in many other languages
  • Some of these languages had issues with integer overflow for large values in the Fibonacci sequence
  • So I used the type double to be fair and consistent across all of the languages

I am not using recursion as it is against my religion.

 

 

The Rust Strings Function

Rust strings

  • This function does some trivial operations on strings
    • The operations include constructors, append, length, substring and copy
  • There are three nested loops, so the operations in the inner-most loop are executed about 26 million times
    • n = 1475
    • The string length is 12 characters
    • 1475 * 12 * 1475 = 26,107,500
  • In Rust, substrings uses two indexes [first .. last]
    • Other languages tend to use [first .. length] for substring indexes

 

 

The Rust long_strings Function – Part 1

Rust long_strings - part 2

The logic for function long_strings was the same as for function strings, but there were significantly more string concatenation operations.

  • The fully appended string is 1965 bytes long
  • Why did I not create the strings outside the loops?
    • I wanted to make the comparisons with other languages fair and consistent
    • I am not trying to optimize the code for this workload, I am trying to see how common / ‘bad’ code performs
  • The number of iterations of the string and operations is significantly larger
  • This also creates the opportunity of garbage collection in other languages

 

 

The Rust long_strings Function – Part 2

Rust long_strings - part 2

  • The ‘j’ for loop iterates based on the length of the string, ie 1965 times
  • The ‘k’ for loop iterates n times, ie 1475
  • The outer ‘i’ for loop also iterates n times, ie 1475
  • 1475 * 1965 * 1475 = 4,275,103,125 iterations
  • So there are 4.2 billion iterations of the ‘k’ loop which creates strings from substrings
  • In Rust, substrings uses two indexes [first .. last]
    • Other languages tend to use [first .. length] for substring indexes

 

 

 

 

C

The C Main Function

C main

  • The fibonacci function has an input of 1475 and was called one million times
    • Why 1475, to avoid numeric overflow in some of the other languages which I tested this workload against
    • I am using the type double [or equivalent] for all languages to avoid numeric overflow for the large numbers from the Fibonacci sequence
  • Both the strings and long_strings methods are called with an input of 1475

 

 

The C Fibonacci Function

C Fibonacci

Why am I using a double for the variables?

  • The values of the Fibonacci sequence rapidly get larger
  • I also implemented these micro benchmarks in many other languages
  • Some of these languages had issues with integer overflow for large values in the Fibonacci sequence
  • So I used the type double to be fair and consistent across all of the languages

I am not using recursion as it is against my religion.

 

 

The C Strings Function

C strings

  • This function does some trivial operations on strings
    • The operations include creation, append, length, substring and copy
  • There are three nested loops, so the operations in the inner-most loop are executed about 26 million times
    • n = 1475
    • The string length is 12 characters
    • 1475 * 12 * 1475 = 26,107,500

 

 

The C long_strings Function – Part 1

C long_strings - part 1

The logic for function long_strings was the same as for function strings, but there were significantly more string concatenation operations.

  • The fully appended string is 1965 bytes long
  • The number of iterations of the string and operations is significantly larger

 

 

The C long_strings Function – Part 2

C long_strings - part 2

  • The ‘j’ for loop iterates based on the length of the string, ie 1965 times
  • The ‘k’ for loop iterates n times, ie 1475
  • The outer ‘i’ for loop also iterates n times, ie 1475
  • 1475 * 1965 * 1475 = 4,275,103,125 iterations
  • So there are 4.2 billion iterations of the ‘k’ loop which creates strings from substrings

 

 

 

 

C++

 

The C++ Main Function

C++ main

  • The fibonacci function has an input of 1475 and was called one million times
    • Why 1475, to avoid numeric overflow in some of the other languages that I tested this workload against
    • I am using the type double [or equivalent] for all languages to avoid numeric overflow for the large numbers from the Fibonacci sequence
  • Both the strings and long_strings methods are called with an input of 1475

 

 

The C++ Fibonacci Function

C++ Fibonacci function

Why am I using a double for the variables?

  • The values of the Fibonacci sequence rapidly get larger
  • I also implemented these micro benchmarks in many other languages
  • Some of these languages had issues with integer overflow for large values in the Fibonacci sequence
  • So I used the type double to be fair and consistent across all of the languages

I am not using recursion as it is against my religion.

 

 

The C++ Strings Function

C++ strings

  • This function does some trivial operations on strings
    • The operations include constructors, append, length, substring and copy
  • There are three nested loops, so the operations in the inner-most loop are executed about 26 million times
    • n = 1475
    • The string length is 12 characters
    • 1475 * 12 * 1475 = 26,107,500
  • The C++ std:Strings substring method is really slow compared to C

 

 

The C++ long_strings Function – Part 1

C++ long_strings - part 1

The logic for function long_strings was the same as for function strings, but there were significantly more string concatenation operations.

  • The fully appended string is 1965 bytes long
  • The number of iterations of the string and operations is significantly larger

 

 

The C++ long_strings Function – Part 2

C++ long_strings - part 2

  • The ‘j’ for loop iterates based on the length of the string, ie 1965 times
  • The ‘k’ for loop iterates n times, ie 1475
  • The outer ‘i’ for loop also iterates n times, ie 1475
  • 1475 * 1965 * 1475 = 4,275,103,125 iterations
  • So there are 4.2 billion iterations of the ‘k’ loop which creates strings from substrings
  • The C++ std:Strings substring method was about 200x slower than using strncpy in C with a pre-allocated array of char

 

 

 

 

 

Go

 The Go Main Function

Go Main

  • The fibonacci function has an input of 1475 and was called one million times
    • Why 1475, to avoid numeric overflow in some of the other languages that I tested this workload against
    • I am using the type double [or equivalent] for all languages to avoid numeric overflow for the large numbers from the Fibonacci sequence
  • Both the strings and long_strings methods are called with an input of 1475

 

 

The Go Fibonacci Function

Go Fibioacci

Why am I using a double [float64] for the variables?

  • The values of the Fibonacci sequence rapidly get larger
  • I also implemented these micro benchmarks in many other languages
  • Some of these languages had issues with integer overflow for large values in the Fibonacci sequence
  • So I used the type double to be fair and consistent across all of the languages

I am not using recursion as it is against my religion.

 

 

My Go Strings Function

Go strings

  • This function does some trivial operations on strings
    • The operations include constructors, append, length, substring and copy
  • There are three nested loops, so the operations in the inner-most loop are executed about 26 million times
    • n = 1475
    • The string length is 12 characters
    • 1475 * 12 * 1475 = 26,107,500
  • Using ReadAt was significantly faster than using substrings

 

 

The Go long_strings Function – Part 1

Go long_strings - part 1

The logic for function long_strings was the same as for function strings, but there were significantly more string concatenation operations.

  • The fully appended string is 1965 bytes long
  • The number of iterations of the string and operations is significantly larger
  • Using StringBuilder was faster than using string for the concatenation operations
  • The byte array buffer is needed for the ReadAt Method

 

 

The Go long_strings Function – Part 2

Go long_strings - part 2

  • The ‘j’ for loop iterates based on the length of the string, ie 1965 times
  • The ‘k’ for loop iterates n times, ie 1475
  • The outer ‘i’ for loop also iterates n times, ie 1475
  • 1475 * 1965 * 1475 = 4,275,103,125 iterations
  • So there are 4.2 billion iterations of the ‘k’ loop which creates strings from substrings
  • Using ReadAt was faster than using a substring

 

 

 

 

C#

The C# Main Function

C# Main

  • The fibonacci function has an input of 1475 and was called one million times
    • Why 1475, to avoid numeric overflow in some of the other languages that I tested this workload against
    • I am using the type double [or equivalent] for all languages to avoid numeric overflow for the large numbers from the Fibonacci sequence
  • Both the strings and long_strings methods are called with an input of 1475

 

 

The C# Fibonacci Function

C# Fibonacci function

Why am I using a double for the variables?

  • The values of the Fibonacci sequence rapidly get larger
  • I also implemented these micro benchmarks in many other languages
  • Some of these languages had issues with integer overflow for large values in the Fibonacci sequence
  • So I used the type double to be fair and consistent across all of the languages

I am not using recursion as it is against my religion.

 

 

The C# Strings Function

C# strings

  • This function does some trivial operations on strings
    • The operations include constructors, append, length, substring and copy
  • There are three nested loops, so the operations in the inner-most loop are executed about 26 million times
    • n = 1475
    • The string length is 12 characters
    • 1475 * 12 * 1475 = 26,107,500
  • The C# strings substring method is really slow compared to C
  • The C# Span technique method was significantly faster than using C# substrings

 

 

 

The C# long_strings Function – Part 1

C# long_strings - part 1

The logic for function long_strings was the same as for function strings, but there were significantly more string concatenation operations.

  • The fully appended string is 1965 bytes long
  • The number of iterations of the string and operations is significantly larger

 

 

The C# long_strings Function – Part 2

C# long_string - part 2

  • The ‘j’ for loop iterates based on the length of the string, ie 1965 times
  • The ‘k’ for loop iterates n times, ie 1475
  • The outer ‘i’ for loop also iterates n times, ie 1475
  • 1475 * 1965 * 1475 = 4,275,103,125 iterations
  • So there are 4.2 billion iterations of the ‘k’ loop which creates strings from substrings
  • Using C# Span was significantly faster than using Substring or CopyTo for the concatenation operations

 

 

 

My environment

I repeated these tested on two different machines:

  • Oracle Linux 8.6 on Oracle Cloud. 4 OCPU with 128 GB RAM
  • Ubuntu 22.04 on Oracle Cloud. 4 OCPU with 128 GB RAM
  • As these were VMs, to avoid the risk of a noisy neighbor, I repeated the tests many times over three days
  • My micro benchmarks were not doing any disk nor network IO. Instead they were CPU bound for a single threaded workload.
  • As measured by ‘top‘, the VIRT and RSS memory was stable for the duration of the tests and there was 128 GB of RAM

 

 

 

How I built and ran each test

 

For C

  • clang fib.c -Oz -o fib
  • time ./fib

 

For C++

  • clang++ fib.cpp -O -o fib
  • time ./fib

 

For C#

  • dotnet publish –configuration Release –runtime linux-x64 –self-contained true
  • time bin/Release/net6.0/linux-x64/fibStr6

 

For Go

  • go build
  • time ./fibStr

 

For Rust

  • cargo build –release
  • time ./target/release/fib_str

 

 

 

How I calculated the results

On three different days, I did the following:

  • Run the tests for each runtime 10 times using the Linux time command until I got stable results
  • I eliminated the highest and lowest results
  • I took the average of the remaining eight results
    • The Linux time command gives a resolution of 1 millisecond
    • The fastest results for the three functions took 3 milliseconds
      • This meant that the cost of starting and stopping the C, C++ and Rust processes was a significant factor in the measurement
      • I did not care whether C, C++ or Rust was the fastest as they were all ‘fast enough’
      • I cared more about why my C# and Go code were so much slower
  • There was always some variation between the runs, however the relative performance was always the same

 

 

 

Summary

  • Based on my micro benchmarks, C, Rust and C++ gave the fastest performance for these micro benchmarks
  • C# and Go were significantly slower, likely becuase they use managed memory
  • Dynamic languages like JavaScript keep getting faster via clever optimizations

 

 

Disclaimer: These are my personal thoughts and do not represent Oracle’s official viewpoint in any way, shape, or form.