[HTCondor-devel] CppCon 2021 trip report and lessons learned


Date: Wed, 10 Nov 2021 15:14:05 -0600
From: Greg Thain <gthain@xxxxxxxxxxx>
Subject: [HTCondor-devel] CppCon 2021 trip report and lessons learned


I went to CppCon 2021 in Denver the week of October 24th through 30th. I learned a lot about C++ language, tools and software in general. This may be long, but I wanted to memorialize what I learned, and how we can improve the HTCondor and PATh software. I'm curious what other PATh software we are responsible for that is written in C++ that we should apply these lessons to: scitokens? Xrootd?

Overview

C++ and its ecosystem has improved tremendously since we started HTCondor, but, for the most part, neither the code nor our knowledge of the tools has kept up. ÂThere are many things we can do with the newer tools to improve the speed we develop new features, to improve the safety and security of our code, and its performance. Currently, HTCondor uses C++ 11, but 14, 17, and 20 have been released. Working to increase the minimum level of C++ on our platforms is something we should work for, even if that means using non-stock compilers. I'm pretty sure that we should be able to get to a minimum of C++ 17 across all of our platforms today, and reap a lot of safety and other language benefits.

libfuzzer

One of the tools we should look more into is clang's profile-driven fuzzer tools:Âhttps://llvm.org/docs/LibFuzzer.htmlÂ
If you compile your code with special profiling flags, link with the fuzzer library, and write one function that takes a string, and does something with it, the framework handles the rest. It keeps track of all the branches your code took when doing the something with the string, and tries modifying the input string in ways that cause your code to take different branches. I wrote a simple function that parses a string as a classad. In a few seconds, it found a bug in the classad parser that caused a crash. After fixing that, it found three others. (I've subsequently pushed fixes for these). (Don't tell Bart we're only now fuzzing HTCondor). Are there other areas of the code that would be easy to fuzz test?

vectors vs lists

An issue that was mentioned several times, is that pointers are now 8 bytes long generally, and much, much more expensive to traverse than when we started writing HTCondor. It can often cost 100s of cycles to traverse a single pointer that isn't in L1 or L2 cache, on the same machine that can multiply 8 floating point numbers in a single cycle. As a result, a linked list is almost never the best data structure to use. Typically, linked lists are now only best when there are thousands of entries with frequent insertions in the middle. Even for containers with hundreds of elements, an array or vector is smaller, and the cost of insertion in the middle can be less than a linked list. This is also true for hashtables, and again, for data structures of less than a thousand elements, a sorted array or vector is not just smaller, but also faster than a hash table. To test this, I changed our classad code to store the map of attribute names to _expression_ trees from a hash table of string to _expression_ Trees to a sorted vector of [string, _expression_ Tree] pairs. A quick test shows that the collector was about 20 % smaller, and no slower. Representing classads and other data structures this way also opens us to easily and effectively using some of the standard c++ algorithm functions that operate on sorted ranges, like set_union and set_intersection, which we do all the time with classad attribute include and exclude sets. This leads us to...

Standard C++ library algorithms

Another best practice that applies to us concerns the standard C++ library algorithms, which have existed in the standard for 20+ years. We rarely use any of these algorithms in HTCSS code, to the detriment of the clarity and correctness of the code. For example, there are a lot of places in HTCondor where we remove duplicate entries from a collection, and there is a standard C++ algorithm: std::unique that implements this, but there are zero uses of std::unique in HTCSS. We don't need to go on a rampage and redo existing code, but this is something that we should be aware of for new work, and something the code reviewers should keep in mind. There are something like 100 algorithms, and I think we use 3: sort, remove and find. Consider the following example of real HTCondor code -- when we put a classad on the wire, we need to know how many attributes there are, as we put the count on the wire first. The current (very old) code does this, is a bit tricky because of chained ads and privacy:


int private_count = 0;
int num_exprs = 0;
for(int pass = 0; pass < 2; pass++){
/*
* Count the number of chained attributes on the first
* Â pass (if any!), then the number of attrs in this classad on
* Â pass number 2.
*/
if(pass == 0){
 if(!haveChainedAd){
 continue;
}
 itor = chainedAd->begin();
 itor_end = chainedAd->end();
}
else {
 itor = ad.begin();
 itor_end = ad.end();
}

for(;itor != itor_end; itor++) {
 std::string const &attr = itor->first;

 if(!exclude_private || !(ClassAdAttributeIsPrivate(attr) ||
 Â(encrypted_attrs && (encrypted_attrs->find(attr) != encrypted_attrs->end()))))Â
 {
  numExprs++;
 }Â
}
}


But with the standard c++ algorithm count_if, this could be written like...

// return true if this attr should be sent on the wire
static bool isAttrVisible(const pair<string, ExprTree*> &attr) {
 return !ClassAdAttributeIsPrivate(attr.first) ||
     (encrypted_attrs && (encrypted_attrs->find(attr.first) != encrypted_attrs.end()));
}

static size_t numVisibleAttrs(bool exclude_private, const Classad &ad) {
 if (exclude_private) {
   returnÂstd::count_if(ad.begin(), ad.end(), isAttrVisible);
  } else {
   return ad.size();
  }
}

size_t num_exprs = 0;
if (haveChainedAd) {
 num_exprsÂ+= numVisibleAttrs(exclude_private, chainedAd);
}
num_exprsÂ+= numVisibleAttrs(exclude_private, ad);

And, even better, we've extracted separate functions which we can write unit tests for. Again, I'm not advocating that we go back and re-write all of HTCondor, but I would recommend writing in this style for new code which we touch.

clang-refactor

clang-refactor is a tool that automatically refactors C++ source code to use newer, safer features, like range-based for loops, the nullptr literal, and other things. This would be a big change to the source code, so it would create merge conflicts. Therefore, we should carefully schedule the running of clang-refactor. Many large organizations with much, much larger code bases that ours report using automated refactoring tools regularly, so we should not be afraid of this. In particular, we have been so afraid to make large-scale changes to our code, that we execute awful hacks to work around even simple renaming. For example, our logging and debugging function is named dprintf, which we wrote in the ancient times. For dozens of years now, there has been a conflict with the glibc function with the same name and signature. Instead of biting the bullet, and renaming our function (or putting our in a C++ namespace), we got out of our way to try to use C macros before and after including libc headers to hide their dprintf symbol. Including headers in the wrong order results in surprising compile and link time errors. We shouldn't be afraid to do wide scale automated renaming to fix these problems instead of poisoning system headers.

std::format and libfmt
And speaking of renaming dprintf, C++20 gets a much better library for formatting strings, more typesafe and faster than [s]printf, but also extensible and looking a bit like python, named std::format. Better yet, it is substantially similar to an open source library which works with C++ 14, libfmt. I think it would not be too hard to mechanically translate more (if not all) of our dprintf calls to use this new syntax, and avoid all the bugs and compiler warnings around things like figuring out if time_t is 64 or 32 bits on any given platform, so instead of writing

dprintf(D_ALWAYS, "The timestamp is %lu\n", some_timestamp);

we could write

dfmt(D_ALWAYS, "The timestamp is {}\n", some_timestamp);Â

and the correct type is chosen, we don't need to try to match the type to the variable. Better yet, this is extensible, so we could write a formatter for classads, and be able to write

dfmt(D_ALWAYS, "The job ad is {}\n", job_ad);

even better, you can pass arguments to the formatter, so if our classad printer took a "P" argument, to show the private attributes, say, we could write

append_string_to_history_file(format("{:P}", job_ad));

note that the library matches the type to the expansion at compile time, so you don't get segfaults or surprises if you pass a pointer to something that needs a value or vice-versa, as we do today with *printf.

GSL
It is well understood in the C++ community that the language is already very complex, and adding new features to it isn't really making it easier to use. To combat this, Bjarne Stroustrup has been writing the "Core Guidelines", (https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines) which are a set of principles for writing C++ and focusing on the subset of features that work well together and are the safest. This isn't really a coding standard -- it doesn't talk about what case to use in identifiers, but this is a document we should all be familiar with, and I highly recommend that we all read and understand it, again, even if we can't change the HTCondor code overnight.

std::function
Another nice library addition in C++11 is the class std::function. This object wraps a callback, and can be constructed from a C function, a C++11 lambda, a C++ method, or a functor object. Replacing our sketchy daemon core callbacks, where we explicitly register either a c function, or a subclass of service would allow for much safer code. This is because instead of passing one void * in at registration time, the per-callback data can be stored in a type-safe capture of a lambda. Also, each daemon core callback doesn't need to store both a function pointer and a method pointer, and decide which one to use, but just a single instance of a std::function.

Compile-time programming

C++ 11 has rudimentary support for "constexpr", which allows very simple functions to run at compile time. C++ 14, 17 and 20 progressively expand what kind of functions can be evaluated at compile time. This is not just important for performance, but for correctness. I believe with C++20, we could mark our classad parser as constexpr. The benefit of this is that any constant classad expressions in our code (i.e. defaults) would get parsed at compile time, and any parse errors in the classad language would become compile time errors, not runtime errors. That way, if we have a parse error in a default that is rarely used, we find out immediately, and not when a customer runs the code. In general, I believe there are many places in the code base which would benefit from compile-time evaluation.

string_view
While C++ has had a standard string class since C++98, C++17 adds a read-only, non-owning class called "string_view". This is similar to our "YourString" class, except that it supports most of the const methods of std::string. If CMS needs us to store many more classads in the collector, perhaps one avenue to pursue is to store classads as a map of string_views to ExprTrees, instead of strings, and have a mapping of the common pre-definedÂjob attributes as string_views in the text segment. string_views are marked constexpr in C++ 17, which means you can do interesting things like compute theirÂhash functions at compile time. This leads us to ...
Â
User defined literals (UDLs)
C++ has always had literals, integers, floats and arrays. With C++11, we can now write overloaded operators to define new literals at compile time. In C++14, there is library support for the common library types, like string, (and in C++17) string view. These operators are marked constexpr, so they are (almost always) run at compile time. This is another big area for improvement in the condor code, as most of our string literals we use for classad attributes are stored today as char *s, and we don'tÂcompute their length or their hash code until runtime. These are written as suffices, with a leading underscore reserved for user-defined literals, and those without leading underscores for implementation-provided literals, e.g.

constexpr string_view requirements = "Requirements"sv;

one could even imagine a classad expr UDL, so that our default expressions could be parsed and checked at compile time:

constexpr ExprTree defaultRequirements = "HasFileTransfer && Memory > RequestMemory"_classadExpr;

Moving to exceptions

For the longest time, the HTCondor coding practice has been to not use C++ exceptions. Like it or not, library code we use today, both 3rd party (json parsers) and standard library code uses exceptions. When we write code today, we need to try to be more aware of making our code safe in the presence of C++ exceptions, even if we aren't ourselves throwing them. One features in the hopper for some future version of C++ is "static exceptions", or "HerbCeptions, (from their author, Herb Sutter)". These static exceptions look syntactically like the current ones, but can be written to never allocate memory at throw time, which is a problem for some embedded systems. At such time as they land, I'm sure we'll see even more C++ code become exception happy, and we should start now to become prepared to deal with that code.

Small Things

I picked up a small book of 50 C++ best practices. I've got it inÂmy office, if anyone wants to stop by and borrow it.

C++11 also gives us the type std::ignore which anything is assignable to, but is defined to not generate a compiler warning when assigned to the results of a function marked [[nodiscard]]. Thus we can eliminate pesky compiler warnings by writing

std::ignore = open("dev/null", 0600); // possibly a bad example...

In C++14, the language allows for the ' character as a separator in numeric literals, which is just ignored. e.g. you can writeÂ

int >

which is a lot easier to read than without. We should consider adding this to the classad lexer, so we can use such literals in classads. It might be years before we could emit such numbers on the wire, but the change would be worth it, just like when we added the elvis operator to classads, which we can nowÂMatyas pretty much uses everywhere.

--
-greg
[← Prev in Thread] Current Thread [Next in Thread→]
  • [HTCondor-devel] CppCon 2021 trip report and lessons learned, Greg Thain <=