• Home
  • Architecture
  • Tracking data in complex Java code: A functional programming approach

Tracking data in complex Java code: A functional programming approach

Companies across all industries are increasingly adopting cloud technologies. Naturally, many Visio users move to Lucidchart as they step into a more modern, collaborative, and integrated diagramming environment. To help make their transition as smooth as possible, users can import existing Visio documents into Lucidchart.  We’re constantly collecting feedback on how well this import is working so we can focus our efforts on the most common problems.

Converting Visio documents into Lucidchart documents requires a veritable maze of objects, methods, and recursive function calls, and altogether the process is pretty complex.  Our original version of this conversion code was written without the ability to track how faithful our imports are.  When we decided to add code to track import fidelity, we realized that import fidelity data is often produced deep inside the complex call tree. We needed a way to reliably transmit data from deep inside a complex call tree back to the top, preferably in a way that wouldn’t introduce a lot of extra bugs or be a serious maintenance nightmare.


Typical call stack in our conversion code
A typical call stack in our Visio import code. The function which knows how and why our import wasn’t perfect is buried 12 function calls below the last function which had easy access to the database. Because some of the function calls along the way are recursive, it is very difficult to predict the call stack in advance, and yet somehow we must get both the results and all discovered warnings reliably from the bottom function to the top.

In this article, I’ll discuss three different ways I considered solving this problem, along with the pros and cons of each method.  All of these methods use common programming constructs and all have serious downsides.  Fortunately, by taking inspiration from purely functional programming, I found a way to fix one of the methods so that I felt comfortable using it in a production system.  That fix will be the topic of the next article.

1. Using a singleton to track warnings

Experienced programmers are shaking their heads right now.  Singletons have a bad reputation in the programming community, for good reason.  It is, however, a very natural solution to this kind of problem, and so it is worth some effort to understand what a singleton is, how it works, and why it breaks.  That way the advantages of the later solutions I outline, which seem like more work, will be clearer by comparison.

A first cut at the Java code for this singleton would look something like this:

class WarningManager {

    private static WarningManager only = new WarningManager();
    private List<String> warnings;

    private WarningManager() {
        this.warnings = new ArrayList<>();
    }

    public static WarningManager getInstance() {
        return only;
    }

    public void addWarning(String warning) {
        this.warnings.add(warning);
    }

    public List<String> getWarnings() {
        return new ArrayList<>(this.warnings);
    }

}

This solution has the advantage that it is easy to implement and easy to call from anywhere in the code.  No matter how deep in the code you are, you just add the following:

WarningManager.getInstance().addWarning(“warning text”);

And the warning will be logged.

If it is this simple, why did I say this is a bad solution?  Well, you can look up lots of discussions about singletons online.  For our purpose, the biggest downside is that the singleton’s behaviour is only simple if you are parsing the entire content of a single document in a single thread in the “official” code path.

If you start parsing multiple documents at a time (say, in different threads), parse document elements multiple ways and choose the best outcome, or try to create unit tests, the naive code above will break.  You have broken modularity by creating a global variable and can therefore no longer take the code apart and run it in different environments without cross-talk.  As experienced programmers can tell you, this leads to complications quickly.

This is just scratching the surface of reasons not to use singletons this way.  Don’t use stateful singletons.

2. Pass around a mutable list of warnings

In this method, you modify all the functions which can produce warnings such that they pass a list of warnings around as a local parameter, as shown in this code:

class ElementParser {

    public Element getElement(List<String> warnings, …) {
        ...
        warnings.add(warning1);
        ...
        subMethod(warnings, ...);
        ...
    }

    private Element subMethod(List<String> warnings, ...) {
        ...
        warnings.add(warning2);
        ...
    }

}

A major con of this approach is that it requires much more extensive changes to the code.  This is balanced out by two important facts: First, the code itself now makes it clear where in the code we can expect to see warnings.  Secondly, we have solved some major downsides to the singleton approach.  By passing the warnings as a parameter, we have restored modularity—the warnings exist locally.  Unit tests, multiple documents, and multiple parsing passes no longer interfere with each other.

Despite these improvements, there is still a downside in that this method relies on side-effects in the warning list.  The problems are far less severe than with the side-effects in a global singleton, but there remains the fact that it looks like the warning list is an input to all of the functions, when in reality, it should only be an output.

Which brings us to our last method…

3. Make the warnings part of the return type

This method used to be a lot harder.  Nowadays, though, compilers and language design have reached the point where there is no excuse for not using this if you can.

In Java, you would implement this by creating a class of the type:

public class ConversionResult<T> {

    private T result;
    private List<String> warnings;

    public T get() { return result; }

    public List<String> getWarnings() { return warnings; }

    public ConversionResult(T result) {
        this.result = result;
        this.warnings = new ArrayList<>();
    }

    public addWarnings(List<String> warnings) {
        this.warnings.addAll(warnings);
    }

}

Rather than passing in a list of warnings, every function which can create warnings returns an object of type ConversionResult<ParseResult>.

This approach has most of the advantages of option 2.  It is entirely modular;  in fact, it is more modular than the list of warnings because when using this method, the warnings are only passed out in the return type, not in.  It is purely functional.  This method also has the advantage that the compiler forces the programmer to consider the possibility of warnings after every function call which can produce warnings—you have to go out of your way to parse the result, which takes more time but also makes the compiler more cautious.

One thing some programmers will complain about is that it forces the changes in the code to spread quickly (like cancer, as I described it) through an entire codebase.  I argue, however, that this is really what you want.  After all, the original problem was one of connecting very different layers of complex code with a common mechanism for passing messages around.  Unless you use a singleton (don’t), this will require big changes. The important bit is that with a parameterized return type, the compiler enforces the changes in all relevant layers, meaning the computer does the lion’s share of finding places that need to change. Always let the compiler do the work for you, if you possibly can.

The biggest downside to this approach is the fact that you have to manually combine all the warnings after you receive them.  Doing so makes it much more likely that you will forget to include warnings that should have been kept.  The code to pass the warnings up the call chain is also very repetitive which makes it error-prone.

Conclusions

As a programmer with a theoretical bent and a decent amount of experience, I really didn’t want to use a singleton.  Too much can go wrong there.  I really liked the mutable list but considered using the return type to be more elegant.  However, I felt that manually combining the warnings was a serious downside to this approach.  It would be so easy to break the chain and let messages be ignored.

Fortunately, some subtle ideas from functional programming suddenly became clear to me, and I was able to solve these problems to my own satisfaction.  We will return to how I did this in a later post.

No Comments, Be The First!

Your email address will not be published.