Java Feature Spotlight: Local Variable Type Inference

In Java Futures at QCon New York, Java Language Architect Brian Goetz took us on a whirlwind tour of some recent and future features in the Java Language. In this article, he dives into Local Variable Type Inference.

Java SE 10 (March 2018) introduced type inference for local variables. Previously, declaring a local variable required a manifest (explicit) type declaration. Now, type inference empowers the compiler to choose the static type of the variable, based on the type of its initializer:

var names = new ArrayList<String>();

In this simple example, the variable names will have the type ArrayList<String>.

Despite the syntactic similarity to a similar feature in JavaScript, this is not dynamic typing—all variables in Java still have a static type. Local variable type inference merely allows us to ask the compiler to figure this type out for us, rather than forcing us to provide it explicitly.
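
To see the difference concretely, here is a minimal sketch: the inferred type is fixed at the declaration, and assignments that contradict it are rejected exactly as they would be with a manifest type.

var greeting = "hello";   // inferred as String
greeting = "world";       // fine: still a String
// greeting = 42;         // error: int cannot be converted to String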

Type Inference in Java

Type inference is a technique used by statically typed languages, where the types of variables may be inferred from context by the compiler. Languages vary in their use and interpretation of type inference. Type inference generally provides the programmer with an option, not an obligation; we are free to choose between manifest and inferred types, and we should make this choice responsibly, using type inference where it enhances readability and avoiding it where it might create confusion.

Type names in Java can be long, either because the class name itself is long, because its generic type parameters are complex, or both. It is a general fact of programming languages that the more interesting your types are, the less fun they are to write down — which is why languages with more sophisticated type systems tend to lean more heavily on type inference.

Java started with a limited form of type inference in Java 5, and its scope has steadily expanded over the years. In Java 5, when generic methods were introduced, we also introduced the ability to infer the generic type parameters at the use site; we typically say:

List<String> list = Collections.emptyList();

rather than providing explicit type witnesses:

List<String> list = Collections.<String>emptyList();

In fact, the inferred form is so common that some Java developers have never even seen the explicit form!

In Java 7, we extended the scope of type inference to infer type parameters of generic constructor invocations (also known as “diamond”); we can say

List<String> list = new ArrayList<>();

as a shorthand for the more explicit

List<String> list = new ArrayList<String>();

In Java 8, when we introduced lambda expressions, we also introduced the ability to infer the types of the formal parameters of lambda expressions. So, we could say:

list.forEach(s -> System.out.println(s))

as a shorthand for the more explicit

list.forEach((String s) -> System.out.println(s))

And, in Java 10, we further extended type inference to the declaration of local variables.

Some developers might think routine use of inferred types is better, because it results in a more concise program; others might think it’s worse because it removes potentially useful information from view. But, both of these views are simplistic. Sometimes, the information that would be inferred is merely clutter that would otherwise just get in the way (no one complains that we routinely use type inference for generic type parameters), and in these cases, type inference makes our code more readable. In other cases, the type information provides vital clues about what is going on, or reflects creative choices by the developer; in these cases, it is better to stick with manifest types.
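
To make this concrete, here is a hedged sketch (the service and request names are hypothetical): in the first declaration the initializer already tells the reader everything, while in the second a manifest type arguably carries information the initializer does not.

var buffer = new ByteArrayOutputStream();   // the type is evident from the initializer
var response = service.process(request);    // what is response? a manifest type may read better here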

While we’ve expanded the scope of type inference over the years, one design principle we’ve followed is only using type inference for implementation details, not declaration of API elements; the types of fields, method parameters, and method returns must always be manifestly typed, because we don’t want API contracts subtly changing based on changes in the implementation. But, inside the implementation of method bodies, it’s reasonable to have more latitude to make choices on the basis of what is more readable.
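
A minimal sketch of where that line is drawn: var is accepted only for local variables inside method bodies, never for fields, method parameters, or return types.

class Example {
    // var cache = new HashMap<String, String>();              // error: 'var' is not allowed here
    private final Map<String, String> cache = new HashMap<>(); // fields keep manifest types

    List<String> upperCased(Collection<String> words) {        // parameter and return types stay manifest
        var result = new ArrayList<String>();                   // inference is confined to the implementation
        for (var w : words) {                                   // var also works in an enhanced for loop
            result.add(w.toUpperCase());
        }
        return result;
    }
}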

How Does Type Inference Work?

Type inference is frequently misunderstood as something close to magic or mind-reading; developers often anthropomorphize the compiler and ask “why couldn’t the compiler figure out what I wanted.” In reality, type inference is something much simpler: constraint solving.

Different languages use type inference differently, but the basic concept is the same across all languages: gather constraints on unknown types, and at some point, solve for them. Where language designers have latitude is where type inference can be used, what constraints are gathered, and over what scope the constraints are solved.

Type inference in Java is local; the scope over which we gather constraints and when we solve them is restricted to a narrow portion of the program, such as a single expression or statement. For example, for local variables, the scope over which we gather constraints and solve is the declaration of the local itself — regardless of other assignments to that local. Other languages pursue a more global approach to type inference, considering all uses of the variable before attempting to solve for its type. While at first this might seem better because it is more precise, it is frequently harder to use. If the types of variables can be influenced by each of their uses, when things go wrong (such as the type being overconstrained because of a programming error), the error messages are often quite unhelpful, and can pop up far away from either the declaration of the variable whose type is being inferred or the site of its erroneous use. These choices illustrate one of the fundamental trade-offs language designers face when using type inference — we are always trading precision and predictive power for complexity and predictability. We can tweak the algorithm to increase the prevalence of the compiler “getting it right” (say, by gathering more constraints or solving over a larger scope), but the consequence is almost always more unpleasantness when it fails.
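
A small example of this locality: only the initializer is consulted, so later assignments cannot broaden the inferred type.

var list = new ArrayList<String>();   // inferred as ArrayList<String> from the initializer alone
list.add("a");                        // fine
// list = new LinkedList<String>();   // error: later uses do not widen the type to List<String>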

As a simple example, consider a diamond invocation:

List<String> list = new ArrayList<>();

We know that the type of list is List<String>, because it has a manifest type. We are trying to infer the type parameter of ArrayList, which we’ll write as x. So, the type of the right-hand side is ArrayList<x>. Because we’re assigning the right to the left, the type of the right must be a subtype of the left, so we gather the constraint:

ArrayList<x> <: List<String>

Where <: means “subtype of”. (We also gather the trivial bound x <: Object, from the fact that x is a generic type variable, whose implicit bound is Object.) We also know from the declaration of ArrayList that List<x> is a supertype of ArrayList<x>. From these, we can derive the bound constraint x <: String (JLS 18.2.3) and because this is our only constraint on x, we can conclude x=String.

Here’s a more complicated example:

List<String> list = ...
Set<String> set = ...
var v = List.of(list, set);

Here, the right-hand side is a generic method call, so we are inferring the generic type parameter from the following method in List:

public static <X> List<X> of(X... values)

Here, we have more information to work with than in the previous example — the types of the arguments, which are List<String> and Set<String>. So, we can gather the constraints:

List<String> <: x
Set<String> <: x

Given this set of constraints, we solve for x by computing the least upper bound (JLS 4.10.4) — the most precise type that is a supertype of both — which in this case is Collection<String>. So, the type of v is List<Collection<String>>.
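
As a quick follow-up sketch, every element of v can then be used as a Collection<String>, but nothing more specific:

Collection<String> first = v.get(0);   // each element is typed as Collection<String>
// List<String> l = v.get(0);          // error: the common supertype is Collection, not List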

Which constraints do we gather?

When designing a type inference algorithm, a key choice is how we gather constraints from the program. For some program constructs, such as assignment, the type on the right side must be compatible with the type on the left, so we would surely gather constraints from that. Similarly, for parameters of generic methods, we can gather constraints from their types. But there are other sources of information we might choose to ignore in certain circumstances.

At first, this sounds surprising; wouldn’t gathering more constraints be better, because it leads to a more precise answer? Again, precision is not always the most important goal; gathering more constraints may also increase the likelihood of an overconstrained solution (in which case inference fails, or picks a fallback answer like Object), as well as leading to greater instability in the program (small changes in the implementation can lead to surprising changes in typing or overload resolution elsewhere.) As with the scope over which we solve, we are trading off precision and predictive power against complexity and predictability — which is a subjective task.

As a concrete example of when it makes sense to ignore a possible source of constraints, consider a related example: method overload resolution when lambdas are passed as method parameters. We could use the exceptions thrown by the lambda bodies to narrow the set of applicable methods (greater precision), but this would also make it possible for small changes in the lambda body implementation to change the result of overload selection, which would be surprising (reduced predictability.) In this case, the increased precision did not pay for the reduced predictability, so this constraint was not considered when making overload resolution decisions.
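
As a rough sketch of that trade-off (the interfaces and methods below are hypothetical, not a real API), consider two overloads whose functional interfaces differ only in their throws clause. The exception thrown by the lambda body is not consulted, so the call is reported as ambiguous rather than resolved to the first overload:

interface MayThrow { void run() throws Exception; }
interface NoThrow  { void run(); }

void accept(MayThrow action) { /* ... */ }
void accept(NoThrow action)  { /* ... */ }

// accept(() -> { throw new Exception(); });
// The thrown exception does not narrow the choice: the call above is ambiguous,
// rather than being resolved to accept(MayThrow).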

The Fine Print

Now that we understand how type inference works in general, let’s dive into some details of how it applies to local variable declarations. For a local variable declared with var, we first compute the standalone type of the initializer. (The standalone type is the type we get by computing the type of an expression “bottom up,” ignoring the assignment target. Some expressions, such as lambdas and method references, do not have a standalone type, and hence cannot be the initializer for a local whose type is inferred.)

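For instance, a lambda or method reference has no standalone type, so it cannot initialize a var; once a target type is supplied, even via a cast, inference works again. A minimal sketch (using java.util.function.Function):

// var f = x -> x + 1;                              // error: a lambda needs an explicit target type
// var p = System.out::println;                     // error: a method reference needs a target type too
Function<Integer, Integer> f = x -> x + 1;          // fine: the manifest type drives inference
var g = (Function<Integer, Integer>) (x -> x + 1);  // fine: the cast supplies the target type
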
For most expressions, we just use the standalone type of the initializer as the type of the local. However, in several cases — specifically when the standalone type is non-denotable — we may refine or reject this type.

A non-denotable type is one that we cannot write down in the syntax of the language. Non-denotable types in Java include intersection types (Runnable & Serializable), capture types (those that derive from wildcard capture conversion), anonymous class types (the type of an anonymous class creation expression), and the Null type (the type of the null literal.) At first, we considered rejecting inference on all non-denotable types, under the theory that var should just be a shorthand for a manifest type. But, it turned out that non-denotable types were so pervasive in real programs that such a restriction would make the feature less useful and more frustrating. This means that programs using var are not necessarily merely a shorthand for a program that uses explicit types — there are some programs that are expressible with var that are not expressible directly.

As an example of such a program, consider the anonymous class declaration:

var v = new Runnable() {
    public void run() {}
    void runTwice() { run(); run(); }
};

v.runTwice();

Were we to provide a manifest type – the obvious choice being Runnable – the runTwice() method would not be accessible through the variable v because it is not a member of Runnable.  But with an inferred type, we are able to infer the sharper type of the anonymous class creation expression, and therefore are able to access the method.

Each category of non-denotable type is its own story. For the Null type (which is what we’d infer from var x = null), we simply reject the declaration. This is because the only value that inhabits the Null type is null — and it’s quite unlikely that what was intended was a variable that can only hold null. Because we don’t want to “guess” at the intent by inferring Object or some other type, we reject this case so the developer can provide the correct type.
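
A brief sketch of the rule, and the usual workaround of stating the intended type, either manifestly or via a cast:

// var x = null;          // error: cannot infer a type for x
String s = null;          // fine: the intended type is stated
var t = (String) null;    // also accepted: the cast gives the initializer the type String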

For anonymous class types and intersection types, we simply use the inferred type; these types are weird and novel but fundamentally harmless. This means we are now more widely exposed to some “weird” types that previously remained below the waterline. As an example, suppose we have:

var list = List.of(1, 3.14d);

This looks like the earlier example, so we know how this is going to play out — we’re going to take the least upper bound of Integer and Double. This turns out to be the somewhat ugly type Number & Comparable<? extends Number & Comparable<?>>. So, the type of list is List<Number & Comparable<? extends Number & Comparable<?>>>.

As you can see, even a simple example can give rise to some surprisingly complicated types — including some which we cannot explicitly write down.

The trickiest case is what we do with wildcard capture types. Capture types come from the dark corners of generics; they stem from the fact that each use of ? in a program corresponds to a different type. Consider this method declaration:

void m(List<?> a, List<?> b)

Even though the types of a and b are textually identical, they are not actually the same type — because we have no reason to believe that both lists are of the same kind of element. (If we wanted the two lists to be the same type, we’d make m() a generic method in T, and use List<T> for both.) So, the compiler invents a placeholder, called a “capture,” for each use of ? in the program, so we can keep distinct uses of wildcards separate. Until now, capture types stayed in the darkness where they belong, but if we allowed them to escape into the wild, they could spread confusion.
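
Here is a small sketch of why the two wildcards must stay distinct: inside m(), elements of one list cannot be assumed to fit in the other, whereas a shared type variable ties them together.

void m(List<?> a, List<?> b) {
    // a.add(b.get(0));   // error: the two ?s are captured as different, unrelated types
    a.add(null);          // fine: null is the only value that fits every capture
}

<T> void m2(List<T> a, List<T> b) {
    a.add(b.get(0));      // fine: a single type variable ties the two lists together
}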

For example, suppose we have this code in class MyClass:

var c = getClass();

We might expect that the type of c would be Class<?>, but the type of the expression on the right-hand side is actually Class<capture<?>>. Setting that type loose in our program would not help anyone.

Banning inference of capture types seemed attractive at first, but again, there were too many cases where these types popped up. So instead, we chose to sanitize them, using a transform known as upward projection (JLS 4.10.5), which takes a type that might include capture types, and produces a supertype of that type with no capture types. In the case of the above example, upward projection sanitizes the type of c to Class<?>, which is a more well-behaved type.
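
Continuing the MyClass example, a quick sketch of the effect: the projected type is denotable and well behaved, so the variable can be used without ever naming a capture type.

var c = getClass();          // the inferred type is the upward projection, free of capture types
Class<?> cls = c;            // assigns cleanly to the denotable supertype Class<?>
String name = c.getName();   // and behaves like any other Class-typed variable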

Sanitizing types is a pragmatic solution, but it is not without compromise. Because we infer a different type than the natural type of the expression, refactoring a complex expression f(e) into var x = e; f(x) using an “extract variable” refactoring could change downstream type inference or overload selection decisions. Most of the time this is not a problem, but it is a risk we take when we tinker with the “natural” type of an expression. In the case of capture types, the cure, even with its side effects, was better than the disease.

Divergent Opinions

Compared to something like lambdas, or generics, type inference for local variables is a pretty small feature (though, as you’ve seen, the details are more complicated than most people give them credit for.) But, the controversy around this feature was anything but small.

For several years, this was one of the most frequently requested features for Java; developers had gotten used to it from C#, Scala, or, later, Kotlin, and missed it terribly when coming back to Java — and they were quite vocal about it. We decided to move forward on the basis of its popularity, the fact that it had been proven to work well in other Java-like languages, and its relatively small scope for interaction with other language features.

Perhaps surprisingly, as soon as we announced we were moving forward on this, another vocal faction appeared — those who clearly thought this was the dumbest idea they’d ever seen. It was described as “giving in to fashion” or “encouraging laziness” (and worse), and dire predictions were made about a dystopian future of unreadable code. And both the proponents and the antagonists justified their position by appealing to the same core value: readability.

After delivering the feature, the reality was not remotely so dire; while there is an initial learning curve where developers have to find the right way to use the new feature (just as with every other feature), for the most part developers can easily internalize some reasonable guidelines about when the feature adds value, and when it does not, and use it accordingly.

Style Advice

Stuart Marks, of the Java Libraries team at Oracle, has compiled a useful style guide to help understand the trade-offs surrounding type inference for locals.

As with most sensible style guides, this one focuses on making clear the trade-offs involved. Explicitness is a trade-off; on the one hand, an explicit type provides an unambiguous and precise statement of a variable’s type, but on the other hand, sometimes the type is obvious or unimportant, and the explicit type may vie with more important information for the reader’s attention.

General principles outlined in the style guide include:

  • Choose good variable names. If we choose expressive names for local variables, it’s more likely the type declaration is unnecessary, or even in the way. On the other hand, if we choose variable names like x and a3, then taking away the type information may very well render the code harder to understand.
  • Minimize the scope of local variables. The greater the distance between a variable’s declaration and its use, the more likely we are to reason about it imprecisely. Using var to declare local variables whose scope spans many lines is more likely to result in oversights than those that have smaller scopes, or have explicit types.
  • Consider var when the initializer provides sufficient information to the reader. For many local variable declarations, the initializer expression makes it completely obvious what is going on (such as var names = new ArrayList<String>()), and hence the need for an explicit type is lessened.
  • Don’t worry too much about “programming to the interface.” A common worry among developers is that we’ve long been encouraged to use abstract types (like List) for variables, rather than more specific implementation types (like ArrayList), but if we allow the compiler to infer the type, it will infer the more specific type. But, we shouldn’t worry too much about this, because this advice is much more important for APIs (such as method return types) than for local variables in the implementation — especially if you’ve followed the previous advice about keeping scopes small.
  • Beware of interactions between var and diamond inference. Both var and “diamond” ask the compiler to infer types for us, and it’s perfectly OK to use them together — if there’s enough type information present to infer the desired type, such as in the types of the constructor arguments.
  • Watch out for combining var with numeric literals. Numeric literals are poly expressions, meaning their type can depend on the type to which they are being assigned. (For example, we can write short s = 0, and the literal 0 is compatible with int, long, short, and byte.) But, without a target type, the standalone type of a numeric literal is int, so changing short s = 0 to var s = 0 will result in the type of s changing.
  • Use var to break up chained or nested expressions. When declaring a fresh local for a subexpression is burdensome, it increases the temptation to create complex expressions with chaining and/or nesting, sometimes at the expense of readability. By lowering the cost of declaring a subexpression, local variable type inference makes it less tempting to do the wrong thing, thus increasing readability (see the sketches after this list).

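To illustrate the last three points, here are a few hedged sketches (the people collection and the Person class are hypothetical):

// var and diamond together: the constructor arguments must carry enough type information
var a = new ArrayList<>();                     // legal, but the element type is inferred as Object
var b = new ArrayList<>(List.of("x", "y"));    // the argument pins the element type to String

// numeric literals lose their target typing under var
short s = 0;                                   // the literal 0 adapts to short
var n = 0;                                     // inferred as int, not short

// cheap intermediate locals make it easier to break up a long chain
var adults = people.stream().filter(p -> p.getAge() >= 18).collect(Collectors.toList());
var names  = adults.stream().map(Person::getName).collect(Collectors.toList());
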
The last item illustrates an important point that is often missed in the debate over programming language features. When evaluating the consequences of a new feature, we often consider only the most superficial ways in which it might be used; in the case of local variable type inference, that would be replacing manifest types in existing programs with var. But, the programming style we adopt is influenced by many things, including the relative cost of various constructs. If we lower the cost of declaring local variables, it stands to reason that we might re-equilibrate in a place where we use more local variables, and this has the potential to make programs more readable. But, such second-order effects are rarely considered in the vocal debate over whether a feature will help or hurt.

Many of these guidelines are just good style advice anyway — picking good variable names is one of the most effective ways to make code more readable, with or without inference.

Wrap-up

As we’ve seen, type inference for local variables is not quite as simple a feature as its syntax suggests; while some might want it to let us ignore types, it actually requires us to have a better understanding of Java’s type system. But, if you understand how it works, and follow some reasonable style guidelines, it can help make your code both more concise and more readable.
