Clojure is one of the most powerful programming languages out there. Its unique approach to functional programming as a dynamically typed language is very powerful. It's a dialect of Lisp, itself one of the oldest and most powerful languages. Lisp was among the first to introduce concepts like first-class functions, anonymous functions, and garbage collection, all of which are still used to this day.

GraalVM is a Java virtual machine developed by the Oracle Corporation and first released in 2019. One of its very cool features is Ahead-of-Time (AOT) compilation, which is what we're going to discuss here. Ahead-of-time compilation allows for a much faster startup time and a much smaller memory footprint. Startup time is crucial in cases where you need your services to restart with little to no downtime, or as fast as possible, among many other cases.

Using GraalVM with Clojure is pretty easy and straightforward. Start by downloading the GraalVM binaries for your selected Java version from https://github.com/graalvm/graalvm-ce-builds/releases, unpack them, and make sure they're in your system's `PATH`.

Once the GraalVM command line tools are recognized by the system, you can install `native-image` using:

`gu install native-image`

After that, make sure that the `project.clj` in your Clojure project contains the `:main` namespace and that ahead-of-time (AOT) compilation is enabled:

```
(defproject graalvm_test "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "EPL-2.0 OR GPL-2.0-or-later WITH Classpath-exception-2.0"
            :url "https://www.eclipse.org/legal/epl-2.0/"}
  :dependencies [[org.clojure/clojure "1.10.1"]]
  :repl-options {:init-ns graalvm-test.core}
  ;; add the main namespace
  :main graalvm-test.core
  ;; add AOT to the :uberjar build profile
  :profiles {:uberjar {:aot :all}})
```

Finally, you'll need to add a `-main` function to your Clojure project in the main namespace you specified.

```
(ns graalvm-test.core
  ;; :gen-class is needed so AOT compilation emits a Java class with a main method
  (:gen-class))

(defn -main
  [& args]
  (println "Hello World"))
```

Now, in order to build a native image, we have to build an uberjar first:

`lein do clean, uberjar`

And now you can run the `native-image` command:

```
native-image --initialize-at-build-time \
  --no-server \
  -jar ./target/hello-world-0.1.0-SNAPSHOT-standalone.jar
```

You can now run your native image:

`./hello-world-0.1.0-SNAPSHOT-standalone`

And to put it in action, let’s compare the time difference between running the jar vs. the native image:

```
$ time java -jar target/hello-world-0.1.0-SNAPSHOT-standalone.jar
Hello World
java -jar target/hello-world-0.1.0-SNAPSHOT-standalone.jar 2.17s user 0.20s system 181% cpu 1.309 total
$ time ./hello-world-0.1.0-SNAPSHOT-standalone
Hello World
./hello-world-0.1.0-SNAPSHOT-standalone 0.00s user 0.01s system 29% cpu 0.034 total
```

From 2.17 seconds down to about 34 milliseconds of startup time, amazing!

Building a native image of your Clojure projects isn't always this straightforward, though. Some Clojure libraries use dynamic class loading for some of their components, and for those you'll need to supply a reflection configuration file to GraalVM so that it can load the classes at runtime.

Carmine is a very powerful Redis client for Clojure. It's also one of the Clojure libraries that use dynamic class loading for some of their components (e.g., a class named `org.apache.commons.pool2.impl.EvictionPolicy`). Let's see how to build a native image of a Clojure project that uses Carmine.

Let’s start by generating a new project:

`lein new carmine_graalvm`

We'll add the Carmine dependency to `project.clj`, along with what we did initially: adding the main namespace and enabling AOT compilation:

```
:dependencies [[org.clojure/clojure "1.10.1"]
               [com.taoensso/carmine "3.1.0"]]
:main carmine-graalvm.core
:profiles {:uberjar {:aot :all}}
```

Next, we'll need to add some code that communicates with the Redis server we want to test. Add this to your `core.clj`:

```
(ns carmine-graalvm.core
  (:require [taoensso.carmine :as car])
  (:gen-class))

(defmacro wcar* [& body] `(car/wcar {} ~@body))

(defn -main
  [& args]
  (println (wcar* (car/ping)))
  (println (wcar* (car/info "server"))))
```

This code sends the command `PING` to the Redis server, which replies with `PONG`, then asks Redis for the server info and prints it out.

Before building the native image, let's make sure our code runs with `lein run`. Make sure that a Redis server is running on your local machine.

```
$ lein run
PONG
# Server
redis_version:6.2.5
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:ee85148efbe62cad
redis_mode:standalone
os:Linux 5.11.0-34-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:9.3.0
process_id:5119
process_supervised:no
run_id:b99a6e6efa94367fd92d1ec52c89c0c48d215f02
tcp_port:6379
server_time_usec:1631835822597434
uptime_in_seconds:119
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:4445870
executable:/home/amarnah/workspace/carmine_graalvm/redis-server
config_file:
io_threads_active:0
```

Now we can just build the source by running:

`lein do clean, uberjar`

and then build the native image by:

```
native-image --initialize-at-build-time \
  --no-server \
  -jar ./target/carmine_graalvm-0.1.0-SNAPSHOT-standalone.jar
```

Now, running `./carmine_graalvm-0.1.0-SNAPSHOT-standalone` will throw an exception:

```
Exception in thread "main" java.lang.IllegalArgumentException: Unable to create org.apache.commons.pool2.impl.EvictionPolicy instance of type org.apache.commons.pool2.impl.DefaultEvictionPolicy
...
```

This is because, as we mentioned, Carmine uses dynamic class loading. To successfully build a native image of our project, we'll need to supply a reflection configuration file that lets GraalVM know that this class is loaded dynamically at runtime. Add this to a file named `reflect-config.json`:

```
[
  {
    "name": "org.apache.commons.pool2.impl.DefaultEvictionPolicy",
    "allPublicConstructors": true
  }
]
```

and when building the native image we’ll use:

`-H:ConfigurationFileDirectories=./path/to/config/dir`

and that’ll make our final build command:

```
native-image --initialize-at-build-time \
--no-server \
-H:ConfigurationFileDirectories=./path/to/config/dir \
-jar ./target/carmine_graalvm-0.1.0-SNAPSHOT-standalone.jar
```

Comparing the startup time between the two variants:

```
java -jar target/carmine_graalvm-0.1.0-SNAPSHOT-standalone.jar 5.81s user 0.34s system 237% cpu 2.591 total
./carmine_graalvm-0.1.0-SNAPSHOT-standalone 0.01s user 0.01s system 106% cpu 0.014 total
```

Amazing! From several seconds down to about 14 milliseconds! You can also check and compare the memory usage of the two processes.

But what exactly is lazy evaluation?

Lazy evaluation of programming expressions is a way to tell the computer not to evaluate an expression until its value is needed. It can also ensure that the expression is not evaluated twice for the same input (that is sometimes called `memoization`). There are a lot of terms for lazy evaluation: some call it `call by need`, because you only evaluate the function or expression when you actually need its value; others call it `delayed evaluation`, because the evaluation of an expression is delayed until you ask for it.
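Clojure makes both ideas concrete with `delay` and `force` (a small sketch of my own, not tied to any library): the wrapped expression only runs when it is forced, and the result is cached so it never runs twice.

```
;; `delay` wraps a computation without running it.
;; `force` runs it on first use and caches (memoizes) the result.
(def answer (delay (do (println "computing...") (* 6 7))))

(force answer) ; runs the body: prints "computing..." and returns 42
(force answer) ; cached: returns 42 without printing again
```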

Let’s see a couple of examples on how lazy evaluation in **Clojure** works. No Clojure background is required to understand this, but it assumes basic knowledge of programming and computational concepts.

Firstly, here’s a list that contains all the numbers from 0 to 9 in Clojure.

```
user=> (def array (range 10))
```

As you can see, we called the range function, `(range 10)`, which returns all the numbers from 0 to 9. But did it really run? The answer is **no**! This only declared that we want, sometime in the near future, the values returned by that function. Only when we actually use the defined `array` will the function run.

```
user=> array
(0 1 2 3 4 5 6 7 8 9)
```

The very magical thing appears here:

```
user=> (def infinite-list (range))
```

We, ladies and gentlemen, just defined an **infinite list**. How interesting is that? I know this seems surreal, but let's have a look at how practical it actually is.

This is called a `lazy sequence`. How did that happen? How did we define an *infinite list*? It's, as we said, because the function `(range)` didn't actually get executed; it is waiting for us to ask for values. **If you evaluate `infinite-list` itself at this point, printing it will loop forever.**

But you can do a lot of operations on lazy sequences. For example, here’s how we get the first 10 integer numbers of the infinite list we just defined:

```
user=> (def infinite-list (range))
user=> (take 10 infinite-list)
(0 1 2 3 4 5 6 7 8 9)
```

We can also define the infinite list of even integers:

```
user=> (def even-ints (filter even? (range)))
user=> (take 10 even-ints)
(0 2 4 6 8 10 12 14 16 18)
```

Here, we defined `even-ints` to be the infinite list of integers, `(range)`, filtered using Clojure's `filter` function, which takes as a parameter a predicate, `even?`, that checks whether a number is even. Then we evaluated **only** the first 10 terms of the list, which returned the first 10 even integers.

Let's take a deeper look at a slightly more complex example (one that is always brought up when talking about infinite sequences): how to define the infinite list of prime integers.

A prime is a number that is only divisible by itself and 1. In other words, a number `n` is prime iff the list of its factors is exactly `[1, n]`.

So let's define a function `factors` that takes a number `n` and returns a list of all its factors.

```
user=> (defn factors [n]
         (filter #(zero? (mod n %)) (range 1 (+ n 1))))
user=> (factors 15)
(1 3 5 15)
user=> (factors 7)
(1 7)
```

So this takes the list `(range 1 (+ n 1))`, which contains the integers from 1 to n inclusive, and keeps exactly those numbers that match the criterion `(n mod x == 0)`.

Now we can define a function `prime?` that determines whether a number `n` is prime, which holds only if the list of its factors is exactly `[1, n]`.

```
user=> (defn prime? [n]
         (= (factors n) [1 n]))
```

Now the only thing left to define the infinite list of prime integers is to filter the infinite list of integers using our `prime?` function.

```
user=> (def primes (filter prime? (range)))
user=> (take 10 primes)
(2 3 5 7 11 13 17 19 23 29)
```

And that’s it! We just defined the **infinite** list of all prime integers! Look at how short (and elegant) the code is when put together:

```
(defn factors [n]
  (filter #(zero? (mod n %)) (range 1 (inc n))))

(defn prime? [n]
  (= (factors n) [1 n]))

(def all-primes
  (filter prime? (range)))
```

One thing that remains, though, is how long does it take to evaluate the first n prime numbers? Let’s have a look:

```
user=> (time (take 100 all-primes))
"Elapsed time: 0.025333 msecs"
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541)
user=> (time (take 1000 all-primes))
"Elapsed time: 0.033803 msecs"
(2 3 5 7 .....)
```

0.03 msecs! That is actually really good! Note, though, that this is only the time taken to build a lazy sequence, since `take` itself returns a lazy sequence. If we want to measure the actual running time of computing all `n` primes, it looks like this:

```
user=> (time (dorun (take 1000 all-primes)))
"Elapsed time: 336.496157 msecs"
```

This is called `lazy sequence realization`, and it is done using `dorun` (you can also use `doall`).

So, 300 msecs. Not so bad, but we can actually make it much better.

A very long time ago, in the age of the Greeks, there lived a brilliant ancient mathematician, *Eratosthenes*, after whom a very efficient prime-finding algorithm, still used to this day, is named.

The algorithm basically goes like this:

- Define the list of all integers (up to n, deterministically; or infinitely, lazily).
- Mark the first value in the list (let's call it `x`) as a prime.
- Remove all multiples of `x` up to `n`, that is, the values `(2x, 3x, 4x, ...)`.
- Take the new list and send it back to step 2.

You can find a very funny (and beautiful) visual explanation of the algorithm here:

https://www.youtube.com/watch?v=V08g_lkKj6Q

And a more formal technical discussion, and its implementation in C++ here:

https://cp-algorithms.com/algebra/sieve-of-eratosthenes.html

Here’s the code for sieve in Clojure, defined lazily.

```
(defn sieve [inf-list]
  (cons
   (first inf-list)
   (lazy-seq
    (sieve (filter #(not (zero? (mod % (first inf-list)))) (rest inf-list))))))
```

This defines a function `sieve` that takes an infinite list (a `lazy sequence`) and returns a list consisting of the first element of the input, followed by the sieve of the rest of the list filtered down to what is not a multiple of that first element. Notice that before recursing, we tell Clojure that the returned list is a `lazy-seq`: don't evaluate everything, only what we ask for.

And now when we run it:

```
user=> (take 10 (sieve (drop 2 (range))))
(2 3 5 7 11 13 17 19 23 29)
user=> (time (dorun (take 1000 (sieve (drop 2 (range))))))
"Elapsed time: 152.715236 msecs"
```

And that’s almost half the runtime of the first approach!

I would like to note something that was mentioned before, and that is that Clojure is not entirely lazy. Only certain functions are lazy, such as `map`, `filter`, `repeatedly`, and a couple more. (Note that `reduce` is not among them; it eagerly consumes its whole input.) You can also always tell Clojure explicitly that a sequence is lazy by wrapping it in `lazy-seq`.
Other languages like Haskell, for example, are entirely lazy, so by nature any returned sequence is a lazy sequence that won't be evaluated until it is needed.
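As a quick sketch of that explicit route (the function name `nats` is my own, not from any library), here's a hand-rolled infinite sequence built with `lazy-seq`:

```
;; An infinite sequence of natural numbers, made lazy explicitly.
;; Each cons cell is only computed when something walks the sequence.
(defn nats
  ([] (nats 0))
  ([n] (lazy-seq (cons n (nats (inc n))))))

(take 5 (nats)) ; => (0 1 2 3 4)
```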

We just saw some of the profound power of a special kind of programming technique: lazy evaluation. The opposite is usually called eager evaluation, or strict evaluation.

What I find most interesting in the idea of lazy evaluation of expressions is that you can basically ask your program to save the **abstraction** of the list instead of its values. You save the "formula" of a function instead of evaluating it. You separate how to generate the value (the code that you type in to generate that value) from when, or whether, you run it, because you might not need to run that code at all. That, in my opinion, is higher-level functional programming at its finest.

Of course, you might think that an infinite list of primes is not that practical, and you might not need it in an actual production-level system, but you'd be surprised at how practical the concept of lazy evaluation is, and how much it can optimize your code's runtime and memory usage.

This blog is a way for me to strengthen my knowledge of topics I want to learn. Here are the resources this post was written from; they'll be greatly useful if you want to dive deeper into the subject and learn more.

But what do we know, as programmers, so that we forget? I'll assume that you're coming from an object-oriented background, and if so, we can summarize the basic components of almost any program, on a higher level, in the following chart:

So instead of forgetting everything we know, how about we rebuild our idea of what programming means and how it is constructed?

In 1966, a paper by Corrado Böhm and Giuseppe Jacopini, two great computer scientists, introduced the *Böhm–Jacopini theorem* [1], or the structured programming theorem. This theorem states that all programs can be constructed from just three structures: sequence, selection, and iteration.

So based on that, we definitely will have to have these three elementary structures in any functional program. Of course, we’ll also have data types; strings, integers, and booleans are all fundamental in the way we develop programs. Add to this the more advanced data structures like arrays, trees, hashes, and so on.

Turns out the only level of the programming pyramid that we need to rebuild is the set of ideas brought by the Object-Oriented Programming paradigm. We need to find a more "functional" approach to programming.

One of the best places to go when you want to look for a new programming paradigm is mathematics! Mathematics and Computer Science share the same characteristics in problem-solving techniques, formulating and modeling problems, and a lot more. But most importantly: **Abstractions**.

You might remember that moment when you were trying to build some complex system: you hold one abstraction of the system as a model, then another abstraction, and another, and so on, all combined together to make up the functional goal of the system. Turns out mathematicians share the same problem! They also keep one abstraction on top of another in their minds to derive the "functional" goal of a mathematical model. So, what did mathematicians use that we might find useful? **Functions**.

If you are a programmer you have probably heard of functions (maybe methods, routines, etc.), but mathematical functions are not the same. In mathematics, a function is a relation from a set of inputs to a set of possible outputs **where each input is related to exactly one output** [2].

Mathematical functions are different from programming methods/routines because of a very important thing: the mechanism of how they work. A simple Ruby method can do a lot of things, including updating the database, creating new files, and sending emails. A mathematical function, on the other hand, does one thing and one thing only: every input is mapped to the exact same output each time the function is called. So it's not like f(x) is going to be 24 today, and then 3 days later it's 77.
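To make that concrete (a small Clojure sketch of mine, mirroring the idea rather than any particular library): a pure function's output depends only on its input, while an impure one can answer differently on every call.

```
;; Pure: the same input always maps to the same output.
(defn f [x] (* x x))
(f 5) ; => 25, today and three days from now

;; Impure: the result depends on something besides the input.
(defn g [x] (+ x (rand-int 100)))
;; (g 5) can return a different value on every call.
```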

So what if we borrow these ideas from mathematics and apply them to programming, in the hope that our functions (programs) become easier to read and write? There are a couple of rules and conditions that we have to follow in order for our functions to be "pure".

A fundamental rule is that those mathematical functions are considered to be "values". This is what we call a **first-class function** [4], and they actually exist in many programming languages (even ones that are not purely functional). Look, for example, at this Ruby code:

```
fn = lambda { |x| x + 2 }
puts fn.(2) # = 4
```

As you can see, we defined a variable `fn` that holds a `function` that takes an input `x` and returns `x + 2`. Nowadays, if you look through the documentation of almost any programming language, you'll find the idea of first-class functions developed and utilized: a function is a value that can be defined, passed as an argument, and manipulated just like other variables.

```
# A function `add` that takes a function `fn`
# and a number `a` as parameters, applies
# `fn` to `a` and adds the result of
# `fn.(a)` to `a` again.
def add(fn, a)
  fn.(a) + a
end

fn = lambda { |x| x + 2 }
result = add(fn, 2)
puts result # = (2 + 2) + 2 = 6
```

The second most important rule for our functions to behave purely is to behave exactly like mathematical functions do: Take an input, produce an output. That is, without any side effects.

But, what are `side effects`?

```
# A side effect
File.open("log.txt", "w") { |f| f.write "#{Time.now} - User logged in\n" }
```

Side effects are basically changes to the "outer" state of the function: changes outside the scope of the function's inputs and outputs. Anything that deals with IO or any external resource is considered a side effect. But what are our programs other than side effects?

So we need a bridge that connects our functions with the actual events and side effects that we want to perform. The solution to this problem has been implemented differently in different programming languages; here we'll look at Clojure's take on it.

There’s also a huge discussion nowadays on reactive functional programming (bonus: Elm [5] is a compellingly beautiful example of that), where they deal with a lot of state change and UI elements in a functional way.

An example of such a bridge in the Clojure programming language is what they call an `Atom`. An atom is the closest thing you can find to mutable state. It is Clojure's way of bridging the clean, organized functional approach with the messy world of side effects.

Here’s how we define an atom in Clojure:

```
(def x (atom 0))
```

And then to change the value of this atom, you pass it to `swap!` along with a function; `swap!` applies the function to the atom's current value and stores the result as the atom's new value. For example, here's how you can increment an atom:

```
(swap! x inc)
```

What this does is take the current value of `x`, apply `inc` to it, and swap in the returned value as the new value of `x`.

Now the beautiful thing happens when two threads try to update `x` at the same time.

Say `inc` and `dec` both arrive at the same time, each having read the current value of the atom `x`. The atom applies one function, and when applying the other it **notices** that the value of `x` has changed in the meantime. What happens then? It runs the function that came last **again**, against the new value.
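Here's a tiny stress test of that retry behavior (the thread and iteration counts are my own illustration): ten futures hammer one atom concurrently, and because `swap!` retries on conflict, not a single increment is lost.

```
;; Ten threads, each incrementing the same atom 100 times.
;; swap! retries whenever another thread wins the race,
;; so the final value is always exactly 1000.
(def counter (atom 0))

(let [updates (doall (repeatedly 10 #(future (dotimes [_ 100] (swap! counter inc)))))]
  (run! deref updates)) ; wait for all futures to finish

@counter ; => 1000
```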

Okay, what if I want to update a file or a database or something that is completely outside the scope of the program I'm dealing with? Here come Clojure's agents.

With agents, you basically send these side effects to a queue dedicated to such tasks, and the queued actions run asynchronously, one at a time. The topic of agents is huge and shall be discussed in a separate article.
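A minimal sketch of that queueing model (the logging agent is my own example): `send-off` queues an action on an agent, and actions for the same agent run one at a time, in order, off the caller's thread.

```
;; An agent holding a vector of log lines. Each send-off queues an
;; action; actions on one agent run sequentially, in submission order.
(def log-lines (agent []))

(send-off log-lines conj "user logged in")
(send-off log-lines conj "user logged out")

(await log-lines) ; block until the queued actions have run
@log-lines        ; => ["user logged in" "user logged out"]
```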

Based on the previously mentioned rule #2, our functions should behave purely, with no side effects. But what if our variables are mutable, that is, if their state can change? You find yourself back at the problem of side effects! So to ensure that side effects are completely eliminated, we need one more important rule: immutability.

```
x = [1, 2, 3]
y = f(x)
# what is the value of x at this point? Still [1, 2, 3]!
```

Notice that even if you compute `y = f(g(h(...(x))))`, the value of `x` is still going to be the same. Nothing changes! `f(x)` takes a copy of `x` and returns a result based on that copy. You never have to figure out why `x`'s value is acting weird when its definition is clearly in front of your eyes.
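Clojure's data structures give you this guarantee out of the box; a quick sketch:

```
;; Persistent (immutable) vectors: an "update" returns a new value
;; and leaves the original untouched.
(def v [1 2 3])
(def w (conj v 4))

v ; => [1 2 3]  (unchanged)
w ; => [1 2 3 4]
```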

But a problem arises here. If I define `y = f(x) + g(x) - h(x) / ... + f2(x)`, the result of this huge chain of functions over functions over functions, does the runtime make a copy over a copy over a copy? What if `x` were a 1-million-element list instead of 3 elements, and every function made a new copy? That doesn't sound very memory friendly.

Turns out functional languages deal with this problem in a very beautiful way: **lazy data structures**.

Let’s have a look at this Clojure code:

```
;; (0, 1, 2, 3, 4, ...)
(def integers (range))
```

This defines an **infinite** list of all the integer numbers, called `integers`. Of course, the infinite list of integers won't all be stored in your memory; instead, only what you ask to evaluate gets evaluated.

```
;; gets first 10 integers and maps an
;; increment function on all of them.
(def first-ten-integers (take 10 integers))
(def incremented (map inc first-ten-integers))
```

The way lazy (or persistent) data structures are defined in memory is a very complex and interesting topic that will be discussed in the next article.

So taking the ideas of mathematicians into programming wasn’t very bad after all. We’ll still use the same fundamental concepts of programming, but we’ll only change the higher level way of modelling our programs. We added some rules in hope that our programs are cleaner, and more concise. But functional programming isn’t this magical box that will make all your dreams come true. You’ll still find yourself making errors, you’ll spend a couple of minutes looking for a missing bracket or to fix a simple bug. It is just a cleaner, more organized way of thinking about computer programs.

[1] - Flow diagrams, turing machines and languages with only two formation rules: https://dl.acm.org/doi/10.1145/355592.365646#CIT

[2] - Function: https://mathinsight.org/definition/function

[3] - Function Machine: https://mathinsight.org/function_machine

[4] - What are first-class functions? https://lispcast.com/what-are-first-class-functions/

[5] - Elm Programming Language: https://elm-lang.org/

This article was heavily based on the work of Bob Martin and his book Clean Architecture: https://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html and Russ Olsen’s talk Functional Programming in 40 minutes https://www.youtube.com/watch?v=0if71HOyVjY.

**Write a Neural Network that predicts f(x)=x².**

For any of you who have taken any AI or Machine Learning course, or even wrote any ANN code before, you’ll know that predicting the square function is easy. And it is. In fact ANNs can predict any polynomial function and of any degree. **But, only under one condition: A finite range of values on the testing set.**

What does that mean? If you trained the network on the range [-100, +100] and then tested it on [-50, +50], you’ll get pretty good results. However, if you tested it on a range of [-500, +500] the network will fail miserably.

Okay, before we dig into details, let us get on the same page. In 2014, four research scientists published a paper, citing a 1993 paper by a scientist named Barron that proved similar but less general results. They proved that "a two-layer neural network can represent any bounded degree polynomial, under certain (seemingly non-restrictive) conditions."

Here’s a preview from the abstract of their paper:

“First we show that for a randomly initialized neural network with sufficiently many hidden units, the generic gradient descent algorithm learns any low degree polynomial, assuming we initialize the weights randomly. Secondly, we show that if we use complex-valued weights (the target function can still be real), then under suitable conditions, there are no “robust local minima”: the neural network can always escape a local minimum by performing a random perturbation. This property does not hold for real-valued weights. Thirdly, we discuss whether sparse polynomials can be learned with small neural networks, with the size dependent on the sparsity of the target function.”

- Learning Polynomials with neural networks: http://proceedings.mlr.press/v32/andoni14.pdf

So given all of the above, all the proofs and scientists talking, why does it still fail sometimes?
It's because neural networks can approximate any continuous function only within a **compact set**.
So given a **continuous function f(x)** and a **finite range of values [a, b]**, there surely exists a neural network that can approximate the function to within **any error ε > 0. For example, if we wanted to approximate f(x)=x³+3 on the range [-50, +50], then there exists a neural network that can do that pretty easily.** But ask the network to predict the value of the function at a point outside that range, and you'll find that the accuracy of your prediction decreases as you go farther outside the range.

Here is what that means in images. Below you'll see a network trained on the range [-7, +7] to predict the square function f(x)=x².

As you see, the network was tested on the interval [-30, +30], and it produced decent results. **Now what happens if we increase the testing interval?** Here you go:

You can see now that the error increased drastically when we increased the testing interval. So why is that? Why is it that ANNs can do wonders but fail to find the pattern of a basic square function? It's because of extrapolation.

**Interpolation vs. Extrapolation**

Interpolation is an estimation of a value within two known values in a sequence of values (i.e., [a, b] where a, b are finite numbers). Polynomial interpolation is a method of estimating values between known data points. When graphical data contains a gap, but data is available on either side of the gap or at a few specific points within it, interpolation allows us to estimate the values within the gap.

Extrapolation is an estimation of a value based on extending a known sequence of values or facts beyond the area that is certainly known. In a general sense, to extrapolate is to infer something that is not explicitly stated from existing information.

Approximating a polynomial function for an infinite range of values falls under Extrapolation, however, ANNs are suited for Interpolation. But why?

Let’s assume that we have a training set y with y∈ R. Can a neural network regression model extrapolate and return y_pred values outside the y range in a training set? Does it depend on the activation function or not?

The output neuron of the model is just ∑Θ[i]a[i], where Θ[i] is the weight of the i-th neuron on the previous hidden layer, and a[i] is the value of the activation function of that neuron. If we use a bounded activation such as tanh, then a ∈ [-1, +1]. Thus the maximum possible y_pred = ∑Θ[i], assuming all the a[i] reach their maximum value of around 1. But if we use the linear activation function, which has no restrictions on its output values (a ∈ R), the model will return y_pred ∈ R, which can be outside the range of the training set. Does that mean the output is correct? No. Predictions will be worse in regions of the input space that were not present in the training data. That's why extrapolation is hard for neural networks.
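A quick numeric sketch of that bound in Clojure (the weights here are made up for illustration): with a tanh-style activation every a[i] lies in (-1, +1), so the output can never exceed the sum of the absolute weights, while a linear activation has no such cap.

```
;; Output neuron: sum of weight * activation.
(defn neuron-output [thetas activations]
  (reduce + (map * thetas activations)))

(def thetas [0.5 -1.2 2.0])

;; With bounded activations, |output| can never exceed
;; 0.5 + 1.2 + 2.0 = 3.7, whatever input the network sees.
(def output-bound (reduce + (map #(Math/abs %) thetas)))
```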

So what now? Is that it? No, there are solutions to this problem. They won't give you the accuracy of an interpolating neural network, but they'll still give you good results. One of them is using a rectifier unit, g(x)=max(0,x), as the activation function. It's an unbounded but non-linear activation function, which keeps the nice "universal approximation" property of the network while allowing unbounded outputs.
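For completeness, the rectifier mentioned above (now usually called ReLU) is a one-liner in Clojure:

```
;; ReLU: unbounded above, but still non-linear.
(defn relu [x] (max 0 x))

(relu -3) ; => 0
(relu 5)  ; => 5
```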

That was my first blog post. I really enjoyed writing it and learned so much from it, and I really hope you liked it too. :)
