Prolog totally missed the AI Boom

I’ve been thinking about the ongoing conversation, and I wanted to clarify something. Are you, Jan, suggesting that because LLMs have turned out to be such a good idea, maybe autoencoders (or a similar approach) could also be surprisingly effective in ways we haven’t fully explored yet?

From my perspective, LLMs work well because they index statistical relationships in massive token datasets, allowing for efficient next-token prediction. That doesn’t necessarily mean that autoencoders—or other compression-based methods—would benefit from the same kind of scaling and brute-force approach.

Are you thinking that autoencoders could play a bigger role in tasks like language modeling, retrieval, or structured representation learning? Or are you just drawing a general parallel between their unexpected effectiveness?

Would love to hear more about what you’re thinking!

Best,
Douglas

The problem I am trying to address was already addressed here:

ILP and Reasoning by Analogy
Intuitively, the idea is to use what is already known to explain
new observations that appear similar to old knowledge. In a sense,
it is opposite of induction, where to explain the observations one
comes up with new hypotheses/theories.
Vesna Poprcova et al. - 2010
https://www.researchgate.net/publication/220141214

The problem is that ILP doesn’t try to learn and apply
analogies, whereas autoencoders and transformers typically try
to “grok” analogies, so that they can perform well in certain
domains with less training data. They will do some inferencing
on the encoder side also for unseen input data, and they will do
some generation on the decoder side also for unseen latent space
configurations coming from unseen input data. By unseen data I
mean data not in the training set. The full context window may
tune the inferencing and generation, which appeals to:

Analogy as a Search Procedure
Rumelhart and Abrahamson showed that when presented with
analogy problems like monkey:pig::gorilla:X, with rabbit, tiger, cow,
and elephant as alternatives for X, subjects rank the four options
following the parallelogram rule.
Matías Osta-Vélez - 2022
https://www.researchgate.net/publication/363700634
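
To make the parallelogram rule concrete, here is a minimal sketch in
plain Prolog, assuming the concepts are already given as numeric
vectors (embeddings); the predicate names and the Name-Vector format
for the candidates are only illustrative, not from the paper:

% vec_add(+Xs, +Ys, -Zs): element-wise sum of two vectors
vec_add([], [], []).
vec_add([X|Xs], [Y|Ys], [Z|Zs]) :- Z is X+Y, vec_add(Xs, Ys, Zs).

% vec_sub(+Xs, +Ys, -Zs): element-wise difference of two vectors
vec_sub([], [], []).
vec_sub([X|Xs], [Y|Ys], [Z|Zs]) :- Z is X-Y, vec_sub(Xs, Ys, Zs).

% sq_dist(+Xs, +Ys, -D): squared Euclidean distance
sq_dist([], [], 0).
sq_dist([X|Xs], [Y|Ys], D) :- sq_dist(Xs, Ys, D0), D is D0+(X-Y)*(X-Y).

% analogy(+A, +B, +C, +Candidates, -Best): for A:B::C:X the ideal
% point is C + (B - A); pick the candidate Name-Vector closest to it.
analogy(A, B, C, Candidates, Best) :-
   vec_sub(B, A, Delta),
   vec_add(C, Delta, Ideal),
   findall(D-Name, (member(Name-V, Candidates), sq_dist(Ideal, V, D)), Scored),
   msort(Scored, [_-Best|_]).

Called with the four alternatives as Name-Vector pairs, analogy/5
simply returns the candidate closest to the parallelogram point.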

There are learning methods that work similarly to ILP, in
that they are based on positive and negative samples. And the
statistics can involve bilinear forms.
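
As an illustration of what such a bilinear statistic could look like,
here is a small sketch in plain Prolog; the representation of the
matrix W as a list of rows and the predicate names are my own
assumptions:

% bilinear(+W, +Xs, +Ys, -Score): Score = Xs^T * W * Ys, with the
% matrix W given as a list of rows and Xs, Ys as numeric vectors.
bilinear(W, Xs, Ys, Score) :-
   maplist(dot(Ys), W, Wy),
   dot(Xs, Wy, Score).

% dot(+Xs, +Ys, -D): dot product of two vectors
dot([], [], 0).
dot([X|Xs], [Y|Ys], D) :- dot(Xs, Ys, D0), D is D0 + X*Y.

A learner would then adjust W so that positive pairs score higher
than negative pairs.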

I just wanted to clarify, @j4n_bur53 (in the context of a message that was deleted), that I’m not angry at you, or at anyone in this conversation. I like to disagree robustly. I hope we can do this without having a fight.

Or I guess it’s the fault of the Mediterranean temperament. I also tend to wave my hands wildly and jump up and down when I really get into my stride. I kind of understand why this comes across as if I’m having some sort of crisis, but if you’re worried just watch the whites of my eyes: if they start rolling into my head, that’s when it starts to get dangerous :stuck_out_tongue:

Don’t worry, I am more robust than that. Dealing
with something new or a paradigm change often
starts with landing in quadrant 1:

https://people.well.com/user/bbear/quadrant.html

The problem with connectionism is that it’s not
really new, and that for those who have already used it
for decades it’s also not a paradigm change.

I do not count myself as somebody well versed in
connectionism. I have only a basic artificial intelligence
literacy, which includes taking connectionism as a paradigm
inside artificial intelligence.

That literacy of mine has become extremely rusty and dusty:
40 years ago it was just part of the usual curriculum for every
computer science student, everybody got an introduction to
connectionism back then, and there were for example financial
institutions that used neural networks for trading. So it’s not
astonishing that part of modern neural networks was invented in
Lugano, Switzerland, which has a more Mediterranean climate than,
for example, Zurich.

But the application was rather speech recognition:

Deep Learning in Neural Networks: An Overview
Jürgen Schmidhuber - 2014 - The Swiss AI Lab IDSIA
The present survey, however, will focus on the narrower, but
now commercially important, subfield of Deep Learning (DL)
in Artificial Neural Networks (NNs)
https://arxiv.org/abs/1404.7828

The above stops at 2013, but it already covers LSTMs.


That’s a good paper. I always reach for that when I need a good review of neural nets.

It doesn’t help me to implement something. Try this one, it has a
little Java code. But it’s a little ancient technology, using the sigmoid
activation function. And it seems to me it uses some graph data structure:

Neural Networks
Rolf Pfeifer et al. - 2012
https://www.ifi.uzh.ch/dam/jcr:00000000-7f84-9c3b-ffff-fffffb34b58a/NN20120315.pdf

I guess it corresponds to this here, which is a SWI-Prolog and C
hybrid, when using FANN_SIGMOID:

FANN - Fast Artificial Neural Network
Package for SWI-Prolog - 2018
https://www.swi-prolog.org/pack/list?p=plfann

Translating the Java code from the Pfeifer paper into linear algebra
with vectors and matrices, I now have a little piece of pure
Prolog code, which also runs in the browser, that can already learn an
AND. It uses the ReLU activation function, i.e. not the FANN_SIGMOID
activation function anymore. I simulated the bias by an extra input neuron
which is always 1, because I was too lazy to have a bias in the model.
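
Roughly, the representation can be pictured like this (a simplified
sketch with illustrative predicate names, not the exact code):

% forward(+Input, +Matrices, -Net): evaluates the layers in Matrices on
% the input vector and returns the forward evaluated network as a list
% alternating activation vectors and weight matrices, e.g.
% [V0, M1, V1, M2, V2], the shape that update/3 further below expects.
forward(V, [], [V]).
forward(V, [M|Ms], [V,M|Rest]) :-
   maplist(neuron(V), M, V1),
   forward(V1, Ms, Rest).

% one neuron: weighted sum of the incoming activations, then ReLU
neuron(V, Row, Out) :- dot(Row, V, S), Out is max(0, S).

% dot(+Xs, +Ys, -D): dot product of two vectors
dot([], [], 0).
dot([X|Xs], [Y|Ys], D) :- dot(Xs, Ys, D0), D is D0 + X*Y.

% the bias is the constant 1 in front of the input, e.g. a single
% layer computing AND of X1, X2 with illustrative weights:
% ?- forward([1,0,1], [[[-1,1,1]]], Net).
% Net = [[1,0,1], [[-1,1,1]], [0]].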

It can also learn an XOR.

I only use a simple update of the network via μ·Δwij, and the above is
from only 1000 iterations. So no momentum-based method or otherwise
advanced gradient search is implemented yet:

% update(+Network, +Network, -Network)
update([V], _, [V])  :- !.
update([V,M|L], [_,M3|R], [V,M4|S]) :-
   maplist(maplist(muladd(0.1)), M3, M, M4),
   update(L, R, S).
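
The muladd/4 helper is not shown above; with 0.1 as the learning rate μ
and the second network holding the backpropagated Δwij, a plausible
definition would be (the sign depends on how the deltas are stored):

% muladd(+Mu, +Delta, +Weight, -NewWeight): w' = w + μ·Δw
% (assumed definition; flip the sign if the deltas are raw gradients)
muladd(Mu, D, W, V) :- V is W + Mu*D.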

The first network parameter is the forward evaluated network, and the
second network parameter is the error backpropagated network.
Libraries such as PyTorch cooperate with optimizer libraries that
provide a variety of gradient search methods. One needs to study how
these libraries are architected so that they provide plug and play.
Maybe one can bring the same architecture to Prolog (a sketch follows
after the quote below):

A Gentle Introduction to torch.autograd

Next, we load an optimizer, in this case SGD with a
learning rate of 0.01 and momentum of 0.9. We register all
the parameters of the model in the optimizer.

optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
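
A rough sketch of how such plug and play could look in Prolog: pass
the optimizer as a closure into the update step, so that plain SGD
and other update rules become interchangeable (predicate names are
only illustrative):

% update(+Optimizer, +Forward, +Backward, -NewNetwork): like update/3
% above, but the per-weight update rule is passed in as a closure.
update(_, [V], _, [V]) :- !.
update(Opt, [V,M|L], [_,M3|R], [V,M4|S]) :-
   maplist(maplist(Opt), M3, M, M4),
   update(Opt, L, R, S).

% plain SGD with learning rate Mu: w' = w + Mu*Δw
sgd(Mu, D, W, V) :- V is W + Mu*D.

% usage: update(sgd(0.1), Forward, Backward, New).

A momentum method would in addition have to thread a velocity term per
weight through the iterations, which is exactly the state that the
PyTorch optimizer objects keep for themselves.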