LLMs and SWI-Prolog - a few shared thoughts

Have to agree with that.

While I have not tried GPT-4 (using the API costs real money to submit prompts), I hear it is much better at code generation and that its training data was captured about a year ago.


EDIT

Tried several more ChatGPT prompts with the instruction to use module http/http_server and none were correct. Even when told that it did not use the correct module, ChatGPT would reply with “The http/http_server module is not a standard module in SWI-Prolog.” and then generate code in the older style.
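For context, the “older style” it keeps falling back to is presumably the long-standing library(http/thread_httpd) plus library(http/http_dispatch) pattern, roughly along these lines (my reconstruction, not ChatGPT output):

:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/html_write)).

% Register a handler for the root path.
:- http_handler(root(.), say_hi, []).

% Start the server on the given port, dispatching via http_dispatch/1.
server(Port) :-
    http_server(http_dispatch, [port(Port)]).

say_hi(_Request) :-
    reply_html_page(title('Hello'), [h1('Hello World')]).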

Microsoft Bing Chat (which requires using Microsoft Edge) reacted the same as ChatGPT, which was expected.

However, Google Bard seems to know about module http/http_server without being told.

Google Bard example

Prompt
Google Bard (06/02/2023)

Create SWI-Prolog code for an HTTP server to display a single page containing

“Hello World”
A table with three columns and four rows
A jpeg image from the icons subdirectory
Do not use format/1.

Completion

Sure, here is the SWI-Prolog code for an HTTP server to display a single page containing “Hello World”, a table with three columns and four rows, and a JPEG image from the icons subdirectory:

:- use_module(library(http/http_server)).

hello_world(Page) :-
    http_reply_html(Page, <<
        <html>
        <head>
            <title>Hello World</title>
        </head>
        <body>
            <h1>Hello World</h1>
            <table>
                <tr>
                    <th>Column 1</th>
                    <th>Column 2</th>
                    <th>Column 3</th>
                </tr>
                <tr>
                    <td>Row 1</td>
                    <td>Row 2</td>
                    <td>Row 3</td>
                </tr>
                <tr>
                    <td>Row 4</td>
                    <td>Row 5</td>
                    <td>Row 6</td>
                </tr>
            </table>
            <img src="/icons/image.jpg" />
        </body>
        </html>>).

main :-
    http_server(hello_world).

This code will create a simple HTTP server that will listen on port 8080. When a client connects to the server, it will send the HTML page containing “Hello World”, a table with three columns and four rows, and a JPEG image from the icons subdirectory.

Please note that this code does not use the format/1 predicate. This predicate is often used to generate HTML output in SWI-Prolog, but it is not necessary in this case. The HTML page can be generated using the http_reply_html/2 predicate, which provides a more concise and efficient way to generate HTML output.


Note: With some prompt engineering, more idiomatic SWI-Prolog code could most likely be generated, but this was done to see if Google Bard was aware of module http/http_server. A sketch of such code follows.
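For reference, a minimal sketch of what more idiomatic code using library(http/http_server) might look like for this prompt. The port, the cell texts, serving the icons directory relative to the working directory, and reading “four rows” as a header row plus three data rows are my assumptions:

:- use_module(library(http/http_server)).
:- use_module(library(http/http_files)).   % http_reply_from_files/3 for the jpeg

% Handlers for the home page and the static icons directory.
:- http_handler(root(.), home_page, []).
:- http_handler(root(icons), http_reply_from_files(icons, []), [prefix]).

server(Port) :-
    http_server([port(Port)]).

home_page(_Request) :-
    reply_html_page(
        title('Hello World'),
        [ h1('Hello World'),
          table([ tr([th('Column 1'), th('Column 2'), th('Column 3')]),
                  tr([td('R1C1'), td('R1C2'), td('R1C3')]),
                  tr([td('R2C1'), td('R2C2'), td('R2C3')]),
                  tr([td('R3C1'), td('R3C2'), td('R3C3')])
                ]),
          img([src('/icons/image.jpg'), alt('Example image')])
        ]).

Load the file, run ?- server(8080). and the page is served from http://localhost:8080/.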

llama-13 via https://chat.lmsys.org/ was generating code, but nothing close to what I would attempt to use. Note that https://chat.lmsys.org/ currently has 11 chat models to choose from, but I only tried one. Feel free to try the others.

And that is about all that it gets correct :frowning:


The upside is that, unlike people in other laptop jobs, Prolog programmers are pretty safe from being replaced by LLMs.


I’m afraid I see that more as a problem. I’m quite happy that ChatGPT does such a good job (at least sometimes). IMO one of the disadvantages of Prolog is that there are far fewer resources on the web than for the popular languages and therefore it is far less likely that a simple web search gives you a good starting point for solving an arbitrary “how do I do this in Prolog” question. Finding an answer to many of the problems handed to students in courses is typically not too hard, but finding an answer to problems professional users need to answer is a lot harder. If (LLM based) bots could help here, I’m all in favor.

Note that it is not only harder to find a closely related fragment of Prolog. If you find some answer in another language it is hard to translate it to Prolog. While translating from one imperative language to another is mostly a syntactic transformation (possibly adding/removing object abstraction), one often needs something quite different to arrive at a neat Prolog program. Pure functional programming also suffers from this, but still a bit less than Prolog in my experience.
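A small illustration of that point (my own example, not from the post): imperative code that builds a result by looping and appending has no line-by-line Prolog counterpart; an idiomatic version restates the relation between the whole input and the whole output instead.

% Imperative pseudocode:
%
%     result = []
%     for x in xs:
%         if x > 0:
%             result.append(x * x)
%
% One idiomatic Prolog counterpart:
positive_squares(Xs, Squares) :-
    findall(S,
            ( member(X, Xs),
              X > 0,
              S is X * X
            ),
            Squares).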

Agree.

Harder, but not impossible. I agree that knowing how to use neural-network-based AI is nice to have in one's toolbox. It doesn't solve all of the problems and, as we know, if one does not have the expertise to identify the hallucinations and the not-so-good parts, it is not really helping but hindering.

Agree.

I can’t fully agree with that. I find that if you can find the solution in a programming language in the training set that gives quality results, translate small portions at a time into Prolog, and know when the LLM is giving bad code, then the task is not so bad. I have been doing that often over the last week while creating ChatGPT Plugins, e.g. starting with Python and Julia examples and translating them into SWI-Prolog. So far it is going faster with the help of ChatGPT to do the rough translations than me doing so without the help of ChatGPT. I have even been starting with the OAS (OpenAPI Specification) to generate either Python or Prolog and getting reasonable code, but I am limiting it to about 100 lines of code in any programming language.

The way I will take that is that some entity is needed to transform generated mediocre Prolog into production Prolog. I have to agree that if the entity is some app, then I have not seen such, including LLMs such as ChatGPT. However, if one is skilled at Prolog then it is not that hard, but as I have noted before, when one is very skilled at Prolog it is often not worth the time and effort to work with generated code. I don’t know if I will ever reach your level of skill, but I do work on getting better. Where ChatGPT does help is in seeing multiple variations of generated code and at least having an idea of which is better, then compiling and running to see if it gets correct results. After that it is back to the old ways to elevate the quality of the code.

My take is that the quantity and quality of the code in a training set matters more than the paradigm of the code. As you know, I do keep up on other programming languages, Lean being one of them. I have even seen Lean with more data in a training set than Prolog. From the StarCoder paper, Table 1:

Language    After filters and decont.
Lean        0.09 GB
Prolog      0.01 GB


One thing that I am keeping an eye on is OpenAI evals.

I have asked OpenAI whether submitted evals for Prolog would eventually be used to enhance ChatGPT or future models, but have not heard back. Also, since Prolog uses capitalization to distinguish values (atoms) from variables, it seems it might be better if models were trained on the listing version of the code, where the variables are just capital letters, because of the way tokenization works.

Using the OpenAI Tokenizer with the input

A My_variable value my_value

the tokenizer output shows that A and value are single tokens, while My_variable and my_value are each split into 3 tokens. So A as a single token IMHO is better for training a transformer than the 3 tokens for My_variable. Likewise, one token for value is better than 3 tokens for my_value.
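For example (my illustration), portray_clause/1, which produces the same output style as listing/1, already renames source variables to single capital letters:

?- portray_clause((parent_of(My_parent, My_child) :-
                       father_of(My_parent, My_child))).
parent_of(A, B) :-
    father_of(A, B).
true.

so a listing version of a code base for training could be produced mechanically.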

One thing on my shorter list is to create a ChatGPT plugin that will compile generated code using SWI-Prolog and feed back the errors until valid code can be created. Capturing the good and bad results and using that to fine-tune an LLM might be of value.
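Not the plugin itself, but a minimal sketch of the compile-and-report step, assuming the generated code arrives as a string (the predicate and module names are made up, and this only catches load-time errors, not warnings such as singleton variables):

% Load LLM-generated code from a string into a scratch module and
% return ok or error(Message) describing the first load-time error.
check_generated_code(Code, Result) :-
    setup_call_cleanup(
        open_string(Code, In),
        catch(( load_files(llm_generated,
                           [ stream(In),
                             module(llm_generated)
                           ]),
                Result = ok
              ),
              Error,
              ( message_to_string(Error, Message),
                Result = error(Message)
              )),
        close(In)).

The error(Message) result is what would be fed back to the model for the next attempt.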


Slowly making progress.