I think it would be a good idea to integrate SWI-Prolog with large language models (LLMs) running locally on your own machine, without requiring access to a cloud service. There are several reasons for joining Prolog and an LLM together:
- LLMs are not good at hierarchical planning of tasks; Prolog is excellent at this.
- LLMs have a limited context for input data (a.k.a. the prompt), so they often need an external "memory" used to feed data into the prompt. Prolog is a great match for this, given its database of rules and facts: with Prolog the LLM can easily have a "smart" memory, as opposed to passive data.
- Even if a model supports a very large context size, inference gets slower as the context grows, so having access to "memory" through Prolog is a great advantage even with large context sizes.
- LLMs are not grounded in factual information; Prolog rules can be used to double-check LLM output.
- LLMs can produce data in Prolog term format, either through few-shot examples (although this is flaky) or by using a grammar that restricts the output tokens to Prolog facts (grammar-constrained sampling is already implemented in llama.cpp); see the grammar sketch after this list.
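To make the last point concrete, here is a minimal sketch of what such a grammar could look like in llama.cpp's GBNF grammar format. This is illustrative only: the rule names and the (deliberately loose) shape of a "fact" are my own, and a real grammar would also need to handle quoted atoms, negative numbers, strings, and lists.

```
root   ::= fact+
fact   ::= atom "(" term ("," term)* ")." "\n"
term   ::= atom | number
atom   ::= [a-z] [a-zA-Z0-9_]*
number ::= [0-9]+
```

A grammar like this, applied at sampling time, forces every generated token to keep the output parseable as a sequence of Prolog facts, so the result could be asserted directly into the Prolog database.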
Integrate llama.cpp
I think the best option out there is llama.cpp: it is MIT licensed and allows you to run open-source LLMs on your local machine. It builds one executable (called `main`) plus many other "example" executables that provide different features/functionality. To test it out, do the following:
- clone the llama.cpp repo and build it by running `make -j 8 LLAMA_FAST=1`
- download a model, like `codellama-7b-instruct.Q4_K_S.gguf`
- NOTE: `gguf` is the new quantized binary file format for models; `ggml` was the previous format (no longer supported).
- run the model in interactive chat mode with:
  ```
  ./main -m /tmp/codellama-7b-instruct.Q4_K_S.gguf -i --color --in-prefix '[INST] ' --in-suffix '\n\n[/INST]' --multiline-input --interactive-first -e -c 16384 -s 1333
  ```
- type `\` followed by Enter to tell the model to start generating after you type your prompt. In the example below I simply ask it to produce a quicksort implementation in C. You can see the output of the model below, after `[/INST]`.
sample output:
````
[...]
llama_model_loader: loaded meta data with 17 key-value pairs and 291 tensors from /tmpd/huggingface/hub/models--llama/codellama-7b-instruct.Q4_K_S.gguf (version GGUF V1 (support until nov 2023))
[...]

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - To return control to LLaMa, end your input with '\'.
 - To return control without starting a new line, end your input with '/'.

[INST] You are an expert C programmer. Please write a quicksort function.

[/INST] Sure! Here's a simple implementation of the QuickSort algorithm in C:

```c
void quicksort(int *arr, int low, int high) {
    if (low < high) {
        // Partition the array around its midpoint
        int mid = (low + high) / 2;
        int partition_index = partition(arr, low, high, mid);

        // Sort the left and right parts recursively
        quicksort(arr, low, partition_index - 1);
        quicksort(arr, partition_index + 1, high);
    }
}

int partition(int *arr, int low, int high, int mid) {
    // Move the pivot element to the end of the array
    int pivot = arr[mid];
    swap(&arr[mid], &arr[high]);

    // Partition the array around the pivot
    int i = low - 1;
    for (int j = low; j < high; j++) {
        if (arr[j] <= pivot) {
            i++;
            swap(&arr[i], &arr[j]);
        }
    }

    // Move the pivot element to its final position
    swap(&arr[i + 1], &arr[high]);
    return i + 1;
}
```

Note that this implementation uses a "middle-first" partitioning scheme, which is efficient for sorted arrays. The `partition` function moves the pivot element to the end of the array and then partitions the remaining elements around it using a simple swap algorithm. The `quicksort` function then recursively sorts the left and right parts of the array.

[INST]
````
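The same binary can also produce grammar-constrained output, as mentioned earlier. A hypothetical invocation, assuming the grammar sketched above has been saved as `prolog_facts.gbnf` (the file name is mine, not part of llama.cpp):

```
# prolog_facts.gbnf is a hypothetical file holding the grammar sketched earlier
./main -m /tmp/codellama-7b-instruct.Q4_K_S.gguf \
       --grammar-file prolog_facts.gbnf \
       -p 'Extract Prolog facts from: "Alice is 34 and lives in Oslo."'
```

With the grammar in place, the model can only emit text that matches the `fact` rule, i.e. output that Prolog can read directly as facts.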
Advantages of llama.cpp for integration into SWI-Prolog
- it runs on CPU and GPU
- it is fast
- it has an MIT license
- it does not require any dependencies
- although it is written in C++, the API is a simple C API (a sketch of a SWI-Prolog wrapper is shown after this list)
- most of the GUIs being developed out there for local LLMs use llama.cpp
- it is heavily maintained, with features added all the time
- it has support for sampling tokens by defining a BNF grammar, which could be used to generate Prolog facts from the model.
- it lacks good documentation, but the code and API are easy to read for a C developer, and there are plenty of real examples and features in the `examples` directory.
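To give an idea of what the glue could look like, here is a minimal sketch of a SWI-Prolog foreign predicate in C. The SWI-Prolog calls (`PL_get_chars`, `PL_unify_chars`, `PL_register_foreign`) are the real foreign-language interface; `generate_completion()` is a hypothetical placeholder for the tokenize/evaluate/sample loop demonstrated in llama.cpp's `examples` directory, since that part of the llama.cpp API evolves quickly.

```c
/* llm_glue.c -- sketch of a SWI-Prolog foreign predicate around llama.cpp. */
#include <SWI-Prolog.h>
#include <stddef.h>

/* HYPOTHETICAL helper: run the model on `prompt` and write the completion
 * into `out`; stands in for llama.cpp's tokenize/eval/sample loop.
 * Returns 0 on success. */
extern int generate_completion(const char *prompt, char *out, size_t out_len);

/* llm_complete(+Prompt, -Completion) */
static foreign_t
pl_llm_complete(term_t prompt, term_t completion)
{ char *text;
  char buf[8192];

  /* Fetch the prompt as a UTF-8 C string (accepts atoms and strings). */
  if ( !PL_get_chars(prompt, &text, CVT_ATOM|CVT_STRING|REP_UTF8) )
    return FALSE;

  if ( generate_completion(text, buf, sizeof buf) != 0 )
    return FALSE;                     /* fail the goal on model error */

  /* Unify Completion with the generated text as a SWI-Prolog string. */
  return PL_unify_chars(completion, PL_STRING|REP_UTF8, (size_t)-1, buf);
}

install_t
install_llm_glue(void)
{ PL_register_foreign("llm_complete", 2, pl_llm_complete, 0);
}
```

Compiled into a shared library, this could be loaded and queried with something like `?- use_foreign_library(foreign(llm_glue)), llm_complete("Write a quicksort in C.", Reply).`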
All in all I think this would be a great addition to SWI-Prolog.