I think it would be a good idea to integrate SWI-Prolog with large language models running locally on your own machine (without any requirement to access a cloud service). There are several reasons for joining Prolog and an LLM together:
- LLMs are not good at hierarchical task planning; Prolog is excellent at it.
- LLMs have a limited context for input data (a.k.a. the prompt), so they often need an external “memory” from which data is fed into the prompt. Prolog is a great match for this given its database of rules and facts: with Prolog the LLM gets a “smart” memory, as opposed to passive data.
- Even if a model supports a very large context size, inference becomes slower as the context grows, so having access to Prolog-backed “memory” is a great advantage even with large context sizes.
- LLMs are not grounded in factual information; Prolog rules can be used to double-check LLM output.
- LLMs can produce data in Prolog term format, either through few-shot examples (although this is flaky) or through a grammar that restricts output tokens to Prolog facts (this is already implemented in llama.cpp).
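To make the last point concrete, here is a sketch of what a grammar restricting output to Prolog facts could look like, written in llama.cpp's GBNF notation. The rule names and the exact shape of a "fact" here are my own assumptions for illustration, not a grammar shipped with llama.cpp:

```
# hypothetical GBNF sketch: output must be a sequence of ground Prolog facts
root   ::= fact+
fact   ::= name "(" term ("," term)* ")." "\n"
term   ::= name | number | string
name   ::= [a-z] [a-zA-Z0-9_]*
number ::= [0-9]+
string ::= "'" [a-zA-Z0-9_ ]* "'"
```

A file like this can be passed to the `main` example via `--grammar-file`, which constrains sampling so that every generated token sequence matches the grammar.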
Integrate llama.cpp
I think the best option out there is llama.cpp: it is MIT licensed, and it allows you to run open-source LLMs on your local machine. It produces one executable (called `main`) plus many other “example” executables that provide different features/functionality. To test it out, do the following:
- Clone the llama.cpp repo and build it by running `make -j 8 LLAMA_FAST=1`.
- Download a model, like codellama-7b-instruct.Q4_K_S.gguf.
  - NOTE: `gguf` is the new quantized binary file format for models; `ggml` was the previous format (no longer supported).
- Run the model in interactive chat mode with:
  `./main -m /tmp/codellama-7b-instruct.Q4_K_S.gguf -i --color --in-prefix '[INST] ' --in-suffix '\n\n[/INST]' --multiline-input --interactive-first -e -c 16384 -s 1333`
- Type `\<enter>` to tell the model to start generating after you type your prompt. In the example below I simply ask it to produce a quicksort implementation in C. You can see the output of the model below, after `[/INST]`.
sample output:
[...]
llama_model_loader: loaded meta data with 17 key-value pairs and 291 tensors from /tmpd/huggingface/hub/models--llama/codellama-7b-instruct.Q4_K_S.gguf (version GGUF V1 (support until nov 2023))
[...]
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- To return control to LLaMa, end your input with '\'.
- To return control without starting a new line, end your input with '/'.
[INST] You are an expert C programmer. Please write a quicksort function.
[/INST] Sure! Here's a simple implementation of the QuickSort algorithm in C:
```c
void quicksort(int *arr, int low, int high) {
    if (low < high) {
        // Partition the array around its midpoint
        int mid = (low + high) / 2;
        int partition_index = partition(arr, low, high, mid);

        // Sort the left and right parts recursively
        quicksort(arr, low, partition_index - 1);
        quicksort(arr, partition_index + 1, high);
    }
}

int partition(int *arr, int low, int high, int mid) {
    // Move the pivot element to the end of the array
    int pivot = arr[mid];
    swap(&arr[mid], &arr[high]);

    // Partition the array around the pivot
    int i = low - 1;
    for (int j = low; j < high; j++) {
        if (arr[j] <= pivot) {
            i++;
            swap(&arr[i], &arr[j]);
        }
    }

    // Move the pivot element to its final position
    swap(&arr[i + 1], &arr[high]);
    return i + 1;
}
```
Note that this implementation uses a "middle-first" partitioning scheme,
which is efficient for sorted arrays. The `partition` function moves the
pivot element to the end of the array and then partitions the remaining
elements around it using a simple swap algorithm. The `quicksort`
function then recursively sorts the left and right parts of the array.
[INST]
Advantages of llama.cpp for integration into SWI-Prolog
- It runs on CPU and GPU.
- It is fast.
- It has an MIT license.
- It does not require any dependencies.
- Although it is written in C++, the API is a simple C API.
- Most of the GUIs being developed out there for local LLMs use llama.cpp.
- It is heavily maintained, with features added all the time.
- It supports sampling tokens under a BNF grammar, which could be used to generate Prolog facts from the model.
- It lacks good documentation, but the code and API are easy to read for a C developer, and there are plenty of real examples and features in the `examples` directory.
All in all I think this would be a great addition to SWI-Prolog.