I am asked to provide a demo of my work to a 3rd party.
This would require creating an easy to deploy demo system – possibly packaged as a virtual machine image running within VirtualBox.
The VirtualBox VM would act as a server for a browser-based application, and internally there is a websocket-based demo that also interacts with the browser-based application via websockets.
I am wondering how to best secure the provided Prolog demo, so that internal code can’t be accessed – even if access to the virtual box or the virtual hard drive is achieved.
I want to protect the code files included in the file system, as well as the running system, so that it can’t be accessed through a toplevel, only through a predefined API.
As well as any other kind of protection you can suggest as necessary …
Can this be achieved?
Edit:
I noticed the discussion on obfuscation but this option seems limited.
Perhaps the way to go is to create a wrapper C/C++ program that embeds a private key in a way that is hard to extract (perhaps encoded in some way), decrypts the program files after they are read, and then loads them internally into a Prolog process.
Does this make sense – can the internally loaded prolog program then be a websocket client as well …
It’s a matter that came up today that I need to look into – I wanted to “distribute” the demo as a real web app, running off an AWS or Oracle VM in the cloud, and provide browser-based access only.
My assumption is that running a demo in the cloud protects the source, since the only access provided is a front-end web app via port 80 – so the internal websocket communication is all local on the cloud VM. The source code is there as well, secured behind an SSH login and a closed / disabled websocket port in the VM firewall.
On Oracle (and probably AWS as well) I can even encrypt the virtual drive … although I’m not sure at what point it’s decrypted when an application runs and accesses a file.
But I was asked to provide the demo on prem so that it runs without an internet connection as well – and now I am scratching my head over whether this can be done for a SWI-Prolog based demo.
Good point, but no – it has to be deployed on a 3rd party laptop … without me around …
Essentially, I might give them an .ova file they can import into VirtualBox – which, when run, would start everything internally – but would also be protected from access.
The dongle is really a licensing enforcement device.
At this stage the demo is not really licensed – they can use it as much as they want – but, it seems protecting internal prolog code is a headache …
In the worst case I will have to rewrite internal code into a compiled language – C/C++ just to protect - at least – those parts I feel more strongly about …
Well, at the very least you could compile everything into a .qlf file. They could pull a listing/0 out of it, for sure, because swipl is by necessity capable of decompiling bytecode into Prolog, but it’d be annoying to do at least. Also, they would only recover the immediate implementation, not (for example) any raw DCG sources or term_expansion/2 inputs.
(It’d offer you some small measure of legal protection, too, depending on where you live. In the US, for example, you could make a reasonable claim that by extracting decompiled sources, they’re violating the DMCA’s anti-circumvention provisions. But if it’s the algorithms themselves that you’re trying to protect and not this actual implementation, then that wouldn’t do you a ton of good.)
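As a rough sketch of the .qlf suggestion above: qcompile/1 loads a source file and writes a Quick Load File next to it. The helper name compile_to_qlf/2 and the file name demo.pl are made up for illustration.

```prolog
% Hypothetical build helper; compile_to_qlf/2 and demo.pl are assumed names.

%!  compile_to_qlf(+Source, -Qlf) is semidet.
%   Load Source, write a Quick Load File next to it and return its name.
compile_to_qlf(Source, Qlf) :-
    qcompile(Source),                         % loads Source and writes <base>.qlf
    file_name_extension(Base, _Ext, Source),  % strip the source extension
    file_name_extension(Base, qlf, Qlf),      % expected name of the .qlf file
    exists_file(Qlf).
```

Calling `?- compile_to_qlf('demo.pl', Qlf).` should leave a demo.qlf that can be shipped instead of demo.pl. As noted above, the clauses can still be recovered by decompiling, so this only raises the bar a little.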
You know – maybe term expansion can indeed be a “poor man’s” way to create obfuscation … so that the compiled and de-compiled code is meaningless symbols …
I understand that Prolog has an obfuscation flag, but I understand that it’s limited and not well worked out.
My plan and hope is to indeed go open source – but, this requires some exploring up front – and i am not there yet.
Right now it’s important to protect the code – if possible, the same way as compiled code, or by use of some encryption approach, or at least by some good obfuscation approach.
re: enhancing ensure_loaded
If this can be done, then I really like this idea.
First, because if you do want to get SWI-Prolog used commercially, then you do want to have a good built-in solution for source code protection.
And second, because, if i can hire someone to do this, i can readily release the results into open source so everyone can benefit from it …
Perhaps i found an easy to do approach that could work:
Essentially, I use Oracle’s VirtualBox to create a (say Ubuntu) Linux installation that can run the demo – quite like on the cloud – so that once the VM runs, it’s a web server in a box.
In addition internally, i create a secured account that can only be logged in with a strong password.
I store the prolog source in an encrypted file system – essentially an encrypted image file – that can be mounted to become clear text.
Finally, I create an internal script that mounts the encrypted image during startup to make it clear text – i.e. the file system is mounted only as long as the OS is running – hence the external file that holds the virtual hard disk remains encrypted, or at least the files stored there remain encrypted.
So, while the guest OS is running within the VM – the prolog files are clear text – and their access is protected by a strong password.
Could this work – or am I missing something crucial …
Right … there is physical access – but that can be limited by:
ensuring that data stored on the virtual hard disk is encrypted – so that the virtual hard disk can’t be mounted into a different VM and its contents freely read
creating an account with a strong password so that no login can occur
running a “hardened” Linux system inside – which also manages the ports it opens to the outside world
So, you get the benefits of the cloud, on prem.
I might be missing something trivial – which makes this house of cards fall apart – but, perhaps it is workable.
Right – so, the idea is that the private key is obfuscated and stored inside the VM – and is used by a startup script, to decrypt and mount an image file that holds the sources.
But it needs more thinking – how to secure the startup script so that it does not give away the private key even if it can be seen in clear text … perhaps that’s why it doesn’t work.
So, nothing has to be given to the client – the client simply runs the VM and nothing more.
I don’t have an example, so you’d have to experiment a bit. In the description of term_expansion/4, it says: “The output layout should be a variable if no layout information can be computed for the expansion; a sub-term can also be a variable to indicate ‘don’t know’.” Based on this, I think that if your term_expansion/4 leaves Layout2 as a variable, then no location or variable name information is recorded. But I could be wrong.
In the end, if your customer has physical access to the machine and must be able to run the program, it is not possible to protect your IP completely. The best you can probably achieve is to run the software as an (HTTP) service from a VM and protect the VM as well as you can. For Prolog, the best you can achieve is to create a saved state and use the obfuscate option. How effective this is depends on the structure of your code. You can prevent listing/0 using the Prolog flag protect_static_code. This works fine to protect code from people with access to the toplevel, but not from people with access to the state, as they can recompile Prolog to bypass this protection.
Note that a saved state loses variable names. With obfuscation it will also obfuscate some of the predicate names. It will be quite a challenge to make sense of the code. Yes, Prolog decompiles a bit more reliably than compiled languages. On the other hand, there are far fewer people capable of understanding Prolog code, even more so if a lot of the clues such as variable names are lost.
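A minimal sketch of the saved-state approach described above, using qsave_program/2 with the obfuscate option and setting protect_static_code at startup. The names build_state/1, demo_main/0 and start_demo_server/0 are placeholders, not part of any real application. Obfuscation renames some identifiers, so the resulting state should be tested before it is handed out.

```prolog
% build_state.pl: hypothetical build script; all predicate and file
% names below are placeholders.

% Placeholder for the real application entry point. A real server must
% block here, because the process halts when the toplevel goal finishes.
start_demo_server :-
    format("demo server started~n").

% Runtime entry point stored in the saved state.
demo_main :-
    % Stop clause/2 and listing from inspecting static code.
    % Once set to true the flag cannot be switched back.
    set_prolog_flag(protect_static_code, true),
    start_demo_server.

%!  build_state(+File) is det.
%   Write the currently loaded program to File as a saved state.
build_state(File) :-
    qsave_program(File,
                  [ toplevel(demo_main),  % run demo_main, no interactive toplevel
                    stand_alone(true),    % include the emulator in File
                    obfuscate(true)       % rename identifiers where this is safe
                  ]).
```

One way to use it would be `swipl -g "build_state(demo_state)" -t halt build_state.pl`, followed by `./demo_state` on the target machine. This only protects against users at the toplevel. Anyone who can copy the state file can still load it into a modified Prolog, as explained above.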
There isn’t much that can be improved here. To some extent the obfuscating library can be changed to be less defensive. The current one pretty much guarantees it won’t break your program. It leaves a lot of names unchanged because it cannot prove the identifiers are not accessed as clear text somewhere in the program.
term_expansion/4 doesn’t help to obfuscate code. The source level debugger works by knowing the line and file of each clause and reconstructs the layout and variable names from the source on demand. If the source code is not present, that no longer works. Variable names are never recorded in SWI-Prolog.
I wonder if prolog can be configured to start as a “server” process without requiring logon into a linux account, while having the saved state file located in a “root” folder that is accessible to the prolog process running as root only.
If this is possible, then merely booting the VM could already make the Prolog app accessible to the outside, directly or indirectly via HTTP …
Edit:
This only works if the root folder is encrypted and is automatically decrypted for server processes running as the root user. Otherwise, the saved state file can be accessed by inspecting / mounting the virtual hard drive on another VM.
Sure. Just register it as (typically) a systemd service.
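A unit file for this might look roughly like the sketch below. The unit name, the path /opt/demo/demo_state and the account demo-svc are all assumptions made for illustration.

```ini
# /etc/systemd/system/prolog-demo.service (hypothetical name and paths)
[Unit]
Description=Prolog demo server
After=network.target

[Service]
# Assumed location of the saved state or stand-alone executable
ExecStart=/opt/demo/demo_state
# Assumed dedicated service account with no login shell
User=demo-svc
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With the unit enabled (`systemctl enable prolog-demo.service`), the service starts at boot without anyone logging in.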
But it doesn’t really help. The key to decrypt cannot be on the encrypted drive, so it is somewhere in the startup scripts for the VM. Given the key, you can mount the drive. Given physical access to the machine you can always recover that key. You can have the key or even the decryption process on a USB key. In the end it must end up somewhere in memory and you can inject code into the process to examine the state.
Same holds for any attempt to make Prolog start from encrypted files: the decrypt key must be somewhere. Once you have access to the saved state you can make a modified version of Prolog that will load the state and provide a toplevel that allows you to examine the state. You can obfuscate that a little using encryption, generating a version of Prolog that renumbers the VM instructions such that a standard Prolog VM doesn’t work, etc. Nothing like that will stop a good hacker.
The closest you can get is probably locking the system’s boot process and making sure the user has no access to a shell or something of comparable power. That should be the physical system though, not a VM running on top of an accessible host.
Someone mentioned the TPM chip to me – if I want to go all the way like this, I could perhaps figure out how to get the key into the TPM and then have Prolog read it from there to decrypt the saved state during load.
Dan
There is also PL_set_resource_db_mem(), which allows you to start Prolog from an embedded string. This allows you to set the mode to 111 (execute, but not readable). If you make sure the executable is owned by root (or any other user that is not the demo user) and you protect the boot process of the machine such that the demo user can only boot the target OS and cannot get an alternative way to mount the drive, it gets hard to bypass.