Protecting deployed on-prem demo

Hello,

I have been asked to provide a demo of my work to a 3rd party.

This would require creating an easy to deploy demo system – possibly packaged as a virtual machine image running within VirtualBox.

The VirtualBox VM would act as a server for a browser-based application, and internally there is a websocket-based demo that also interacts with the browser-based application via websockets.

I am wondering how to best secure the provided Prolog demo, so that internal code can’t be accessed – even if access to the virtual box or the virtual hard drive is achieved.

I want to protect the code files included in the file system, as well as the running system, so that it can’t be accessed through a toplevel, only through a predefined API.

I am also interested in any other kind of protection you think is necessary …

Can this be achieved?

Edit:

I noticed the discussion on obfuscation but this option seems limited.

Perhaps the way to go is to create a wrapper C/C++ program that embeds a private key in a way that is hard to extract (perhaps encoded in some way), decrypts the program files after they are read, and then loads them internally into a Prolog process.

Does this make sense – can the internally loaded prolog program then be a websocket client as well …

thanks,

Dan
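
A note on the wrapper idea above: the loading half can be sketched in SWI-Prolog itself, because load_files/2 accepts a stream(+Input) option and can compile clauses from in-memory text. In the minimal sketch below, decrypt_source/2, load_protected/2, demo/1, the identifier hidden_rules and hidden_answer/1 are all hypothetical names, and decrypt_source/2 is only an identity placeholder where real decryption would go. It shows the mechanism, not a protection scheme: once loaded, the clauses sit in the running process like any other code.

```prolog
% Placeholder (assumption): a real version would decrypt Blob here,
% e.g. in a foreign C/C++ wrapper. This stand-in returns the text as is.
decrypt_source(Blob, Text) :-
    Text = Blob.

% Compile clauses from an in-memory text instead of a file on disk.
% Id is an atom that identifies the "source" of the loaded clauses.
load_protected(Id, Blob) :-
    decrypt_source(Blob, Text),
    setup_call_cleanup(
        open_string(Text, In),
        load_files(Id, [stream(In)]),
        close(In)).

% Load one hidden fact and call it.
demo(X) :-
    load_protected(hidden_rules, "hidden_answer(42).\n"),
    hidden_answer(X).
```

Calling demo(X) should bind X to 42. On the websocket question: library(http/websocket) provides http_open_websocket/3 for the client side, so code loaded this way should be able to act as a websocket client like any other Prolog code.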

Question is, are you allowed to do that? Is this some
byproduct of your thesis? In the USA, I read, it’s like this.
But I don’t know about Canada or some other country.

In the US, most university students retain the copyright for their thesis. Often they are required to grant the university and/or ProQuest a non-exclusive license to distribute the thesis, but without giving up copyright.

https://academia.stackexchange.com/q/37796

https://about.proquest.com/en/dissertations/

So if others have the right to distribute your work anyway, it
doesn’t make much sense to protect it with some DRM, unless
it is some further derivative work.

Software written for an employer might again be subject to
special rules.

Thanks.

No connection to my doctoral work or thesis … and it’s code that I want to protect

Dan

Ok, I thought it was a reaction to my posting a link to your PhD thesis.
But PhD theses are supposed to be distributed? I did this in
good faith; the link was from Bibliothèque et Archives Canada.
I don’t know how much personal information it reveals. I didn’t
check, maybe it contains a sensitive CV. You already use the handle
@grossdan here, so after you posted a paper by some Eric, Google was
all I needed. :slight_smile: But you could add it to the staple food of Prolog:
the SWI-Prolog website has a publication corner, in case it were (SWI-)Prolog
related. For example, Jan W. has put his PhD thesis here:

https://twitter.com/moustaki/status/1445107356

http://www.swi-prolog.org/download/publications/jan-phd.pdf

No, the question is unrelated …

It’s a matter that came up today that I need to look into – I wanted to “distribute” the demo as a real web app – running off an AWS or Oracle VM in the cloud and providing browser-based access only.

My assumption is that running a demo in the cloud protects the source, since the only access provided is a front-end web app via port 80 – so the internal websocket communication is all local on the cloud VM. The source code is there as well – secured behind an SSH login and a closed / disabled websocket port in the VM firewall.

On Oracle (and probably AWS as well) I can even encrypt the virtual drive … although I am not sure at what point it’s decrypted when an application is run and accesses a file.

But I was asked to provide the demo on prem so that it runs without an internet connection as well – and now I am scratching my head over whether this can be done for an SWI-Prolog based demo.

Dan

Good point, but no – it has to be deployed on a 3rd party laptop … without me around …

Essentially, I might give them an OVA file they can import into VirtualBox – which, when run, would start everything internally – but would also be protected from access

The dongle is really a licensing enforcement device.

At this stage the demo is not really licensed – they can use it as much as they want – but, it seems protecting internal prolog code is a headache …

In the worst case I will have to rewrite internal code into a compiled language – C/C++ just to protect - at least – those parts I feel more strongly about …

Well, at the very least you could compile everything into a .qlf file. They could pull a listing/0 out of it, for sure, because swipl is by necessity capable of decompiling bytecode into Prolog, but it’d be annoying to do at least. Also, they would only recover the immediate implementation, not (for example) any raw DCG sources or term_expansion/2 inputs.

(It’d offer you some small measure of legal protection, too, depending on where you live. In the US, for example, you could make a reasonable claim that by extracting decompiled sources, they’re violating the DMCA’s prohibition against reverse engineering. But if it’s the algorithms themselves that you’re trying to protect and not this actual implementation, then that wouldn’t do you a ton of good.)
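
The .qlf suggestion above could be sketched as follows. qcompile/1 and dcg_translate_rule/2 are SWI-Prolog built-ins; compile_to_qlf/2, show_dcg_expansion/0 and the greeting//0 rule are hypothetical names used only for illustration. The second predicate shows why the raw DCG source is not what a decompiler would recover: only the translated clause exists after loading.

```prolog
% Compile Source (a .pl file) into a .qlf file with the same base name
% and return the name of that file. Note that qcompile/1 also loads
% the file into the running process.
compile_to_qlf(Source, Qlf) :-
    qcompile(Source),
    file_name_extension(Base, _Ext, Source),
    file_name_extension(Base, qlf, Qlf).

% Print the plain clause that a DCG rule is translated into.
% Only this expanded form ends up in the compiled code.
show_dcg_expansion :-
    dcg_translate_rule((greeting --> [hello], name), Clause),
    portray_clause(Clause).
```

For example, compile_to_qlf('demo.pl', Qlf) would bind Qlf to 'demo.qlf', assuming a file demo.pl exists in the working directory.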

Thank you.

You know – maybe term expansion can indeed be a “poor man’s” way to create obfuscation … so that the compiled and de-compiled code consists of meaningless symbols …

I understand that Prolog has an obfuscation flag, but I understand that it’s limited and not well worked out.

Dan
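
A minimal sketch of the “poor man’s” renaming idea, assuming a hand-written table of names to hide. opaque_name/2, rename_callable/2 and the tax example are hypothetical. The sketch only rewrites clause heads and directly written goals; names reached through call/N, atoms built at run time, or term inspection would be missed, which is the kind of case a real obfuscator has to be careful about.

```prolog
:- multifile user:term_expansion/2, user:goal_expansion/2.
:- dynamic   user:term_expansion/2, user:goal_expansion/2.

% Hypothetical mapping from meaningful names to opaque ones.
opaque_name(price_with_tax, p1).
opaque_name(tax_percent,    p2).

% Rename the principal functor of a callable term if it is in the table.
rename_callable(Term0, Term) :-
    callable(Term0),
    Term0 =.. [Name0|Args],
    opaque_name(Name0, Name),
    Term =.. [Name|Args].

% Rewrite clause heads (rules and facts) as they are read.
user:term_expansion((Head0 :- Body), (Head :- Body)) :-
    rename_callable(Head0, Head).
user:term_expansion(Fact0, Fact) :-
    rename_callable(Fact0, Fact).

% Rewrite goals that call the renamed predicates.
user:goal_expansion(Goal0, Goal) :-
    rename_callable(Goal0, Goal).

% Application code, written with readable names.
tax_percent(20).
price_with_tax(Net, Gross) :-
    tax_percent(Percent),
    Gross is Net * (100 + Percent) // 100.
```

After loading, the clauses are stored as p1/2 and p2/1, so a listing would show those names rather than price_with_tax/2 and tax_percent/1. In a real build the mapping and the hooks would live in a separate file that is loaded before the application code and not shipped.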

My plan and hope is to indeed go open source – but, this requires some exploring up front – and i am not there yet.

Right now it’s important to protect it, if possible in the same way as compiled code, or by use of some encryption approach, or at least by some good obfuscation approach.

re: enhancing ensure_loaded

If this can be done, then I really like this idea.

First, because if you do want to get swi prolog used commercially, then you do want to have a good built-in solution for source code protection.

And second, because, if i can hire someone to do this, i can readily release the results into open source so everyone can benefit from it …

Dan

Perhaps i found an easy to do approach that could work:

Essentially, I use Oracle’s VirtualBox to create a (say Ubuntu) Linux installation that can run the demo – quite like on the cloud – once the VM runs, it’s a web server in a box.

In addition internally, i create a secured account that can only be logged in with a strong password.

I store the prolog source in an encrypted file system – essentially an encrypted image file – that can be mounted to become clear text.

Finally, i create an internal script that mounts an encrypted file during startup to make it clear text – i.e. the file system is mounted only so long the OS is running – hence the external file that holds the virtual hard disk remains encrypted – or at least the files stored there remain encrypted.

So, while the guest OS is running within the VM – the prolog files are clear text – and their access is protected by a strong password.

Could this work – or am I missing something crucial …

Dan

Right … there is physical access – but that can be limited by:

  1. ensuring that data stored on the virtual hard disk is encrypted – so that the virtual hard disk can’t be mounted into a different VM and its contents freely read

  2. creating an account with a strong password so that no login can occur

  3. running a “hardened” Linux system inside – which also manages the ports it opens to the outside world

So, you get the benefits of the cloud – on prem.

I might be missing something trivial – which makes this house of cards fall apart – but, perhaps it is workable.

Right – so, the idea is that the private key is obfuscated and stored inside the VM – and is used by a startup script, to decrypt and mount an image file that holds the sources.

But it needs more thinking – how to secure the startup script :slight_smile: so that, even if it can be seen in clear text, it does not give away the private key … perhaps that’s why it doesn’t work :slight_smile:

So, nothing has to be given to the client – the client simply runs the VM and nothing more.

I suggest using term_expansion/4 rather than term_expansion/2, to confuse SWI-Prolog’s decompiler about the names of variables.

Thank you Peter,

Could you perhaps show one example that I could study –

I don’t have an example, so you’d have to experiment a bit. In the description of term_expansion/4, it says “The output layout should be a variable if no layout information can be computed for the expansion; a sub-term can also be a variable to indicate ‘don’t know’”. Based on this, I think that if your term_expansion/4 leaves Layout2 as a variable, then no location or variable name information is recorded. But I could be wrong. :wink:

In the end, if your customer has physical access to the machine and must be able to run the program, it is not possible to protect your IP completely. The best you can probably achieve is to run the software as an (HTTP) service from a VM and protect the VM as well as you can. For Prolog, the best you can achieve is to create a saved state and use the obfuscate option. How effective this is depends on the structure of your code. You can prevent listing/0 using the Prolog flag protect_static_code. This works fine to protect code from people with access to the toplevel, but not from people with access to the state, as they can recompile Prolog to bypass this protection.

Note that a saved state loses variable names. With obfuscation it will also obfuscate some of the predicate names. It will be quite a challenge to make sense of the code. Yes, Prolog decompiles a bit more reliably than compiled languages. On the other hand, there are far fewer people capable of understanding Prolog code, even more so if a lot of the clues such as variable names are lost.

There isn’t much that can be improved here. To some extent the obfuscating library can be changed to be less defensive. The current one pretty much guarantees it won’t break your program. It leaves a lot of names unchanged because it cannot prove the identifiers are not accessed as clear text somewhere in the program.
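
The saved-state route described above could look roughly like this. qsave_program/2 with the goal/1, toplevel/1 and obfuscate/1 options and the protect_static_code flag are from SWI-Prolog; the file names build.pl and demo.state, and the predicates secret_answer/1, main/0 and build/0, are hypothetical stand-ins. This sketch assumes that setting the flag in the build script is enough for it to be in effect in the saved state, which should be checked against the manual for the version in use.

```prolog
% build.pl (hypothetical file name)

% Hide static code from listing and clause/2 for toplevel users.
:- set_prolog_flag(protect_static_code, true).

% Stand-in for the real application code.
secret_answer(42).

main :-
    secret_answer(X),
    format("answer: ~w~n", [X]).

% Write a saved state that runs main/0 and then halts instead of
% entering the interactive toplevel.
build :-
    qsave_program('demo.state',
                  [ goal(main),
                    toplevel(halt),
                    obfuscate(true)
                  ]).
```

Something like `swipl -g build -t halt build.pl` would write the state, and `swipl -x demo.state` would run it. The source file then does not need to be shipped, though as noted above the state itself can still be inspected by someone determined enough.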

term_expansion/4 doesn’t help to obfuscate code. The source level debugger works by knowing the line and file of each clause and reconstructs the layout and variable names from the source on demand. If the source code is not present, that no longer works. Variable names are never recorded in SWI-Prolog.

I wonder if Prolog can be configured to start as a “server” process without requiring a login to a Linux account, while having the saved state file located in a “root” folder that is accessible only to the Prolog process running as root.

If this is possible, then merely booting the VM box could already make the Prolog app accessible to the outside, directly or indirectly, via HTTP …

Edit:

This only works if the root folder is encrypted and is automatically decrypted for server processes running as the root user. Otherwise, the saved state file can be accessed by inspecting / mounting the virtual hard drive on another VM.

Dan
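
On starting Prolog as a server without an interactive login: SWI-Prolog’s library(http/http_unix_daemon) is meant for this kind of setup. A minimal sketch, assuming a single hypothetical endpoint /ping handled by a hypothetical ping/1, with no toplevel exposed:

```prolog
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_unix_daemon)).

% Run http_daemon/0 as the main goal when the file is started as a
% script. It reads options such as --port=... and --user=... from
% the command line.
:- initialization(http_daemon, main).

% The only thing reachable from outside: a predefined HTTP API.
:- http_handler(root(ping), ping, []).

ping(_Request) :-
    format("Content-type: text/plain~n~n"),
    format("pong~n").
```

Started from a boot-time service (for example a systemd unit, not shown here) with something like `swipl server.pl --port=8080 --user=www-data`, the process can open its port and then drop privileges to the named user. The file name, port and user are assumptions for illustration. This addresses the “no toplevel, only an API” part; it does not by itself protect the files on the virtual disk.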