Building a port scanner

AFAIK with_mutex/2 is not needed in add_port_scan_result/2 because there should only ever be one fact for each port and once the facts are generated after the first time I don’t ever expect them to change. Also only one thread will be adding the new fact for the specific port once, that thread will not be competing with other threads and thus no lock is needed. Since this was created to test the open ports of a specific site if something changes then there is something very wrong somewhere.

Since there are so few examples of working library(persistency) code to be found, I am leaving with_mutex/2 for the write in because people are likely to copy this and if the with_mutex/2 were missing they would be wondering why on rare days their code does not work.

EDIT

Starting with SWI-Prolog 8.3.3 library(persistency) (GitHub Commit) was made more thread friendly by wrapping the four predicates with with_mutex/2.

So

add_port_scan_result(Request,Response) :-
    with_mutex(port_scan_result_journal, assert_port_scan_result(Request,Response)).

can be changed to

add_port_scan_result(Request,Response) :-
    assert_port_scan_result(Request,Response).

From documentation

This module requires the same thread-synchronization as the normal Prolog database. This implies that if each individual assert or retract takes the database from one consistent state to the next, no additional locking is required. If more than one elementary database operation is required to get from one consistent state to the next, both updating and querying the database must be locked using with_mutex/2.

The persistency library requires the same locking as normal query/assert/retract: as long as each individual assert/retract moves from one consistent state to the next, no locking is required. Only if moving from one consistent state to the next requires multiple retract/retract operations locking is required, both for the reader and writer. Eventually, transactions will require multiple modifications to be wrapped in a transaction and query to require nothing special.

1 Like

I don’t know if I am understanding that sentence correctly.

My take is that this will be like SQL databases where one wraps a series of statements in a transaction and once the transaction completes or is committed will the updates occur. But the use of the word eventually means it does not exist at present. That does make sense.

1 Like

format/3 is atomic, like all the core stream write predicates.

No I mean the ordering might get corrupted, so that when you re-run the journal from file, you don’t get what you had in memory before hand. This is result from pre-emptive concurrency

of threads, it could happend without locking that the library(persistency) commits the following error. Just look at db_assert/1 and db_retract/1:

  Thread 1:                 Thread 2:
  assert/1                    
                            retract/1
                            write_action/2
  write_action/2

So you did assert/1 retract/1, but while reading the journal it will do retract/1 assert/1. But I guess you will not see this interleaving for the port scanner, since it has only assert. You need to make another example.I did an example fill_unsafe on comp.lang.prolog and on twitter which also illustrates an interleaving problem. It might take thousands of operations until the unlucky interleaving happens.

2 Likes

Here are the results on the modified code. This is the latest version compiled with GCC 9 using PGO, running on Ubuntu 20.04, AMD 3950X, 16 cores, 32 threads. As you see, up to 16 threads the speedup is close to optimal (14 times). Hyperthreading apparently does very little in this case.
The code is attached.

18 ?- time(count(N)).
% 69,907,410 inferences, 3.583 CPU in 3.589 seconds (100% CPU, 19512169 Lips)
N = 78499.

19 ?- time(count2(N,1)).
% 4,104 inferences, 0.014 CPU in 3.536 seconds (0% CPU, 287142 Lips)
N = 78499.

20 ?- time(count2(N,2)).
% 4,132 inferences, 0.014 CPU in 1.786 seconds (1% CPU, 293717 Lips)
N = 78499.

21 ?- time(count2(N,4)).
% 4,202 inferences, 0.024 CPU in 0.899 seconds (3% CPU, 178360 Lips)
N = 78499.

22 ?- time(count2(N,8)).
% 4,392 inferences, 0.005 CPU in 0.468 seconds (1% CPU, 937750 Lips)
N = 78499.

23 ?- time(count2(N,16)).
% 4,950 inferences, 0.005 CPU in 0.250 seconds (2% CPU, 1055806 Lips)
N = 78499.

24 ?- time(count2(N,32)).
% 6,832 inferences, 0.007 CPU in 0.213 seconds (3% CPU, 1041262 Lips)
N = 78499.

25 ?- A is 3.536/0.250.
A = 14.144.

primes.pl (553 Bytes)

1 Like

Right. Added locks to the basic update predicates. This still requires the user to add another layer of locks if multiple related changes are needed. Updated the docs to explain this more clearly.

Thanks for spotting.

2 Likes

Today Jan W. released SWI-Prolog 8.3.3 which included a modified version of Jan B. balance/2,3 named concurrent_and/2,3.

So I gave it a try.

The first thing I learned the hard way is that you have to accept each answer, e.g. press the space bar after true is displayed.

concurrent_scan(IP_address,Low_port,High_port,Number_of_threads) :-
    concurrent_and(
        between(Low_port,High_port,Port),
        port_scan(IP_address,Port),
        [threads(Number_of_threads)]
    ).

port_scan(IP_address,Port) :-
    catch(
        setup_call_cleanup(
            tcp_socket(Socket),
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                format('Port ~w: open~n', [Port])
            ),
            tcp_close_socket(Socket)
        ),
        error(_,_),
        true
    ).

Example run

?- concurrent_scan('140.211.166.101',79,82,3).
Port 80: open
true ;
true ;
true ;
true ;
false.

Adding fail to the end of the query will fix having to press the space bar but this is not the way concurrent_and/2 is designed to work, e.g.

?- concurrent_scan('140.211.166.101',79,82,3),fail.
Port 80: open
false.

If one looks at the test cases in test_balance.pl one sees the use of setof/3. Incorporating setof/3 leads to

concurrent_scan_02(IP_address,Low_port,High_port,Number_of_threads,Filtered_results) :-
    setof(
        Port-Result,
        concurrent_and(
            between(Low_port,High_port,Port),
            port_scan_02(IP_address,Port,Result),
            [threads(Number_of_threads)]
        ),
        Results
    ),
    include(filter,Results,Filtered_results).

port_scan_02(IP_address,Port,Result) :-
    catch(
        setup_call_cleanup(
            tcp_socket(Socket),
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                Result = open
            ),
            tcp_close_socket(Socket)
        ),
        error(_,_),
        Result = closed
    ).

filter(Port-open).

Example run

?- concurrent_scan_02('140.211.166.101',79,82,3,Results).
Results = [80-open].

A more comprehensive example.

?- time(concurrent_scan_02('140.211.166.101',1,65536,8192,Results)).
% 135,209,226 inferences, 16.281 CPU in 187.520 seconds (9% CPU, 8304597 Lips)
Results = [22-open, 80-open, 443-open].

8192 threads checking 65536 ports in ~3 minutes.

Probably not. On the other hand I don’t think I’d use a lock. Worst case you have a couple of redundant entered/1 clauses. They should not affect the result and it is pretty unlikely there will be so many that it has significant impact on memory usage.