Building a port scanner

SWI-Prolog (threaded, 64 bits, version 8.3.2) on Windows 10

In creating a port scanner for a web site, tcp_connect/2 is wrapped with setup_call_catcher_cleanup/4. The catcher sees the error and the clause that handles the error is called, but the code does not complete successfully and the error message is still printed.

The code uses :- set_prolog_flag(report_error,false). to suppress the printing of the error message so I don’t know why the error message is being printed.

The code runs correctly for an open port, e.g. 80, and causes a timeout exception for a port that is not handled, e.g. 79, but is not completing successfully to display the desired result: Port 79: closed

Example run

?- port_scan_range(80,80).
Port 80: open
true.

?- port_scan_range(79,79).
ERROR: Unhandled exception: Socket error: Connection timed out
ERROR: In:
ERROR:   [13] socket:tcp_connect(<socket>(0000000006D179B0),'140.211.166.101':79)
ERROR:   [12] setup_call_catcher_cleanup(user:tcp_socket(<socket>(0000000006D179B0)),user:tcp_connect(<socket>(0000000006D179B0),...),_22258,user:(...,...)) at c:/program files/swipl/boot/init.pl:562
ERROR:   [10] port_scan_range(79,79) at c:/users/eric/documents/notes/discourse swi-prolog osu osl/osu osl prolog/tcp_test.pl:33
ERROR:    [9] <user>
ERROR: 
ERROR: Note: some frames are missing due to last-call optimization.
ERROR: Re-run your program in debug mode (:- debug.) to get more detail.
Timeout error caught
?- current_prolog_flag(report_error,Flag).
Flag = false.

Code

:- use_module(library(socket)).             % tcp_socket/1, tcp_connect/2, tcp_close_socket/1

:- set_prolog_flag(debug_on_error,false).   % Do not drop into debugger on error. Errors and exceptions are different with errors being more severe.
:- set_prolog_flag(report_error,false).     % Do not print error messages to screen.

% -----------------------------------------------------------------------------

port_scan_range(Start,End) :-
    must_be(positive_integer,Start),
    must_be(positive_integer,End),
    Start =< End,
    IP_address = '140.211.166.101',
    between(Start,End,Port),
    port_scan(IP_address,Port,Result),
    format('Port ~w: ~w~n',[Port,Result]).

port_scan(IP_address,Port,Result) :-
    setup_call_catcher_cleanup(
        tcp_socket(Socket),
        % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
        tcp_connect(Socket, IP_address:Port),
        Catcher,
        (
            catcher(Catcher,Result),
            tcp_close_socket(Socket)
        )
    ).

catcher(exit,open).
catcher(
    exception(
        error(socket_error(wsaetimedout, 'Connection timed out'),
        _Context)
    ),
    closed
) :-
    format('Timeout error caught~n',[]).
catcher(Catcher,unknown) :-
    format('Catcher default case~n',[]),
    format('~w~n',[Catcher]).

By changing setup_call_catcher_cleanup/4 to setup_call_cleanup/3 and using catch/3 for the goal it works as desired. Also changed the predicate catcher/2.

port_scan(IP_address,Port,Result) :-
    setup_call_cleanup(
        tcp_socket(Socket),
        catch(
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                Result = open
            ),
            Catcher,
            catcher(Catcher,Result)
        ),
        tcp_close_socket(Socket)
    ).

catcher(
    error(socket_error(wsaetimedout, 'Connection timed out'),_Context),
    closed
) :- !.
catcher(Catcher,unknown) :-
    format('Catcher default case~n',[]),
    format('~w~n',[Catcher]).

Example run

?- port_scan_range(79,79).
Port 79: closed
true.

?- port_scan_range(80,80).
Port 80: open
true.

Jan W., when you get a chance,

is tcp_close_socket/1 needed? The documentation notes two cases where it is needed, but since this is port scanning and not using tcp_open_socket/3 I am not sure.

Yes, you do not create streams for the socket, so you must close the socket yourself.

The simplest way is probably this:

    catch(setup_call_cleanup(
             tcp_socket(S),
             tcp_connect(S, IP_address:Port),
             tcp_close_socket(S)),
          error(_,_),
          fail).

If this succeeds, the port is open, else it is closed. You may be more restrictive on the errors under which you consider the port closed, but I’m not sure that is needed.

Note that the report_error flag is probably not even used anymore (should check), and the debug_on_error flag only controls whether or not the debugger is started when an error is uncaught. In this case you need catch/3. Note that the various *_cleanup predicates do not catch the error; they just guarantee that resource cleanup is called regardless of the success/error/fail/cut/… that terminates the goal that needs the resource.
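
The difference between observing an error in a cleanup handler and actually catching it can be tried without any network access. The following is only a minimal sketch: risky/0, observe_only/0 and observe_and_catch/1 are hypothetical names that do not appear in the thread, and risky/0 merely stands in for a tcp_connect/2 call that throws.

```prolog
% risky/0 is a hypothetical stand-in for tcp_connect/2 that always throws.
risky :-
    throw(error(demo_error, _)).

% The cleanup goal is told about the exception through Catcher,
% but the exception still propagates to the caller afterwards.
observe_only :-
    setup_call_catcher_cleanup(
        true,
        risky,
        Catcher,
        format("cleanup saw: ~q~n", [Catcher])).

% catch/3 around the whole call is what turns the error into a result.
observe_and_catch(Result) :-
    catch(( observe_only, Result = ok ),
          error(demo_error, _),
          Result = caught).
```

Calling observe_only/0 at the toplevel should still report an unhandled exception, while observe_and_catch(R) is expected to bind R to caught after the cleanup message has been printed.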


Complete version after suggestions by Jan W. and making port_scan_range/2 a proper failure-driven loop.

port_scan_range(Start,End) :-
    must_be(positive_integer,Start),
    must_be(positive_integer,End),
    Start =< End,
    IP_address = '140.211.166.101',
    between(Start,End,Port),
    port_scan(IP_address,Port,Result),
    format('Port ~w: ~w~n',[Port,Result]),
    fail.
port_scan_range(_,_).

port_scan(IP_address,Port,Result) :-
    catch(
        setup_call_cleanup(
            tcp_socket(Socket),
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                Result = open
            ),
            tcp_close_socket(Socket)
        ),
        error(_,_),
        Result = closed
    ).

Example run

?- port_scan_range(79,80).
Port 79: closed
Port 80: open
true.

I would suggest using forall/2 rather than a failure-driven loop – forall/2 is easier to debug. With a failure-driven loop, you can’t tell if the range part fails (which is fine) or the port_scan part fails (which is a bug).
[BTW, there’s no need for the Start=<End test – between/3 subsumes that.]

port_scan_range(IP_address, Start, End) :-
    forall(port_range(Start, End, Port),
           (   port_scan(IP_address, Port, Result),
               format('Port ~w: ~w~n', [Port, Result])
           )).

port_range(Start, End, Port) :-
    must_be(positive_integer, Start),
    must_be(positive_integer, End),
    between(Start, End, Port).

port_scan(IP_address,Port,Result) :-
    catch(
        setup_call_cleanup(
            tcp_socket(Socket),
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                Result = open
            ),
            tcp_close_socket(Socket)
        ),
        error(_,_),
        Result = closed
    ).
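
The debugging argument for forall/2 can be seen in a small network-free sketch. check/1 is a hypothetical stand-in for port_scan/3 that (wrongly) fails for one value; the two loop predicates are illustrative names, not code from the thread.

```prolog
% check/1 is a hypothetical action that unexpectedly fails for 2.
check(N) :-
    N =\= 2.

% Failure-driven loop: a failing action looks the same as the
% generator running out of solutions, so the bug goes unnoticed.
loop_failure_driven(Low, High) :-
    between(Low, High, N),
    check(N),
    fail.
loop_failure_driven(_, _).

% forall/2: the loop itself fails as soon as the action fails.
loop_forall(Low, High) :-
    forall(between(Low, High, N), check(N)).
```

Here loop_failure_driven(1, 3) succeeds even though check(2) failed, whereas loop_forall(1, 3) fails and so points at the problem. An empty range such as loop_forall(5, 4) still succeeds, which is why the explicit Start =< End test is not needed.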

For those that know about port scanning, a typical scan covers a thousand or more ports. Also, it is better for a host to send no reply when a port is not available, which on the client side typically results in a timeout when trying to open the port.

Since each timeout can take a few seconds, scanning a thousand ports one after another can take some time.

5 seconds * 1000 = 5,000 seconds
5,000 seconds * (1 minute / 60 seconds) = 83.3 minutes
83.3 minutes = 1 hour 23 minutes

Since port scanning can be done in parallel it is worth the effort to use multithreading.
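
As a rough model of the gain, assume almost every port times out and that the threads work through the ports in batches. The estimate is then the number of batches times the timeout. The predicate below is a sketch under those assumptions; estimate_scan_seconds/4 is a hypothetical helper, not part of the scanner.

```prolog
% estimate_scan_seconds(+Ports, +Threads, +TimeoutSeconds, -Seconds)
% Assumes every port times out and that the threads finish in lock step.
estimate_scan_seconds(Ports, Threads, Timeout, Seconds) :-
    must_be(positive_integer, Ports),
    must_be(positive_integer, Threads),
    Batches is (Ports + Threads - 1) // Threads,   % ceiling division
    Seconds is Batches * Timeout.
```

With one thread and a 5 second timeout, estimate_scan_seconds(1000, 1, 5, S) gives S = 5000, matching the arithmetic above. The timings reported later in the thread (9 ports on 3 threads in about 63 seconds) suggest the timeout on that machine was closer to 21 seconds; with that value, estimate_scan_seconds(1023, 50, 21, S) gives S = 441, which agrees with the 441 seconds measured for that run.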

Update of code using a thread pool.

Based on test case.

start_scan_pool(Number_of_threads,Options) :-
	thread_pool_create(scan_pool,Number_of_threads,Options).
stop_scan_pool :-
	thread_pool_destroy(scan_pool).

scan_pool(IP_address,Low_port,High_port,Number_of_threads) :-
    start_scan_pool(Number_of_threads,[]),
    time(
        (
            findall(Id,
                (   between(Low_port,High_port,Port),
                    thread_create_in_pool(scan_pool,port_scan(IP_address,Port),Id,[])
                ), Ids),
            join_all(Ids)
        )
    ),
    stop_scan_pool.

% -----------------------------------------------------------------------------

join_all([]).
join_all([H|T]) :-
	thread_join(H, Status),
	assertion(Status == true),
	join_all(T).

% -----------------------------------------------------------------------------

port_scan(IP_address,Port) :-
    catch(
        setup_call_cleanup(
            tcp_socket(Socket),
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                Result = open
            ),
            tcp_close_socket(Socket)
        ),
        error(_,_),
        Result = closed
    ),
    format('Port ~w: ~w~n', [Port, Result]).

Example run.

?- scan_pool('140.211.166.101',1,9,3).
Port 2: closed
Port 3: closed
Port 1: closed
Port 5: closed
Port 4: closed
Port 6: closed
Port 7: closed
Port 8: closed
Port 9: closed
% 40,303 inferences, 0.031 CPU in 63.044 seconds (0% CPU, 1289696 Lips)
true.

A more comprehensive example run.

?- scan_pool('140.211.166.101',1,1023,50).
...
% 37,861 inferences, 0.047 CPU in 441.204 seconds (0% CPU, 807701 Lips)
true.

EDIT

More threads means less time.

1024 threads checking 65536 ports in ~23 minutes.

?- scan_pool('140.211.166.101',1,65536,1024).
Port 22: open
Port 80: open
Port 443: open
% 2,464,802 inferences, 6.891 CPU in 1369.216 seconds (1% CPU, 357704 Lips)
true.

8192 threads checking 65536 ports in ~4.5 minutes.

?- scan_pool('140.211.166.101',1,65536,8192).
Port 80: open
Port 22: open
Port 443: open
% 2,424,842 inferences, 6.469 CPU in 259.009 seconds (2% CPU, 374855 Lips)
true.

@anniepoo At https://www.swi-prolog.org/pldoc/man?section=threadpool you asked for an example. :mage:


Or just use concurrent_forall/3 :wink:

[Newly added: see Size limitations on qsave_program?]


Thanks. :grinning:

I figured the next and last step was to use Thread communication to pass the status of the port back and remove the format/2 from port_scan/2. Will add concurrent_forall/3 to the todo list and see what happens.

A balance/1 predicate would be more useful here than concurrent_forall/2.
balance/1 does a distributed generate and test:

?- balance((between(Low,High,Port), port_test(IP_address,Port,Result))), 
    format('Port ~w: ~w~n', [Port, Result]), fail; true.

This would avoid issues with concurrent logging, since the results are
serialized. As a bonus you can bootstrap
concurrent_forall/2:

concurrent_forall(G, T) :-
    \+ balance((G, \+T)).

But the above bootstrapping would abort all spawned threads as soon as T gets
false, since balance/1 is abortable. I don’t know whether the SWI-Prolog
concurrent_forall/2 has the same semantics.
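
The bootstrapping above has the same shape as the usual sequential reading of forall/2, with balance/1 taking the place of plain conjunction. A minimal sequential sketch, using the hypothetical name my_forall/2 and no threads at all:

```prolog
% Sequential analogue of the bootstrapping: there must be no solution
% of G for which T fails.
my_forall(G, T) :-
    \+ ( call(G), \+ call(T) ).
```

For example, my_forall(member(X, [2,4]), 0 is X mod 2) succeeds, while my_forall(member(X, [2,3]), 0 is X mod 2) fails as soon as the odd element is reached. In both cases X is left unbound because of the double negation.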


Looks like this is out.

In reading the documentation

The maximum number of threads defined is the amount of cores available.

Since the number of cores is now typically about 8, 12 or 16 on laptops and my other example ran cleanly with 50 threads, concurrent_forall/3 will probably be much slower, but then again Jan W. is known to change the code based on forum posts. :grinning:


EDIT

There are two versions of concurrent_forall/3. One is in the standard SWI-Prolog library and one is in Package “xlibrary”.


Yeah, for a blocking test, a better balance/1 or concurrent_forall/2 is needed.
Something like the fork/join of Java, where there is provision for tasks that
can block. In the fork/join/blocking of Java, you can bound the active worker
pool by the number of cores, but it might generate extra threads for waiting tasks.
But it requires that the meta predicate balance/1 or concurrent_forall/2 gets
notified that the task is waiting. So you would not only have in your task:

tcp_connect(Socket, IP_address:Port)

But you need something like:

begin_blocking,
tcp_connect(Socket, IP_address:Port),
end_blocking

Mind-boggling advanced stuff. I haven’t yet got my head around
how to bring it to Prolog; it’s something like Java’s ManagedBlocker.
As a workaround you can just allow a larger number of threads, and
assume that the probability is low that too much contention happens. A
second workaround I have in place is that balance/1 spawns a thread group,
and that blockers can spawn threads into the thread group.

The code seems to say something different – that the # of threads defaults to the # of CPUs but otherwise can be anything. @jan ?

    (   option(threads(Jobs), Options)
    ->  true
    ;   current_prolog_flag(cpu_count, Jobs)
    ),
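
Going by that snippet, the scan can ask for more workers than cores through the threads(N) option. The following is a sketch only: scan_with/4 and fake_probe/2 are hypothetical names, and the probe is passed in as a closure so the loop can be tried without touching the network.

```prolog
:- use_module(library(thread)).   % concurrent_forall/3

% scan_with(:Probe, +Low, +High, +Threads)
% Probe is called as call(Probe, Port, Result) for every port.
:- meta_predicate scan_with(2, +, +, +).

scan_with(Probe, Low, High, Threads) :-
    concurrent_forall(
        between(Low, High, Port),
        (   call(Probe, Port, Result),
            format("Port ~w: ~w~n", [Port, Result])
        ),
        [threads(Threads)]).

% fake_probe/2 is a hypothetical, network-free probe for experiments.
fake_probe(80, open) :- !.
fake_probe(_, closed).
```

A query such as scan_with(fake_probe, 78, 82, 3) prints one line per port in whatever order the workers finish. For a real scan, the port_scan/3 from earlier in the thread should fit as the probe, e.g. scan_with(port_scan('140.211.166.101'), 1, 1023, 50).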


When I went looking for concurrent_forall/3 I found it first in Package “xlibrary” but in searching for your example code found the version Jan W. created. That explains the differences.

Now looking at Jan W. version. :wink:

Unlike Jan W.’s version, Menendez’s version uses a result queue. But
still the logic is not like balance/1. You only get the logic of balance/1
if you use the result queue for successful runs of the test goal as well.

With Menendez’s version you can possibly run multiple port scanners at the
same time, like spawning port scanners on a higher level for HostA,
HostB, etc. at the same time. Not sure. But I guess this doesn’t work
currently for Jan W.’s code, since it uses the dynamic database to communicate
between master and workers, without a master primary key in it.
So I guess Jan W.’s code is currently designed for only one master.

While I am reading your replies in this post, for the particular current need of the port scanner there is no need to balance out the threads. Each thread tries to open a port and returns only one of two results, open or closed. When the port is open the response is instantaneous and the thread is available in the pool again; when the port is closed it is due to a timeout, which takes ~5 seconds. As only 3 out of 65536 ports are open, most of the threads wait for the timeout and then return to the pool. If you start the program with 50 threads in the pool and 1000 ports, you will see the results typically get dumped out in a batch of 50, then a pause, another batch of 50, then a pause, etc. The open ports get dumped out almost immediately. So there is nothing to balance, as the threads are typically completing in sync.

I do agree that balancing is a desired trait for threading that should be considered when multithreading.

For some other tests that I have to write, they might benefit from balancing and I will revisit your replies. :grinning:

The name balance/1 suggests that some special balancing is
involved in it. But you have the same balancing in concurrent_forall/2,
since the master generator uses a queue to distribute the work
to the test workers. When a worker is busy with a test it cannot fetch an item from
the queue; only after it has completed its work can it do so again.
You see the fetch loop here in Jan W.’s code:

fa_worker(Queue, Main, Templ, Test) :-
    repeat,   /* loop */
    thread_get_message(Queue, Msg), /* fetch */
    (   Msg == done
    ->  !
    ;   Msg = job(Templ),
        debug(concurrent, 'Running test ~p', [Test]),
        (   catch_with_backtrace(Test, E, true) /* Worker */
        ->  (   var(E)
            ->  fail
            ;   fa_stop(Queue, Main, fa_worker_failed(Test, error(E)))
            )
        ;   !,
            fa_stop(Queue, Main, fa_worker_failed(Test, false))
        )
    ).

In fact it’s even a bounded queue, because you want the master generator
to be put on hold if there is no worker thread around that can consume
from the queue and continue work. The balancing is a side effect
of the queue communication between master and workers.
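
The bounded-queue idea can be sketched independently of the library code. This is not Jan W.’s implementation: worker/2, run_bounded/4 and drain/2 are hypothetical names, the queue size of 2 is arbitrary, and the "test" is just squaring a number so that it runs without a network.

```prolog
% worker/2 is a hypothetical worker: it squares each job it fetches.
worker(Jobs, Results) :-
    thread_get_message(Jobs, Msg),            % blocks until a job arrives
    (   Msg == done
    ->  true
    ;   Msg = job(N),
        Square is N * N,
        thread_send_message(Results, result(N, Square)),
        worker(Jobs, Results)
    ).

run_bounded(Low, High, Workers, Sorted) :-
    message_queue_create(Jobs, [max_size(2)]),  % master blocks while 2 jobs wait
    message_queue_create(Results),
    findall(Id,
            (   between(1, Workers, _),
                thread_create(worker(Jobs, Results), Id, [])
            ),
            Ids),
    forall(between(Low, High, N),
           thread_send_message(Jobs, job(N))),
    forall(member(_, Ids),                      % one stop message per worker
           thread_send_message(Jobs, done)),
    forall(member(Id, Ids),
           thread_join(Id, _Status)),
    drain(Results, Unsorted),
    msort(Unsorted, Sorted),
    message_queue_destroy(Jobs),
    message_queue_destroy(Results).

% Collect whatever is in the queue without waiting.
drain(Queue, [N-Square|Rest]) :-
    thread_get_message(Queue, result(N, Square), [timeout(0)]),
    !,
    drain(Queue, Rest).
drain(_, []).
```

Because a busy worker does not fetch from the queue, a slow job only holds up that one worker; the master is suspended in thread_send_message/2 only while the job queue is full. A query such as run_bounded(1, 5, 2, L) yields L = [1-1, 2-4, 3-9, 4-16, 5-25].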


The next version makes use of message queues to pass the result of the port scan back to the main thread. This also adds debug messages from library(debug).

start_scan_pool_02(Number_of_threads,Options) :-
	thread_pool_create(scan_pool,Number_of_threads,Options).
stop_scan_pool_02 :-
	thread_pool_destroy(scan_pool).

scan_pool_02(IP_address,Low_port,High_port,Number_of_threads) :-
    message_queue_create(Result_queue),
    start_scan_pool_02(Number_of_threads,[]),
    time(
        (
            findall(Id,
                (   between(Low_port,High_port,Port),
                    thread_create_in_pool(scan_pool,worker(Result_queue,IP_address,Port),Id,[])
                ), Ids),
            join_all(Ids),
            gather_results(Result_queue,Port_results),
            format('~w~n',[Port_results])  % Peter once showed me a better predicate to print structures but I can't recall it at the moment.  Found it: print_term/2
        )
    ),
    stop_scan_pool_02.

worker(Result_queue,IP_address,Port) :-
    debug(threads, 'Worker: port ~w', [Port]),
    port_scan_02(IP_address,Port,Port_result),
    thread_send_message(Result_queue, result(Port,Port_result)).

port_scan_02(IP_address,Port,Result) :-
    catch(
        setup_call_cleanup(
            tcp_socket(Socket),
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                Result = open
            ),
            tcp_close_socket(Socket)
        ),
        error(_,_),
        Result = closed
    ).

gather_results(Result_queue,[result(Port,Port_result)|Port_results]) :-
    thread_get_message(Result_queue,result(Port,Port_result),[timeout(0)]),
    debug(threads, 'Result - Port: ~w - Result: ~w', [Port,Port_result]),
    gather_results(Result_queue,Port_results),
    !.
gather_results(_,[]).

Example run

?- debug(threads).
Warning: threads: no matching debug topic (yet)
true.

?- scan_pool_02('140.211.166.101',75,84,3).
% [Thread 4] Worker: port 75
% [Thread 5] Worker: port 76
% [Thread 6] Worker: port 77
% [Thread 7] Worker: port 78
% [Thread 8] Worker: port 79
% [Thread 9] Worker: port 80
% [Thread 10] Worker: port 81
% [Thread 11] Worker: port 82
% [Thread 12] Worker: port 83
% [Thread 13] Worker: port 84
% Result - Port: 77 - Result: closed
% Result - Port: 75 - Result: closed
% Result - Port: 76 - Result: closed
% Result - Port: 80 - Result: open
% Result - Port: 79 - Result: closed
% Result - Port: 78 - Result: closed
% Result - Port: 81 - Result: closed
% Result - Port: 82 - Result: closed
% Result - Port: 83 - Result: closed
% Result - Port: 84 - Result: closed
[result(77,closed),result(75,closed),result(76,closed),result(80,open),result(79,closed),result(78,closed),result(81,closed),result(82,closed),result(83,closed),result(84,closed)]
% 41,514 inferences, 0.016 CPU in 63.136 seconds (0% CPU, 2656896 Lips)
true.

@jan

For the debug messages % [Thread 4] Worker: port 75 I was not expecting the thread numbers to be all different as this is using a pool of threads. Is the code correctly reusing the threads from a pool, am I reading the debug messages incorrectly, or is it something else?

I don’t really like the name balance much as its relation to concurrency is a bit far away to me and you can balance practically everything, but I have little clue what a balanced conjunction is.

I do think the behavior is useful and intuitive. I assume this also puts the generator in a separate thread and gives the main thread the task of collecting results?

Although you can bootstrap concurrent_forall/2 on balance/1 it might not be ideal to do so. To me, it seems balance is considerably more complicated and slower due to the fact that we must return and collect the results.

You can also call it concurrent_generate_and_test/2. I wouldn’t mind.
balance/1 is shorter and highlights the side effect that the tests get
trivially balanced over a couple of threads. It’s not more and it’s not less.

Everything else is in your imagination, and contradicts Occam’s razor.
Which is not the worst outcome of the naming, since it shows that
balancing is often associated with more complex solutions.

There is an optimization for balance/1 available, which would make it
more suitable for a bootstrapping of concurrent_forall/2. But I didn’t
implement it yet; I think I had a prototype, and I don’t know whether
it really works. My more general form is:

balance(V1^..Vn^(G, T))

The existential variables are there to regulate what goes into the result queue,
in case you want to hide some extra variables in the test. It can happen that the
result queue degenerates to returning only a true token and errors/completion.

To have something more lightweight than a queue in this situation would speed
up balance/1 and make it more suitable to bootstrap concurrent_forall/2, since
concurrent_forall/2 also only needs this minimal information.
