Building a port scanner

SWI-Prolog (threaded, 64 bits, version 8.3.2) on Windows 10

In creating a port scanner for a web site, tcp_connect/2 is wrapped with setup_call_catcher_cleanup/4. The catcher catches the error and the clause to handle the error is called, but the code does not complete successfully and the error message is printed.

The code uses :- set_prolog_flag(report_error,false). to suppress the printing of the error message so I don’t know why the error message is being printed.

The code runs correctly for an open port, e.g. 80, and causes a timeout exception for a port that is not handled, e.g. 79, but is not completing successfully to display the desired result: Port 79: closed

Example run

?- port_scan_range(80,80).
Port 80: open
true.

?- port_scan_range(79,79).
ERROR: Unhandled exception: Socket error: Connection timed out
ERROR: In:
ERROR:   [13] socket:tcp_connect(<socket>(0000000006D179B0),'140.211.166.101':79)
ERROR:   [12] setup_call_catcher_cleanup(user:tcp_socket(<socket>(0000000006D179B0)),user:tcp_connect(<socket>(0000000006D179B0),...),_22258,user:(...,...)) at c:/program files/swipl/boot/init.pl:562
ERROR:   [10] port_scan_range(79,79) at c:/users/eric/documents/notes/discourse swi-prolog osu osl/osu osl prolog/tcp_test.pl:33
ERROR:    [9] <user>
ERROR: 
ERROR: Note: some frames are missing due to last-call optimization.
ERROR: Re-run your program in debug mode (:- debug.) to get more detail.
Timeout error caught
?- current_prolog_flag(report_error,Flag).
Flag = false.

Code

:- set_prolog_flag(debug_on_error,false).   % Do not drop into debugger on error. Errors and exceptions are different with errors being more severe.
:- set_prolog_flag(report_error,false).     % Do not print error messages to screen.

% -----------------------------------------------------------------------------

port_scan_range(Start,End) :-
    must_be(positive_integer,Start),
    must_be(positive_integer,End),
    Start =< End,
    IP_address = '140.211.166.101',
    between(Start,End,Port),
    port_scan(IP_address,Port,Result),
    format('Port ~w: ~w~n',[Port,Result]).

port_scan(IP_address,Port,Result) :-
    setup_call_catcher_cleanup(
        tcp_socket(Socket),
        % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
        tcp_connect(Socket, IP_address:Port),
        Catcher,
        (
            catcher(Catcher,Result),
            tcp_close_socket(Socket)
        )
    ).

catcher(exit,open).
catcher(
    exception(
        error(socket_error(wsaetimedout, 'Connection timed out'),
        _Context)
    ),
    closed
) :-
    format('Timeout error caught~n',[]).
catcher(Catcher,unknown) :-
    format('Catcher default case~n',[]),
    format('~w~n',[Catcher]).

By changing setup_call_catcher_cleanup/4 to setup_call_cleanup/3 and using catch/3 for the goal it works as desired. Also changed the predicate catcher/2.

port_scan(IP_address,Port,Result) :-
    setup_call_cleanup(
        tcp_socket(Socket),
        catch(
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                Result = open
            ),
            Catcher,
            catcher(Catcher,Result)
        ),
        tcp_close_socket(Socket)
    ).

catcher(
    error(socket_error(wsaetimedout, 'Connection timed out'),_Context),
    closed
) :- !.
catcher(Catcher,unknown) :-
    format('Catcher default case~n',[]),
    format('~w~n',[Catcher]).

Example run

?- port_scan_range(79,79).
Port 79: closed
true.

?- port_scan_range(80,80).
Port 80: open
true.

Jan W., when you get a chance,

Is tcp_close_socket/1 needed? The documentation notes two cases where it is needed, but since this is port scanning and not using tcp_open_socket/3 I am not sure.

Yes, you do not create streams for the socket, so you must close the socket yourself.

The simplest way is probably this:

    catch(setup_call_cleanup(
             tcp_socket(S),
             tcp_connect(S, IP_address:Port),
             tcp_close_socket(S)),
          error(_,_),
          fail).

If this succeeds, the port is open, else it is closed. You may be more restrictive on the errors under which you consider the port closed, but I’m not sure that is needed.
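If you do want to be more restrictive, one option is to catch only socket errors and let everything else propagate. This is a minimal sketch, not the library's prescribed way: port_outcome/2 is a made-up name, and the socket_error/2 shape is taken from the catcher clause earlier in this thread (Windows, 8.3.2), so other platforms or versions may use a different term. The connection attempt is passed in as a goal so the sketch can be tried without a network.

```prolog
% port_outcome(:Connect, -Result)
% Connect is any goal that attempts the connection (hypothetical helper).
% Only socket errors are mapped to `closed`; any other exception, for
% example a type error caused by a bad argument, is not caught here.
port_outcome(Connect, Result) :-
    catch(( call(Connect), Result = open ),
          error(socket_error(_Code, _Message), _Context),
          Result = closed).
```

In the scanner, Connect would be the setup_call_cleanup/3 goal shown above. A typo in the address then surfaces as an error instead of being reported as a closed port.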

Note that the report_error flag is probably not even used anymore (should check) and the debug_on_error flag just deals with whether or not the debugger is started when an error is uncaught. In this case you need catch/3 (note that the various *_cleanup predicates do not catch the error, they just guarantee that resource cleanup is called regardless of the success/error/fail/cut/… that terminates the goal that needs the resource).
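The difference between observing and catching can be seen without any sockets. In this small sketch observe/1 and safe_observe/2 are made-up names: the cleanup handler is told how the goal ended, but an exception still propagates until some catch/3 further out stops it.

```prolog
% observe(:Goal): run Goal; the cleanup handler only reports how Goal
% ended (exit, fail, exception(E), ...). It does not catch anything.
observe(Goal) :-
    setup_call_catcher_cleanup(
        true,
        Goal,
        Catcher,
        format("cleanup saw: ~p~n", [Catcher])).

% safe_observe(:Goal, -Outcome): catch/3 around observe/1 is what turns
% the exception into a normal answer.
safe_observe(Goal, Outcome) :-
    catch(( observe(Goal) -> Outcome = succeeded ; Outcome = failed ),
          error(Formal, _Context),
          Outcome = raised(Formal)).
```

Calling observe(throw(error(demo_error, _))) at the toplevel prints the "cleanup saw" line and then still reports the error, which matches the behaviour in the first example run of this thread. safe_observe/2 with the same goal gives Outcome = raised(demo_error).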


Complete version after suggestions by Jan W. and making port_scan_range/2 a proper failure-driven loop.

port_scan_range(Start,End) :-
    must_be(positive_integer,Start),
    must_be(positive_integer,End),
    Start =< End,
    IP_address = '140.211.166.101',
    between(Start,End,Port),
    port_scan(IP_address,Port,Result),
    format('Port ~w: ~w~n',[Port,Result]),
    fail.
port_scan_range(_,_).

port_scan(IP_address,Port,Result) :-
    catch(
        setup_call_cleanup(
            tcp_socket(Socket),
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                Result = open
            ),
            tcp_close_socket(Socket)
        ),
        error(_,_),
        Result = closed
    ).

Example run

?- port_scan_range(79,80).
Port 79: closed
Port 80: open
true.

I would suggest using forall/2 rather than a failure-driven loop – forall/2 is easier to debug. With a failure-driven loop, you can’t tell if the range part fails (which is fine) or the port_scan part fails (which is a bug).
[BTW, there’s no need for the Start=<End test – between/3 subsumes that.]

port_scan_range(IP_address, Start, End) :-
    forall(port_range(Start, End, Port),
           (   port_scan(IP_address, Port, Result),
               format('Port ~w: ~w~n', [Port, Result])
           )).

port_range(Start, End, Port) :-
    must_be(positive_integer, Start),
    must_be(positive_integer, End),
    between(Start, End, Port).

port_scan(IP_address,Port,Result) :-
    catch(
        setup_call_cleanup(
            tcp_socket(Socket),
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                Result = open
            ),
            tcp_close_socket(Socket)
        ),
        error(_,_),
        Result = closed
    ).
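To make the debugging argument concrete, here is a toy comparison that does not touch the network. check/1 is a hypothetical action with a deliberate bug for N = 3; the two loop predicates are made-up names.

```prolog
% Hypothetical action that wrongly fails for N = 3.
check(N) :- N =\= 3.

% Failure-driven loop: the second clause succeeds no matter what,
% so the failure of check(3) goes unnoticed.
loop_failure_driven :-
    between(1, 5, N),
    check(N),
    fail.
loop_failure_driven.

% forall/2: fails as soon as check/1 fails for some N.
loop_forall :-
    forall(between(1, 5, N), check(N)).
```

Here loop_failure_driven succeeds while loop_forall fails, which is the signal that something inside the loop body is wrong. On the Start =< End remark: between(5, 3, X) simply fails, so the explicit comparison adds nothing.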

For those who know about port scanning, a typical scan covers a thousand or more ports. Also, it is better for a server to return no reply if a port is not available, which on the client side typically results in a timeout when trying to open the port.

Since the timeouts can take a few seconds and multiply that by a thousand, it can take some time.

5 seconds * 1000 = 5,000 seconds
5,000 seconds * (1 minute / 60 seconds) = 83.3 minutes
83.3 minutes = 1 hour 23 minutes

Since port scanning can be done in parallel it is worth the effort to use multithreading.
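A rough way to see what threads buy is a worst-case estimate in which every port times out and the threads work through the ports in batches. This is only a back-of-the-envelope sketch: scan_time_estimate/4 is a made-up name, the 5 second timeout is the figure assumed above, and thread start-up cost is ignored.

```prolog
% scan_time_estimate(+Ports, +Threads, +Timeout, -Seconds)
% Worst case: every port times out, Threads ports are probed at a time.
scan_time_estimate(Ports, Threads, Timeout, Seconds) :-
    Batches is (Ports + Threads - 1) // Threads,   % ceiling division
    Seconds is Batches * Timeout.
```

With one thread, scan_time_estimate(1000, 1, 5, S) gives S = 5000, the figure computed above. With 50 threads the same scan needs 20 batches, so S = 100.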

Update of the code using a thread pool.

Based on test case.

start_scan_pool(Number_of_threads,Options) :-
	thread_pool_create(scan_pool,Number_of_threads,Options).
stop_scan_pool :-
	thread_pool_destroy(scan_pool).

scan_pool(IP_address,Low_port,High_port,Number_of_threads) :-
    start_scan_pool(Number_of_threads,[]),
    time(
        (
            findall(Id,
                (   between(Low_port,High_port,Port),
                    thread_create_in_pool(scan_pool,port_scan(IP_address,Port),Id,[])
                ), Ids),
            join_all(Ids)
        )
    ),
    stop_scan_pool.

% -----------------------------------------------------------------------------

join_all([]).
join_all([H|T]) :-
	thread_join(H, Status),
	assertion(Status == true),
	join_all(T).

% -----------------------------------------------------------------------------

port_scan(IP_address,Port) :-
    catch(
        setup_call_cleanup(
            tcp_socket(Socket),
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                Result = open
            ),
            tcp_close_socket(Socket)
        ),
        error(_,_),
        Result = closed
    ),
    format('Port ~w: ~w~n', [Port, Result]).

Example run.

?- scan_pool('140.211.166.101',1,9,3).
Port 2: closed
Port 3: closed
Port 1: closed
Port 5: closed
Port 4: closed
Port 6: closed
Port 7: closed
Port 8: closed
Port 9: closed
% 40,303 inferences, 0.031 CPU in 63.044 seconds (0% CPU, 1289696 Lips)
true.

A more comprehensive example run.

?- scan_pool('140.211.166.101',1,1023,50).
...
% 37,861 inferences, 0.047 CPU in 441.204 seconds (0% CPU, 807701 Lips)
true.

EDIT

More threads means less time.

1024 threads checking 65536 ports in ~23 minutes.

?- scan_pool('140.211.166.101',1,65536,1024).
Port 22: open
Port 80: open
Port 443: open
% 2,464,802 inferences, 6.891 CPU in 1369.216 seconds (1% CPU, 357704 Lips)
true.

8192 threads checking 65536 ports in ~4.5 minutes.

?- scan_pool('140.211.166.101',1,65536,8192).
Port 80: open
Port 22: open
Port 443: open
% 2,424,842 inferences, 6.469 CPU in 259.009 seconds (2% CPU, 374855 Lips)
true.

@anniepoo At https://www.swi-prolog.org/pldoc/man?section=threadpool you asked for an example. :mage:


Or just use concurrent_forall/3 :wink:

[Newly added: see Size limitations on qsave_program?]


Thanks. :grinning:

I figured the next and last step was to use Thread communication to pass the status of the port back and remove the format/2 from port_scan/2. Will add concurrent_forall/3 to the todo list and see what happens.

Looks like this is out.

In reading the documentation

The maximum number of threads defined is the amount of cores available.

Since the number of cores is now typically about 8, 12 or 16 on laptops and my other example ran cleanly with 50 threads, concurrent_forall/3 will probably be much slower, but then again Jan W. is known to change the code based on forum posts. :grinning:


EDIT

There are two versions of concurrent_forall/3. One is in the standard SWI-Prolog distribution and one is in the package “xlibrary”.


The code seems to say something different – that the # of threads defaults to the # of CPUs but otherwise can be anything. @jan ?

    (   option(threads(Jobs), Options)
    ->  true
    ;   current_prolog_flag(cpu_count, Jobs)
    ),


When I went looking for concurrent_forall/3 I found it first in Package “xlibrary” but in searching for your example code found the version Jan W. created. That explains the differences.

Now looking at Jan W. version. :wink:

While I am reading your replies in this post, for the particular current need of the port scanner there is no need to balance out the threads. Each thread is trying to open a port and return only one of two results, open or closed and when the port is open the response is instantaneous and the thread is available in the pool, when the port is closed it is due to a timeout which takes ~5 seconds. As only 3 out of 65536 ports are open, most of the threads wait for the timeout then return to the pool. If you start the program with 50 threads in the pool and 1000 ports you will see the results typically get dumped out in a batch of 50 then pause, a batch of 50 then pause, etc. The open ports get dumped out almost immediately. So there is nothing to balance as the threads are typically completing in sync.

I do agree that balancing is a desired trait for threading that should be considered when multithreading.

For some other tests that I have to write, they might benefit from balancing and I will revisit your replies. :grinning:

The next version makes use of message queues to pass the result of the port scan back to the main thread. This also adds debug messages from library(debug).

start_scan_pool_02(Number_of_threads,Options) :-
	thread_pool_create(scan_pool,Number_of_threads,Options).
stop_scan_pool_02 :-
	thread_pool_destroy(scan_pool).

scan_pool_02(IP_address,Low_port,High_port,Number_of_threads) :-
    message_queue_create(Result_queue),
    start_scan_pool_02(Number_of_threads,[]),
    time(
        (
            findall(Id,
                (   between(Low_port,High_port,Port),
                    thread_create_in_pool(scan_pool,worker(Result_queue,IP_address,Port),Id,[])
                ), Ids),
            join_all(Ids),
            gather_results(Result_queue,Port_results),
            format('~w~n',[Port_results])  % Peter once showed me a better predicate to print structures but I can't recall it at the moment.  Found it: print_term/2
        )
    ),
    stop_scan_pool_02.

worker(Result_queue,IP_address,Port) :-
    debug(threads, 'Worker: port ~w', [Port]),
    port_scan_02(IP_address,Port,Port_result),
    thread_send_message(Result_queue, result(Port,Port_result)).

port_scan_02(IP_address,Port,Result) :-
    catch(
        setup_call_cleanup(
            tcp_socket(Socket),
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                Result = open
            ),
            tcp_close_socket(Socket)
        ),
        error(_,_),
        Result = closed
    ).

gather_results(Result_queue,[result(Port,Port_result)|Port_results]) :-
    thread_get_message(Result_queue,result(Port,Port_result),[timeout(0)]),
    debug(threads, 'Result - Port: ~w - Result: ~w', [Port,Port_result]),
    gather_results(Result_queue,Port_results),
    !.
gather_results(_,[]).

Example run

?- debug(threads).
Warning: threads: no matching debug topic (yet)
true.

?- scan_pool_02('140.211.166.101',75,84,3).
% [Thread 4] Worker: port 75
% [Thread 5] Worker: port 76
% [Thread 6] Worker: port 77
% [Thread 7] Worker: port 78
% [Thread 8] Worker: port 79
% [Thread 9] Worker: port 80
% [Thread 10] Worker: port 81
% [Thread 11] Worker: port 82
% [Thread 12] Worker: port 83
% [Thread 13] Worker: port 84
% Result - Port: 77 - Result: closed
% Result - Port: 75 - Result: closed
% Result - Port: 76 - Result: closed
% Result - Port: 80 - Result: open
% Result - Port: 79 - Result: closed
% Result - Port: 78 - Result: closed
% Result - Port: 81 - Result: closed
% Result - Port: 82 - Result: closed
% Result - Port: 83 - Result: closed
% Result - Port: 84 - Result: closed
[result(77,closed),result(75,closed),result(76,closed),result(80,open),result(79,closed),result(78,closed),result(81,closed),result(82,closed),result(83,closed),result(84,closed)]
% 41,514 inferences, 0.016 CPU in 63.136 seconds (0% CPU, 2656896 Lips)
true.
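Since the number of results is known in advance (one per port), an alternative to polling the queue with timeout(0) is to read exactly that many messages. This is a sketch under that assumption, with made-up names (fake_worker/2, start_worker/3, collect_n/3, join_quietly/1, demo_collect/3) and a fake worker that reports every port as closed so that no network is needed.

```prolog
% Hypothetical worker: pretends every port is closed, no network involved.
fake_worker(Queue, Port) :-
    thread_send_message(Queue, result(Port, closed)).

start_worker(Queue, Port, Id) :-
    thread_create(fake_worker(Queue, Port), Id, []).

% collect_n(+Queue, +N, -Messages): block until exactly N messages arrived.
collect_n(_, 0, []) :- !.
collect_n(Queue, N, [Message|Messages]) :-
    N > 0,
    thread_get_message(Queue, Message),
    N1 is N - 1,
    collect_n(Queue, N1, Messages).

join_quietly(Id) :-
    thread_join(Id, _Status).

demo_collect(Low, High, Results) :-
    numlist(Low, High, Ports),
    length(Ports, Count),
    message_queue_create(Queue),
    maplist(start_worker(Queue), Ports, Ids),
    collect_n(Queue, Count, Unsorted),
    maplist(join_quietly, Ids),
    message_queue_destroy(Queue),
    msort(Unsorted, Results).     % sort by port for stable output
```

The trade-off is that collect_n/3 blocks forever if a worker dies before sending its message, so a real worker should send a result from its error path as well.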

@jan

For the debug messages % [Thread 4] Worker: port 75 I was not expecting the thread numbers to be all different as this is using a pool of threads. Is the code correctly reusing the threads from a pool, am I reading the debug messages incorrectly or something else?

I don’t really like the name balance much as its relation to concurrency is a bit far away to me and you can balance practically everything, but I have little clue what a balanced conjunction is.

I do think the behavior is useful and intuitive. I assume this also puts the generator in a separate thread and gives the main thread the task of collecting results?

Although you can bootstrap concurrent_forall/2 on balance/1 it might not be ideal to do so. To me, it seems balance is considerably more complicated and slower due to the fact that we must return and collect the results.

With multithreaded code I lean toward having the worker threads knowing next to nothing of the outside world. In my earlier versions the worker threads would use format/2 which in my mind says the workers know about the display device. By passing the results back in a message queue, this removes that dependency/knowledge. In reading your reply it seems that there is another view to be learned.

The final goal of this specific port scanner is to collect the results and persist them to a file using library(persistency). My plan was to collect the results from the message queue then pass and persist them to a file in the main thread, but in reading your reply it seems it should be done in the worker thread. Feedback desired.

A thread_pool could be a misleading name. It is a wrapper around thread_create/3 that limits the number of threads you can create and makes subsequent calls to create a thread either fail with an error or wait until some thread in the pool has stopped. Thread IDs can only be reused after you join the ended thread.

I’d use concurrent_forall/3 using the threads(Count) option and assert the port status in the database to be collected later. Properly managing a set of threads and ensuring they are properly reclaimed is quite complex. Since some time SWI-Prolog will GC forgotten threads, but it can take a while before it realises they are forgotten.
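A minimal sketch of that approach, with the network probe replaced by a stand-in so it can be run anywhere. The names port_status/2, probe/2 and scan_collect/4 are made up for illustration; only concurrent_forall/3 and its threads(Count) option come from library(thread).

```prolog
:- use_module(library(thread)).   % concurrent_forall/3

:- dynamic port_status/2.

% Hypothetical stand-in for the real connection attempt:
% even ports are reported open, odd ports closed.
probe(Port, Status) :-
    (   Port mod 2 =:= 0
    ->  Status = open
    ;   Status = closed
    ).

% scan_collect(+Low, +High, +Threads, -Results)
% Workers assert their result; the caller collects the facts afterwards.
scan_collect(Low, High, Threads, Results) :-
    retractall(port_status(_, _)),
    concurrent_forall(
        between(Low, High, Port),
        ( probe(Port, Status),
          assertz(port_status(Port, Status)) ),
        [threads(Threads)]),
    findall(Port-Status, port_status(Port, Status), Unsorted),
    keysort(Unsorted, Results).   % facts arrive in completion order
```

The workers never talk to the caller directly; the dynamic predicate is the only shared state, and thread creation and joining are left to concurrent_forall/3.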


The next version switches to using concurrent_forall/3. Since this predicate is so new (3 days old) and I run on Windows, I installed the Windows 64-bit version of the daily build.

SWI-Prolog (threaded, 64 bits, version 8.3.2-198-gd839164c7)

Note: This code also moved executing debug/1 on the command line into a Prolog directive, :- debug(concurrent)., which can easily be commented out. If you peruse the SWI-Prolog source code on GitHub you will often find these lines commented out.

Also notice how much simpler the code becomes when using concurrent_forall/3.

The hardest part about writing this was understanding concurrent_forall(:Generate, :Test): how did Generate and Test align with my existing code? So instead of trying to understand the code from the top down, I looked at the critical predicate common to all of this, which is thread_create/2. In concurrent_forall/2 it appears in the line maplist(thread_create(fa_worker(Q, Me, Templ, Test)), Workers), and from there I figured out which variables I could set and what they needed. The only one that can be set by calling concurrent_forall/3 is Test, which needs to be port_scan(IP_address,Port); after that, identifying what the rest of concurrent_forall needed was easy.

For this use of concurrent_forall, instead of concurrent_forall(:Generate, :Test) I think of it as concurrent_forall(:Generate unique thread values, :Call thread with unique values).

:- debug(concurrent).

concurrent_scan(IP_address,Low_port,High_port,Number_of_threads) :-
    concurrent_forall(
        between(Low_port,High_port,Port),
        port_scan(IP_address,Port),
        [threads(Number_of_threads)]
    ).

port_scan(IP_address,Port) :-
    catch(
        setup_call_cleanup(
            tcp_socket(Socket),
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                format('Port ~w: open~n', [Port])
            ),
            tcp_close_socket(Socket)
        ),
        error(_,_),
        true
    ).

Example run

?- concurrent_scan('140.211.166.101',75,84,3).
% [Thread 5] Running test user:port_scan('140.211.166.101',77)
% [Thread 3] Running test user:port_scan('140.211.166.101',76)
% [Thread 4] Running test user:port_scan('140.211.166.101',75)
% [Thread 5] Running test user:port_scan('140.211.166.101',78)
% [Thread 4] Running test user:port_scan('140.211.166.101',79)
% [Thread 3] Running test user:port_scan('140.211.166.101',80)
Port 80: open
% [Thread 3] Running test user:port_scan('140.211.166.101',81)
% [Thread 5] Running test user:port_scan('140.211.166.101',82)
% [Thread 4] Running test user:port_scan('140.211.166.101',83)
% [Thread 3] Running test user:port_scan('140.211.166.101',84)
true.

NB The threads are being reused this time.

A more comprehensive example. (debug/1 was commented out.)

?- time(concurrent_scan('140.211.166.101',1,65536,8192)).
Port 80: open
Port 22: open
Port 443: open
% 196,643 inferences, 0.500 CPU in 174.439 seconds (0% CPU, 393286 Lips)
true.

8192 threads checking 65536 ports in ~3 minutes.


The comment Jan W. made about

and that I asked about now makes more sense. If you read the code for concurrent_forall you will notice that passing messages back would require adding more complexity to something that is already very complex. So I take it to mean that if you want to use concurrent_forall then use it as designed, even if it is breaking some rules of thumb, such as having the threads know as little as possible about the outside world.

Just wondering what it would mean to write this in SWI-Prolog, I got to these two files.

test_balance.pl (1.1 KB) balance.pl (4.2 KB)

The implementation is rather tricky. I did decide on an arity 2 version for now to make clear what the generator and tester are. Not yet sure what to do with it. When matured and with a proper name, add it to library(thread), I guess.


The next version adds library(persistency).

:- use_module(library(persistency)).

:- working_directory(_,'C:\\Users\\Eric\\Documents\\Port Scans').

:- persistent
    port_scan_result(port:integer,result:atom).

:- initialization(db_attach('port_scan_result.journal', [])).

exists_port_scan_result(Request,Response) :-
    port_scan_result(Request,Response).

add_port_scan_result(Request,Response) :-
    with_mutex(port_scan_result_journal, assert_port_scan_result(Request,Response)).

concurrent_scan_02(IP_address,Low_port,High_port,Number_of_threads) :-
    concurrent_forall(
        between(Low_port,High_port,Port),
        port_scan_02(IP_address,Port),
        [threads(Number_of_threads)]
    ).

port_scan_02(IP_address,Port) :-
    catch(
        setup_call_cleanup(
            tcp_socket(Socket),
            (
                % Open stream socket based on TCP/IP which uses IP address and port number, i.e. INET socket
                tcp_connect(Socket, IP_address:Port),
                (
                    exists_port_scan_result(Port,open), !
                ;
                    add_port_scan_result(Port,open)
                )
            ),
            tcp_close_socket(Socket)
        ),
        error(_,_),
        (
            exists_port_scan_result(Port,closed), !
        ;
            add_port_scan_result(Port,closed)
        )
    ).

Example run.

NB halt. is needed so that the data is written to the file. Until halt all of the data resides as facts in the Prolog database.

?- concurrent_scan_02('140.211.166.101',75,84,3).
true.

?- halt.

File: port_scan_result.journal

created(1593621240.60769).
assert(port_scan_result(77,closed)).
assert(port_scan_result(76,closed)).
assert(port_scan_result(75,closed)).
assert(port_scan_result(80,open)).
assert(port_scan_result(78,closed)).
assert(port_scan_result(79,closed)).
assert(port_scan_result(81,closed)).
assert(port_scan_result(83,closed)).
assert(port_scan_result(82,closed)).
assert(port_scan_result(84,closed)).

A more comprehensive example.

?- time(concurrent_scan_02('140.211.166.101',1,65536,8192)).
% 196,642 inferences, 0.547 CPU in 175.149 seconds (0% CPU, 359574 Lips)
true.

8192 threads checking 65536 ports in ~3 minutes.