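Continuing from the previous post, this is another session of reading the boofuzz source code.

Up to now I had been reading the source statically, but my understanding was not progressing much, so from this post on I will step through the processing in order using the VS Code debugger.

Starting a simple HTTP server

Since boofuzz has to actually run, the target HTTP server needs to be started beforehand.

Prepare a working directory with the following layout.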

py_httpserver
├ server.py
└ index.html

The code in server.py is as follows.

from http.server import SimpleHTTPRequestHandler, HTTPServer

server = HTTPServer(('', 80), SimpleHTTPRequestHandler)
server.serve_forever()

Running server.py starts a bare-bones HTTP server. (Because it binds to port 80, it has to be run with sudo.)
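If running as root is inconvenient, the same kind of server can be bound to an unprivileged port. The following is only a minimal sketch and not part of the original setup: the use of port 0 (let the OS pick a free port), the background thread, and the http.client check are assumptions added for illustration. A fuzzing script would then have to point at that port instead of 80.

```python
import http.client
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Port 0 asks the OS for any free port, so no root privileges are needed.
# (Assumption for illustration; the article itself uses port 80.)
server = HTTPServer(("127.0.0.1", 0), SimpleHTTPRequestHandler)
port = server.server_address[1]

# Serve in a background thread so this script can also send a test request.
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

# Send one GET request to confirm the server is answering.
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/")
status = conn.getresponse().status
conn.close()
print("server on port", port, "answered with HTTP", status)

# Stop the serving loop and release the socket.
server.shutdown()
server.server_close()
```

SimpleHTTPRequestHandler serves files from the current working directory, so the script should be started from the directory that contains index.html.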

Debugging fuzz()

Let's step through fuzz() in the debugger.

_main_fuzz_loop() (1)

self._start_target(self.targets[0])

Let's take another look at _start_target().

_start_target()

def _start_target(self, target):
    started = False
    for monitor in target.monitors:
        if monitor.start_target():
            started = True
            break
    if started:
        for monitor in target.monitors:
            monitor.post_start_target(target=target, fuzz_data_logger=self._fuzz_data_logger, session=self)

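In the article before last I wrongly stated that target.monitors is an empty list; in fact, the list held a single CallbackMonitor instance.

The for loop therefore runs once.

monitor.start_target() is a method inherited from BaseMonitor, the superclass of CallbackMonitor, and it does nothing but return False. As a result, started stays False, post_start_target() is never called, and _start_target() effectively does nothing here.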
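The control flow described above can be reproduced with stand-in classes. This is a sketch under stated assumptions: StubMonitor, StubTarget, and the calls list are hypothetical names invented for illustration, and only the loop structure mirrors _start_target().

```python
class StubMonitor:
    """Hypothetical stand-in for a monitor; not boofuzz's real class."""

    def __init__(self, can_start):
        self.can_start = can_start
        self.calls = []  # records which methods were invoked

    def start_target(self):
        self.calls.append("start_target")
        return self.can_start

    def post_start_target(self, target=None, fuzz_data_logger=None, session=None):
        self.calls.append("post_start_target")


class StubTarget:
    """Hypothetical stand-in that only carries a list of monitors."""

    def __init__(self, monitors):
        self.monitors = monitors


def start_target(target):
    """Same loop structure as _start_target(), without the session arguments."""
    started = False
    for monitor in target.monitors:
        if monitor.start_target():
            started = True
            break
    if started:
        for monitor in target.monitors:
            monitor.post_start_target(target=target)
    return started


passive = StubMonitor(can_start=False)  # behaves like the case traced above
active = StubMonitor(can_start=True)    # a monitor that does start the target

passive_result = start_target(StubTarget([passive]))
active_result = start_target(StubTarget([active]))
print(passive_result, passive.calls)
print(active_result, active.calls)
```

With the monitor that returns False, only start_target is recorded, which matches what the debugger showed.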

_main_fuzz_loop() (2)

for mutation_context in fuzz_case_iterator:

On this line, the generator function _generate_mutations_indefinitely() is consumed as an iterator.

_generate_mutations_indefinitely()

def _generate_mutations_indefinitely(self, max_depth=None, path=None):
    """Yield MutationContext with n mutations per message over all messages, with n increasing indefinitely."""
    depth = 1
    while max_depth is None or depth <= max_depth:
        valid_case_found_at_this_depth = False
        for m in self._generate_n_mutations(depth=depth, path=path):
            valid_case_found_at_this_depth = True
            yield m
        if not valid_case_found_at_this_depth:
            break
        depth += 1

Because max_depth is None, the while loop is entered.

The generator function _generate_n_mutations() is then called with depth=1 and path=None.
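The depth-escalation pattern used here can be tried in isolation. The following is a toy sketch, not boofuzz code: generate_n() is a hypothetical stand-in for _generate_n_mutations() that simply stops producing cases beyond an assumed limit of 2, which is what ends the outer loop.

```python
def generate_n(depth, limit=2):
    """Hypothetical stand-in: yields two cases per depth, nothing past `limit`."""
    if depth > limit:
        return
    for index in range(2):
        yield (depth, index)


def generate_indefinitely(max_depth=None):
    """Same loop structure as _generate_mutations_indefinitely()."""
    depth = 1
    while max_depth is None or depth <= max_depth:
        found_at_this_depth = False
        for case in generate_n(depth):
            found_at_this_depth = True
            yield case
        if not found_at_this_depth:
            break  # nothing left to try at this depth, so deeper ones are skipped
        depth += 1


unbounded = list(generate_indefinitely())
capped = list(generate_indefinitely(max_depth=1))
print(unbounded)
print(capped)
```

Even with max_depth=None the loop terminates, because a depth that yields no case triggers the break.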

_generate_n_mutations()

def _generate_n_mutations(self, depth, path):
    """Yield MutationContext with n mutations per message over all messages."""
    for path in self._iterate_protocol_message_paths(path=path):
        for m in self._generate_n_mutations_for_path(path, depth=depth):
            yield m

_iterate_protocol_message_paths() is called with path=None.

_iterate_protocol_message_paths()

def _iterate_protocol_message_paths(self, path=None):
    """
    Iterates over protocol and yields a path (list of Connection) leading to a given message).

    Args:
        path (list of Connection): Provide a specific path to yield only that specific path.

    Yields:
        list of Connection: List of edges along the path to the current one being fuzzed.

    Raises:
        exception.SulleyRuntimeError: If no requests defined or no targets specified
    """
    # we can't fuzz if we don't have at least one target and one request.
    if not self.targets:
        raise exception.SullyRuntimeError("No targets specified in session")

    if not self.edges_from(self.root.id):
        raise exception.SullyRuntimeError("No requests specified in session")

    if path is not None:
        yield path
    else:
        for x in self._iterate_protocol_message_paths_recursive(this_node=self.root, path=[]):
            yield x

Because path is None, _iterate_protocol_message_paths_recursive(this_node=self.root, path=[]) is run as an iterator. Note that self.root is an instance of the pgraph.Node class.

_iterate_protocol_message_paths_recursive()

def _iterate_protocol_message_paths_recursive(self, this_node, path):
    """Recursive helper for _iterate_protocol.

    Args:
        this_node (node.Node): Current node that is being fuzzed.
        path (list of Connection): List of edges along the path to the current one being fuzzed.

    Yields:
        list of Connection: List of edges along the path to the current one being fuzzed.
    """
    # step through every edge from the current node.
    for edge in self.edges_from(this_node.id):
        # keep track of the path as we fuzz through it, don't count the root node.
        # we keep track of edges as opposed to nodes because if there is more then one path through a set of
        # given nodes we don't want any ambiguity.
        path.append(edge)
        message_path = self._message_path_to_str(path)
        logging.debug("fuzzing: {0}".format(message_path))
        self.fuzz_node = self.nodes[path[-1].dst]

        yield path
        # recursively fuzz the remainder of the nodes in the session graph.
        for x in self._iterate_protocol_message_paths_recursive(self.fuzz_node, path):
            yield x
    # finished with the last node on the path, pop it off the path stack.
    if path:
        path.pop()

As described below, edges_from() returns a list of Connection instances.
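To see which paths this recursion produces, it can be run against a small hand-made graph. This is a toy sketch under assumptions: ToyEdge, TOY_EDGES, outgoing(), and the node names are hypothetical, and yield from replaces the inner for loop with the same effect. One detail worth noticing is that the same list object is yielded every time and mutated in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyEdge:
    """Hypothetical stand-in for a Connection: just a source and a destination."""
    src: str
    dst: str


# Toy graph: root -> A -> B, and root -> C.
TOY_EDGES = [ToyEdge("root", "A"), ToyEdge("A", "B"), ToyEdge("root", "C")]


def outgoing(node_id):
    """Edges that leave the given node."""
    return [edge for edge in TOY_EDGES if edge.src == node_id]


def iterate_paths(node_id, path):
    """Same structure as the recursive helper, without logging or fuzz_node."""
    for edge in outgoing(node_id):
        path.append(edge)
        yield path  # the very same list object on every yield
        yield from iterate_paths(edge.dst, path)
    # done with this node: drop the edge that led here (nothing to drop at the root)
    if path:
        path.pop()


def describe(path):
    return "->".join(edge.dst for edge in path)


# Consuming each path immediately gives the expected sequence.
names = [describe(path) for path in iterate_paths("root", [])]
# Collecting the yielded lists instead keeps three references to one list,
# which is empty again once the generator has finished.
aliased = list(iterate_paths("root", []))
print(names)
print(aliased)
```

A caller that wants to keep a path for later therefore has to copy it, for example with list(path).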

edges_from()

def edges_from(self, edge_id):
    """
    Enumerate the edges from the specified node.

    @type  edge_id: Mixed
    @param edge_id: Identifier of node to enumerate edges from

    @rtype:  list
    @return: List of edges from the specified node
    """

    return [edge_value for edge_value in list(self.edges.values()) if edge_value.src == edge_id]

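It enumerates the edges leaving the specified node (a literal translation of the docstring). Despite its name, the edge_id parameter is the identifier of a node.

self.edges is a dictionary whose keys are numbers and whose values are Connection instances.

Connection is a subclass of pgraph.Edge.

The one-line return statement takes the Connection instances stored as values in the Session instance's edges dictionary and returns a list of those whose src attribute equals edge_id.

At this point the edges dictionary of the Session instance appears to hold only one entry, so the return value is a list containing a single Connection instance.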
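The filtering done by that return statement can be checked with a plain dictionary. This is a minimal sketch, assuming made-up numeric ids and SimpleNamespace objects in place of real Connection instances; only the src comparison mirrors edges_from().

```python
from types import SimpleNamespace

# Hypothetical edge table: key = edge id, value = object with src and dst.
edges = {
    1: SimpleNamespace(src=0, dst=10),
    2: SimpleNamespace(src=10, dst=20),
    3: SimpleNamespace(src=0, dst=30),
}


def edges_from(node_id):
    """Keep only the edges whose source is the given node."""
    return [edge for edge in edges.values() if edge.src == node_id]


from_root = [edge.dst for edge in edges_from(0)]
from_leaf = edges_from(20)
print(from_root)
print(from_leaf)
```

In Python 3 the extra list() around self.edges.values() in the original is not required for the comprehension to work; iterating over the values view directly gives the same result.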

_iterate_protocol_message_paths_recursive() (2)

path.append(edge)

message_path = self._message_path_to_str(path)

The Connection instance edge is appended to path, which was an empty list.

_message_path_to_str() is defined as follows.

def _message_path_to_str(self, message_path):
    return "->".join([self.nodes[e.dst].name for e in message_path])

The return value is 'HTTP-Request'.
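The join can be tried with stand-in objects to see what a longer path would look like. In this sketch the nodes dictionary, the numeric ids, and the second node name 'Follow-Up' are hypothetical; only 'HTTP-Request' comes from the debugging session.

```python
from types import SimpleNamespace

# Hypothetical node table: key = node id, value = object with a name.
nodes = {
    1: SimpleNamespace(name="HTTP-Request"),
    2: SimpleNamespace(name="Follow-Up"),  # invented second message
}


def message_path_to_str(message_path):
    """Join the names of the destination nodes along the path."""
    return "->".join(nodes[edge.dst].name for edge in message_path)


path = [SimpleNamespace(src=0, dst=1)]
single = message_path_to_str(path)

path.append(SimpleNamespace(src=1, dst=2))
double = message_path_to_str(path)
print(single)
print(double)
```

Only destination nodes are named, so the root node never appears in the string.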

logging.debug("fuzzing: {0}".format(message_path))
self.fuzz_node = self.nodes[path[-1].dst]

yield path

Continuing to read the code this way was making it increasingly hard to understand boofuzz, so I will stop here for now.

Bonus

While debugging somewhat haphazardly, I found what look like the seed values for fuzzing in boofuzz/primitives/string.py.

_fuzz_library = [
    "!@#$%%^#$%#$@#$%$$@#$%^^**(()",
    "",  # strings ripped from spike (and some others I added)
    "$(reboot)",
    "$;reboot",
    "%00",
(snip)

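From this I infer that boofuzz fuzzes by preparing, in advance, strings that are likely to trigger vulnerabilities and sending them one after another. (This may not be specific to boofuzz.)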
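The dictionary-style approach guessed at above can be sketched in a few lines. This is not how boofuzz builds its requests: the request template, the mutations() helper, and the shortened FUZZ_LIBRARY (three entries copied from the excerpt) are assumptions made for illustration only.

```python
# Three entries copied from the excerpt of _fuzz_library.
FUZZ_LIBRARY = ["$(reboot)", "$;reboot", "%00"]

# Hypothetical request template with one slot for the fuzzed value.
TEMPLATE = "GET /{} HTTP/1.1\r\n\r\n"


def mutations(template, library):
    """Yield one request per library entry, substituted into the template."""
    for candidate in library:
        yield template.format(candidate)


payloads = list(mutations(TEMPLATE, FUZZ_LIBRARY))
for payload in payloads:
    print(repr(payload))
```

Each library entry yields exactly one test case, so the number of cases grows with the size of the library rather than with any feedback from the target.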

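Impressions and next steps

At the moment I do not even adequately understand what the individual functions do, so reading on through the code without a clear goal does not seem like a good approach.

I would like to consult documentation, but boofuzz does not have much developer-oriented documentation, so I am unsure how to proceed.

Personally, I want to know:

how the fuzzer recognizes the protocol
how the fuzzer mutates data
how the fuzzer judges whether fuzzing was effective

So, with the first goal of understanding how the fuzzer recognizes the protocol, I plan to reread the code starting around session.connect(req) in http_simple.py.

Also, the main boofuzz classes inherit from classes in pgraph, so reading the code with the graph structure in mind should make it somewhat easier to follow. (See the figure below.)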

Reference sites

[Python] How to debug in VSCode, including external modules and libraries

Techniques for reading source code (cheat sheet)

satumaimoの備忘録 (satumaimo's notes)

Mostly personal memos

boofuzz code reading (3)