REAPOFF design documentation

scudette (scudette@users.sourceforge.net)

Introduction

REAPOFF is a proxying, content-filtering firewall. Proxying firewalls are sometimes called application level gateways (ALGs). ALGs differ from packet filtering firewalls in that they allow a detailed security policy to be enforced at the application protocol level.

ALGs ensure that communications occur in accordance with a specified protocol. Because an ALG understands the protocol, it can deny connections which do not use the correct protocol. In addition, because the ALG parses the protocol, it can enforce very fine-grained control over individual transactions.

Example

Suppose the organizational security policy allows outgoing web access, but not outgoing SMTP access. A packet filtering firewall will allow outbound connections on port 80 (HTTP) and forbid outbound connections on port 25 (SMTP). However, port allocations are only a convention. In practice nothing stops a malicious user from setting up a mail server on port 80 and sending outgoing mail to this port, thus bypassing the security policy.

If an ALG is used in this case, port 80 connections must comply with the HTTP protocol, while port 25 must comply with the SMTP protocol. If the malicious user attempts to talk to their mail server listening on port 80 using SMTP, the ALG will terminate the connection due to a protocol violation.
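
As a rough illustration of the protocol check described above (this is not REAPOFF's actual implementation, which is driven by its rule files), the following Python sketch tests whether the first line received on a port 80 connection looks like an HTTP request line. The function name and the list of accepted methods are assumptions made for this example only.

```python
import re

# Hypothetical set of methods the policy accepts; not taken from REAPOFF.
ALLOWED_METHODS = ("GET", "HEAD", "POST")

# An HTTP/1.x request line has the shape: METHOD SP request-target SP HTTP/1.x
REQUEST_LINE = re.compile(r"^(?P<method>[A-Z]+) \S+ HTTP/1\.[01]\r?\n?$")

def looks_like_http_request(first_line: str) -> bool:
    """Return True if first_line is a plausible, permitted HTTP request line."""
    match = REQUEST_LINE.match(first_line)
    return bool(match) and match.group("method") in ALLOWED_METHODS
```

An SMTP command such as `HELO mail.example.com` does not have this shape, so a gateway applying a check of this kind to port 80 would drop the connection instead of relaying it.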

In addition, the ALG can enforce fine-grained controls over the use of the HTTP protocol, for example forbidding POST operations or restricting access to certain sites. REAPOFF is also capable of filtering HTML pages to remove JavaScript or ActiveX. REAPOFF can look into the data stream and stop the transfer of forbidden content, such as executables.

Project goals

REAPOFF was designed to achieve the following technical goals:

Programmable - REAPOFF is built around a programmable core module and can be made to support arbitrary protocols. This is important because new functionality can be added at any time by adding new rules, for example in order to defend against a new vulnerability.
Regular Expression Support - RE support is really the heart behind the power of REAPOFF. The RE engine is used for making decisions about network traffic, by using conditions. In addition REs are used to substitute data within the data stream, and hence provide filtering of dangerous constructs, rather than merely denying those requests.
Small footprint - REAPOFF consists of a small number of very generic proxies. By writing these proxies in C, and optimizing their memory footprint it is possible to run REAPOFF on almost any machine. The resources required are very limited. In fact the ultimate aim is to run REAPOFF on a mini-distribution such as Trinux, from a ramdisk or a floppy disk¹.
Fast - Many other proxy products have large CPU overheads in processing the request. Because REAPOFF is written in C and has compiled internal list structures for storing configuration data, its performance is very high indeed. However, since REAPOFF does use regular expressions it is possible to write rules which severely slow down its performance. Hopefully by offering a flexible configuration language it should be possible to optimize the performance of the rules. (See section 7).
Generic and Flexible - REAPOFF relies heavily on code reuse. The idea is to write the actual proxy executable as a programmable proxy, controlled entirely from the configuration file. This way it is possible to easily tailor individual functionality to the required set by editing the configuration file. It is also possible to extend the functionality of each proxy by adding more rules. The aim is to support a full featured configuration language, whilst still maintaining an efficient run time response and a small memory footprint.
Ease of use - REAPOFF aims to be easy to use and intuitive. Although the configuration language is fully capable, there is an easier way to configure REAPOFF. The GUI provides a set of pre-defined configuration rules which administrators can use to quickly and easily construct new firewall configurations. REAPOFF's GUI presents configurations in an intuitive and clear policy based fashion, making auditing the firewall simple.

Installation

After obtaining the source distribution, unpack into a directory and type make to compile the source.

If you also want to use the GUI (recommended), change directory into the gui directory and type ./configure then make.

The deploy directory will contain all the executables and configuration files you will need. If you intend to run the firewall on a different machine from the one used for compiling, you can copy this directory to the target machine.

Graphical User Interface

Despite the fact that REAPOFF is very efficient and powerful, most users prefer to configure their firewalls from some sort of graphical user interface. The GUI is there simply to help the user in configuring the firewall and is generally not a required feature. The following sections will examine how to edit configuration files by hand, but in the present section we will explain the use of the GUI as the most convenient way of generating REAPOFF configuration files.

The basic idea is that the GUI is completely separate from the configuration of the actual firewall. Thus it is possible to use a GUI on a workstation and generate a REAPOFF installation for a completely different system. By default the GUI will create a complete, pre-configured installation of REAPOFF in the ./deploy directory.

The GUI uses a template file called "template.xml". This file contains all the rules currently written, in XML format. The rules are classified according to a family. For example, all rules pertaining to HTTP belong to a family called HTTP. This classification is a guide only, although it is probably not a good idea to mix and match rules between families. The family "general" contains rules which can generally be used on any proxy.

**Figure 1:** Screen shot of REAPOFF GUI

Figure 1 shows the main GUI screen, which has a number of parts. The leftmost pane shows the currently configured proxies in a tree view, together with the rules defined for each proxy. The top right pane shows a description of the currently selected rule or proxy. This description helps the user decide whether they want this rule present. The GUI structure is explained in more detail in the following sections.

Some rules take variables, which allow the user to configure the rule specifically for their situation. For example, the above screen shot shows a rule which accepts three variables. A variable may contain more than one value, by having multiple lines of text. For example, in the above screen shot the variable "string matches" accepts multiple arguments, and each match is placed on its own line.

**Figure 2:** Template dialog box

Figure 2 shows the template dialog box which is presented whenever a new rule is added. The user can then select which rule they wish to add to the current proxy. Multiple selections are allowed in this dialog box.

Policies

Many firewalls on the market to date consist of a single table of rules. Each packet is compared against each rule in the table until a match is made. When a rule is matched, rule evaluation stops and the packet is either allowed or denied. This structure leads to a configuration which has all the commonly matched rules at the top of the table, while less often matched rules are at the bottom, in order to increase the efficiency and speed of the firewall.
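
The first-match evaluation described above can be sketched in a few lines of Python. This models a generic single-table firewall, not REAPOFF; the `Rule` type, the packet dictionary, and the example rules are all assumptions introduced for illustration.

```python
from typing import Callable, NamedTuple

class Rule(NamedTuple):
    name: str
    matches: Callable[[dict], bool]  # predicate over a packet description
    allow: bool                      # verdict if the predicate matches

def evaluate(rules: list[Rule], packet: dict, default_allow: bool = False) -> bool:
    """Return the verdict of the first matching rule (first-match semantics)."""
    for rule in rules:
        if rule.matches(packet):
            return rule.allow        # evaluation stops at the first match
    return default_allow             # nothing matched: fall back to the default

rules = [
    Rule("allow outbound web", lambda p: p["dport"] == 80, True),
    Rule("deny outbound mail", lambda p: p["dport"] == 25, False),
]
```

Because every packet walks the table from the top, the position of a rule affects speed as well as meaning, which is why such tables tend to be ordered by match frequency and not by policy.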

Rule dumps from such firewalls are usually difficult to read, since the rules do not necessarily correspond to a particular policy and it is difficult to work out what each rule is supposed to achieve. On the other hand, a firewall's configuration is supposed to reflect a security policy, which is a higher level, clearly defined document stating what restrictions and privileges are applied to different groups of users, computers and times. In a sense there is a separation between the actual policy and the conditions under which these policies are applied.

This type of separation makes auditing the configuration of the firewall an easy task: one simply needs to establish which policies apply to whom and then look at how these policies are implemented. A firewall configuration process that is easier to understand leads to a more secure installation, and one that is more likely to be configured as per the policy.

Modern firewall installations are leaning toward this type of configuration. For example, IPTables supports the concept of chains, which are collections of rules. By selecting which chain applies and when, it is possible to map chains to policies. REAPOFF uses the same principle when producing IPTables rules as well.

**Figure 3:** The use of policies in a HTTP proxy

A Policy is therefore defined as a collection of rules. Rule execution can be diverted to different policies depending on certain conditions. An example can illustrate this principle best:

Consider Figure 3 above, which shows a screenshot of the HTTP proxy. This configuration may be found in the examples directory. Rule evaluation proceeds in order from top to bottom. When rule evaluation reaches the "Policy Selection by Authentication" rule, the authentication parameters within the request are compared to the username and password list specified in the rule's configuration variables (in this case the username "username" and the password "password"). If these match, the policy named in "Policy Name" is selected, which in this case is power_users. Execution then continues from the power_users policy. Note that the path of execution does not actually change until the "Execute selected policy" rule is executed. This makes it possible to place several policy selection rules in succession, with each subsequent matching rule overriding the previous one.
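
The deferred branching described above can be modelled with a short Python sketch. It is only a toy model under stated assumptions: the tuple-based rule representation, the fallback policy name `default`, and the `staff` policy are invented for this example, while `power_users` and the username/password pair come from the figure.

```python
def run_policy_selection(rules, request):
    """Walk rules in order; return the name of the policy finally executed."""
    selected = "default"                 # assumed fallback policy name
    for kind, *args in rules:
        if kind == "select":
            condition, policy_name = args
            if condition(request):
                selected = policy_name   # a later match overrides an earlier one
        elif kind == "execute":
            return selected              # only here does execution branch
    return selected

rules = [
    ("select", lambda r: r.get("source", "").startswith("192.168.1."), "staff"),
    ("select",
     lambda r: (r.get("user"), r.get("password")) == ("username", "password"),
     "power_users"),
    ("execute",),
]
```

A request that satisfies both selection rules ends up in power_users, because the second selection overrides the first before the execute rule is reached.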

The following sections will describe some of the currently available proxies and how they should be deployed in practice:

general

Rules in this family are designed to be used with any proxy. These are access control rules, controlling those IP addresses which are allowed to access the proxy.

If a connection is made to the proxy from a denied IP address, the proxy will immediately tear down the connection and log the connection attempt.

HTTP proxies

This proxy is designed to protect the privacy and security of an internal network while allowing external browsing to the internet. REAPOFF's HTTP proxy is different from SQUID because REAPOFF can rewrite content on the fly and has a sophisticated blocking mechanism to finely control access to certain sites. REAPOFF does not cache web pages.

Transparent HTTP proxy: Use this to enable transparent HTTP proxying. You must have a Linux 2.4 kernel for this to work. Note that the client must also have the firewall configured as its gateway, and you will most likely need to allow DNS into the internal network.

Handoff Proxy: Sometimes a caching proxy is required in addition to REAPOFF. This rule allows REAPOFF to hand off all connections to a separate caching proxy for further processing and possibly authentication.

Limit Post Size: Often the security policy forbids the uploading of files via HTTP. This is done to protect intellectual properties for example. It is difficult, however, to enforce this policy because any site can accept a file upload, for example a web based email system. The easiest way to stop this is to limit the maximum size of a POST directive. Thus if the POST is too large, the connection will be terminated and the user will be warned. A log is also generated.

Block Active X: ActiveX is a problematic technology since it is basically an executable downloaded from the internet and allowed to run on the client's machine. There is no "sand box" environment such as the one Java provides. Thus it is common in many security policies to deny ActiveX. This might break some sites, but ActiveX is not really used much on the internet so it is not a great loss. Note that this method is not foolproof, because an attacker can always craft malicious JavaScript that creates an ActiveX object on the fly without allowing REAPOFF to inspect it.

Deny HTTP methods: HTTP has quite a number of different methods, some of which are extensions. For example, file upload can also be done via the PUT method. This rule allows you to restrict the HTTP method to a specific set of allowed methods. Note that in order for WebDAV to work, many other methods must be allowed, so this rule helps to stop WebDAV.

CONNECT support: In order to allow SSL communications through a HTTP proxy, the CONNECT method must be allowed. This method creates an end-to-end tunnel between client and server, over which encrypted traffic can be exchanged. This represents a significant threat since it allows any internal user to completely bypass the firewall. If you need to provide SSL support for clients, you must enable this rule. Alternatively, REAPOFF will have an intercepting SSL proxy available in the next release (there is a pre-alpha version you can play with).

Block advertisers optionally: Advertising is a pain on the net. However, many Ad blocking rules make mistakes sometimes and accidentally block sites which are not ads. To make life easier, you can use this rule to allow people to bypass the ad blocking and get the page anyway.

FTP services: This allows the HTTP proxy to service FTP URLs over HTTP. It is probably better to use the transparent FTP support instead, though. Note that you cannot use this option if you want the proxy to be transparent. Transparent proxies need to have a proper transparent FTP proxy configured instead.
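
To make the idea behind the Limit Post Size rule above concrete, here is a minimal Python sketch of a size check on a request header. It is not how REAPOFF implements the rule; the function name, the 4096-byte limit, and the decision to reject POSTs without a declared length are assumptions of this example.

```python
MAX_POST_BYTES = 4096  # hypothetical limit, chosen only for this example

def post_too_large(request_head: str, limit: int = MAX_POST_BYTES) -> bool:
    """Return True if a POST request declares a body larger than limit."""
    lines = request_head.split("\r\n")
    if not lines[0].startswith("POST "):
        return False                      # only POST requests are limited here
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            try:
                return int(value.strip()) > limit
            except ValueError:
                return True               # malformed length: treat as a violation
    return True                           # no declared length: fail closed (assumption)
```

Small form submissions pass, while a file upload that declares a large Content-Length is flagged so that the connection can be terminated and logged.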

Command line options

REAPOFF supports a variety of command line options. To find out what command line options a particular component supports use the -h or -help directive:

plug proxy

The plug proxy supports the following command line options:

-l,-license - Prints out the License terms for REAPOFF. Also see section 12.
-p,-port=INT - Port to listen on. Currently only a single port to listen on can be specified. The plug proxy will bind to all addresses with this port.
-v,-verbose - Verbosity (twice for more). This chats along as it runs. Notice that if verbosity is enabled, plug will not daemonize and will continue to run in the foreground.
-s,-source=IP RANGE - Allowed source IP range, e.g. 192.168.1.0/255.255.255.0. Currently only a single source IP address restriction can be specified. In the future REAPOFF will support groups and policies. Note that if a connection from an IP address which does not fall in this range occurs, REAPOFF will accept the connection but then immediately close it. The result is that the port still looks open to a port scanner, but REAPOFF will simply refuse to communicate on it.
-d,-destination=STRING - Destination IP address. This specifies the destination IP address the plug will connect to. Note that destination can be changed programmatically within the rules later, and so sometimes does not need to be set initially. When set to 0.0.0.0 (default) plug will never make an outbound connection and all outbound data will be lost. This is useful when you need REAPOFF to act as a server rather than a proxy.
-r,-remote=INT - Remote port to connect to on the destination host.
-i,-inbound=FILENAME - RE definition file for inbound traffic. For file format see later sections.
-o,-outbound=FILENAME - RE definition file for outbound traffic. In the future the inbound and outbound files will be combined into policies and will reside within the same file.
-m,-mode CHAR - Default buffering mode. This can be line mode or char mode. For more information, see the Buffering section later on.
-q,-quiet - Do not log anything. This improves the speed somewhat, but not recommended for production environments.
-u,-uid=INT - User ID to change to after listening. This is very important when binding REAPOFF to a privileged port (<1024). In this case REAPOFF must be started as root so that it can bind to the port. It is a really bad idea to run REAPOFF as root. REAPOFF is designed to run as an unprivileged uid/gid inside a chroot prison.
-g,-gid=INT - Group ID to change to after listening. Same as above, but for group ids.
-c,-chroot=STRING - Directory to chroot into after initialization. REAPOFF goes into the chroot prison by itself after the process already started executing. This means that REAPOFF does not need to have copies of any libraries inside the chroot prison. This makes it very easy to use, all that is required is that a path be given to the chroot option. Note that if you make use of the exec directive you will need to copy libraries into the prison so that other processes may be executed.
-t,-timeout=INT - Number of seconds allowed for inactivity (30). REAPOFF will terminate the connection if no traffic is flowing in either direction within this period.
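
The source range test behind the -s option above can be sketched with Python's standard `ipaddress` module, which accepts the same network/netmask notation. This is an illustration only; REAPOFF itself is written in C and the function name here is hypothetical.

```python
import ipaddress

def source_allowed(client_ip: str, allowed_range: str) -> bool:
    """Return True if client_ip falls inside allowed_range (network/netmask)."""
    network = ipaddress.ip_network(allowed_range, strict=True)
    return ipaddress.ip_address(client_ip) in network
```

With the example range from the option description, `source_allowed("192.168.1.42", "192.168.1.0/255.255.255.0")` is true, while an address such as 10.0.0.1 falls outside the range and its connection would be accepted and then closed immediately.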

Secure Sockets Layer (SSL)

SSL is a protocol designed to add security to network communications. It is designed to stop two types of attacks from taking place:

Interception attack
Man in the Middle attack

SSL is an encrypted end-to-end protocol. This fact raises problems for network security devices, such as firewalls and IDS:

IDS are unable to examine traffic within the SSL connection because it is encrypted. This reduces the IDS effectiveness against server borne attacks.
Firewalls and application level proxies are unable to examine the actual requests within the SSL connection and are unable to enforce a security policy against the transaction. This is especially true for REAPOFF's HTTP proxy which is able to enforce a very tight level of control on transactions. This control is lost when SSL is allowed to take place.

SSL also represents a major threat for networks because proxies typically need to support the CONNECT directive which allows a tunnel to be established between the client and servers on the Internet. Since the tunnel is generally encrypted, proxies are forced to allow any traffic through. A large proliferation of software packages has recently become available to exploit this flaw and allow outbound tunnels through the HTTP proxy to carry arbitrary network traffic. It is also possible to route arbitrary traffic over the SSL tunnel via pppd and effectively form a VPN terminating inside the network.

WebDAV is also a major threat. Since WebDAV allows the sharing of folders using the HTTP protocol, and is widely available and supported on Windows platforms, its use over SSL is very difficult to stop in real networks. The author was very surprised to discover how easy it was for malicious insiders to use WebDAV to connect out to an external HTTP server on the Internet which allows WebDAV connections over SSL. Almost any Windows client with Internet Explorer versions greater than about 5.5 can easily connect out over SSL and copy files in either direction unchecked.

Clearly allowing your clients to use SSL represents a major threat to your network. Also allowing your web server to communicate directly with clients using SSL negates the IDS that may be used. How can SSL be properly managed in the network? Clearly, it is very difficult to deny SSL outright, since many popular sites now require it. Users do not necessarily appreciate the dangers and are commonly focused on the need to do business on line in an e-banking or e-commerce situation.

There are currently three ways in which SSL can be managed securely on the network. REAPOFF supports all three, and the following sections describe the details. It is important to select the most appropriate strategy for the situation. Which strategy is chosen depends mainly on the security policy and the amount of computational capacity available on the gateway.²

Restricted SSL connections

This method restricts SSL connections to a number of known hosts. This is usually done by using a rule in the HTTP proxy. Note that SSL tunnels are still allowed between client and server, so you really need to trust the hosts that you allow here.

Note that this is the usual method for doing this in less capable firewalls (e.g. packet filtering firewalls) and older application level firewalls. REAPOFF offers much more powerful methods for controlling SSL and you should only select this method if you don't have enough processing capacity on the gateway machine for the network load.

Semi-complete SSL tunnels

This scheme creates SSL tunnels on the gateway machine. The traffic is then sent to the client unencrypted. The client really has no idea that the traffic is meant to be encrypted at all, and so does not generally show the locked padlock many users associate with SSL connections. The traffic is in fact not encrypted on the network at all and any IDS deployed along the way will have full access to the communications. The architecture is shown in figure 4.

**Figure 4:** Semi-complete SSL tunnels

There are a number of key steps in this architecture:

The SSL master manages an array of SSL tunnels. These Tunnels encrypt communications to the remote servers on the Internet. The SSL master records which tunnel is connected to which host and ensures that tunnels are not idle for too long or they are terminated. If a tunnel does not exist for the desired host the SSL master creates a tunnel.
The HTTP proxy rewrites all references to an SSL enabled URL into a HTTP enabled URL on port 443. So for example if a HTML page carries the URL https://www.test.com/secure/link.html, this is rewritten as http://www.test.com:443/secure/link.html. This effectively tells the browser that it is to use standard HTTP to obtain this address.
The HTTP proxy then interprets connection requests to port 443 as SSL requests in disguise. In order to find what tunnel is used for this server, the proxy connects to the SSL master, and queries it. The SSL master returns the current port number for the relevant tunnel. After this, the HTTP proxy is able to directly connect to the tunnel.
Note that the client thinks it is talking HTTP. In fact the HTTP proxy should not support the CONNECT directive in this case, since SSL is not allowed at all on the internal network segment.
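
The URL rewriting step in the list above can be approximated with a regular expression substitution. The Python sketch below is an assumption-laden illustration, not REAPOFF's own rule: it only handles host names without an explicit port and leaves every other URL untouched.

```python
import re

# Matches "https://host" where host carries no explicit port (assumption of
# this sketch). The lookahead stops the match from ending inside a host name
# or directly before a ":port" suffix.
HTTPS_URL = re.compile(r"https://([A-Za-z0-9.-]+)(?![A-Za-z0-9.:-])", re.IGNORECASE)

def downgrade_links(html: str) -> str:
    """Rewrite https://host/... references to http://host:443/..."""
    return HTTPS_URL.sub(r"http://\1:443", html)
```

Applied to the example from the text, `https://www.test.com/secure/link.html` becomes `http://www.test.com:443/secure/link.html`, which the HTTP proxy can later recognise as an SSL request in disguise.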

Full SSL proxying

Since in the previous scheme, data which is usually encrypted is transmitted in the clear, some concerns may be raised about the privacy and confidentiality of this data. In particular there is a concern that a rogue system on the internal network may be able to sniff the traffic and obtain sensitive information this way.

The Full SSL proxying architecture is shown in figure 5. This architecture requires REAPOFF to decrypt the SSL traffic, inspect it via the usual HTTP proxy rules and then re-encrypt the traffic to the client. Note that REAPOFF is effectively performing a Man in the Middle attack against the SSL connection stream. However, SSL is designed to prevent this type of attack from taking place, by requiring server certificates to be signed by trusted certificate authorities. In order for REAPOFF to perform this function transparently, REAPOFF must be trusted as a certification authority by the client. Otherwise the client will constantly issue a "This certificate is not trusted" message.

**Figure 5:** Full SSL proxying

The main steps used in this case are:

The standard HTTP proxy gets a CONNECT directive from the client asking to form an SSL connection to the server. The HTTP proxy then queries the SSL master as to the port number of the relevant tunnel. A variation here is to create a transparent SSL proxy to intercept all connections to port 443.
The SSL master keeps a record of all tunnels and where they currently go. If a tunnel does not exist, the SSL master creates it. In order to initialize the server (encrypting) component of the tunnel, the master needs to assign a valid certificate to this server. The master checks its list of certificates and issues a new certificate if one does not already exist. Note that a valid certificate is required with a server name the same as the one the client asked for. If the server name is not the same, the client browser will raise an error claiming the certificate is not issued to the server it is connected to. Assigning and signing new certificates is done transparently by the master, and once the new certificate is issued and signed, the tunnel is initialized.
A new SSL proxy is created to inspect the content of the traffic between the encrypting and decrypting tunnels. This is required since the original proxy is unable to inspect the encrypted traffic. It is therefore recommended that the SSL proxy be configured in a similar manner to the HTTP proxy with similar restrictions placed on the communications.
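
The bookkeeping performed by the SSL master in the steps above can be pictured with the following Python toy model. It performs no cryptography and opens no sockets; the class name, the starting port number, and the placeholder certificate strings are all assumptions made to show the lookup-or-create logic only.

```python
import itertools

class TunnelTable:
    """Toy model of the SSL master's host -> local tunnel port table."""

    def __init__(self, first_port: int = 20000):
        self._ports = itertools.count(first_port)  # hypothetical port range
        self._tunnels: dict[str, int] = {}
        self._certs: dict[str, str] = {}

    def port_for(self, host: str) -> int:
        """Return the tunnel port for host, creating the tunnel if needed."""
        host = host.lower()
        if host not in self._tunnels:
            # A certificate naming exactly this host must exist before the
            # client-facing side of the tunnel can start; reuse it if cached.
            self._certs.setdefault(host, f"certificate for CN={host}")
            self._tunnels[host] = next(self._ports)
        return self._tunnels[host]
```

Repeated queries for the same host return the same port, so the HTTP proxy can connect directly to an existing tunnel, and only the first request for a new host pays the cost of certificate creation.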

An example of a full SSL proxy can be found in the examples directory.

Importing CA certificate into web clients

In order for REAPOFF to transparently sign certificates, it is necessary for the clients to fully trust it. For this purpose the clients need to have the CA certificate inserted into the trusted root CA store. Use the following procedure for Netscape-like browsers (Mozilla, Galeon, Netscape):

Copy cacert.crt to a directory served by an Apache web server.
Type the URL of the certificate into the browser, e.g. http://www.example.com/cacert.crt
Accept the CA certificate for signing web sites.

The following procedure should be used for IE browsers:

Open Tools/Internet Options in IE.
Select Contents/Certificates.
Click import certificate.
Browse to the cacert.crt or cacert.pem.
Import that into your trusted root store.

If you do not have a suitably configured Apache web server, or you would like to make the certificate permanently available to many machines, there is a small REAPOFF configuration file which will serve out the certificate over any chosen port (for example 8000):

Launch the REAPOFF CA certificate mini-server on a port of your choice:
```
plug -p 8000 -o cacert.outbound
```
Navigate your browsers to this port, i.e. http://www.gateway_example.com:8000/

Once REAPOFF is trusted by the client, it is possible to completely remove all other CAs from the trusted store, since REAPOFF will automatically intercept and change the certification of each site. In a large installation, it may be wise to configure the standard operating environment (SOE) to automatically trust the gateway to sign certificates.

Proxy Rule Writing

Currently there are two separate configuration files, given by the -i and -o command line directives. In the next version these may be merged into a single file. The configuration file fully defines the behavior of the proxy under a specific protocol. The use of the GUI shields users from changes in the format of the rules. If users use the GUI to configure their proxies, then when a new version of REAPOFF becomes available, their rules may be synced with the new template file to transparently convert their installation to the new format of the configuration files. This allows the configuration file format to be quite fluid.

Basically the proxy listens on a particular port for incoming connections. When a connection is received from a client, the proxy reads some data from the listening socket. This data is processed through the set of rules, and any relevant actions are executed. After the data is processed, it is passed to the connecting socket. Outbound rules apply to traffic from the listening side to the connecting side, whereas inbound rules apply to traffic from the connecting socket to the listening socket.

Note also that the configuration files fully define the behaviour of the proxy. This means that if the authors make an error in writing their configuration files directly, a vulnerability may result. Hence inexperienced users may want to restrict themselves to using the GUI rather than write their own configuration files.

Buffering

Buffering is the single most important aspect of the plug proxy, and it is essential to understand how it works before writing a new proxy. If the buffering is not implemented correctly the proxy may not work at all, or it may allow attackers to bypass the filtering rules.

char mode

The proxy will typically perform a read operation on either the client socket or the server socket. When in char mode, as soon as the read operation returns some data, this data is processed through the relevant rules and the result is passed on to the other end. This presents a problem in that the attacker can simply send very small packets, thus preventing any of the REs from matching properly. An example will serve to illustrate the problem:

Suppose we want to block incoming object tags from the HTTP proxy:

if str  object
s/<\s*object/<_no_object_allowed/ig

Now suppose the attacker was running their own site and wanted to slip the object tag past REAPOFF. They could first send the '<' character and then the 'o', then the 'b' etc. Each of these characters will cause REAPOFF to process the read buffer (which is 1 character big). This will fail to match the string object and nothing will happen.

Therefore char mode should not be considered for security critical filtering. This mode does not guarantee that the matching engine will work properly, although in most cases it will (because typically packets contain very large buffers). A similar problem also exists with distinguishing FTP port commands from error messages with the word PORT in them (A common attack against firewalls).
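
The bypass can be reproduced with a small Python simulation. It does not use REAPOFF itself: the substitution from the example above is expressed with Python's `re` module, and each element of the `chunks` list stands for the data returned by one read operation.

```python
import re

# Python equivalent of s/<\s*object/<_no_object_allowed/ig
OBJECT_TAG = re.compile(r"<\s*object", re.IGNORECASE)

def filter_chunk(chunk: str) -> str:
    """Apply the substitution to a single read buffer, as char mode does."""
    return OBJECT_TAG.sub("<_no_object_allowed", chunk)

def relay_char_mode(chunks):
    """Filter every read independently and concatenate what is passed on."""
    return "".join(filter_chunk(chunk) for chunk in chunks)

payload = "<object classid='x'>"
one_read = relay_char_mode([payload])          # whole tag arrives in one read
split_reads = relay_char_mode(list(payload))   # attacker sends 1 char per read
```

When the tag arrives in a single read it is rewritten, but when it is split into one-character reads no individual buffer ever contains the pattern, so `split_reads` equals the original payload and the tag reaches the client intact.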

line mode

In order to avoid the problems inherent in char mode, we have line mode. In this mode the proxy will not process the rule set until it has at least one newline (\n) within the buffer. Thus sending one character at a time will not work for the attacker in the above example, because the buffer will not be processed until the entire line has been sent.

It must be noted that in line mode, multi-line REs are not guaranteed to work, for exactly the same reason that REs are not guaranteed to work in char mode. If you need to make a multi-line match you should use smart buffering. Line mode is the default mode with REAPOFF, unless specified otherwise.

You might find that binary protocols do not work properly at all using line mode. This is because there may not be a new line separating communications between the client and server. In this case one end will send their message across without having a new line, and wait for the server to return its message. In the meantime REAPOFF will be waiting for a new line and not process the buffer. In this case a deadlock may occur and things will not work.

In order to solve this deadlock, try to identify which mode to use where and switch to it using "set mode line" or "set mode char" where necessary.
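
The behaviour of line mode, including the deadlock risk, can be illustrated with a minimal Python line buffer. This is a sketch of the general technique and not REAPOFF's C implementation; the class and method names are hypothetical.

```python
class LineBuffer:
    """Accumulate reads and release only complete lines for rule processing."""

    def __init__(self):
        self._pending = ""   # data received so far without a terminating newline

    def feed(self, data: str) -> list[str]:
        """Add one read's worth of data; return the lines that are now complete."""
        self._pending += data
        # Everything before the last "\n" is complete; the remainder is held.
        *complete, self._pending = self._pending.split("\n")
        return [line + "\n" for line in complete]
```

Feeding `<object>\n` one character at a time releases nothing until the newline arrives, at which point the rules see the whole line. Data that never ends in a newline stays in the buffer indefinitely, which is the deadlock that binary protocols can trigger.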

smart buffering

Sometimes it is necessary to buffer large quantities of data before running the rules over it. This is useful, for example, to examine the entire header of a HTTP request, or to have multi-line matches. To do this use the startbuff and endbuff actions:

if /... start of buffering ../
startbuff

if /... end of buffering ../
endbuff

if /... some rule ../
actions.....

In this case the proxy will start buffering when the first rule matches. Each new read will cause the rules to be evaluated, but no other actions will be performed until the end of buffering rule matches and the endbuff directive is executed. After that happens, other rules are evaluated and their actions are executed.

If is usually best to have the startbuff/endbuff directives at the beginning of the rule set to allow all the other rules an opportunity to match after buffering ends.

The following actions will be executed while buffering: KILL , SET , GOTO , END, EVAL. The idea is that it should still be possible for the proxy to apply the right policies while buffering.
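The startbuff/endbuff cycle can be pictured as a small state machine. The Python sketch below is only an illustration under simplifying assumptions: the class SmartBuffer and its feed method are hypothetical names, the start and end conditions are plain regular expressions, and the actions that REAPOFF still executes while buffering (KILL, SET and so on) are not modelled.

```python
import re


class SmartBuffer:
    """Hold data back between a start match and an end match."""

    def __init__(self, start_re: bytes, end_re: bytes):
        self.start_re = re.compile(start_re)
        self.end_re = re.compile(end_re)
        self.buffering = False
        self.data = b""

    def feed(self, chunk: bytes):
        """Add one read; return the data to run the rules on, or None while holding."""
        self.data += chunk
        if not self.buffering and self.start_re.search(self.data):
            self.buffering = True            # corresponds to startbuff
        if self.buffering:
            if not self.end_re.search(self.data):
                return None                  # keep accumulating
            self.buffering = False           # corresponds to endbuff
        ready, self.data = self.data, b""
        return ready


# Hold an HTTP request until the blank line that ends its header.
sb = SmartBuffer(rb"^(GET|POST) ", rb"\r\n\r\n")
first = sb.feed(b"GET / HTTP/1.0\r\n")
second = sb.feed(b"Host: example\r\n\r\n")
```

Here the first read is held back and the second read releases the complete header, which is the point at which multi-line rules can safely be evaluated.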

Rule file syntax

The general syntax of the rule file is described in this section. When the proxy is started up, the configuration file is read and parsed. The proxy interprets the file and compiles internal lists representing the configuration (for more info about this see section "internal data types"). This is done for speed and efficiency reasons, since the parsing phase is quite expensive, and REAPOFF aims to be very fast in operation. Commands are allowed any amount of white space before and after, but must be on their own lines. Currently placing two commands on the same line is not allowed.

Config

By preceding an action statement with the configuration directive, it will be executed during the initial parsing of the configuration file (i.e. when REAPOFF is first started). Currently only actions are allowed with a configuration directive and not conditions. Example:

        config set mode char
        config set port 8080
        config set remote 3127
        config startbuff

Conditions

Conditions are tests that REAPOFF performs on the read buffer. If the condition matches then the actions related to this block are executed. Conditions are evaluated in turn from the first condition in the file to the last. Note that conditions are evaluated on the results of previous actions, so for example:

if.....
s/hello/hello world/ig

if str world
log got world

The second condition will match if the first condition is true and the word hello is present in the buffer.

string matches

Currently two types of conditions are supported: regular expressions and string matches. String matches are very quick, but only match literal strings, ignoring case. String matches are primarily used to optimize performance where many substitutions are performed, for example when we want to block domains of servers from the HTTP proxy:

s/((GET|POST|PUT)\s*http:\/\/www\.someserver\.com)/BLOCK $1/ig

This substitution can be conditioned by a string match for optimized performance:

if str www.someserver.com
s/((GET|POST|PUT)\s*http:\/\/www\.someserver\.com)/BLOCK $1/ig

This way the RE does not need to be executed when it obviously has no chance of matching.
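The same optimization can be sketched in Python for readers more familiar with that language. This is an analogy, not REAPOFF code: the function name block_site is hypothetical, and the regular expression is the one from the rule above with the slash escaping dropped.

```python
import re

# Python equivalent of: s/((GET|POST|PUT)\s*http:\/\/www\.someserver\.com)/BLOCK $1/ig
BLOCK_RE = re.compile(
    r"((GET|POST|PUT)\s*http://www\.someserver\.com)", re.IGNORECASE)


def block_site(buffer: str) -> str:
    # Cheap case-insensitive substring test first (the "if str ..." condition).
    if "www.someserver.com" not in buffer.lower():
        return buffer
    # Only now pay for the regular expression substitution.
    return BLOCK_RE.sub(r"BLOCK \1", buffer)
```

For buffers that never mention the server, the function returns after a single substring search and the regular expression engine is not invoked at all.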

Regular expressions

RE conditions are a very powerful way of breaking down protocols. The full suite of Perl style regular expressions is supported thanks to the pcre library. See man pcre for more information about pcre. REs are capable of extracting particular strings from within the buffer. These strings can then be used in further action statements (obviously only those statements belonging to the relevant condition). Example:

if /(GET|POST|PUT)\s(\S*)/
log Got a $1 request for URL $2

The above parses the HTTP header and extracts two captured strings, the first representing the method, and the second representing the URL requested. The log action is then invoked with those captured substrings expanded appropriately.
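How captured substrings end up in an action argument can be shown with a short Python sketch. The helper expand is a hypothetical name and only mimics the $1, $2 expansion described here; it makes no claim about how REAPOFF performs the expansion internally.

```python
import re


def expand(template, match):
    """Replace $1, $2, ... in `template` with the match's captured groups."""
    return re.sub(r"\$(\d+)",
                  lambda m: match.group(int(m.group(1))),
                  template)


request = "GET /index.html HTTP/1.0"
m = re.search(r"(GET|POST|PUT)\s(\S*)", request)
message = expand("Got a $1 request for URL $2", m)
```

In this sketch a reference to a group that the expression does not define raises an error, so a template should only use indices that the condition actually captures.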

conditional variables

Conditions are also allowed to operate on variables. This allows for a very flexible way to specify regular expressions in a very organized and consistent way. For example:

#First extract content type from headers
if /(^|\n)Content-type:\s*(\S*)/
set content_type $2

..... (some more rules)

#Now do something special for particular content types:
if $content_type /(text|html)/
log This is a text or html page

In this way it is possible to break protocols down in steps and make decisions about actions in a more systematic way.

Cascading conditions together

Sometimes it is useful to connect conditions using logical operators. Logical expressions are currently supported only by cascading normal rules together. The difference between cascading rules and ordinary logical operators as found in programming languages is that each cascading conditional operates on the result of the previous conditional, regardless of precedence. This is similar to the Reverse Polish Notation of writing expressions. An example will serve to illustrate best:

if  str hello
    printin 1
or  str world
    printin 2
and str nice
    printin 3

The following table summarizes the result in each case:

Buffer                Output   Note
hello world           1,2
thats a nice world    2,3
nice program          none
hello, nice people    1,2,3    Pay particular attention to this one

The following logical operators are supported: and, or, and_not, or_not
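The cascading behavior can be reproduced with a few lines of Python. This is a sketch of the semantics implied by the example results, not REAPOFF's evaluator: the function run_cascade is a hypothetical name, and the treatment of and_not and or_not (negating the new condition before combining it with the previous result) is an assumption.

```python
def run_cascade(rules, buffer):
    """rules: list of (operator, needle, label); returns the labels that fire."""
    text = buffer.lower()              # string matches ignore case
    fired, result = [], False
    for op, needle, label in rules:
        hit = needle in text
        if op == "if":
            result = hit
        elif op == "or":
            result = result or hit
        elif op == "and":
            result = result and hit
        elif op == "or_not":           # assumed meaning: previous OR NOT this
            result = result or not hit
        elif op == "and_not":          # assumed meaning: previous AND NOT this
            result = result and not hit
        else:
            raise ValueError(f"unknown operator: {op}")
        if result:
            fired.append(label)
    return fired


# The three rules of the example: if str hello / or str world / and str nice
RULES = [("if", "hello", 1), ("or", "world", 2), ("and", "nice", 3)]
```

Calling run_cascade(RULES, "hello, nice people") walks through the case singled out above: the or rule fires because the previous result is already true, even though the word world is absent.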

variables

Variables are currently only of string type. The user may define any number of variables and call them anything they like, except for a small subset of predefined variable names. The reserved variable names are the means by which actions pass settings to REAPOFF and control its behavior. For example:

#Find port specification in URL (e.g. http://www.server.com:8000/)
if $url /http:\/\/[^\/:]*:(\d*)/
#Set remote port to connect to:
set remote $1
log Will connect to port $remote.

The following are built in variables:

port - port proxy will listen on. Probably most useful in config directives because socket is already connected otherwise.
source - Allowed source IP address for connections in this format 1.2.3.4/255.255.255.0 ip/netmask. Probably only useful in config directives.
destination - IP address REAPOFF will connect to. Very useful in conjunction with the connection action.
remote - remote port to connect to. May also be given in comma notation ala FTP port (e.g. 235,43)
mode - Buffering mode to use, can be char or line.
timeout - inactivity timeout in seconds. If no packets arrive from either the server or the client within this period, the connection is terminated.
transparent - This readonly variable contains the IP address of the intended destination as used in the transparent proxying support. (see section 11).
state - Current state of connection, can be preprocess, running, and postprocess. Mostly useful for running actions on these occasions. e.g.
```
if $state str pre
action.. 

if $state str post
log Connection terminated.
```

Note that variables are persistent across the inbound and outbound chains so they may be used as ways to communicate and synchronize the inbound and outbound rule sets.
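Two of the built-in variables listed above, remote and source, accept values in a compact notation. The Python sketch below shows one plausible way to interpret them; the function names are hypothetical, and it is an assumption that the two-number comma form of remote follows the FTP PORT convention (high byte, low byte).

```python
import ipaddress


def parse_remote(value: str) -> int:
    """Accept "8000" or FTP-style "hi,lo" and return the port number."""
    if "," in value:
        hi, lo = (int(part) for part in value.split(","))
        return hi * 256 + lo           # FTP PORT convention: p1*256 + p2
    return int(value)


def source_allowed(client_ip: str, source: str) -> bool:
    """Check a client address against an "ip/netmask" source value."""
    # strict=False lets the ip part carry host bits, as in 1.2.3.4/255.255.255.0
    network = ipaddress.ip_network(source, strict=False)
    return ipaddress.ip_address(client_ip) in network
```

Under these assumptions the example value 235,43 denotes port 60203, and a source of 1.2.3.4/255.255.255.0 admits any client in 1.2.3.0 to 1.2.3.255.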

Actions

Actions are executed whenever the condition within a rule is found to be true. Actions are executed in order and operate on the results of previous actions. All actions take a single argument, which is taken to be the rest of the line. The following actions are currently supported:

startbuff,endbuff - smart buffering control, see buffering above.
linger,nolinger - If this action is executed, it ensures that the rule that triggered it will always match on all future buffers.
log - Causes a log entry to be added. Logs are currently done using syslog. Be aware that logging is a performance overhead so try to be concise in your logs. Useful for debugging though.
kill - Current connection is terminated.
exec - This action executes an external child process. The exec command does not use the shell but instead supports rudimentary tokenization. Thus exec will work even without /bin/sh which is suitable for chroot environments. Be careful with this command.
expectin - This action will read data from the inbound socket and try to match the given RE on this data. If the RE matches, actions will be processed normally, and new captured strings will override previous captured strings. If within a given time period the RE has not matched, the actions are skipped until the next "onerror" action, and execution will continue from there.
expectout - Same as expectin only reading from the outbound socket. The expect directives are useful for scripting interactions with other machines in a similar way to the expect command.
flush - Data read by expect* commands will be placed into the buffer and will actually be passed to the other end after the execution of all the rules has finished. If you don't want that, use flush to wipe the buffer clean.
onerror/done - This block signifies error handlers for expect*, during normal execution these actions are ignored, and are only executed when an action fails.
wait - This pauses for the specified number of msec. Usually needed after exec to give the forked process time for initialization.
set - Sets a variable. Note that the argument to set consists of a word which is taken as the variable name (without the $) and then either an = or a white space and the value.
filereadin - Reads the contents of a file and dumps those into the inbound socket.
filereadout - Same for outbound socket.
fileappend - takes two parameters separated by whitespace. The first is the filename while the second is a string to append to this file. If the file does not exist, the file is created.
connect - closes the outbound socket and reconnects. Most useful after altering the destination or remote port via the set command, and then transparently reconnecting. The client has no idea anything happened, except data will start coming from a different source.
printout - prints a message to the outbound socket. You can use \n, \r, \t for escaping certain characters, as well as expand variables here.
printin - Same thing for inbound socket.
goto/label - continues execution after the label named in the goto. If no such label is found execution resumes from this action onward.
eval - Evaluates a function and sets a variable to the result of the function. See functions below.
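The note on exec in the list above says that the command is tokenized rather than passed to a shell. A minimal Python sketch of that idea follows; the names tokenize and exec_action are hypothetical, and splitting on whitespace is only an assumption about what "rudimentary tokenization" means.

```python
import subprocess


def tokenize(command_line: str):
    """Split on whitespace only; no quoting, globbing or shell operators."""
    return command_line.split()


def exec_action(command_line: str):
    """Start the child directly from the argument vector, without /bin/sh."""
    return subprocess.Popen(tokenize(command_line))   # shell=False is the default


# Shell metacharacters are passed through as ordinary argument text:
argv = tokenize("/usr/bin/logger  alert;  id")
```

Because no shell is involved, the semicolon in the example does not start a second command; it simply becomes part of an argument. This is also why such an approach keeps working inside a chroot that contains no shell.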

functions

Although REs are very powerful, some things cannot be done well, or can only be done inefficiently, with REs alone. For this purpose we have internal functions. The idea is that the user is able to call on arbitrary functions to do a specific task by using the eval construct:

if something
eval variable = function arg

This will assign the result from running the arg string through the function named. The function must be compiled into the plug proxy. Functions should be modular so users may easily write their own functions. Functions are included from function.h. To see which functions are supported use plug -h. Currently the following functions are supported:

b64decode - Decodes from base64 to a binary string. This is used for HTTP based authentication and SMTP attachment filtering for example.
b64encode - Encodes to Base 64.
now - Returns the current time in a number of formats. Mostly used to specify time based ACLs, or to make decisions based on time. Formats: week day (day of the week); month day (day of the month, 1-31); hour (hour in 24 hour format); minutes (current minute within the hour); seconds (current second in the minute).

More functions to come in the next release as they become available.
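As an illustration of why a function such as b64decode is useful for HTTP based authentication, the Python sketch below extracts the user name and password from a Basic Authorization header. It is not REAPOFF code: basic_auth_credentials is a hypothetical name and the header handling is deliberately minimal.

```python
import base64
import re


def basic_auth_credentials(headers: str):
    """Return (user, password) from a Basic Authorization header, or None."""
    m = re.search(r"(?:^|\n)Authorization:\s*Basic\s+(\S+)",
                  headers, re.IGNORECASE)
    if m is None:
        return None
    try:
        decoded = base64.b64decode(m.group(1), validate=True).decode("latin-1")
    except ValueError:                 # malformed base64 (binascii.Error)
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


headers = "GET / HTTP/1.0\r\nAuthorization: Basic YWxpY2U6c2VjcmV0\r\n\r\n"
credentials = basic_auth_credentials(headers)
```

A regular expression can locate the encoded token, but turning it back into a user name needs a decoding step, which is the kind of task the eval construct hands to a function.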

Transparent Proxy Support

A transparent proxy is a proxy running on the gateway, which can connect transparently to a remote destination on behalf of a client. The client does not know that the connection is performed through the proxy, and therefore does not need to have anything configured. It is very common to use transparent proxying to lighten the administrative overhead of configuring a large number of clients.

Linux supports transparent proxying via its firewalling modules, ipchains and iptables. In this case the packet filtering engine within the kernel rewrites the packets as though they were actually destined for the local host on the specified port. An example of this is:

iptables -t nat -A PREROUTING -i eth0 -p tcp \
              --dport 80 -j REDIRECT --to-port 3128

Here the kernel will redirect all packets arriving on eth0 and destined for port 80 to port 3128 on the local host. The proxy will accept connections on port 3128 and service the request.

In order for the proxy to know the original destination, a getsockname call must be performed. REAPOFF makes this call available via the special read only variable $transparent.

Hence if you want to allow transparent connections, do this:

if $state str pre
set destination $transparent
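The getsockname call mentioned above reports the local endpoint of an accepted connection. The Python sketch below only demonstrates the call on a loopback connection, where no redirection is involved and the result is simply the listener's own address; local_endpoint is a hypothetical helper. On netfilter based setups the original destination is commonly retrieved with the SO_ORIGINAL_DST socket option instead, and this sketch makes no claim about which mechanism REAPOFF uses on a given kernel.

```python
import socket


def local_endpoint(conn):
    """Return the (ip, port) that the accepted connection is bound to locally."""
    return conn.getsockname()[:2]


# Loopback demonstration: listen, connect to ourselves, inspect the endpoint.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
listener.listen(1)
bound = listener.getsockname()

client = socket.create_connection(bound, timeout=5)
conn, _peer = listener.accept()
destination = local_endpoint(conn)     # what the client was connected to

for s in (conn, client, listener):
    s.close()
```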

Note that transparent proxying increases the security risk since clients do not need to be specially configured to use the gateway. In addition you need to allow DNS for internal clients so they can resolve addresses themselves.

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Note that the behavior of REAPOFF is controlled by the rules used. These rules are also considered to be modifications to REAPOFF for the purpose of licensing. If you write your own rules for whatever reason, you must also distribute those rules in accordance with the GPL. If you require a special exception to these rules you may contact the author for a special licensing arrangement. It goes without saying that any additional functions (see 10.1) written into REAPOFF constitute a modification of the source and must have a compatible license.

GPL

  
                    GNU GENERAL PUBLIC LICENSE
   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

  0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License.  The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language.  (Hereinafter, translation is included without limitation in
the term "modification".)  Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope.  The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

  1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

  2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices
    stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in
    whole or in part contains or is derived from the Program or any
    part thereof, to be licensed as a whole at no charge to all third
    parties under the terms of this License.

    c) If the modified program normally reads commands interactively
    when run, you must cause it, when started running for such
    interactive use in the most ordinary way, to print or display an
    announcement including an appropriate copyright notice and a
    notice that there is no warranty (or else, saying that you provide
    a warranty) and that users may redistribute the program under
    these conditions, and telling the user how to view a copy of this
    License.  (Exception: if the Program itself is interactive but
    does not normally print such an announcement, your work based on
    the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole.  If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works.  But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

  3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

    a) Accompany it with the complete corresponding machine-readable
    source code, which must be distributed under the terms of Sections
    1 and 2 above on a medium customarily used for software interchange; or,

    b) Accompany it with a written offer, valid for at least three
    years, to give any third party, for a charge no more than your
    cost of physically performing source distribution, a complete
    machine-readable copy of the corresponding source code, to be
    distributed under the terms of Sections 1 and 2 above on a medium
    customarily used for software interchange; or,

    c) Accompany it with the information you received as to the offer
    to distribute corresponding source code.  (This alternative is
    allowed only for noncommercial distribution and only if you
    received the program in object code or executable form with such
    an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for
making modifications to it.  For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable.  However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

  4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License.  Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

  5. You are not required to accept this License, since you have not
signed it.  However, nothing else grants you permission to modify or
distribute the Program or its derivative works.  These actions are
prohibited by law if you do not accept this License.  Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

  6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions.  You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

  7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License.  If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all.  For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices.  Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

  8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded.  In such case, this License incorporates
the limitation as if written in the body of this License.

  9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time.  Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number.  If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation.  If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

  10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission.  For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this.  Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

                            NO WARRANTY

  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

                     END OF TERMS AND CONDITIONS

Michael 2002-11-09
