netsarlacc is a high-performance enterprise HTTP (and SMTP) sinkhole designed to be used by corporate SOC or IR teams. netsarlacc was designed with several purpose-specific goals in mind:
netsarlacc is meant to work in conjunction with existing blocking / captive portal / quarantining / redirecting technologies like DNS RPZ. In a typical deployment, netsarlacc is the target IP / CNAME provided to clients that look up domain names being blocked by your DNS security infrastructure, such as DNS RPZ or Cisco's Umbrella. The logs produced by netsarlacc go beyond the logs available from a typical webserver and were specifically designed with incident response and network monitoring in mind.
At this time, netsarlacc is built primarily for HTTP, but basic support for non-interactive SMTP is also available. More interactivity for SMTP and other protocols may be added later.
netsarlacc requires:
netsarlacc can be built with go or gccgo. Testing of Go 1.8 vs GCC 5.4.0 shows Go is significantly faster. This is likely due to gccgo using an older version of the Go language specification.
You can build netsarlacc with the provided makefile:
# make
go build netsarlacc.go dispatcher.go logger.go worker.go
For initial testing a small configuration file like the following is a good start:
$ cat lambda.json
{
"Daemonize" : false,
"Workers" : 4,
"WorkingDirectory": "/home/brenrigh/projects/github/netsarlacc",
"LogDirectory" : ".",
"ListenList" : [{"Host":"0.0.0.0", "Port":"8080", "Proto":"tcp", "App":"http", "TLS":false},
{"Host":"0.0.0.0", "Port":"8443", "Proto":"tcp", "App":"http", "TLS":true}
]
}
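As a rough illustration of how a configuration like the one above maps onto Go types, here is a minimal sketch. The `Config` and `Listener` structs, the `parseConfig` helper, and the `/tmp/netsarlacc` working directory are assumptions made for this example; they are not taken from netsarlacc's source, which may parse its configuration differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Listener mirrors one entry of "ListenList" in the example configuration.
// The type and its name are hypothetical, not netsarlacc's own.
type Listener struct {
	Host  string
	Port  string // the example config quotes port numbers as strings
	Proto string
	App   string
	TLS   bool
}

// Config mirrors the top-level keys of the example configuration.
type Config struct {
	Daemonize        bool
	Workers          int
	WorkingDirectory string
	LogDirectory     string
	ListenList       []Listener
}

// parseConfig decodes a JSON document and rejects unknown keys,
// which catches misspelled option names early.
func parseConfig(data string) (Config, error) {
	var cfg Config
	dec := json.NewDecoder(strings.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&cfg); err != nil {
		return Config{}, err
	}
	if cfg.Workers < 1 {
		return Config{}, fmt.Errorf("config: Workers must be at least 1, got %d", cfg.Workers)
	}
	return cfg, nil
}

// exampleConfig repeats the README example with a placeholder working directory.
const exampleConfig = `{
  "Daemonize": false,
  "Workers": 4,
  "WorkingDirectory": "/tmp/netsarlacc",
  "LogDirectory": ".",
  "ListenList": [
    {"Host": "0.0.0.0", "Port": "8080", "Proto": "tcp", "App": "http", "TLS": false},
    {"Host": "0.0.0.0", "Port": "8443", "Proto": "tcp", "App": "http", "TLS": true}
  ]
}`

func main() {
	cfg, err := parseConfig(exampleConfig)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	for _, l := range cfg.ListenList {
		fmt.Printf("%s %s/%s app=%s tls=%t\n", l.Host, l.Proto, l.Port, l.App, l.TLS)
	}
}
```

Running a check like this before starting the sinkhole is one way to spot a malformed file early; the strict unknown-key handling is a choice of this sketch, not a documented netsarlacc behavior.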
Then you can start the sinkhole using this configuration file:
$ ./netsarlacc -c /home/brenrigh/projects/github/netsarlacc/config.json
Starting sinkhole instance netsarlacc-lambda
Started with 8 max threads
Now running with with 8 max threads
Starting 512 readers
Starting 4 workers
Listening on 0.0.0.0 tcp/8080
Listening on 0.0.0.0 tcp/8443
Wrapped 0.0.0.0 tcp/8443 with TLS
Listening on 0.0.0.0 tcp/8444
Wrapped 0.0.0.0 tcp/8444 with TLS
To test the performance of netsarlacc, ApacheBench is one of the easier tools to use:
$ ab -l -q -n 200000 -c 80 http://127.0.0.1:8080/netsarlacc_test
Once set up, tuned, and tested, netsarlacc is meant to run as a Unix daemon out of an init script or as a service.
netsarlacc logs use JSON, which is easily parsed by many tools. Log entries are written as requests come in and the log files are rotated every 10 minutes.
For human consumption, the tool jq can be used to make the logs easier to read:
$ cat sinkhole-2017-06-12-22-50-UTC.log | jq . | less
Which will look something like:
{
"timestamp": "2017-06-12 22:57:45.259795 +0000 UTC",
"bytes_client": "97",
"http_method": "GET",
"url_path": "/netsarlacc_test",
"http_version": "HTTP/1.0",
"http_user_agent": "ApacheBench/2.3",
"dst_name": "127.0.0.1:8080",
"src_ip": "127.0.0.1",
"src_port": "39670",
"sinkhole_instance": "netsarlacc-lambda",
"sinkhole_addr": "0.0.0.0",
"sinkhole_port": "8080",
"sinkhole_proto": "tcp",
"sinkhole_app": "http",
"sinkhole_tls": false,
"raw_data": "474554202f6e65747361726c6163635f7465737420485454502f312e300d0a486f73743a203132372e302e302e313a383038300d0a557365722d4167656e743a2041706163686542656e63682f322e330d0a4163636570743a202a2f2a0d0a0d0a",
"request_error": false
}
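The `raw_data` field in the sample above appears to be the client's request encoded as hexadecimal: decoding it yields a 97-byte HTTP request, which matches `bytes_client`. A minimal sketch of reading one log line and recovering those bytes follows; the `Entry` struct and `decodeEntry` helper are hypothetical names for this example and cover only a few of the fields.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Entry holds a subset of the fields shown in the sample log entry.
type Entry struct {
	Timestamp string `json:"timestamp"`
	SrcIP     string `json:"src_ip"`
	URLPath   string `json:"url_path"`
	RawData   string `json:"raw_data"`
}

// decodeEntry parses one JSON log line and hex-decodes its raw_data field.
func decodeEntry(line string) (Entry, []byte, error) {
	var e Entry
	if err := json.Unmarshal([]byte(line), &e); err != nil {
		return Entry{}, nil, err
	}
	raw, err := hex.DecodeString(e.RawData)
	if err != nil {
		return Entry{}, nil, fmt.Errorf("raw_data is not valid hex: %w", err)
	}
	return e, raw, nil
}

func main() {
	// Shortened entry: raw_data holds only the request line of the sample above.
	line := `{"src_ip":"127.0.0.1","url_path":"/netsarlacc_test",` +
		`"raw_data":"474554202f6e65747361726c6163635f7465737420485454502f312e300d0a"}`
	e, raw, err := decodeEntry(line)
	if err != nil {
		fmt.Println("bad log line:", err)
		return
	}
	// %q keeps control characters such as \r\n visible.
	fmt.Printf("%s requested %s: %q\n", e.SrcIP, e.URLPath, raw)
}
```

Printing with `%q` is deliberate: sinkholed clients may send binary or malformed data, and quoting avoids writing it to the terminal unescaped.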
Benchmarking and profiling netsarlacc shows CPU usage roughly breaks down like so:
Testing shows that if there are too few workers the bottleneck is in filling out the templates and if there are too many workers, they starve the other tasks of CPU time. The ideal number of workers will vary from machine to machine but the best performance seems to come from matching up the number of workers with the number of physical CPU cores. If you have a hyperthreaded machine (SMT) this will be half the number of CPUs your operating system sees. For maximum performance the number of readers should be between 1x and 2x the number of workers. In a real-world deployment though, you want many more readers than workers (around 64x) to handle abusive / broken clients that hold onto sockets instead of making a valid HTTP request. If you don't have enough readers, a single machine performing a Slowloris-style attack can tie up all of the readers.
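The sizing rules of thumb above can be written down as a small calculation. This is only a sketch: `suggestCounts` is a hypothetical helper, and whether SMT is enabled has to be supplied by the operator because Go's `runtime.NumCPU` reports logical CPUs only.

```go
package main

import (
	"fmt"
	"runtime"
)

// suggestCounts applies the rules of thumb from the text: one worker per
// physical core, and readersPerWorker readers for each worker
// (1 to 2 for benchmarking, around 64 for a real-world deployment).
func suggestCounts(logicalCPUs int, smt bool, readersPerWorker int) (workers, readers int) {
	workers = logicalCPUs
	if smt {
		// With hyperthreading, physical cores are half the logical CPUs.
		workers = logicalCPUs / 2
	}
	if workers < 1 {
		workers = 1
	}
	return workers, workers * readersPerWorker
}

func main() {
	// Assumes an SMT machine; pass false on hardware without hyperthreading.
	w, r := suggestCounts(runtime.NumCPU(), true, 64)
	fmt.Printf("workers=%d readers=%d\n", w, r)
}
```

Treat the result as a starting point for benchmarking rather than a final setting, since the text notes the ideal values vary from machine to machine.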
Because so much time is spent in the kernel, it's important that the kernel is configured to balance TCP connections across multiple CPUs. Your Ethernet card may have multiple receive queues and you can try to distribute incoming connections across those receive queues and then assign each queue to a specific CPU. An easier alternative is to just tell each receive queue that it can use all the CPUs.
Suppose your Ethernet card is eth1; then you can run:
# find /sys/class/net/eth1/queues/ -name rps_cpus | while read -r LINE; do echo ffff > "$LINE"; done
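The value `ffff` written above is a hexadecimal bitmask with one bit per CPU, so it selects CPUs 0 through 15. For a machine with a different CPU count, the mask can be computed as in this sketch. `rpsMask` is a hypothetical helper, and it is deliberately limited to 32 CPUs because larger masks are written to sysfs as comma-separated groups, which this example does not handle.

```go
package main

import "fmt"

// rpsMask returns the hexadecimal bitmask that selects CPUs 0..n-1.
// It covers 1 to 32 CPUs only; larger systems need a multi-group mask.
func rpsMask(n int) (string, error) {
	if n < 1 || n > 32 {
		return "", fmt.Errorf("cpu count %d out of range 1..32", n)
	}
	mask := (uint64(1) << uint(n)) - 1 // lowest n bits set
	return fmt.Sprintf("%x", mask), nil
}

func main() {
	for _, n := range []int{4, 8, 16} {
		m, err := rpsMask(n)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%d CPUs -> %s\n", n, m)
	}
}
```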
Occasionally Go must run through a garbage collection sweep which can cause short pauses in calling accept() for new connections. At very high connection rates it is possible to fill up the operating system's outstanding connection queue and additional connections will get an RST back from the operating system until netsarlacc can catch up. This will likely only come up during benchmarking or at maximum load but if tuning is needed, check out the documentation for:
/proc/sys/net/core/netdev_max_backlog
/proc/sys/net/ipv4/tcp_max_syn_backlog