The route_localnet parameter does not exist for ipv6.
DNAT to localhost (:: 1) is possible only in the OUTPUT chain.
In the PREROUTING DNAT chain, it is possible to any global address or to the link local address of the same interface
the packet came from.
NFQUEUE works without changes.
## nftables
nftables are fine except one very big problem.
nft requires tons of RAM to load large nf sets (ip lists) with subnets/intervals. Most of the home routers can't afford that.
For example, even a 256 Mb system can't load a 100K ip list. nft process will OOM.
nf sets do not support overlapping intervals and that's why nft process applies very RAM consuming algorithm to merge intervals so they don't overlap.
There're equivalents to iptables for all other functions. Interface and protocol anonymous sets allow not to write multiple similar rules.
Flow offloading is built-in into new linux kernels and nft versions.
nft version `1.0.2` or higher is recommended.
## When it will not work
* If DNS server returns false responses. ISP can return false IP addresses or not return anything
when blocked domains are queried. If this is the case change DNS to public ones, such as 8.8.8.8 or 1.1.1.1.Sometimes ISP hijacks queries to any DNS server. Dnscrypt or dns-over-tls help.
* If blocking is done by IP.
* If a connection passes through a filter capable of reconstructing a TCP connection, and which
follows all standards. For example, we are routed to squid. Connection goes through the full OS tcpip stack, fragmentation disappears immediately as a means of circumvention. Squid is correct, it will find everything as it should, it is useless to deceive him. BUT. Only small providers can afford using squid, since it is very resource intensive. Large companies usually use DPI, which is designed for much greater bandwidth.
## nfqws
This program is a packet modifier and a NFQUEUE queue handler.
For BSD systems there is dvtws. Its built from the same source and has almost the same parameters (see bsd.eng.md).
--dpi-desync-udplen-increment=<int> ; increase or decrease udp packet length by N bytes (default 2). negative values decrease length.
--dpi-desync-udplen-pattern=<filename>|0xHEX ; udp tail fill pattern
--dpi-desync-cutoff=[n|d|s]N ; apply dpi desync only to packet numbers (n, default), data packet numbers (d), relative sequence (s) less than N
--hostlist=<filename> ; apply dpi desync only to the listed hosts (one host per line, subdomains auto apply, gzip supported, multiple hostlists allowed)
--hostlist-exclude=<filename> ; do not apply dpi desync to the listed hosts (one host per line, subdomains auto apply, gzip supported, multiple hostlists allowed)
Otherwise raw sending SYN,ACK frame will cause error stopping the further processing.
If you realize you don't need the synack mode it's highly suggested to restore drop INVALID rule.
### Virtual Machines
Most of nfqws packet magic does not work from VMs powered by virtualbox and vmware when network is NATed.
Hypervisor forcibly changes ttl and does not forward fake packets.
Set up bridge networking.
### CONNTRACK
nfqws is equipped with minimalistic connection tracking system (conntrack)
It's enabled if some specific DPI circumvention methods are involved.
Currently these are `--wssize` and `--dpi-desync-cutoff` options.
Conntrack can track connection phase : SYN,ESTABLISHED,FIN , packet counts in both directions , sequence numbers.
It can be fed with unidirectional or bidirectional packets.
A SYN or SYN,ACK packet creates an entry in the conntrack table.
That's why iptables redirection must start with the first packet although can be cut later using connbytes filter.
First seen UDP packet creates UDP stream. It defines the stream direction. Then all packets with the same
`src_ip,src_port,dst_ip,dst_port` are considered to belong to the same UDP stream. UDP stream exists till inactivity timeout.
A connection is deleted from the table as soon as it's no more required to satisfy nfqws needs or when a timeout happens.
There're 3 timeouts for each connection state. They can be changed in `--ctrack-timeouts` parameter.
`--wssize` changes tcp window size for the server to force it to send split replies.
In order for this to affect all server operating systems, it is necessary to change the window size in each outgoing packet
before sending the message, the answer to which must be split (for example, TLS ClientHello).
That's why conntrack is required to know when to stop applying low window size.
If you do not stop and set the low wssize all the time, the speed will drop catastrophically.
Linux can overcome this using connbytes filter but other OS may not include similar filter.
In http(s) case wssize stops after the first http request or TLS ClientHello.
If you deal with a non-http(s) protocol you need `--wssize-cutoff`. It sets the threshold where wssize stops.
Threshold can be prefixed with 'n' (packet number starting from 1), 'd' (data packet number starting from 1),
's' (relative sequence number - sent by client bytes + 1).
If a http request or TLS ClientHello packet is detected wssize stops immediately ignoring wssize-cutoff option.
If your protocol is prone to long inactivity, you should increase ESTABLISHED phase timeout using `--ctrack-timeouts`.
Default timeout is low - only 5 mins.
Don't forget that nfqws feeds with redirected packets. If you have limited redirection with connbytes
ESTABLISHED entries can remain in the table until dropped by timeout.
To diagnose conntrack state send SIGUSR1 signal to nfqws : `killall -SIGUSR1 nfqws`.
nfqws will dump current conntrack table to stdout.
Typically, in a SYN packet, client sends TCP extension **scaling factor** in addition to window size.
scaling factor is the power of two by which the window size is multiplied : 0=>1, 1=>2, 2=>4, ..., 8=>256, ...
The wssize parameter specifies the scaling factor after a colon.
Scaling factor can only decrease, increase is blocked to prevent the server from exceeding client's window size.
To force a TLS server to fragment ServerHello message to avoid hostname detection on DPI use `--wssize=1:6`
The main rule is to set scale_factor as much as possible so that after recovery the final window size
becomes the possible maximum. If you set `scale_factor` 64:0, it will be very slow.
On the other hand, the server response must not be large enough for the DPI to find what it is looking for.
Hostlist filter does not affect `--wssize` because it works since the connection initiation when it's not yet possible
to extract the host name.
`--wssize` may slow down sites and/or increase response time. It's desired to use another methods if possible.
`--dpi-desync-cutoff` allows you to set the threshold at which it stops applying dpi-desync.
Can be prefixed with 'n', 'd', 's' symbol the same way as `--wssize-cutoff`.
Useful with `--dpi-desync-any-protocol=1`.
If the connection falls out of the conntrack and --dpi-desync-cutoff is set, dpi desync will not be applied.
Set conntrack timeouts appropriately.
### UDP support
UDP attacks are limited. Its not possible to fragment UDP on transport level, only on network (ip) level.
Only desync modes `fake`,`hopbyhop`,`destopt`,`ipfrag1` and `ipfrag2` are applicable.
`fake`,`hopbyhop`,`destopt` can be used in combo with `ipfrag2`.
`fake` can be used in combo with `udplen` and `tamper`.
`udplen` increases udp payload size by `--dpi-desync-udplen-increment` bytes. Padding is filled with zeroes by default but can be overriden with a pattern.
This option can resist DPIs that track outgoing UDP packet sizes.
Requires that application protocol does not depend on udp payload size.
QUIC initial packets are recognized. Decryption and hostname extraction is supported so `--hostlist` parameter will work.
Wireguard handshake initiation and DHT packets are also recognized.
For other protocols desync use `--dpi-desync-any-protocol`.
Conntrack supports udp. `--dpi-desync-cutoff` will work. UDP conntrack timeout can be set in the 4th parameter of `--ctrack-timeouts`.
Fake attack is useful only for stateful DPI and useless for stateless dealing with each packet independently.
By default fake payload is 64 zeroes. Can be overriden using `--dpi-desync-fake-unknown-udp`.
### IP fragmentation
Modern network is very hostile to IP fragmentation. Fragmented packets are often not delivered or refragmented/reassembled on the way.
Frag position is set independently for tcp and udp. By default 24 and 8, must be multiple of 8.
Offset starts from the transport header.
There are important nuances when working with fragments in Linux.
ipv4 : Linux allows to send ipv4 fragments but standard firewall rules in OUTPUT chain can cause raw send to fail.
ipv6 : There's no way for an application to reliably send fragments without defragmentation by conntrack.
Sometimes it works, sometimes system defragments packets.
Looks like kernels <4.16havenosimplewaytosolvethisproblem.Unloadingof`nf_conntrack`module
and its dependency `nf_defrag_ipv6` helps but this severely impacts functionality.
Kernels 4.16+ exclude from defragmentation untracked packets.
See `blockcheck.sh` code for example.
Sometimes it's required to load `ip6table_raw` kernel module with parameter `raw_before_defrag=1`.
In openwrt module parameters are specified after module names separated by space in files located in `/etc/modules.d`.
In traditional linux check whether `iptables-legacy` or `iptables-nft` is used. If legacy create the file
`/etc/modprobe.d/ip6table_raw.conf` with the following content :
```
options ip6table_raw raw_before_defrag=1
```
In some linux distros its possible to change current ip6tables using this command: `update-alternatives --config ip6tables`.
If you want to stay with `nftables-nft` you need to patch and recompile your version.
In `nft.c` find :
```
{
.name = "PREROUTING",
.type = "filter",
.prio = -300, /* NF_IP_PRI_RAW */
.hook = NF_INET_PRE_ROUTING,
},
{
.name = "OUTPUT",
.type = "filter",
.prio = -300, /* NF_IP_PRI_RAW */
.hook = NF_INET_LOCAL_OUT,
},
```
and replace -300 to -450.
It must be done manually, `blockcheck.sh` cannot auto fix this for you.
Or just move to `nftables`. You can create hooks with any priority there.
Looks like there's no way to do ipfrag using iptables for forwarded traffic if NAT is present.
`MASQUERADE` is terminating target, after it `NFQUEUE` does not work.
nfqws sees packets with internal network source address. If fragmented NAT does not process them.
This results in attempt to send packets to internet with internal IP address.
You need to use nftables instead with hook priority 101 or higher.
--bind-addr=<v4_addr>|<v6_addr>; for v6 link locals append %interface_name : fe80::1%br-lan
--bind-iface4=<interface_name> ; bind to the first ipv4 addr of interface
--bind-iface6=<interface_name> ; bind to the first ipv6 addr of interface
--bind-linklocal=no|unwanted|prefer|force
; no : bind only to global ipv6
; unwanted (default) : prefer global address, then LL
; prefer : prefer LL, then global
; force : LL only
--bind-wait-ifup=<sec> ; wait for interface to appear and up
--bind-wait-ip=<sec> ; after ifup wait for ip address to appear up to N seconds
--bind-wait-ip-linklocal=<sec> ; accept only link locals first N seconds then any
--bind-wait-only ; wait for bind conditions satisfaction then exit. return code 0 if success.
--port=<port> ; port number to listen on
--socks ; implement socks4/5 proxy instead of transparent proxy
--local-rcvbuf=<bytes> ; SO_RCVBUF for local legs
--local-sndbuf=<bytes> ; SO_SNDBUF for local legs
--remote-rcvbuf=<bytes> ; SO_RCVBUF for remote legs
--remote-sndbuf=<bytes> ; SO_SNDBUF for remote legs
--skip-nodelay ; do not set TCP_NODELAY for outgoing connections. incompatible with split.
--no-resolve ; disable socks5 remote dns
--maxconn=<max_connections> ; max number of local legs
--maxfiles=<max_open_files> ; max file descriptors (setrlimit). min requirement is (X*connections+16), where X=6 in tcp proxy mode, X=4 in tampering mode.
; its worth to make a reserve with 1.5 multiplier. by default maxfiles is (X*connections)*1.5+16
--max-orphan-time=<sec> ; if local leg sends something and closes and remote leg is still connecting then cancel connection attempt after N seconds
--hostlist=<filename> ; only act on hosts in the list (one host per line, subdomains auto apply, gzip supported, multiple hostlists allowed)
--hostlist-exclude=<filename> ; do not act on hosts in the list (one host per line, subdomains auto apply, gzip supported, multiple hostlists allowed)
--split-http-req=method|host ; split http request at specified logical position.
--split-pos=<numeric_offset> ; split at specified pos. split-http-req takes precedence over split-pos for http reqs.
--split-any-protocol ; split not only http and https
--disorder ; when splitting simulate sending second fragment first
--hostcase ; change Host: => host:
--hostspell ; exact spelling of "Host" header. must be 4 chars. default is "host"
--hostdot ; add "." after Host: name
--hosttab ; add tab after Host: name
--hostnospace ; remove space after Host:
--hostpad=<bytes> ; add dummy padding headers before Host:
--domcase ; mix domain case after Host: like this : TeSt.cOm
--methodspace ; add extra space after method
--methodeol ; add end-of-line before method
--unixeol ; replace 0D0A to 0A
--tlsrec=sni ; make 2 TLS records. split at SNI. don't split if SNI is not present.
--tlsrec-pos=<pos> ; make 2 TLS records. split at specified pos
--daemon ; daemonize
--pidfile=<filename> ; write pid to file
--user=<username> ; drop root privs
--uid=uid[:gid] ; drop root privs
```
The manipulation parameters can be combined in any way.
`split-http-req` takes precedence over split-pos for http reqs.
split-pos works by default only on http and TLS ClientHello. use `--split-any-protocol` to act on any packet
tpws can bind to multiple interfaces and IP addresses (up to 32).
Port number is always the same.
Parameters `--bind-iface*` and `--bind-addr` create new bind.
Other parameters `--bind-*` are related to the last bind.
link local ipv6 (`fe80::/8`) mode selection :
```
--bind-iface6 --bind-linklocal=no : first selects private address fc00::/7, then global address
--bind-iface6 --bind-linklocal=unwanted : first selects private address fc00::/7, then global address, then LL
--bind-iface6 --bind-linklocal=prefer : first selects LL, then private address fc00::/7, then global address
--bind-iface6 --bind-linklocal=force : select only LL
```
To bind to all ipv4 specify `--bind-addr "0.0.0.0"`, all ipv6 - `::`.
`--bind-addr=""` - mean bind to all ipv4 and ipv6.
If no binds are specified default bind to all ipv4 and ipv6 addresses is created.
To bind to a specific link local address do : `--bind-iface6=fe80::aaaa:bbbb:cccc:dddd%iface-name`
The `--bind-wait*` parameters can help in situations where you need to get IP from the interface, but it is not there yet, it is not raised
or not configured.
In different systems, ifup events are caught in different ways and do not guarantee that the interface has already received an IP address of a certain type.
In the general case, there is no single mechanism to hang oneself on an event of the type "link local address appeared on the X interface."
To bind to a specific ip when its interface may not be configured yet do : `--bind-addr=192.168.5.3 --bind-wait-ip=20`
It's possible to bind to any nonexistent address in transparent mode but in socks mode address must exist.
In socks proxy mode no additional system privileges are required. Connections to local IPs of the system where tpws runs are prohibited.
tpws supports remote dns resolving (curl : `--socks5-hostname` firefox : `socks_remote_dns=true`) , but does it in blocking mode.
tpws uses async sockets for all activity but resolving can break this model.
if tpws serves many clients it can cause trouble. also DoS attack is possible against tpws.
if remote resolving causes trouble configure clients to use local name resolution and use
`--no-resolve` option on tpws side.
`--disorder` is an additional flag to any split option.
It tries to simulate `--disorder2` option of `nfqws` using standard socket API without the need of additional privileges.
This works fine in Linux and MacOS but unexpectedly in FreeBSD and OpenBSD
(system sends second fragment then the whole packet instead of the first fragment).
`--tlsrec` and `--tlsrec-pos` allow to split TLS ClientHello into 2 TLS records in one TCP segment.
`--tlsrec=sni` splits between 1st and 2nd chars of the hostname. No split occurs if SNI is not present.
`--tlsrec-pos` splits at specified position. If TLS data block size is too small pos=1 is applied.
`--tlsrec` can be combined with `--split-pos` and `--disorder`.
`--tlsrec` breaks significant number of sites. Crypto libraries on end servers usually accept fine modified ClientHello
but middleboxes such as CDNs and ddos guards - not always.
Use of `--tlsrec` without filters is discouraged.
## Ways to get a list of blocked IP
nftables can't work with ipsets. Native nf sets require lots of RAM to load large ip lists with subnets and intervals.
In case you're on a low RAM system and need large lists it may be required to fall back to iptables+ipset.
1. Enter the blocked domains to `ipset/zapret-hosts-user.txt` and run `ipset/get_user.sh`
At the output, you get `ipset/zapret-ip-user.txt` with IP addresses.
2.`ipset/get_reestr_*.sh`. Russian specific
3.`ipset/get_antifilter_*.sh`. Russian specific
4.`ipset/get_config.sh`. This script calls what is written into the GETLIST variable from the config file.
If the variable is not defined, then only lists for ipsets nozapret/nozapret6 are resolved.
So, if you're not russian, the only way for you is to manually add blocked domains.
Or write your own `ipset/get_iran_blocklist.sh` , if you know where to download this one.
On routers, it is not recommended to call these scripts more than once in 2 days to minimize flash memory writes.
If one of `NFQWS_OPT_DESYNC_HTTP`/`NFQWS_OPT_DESYNC_HTTPS` is not defined it takes value of NFQWS_OPT_DESYNC.
If one of `NFQWS_OPT_DESYNC_HTTP6`/`NFQWS_OPT_DESYNC_HTTPS6` is not defined it takes value from
`NFQWS_OPT_DESYNC_HTTP`/`NFQWS_OPT_DESYNC_HTTPS`.
It means if only `NFQWS_OPT_DESYNC` is defined all four take its value.
If a variable is not defined, the value `NFQWS_OPT_DESYNC` is taken.
Separate QUIC options for ip protocol versions :
```
NFQWS_OPT_DESYNC_QUIC="--dpi-desync=fake"
NFQWS_OPT_DESYNC_QUIC6="--dpi-desync=hopbyhop"
```
If `NFQWS_OPT_DESYNC_QUIC6` is not specified `NFQWS_OPT_DESYNC_QUIC` is taken.
flow offloading control (if supported)
```
donttouch : disable system flow offloading setting if selected mode is incompatible with it, dont touch it otherwise and dont configure selective flow offloading
none : always disable system flow offloading setting and dont configure selective flow offloading
software : always disable system flow offloading setting and configure selective software flow offloading
hardware : always disable system flow offloading setting and configure selective hardware flow offloading
```
`FLOWOFFLOAD=donttouch`
The GETLIST parameter tells the install_easy.sh installer which script to call
to update the list of blocked ip or hosts.
Its called via `get_config.sh` from scheduled tasks (crontab or systemd timer).
Put here the name of the script that you will use to update the lists.
If not, then the parameter should be commented out.
You can individually disable ipv4 or ipv6. If the parameter is commented out or not equal to "1",
use of the protocol is permitted.
```
#DISABLE_IPV4=1
DISABLE_IPV6=1
```
The number of threads for mdig multithreaded DNS resolver (1..100).
The more of them, the faster, but will your DNS server be offended by hammering ?
`MDIG_THREADS=30`
temp directory. Used by ipset/*.sh scripts for large lists processing.
/tmp by default. Can be reassigned if /tmp is tmpfs and RAM is low.
TMPDIR=/opt/zapret/tmp
ipset and nfset options :
```
SET_MAXELEM=262144
IPSET_OPT="hashsize 262144 maxelem 2097152
```
Kernel automatically increases hashsize if ipset is too large for the current hashsize.
This procedure requires internal reallocation and may require additional memory.
On low RAM systems it can cause errors.
Do not use too high hashsize. This way you waste your RAM. And dont use too low hashsize to avoid reallocs.
The best way to start is to put zapret dir to `/tmp` and run `/tmp/zapret/install_easy.sh` from there.
After installation remove `/tmp/zapret` to free RAM.
The absolute minimum for openwrt is 64/8 system, 64/16 is comfortable, 128/extroot is recommended.
### Android
Its not possible to use nfqws and tpws in transparent proxy mode without root privileges.
Without root tpws can run in --socks mode.
Android has NFQUEUE and nfqws should work.
There's no ipset support unless you run custom kernel. In common case task of bringing up ipset
on android is ranging from "not easy" to "almost impossible", unless you find working kernel
image for your device.
Android does not use /etc/passwd, `tpws --user` won't work. There's replacement.
Use numeric uids in `--uid` option.
Its recommended to use gid 3003 (AID_INET), otherwise tpws will not have inet access.
Example : `--uid 1:3003`
In iptables use : `! --uid-owner 1` instead of `! --uid-owner tpws`.
Nfqws should be executed with `--uid 1`. Otherwise on some devices or firmwares kernel may partially hang. Looks like processes with certain uids can be suspended. With buggy chineese cellular interface driver this can lead to device hang.
Write your own shell script with iptables and tpws, run it using your root manager.
Autorun scripts are here :
magisk : `/data/adb/service.d`
supersu : `/system/su.d`
How to run tpws on root-less android.
You can't write to `/system`, `/data`, can't run from sd card.
Selinux prevents running executables in `/data/local/tmp` from apps.