Date: Fri, 1 Mar 2024 10:49:42 -0500
From: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Newsgroups: gmane.org.unix-heritage.general
Subject: Of flags and keyletters
Message-ID: <CAKH6PiV3ixuwoZ-d31JNXpQpHxAAcfpRKreUcn11msW1yjboLg@mail.gmail.com>
> why did AT&T refer to "flags" as "keyletters" in its SysV documentation?
Bureaucracies beget bureaucratese--polysyllabic obfuscation, witness
APPLICATION USAGE in place of BUGS.
One might argue that replacing "flag" by "option", thus doubling the number
of syllables, was a small step in that direction. In fact it was a
deliberate attempt to discard jargon in favor of normal English usage.
Tags: quote, ойті
Authors: ag
I recently saw a tweet where a guy was asking how to download curl
within a minimal Debian container that had no scripting language
installed except for Bash: no wget, nothing like that.
If such a container has apt-get, but you lack permission to run it,
there is a reliable way to force apt-get to download a .deb file with
all its dependencies under a regular user, but we won't discuss that
here.
I got curious about how hard it would be to write a primitive HTTP
get-only client in Bash, as Bash is typically compiled with "network"
redirection support:
$ exec 3<> /dev/tcp/www.gnu.org/80
$ printf "%s\r\n" 'HEAD /robots.txt HTTP/1.1' >&3
$ printf "%s\r\n\r\n" 'Host: www.gnu.org' >&3
$ cat <&3
HTTP/1.1 200 OK
Date: Sun, 11 Feb 2024 07:02:40 GMT
Server: Apache/2.4.29
Content-Type: text/plain
Content-Language: non-html
…
This could've been useful before the days of TLS everywhere, but it
won't suffice now: to download a statically compiled curl binary from
Github, we need TLS support and proper handling of 302
redirections. Certainly, it's possible to cheat: put the binary on our
web server and serve it under plain HTTP, but that would be too easy.
What if we use ncat+openssl as a forward TLS proxy? ncat may serve as
an inetd-like super-server, invoking "openssl s_client" on each
connection:
$ cat proxy.sh
#!/bin/sh
read -r host
openssl s_client -quiet -no_ign_eof -verify_return_error "$host"
$ ncat -vk -l 10.10.10.10 1234 -e ./proxy.sh
The 1st thing we need in the bash-http-get client is URL parsing. It
wouldn't have been necessary if Github served files directly from
"Releases" pages, but it does so through redirects. Therefore, when we
grab the Location header from a response, we need to disentangle its
hostname from a pathname.
Ideally, it should work like the URL() constructor in JavaScript:
$ node -pe 'new URL("https://q.example.com:8080/foo?q=1&w=2#lol")'
URL {
href: 'https://q.example.com:8080/foo?q=1&w=2#lol',
origin: 'https://q.example.com:8080',
protocol: 'https:',
username: '',
password: '',
host: 'q.example.com:8080',
hostname: 'q.example.com',
port: '8080',
pathname: '/foo',
search: '?q=1&w=2',
searchParams: URLSearchParams { 'q' => '1', 'w' => '2' },
hash: '#lol'
}
StackOverflow has various examples of how to achieve that using
regular expressions, but none of them were able to parse the example
above. I tried asking ChatGPT to repair the regex, but it only made it
worse. Miraculously, Google's Gemini supposedly fixed the regex on the
second try (I haven't tested it extensively).
$ cat lib.bash
declare -A URL
url_parse() {
    local pattern='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?(/([^?#]*))?(\?([^#]*))?(#(.*))?'
    [[ "$1" =~ $pattern ]] && [ "${BASH_REMATCH[2]}" ] && [ "${BASH_REMATCH[4]}" ] || return 1
    URL=(
        [proto]=${BASH_REMATCH[2]}
        [host]=${BASH_REMATCH[4]}
        [hostname]=${BASH_REMATCH[7]}
        [port]=${BASH_REMATCH[9]}
        [pathname]=${BASH_REMATCH[10]:-/}
        [search]=${BASH_REMATCH[12]}
        [hash]=${BASH_REMATCH[14]}
    )
}
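As a quick sanity check, the parser should split the example URL from
the node snippet above into the same parts. The function body is
repeated here verbatim from lib.bash only so that the snippet runs
standalone; normally you'd just source lib.bash:

```shell
#!/usr/bin/env bash
# Sanity check for url_parse; the function is copied from lib.bash
# so this snippet is self-contained.
declare -A URL
url_parse() {
    local pattern='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?(/([^?#]*))?(\?([^#]*))?(#(.*))?'
    [[ "$1" =~ $pattern ]] && [ "${BASH_REMATCH[2]}" ] && [ "${BASH_REMATCH[4]}" ] || return 1
    URL=(
        [proto]=${BASH_REMATCH[2]}
        [host]=${BASH_REMATCH[4]}
        [hostname]=${BASH_REMATCH[7]}
        [port]=${BASH_REMATCH[9]}
        [pathname]=${BASH_REMATCH[10]:-/}
        [search]=${BASH_REMATCH[12]}
        [hash]=${BASH_REMATCH[14]}
    )
}

url_parse 'https://q.example.com:8080/foo?q=1&w=2#lol'
for k in proto hostname port pathname search hash; do
    printf '%-8s %s\n' "$k" "${URL[$k]}"
done
```

This prints proto https, hostname q.example.com, port 8080, pathname
/foo, search ?q=1&w=2, and hash #lol, i.e., the same decomposition
that node's URL() gives.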
Next, we need to separate headers from a response body. This means
looking for the 1st occurrence of \r\n\r\n. Sounds easy:
grep -aobx $'\r' file | head -1
until you decide to port the client to a BusyBox-based system like
Alpine Linux. The latter has a grep that doesn't support the -ab
options. There is some advice about employing od(1), but no
examples. If we print a file using a 2-column format:
0000000 68
0000001 20
0000002 3a
…
where the left column is a decimal offset, we can convert the 1st 32KB
of the response into a single line and search for the pattern using
grep -o:
od -N $((32*1024)) -t x1 -Ad -w1 -v "$tmp" | tr '\n' ' ' | \
grep -o '....... 0d ....... 0a ....... 0d ....... 0a' | \
awk '{if (NR==1) print $7+0}'
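To convince ourselves that the GNU grep one-liner and the od pipeline
agree, we can run both against a tiny hand-made response (a throwaway
check; the synthetic headers are arbitrary):

```shell
# Compare the GNU grep & the od/grep/awk ways of finding the header
# terminator; both should yield the 0-based offset of the final \n
# in the 1st \r\n\r\n.
tmp=$(mktemp)
printf 'HTTP/1.1 200 OK\r\nA: b\r\n\r\nBODY' > "$tmp"

# GNU grep: byte offset of the lone \r line, plus 1 for its \n
gnu=$(grep -aobx $'\r' "$tmp" | head -1 | tr -d '\r\n:' | xargs -r expr 1 +)

# BusyBox-friendly od pipeline
bb=$(od -N $((32*1024)) -t x1 -Ad -w1 -v "$tmp" | tr '\n' ' ' | \
    grep -o '....... 0d ....... 0a ....... 0d ....... 0a' | \
    awk '{if (NR==1) print $7+0}')

echo "$gnu $bb"              # 24 24
tail -c+$((gnu + 2)) "$tmp"  # BODY
rm -f "$tmp"
```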
Here's the full version of the client that supports only URLs with the
https protocol. It saves the response in a temporary file and looks
for the \r\n\r\n offset. If the HTTP status code was 200, it prints
the body to stdout. If it was 302, it extracts the value of the
Location header and recursively calls itself with a new URL.
#!/usr/bin/env bash
set -e -o pipefail
. "$(dirname "$(readlink -f "$0")")/lib.bash"
tmp=`mktemp fetch.XXXXXX`
trap 'rm -f $tmp' 0 1 2 15
eh() { echo "$*" 1>&2; exit 2; }
[ $# = 3 ] || eh Usage: fetch.bash proxy_host proxy_port url
proxy_host=$1
proxy_port=$2
url=$3
get() {
    url_parse "$1"; [ "${URL[proto]}" = https ] || return 1
    exec 3<> "/dev/tcp/$proxy_host/$proxy_port" || return 1
    echo "${URL[hostname]}:${URL[port]:-443}" >&3
    printf "GET %s HTTP/1.1\r\n" "${URL[pathname]}${URL[search]}${URL[hash]}" >&3
    printf '%s: %s\r\n' Host "${URL[hostname]}" Connection close >&3
    printf '\r\n' >&3
    cat <&3
}
get "$url" > "$tmp" || eh ':('
[ -s "$tmp" ] || eh 'Empty reply, TLS error?'
offset_calc() {
    if echo 1 | grep -aobx 1 >/dev/null 2>&1; then # gnu-like grep
        grep -aobx $'\r' "$tmp" | head -1 | tr -d '\r\n:' | \
            xargs -r expr 1 +
    else # busybox?
        od -N $((32*1024)) -t x1 -Ad -w1 -v "$tmp" | tr '\n' ' ' | \
            grep -o '....... 0d ....... 0a ....... 0d ....... 0a' | \
            awk '{if (NR==1) print $7+0}'
    fi || echo -1
}
offset=`offset_calc`
headers() { head -c "$offset" "$tmp" | tr -d '\r'; }
hdr() { headers | grep -m1 -i "^$1:" | cut -d' ' -f2; }
status=`head -1 "$tmp" | cut -d' ' -f2`
case "$status" in
    200) [ "$offset" = -1 ] && offset=-2 # invalid response, dump all
         tail -c+$((offset + 2)) "$tmp"
         [ "$offset" -gt 0 ] ;;
    302) headers 1>&2; echo 1>&2
         hdr location | xargs "$0" "$1" "$2" ;;
    *) headers 1>&2; exit 1
esac
It should work even on Alpine Linux or FreeBSD:
$ ./fetch.bash 10.10.10.10 1234 https://github.com/stunnel/static-curl/releases/download/8.6.0/curl-linux-arm64-8.6.0.tar.xz > curl.tar.xz
HTTP/1.1 302 Found
Location: https://objects.githubusercontent.com/…
…
$ file curl.tar.xz
curl.tar.xz: XZ compressed data, checksum CRC64
Tags: ойті
Authors: ag
Do you benchmark compression tools (like xz or zstd) on your own data,
or do you rely on common wisdom? The best result for an uncompressed
300MB XFS image from the previous post was achieved by bzip2, which is
rarely used nowadays. How does one quickly check a chunk of data
against N popular compressors?
E.g., an unpacked tarball of Emacs 29.2 source code consists of 6791
files with a total size of 276MB. If you were to distribute it as a
.tar.something archive, which compression tool would be the optimal
choice? We can easily write a small utility that answers this
question.
$ ./comprtest ~/opt/src/emacs/emacs-29.2 | tee table
tar: Removing leading `/' from member names
szip 0.59 56.98 126593557
gzip 9.21 72.70 80335332
compress 3.57 57.45 125217137
bzip2 17.28 78.08 64509672
rzip 17.61 79.50 60336377
lzip 113.61 81.67 53935898
lzop 0.67 57.14 126121462
xz 111.03 81.89 53295220
brotli 13.10 78.14 64336399
zstd 1.13 73.77 77179446
comprtest is a 29 LOC long shell script. The 2nd column here
indicates time in seconds, the 3rd column displays space saving in %
(higher % is better), & the 4th column shows the final size in bytes.
Then we can sort the table by the 3rd column & draw a bar chart:
$ sort -nk3 table | cpp -P plot.gp | gnuplot -persist
If you're wondering why all of a sudden the C preprocessor becomes
part of it, read on.
comprtest expects either a file as an argument or a directory (in
which case it creates a plain .tar of it first). Additional optional
arguments specify which compressors to use:
$ ./comprtest /usr/libexec/gdb gzip brotli
gzip 0.60 61.17 6054706
brotli 1.17 65.84 5325408
The gist of the script involves looping over a list of
compressors:
archivers='szip gzip compress bzip2 rzip lzip lzop xz brotli zstd'
…
for c in ${@:-$archivers}; do
    echo $c
    case $c in
        szip   ) args='< "$input" > $output' ;;
        rzip   ) args='-k -o $output "$input"' ;;
        brotli ) args='-6 -c "$input" > $output' ;;
        *      ) args='-c "$input" > $output'
    esac
    eval "time -p $c $args" 2>&1 | awk '/real/ {print $2}'
    osize=`wc -c < $output`
    echo $isize $osize | awk '{print 100*(1-$2/($1==0?$2:$1))}'
    echo $osize
    rm $output
done | xargs -n4 printf "%-8s %11.2f %6.2f %15d\n"
- Not every archive tool has a gzip-compatible CLI.
- We are using a default compression level for each tool with the
  exception of brotli, as its default level 11 is excruciatingly
  slow.
- szip is an interface to the Snappy algorithm. Your distro probably
  doesn't have it in its repos, hence run cargo install szip.
  Everything else should be available via dnf/apt.
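The bookkeeping around each compressor boils down to a few lines. For
illustration, here is a stripped-down, gzip-only variant of the loop
above that can be run as-is against any file (the default input file
is an arbitrary choice):

```shell
#!/usr/bin/env bash
# Single-compressor version of the comprtest measurement: time gzip,
# then compute space saving % and the compressed size in bytes.
input=${1:-/etc/hosts}
output=$(mktemp)
isize=$(wc -c < "$input")
# `time -p` writes to stderr; capture it & keep the "real" seconds
secs=$( { time -p gzip -c "$input" > "$output"; } 2>&1 | awk '/real/ {print $2}' )
osize=$(wc -c < "$output")
# Space saving %: 100*(1 - compressed/original), guarding against
# a 0-byte input
saving=$(echo "$isize $osize" | awk '{print 100*(1-$2/($1==0?$2:$1))}')
printf '%-8s %11.2f %6.2f %15d\n' gzip "$secs" "$saving" "$osize"
rm -f "$output"
```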
Bar charts are generated by a gnuplot script:
$ cat plot.gp
$data <<E
#include "/dev/stdin"
E
set key tmargin
set xtics rotate by -30 left
set y2tics
set ylabel "Seconds"
set y2label "%"
set style data histograms
set style fill solid
plot $data using 2 axis x1y1 title "Time", \
"" using 3:xticlabels(1) axis x1y2 title "Space saving"
Here is where the C preprocessor comes in handy: without an injected
"datablock" it won't be possible to draw a graph with 2 ordinates when
reading data from stdin.
In an attempt to demonstrate that xz is not always the best choice, I
benchmarked a bunch of XML files (314MB):
$ ./comprtest ~/Downloads/emacs.stackexchange.com.tar
szip 0.59 63.70 119429565
gzip 7.18 77.59 73724710
compress 4.03 67.17 108015563
bzip2 21.37 83.36 54751478
rzip 17.42 85.93 46304199
lzip 119.70 85.06 49151518
lzop 0.67 63.63 119667058
xz 125.80 85.55 47559464
brotli 13.56 82.52 57509978
zstd 1.07 79.40 67766890
Tags: ойті
Authors: ag
As a prank, how do you create an archive in Linux that ⓐ cannot be
opened in Windows (without WSL2 or Cygwin), ⓑ can be opened in MacOS
or FreeBSD?
Creating a .cpio or .tar.xz won't cut it: file archivers such as
7-Zip are free & easy to install. Furthermore, sending an ext4
image, generated as follows:
$ truncate -s 10M file.img
$ mkfs.ext4 file.img
$ sudo mount -o loop file.img /somewhere
$ sudo cp something /somewhere
$ sudo umount /somewhere
doesn't help nowadays, for 7-Zip opens them too. Although disk cloning utils like
FSArchiver can produce an image file from a directory, they are
exclusive to Linux.
It boils down to this: which filesystems can be read across
Linux/MacOS/FreeBSD that Windows file archivers don't recognise? This
rules out fat/ntfs/udf, for they are too common, or f2fs/nilfs2, for
they are Linux-only.
The only viable candidate I found is XFS. Btrfs was a contender, but
I'm unsure how to mount it on Mac.
Below is a script to automate the creation of prank archives. It takes
any zip/tar.gz (or anything else that bsdtar is able to parse) &
outputs an image file in the format specified by the output file
extension:
sudo ./mkimg file.zip file.xfs
It requires sudo, for mount -o loop can't be done under a regular
user.
#!/bin/sh
set -e
input=$1
output=$2
type=${2##*.}
[ -r "$input" ] && [ "$output" ] && [ "`id -u`" = 0 ] || {
    echo Usage: sudo mkimg file.zip file.ext2 1>&2
    exit 1
}
mkfs=mkfs.$type
cmd() { for c; do command -v $c >/dev/null || { echo no $c; return 1; }; done; }
cmd bsdtar "$mkfs"
cleanup() {
    set +e
    umount "$mnt" 2>/dev/null
    rm -rf "$mnt" "$log"
    [ "$ok" ] || rm -f "$output"
}
trap cleanup 0 1 2 15
usize=`bsdtar tvf "$input" | awk '{s += $5} END {print s}'`
mnt=`mktemp -d`
log=`mktemp`
case "$type" in
    msdos|*fat) size=$((1024*1024 + usize*2)); opt_tar=--no-same-owner ;;
    ext*|udf  ) size=$((1024*1024 + usize*2)) ;;
    f2fs      ) size=$((1024*1024*50 + usize*2)) ;;
    btrfs     ) size=$((114294784 + usize*2)) ;;
    nilfs2    ) size=$((134221824 + usize*2)) ;;
    xfs       ) size=$((1024*1024*300 + usize*2)) ;;
    jfs       ) size=$((1024*1024*16 + usize*2)); opt=-q ;;
    hfsplus   )
        size=$((1024*1024 + usize*2))
        [ $((size % 4096)) != 0 ] && size=$((size + (4096-(size % 4096)))) ;;
    *) echo "$type is untested" 1>&2; exit 1
esac
rm -f "$output"
truncate -s $size "$output"
$mkfs $opt "$output" > "$log" 2>&1 || { cat "$log"; exit 1; }
mount -o loop "$output" "$mnt"
bsdtar -C "$mnt" $opt_tar --chroot -xf "$input"
[ "$SUDO_UID" ] && chown "$SUDO_UID:$SUDO_GID" "$output"
ok=1
.xfs files start at a size of 300MB, even if you place a single
0-length file in them, but bzip2 compresses such an image into 6270
bytes.
To mount an .xfs under a regular user, use libfsxfs.
7z -i prints all supported formats.
Tags: ойті
Authors: ag