awk Cheat Sheet

1. Mental model

What awk does

awk reads input record by record, splits each record into fields, tests patterns, runs actions, and prints or computes results.

When awk is a good fit

Use awk for row and column processing, numeric summaries, simple reports, associative arrays, joins, grouping, and log analysis in shell pipelines.

When to switch tools

Use sed for simple stream substitutions, jq for JSON, xq/xmlstarlet for XML, yq for YAML, and a real CSV parser for complex quoted CSV unless your awk supports CSV mode.

Default cycle: read a record, set $0, split fields into $1 through $NF, update counters, run matching actions, repeat. If a pattern has no action, awk prints the matching record.
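
As a small illustration of that cycle, the sketch below feeds two invented lines to awk and prints the record number, field count, first field, and last field for each record. The sample text and the variable name cycle_out are made up for the example.

```shell
# Two invented input lines; awk sets NR, NF, $1 and $NF for each record.
cycle_out=$(printf 'alpha beta gamma\none two\n' |
  awk '{ print NR, NF, $1, $NF }')
printf '%s\n' "$cycle_out"
# 1 3 alpha gamma
# 2 2 one two
```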

2. Quick start

Task	Command	Notes
Print whole file	`awk '{ print }' file`	Same as `awk '{ print $0 }'`.
Print first column	`awk '{ print $1 }' file`	Fields are 1-based.
Print columns 1 and 3	`awk '{ print $1, $3 }' file`	Comma uses `OFS`, a space by default.
Use comma delimiter	`awk -F, '{ print $2 }' file.csv`	Good only for simple comma-separated text.
Filter matching rows	`awk '/error/' app.log`	No action means print matching records.
Filter by column	`awk '$3 >= 500 { print $1, $3 }' access.log`	Numeric comparison when operands look numeric.
Skip header	`awk 'NR > 1 { print }' table.tsv`	`NR` is the global record number.
Sum a column	`awk '{ sum += $2 } END { print sum }' file`	`END` runs after all input.
Count rows by value	`awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' file`	Arrays are associative.
Set output delimiter	`awk 'BEGIN { OFS="," } { print $1, $2 }' file`	`print a, b` inserts `OFS`.

Common shape

awk 'pattern { action }' file
awk 'BEGIN { setup } pattern { action } END { summary }' file

Pipeline shape

ps aux | awk '$3 > 20 { print $2, $3, $11 }'
git log --oneline | awk '{ print NR ": " $0 }'

3. Invocation and options

Option	Use	Example
`-F fs`	Set input field separator.	`awk -F: '{ print $1 }' /etc/passwd`
`-v name=value`	Assign a variable before the program starts, so it is visible in `BEGIN`.	`awk -v min=10 '$2 > min' file`
`-f script.awk`	Read awk program from a file.	`awk -f report.awk data.tsv`
`--`	Stop option parsing in many awk versions.	`awk -f script.awk -- -weird-file`
`--posix`	GNU awk: use stricter POSIX behavior.	`gawk --posix -f script.awk file`
`--lint`	GNU awk: warn about questionable code.	`gawk --lint -f script.awk file`
`--csv`	GNU awk 5.3+: parse CSV input.	`gawk --csv '{ print $2 }' file.csv`

Quoting: Put awk programs in single quotes in the shell. Use -v to pass shell values instead of interpolating them into the program text.

Passing variables

min=100
awk -v min="$min" '$3 >= min { print $1, $3 }' file

Multiple input files

awk '{ print FILENAME, FNR, $0 }' *.log
awk 'FNR == 1 { print "== " FILENAME " ==" } { print }' *.txt
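
To see how NR and FNR differ across several files, here is a minimal sketch that uses two scratch files with invented content. The file names one.txt and two.txt are assumptions for the example.

```shell
# Scratch files with invented content; the directory is removed at the end.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"
# NR keeps counting across files; FNR restarts at 1 for each file.
nr_out=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, $0 }' one.txt two.txt)
printf '%s\n' "$nr_out"
rm -r "$dir"
# one.txt 1 1 a
# one.txt 2 2 b
# two.txt 3 1 c
```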

4. Records and fields

A record is normally one line. A field is normally one whitespace-separated token. Change RS for records and FS for fields.

Token	Meaning	Example
`$0`	Entire current record.	`awk '{ print $0 }' file`
`$1`, `$2`	Field 1, field 2.	`awk '{ print $1, $2 }' file`
`$NF`	Last field.	`awk '{ print $NF }' file`
`$(NF-1)`	Next-to-last field.	`awk '{ print $(NF-1) }' file`
`NF`	Number of fields in current record.	`awk 'NF > 3' file`
`NR`	Total records read so far.	`awk 'NR == 10' file`
`FNR`	Record number within current file.	`awk 'FNR == 1' *.csv`
`FS`	Input field separator.	`awk 'BEGIN { FS=":" } { print $1 }'`
`OFS`	Output field separator.	`awk 'BEGIN { OFS="\t" } { print $1, $2 }'`
`RS`	Input record separator.	`awk 'BEGIN { RS="" } { print NF }'`
`ORS`	Output record separator.	`awk 'BEGIN { ORS="\n\n" } { print }'`
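
Setting RS to the empty string, as in the `RS` row, switches awk to paragraph mode. The sketch below, on invented input, suggests how that changes both record and field boundaries.

```shell
# With RS="" each blank-line-separated block is one record,
# and newlines inside a block also separate fields.
para_out=$(printf 'a b\nc\n\nd\n' |
  awk 'BEGIN { RS = "" } { print NR ": " NF " fields, first=" $1 }')
printf '%s\n' "$para_out"
# 1: 3 fields, first=a
# 2: 1 fields, first=d
```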

Separators

awk -F: '{ print $1 }' /etc/passwd
awk 'BEGIN { FS="[,:]" } { print $1, $2 }' file
awk 'BEGIN { FS="\t"; OFS="," } { print $1, $3 }' table.tsv

Special whitespace mode: The default FS=" " means runs of spaces, tabs, and newlines separate fields, and leading or trailing whitespace is ignored. With FS="[ \t]+", leading or trailing whitespace instead produces an empty first or last field.
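
A quick check of that difference, using an invented line that starts with two spaces:

```shell
line='  a b'   # invented sample with two leading spaces
# Default FS ignores the leading whitespace.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')
# A regex FS treats the leading whitespace as a separator after an empty field.
regex_nf=$(printf '%s\n' "$line" | awk -F '[ \t]+' '{ print NF }')
printf 'default FS: %s fields, regex FS: %s fields\n' "$default_nf" "$regex_nf"
# default FS: 2 fields, regex FS: 3 fields
```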

Changing fields rebuilds the record

awk 'BEGIN { OFS="," } { $2 = toupper($2); print }' file
awk '{$1=$1; print}' file
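
The `$1=$1` idiom works because assigning to any field makes awk rebuild `$0` from the fields joined with `OFS`. A minimal sketch on an invented line:

```shell
sample='a   b  c'
# Without an assignment, $0 is printed unchanged and OFS is not applied.
plain=$(printf '%s\n' "$sample" | awk 'BEGIN { OFS = "-" } { print }')
# Assigning to a field rebuilds $0 by joining the fields with OFS.
rebuilt=$(printf '%s\n' "$sample" | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')
printf '%s\n%s\n' "$plain" "$rebuilt"
# a   b  c
# a-b-c
```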

5. Patterns and actions

An awk program is a sequence of pattern { action } rules. A missing pattern means every record. A missing action means print.

Pattern	Meaning	Example
`BEGIN`	Run before input is read.	`awk 'BEGIN { print "start" }'`
`END`	Run after all input is read.	`awk '{ n++ } END { print n }' file`
`/re/`	Records where `$0` matches regex.	`awk '/ERROR/' log`
`expr`	Records where expression is true.	`awk '$4 == 404' access.log`
`p1, p2`	Inclusive range from pattern 1 through pattern 2.	`awk '/BEGIN/,/END/' file`
`pattern { next }`	Skip matching records: no later rule runs for them.	`awk 'NR == 1 { next } { print }'`

Common filters

awk 'NF' file
awk 'NF == 0' file
awk '$1 == "GET" && $9 >= 500' log
awk 'NR >= 10 && NR <= 20' file
awk 'FNR == NR { ids[$1]; next } $1 in ids' allowlist data

Truth: Empty strings and numeric zero are false. Non-empty strings and non-zero numbers are true.
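
One subtlety behind the truth rule: input that looks numeric is tested as a number, while a string constant is tested as a string. The sketch below assumes the POSIX rules, as documented for GNU awk.

```shell
# A field containing 0 looks numeric, so it is false as a pattern.
field_zero=$(echo 0 | awk '$1 { print "true" } !$1 { print "false" }')
# The string constant "0" is a non-empty string, so it is true.
const_zero=$(awk 'BEGIN { if ("0") print "true"; else print "false" }')
printf 'field 0: %s, constant "0": %s\n' "$field_zero" "$const_zero"
```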

6. Regular expressions

Expression	Meaning	Example
`$0 ~ /re/`	Record matches regex.	`awk '$0 ~ /error/' log`
`$1 !~ /re/`	Field does not match regex.	`awk '$1 !~ /^#/' file`
`/re/`	Shortcut for `$0 ~ /re/`.	`awk '/timeout/' log`
`^`, `$`	Start and end of string.	`awk '$1 ~ /^user[0-9]+$/' file`
`[[:digit:]]`	POSIX digit class.	`awk '$2 ~ /^[[:digit:]]+$/' file`
`[[:space:]]`	POSIX whitespace class.	`awk '$0 ~ /[[:space:]]+$/' file`
`(cat|dog)`	Alternation and grouping.	`awk '$3 ~ /^(cat|dog)$/' file`

Regex functions

awk '{ sub(/old/, "new"); print }' file
awk '{ gsub(/[[:space:]]+/, " "); print }' file
awk 'match($0, /id=[0-9]+/) { print substr($0, RSTART, RLENGTH) }' log

Function	Use	Notes
`sub(re, repl, target)`	Replace first match.	`target` defaults to `$0`.
`gsub(re, repl, target)`	Replace all matches.	Returns replacement count.
`match(str, re)`	Find regex in string.	Sets `RSTART` and `RLENGTH`.
`split(str, arr, re)`	Split string into array.	Returns number of parts.
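
The return values and side effects in the table can be checked with a short sketch. The log line is invented for the example.

```shell
log='GET /item?id=42&ref=home id=7'   # invented log line
# match() sets RSTART and RLENGTH for the leftmost match.
first_id=$(printf '%s\n' "$log" |
  awk 'match($0, /id=[0-9]+/) { print substr($0, RSTART, RLENGTH) }')
# gsub() edits $0 in place and returns how many replacements it made.
id_count=$(printf '%s\n' "$log" |
  awk '{ n = gsub(/id=[0-9]+/, "id=N"); print n, $0 }')
printf '%s\n%s\n' "$first_id" "$id_count"
# id=42
# 2 GET /item?id=N&ref=home id=N
```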

7. Variables and built-ins

Variables spring into existence when used. Uninitialized variables behave like an empty string or numeric zero depending on context.

User variables

awk '{ total += $2; seen = 1 } END { print total }' file
awk -v label="Total" '{ sum += $1 } END { print label, sum }' file

Numeric and string context

awk '$1 + 0 > 10' file
awk '($1 "") == "0012"' file
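
A small sketch of how context changes a comparison, using the invented input 0012:

```shell
cmp_out=$(echo 0012 | awk '{
  print ($1 == 12)          # field looks numeric: numeric comparison gives 1
  print (($1 "") == "12")   # forced to string: "0012" vs "12" gives 0
  print ($1 + 0)            # forced to number: 12
}')
printf '%s\n' "$cmp_out"
```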

Environment

awk 'BEGIN { print ENVIRON["HOME"] }'
awk -v user="$USER" 'BEGIN { print user }'

Variable	Meaning
`ARGC`, `ARGV`	Command-line argument count and array.
`CONVFMT`	Number-to-string conversion format, default `%.6g`.
`FILENAME`	Current input file name.
`FNR`	Record number in current file.
`FS`, `OFS`	Input and output field separators.
`NF`	Field count for current record.
`NR`	Total input records read.
`OFMT`	Output format for numbers printed with `print`.
`ORS`, `RS`	Output and input record separators.
`RSTART`, `RLENGTH`	Start position and length of the last `match()` result.
`SUBSEP`	Separator used internally for multi-index arrays.

8. Operators

Operator	Meaning	Example
`+ - * / % ^`	Arithmetic.	`awk '{ print ($2 * 100) / $3 }'`
`++ --`	Increment and decrement.	`awk '{ count++ } END { print count }'`
`= += -= *= /= %= ^=`	Assignment.	`awk '{ sum += $1 }'`
`== != < <= > >=`	Comparison.	`awk '$3 >= 90'`
`~ !~`	Regex match and not match.	`awk '$1 ~ /^api-/'`
`&& || !`	Logical and, or, not.	`awk '$1 == "GET" && $9 == 200'`
`expr ? a : b`	Ternary expression. Parenthesize it inside `print`, where a bare `>` means output redirection.	`awk '{ print ($3 > 0 ? "up" : "down") }'`
`in`	Array membership.	`awk '$1 in seen { print $1 }'`
`space`	String concatenation.	`awk '{ print $1 ":" $2 }'`

Concatenation has no operator: $1 $2 means join field 1 and field 2 directly. Use explicit separators such as $1 ":" $2 when output must be unambiguous.

9. Arrays

awk arrays are associative maps. Indexes are strings, even when they look numeric.
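
The sketch below illustrates two consequences of string subscripts: `1` and `"1"` name the same element while `"01"` does not, and merely reading an element creates it, whereas `in` does not. The array and key names are invented.

```shell
arr_out=$(awk 'BEGIN {
  a[1] = "x"
  # The number 1 is stored under the string subscript "1".
  printf "%d %d\n", ("1" in a), ("01" in a)
  unused = a["ghost"]   # merely reading an element creates it
  # The "in" test never creates an element.
  printf "%d %d\n", ("ghost" in a), ("other" in a)
}')
printf '%s\n' "$arr_out"
# 1 0
# 1 0
```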

Counting

awk '{ count[$1]++ }
END { for (key in count) print key, count[key] }' file

Membership

awk 'FNR == NR { keep[$1]; next }
$1 in keep { print }' ids.txt data.txt

Delete

awk '{ seen[$1]++ }
END { delete seen["tmp"]; for (k in seen) print k }' file

Composite keys

awk '{ count[$1, $2]++ }
END {
  for (key in count) {
    split(key, parts, SUBSEP)
    print parts[1], parts[2], count[key]
  }
}' file

Sorting output

awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' file | sort
gawk 'BEGIN { PROCINFO["sorted_in"] = "@ind_str_asc" }
{ count[$1]++ } END { for (k in count) print k, count[k] }' file

10. Control flow

Construct	Example	Notes
`if`	`if ($3 > 10) print $1`	Use braces for multiple statements.
`if / else`	`{ if ($1 == "") print "empty"; else print $1 }`	Semicolons separate inline statements.
`while`	`i = 1; while (i <= NF) { print $i; i++ }`	Condition checked first; initialize the counter.
`do / while`	`do { i++ } while (i < 10)`	Runs at least once.
`for`	`for (i = 1; i <= NF; i++) print $i`	Useful for fields.
`for in`	`for (k in count) print k, count[k]`	Array order is unspecified in POSIX awk.
`next`	`NR == 1 { next }`	Skip remaining rules for current record.
`nextfile`	`/STOP/ { nextfile }`	Supported by gawk, mawk, and BWK awk and added to POSIX.1-2024; older awks may lack it.
`exit`	`NR > 100 { exit }`	Still runs `END` blocks.

Multi-rule flow

awk '
NR == 1 { next }
$3 == "" { missing++; next }
$3 >= 90 { print $1, "pass"; next }
{ print $1, "review" }
END { print "missing:", missing }
' scores.tsv

11. Functions

Function	Meaning	Example
`length(str)`	String length.	`awk 'length($0) > 120'`
`substr(str, start, len)`	Substring, 1-based.	`awk '{ print substr($1, 1, 8) }'`
`index(str, find)`	Position of substring or 0.	`awk 'index($0, "TODO")'`
`split(str, arr, sep)`	Split string into array.	`awk '{ split($1, a, "-"); print a[1] }'`
`sprintf(fmt, ...)`	Return formatted string.	`awk '{ s = sprintf("%.2f", $1) }'`
`tolower(str)`, `toupper(str)`	Change case.	`awk '{ print tolower($1) }'`
`int(x)`	Truncate toward zero.	`awk '{ print int($1 / 60) }'`
`sqrt(x)`, `log(x)`, `exp(x)`	Math functions.	`awk '{ print sqrt($1) }'`
`sin(x)`, `cos(x)`, `atan2(y, x)`	Trigonometry in radians.	`awk 'BEGIN { print atan2(0, -1) }'`
`rand()`, `srand(seed)`	Pseudo-random numbers.	`awk 'BEGIN { srand(); print rand() }'`
`system(cmd)`	Run shell command.	`awk '{ system("mkdir -p " $1) }'`

User-defined functions

awk '
function pct(part, whole) {
  return whole ? (part * 100 / whole) : 0
}
{ used += $2; total += $3 }
END { printf "%.1f%%\n", pct(used, total) }
' file
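
awk has no local-variable declaration, so a common convention is to list extra parameters that callers never pass and use them as locals. A minimal sketch follows; the function name maxfield and the sample numbers are hypothetical.

```shell
fn_out=$(printf '3 9 4\n8 2\n' | awk '
# maxfield is a hypothetical helper. The extra parameters i and m act as
# locals, so the global i set in the main rule is left untouched.
function maxfield(   i, m) {
  m = $1
  for (i = 2; i <= NF; i++) if ($i > m) m = $i
  return m
}
{ i = "global"; print maxfield(), i }
')
printf '%s\n' "$fn_out"
# 9 global
# 8 global
```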

12. Input and output

Feature	Example	Notes
`print`	`print $1, $2`	Adds `ORS` at the end.
`printf`	`printf "%-20s %8.2f\n", $1, $2`	No automatic newline.
Overwrite file	`print $0 > "out.txt"`	First write truncates.
Append file	`print $0 >> "out.txt"`	Appends to existing content.
Pipe output	`print $1 | "sort -u"`	Use `close()` when done.
Read line	`getline line < "extra.txt"`	Returns 1, 0, or -1.
Read command output	`"date +%F" | getline today`	Close the command string after reading.
Close stream	`close("out.txt")`	Important for many dynamic files or pipes.
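
Because `getline` can return -1 on error, a loop should test for `> 0` rather than plain truth. The sketch below shows both a successful read from a command and the error return. The path is an assumption, chosen because it should not exist.

```shell
gl_out=$(awk 'BEGIN {
  cmd = "echo hello"
  if ((cmd | getline line) > 0) print "read:", line
  close(cmd)
  # A missing file makes getline return -1, which counts as true
  # in a bare while (getline ...) loop and would spin forever.
  missing = "/nonexistent/awk-cheatsheet-demo"   # assumed not to exist
  print "status:", (getline x < missing)
}')
printf '%s\n' "$gl_out"
# read: hello
# status: -1
```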

Split records into files

awk '{ out = $1 ".txt"; print >> out; close(out) }' file
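
When `close()` is called after every write, reopening a file with `>` truncates it each time, so `>>` is the safer pairing. Note that `>>` also appends to files left over from earlier runs. A minimal sketch in a scratch directory, with invented input:

```shell
workdir=$(mktemp -d)
# Each record is appended to a file named after its first field.
printf 'a 1\nb 2\na 3\n' |
  (cd "$workdir" && awk '{ out = $1 ".txt"; print >> out; close(out) }')
a_lines=$(cat "$workdir/a.txt")
b_lines=$(cat "$workdir/b.txt")
printf 'a.txt:\n%s\nb.txt:\n%s\n' "$a_lines" "$b_lines"
rm -r "$workdir"
```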

Read a lookup file manually

awk 'BEGIN {
  while ((getline line < "names.txt") > 0) {
    split(line, parts, "\t")
    name[parts[1]] = parts[2]
  }
  close("names.txt")
}
{ print $1, name[$1] }' ids.txt

13. Formatting output

Use print for simple separated fields and printf for aligned reports, fixed decimals, zero padding, and custom layouts.

Format	Meaning	Example
`%s`	String.	`printf "%s\n", $1`
`%d`	Integer.	`printf "%04d\n", NR`
`%f`	Floating point.	`printf "%.2f\n", $2`
`%e`, `%g`	Scientific or compact number.	`printf "%g\n", $2`
`%-10s`	Left-align in width 10.	`printf "%-10s %8d\n", $1, $2`
`%8.2f`	Width 8, 2 decimals.	`printf "%8.2f\n", total`
`%%`	Literal percent sign.	`printf "%.1f%%\n", pct`
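
Several of these formats combined in one call, with invented values:

```shell
# Left-aligned string, fixed decimals, zero padding, and a literal percent.
fmt_out=$(awk 'BEGIN { printf "%-5s|%6.2f|%04d|%d%%\n", "ab", 3.14159, 7, 50 }')
printf '%s\n' "$fmt_out"
# ab   |  3.14|0007|50%
```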

Report layout

awk '
BEGIN { printf "%-20s %10s\n", "name", "bytes" }
{ total += $2; printf "%-20s %10d\n", $1, $2 }
END { printf "%-20s %10d\n", "TOTAL", total }
' sizes.tsv

14. CSV and structured data

Simple delimited text

Use -F, only when commas cannot appear inside quoted fields and records cannot contain embedded newlines.

awk -F, 'BEGIN { OFS="," } NR > 1 { print $1, $4 }' file.csv

GNU awk CSV mode

GNU awk 5.3+ has --csv, which handles CSV quoting for input.

gawk --csv 'NR > 1 { print $2 }' file.csv

Structured formats

Use format-aware tools for JSON, YAML, XML, and HTML. Regex and field splitting are usually brittle for nested structured data.

jq -r '.items[] | [.name, .size] | @tsv' data.json | awk '{ sum += $2 } END { print sum }'

CSV warning: awk -F, is not a CSV parser. It breaks on fields like "last, first" unless your awk has CSV support or the input is known to be simple.
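
The failure mode is easy to reproduce. In this sketch the row is invented and holds two CSV fields, but plain `-F,` sees three:

```shell
row='"last, first",42'   # two CSV fields, one of them quoted
naive_nf=$(printf '%s\n' "$row" | awk -F, '{ print NF }')
naive_f1=$(printf '%s\n' "$row" | awk -F, '{ print $1 }')
printf 'fields=%s first=%s\n' "$naive_nf" "$naive_f1"
# fields=3 first="last
```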

15. Shell integration

Need	Use	Why
Pass shell variable	`awk -v name="$name" '$1 == name'`	Avoids quote injection and broken spaces.
Use tabs	`awk -F '\t' 'BEGIN { OFS="\t" } ...'`	awk itself interprets `\t` in `-F` and in string literals, so no literal tab is needed.
Exit with failure	`awk 'NF < 3 { exit 1 }' file`	Non-zero exit status for scripts and CI checks.
Combine with sort	`awk '{ count[$1]++ } END { for (k in count) print count[k], k }' file | sort -nr`	awk computes; sort orders.
Read stdin	`cmd | awk '{ print $1 }'`	No file argument means standard input.
Mix stdin and files	`awk '...' - file.txt`	`-` means standard input in most awks.
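
For the "Exit with failure" row, one possible shape is a validation helper whose exit status drives an `if`. The name check_two_fields and the two-field rule are hypothetical.

```shell
# check_two_fields is a hypothetical helper: it fails if any record
# does not have exactly two fields.
check_two_fields() {
  awk 'NF != 2 { bad = 1 } END { exit bad + 0 }'
}

if printf 'a b\nc d\n' | check_two_fields; then good_input=pass; else good_input=fail; fi
if printf 'a b\nc\n' | check_two_fields; then bad_input=pass; else bad_input=fail; fi
printf 'good input: %s, bad input: %s\n' "$good_input" "$bad_input"
# good input: pass, bad input: fail
```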

Safe shell variable pattern

needle='a value with spaces'
awk -v needle="$needle" '$0 == needle { print NR }' file
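
One caveat with `-v`: awk processes escape sequences in the assigned value, so backslashes in shell data may change. When the value must arrive byte for byte, passing it through the environment and reading `ENVIRON` is one option. In this sketch the variable name RAW is an assumption.

```shell
raw='a\tb'    # four characters: a, backslash, t, b
# -v turns the two characters \t into one tab.
via_v=$(awk -v s="$raw" 'BEGIN { print length(s) }')
# ENVIRON delivers the value unchanged.
via_env=$(RAW="$raw" awk 'BEGIN { print length(ENVIRON["RAW"]) }')
printf '%s %s\n' "$via_v" "$via_env"
# 3 4
```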

Shell function wrapper

top_statuses() {
  awk '{ count[$9]++ } END { for (s in count) print count[s], s }' "$@" | sort -nr
}

16. Script files

Move longer awk programs into .awk files when quoting, indentation, or reuse starts to matter.

Run with -f

awk -f report.awk data.tsv
awk -v min=100 -f filter.awk data.tsv

Executable script

#!/usr/bin/awk -f
BEGIN { FS = "\t"; OFS = "\t" }
NR > 1 { print $1, $3 }

Readable report script

BEGIN {
  FS = "\t"
  OFS = "\t"
}

NR == 1 {
  next
}

{
  rows++
  total += $3
  by_team[$2] += $3
}

END {
  print "rows", rows
  print "total", total
  for (team in by_team) {
    print team, by_team[team]
  }
}

17. Recipes

Task	Command
Number lines	`awk '{ print NR, $0 }' file`
Print line length	`awk '{ print length, $0 }' file`
Unique lines, keep first occurrence	`awk '!seen[$0]++' file`
Duplicated lines, each printed once	`awk 'seen[$0]++ == 1' file`
Print first N lines	`awk 'NR <= 10' file`
Print last field	`awk '{ print $NF }' file`
Print records with at least 4 fields	`awk 'NF >= 4' file`
Trim leading and trailing whitespace	`awk '{ gsub(/^[[:space:]]+|[[:space:]]+$/, ""); print }' file`
Convert whitespace to CSV-ish output	`awk 'BEGIN { OFS="," } { print $1, $2, $3 }' file`
Average column	`awk '{ sum += $2; n++ } END { if (n) print sum / n }' file`
Min and max column	`awk 'NR == 1 { min = max = $2 } { if ($2 < min) min = $2; if ($2 > max) max = $2 } END { print min, max }' file`
Group sum	`awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' file`
Left join two files by first field	`awk 'FNR == NR { name[$1] = $2; next } { print $0, name[$1] }' names.tsv data.tsv`
Compare two files, print keys only in second	`awk 'FNR == NR { seen[$1]; next } !($1 in seen)' first second`
Extract text between markers	`awk '/BEGIN/,/END/' file`
Remove text between markers	`awk '/BEGIN/{skip=1} !skip {print} /END/{skip=0}' file`
Print paragraph records	`awk 'BEGIN { RS=""; ORS="\n\n" } /needle/' file`
Show top counts	`awk '{ count[$1]++ } END { for (k in count) print count[k], k }' file | sort -nr | head`
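
The two `seen[$0]++` recipes rely on post-increment: the expression yields the old count and then bumps it. A minimal sketch on invented input:

```shell
# The old count is 0 on the first sighting, so !0 is true and the line prints.
uniq_out=$(printf 'b\na\nb\nc\na\n' | awk '!seen[$0]++')
# The old count is 1 only on the second sighting, so each repeat prints once.
dups_out=$(printf 'b\na\nb\nc\na\nb\n' | awk 'seen[$0]++ == 1')
printf 'unique:\n%s\nrepeated:\n%s\n' "$uniq_out" "$dups_out"
```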

Access log summary

awk '{
  status[$9]++
  bytes += $10
}
END {
  for (s in status) print s, status[s]
  print "bytes", bytes
}' access.log

Find slow requests

awk -v threshold=1.0 '$NF > threshold { print $1, $7, $NF }' access.log

18. Portability and gotchas

Topic	Guidance
POSIX awk	Use POSIX features when scripts must run everywhere: fields, patterns, actions, associative arrays, standard string/math functions, and simple regexes.
GNU awk features	`gensub()`, `asort()`, `asorti()`, `PROCINFO`, `IGNORECASE`, `ARGIND`, networking, and `--csv` are GNU awk features or extensions. `nextfile` is more widely available (gawk, mawk, BWK awk, POSIX.1-2024) but missing from older awks.
macOS awk	macOS ships the BWK ("one true") awk, not GNU awk. Install GNU awk as `gawk` when you need GNU-only behavior.
In-place editing	awk is not primarily an in-place editor. Use a temp file and move it into place, or GNU awk's `-i inplace` extension when acceptable.
Array order	`for (k in array)` order is unspecified unless using GNU awk sorting controls or an external `sort`.
Floating point	Numbers are floating point. Avoid awk for exact decimal money math unless rounding rules are simple and acceptable.
String vs number comparison	Comparison is numeric only when both operands are numbers or numeric-looking input (fields, `getline` data, `-v` values); otherwise it is a string comparison. Force numeric with `+0` and string with concatenation to `""`.
Locale	Character classes, sorting, and case conversion can depend on locale. Set `LC_ALL=C` for byte-oriented, reproducible command-line processing.
Quoting	Prefer single quotes around programs and `-v` for shell data. Avoid building awk source by concatenating untrusted shell strings.

Practical default: Write one-liners with POSIX awk syntax unless you control the runtime. Use gawk explicitly in scripts that rely on GNU awk extensions.
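
To make the floating-point caution concrete, here is a small sketch, assuming IEEE double arithmetic, which the usual awk implementations use:

```shell
fp_out=$(awk 'BEGIN {
  print (0.1 + 0.2 == 0.3)       # 0: binary floating point is inexact
  printf "%.17g\n", 0.1 + 0.2    # the value actually stored
  print (int(0.1 * 100 + 0.2 * 100 + 0.5) == 30)   # 1: compare in whole cents
}')
printf '%s\n' "$fp_out"
# 0
# 0.30000000000000004
# 1
```

Rounding to integer units before comparing, as in the last line, is one simple workaround. It is not a substitute for a decimal type.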