1. Mental model
What awk does
awk reads input record by record, splits each record into fields, tests patterns, runs actions, and prints or computes results.
When awk is a good fit
Use awk for row and column processing, numeric summaries, simple reports, associative arrays, joins, grouping, and log analysis in shell pipelines.
When to switch tools
Use sed for simple stream substitutions, jq for JSON, xq/xmlstarlet for XML, yq for YAML, and a real CSV parser for complex quoted CSV unless your awk supports CSV mode.
Default cycle: read a record, set $0, split fields into $1 through $NF, update counters, run matching actions, repeat. If a pattern has no action, awk prints the matching record.
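As a rough illustration of that cycle, the sketch below feeds awk two made-up records and prints what it sees for each one. The sample input and the variable name cycle_out are assumptions for the demo.

```shell
# Made-up two-line input; awk splits each record on whitespace.
cycle_out=$(printf 'a b c\nd e\n' | awk '{ print NR ": NF=" NF ", last=" $NF }')
printf '%s\n' "$cycle_out"
# 1: NF=3, last=c
# 2: NF=2, last=e
```

NR counts records as they are read, while NF and $NF are recomputed for every record.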
2. Quick start
| Task | Command | Notes |
| Print whole file | awk '{ print }' file | Same as awk '{ print $0 }'. |
| Print first column | awk '{ print $1 }' file | Fields are 1-based. |
| Print columns 1 and 3 | awk '{ print $1, $3 }' file | Comma uses OFS, a space by default. |
| Use comma delimiter | awk -F, '{ print $2 }' file.csv | Good only for simple comma-separated text. |
| Filter matching rows | awk '/error/' app.log | No action means print matching records. |
| Filter by column | awk '$3 >= 500 { print $1, $3 }' access.log | Numeric comparison when operands look numeric. |
| Skip header | awk 'NR > 1 { print }' table.tsv | NR is the global record number. |
| Sum a column | awk '{ sum += $2 } END { print sum }' file | END runs after all input. |
| Count rows by value | awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' file | Arrays are associative. |
| Set output delimiter | awk 'BEGIN { OFS="," } { print $1, $2 }' file | print a, b inserts OFS. |
Common shape
awk 'pattern { action }' file
awk 'BEGIN { setup } pattern { action } END { summary }' file
Pipeline shape
ps aux | awk '$3 > 20 { print $2, $3, $11 }'
git log --oneline | awk '{ print NR ": " $0 }'
3. Invocation and options
| Option | Use | Example |
| -F fs | Set input field separator. | awk -F: '{ print $1 }' /etc/passwd |
| -v name=value | Pass a variable before processing starts. | awk -v min=10 '$2 > min' file |
| -f script.awk | Read awk program from a file. | awk -f report.awk data.tsv |
| -- | Stop option parsing in many awk versions. | awk -f script.awk -- -weird-file |
| --posix | GNU awk: use stricter POSIX behavior. | gawk --posix -f script.awk file |
| --lint | GNU awk: warn about questionable code. | gawk --lint -f script.awk file |
| --csv | GNU awk 5.3+: parse CSV input. | gawk --csv '{ print $2 }' file.csv |
Quoting: Put awk programs in single quotes in the shell. Use -v to pass shell values instead of interpolating them into the program text.
Passing variables
min=100
awk -v min="$min" '$3 >= min { print $1, $3 }' file
Multiple input files
awk '{ print FILENAME, FNR, $0 }' *.log
awk 'FNR == 1 { print "== " FILENAME " ==" } { print }' *.txt
4. Records and fields
A record is normally one line. A field is normally one whitespace-separated token. Change RS for records and FS for fields.
| Token | Meaning | Example |
| $0 | Entire current record. | awk '{ print $0 }' file |
| $1, $2 | Field 1, field 2. | awk '{ print $1, $2 }' file |
| $NF | Last field. | awk '{ print $NF }' file |
| $(NF-1) | Next-to-last field. | awk '{ print $(NF-1) }' file |
| NF | Number of fields in current record. | awk 'NF > 3' file |
| NR | Total records read so far. | awk 'NR == 10' file |
| FNR | Record number within current file. | awk 'FNR == 1' *.csv |
| FS | Input field separator. | awk 'BEGIN { FS=":" } { print $1 }' |
| OFS | Output field separator. | awk 'BEGIN { OFS="\t" } { print $1, $2 }' |
| RS | Input record separator. | awk 'BEGIN { RS="" } { print NF }' |
| ORS | Output record separator. | awk 'BEGIN { ORS="\n\n" } { print }' |
Separators
awk -F: '{ print $1 }' /etc/passwd
awk 'BEGIN { FS="[,:]" } { print $1, $2 }' file
awk 'BEGIN { FS="\t"; OFS="," } { print $1, $3 }' table.tsv
Special whitespace mode: The default FS=" " means runs of spaces, tabs, and newlines separate fields, and leading or trailing whitespace is ignored. This is different from FS="[ \t]+" in edge cases.
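One such edge case, sketched with a made-up record that starts with two spaces: the default mode ignores the leading whitespace, while a regex separator treats it as a delimiter that follows an empty first field.

```shell
# Default FS: leading whitespace is ignored.
default_nf=$(printf '  a b\n' | awk '{ print NF }')
# Regex FS: the leading run of spaces separates an empty $1 from "a".
regex_nf=$(printf '  a b\n' | awk -F '[ \t]+' '{ print NF }')
printf 'default=%s regex=%s\n' "$default_nf" "$regex_nf"
# default=2 regex=3
```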
Changing fields rebuilds the record
awk 'BEGIN { OFS="," } { $2 = toupper($2); print }' file
awk '{$1=$1; print}' file
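A small sketch of the rebuild, using made-up input with mixed spaces and a tab. Assigning to any field, even $1 = $1, makes awk rejoin the fields with OFS. Without an assignment, $0 is printed exactly as it was read.

```shell
# Assigning a field forces awk to rebuild $0 using OFS.
rebuilt=$(printf 'a   b\tc\n' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')
# No assignment: the original spacing survives and OFS is not applied.
untouched=$(printf 'a   b\tc\n' | awk 'BEGIN { OFS = "-" } { print }')
printf '%s\n' "$rebuilt"
# a-b-c
```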
5. Patterns and actions
An awk program is a sequence of pattern { action } rules. A missing pattern means every record. A missing action means print.
| Pattern | Meaning | Example |
| BEGIN | Run before input is read. | awk 'BEGIN { print "start" }' |
| END | Run after all input is read. | awk '{ n++ } END { print n }' file |
| /re/ | Records where $0 matches regex. | awk '/ERROR/' log |
| expr | Records where expression is true. | awk '$4 == 404' access.log |
| p1, p2 | Inclusive range from pattern 1 through pattern 2. | awk '/BEGIN/,/END/' file |
| pattern { } | Empty action: do nothing for matching records, which also suppresses the default print. | awk '/^#/ { } !/^#/ { print }' file |
Common filters
awk 'NF' file
awk 'NF == 0' file
awk '$1 == "GET" && $9 >= 500' log
awk 'NR >= 10 && NR <= 20' file
awk 'FNR == NR { ids[$1]; next } $1 in ids' allowlist data
Truth: Empty strings and numeric zero are false. Non-empty strings and non-zero numbers are true.
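One subtlety worth checking by hand: input fields that look numeric are treated as numbers, so a field containing 0 or 0.0 is false even though it is a non-empty string. A minimal sketch with made-up input:

```shell
# $1 and $2 look numeric and equal zero, so they are false; $3 is a plain non-empty string.
truth=$(printf '0 0.0 x\n' | awk '{
  print ($1 ? "true" : "false"), ($2 ? "true" : "false"), ($3 ? "true" : "false")
}')
printf '%s\n' "$truth"
# false false true
```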
6. Regular expressions
| Expression | Meaning | Example |
| $0 ~ /re/ | Record matches regex. | awk '$0 ~ /error/' log |
| $1 !~ /re/ | Field does not match regex. | awk '$1 !~ /^#/' file |
| /re/ | Shortcut for $0 ~ /re/. | awk '/timeout/' log |
| ^, $ | Start and end of string. | awk '$1 ~ /^user[0-9]+$/' file |
| [[:digit:]] | POSIX digit class. | awk '$2 ~ /^[[:digit:]]+$/' file |
| [[:space:]] | POSIX whitespace class. | awk '$0 ~ /[[:space:]]+$/' file |
| (cat|dog) | Alternation and grouping. | awk '$3 ~ /^(cat|dog)$/' file |
Regex functions
awk '{ sub(/old/, "new"); print }' file
awk '{ gsub(/[[:space:]]+/, " "); print }' file
awk 'match($0, /id=[0-9]+/) { print substr($0, RSTART, RLENGTH) }' log
| Function | Use | Notes |
| sub(re, repl, target) | Replace first match. | target defaults to $0. |
| gsub(re, repl, target) | Replace all matches. | Returns replacement count. |
| match(str, re) | Find regex in string. | Sets RSTART and RLENGTH. |
| split(str, arr, re) | Split string into array. | Returns number of parts. |
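The return values and side effects in the table can be checked with a short sketch. The sample strings are made up.

```shell
# gsub returns how many replacements it made and edits $0 in place.
gsub_out=$(printf 'id=42 id=7\n' | awk '{ n = gsub(/id=/, "#"); print n, $0 }')
# match sets RSTART and RLENGTH; skip the 3-character "id=" prefix to get the digits.
match_out=$(printf 'x id=42 y\n' | awk 'match($0, /id=[0-9]+/) {
  print RSTART, RLENGTH, substr($0, RSTART + 3, RLENGTH - 3)
}')
printf '%s\n%s\n' "$gsub_out" "$match_out"
# 2 #42 #7
# 3 5 42
```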
7. Variables and built-ins
Variables spring into existence when used. Uninitialized variables behave like an empty string or numeric zero depending on context.
User variables
awk '{ total += $2; seen = 1 } END { print total }' file
awk -v label="Total" '{ sum += $1 } END { print label, sum }' file
Numeric and string context
awk '$1 + 0 > 10' file
awk '$1 "" == "0012"' file
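A minimal sketch of how the two contexts change a comparison, using the made-up field value 0012. Adding 0 forces a numeric comparison, and concatenating "" forces a string comparison.

```shell
# Numeric: 0012 equals 12. String: "0012" differs from "12" but equals "0012".
cmp_out=$(printf '0012\n' | awk '{
  print ($1 + 0 == 12), (($1 "") == "12"), (($1 "") == "0012")
}')
printf '%s\n' "$cmp_out"
# 1 0 1
```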
Environment
awk 'BEGIN { print ENVIRON["HOME"] }'
awk -v user="$USER" 'BEGIN { print user }'
| Variable | Meaning |
| ARGC, ARGV | Command-line argument count and array. |
| CONVFMT | Number-to-string conversion format, default %.6g. |
| FILENAME | Current input file name. |
| FNR | Record number in current file. |
| FS, OFS | Input and output field separators. |
| NF | Field count for current record. |
| NR | Total input records read. |
| OFMT | Output format for numbers printed with print, default %.6g. |
| ORS, RS | Output and input record separators. |
| RSTART, RLENGTH | Start position and length of the most recent match(). |
| SUBSEP | Separator used internally for multi-index arrays. |
8. Operators
| Operator | Meaning | Example |
| + - * / % ^ | Arithmetic. | awk '{ print ($2 * 100) / $3 }' |
| ++ -- | Increment and decrement. | awk '{ count++ } END { print count }' |
| = += -= *= /= %= ^= | Assignment. | awk '{ sum += $1 }' |
| == != < <= > >= | Comparison. | awk '$3 >= 90' |
| ~ !~ | Regex match and not match. | awk '$1 ~ /^api-/' |
| && || ! | Logical and, or, not. | awk '$1 == "GET" && $9 == 200' |
| expr ? a : b | Ternary expression. Parenthesize it inside print, where a bare > means output redirection. | awk '{ print ($3 > 0 ? "up" : "down") }' |
| in | Array membership. | awk '$1 in seen { print $1 }' |
| space | String concatenation. | awk '{ print $1 ":" $2 }' |
Concatenation has no operator: $1 $2 means join field 1 and field 2 directly. Use explicit separators such as $1 ":" $2 when output must be unambiguous.
9. Arrays
awk arrays are associative maps. Indexes are strings, even when they look numeric.
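A quick sketch of what string indexes mean in practice. A numeric subscript is converted to a string first, so 1 and 1.0 reach the same element as "1", while "01" is a different key.

```shell
# a[1] is stored under the string key "1".
idx_out=$(awk 'BEGIN {
  a[1] = "x"
  print (("1" in a) ? "yes" : "no"), ((1.0 in a) ? "yes" : "no"), (("01" in a) ? "yes" : "no")
}')
printf '%s\n' "$idx_out"
# yes yes no
```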
Counting
awk '{ count[$1]++ }
END { for (key in count) print key, count[key] }' file
Membership
awk 'FNR == NR { keep[$1]; next }
$1 in keep { print }' ids.txt data.txt
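The FNR == NR test is true only while awk is reading the first file, because that is the only time the per-file and global record counters agree. A worked sketch with two made-up files in a temporary directory:

```shell
join_dir=$(mktemp -d)
printf 'u2\nu3\n' > "$join_dir/ids.txt"
printf 'u1 alice\nu2 bob\nu3 carol\n' > "$join_dir/data.txt"
# First file: remember the ids. Second file: print rows whose first field was remembered.
kept=$(awk 'FNR == NR { keep[$1]; next } $1 in keep' "$join_dir/ids.txt" "$join_dir/data.txt")
printf '%s\n' "$kept"
# u2 bob
# u3 carol
rm -r "$join_dir"
```

Note that the idiom misfires if the first file is empty, since FNR == NR then holds for the second file instead.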
Delete
awk '{ seen[$1]++ }
END { delete seen["tmp"]; for (k in seen) print k }' file
Composite keys
awk '{ count[$1, $2]++ }
END {
for (key in count) {
split(key, parts, SUBSEP)
print parts[1], parts[2], count[key]
}
}' file
Sorting output
awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' file | sort
gawk 'BEGIN { PROCINFO["sorted_in"] = "@ind_str_asc" }
{ count[$1]++ } END { for (k in count) print k, count[k] }' file
10. Control flow
| Construct | Example | Notes |
| if | if ($3 > 10) print $1 | Use braces for multiple statements. |
| if / else | { if ($1 == "") print "empty"; else print $1 } | Semicolons separate inline statements. |
| while | i = 1; while (i <= NF) { print $i; i++ } | Condition checked first. Initialize i, or the loop starts at $0. |
| do / while | do { i++ } while (i < 10) | Runs at least once. |
| for | for (i = 1; i <= NF; i++) print $i | Useful for fields. |
| for in | for (k in count) print k, count[k] | Array order is unspecified in POSIX awk. |
| next | NR == 1 { next } | Skip remaining rules for current record. |
| nextfile | /STOP/ { nextfile } | Widely supported (gawk, mawk, BWK awk) but missing from older POSIX editions. |
| exit | NR > 100 { exit } | Still runs END blocks. |
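The exit behavior is easy to miss, so here is a small sketch with made-up input: exit stops reading records, but the END rule still runs.

```shell
# Record 2 triggers exit before it is printed; END still fires.
exit_out=$(printf '1\n2\n3\n' | awk 'NR == 2 { exit } { print } END { print "done" }')
printf '%s\n' "$exit_out"
# 1
# done
```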
Multi-rule flow
awk '
NR == 1 { next }
$3 == "" { missing++; next }
$3 >= 90 { print $1, "pass"; next }
{ print $1, "review" }
END { print "missing:", missing }
' scores.tsv
11. Functions
| Function | Meaning | Example |
| length(str) | String length. | awk 'length($0) > 120' |
| substr(str, start, len) | Substring, 1-based. | awk '{ print substr($1, 1, 8) }' |
| index(str, find) | Position of substring or 0. | awk 'index($0, "TODO")' |
| split(str, arr, sep) | Split string into array. | awk '{ split($1, a, "-"); print a[1] }' |
| sprintf(fmt, ...) | Return formatted string. | awk '{ s = sprintf("%.2f", $1) }' |
| tolower(str), toupper(str) | Change case. | awk '{ print tolower($1) }' |
| int(x) | Truncate toward zero. | awk '{ print int($1 / 60) }' |
| sqrt(x), log(x), exp(x) | Math functions. | awk '{ print sqrt($1) }' |
| sin(x), cos(x), atan2(y, x) | Trigonometry in radians. | awk 'BEGIN { print atan2(0, -1) }' |
| rand(), srand(seed) | Pseudo-random numbers. | awk 'BEGIN { srand(); print rand() }' |
| system(cmd) | Run shell command. Validate or quote any input you interpolate into cmd. | awk '{ system("mkdir -p " $1) }' |
User-defined functions
awk '
function pct(part, whole) {
return whole ? (part * 100 / whole) : 0
}
{ used += $2; total += $3 }
END { printf "%.1f%%\n", pct(used, total) }
' file
12. Input and output
| Feature | Example | Notes |
| print | print $1, $2 | Adds ORS at the end. |
| printf | printf "%-20s %8.2f\n", $1, $2 | No automatic newline. |
| Overwrite file | print $0 > "out.txt" | First write truncates. |
| Append file | print $0 >> "out.txt" | Appends to existing content. |
| Pipe output | print $1 | "sort -u" | Use close() when done. |
| Read line | getline line < "extra.txt" | Returns 1, 0, or -1. |
| Read command output | "date +%F" | getline today | Close the command string after reading. |
| Close stream | close("out.txt") | Important for many dynamic files or pipes. |
Split records into files
awk '{ out = $1 ".txt"; print >> out; close(out) }' file
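A sketch of splitting by key inside a temporary directory, assuming made-up input with keys a and b. It uses append redirection because a file that is closed and then reopened with > is truncated again, which would leave only the last record for each key.

```shell
split_dir=$(mktemp -d)
# Append and close after each record so the number of open files stays small.
printf 'a 1\nb 2\na 3\n' |
  (cd "$split_dir" && awk '{ out = $1 ".txt"; print >> out; close(out) }')
a_count=$(awk 'END { print NR }' "$split_dir/a.txt")
b_count=$(awk 'END { print NR }' "$split_dir/b.txt")
printf 'a.txt=%s b.txt=%s\n' "$a_count" "$b_count"
# a.txt=2 b.txt=1
rm -r "$split_dir"
```

Because >> appends, remove stale output files before rerunning a split like this.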
Read a lookup file manually
awk 'BEGIN {
while ((getline line < "names.txt") > 0) {
split(line, parts, "\t")
name[parts[1]] = parts[2]
}
close("names.txt")
}
{ print $1, name[$1] }' ids.txt
14. CSV and structured data
Simple delimited text
Use -F, only when commas cannot appear inside quoted fields and records cannot contain embedded newlines.
awk -F, 'BEGIN { OFS="," } NR > 1 { print $1, $4 }' file.csv
GNU awk CSV mode
GNU awk 5.3+ has --csv, which handles CSV quoting for input.
gawk --csv 'NR > 1 { print $2 }' file.csv
Structured formats
Use format-aware tools for JSON, YAML, XML, and HTML. Regex and field splitting are usually brittle for nested structured data.
jq -r '.items[] | [.name, .size] | @tsv' data.json | awk '{ sum += $2 } END { print sum }'
CSV warning: awk -F, is not a CSV parser. It breaks on fields like "last, first" unless your awk has CSV support or the input is known to be simple.
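The failure is easy to reproduce. In this sketch, a made-up two-column record with a quoted comma is split into three fields, and the first field keeps its opening quote.

```shell
# Naive comma splitting ignores CSV quoting.
naive_nf=$(printf '"last, first",42\n' | awk -F, '{ print NF }')
naive_f1=$(printf '"last, first",42\n' | awk -F, '{ print $1 }')
printf 'NF=%s first=%s\n' "$naive_nf" "$naive_f1"
# NF=3 first="last
```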
15. Shell integration
| Need | Use | Why |
| Pass shell variable | awk -v name="$name" '$1 == name' | Avoids quote injection and broken spaces. |
| Use tabs | awk -F '\t' 'BEGIN { OFS="\t" } ...' | The shell's single quotes pass \t through unchanged; awk itself expands it to a tab. |
| Exit with failure | awk 'bad { exit 1 }' | Use in scripts and CI checks. |
| Combine with sort | awk '{ count[$1]++ } END { for (k in count) print count[k], k }' file | sort -nr | awk computes; sort orders. |
| Read stdin | cmd | awk '{ print $1 }' | No file argument means standard input. |
| Mix stdin and files | awk '...' - file.txt | - means standard input in most awks. |
Safe shell variable pattern
needle='a value with spaces'
awk -v needle="$needle" '$0 == needle { print NR }' file
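One caveat with -v is that awk processes backslash escapes in the assigned value, so a literal backslash sequence can change. The sketch below uses a made-up Windows-style path and the hypothetical environment variable name S. Passing the value through ENVIRON keeps it byte for byte.

```shell
val='C:\temp'
# -v expands \t to a tab, so the value no longer matches the original.
via_v=$(awk -v s="$val" 'BEGIN { print s }')
# ENVIRON hands the value over without escape processing.
via_env=$(S="$val" awk 'BEGIN { print ENVIRON["S"] }')
printf '%s\n' "$via_env"
# C:\temp
```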
Shell function wrapper
top_statuses() {
awk '{ count[$9]++ } END { for (s in count) print count[s], s }' "$@" | sort -nr
}
16. Script files
Move longer awk programs into .awk files when quoting, indentation, or reuse starts to matter.
Run with -f
awk -f report.awk data.tsv
awk -v min=100 -f filter.awk data.tsv
Executable script
#!/usr/bin/awk -f
BEGIN { FS = "\t"; OFS = "\t" }
NR > 1 { print $1, $3 }
Readable report script
BEGIN {
FS = "\t"
OFS = "\t"
}
NR == 1 {
next
}
{
rows++
total += $3
by_team[$2] += $3
}
END {
print "rows", rows
print "total", total
for (team in by_team) {
print team, by_team[team]
}
}
17. Recipes
| Task | Command |
| Number lines | awk '{ print NR, $0 }' file |
| Print line length | awk '{ print length, $0 }' file |
| Unique lines, keep first occurrence | awk '!seen[$0]++' file |
| Print each duplicated line once | awk 'seen[$0]++ == 1' file |
| Print first N lines | awk 'NR <= 10' file |
| Print last field | awk '{ print $NF }' file |
| Print records with at least 4 fields | awk 'NF >= 4' file |
| Trim leading and trailing whitespace | awk '{ gsub(/^[[:space:]]+|[[:space:]]+$/, ""); print }' file |
| Convert whitespace to CSV-ish output | awk 'BEGIN { OFS="," } { print $1, $2, $3 }' file |
| Average column | awk '{ sum += $2; n++ } END { if (n) print sum / n }' file |
| Min and max column | awk 'NR == 1 { min = max = $2 } { if ($2 < min) min = $2; if ($2 > max) max = $2 } END { print min, max }' file |
| Group sum | awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' file |
| Left join two files by first field | awk 'FNR == NR { name[$1] = $2; next } { print $0, name[$1] }' names.tsv data.tsv |
| Compare two files, print keys only in second | awk 'FNR == NR { seen[$1]; next } !($1 in seen)' first second |
| Extract text between markers | awk '/BEGIN/,/END/' file |
| Remove text between markers | awk '/BEGIN/{skip=1} !skip {print} /END/{skip=0}' file |
| Print paragraph records | awk 'BEGIN { RS=""; ORS="\n\n" } /needle/' file |
| Show top counts | awk '{ count[$1]++ } END { for (k in count) print count[k], k }' file | sort -nr | head |
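The two seen[] recipes differ only in when the post-increment counter is tested. A sketch on made-up input shows the difference:

```shell
# !seen[$0]++ is true the first time a line appears.
uniq_out=$(printf 'a\nb\na\na\n' | awk '!seen[$0]++')
# seen[$0]++ == 1 is true only on the second appearance, so each repeated line prints once.
dup_out=$(printf 'a\nb\na\na\n' | awk 'seen[$0]++ == 1')
printf 'unique:\n%s\nduplicated:\n%s\n' "$uniq_out" "$dup_out"
# unique:
# a
# b
# duplicated:
# a
```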
Access log summary
awk '{
status[$9]++
bytes += $10
}
END {
for (s in status) print s, status[s]
print "bytes", bytes
}' access.log
Find slow requests
awk -v threshold=1.0 '$NF > threshold { print $1, $7, $NF }' access.log
18. Portability and gotchas
| Topic | Guidance |
| POSIX awk | Use POSIX features when scripts must run everywhere: fields, patterns, actions, associative arrays, standard string/math functions, and simple regexes. |
| GNU awk features | gensub(), asort(), asorti(), PROCINFO, IGNORECASE, ARGIND, and networking are GNU awk extensions. --csv needs GNU awk 5.3+. nextfile is more widely available but still missing from some older awks. |
| macOS awk | macOS ships a BSD-derived awk. Install GNU awk as gawk when you need GNU-only behavior. |
| In-place editing | awk is not primarily an in-place editor. Use a temp file and move it into place, or GNU awk's -i inplace extension when acceptable. |
| Array order | for (k in array) order is unspecified unless using GNU awk sorting controls or an external sort. |
| Floating point | Numbers are floating point. Avoid awk for exact decimal money math unless rounding rules are simple and acceptable. |
| String vs number comparison | awk chooses numeric or string comparison from operand types. Force numeric with +0 and string with concatenation to "". |
| Locale | Character classes, sorting, and case conversion can depend on locale. Set LC_ALL=C for byte-oriented, reproducible command-line processing. |
| Quoting | Prefer single quotes around programs and -v for shell data. Avoid building awk source by concatenating untrusted shell strings. |
Practical default: Write one-liners with POSIX awk syntax unless you control the runtime. Use gawk explicitly in scripts that rely on GNU awk extensions.