Text processing reference

awk Cheat Sheet

A practical awk reference for filtering rows, extracting columns, computing summaries, rewriting delimited text, formatting reports, and writing portable one-liners and scripts.

POSIX-aware, with GNU awk notes. 18 sections.

1. Mental model

What awk does

awk reads input record by record, splits each record into fields, tests patterns, runs actions, and prints or computes results.

When awk is a good fit

Use awk for row and column processing, numeric summaries, simple reports, associative arrays, joins, grouping, and log analysis in shell pipelines.

When to switch tools

Use sed for simple stream substitutions, jq for JSON, xq/xmlstarlet for XML, yq for YAML, and a real CSV parser for complex quoted CSV unless your awk supports CSV mode.

Default cycle: read a record, set $0, split fields into $1 through $NF, update counters, run matching actions, repeat. If a pattern has no action, awk prints the matching record.
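
A minimal sketch of the default cycle, using made-up three-line input instead of a real file. The first command shows NR, NF, and $NF changing from record to record; the second has a pattern but no action, so matching records are printed unchanged. The variable names cycle and matched are only for illustration.

```shell
# The rule has no pattern, so it runs once for every record.
cycle=$(printf 'a b c\nd e\nf\n' | awk '{ print NR ": NF=" NF ", last=" $NF }')
printf '%s\n' "$cycle"

# Pattern without an action: awk prints each record where NF >= 2.
matched=$(printf 'a b c\nd e\nf\n' | awk 'NF >= 2')
printf '%s\n' "$matched"
```

The first command prints one summary line per record; the second prints only the two records that have at least two fields.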

2. Quick start

Task                    Command   Notes
Print whole file        awk '{ print }' file   Same as awk '{ print $0 }'.
Print first column      awk '{ print $1 }' file   Fields are 1-based.
Print columns 1 and 3   awk '{ print $1, $3 }' file   Comma uses OFS, a space by default.
Use comma delimiter     awk -F, '{ print $2 }' file.csv   Good only for simple comma-separated text.
Filter matching rows    awk '/error/' app.log   No action means print matching records.
Filter by column        awk '$3 >= 500 { print $1, $3 }' access.log   Numeric comparison when operands look numeric.
Skip header             awk 'NR > 1 { print }' table.tsv   NR is the global record number.
Sum a column            awk '{ sum += $2 } END { print sum }' file   END runs after all input.
Count rows by value     awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' file   Arrays are associative.
Set output delimiter    awk 'BEGIN { OFS="," } { print $1, $2 }' file   print a, b inserts OFS.

Common shape

awk 'pattern { action }' file
awk 'BEGIN { setup } pattern { action } END { summary }' file

Pipeline shape

ps aux | awk '$3 > 20 { print $2, $3, $11 }'
git log --oneline | awk '{ print NR ": " $0 }'

3. Invocation and options

Option          Use                                         Example
-F fs           Set input field separator.                  awk -F: '{ print $1 }' /etc/passwd
-v name=value   Pass a variable before processing starts.   awk -v min=10 '$2 > min' file
-f script.awk   Read awk program from a file.               awk -f report.awk data.tsv
--              Stop option parsing in many awk versions.   awk -f script.awk -- -weird-file
--posix         GNU awk: use stricter POSIX behavior.       gawk --posix -f script.awk file
--lint          GNU awk: warn about questionable code.      gawk --lint -f script.awk file
--csv           GNU awk 5.3+: parse CSV input.              gawk --csv '{ print $2 }' file.csv

Quoting: Put awk programs in single quotes in the shell. Use -v to pass shell values instead of interpolating them into the program text.

Passing variables

min=100
awk -v min="$min" '$3 >= min { print $1, $3 }' file

Multiple input files

awk '{ print FILENAME, FNR, $0 }' *.log
awk 'FNR == 1 { print "== " FILENAME " ==" } { print }' *.txt

4. Records and fields

A record is normally one line. A field is normally one whitespace-separated token. Change RS for records and FS for fields.

Token     Meaning                               Example
$0        Entire current record.                awk '{ print $0 }' file
$1, $2    Field 1, field 2.                     awk '{ print $1, $2 }' file
$NF       Last field.                           awk '{ print $NF }' file
$(NF-1)   Next-to-last field.                   awk '{ print $(NF-1) }' file
NF        Number of fields in current record.   awk 'NF > 3' file
NR        Total records read so far.            awk 'NR == 10' file
FNR       Record number within current file.    awk 'FNR == 1' *.csv
FS        Input field separator.                awk 'BEGIN { FS=":" } { print $1 }'
OFS       Output field separator.               awk 'BEGIN { OFS="\t" } { print $1, $2 }'
RS        Input record separator.               awk 'BEGIN { RS="" } { print NF }'
ORS       Output record separator.              awk 'BEGIN { ORS="\n\n" } { print }'

Separators

awk -F: '{ print $1 }' /etc/passwd
awk 'BEGIN { FS="[,:]" } { print $1, $2 }' file
awk 'BEGIN { FS="\t"; OFS="," } { print $1, $3 }' table.tsv

Special whitespace mode: The default FS=" " means runs of spaces, tabs, and newlines separate fields, and leading or trailing whitespace is ignored. FS="[ \t]+" behaves differently: leading whitespace counts as a separator, so $1 is empty on indented lines.
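
A small sketch of that difference, using an invented indented line. The same input is split once with the default FS and once with a regex FS; the variable names are only for illustration.

```shell
line='  alpha  beta'

# Default FS: leading whitespace is ignored, so there are two fields.
default_fs=$(printf '%s\n' "$line" | awk '{ print NF ":" $1 }')

# Regex FS: the leading run of spaces is a separator, so $1 is empty.
regex_fs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[ \t]+" } { print NF ":" $1 }')

printf '%s\n%s\n' "$default_fs" "$regex_fs"
```

With the default separator the line has two fields; with the regex separator it has three, the first of them empty.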

Changing fields rebuilds the record

awk 'BEGIN { OFS="," } { $2 = toupper($2); print }' file
awk '{$1=$1; print}' file
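
A hedged illustration of the rebuild, on made-up input with mixed spaces and a tab. Assigning to any field, even $1 = $1, makes awk rejoin the fields with OFS; without an assignment, $0 is printed exactly as it was read.

```shell
# Assigning to a field rebuilds $0 with OFS between fields.
rebuilt=$(printf 'a   b\tc\n' | awk 'BEGIN { OFS="-" } { $1 = $1; print }')

# No field assignment: OFS is set but $0 is left untouched.
untouched=$(printf 'a   b\tc\n' | awk 'BEGIN { OFS="-" } { print }')

printf '%s\n%s\n' "$rebuilt" "$untouched"
```

This is why the $1=$1 idiom is often used to normalize whitespace or to apply a new OFS to every record.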

5. Patterns and actions

An awk program is a sequence of pattern { action } rules. A missing pattern means every record. A missing action means print.

Pattern       Meaning                                             Example
BEGIN         Run before input is read.                           awk 'BEGIN { print "start" }'
END           Run after all input is read.                        awk '{ n++ } END { print n }' file
/re/          Records where $0 matches regex.                     awk '/ERROR/' log
expr          Records where expression is true.                   awk '$4 == 404' access.log
p1, p2        Inclusive range from pattern 1 through pattern 2.   awk '/BEGIN/,/END/' file
pattern { }   Empty action: do nothing for matching records.      awk 'NR == 1 { } NR > 1' file

Common filters

awk 'NF' file
awk 'NF == 0' file
awk '$1 == "GET" && $9 >= 500' log
awk 'NR >= 10 && NR <= 20' file
awk 'FNR == NR { ids[$1]; next } $1 in ids' allowlist data
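
Truth: Empty strings and numeric zero are false. Non-empty strings and non-zero numbers are true. A field that looks numeric, such as 0, is tested as a number; a string constant such as "0" is tested as a string and is therefore true.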
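
A short sketch of these truth rules on invented input. The first command uses a bare field as the pattern; the second tests a string constant. The variable names are only for illustration.

```shell
# Records: "0", an empty line, "abc", "1". Only the last two are true.
truthy=$(printf '0\n\nabc\n1\n' | awk '$1 { print NR ": " $1 }')
printf '%s\n' "$truthy"

# A string constant is judged as a string: non-empty means true.
const=$(awk 'BEGIN { if ("0") print "true"; else print "false" }')
printf '%s\n' "$const"
```

The same rule is what makes awk 'NF' file drop blank lines: NF is numeric zero for an empty record.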

6. Regular expressions

Expression    Meaning                       Example
$0 ~ /re/     Record matches regex.         awk '$0 ~ /error/' log
$1 !~ /re/    Field does not match regex.   awk '$1 !~ /^#/' file
/re/          Shortcut for $0 ~ /re/.       awk '/timeout/' log
^, $          Start and end of string.      awk '$1 ~ /^user[0-9]+$/' file
[[:digit:]]   POSIX digit class.            awk '$2 ~ /^[[:digit:]]+$/' file
[[:space:]]   POSIX whitespace class.       awk '$0 ~ /[[:space:]]+$/' file
(cat|dog)     Alternation and grouping.     awk '$3 ~ /^(cat|dog)$/' file

Regex functions

awk '{ sub(/old/, "new"); print }' file
awk '{ gsub(/[[:space:]]+/, " "); print }' file
awk 'match($0, /id=[0-9]+/) { print substr($0, RSTART, RLENGTH) }' log

Function                 Use                        Notes
sub(re, repl, target)    Replace first match.       target defaults to $0.
gsub(re, repl, target)   Replace all matches.       Returns replacement count.
match(str, re)           Find regex in string.      Sets RSTART and RLENGTH.
split(str, arr, re)      Split string into array.   Returns number of parts.
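
A brief sketch of the return values and of & in a replacement string, using invented input. gsub() returns the number of replacements, & stands for the matched text, and split() returns the number of pieces.

```shell
# gsub returns how many replacements it made.
counted=$(printf 'a-b-c\n' | awk '{ n = gsub(/-/, "+"); print n, $0 }')

# & in the replacement is the text that matched.
wrapped=$(printf 'id=42 ok\n' | awk '{ sub(/[0-9]+/, "<&>"); print }')

# split returns the number of parts and fills the array from index 1.
parts=$(printf '2024-01-15\n' | awk '{ n = split($0, d, "-"); print n, d[1], d[3] }')

printf '%s\n%s\n%s\n' "$counted" "$wrapped" "$parts"
```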

7. Variables and built-ins

Variables spring into existence when used. Uninitialized variables behave like an empty string or numeric zero depending on context.

User variables

awk '{ total += $2; seen = 1 } END { print total }' file
awk -v label="Total" '{ sum += $1 } END { print label, sum }' file

Numeric and string context

awk '$1 + 0 > 10' file
awk '$1 "" == "0012"' file
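
A hedged demonstration of the two contexts on made-up values. Adding 0 forces a numeric comparison, concatenating "" forces a string comparison, and two string constants always compare as text.

```shell
# Numeric: 0012 is treated as 12, which is greater than 10.
numeric=$(printf '0012\n9\n' | awk '$1 + 0 > 10')

# String: only the exact text 0012 matches, not 12.
string=$(printf '12\n0012\n' | awk '$1 "" == "0012"')

# String constants compare character by character; numbers compare by value.
consts=$(awk 'BEGIN { s = ("9" < "10"); n = (9 < 10); print s, n }')

printf '%s\n%s\n%s\n' "$numeric" "$string" "$consts"
```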

Environment

awk 'BEGIN { print ENVIRON["HOME"] }'
awk -v user="$USER" 'BEGIN { print user }'

Variable          Meaning
ARGC, ARGV        Command-line argument count and array.
CONVFMT           Number-to-string conversion format, default %.6g.
FILENAME          Current input file name.
FNR               Record number in current file.
FS, OFS           Input and output field separators.
NF                Field count for current record.
NR                Total input records read.
OFMT              Output format for numbers printed with print, default %.6g.
ORS, RS           Output and input record separators.
RLENGTH, RSTART   Length and start position of the match found by match().
SUBSEP            Separator used internally for multi-index arrays.

8. Operators

Operator           Meaning                      Example
+ - * / % ^        Arithmetic.                  awk '{ print ($2 * 100) / $3 }'
++ --              Increment and decrement.     awk '{ count++ } END { print count }'
= += -= *= /= %=   Assignment.                  awk '{ sum += $1 }'
== != < <= > >=    Comparison.                  awk '$3 >= 90'
~ !~               Regex match and not match.   awk '$1 ~ /^api-/'
&& || !            Logical and, or, not.        awk '$1 == "GET" && $9 == 200'
expr ? a : b       Ternary expression.          awk '{ print ($3 > 0 ? "up" : "down") }'
in                 Array membership.            awk '$1 in seen { print $1 }'
(space)            String concatenation.        awk '{ print $1 ":" $2 }'

Redirection pitfall: Inside print, an unparenthesized > is output redirection, not a comparison. Wrap comparisons in parentheses, as in the ternary example.

Concatenation has no operator: $1 $2 means join field 1 and field 2 directly. Use explicit separators such as $1 ":" $2 when output must be unambiguous.
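
A small sketch of concatenation on invented input. The first command joins two fields with and without a separator; the second shows that arithmetic binds more tightly than concatenation, so 2 + 3 is evaluated before the pieces are joined.

```shell
# Adjacent expressions are joined with nothing in between.
joined=$(printf 'ab cd\n' | awk '{ print $1 $2; print $1 ":" $2 }')

# Addition happens first, then 1, " ", and 5 are concatenated.
prec=$(awk 'BEGIN { print 1 " " 2 + 3 }')

printf '%s\n%s\n' "$joined" "$prec"
```

When in doubt, parenthesize the arithmetic part so the intent is obvious to readers.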

9. Arrays

awk arrays are associative maps. Indexes are strings, even when they look numeric.
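
A minimal sketch of string indexes. The number 1 and the string "1" address the same element, while "01" is a different key because the text differs.

```shell
keys=$(awk 'BEGIN {
  a[1] = "number index"
  a["1"] = "string index"   # same element as a[1], so this overwrites it
  a["01"] = "padded"        # different text, so a different element
  n = 0
  for (k in a) n++
  print n, a[1]
}')
printf '%s\n' "$keys"
```

This matters when keys come from input with inconsistent zero padding: normalize them (for example with key + 0) before indexing if they should collide.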

Counting

awk '{ count[$1]++ }
END { for (key in count) print key, count[key] }' file

Membership

awk 'FNR == NR { keep[$1]; next }
$1 in keep { print }' ids.txt data.txt

Delete

awk '{ seen[$1]++ }
END { delete seen["tmp"]; for (k in seen) print k }' file

Composite keys

awk '{ count[$1, $2]++ }
END {
  for (key in count) {
    split(key, parts, SUBSEP)
    print parts[1], parts[2], count[key]
  }
}' file

Sorting output

awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' file | sort
gawk 'BEGIN { PROCINFO["sorted_in"] = "@ind_str_asc" }
{ count[$1]++ } END { for (k in count) print k, count[k] }' file

10. Control flow

Construct    Example                                          Notes
if           if ($3 > 10) print $1                            Use braces for multiple statements.
if / else    { if ($1 == "") print "empty"; else print $1 }   Semicolons separate inline statements.
while        while (i <= NF) { print $i; i++ }                Condition checked first.
do / while   do { i++ } while (i < 10)                        Runs at least once.
for          for (i = 1; i <= NF; i++) print $i               Useful for fields.
for in       for (k in count) print k, count[k]               Array order is unspecified in POSIX awk.
next         NR == 1 { next }                                 Skip remaining rules for current record.
nextfile     /STOP/ { nextfile }                              GNU awk and most modern awks; missing from older POSIX editions.
exit         NR > 100 { exit }                                Stops reading input; END blocks still run.
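
A hedged sketch of exit on made-up input: the main rule stops at the second record, the END block still runs, and the exit status given to exit is what the shell sees. The status value 3 is arbitrary.

```shell
status=0
out=$(printf '1\n2\n3\n' |
  awk 'NR == 2 { exit 3 } { seen++ } END { print "seen", seen }') || status=$?

printf '%s (status %s)\n' "$out" "$status"
```

Only the first record is counted, because exit fires before the counting rule on record 2.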

Multi-rule flow

awk '
NR == 1 { next }
$3 == "" { missing++; next }
$3 >= 90 { print $1, "pass"; next }
{ print $1, "review" }
END { print "missing:", missing }
' scores.tsv

11. Functions

Function                      Meaning                       Example
length(str)                   String length.                awk 'length($0) > 120'
substr(str, start, len)       Substring, 1-based.           awk '{ print substr($1, 1, 8) }'
index(str, find)              Position of substring or 0.   awk 'index($0, "TODO")'
split(str, arr, sep)          Split string into array.      awk '{ split($1, a, "-"); print a[1] }'
sprintf(fmt, ...)             Return formatted string.      awk '{ s = sprintf("%.2f", $1) }'
tolower(str), toupper(str)    Change case.                  awk '{ print tolower($1) }'
int(x)                        Truncate toward zero.         awk '{ print int($1 / 60) }'
sqrt(x), log(x), exp(x)       Math functions.               awk '{ print sqrt($1) }'
sin(x), cos(x), atan2(y, x)   Trigonometry in radians.      awk 'BEGIN { print atan2(0, -1) }'
rand(), srand(seed)           Pseudo-random numbers.        awk 'BEGIN { srand(); print rand() }'
system(cmd)                   Run shell command.            awk '{ system("mkdir -p " $1) }'

User-defined functions

awk '
function pct(part, whole) {
  return whole ? (part * 100 / whole) : 0
}
{ used += $2; total += $3 }
END { printf "%.1f%%\n", pct(used, total) }
' file
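
awk has no local-variable declaration; the usual convention is to list extra parameters that callers never pass. The sketch below assumes a hypothetical helper named max_field and invented numeric input, and shows that the function's i does not disturb a global i.

```shell
maxes=$(printf '3 9 4\n10 2 7\n' | awk '
# i and m are locals: extra parameters the caller never supplies.
function max_field(    i, m) {
  m = $1 + 0
  for (i = 2; i <= NF; i++)
    if ($i + 0 > m) m = $i + 0
  return m
}
BEGIN { i = "outer" }
{ print max_field() }
END { print "i is still", i }
')
printf '%s\n' "$maxes"
```

The extra spaces before the local names in the parameter list are a readability convention only; awk does not require them.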

12. Input and output

Feature               Example                           Notes
print                 print $1, $2                      Adds ORS at the end.
printf                printf "%-20s %8.2f\n", $1, $2    No automatic newline.
Overwrite file        print $0 > "out.txt"              First write truncates.
Append file           print $0 >> "out.txt"             Appends to existing content.
Pipe output           print $1 | "sort -u"              Use close() when done.
Read line             getline line < "extra.txt"        Returns 1, 0, or -1.
Read command output   "date +%F" | getline today        Close the command string after reading.
Close stream          close("out.txt")                  Important for many dynamic files or pipes.

Split records into files

# Append with >>: reopening a closed file with > would truncate it on every record.
awk '{ out = $1 ".txt"; print >> out; close(out) }' file

Read a lookup file manually

awk 'BEGIN {
  while ((getline line < "names.txt") > 0) {
    split(line, parts, "\t")
    name[parts[1]] = parts[2]
  }
  close("names.txt")
}
{ print $1, name[$1] }' ids.txt
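
A hedged sketch of the getline return values listed in the table above. It reads one line from a command (echo stands in for any real command), closes the command, and then shows the -1 result for a file that cannot be opened; the path is deliberately nonexistent.

```shell
got=$(awk 'BEGIN {
  cmd = "echo hello"
  # Parenthesize the getline expression before comparing its result.
  if ((cmd | getline line) > 0) print "got", line
  close(cmd)

  # A file that cannot be opened makes getline return -1.
  rc = (getline line < "/nonexistent/awk-demo-file")
  print "rc", rc
}')
printf '%s\n' "$got"
```

Testing for > 0 in loops, rather than plain truth, avoids an endless loop when getline returns -1.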

13. Formatting output

Use print for simple separated fields and printf for aligned reports, fixed decimals, zero padding, and custom layouts.

Format   Meaning                         Example
%s       String.                         printf "%s\n", $1
%d       Integer.                        printf "%04d\n", NR
%f       Floating point.                 printf "%.2f\n", $2
%e, %g   Scientific or compact number.   printf "%g\n", $2
%-10s    Left-align in width 10.         printf "%-10s %8d\n", $1, $2
%8.2f    Width 8, 2 decimals.            printf "%8.2f\n", total
%%       Literal percent sign.           printf "%.1f%%\n", pct

Report layout

awk '
BEGIN { printf "%-20s %10s\n", "name", "bytes" }
{ total += $2; printf "%-20s %10d\n", $1, $2 }
END { printf "%-20s %10d\n", "TOTAL", total }
' sizes.tsv

14. CSV and structured data

Simple delimited text

Use -F, only when commas cannot appear inside quoted fields and records cannot contain embedded newlines.

awk -F, 'BEGIN { OFS="," } NR > 1 { print $1, $4 }' file.csv

GNU awk CSV mode

GNU awk 5.3+ has --csv, which handles CSV quoting for input.

gawk --csv 'NR > 1 { print $2 }' file.csv

Structured formats

Use format-aware tools for JSON, YAML, XML, and HTML. Regex and field splitting are usually brittle for nested structured data.

jq -r '.items[] | [.name, .size] | @tsv' data.json | awk '{ sum += $2 } END { print sum }'

CSV warning: awk -F, is not a CSV parser. It breaks on fields like "last, first" unless your awk has CSV support or the input is known to be simple.
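
A small demonstration of that failure on one invented record with a quoted comma. The record has two logical CSV fields, but -F, sees three and cuts the quoted name in half.

```shell
# Two CSV fields, but the comma inside the quotes is also treated as a separator.
naive=$(printf '"Doe, Jane",42\n' | awk -F, '{ print NF; print $2 }')
printf '%s\n' "$naive"
```

The second field comes back as the tail of the name, including the closing quote, instead of 42.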

15. Shell integration

Need                  Use   Why
Pass shell variable   awk -v name="$name" '$1 == name'   Avoids quote injection and broken spaces.
Use tabs              awk -F '\t' 'BEGIN { OFS="\t" } ...'   awk itself interprets \t as a tab in -F and in string literals.
Exit with failure     awk 'bad { exit 1 }'   Use in scripts and CI checks.
Combine with sort     awk '{ count[$1]++ } END { for (k in count) print count[k], k }' file | sort -nr   awk computes; sort orders.
Read stdin            cmd | awk '{ print $1 }'   No file argument means standard input.
Mix stdin and files   awk '...' - file.txt   - means standard input in most awks.

Safe shell variable pattern

needle='a value with spaces'
awk -v needle="$needle" '$0 == needle { print NR }' file

Shell function wrapper

top_statuses() {
  awk '{ count[$9]++ } END { for (s in count) print count[s], s }' "$@" | sort -nr
}
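
A hedged sketch of using awk's exit status from the shell. check_fields is a hypothetical helper, not a standard command: it succeeds only when every input line has the requested number of fields, which makes it usable directly in if or in CI checks.

```shell
# Hypothetical helper: exit 0 if all lines have exactly $1 fields, else 1.
check_fields() {
  awk -v want="$1" 'NF != want { bad = 1; exit } END { exit bad + 0 }'
}

if printf 'a b\nc d\n' | check_fields 2; then ok_result=pass; else ok_result=fail; fi
if printf 'a b\nc\n'   | check_fields 2; then bad_result=pass; else bad_result=fail; fi

printf '%s %s\n' "$ok_result" "$bad_result"
```

Setting a flag and exiting from END keeps the status logic in one place, since END runs even after an early exit.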

16. Script files

Move longer awk programs into .awk files when quoting, indentation, or reuse starts to matter.

Run with -f

awk -f report.awk data.tsv
awk -v min=100 -f filter.awk data.tsv

Executable script

#!/usr/bin/awk -f
BEGIN { FS = "\t"; OFS = "\t" }
NR > 1 { print $1, $3 }

Readable report script

BEGIN {
  FS = "\t"
  OFS = "\t"
}

NR == 1 {
  next
}

{
  rows++
  total += $3
  by_team[$2] += $3
}

END {
  print "rows", rows
  print "total", total
  for (team in by_team) {
    print team, by_team[team]
  }
}

17. Recipes

Task                                           Command
Number lines                                   awk '{ print NR, $0 }' file
Print line length                              awk '{ print length, $0 }' file
Unique lines, keep first occurrence            awk '!seen[$0]++' file
Duplicate lines only                           awk 'seen[$0]++ == 1' file
Print first N lines                            awk 'NR <= 10' file
Print last field                               awk '{ print $NF }' file
Print records with at least 4 fields           awk 'NF >= 4' file
Trim leading and trailing whitespace           awk '{ gsub(/^[[:space:]]+|[[:space:]]+$/, ""); print }' file
Convert whitespace to CSV-ish output           awk 'BEGIN { OFS="," } { print $1, $2, $3 }' file
Average column                                 awk '{ sum += $2; n++ } END { if (n) print sum / n }' file
Min and max column                             awk 'NR == 1 { min = max = $2 } { if ($2 < min) min = $2; if ($2 > max) max = $2 } END { print min, max }' file
Group sum                                      awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' file
Left join two files by first field             awk 'FNR == NR { name[$1] = $2; next } { print $0, name[$1] }' names.tsv data.tsv
Compare two files, print keys only in second   awk 'FNR == NR { seen[$1]; next } !($1 in seen)' first second
Extract text between markers                   awk '/BEGIN/,/END/' file
Remove text between markers                    awk '/BEGIN/{skip=1} !skip {print} /END/{skip=0}' file
Print paragraph records                        awk 'BEGIN { RS=""; ORS="\n\n" } /needle/' file
Show top counts                                awk '{ count[$1]++ } END { for (k in count) print count[k], k }' file | sort -nr | head
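
A short walk-through of the two seen[$0]++ recipes on invented input. The post-increment returns the old count, so !seen[$0]++ is true only the first time a line appears, and seen[$0]++ == 1 is true only the second time.

```shell
# First occurrence only: the old count is 0, and !0 is true.
unique=$(printf 'a\nb\na\nc\nb\na\n' | awk '!seen[$0]++')

# Second occurrence only: each repeated line is reported once.
dupes=$(printf 'a\nb\na\nc\nb\na\n' | awk 'seen[$0]++ == 1')

printf '%s\n--\n%s\n' "$unique" "$dupes"
```

Both recipes keep input order and hold one array entry per distinct line, so memory grows with the number of unique lines.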

Access log summary

awk '{
  status[$9]++
  bytes += $10
}
END {
  for (s in status) print s, status[s]
  print "bytes", bytes
}' access.log

Find slow requests

awk -v threshold=1.0 '$NF > threshold { print $1, $7, $NF }' access.log

18. Portability and gotchas

Topic                         Guidance
POSIX awk                     Use POSIX features when scripts must run everywhere: fields, patterns, actions, associative arrays, standard string/math functions, and simple regexes.
GNU awk features              gensub(), asort(), asorti(), PROCINFO, IGNORECASE, ARGIND, networking, and --csv are GNU awk extensions. nextfile also works in several other awks but is missing from older POSIX editions.
macOS awk                     macOS ships a BSD-derived awk. Install GNU awk as gawk when you need GNU-only behavior.
In-place editing              awk is not primarily an in-place editor. Use a temp file and move it into place, or GNU awk's -i inplace extension when acceptable.
Array order                   for (k in array) order is unspecified unless using GNU awk sorting controls or an external sort.
Floating point                Numbers are floating point. Avoid awk for exact decimal money math unless rounding rules are simple and acceptable.
String vs number comparison   awk chooses numeric or string comparison from operand types. Force numeric with +0 and string with concatenation to "".
Locale                        Character classes, sorting, and case conversion can depend on locale. Set LC_ALL=C for byte-oriented, reproducible command-line processing.
Quoting                       Prefer single quotes around programs and -v for shell data. Avoid building awk source by concatenating untrusted shell strings.

Practical default: Write one-liners with POSIX awk syntax unless you control the runtime. Use gawk explicitly in scripts that rely on GNU awk extensions.
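
A hedged illustration of the floating-point guidance above. The first command shows that 0.1 + 0.2 is not exactly 0.3 in binary floating point; the second is one possible workaround, assuming amounts with two decimals, that sums whole cents and formats the result at the end.

```shell
# Binary floating point: the comparison is false, so this prints 0.
equal=$(awk 'BEGIN { eq = (0.1 + 0.2 == 0.3); print eq }')

# Convert each amount to whole cents (rounded), sum, then format.
total=$(printf '0.10\n0.20\n' |
  awk '{ cents += int($1 * 100 + 0.5) }
       END { printf "%d.%02d\n", int(cents / 100), cents % 100 }')

printf '%s %s\n' "$equal" "$total"
```

The cents approach only suits non-negative amounts with a fixed number of decimals; anything with more involved rounding rules is better handled by a tool with decimal arithmetic.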