The first version of dup prints each line that appears more than once in the standard input, preceded by its count. This program introduces the if statement, the map data type, and the bufio package.

    // Dup1 prints the text of each line that appears more than once in
    // the standard input, preceded by its count.
    package main

    import (
        "bufio"
        "fmt"
        "os"
    )

    func main() {
        counts := make(map[string]int)
        input := bufio.NewScanner(os.Stdin)
        for input.Scan() {
            counts[input.Text()]++
        }
        // NOTE: ignoring potential errors from input.Err()
        for line, n := range counts {
            if n > 1 {
                fmt.Printf("%d\t%s\n", n, line)
            }
        }
    }

if
As with for, parentheses are never used around the condition in an if statement, but braces are required for the body. There can be an optional else part that is executed if the condition is false.
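
As a small illustration of this syntax, here is a minimal sketch. The helper classify is hypothetical and not part of dup1; it only shows an if with an else part, written without parentheses around the condition.

```go
package main

import "fmt"

// classify is a hypothetical helper (not part of dup1) that labels a
// line count as "duplicate" or "unique".
func classify(n int) string {
	var label string
	// No parentheses around the condition; the braces are mandatory.
	if n > 1 {
		label = "duplicate"
	} else {
		// Executed only when the condition n > 1 is false.
		label = "unique"
	}
	return label
}

func main() {
	fmt.Println(classify(1)) // prints "unique"
	fmt.Println(classify(3)) // prints "duplicate"
}
```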
map Data Type
A map holds a set of key/value pairs and provides constant-time operations to store, retrieve, or test for an item in the set. The key may be of any type whose values can be compared with ‘==’, strings being the most common example; the value may be of any type at all. In this example, the keys are ‘strings’ and the values are ‘ints’.
make
The built-in function make creates a new empty ‘map’; it has other uses too. Maps are discussed at length in Section 4.3.
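
The three map operations mentioned above (store, retrieve, test) can be sketched as follows. The map ages and the helper lookup are illustrative names, not part of dup1, and the two-result form of indexing used for the membership test is covered properly in the later discussion of maps.

```go
package main

import "fmt"

// lookup is a hypothetical helper that returns the value stored under
// key and reports whether the key is present in the map at all.
func lookup(m map[string]int, key string) (int, bool) {
	// The two-result form distinguishes a missing key from a stored 0.
	v, ok := m[key]
	return v, ok
}

func main() {
	ages := make(map[string]int) // new, empty map from string to int

	ages["alice"] = 31         // store
	fmt.Println(ages["alice"]) // retrieve: prints 31

	_, ok := lookup(ages, "bob") // test for presence
	fmt.Println(ok)              // prints false
}
```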
Each time dup reads a line of input, the line is used as a ‘key’ into the ‘map’ and the corresponding ‘value’ is incremented. The statement counts[input.Text()]++ is equivalent to these two statements:

    line := input.Text()
    counts[line] = counts[line] + 1

It’s not a problem if the ‘map’ doesn’t yet contain that ‘key’. The first time a new line is seen, the expression counts[line] on the right-hand side evaluates to the zero value for its type, which is 0 for ‘int’.
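
To make the zero-value behavior concrete, the sketch below counts lines held in a slice rather than read from standard input. The helper countLines is a hypothetical name introduced only for this illustration.

```go
package main

import "fmt"

// countLines is a hypothetical helper that tallies how often each
// string occurs in lines, using the same idiom as dup1.
func countLines(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		// A key that is not yet in the map reads as 0,
		// so the first increment stores 1.
		counts[line]++
	}
	return counts
}

func main() {
	counts := countLines([]string{"a", "b", "a"})
	// Reading the absent key "c" yields 0 and does not insert it.
	fmt.Println(counts["a"], counts["b"], counts["c"]) // prints "2 1 0"
	fmt.Println(len(counts))                           // prints 2
}
```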
To print the results, we use another range-based for loop, this time over the counts ‘map’. As before, each iteration produces two results, a ‘key’ and the ‘value’ of the ‘map’ element for that ‘key’. The order of ‘map’ iteration is not specified, but in practice it is random, varying from one run to another. This design is intentional, since it prevents programs from relying on any particular ordering where none is guaranteed.
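
When repeatable output is needed, one common approach (not used by dup1 itself) is to collect the keys, sort them, and then index the map in that order. The sketch below assumes a hypothetical helper sortedKeys and made-up sample data.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys is a hypothetical helper that returns the keys of counts
// in ascending order, giving a stable order to iterate in.
func sortedKeys(counts map[string]int) []string {
	keys := make([]string, 0, len(counts))
	for k := range counts { // map order here is unspecified
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Sample data standing in for the counts built by dup1.
	counts := map[string]int{"pear": 2, "apple": 3, "fig": 1}
	for _, line := range sortedKeys(counts) {
		if n := counts[line]; n > 1 {
			fmt.Printf("%d\t%s\n", n, line)
		}
	}
}
```

With this change the duplicate lines are always reported in the same order, at the cost of an extra slice and a sort.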
bufio Package and the type Scanner
The bufio package helps make input and output efficient and convenient. One of its most useful features is a type called Scanner that reads input and breaks it into lines or words; it’s often the easiest way to process input that comes naturally in lines.
The program uses a short variable declaration to create a new variable input that refers to a bufio.Scanner:

    input := bufio.NewScanner(os.Stdin)

The scanner reads from the program’s ‘standard input’. Each call to input.Scan() reads the next line and removes the ‘newline’ character from the end; the result can be retrieved by calling input.Text(). The Scan method returns ‘true’ if there is a line and ‘false’ when there is no more input.
fmt.Printf
The function fmt.Printf, like printf in ‘C’ and other languages, produces formatted output from a list of expressions. Its first argument is a format string that specifies how subsequent arguments should be formatted. The format of each argument is determined by a conversion character, a letter following a percent sign. For example, ‘%d’ formats an ‘integer’ operand using decimal notation, and ‘%s’ expands to the value of a ‘string’ operand.
Printf has over a dozen such conversions, which Go programmers call verbs. This table is far from a complete specification but illustrates many of the features that are available:
verb | description |
---|---|
‘%d’ | decimal integer |
‘%x’, ‘%o’, ‘%b’ | integer in hexadecimal, octal, binary |
‘%f’, ‘%g’, ‘%e’ | floating-point number |
‘%t’ | boolean: ‘true’ or ‘false’ |
‘%c’ | rune (Unicode code point) |
‘%s’ | string |
‘%q’ | quoted string "s" or rune 'c' |
‘%v’ | any value in a natural format |
‘%T’ | type of any value |
‘%%’ | literal percent sign (no operand) |
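
The verbs in the table can be tried out with a short program. The sketch below uses fmt.Sprintf, which applies the same formatting rules as fmt.Printf but returns the result as a string; the helper verbSamples and the operand values are made up for illustration.

```go
package main

import "fmt"

// verbSamples is a hypothetical helper that formats a few sample
// values, one entry per row of the verb table.
func verbSamples() []string {
	return []string{
		fmt.Sprintf("%d", 42),                  // 42
		fmt.Sprintf("%x %o %b", 255, 8, 5),     // ff 10 101
		fmt.Sprintf("%f %g %e", 3.5, 3.5, 3.5), // 3.500000 3.5 3.500000e+00
		fmt.Sprintf("%t", true),                // true
		fmt.Sprintf("%c", 'G'),                 // G
		fmt.Sprintf("%s", "go"),                // go
		fmt.Sprintf("%q %q", "go", 'c'),        // "go" 'c'
		fmt.Sprintf("%v", []int{1, 2}),         // [1 2]
		fmt.Sprintf("%T", 3.5),                 // float64
		fmt.Sprintf("100%%"),                   // 100%
	}
}

func main() {
	for _, s := range verbSamples() {
		fmt.Println(s)
	}
}
```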
The format string in dup1 also contains a ‘tab’ ‘\t’ and a ‘newline’ ‘\n’. String literals may contain such escape sequences for representing otherwise invisible characters.
Printf does not write a ‘newline’ by default. By convention, formatting functions whose names end in ‘f’, such as log.Printf and fmt.Errorf, use the formatting rules of fmt.Printf, whereas those whose names end in ‘ln’ follow Println, formatting their arguments as if by ‘%v’, followed by a ‘newline’.
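
The naming convention can be seen side by side in a short sketch. It uses fmt.Sprintf and fmt.Sprintln, the string-returning counterparts of Printf and Println, so the exact characters produced are easy to inspect; the helper formatPair and its sample values are hypothetical.

```go
package main

import "fmt"

// formatPair is a hypothetical helper that formats the same operands
// once with an "f" function and once with an "ln" function.
func formatPair(n int, line string) (withF, withLn string) {
	// "f" functions take a format string; the newline must be explicit.
	withF = fmt.Sprintf("%d\t%s\n", n, line)
	// "ln" functions format each operand as if by %v, separate the
	// operands with spaces, and append a newline.
	withLn = fmt.Sprintln(n, line)
	return withF, withLn
}

func main() {
	withF, withLn := formatPair(2, "hello")
	fmt.Printf("%q\n", withF)  // prints "2\thello\n"
	fmt.Printf("%q\n", withLn) // prints "2 hello\n"

	// fmt.Errorf also ends in "f" and follows the Printf rules.
	err := fmt.Errorf("line %q seen %d times", "hello", 2)
	fmt.Println(err) // prints: line "hello" seen 2 times
}
```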