

1.7.1 Dup1

The first version of dup prints each line that appears more than once in the standard input, preceded by its count. This program introduces the if statement, the map data type, and the bufio package.

// Dup1 prints the text of each line that appears more than once in
// the standard input, preceded by its count.
package main

import (
        "bufio"
        "fmt"
        "os"
)

func main() {
        counts := make(map[string]int)
        input := bufio.NewScanner(os.Stdin)
        for input.Scan() {
                counts[input.Text()]++
        }
        // NOTE: ignoring potential errors from input.Err()
        for line, n := range counts {
                if n > 1 {
                        fmt.Printf("%d\t%s\n", n, line)
                }
        }
}

Listing 1.9: gopl.io/ch1/dup1

if

As with for, parentheses are never used around the condition in an if statement, but braces are required for the body. There can be an optional else part that is executed if the condition is false.
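The following is a minimal sketch of an if statement with an else part. The helper name describe is hypothetical and is not part of dup1; it only illustrates the syntax described above.

```go
package main

import "fmt"

// describe is a hypothetical helper, not part of dup1. It labels a
// line count using an if statement with an else part.
func describe(n int) string {
	var label string
	if n > 1 { // no parentheses around the condition; braces are required
		label = "duplicate"
	} else { // runs only when the condition is false
		label = "unique"
	}
	return label
}

func main() {
	fmt.Println(describe(3)) // duplicate
	fmt.Println(describe(1)) // unique
}
```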

map Data Type

A map holds a set of key/value pairs and provides constant-time operations to store, retrieve, or test for an item in the set.

key

The key may be of any type whose values can be compared with ‘==’, strings being the most common example.

value

The value may be of any type at all.

In this example, the keys are ‘strings’ and the values are ‘ints’.

The Built-in Function make

The built-in function make creates a new empty ‘map’; it has other uses too. Maps are discussed at length in Section 4.3.
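As a rough illustration of the three map operations mentioned earlier (store, retrieve, test), the sketch below creates a map with make and exercises each one. The map name ages and the helper contains are hypothetical and chosen only for this example.

```go
package main

import "fmt"

// contains is a hypothetical helper that tests whether key is present.
func contains(m map[string]int, key string) bool {
	// The second result of a map lookup reports whether the key exists.
	_, ok := m[key]
	return ok
}

func main() {
	// make creates a new empty map with string keys and int values.
	ages := make(map[string]int)

	ages["alice"] = 31         // store
	fmt.Println(ages["alice"]) // retrieve: prints 31

	// test: prints true false
	fmt.Println(contains(ages, "alice"), contains(ages, "bob"))
}
```

The two-result lookup is what distinguishes a key that is absent from a key whose stored value happens to be zero.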

Each time dup reads a line of input, the line is used as a ‘key’ into the ‘map’ and the corresponding ‘value’ is incremented.

The statement counts[input.Text()]++ is equivalent to these two statements:

line := input.Text()
counts[line] = counts[line] + 1

It’s not a problem if the ‘map’ doesn’t yet contain that ‘key’. The first time a new line is seen, the expression counts[line] on the right-hand side evaluates to the zero value for its type, which is 0 for ‘int’.
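The equivalence can be checked with a small sketch. The function names countShort and countLong are hypothetical; they take a slice of lines rather than reading standard input so that the example is easy to run.

```go
package main

import "fmt"

// countShort uses the ++ form that appears in dup1.
func countShort(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		counts[line]++
	}
	return counts
}

// countLong spells out the equivalent assignment.
func countLong(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		// The first time line is seen, counts[line] reads as 0,
		// so this stores 1.
		counts[line] = counts[line] + 1
	}
	return counts
}

func main() {
	lines := []string{"a", "b", "a"}
	fmt.Println(countShort(lines)["a"], countLong(lines)["a"]) // 2 2
	fmt.Println(countShort(lines)["missing"])                  // 0
}
```

Reading a key that is absent yields the zero value but does not add the key to the map.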

To print the results, we use another range-based for loop, this time over the counts ‘map’. As before, each iteration produces two results, a ‘key’ and the ‘value’ of the ‘map’ element for that ‘key’. The order of ‘map’ iteration is not specified, but in practice it is random, varying from one run to another. This design is intentional, since it prevents programs from relying on any particular ordering where none is guaranteed.
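When a stable output order is wanted, one common approach (not used by dup1 itself) is to collect the keys, sort them, and then index the map in that order. The sketch below assumes a hypothetical helper sortedKeys and sample data invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys is a hypothetical helper that returns the keys of counts
// in ascending order.
func sortedKeys(counts map[string]int) []string {
	keys := make([]string, 0, len(counts))
	for line := range counts {
		keys = append(keys, line)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Sample data for illustration only.
	counts := map[string]int{"pear": 2, "apple": 3, "fig": 1}
	for _, line := range sortedKeys(counts) {
		if n := counts[line]; n > 1 {
			fmt.Printf("%d\t%s\n", n, line)
		}
	}
}
```

With this data the loop prints the apple row before the pear row on every run, because the order comes from the sorted slice rather than from the map.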

The bufio Package and the type Scanner

The bufio package helps make input and output efficient and convenient. One of its most useful features is a type called Scanner that reads input and breaks it into lines or words; it’s often the easiest way to process input that comes naturally in lines.

The program uses a short variable declaration to create a new variable input that refers to a bufio.Scanner:

input := bufio.NewScanner(os.Stdin)

The scanner reads from the program’s ‘standard input’. Each call to input.Scan() reads the next line and removes the ‘newline’ character from the end; the result can be retrieved by calling input.Text(). The Scan method returns ‘true’ if there is a line and ‘false’ when there is no more input.
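The same loop can be tried without a terminal by scanning a string instead of os.Stdin. This is a minimal sketch: the helper scanLines is hypothetical, and, unlike dup1, it returns the value of input.Err() rather than ignoring it.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanLines is a hypothetical helper that splits text into lines with
// a bufio.Scanner and reports any error the scanner encountered.
func scanLines(text string) ([]string, error) {
	var lines []string
	input := bufio.NewScanner(strings.NewReader(text))
	for input.Scan() { // false once the input is exhausted
		lines = append(lines, input.Text()) // newline already removed
	}
	return lines, input.Err()
}

func main() {
	lines, err := scanLines("first\nsecond\nfirst\n")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	fmt.Printf("%q\n", lines) // ["first" "second" "first"]
}
```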

The Function fmt.Printf

The function fmt.Printf, like printf in ‘C’ and other languages, produces formatted output from a list of expressions. Its first argument is a format string that specifies how subsequent arguments should be formatted. The format of each argument is determined by a conversion character, a letter following a percent sign. For example, ‘%d’ formats an ‘integer’ operand using decimal notation, and ‘%s’ expands to the value of a ‘string’ operand.

Printf has over a dozen such conversions, which Go programmers call verbs. This table is far from a complete specification but illustrates many of the features that are available:

verb        description
%d          decimal integer
%x %o %b    integer in hexadecimal, octal, binary
%f %g %e    floating-point number
%t          boolean: ‘true’ or ‘false’
%c          rune (Unicode code point)
%s          string
%q          quoted string "s" or rune 'c'
%v          any value in a natural format
%T          type of any value
%%          literal percent sign

Table 1.1: Printf Verbs
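The verbs in Table 1.1 can be tried out with a short program. This is only a sketch: the helper format is hypothetical, and the sample values are arbitrary.

```go
package main

import "fmt"

// format is a hypothetical helper that applies one verb to one value.
func format(verb string, v interface{}) string {
	return fmt.Sprintf(verb, v)
}

func main() {
	fmt.Printf("%d %x %o %b\n", 42, 255, 8, 5) // 42 ff 10 101
	fmt.Printf("%f %g %e\n", 3.5, 3.5, 3.5)    // 3.500000 3.5 3.500000e+00
	fmt.Printf("%t %c %s\n", true, 'G', "go")  // true G go
	fmt.Printf("%q %q\n", "s", 'c')            // "s" 'c'
	fmt.Printf("%v %T\n", []int{1, 2}, 3.5)    // [1 2] float64
	fmt.Printf("%d%%\n", 50)                   // 50%
	fmt.Println(format("%T", 'c'))             // int32
}
```

The last line shows that a rune constant such as 'c' has type int32 by default, which is why ‘%c’ and ‘%q’ are needed to print it as a character.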

Escape Sequences

The format string in dup1 also contains a ‘tab’ ‘\t’ and a ‘newline’ ‘\n’. String literals may contain such escape sequences for representing otherwise invisible characters.
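To see that each escape sequence stands for a single character, the sketch below builds one output row with the same format string as dup1 and then prints it in quoted form. The helper name row is hypothetical.

```go
package main

import "fmt"

// row is a hypothetical helper that formats one output row the way
// dup1 does: count, tab, line text, newline.
func row(n int, line string) string {
	return fmt.Sprintf("%d\t%s\n", n, line)
}

func main() {
	s := row(2, "hello")
	fmt.Print(s)          // the tab and line break are rendered, not shown
	fmt.Printf("%q\n", s) // "2\thello\n"

	// In an interpreted string literal \t is one byte; in a raw string
	// literal (backquotes) the backslash and the t are kept as two bytes.
	fmt.Println(len("\t"), len(`\t`)) // 1 2
}
```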

Formatting Functions

Printf does not write a ‘newline’ by default. By convention, formatting functions whose names end in ‘f’, such as log.Printf and fmt.Errorf, use the formatting rules of fmt.Printf, whereas those whose names end in ‘ln’ follow Println, formatting their arguments as if by ‘%v’, followed by a ‘newline’.
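The naming convention can be illustrated with fmt.Sprintf, fmt.Sprintln, and fmt.Errorf, which return their result instead of printing it. This is a minimal sketch; the helpers withF and withLn are hypothetical.

```go
package main

import "fmt"

// withF follows the 'f' convention: a format string, no implicit newline.
func withF(name string, n int) string {
	return fmt.Sprintf("%s=%d", name, n)
}

// withLn follows the 'ln' convention: operands formatted as if by %v,
// separated by spaces, with a newline appended.
func withLn(name string, n int) string {
	return fmt.Sprintln(name, n)
}

func main() {
	fmt.Printf("%q\n", withF("count", 3))  // "count=3"
	fmt.Printf("%q\n", withLn("count", 3)) // "count 3\n"

	// Errorf uses the same formatting rules to build an error value.
	err := fmt.Errorf("line %d: %s", 7, "unexpected input")
	fmt.Println(err) // line 7: unexpected input
}
```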

