

1.7.3 Dup3

The versions of dup above operate in a “streaming” mode in which input is read and broken into lines as needed, so in principle these programs can handle an arbitrary amount of input.

An alternative approach is to read the entire input into memory in one big gulp, split it into lines all at once, then process the lines.

The following version, dup3, operates in that fashion. It introduces the function ReadFile (from the io/ioutil package), which reads the entire contents of a named file, and strings.Split, which splits a string into a slice of substrings. (Split is the opposite of strings.Join, which we saw earlier.)
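A small sketch of how Split and Join relate. The wrappers splitLines and joinLines are hypothetical. They fix the separator to a newline, as dup3 does, so the behavior is easy to check.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLines returns the substrings between newline separators.
func splitLines(text string) []string {
	return strings.Split(text, "\n")
}

// joinLines reassembles lines using the same separator.
func joinLines(lines []string) string {
	return strings.Join(lines, "\n")
}

func main() {
	text := "one\ntwo\nthree"
	lines := splitLines(text)
	fmt.Printf("%q\n", lines) // ["one" "two" "three"]

	// Join undoes Split when given the same separator.
	fmt.Println(joinLines(lines) == text) // true

	// The separators are removed; two adjacent separators yield an empty string.
	fmt.Printf("%q\n", splitLines("a\n\nb")) // ["a" "" "b"]
}
```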

We’ve simplified dup3 somewhat: it reads only named files, not the standard input, since ReadFile requires a file name argument.

  // Dup3 reads the named input files into memory in their entirety,
  // splits them into lines all at once, and prints the count and text
  // of each line that appears more than once.
  package main

  import (
          "fmt"
          "io/ioutil"
          "os"
          "strings"
  )

  func main() {
          counts := make(map[string]int) // line text -> number of occurrences
          for _, filename := range os.Args[1:] {
                  // ReadFile returns the entire file contents as a []byte.
                  data, err := ioutil.ReadFile(filename)
                  if err != nil {
                          fmt.Fprintf(os.Stderr, "dup3: %v\n", err)
                          continue
                  }
                  // Convert to a string, then split it into lines in one step.
                  for _, line := range strings.Split(string(data), "\n") {
                          counts[line]++
                  }
          }
          // Report only the lines seen more than once.
          for line, n := range counts {
                  if n > 1 {
                          fmt.Printf("%d\t%s\n", n, line)
                  }
          }
  }

Listing 1.11: gopl.io/ch1/dup3

ReadFile returns a byte slice that must be converted into a ‘string’ so it can be split by strings.Split.
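Since Go 1.16 the io/ioutil package is deprecated, and ioutil.ReadFile simply calls os.ReadFile. The following is a minimal sketch of dup3's per-file step written with os.ReadFile instead. It makes the []byte-to-string conversion explicit. The names countFileLines and writeTempFile are hypothetical; writeTempFile exists only to create a scratch input file for the demonstration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// countFileLines reads a whole file and counts each line, as dup3 does,
// but with os.ReadFile in place of the deprecated ioutil.ReadFile.
func countFileLines(filename string) (map[string]int, error) {
	data, err := os.ReadFile(filename) // data is a []byte
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	// string(data) copies the bytes into a string, the type strings.Split accepts.
	for _, line := range strings.Split(string(data), "\n") {
		counts[line]++
	}
	return counts, nil
}

// writeTempFile is a hypothetical helper that creates a scratch input file
// and returns its name.
func writeTempFile(contents string) (string, error) {
	f, err := os.CreateTemp("", "dup3-*.txt")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.WriteString(contents); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	name, err := writeTempFile("x\ny\nx\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.Remove(name)

	counts, err := countFileLines(name)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(counts["x"], counts["y"]) // 2 1
}
```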

Under the covers, the higher-level routines in bufio and io/ioutil are built on the Read and Write methods of *os.File, but most programmers rarely need to call those lower-level methods directly; the higher-level functions are easier to use.

