How is scc so fast?

Scc is a very fast source code counter. In the example below, it counts the lines in the 79,102 files of the Linux kernel source in about 1 second. Scc is written in Go, yet it is faster than similar projects written in C and Rust. For the Linux kernel:

  'scc -c linux' ran
    1.07 ± 0.22 times faster than 'loc linux'
    1.39 ± 0.05 times faster than 'tokei linux'
    1.41 ± 0.05 times faster than 'scc linux'
    2.35 ± 0.07 times faster than 'polyglot linux'

How does it do this? This is the core of the program:

	fileListQueue := make(chan *FileJob, FileListQueueSize)             // Files ready to be read from disk
	fileSummaryJobQueue := make(chan *FileJob, FileSummaryJobQueueSize) // Files ready to be summarised

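	// Walk the requested directories in the background, feeding fileListQueue.
	go func() {
		directoryWalker := NewDirectoryWalker(fileListQueue)

		for _, f := range DirFilePaths {
			err := directoryWalker.Start(f)
			if err != nil {
				fmt.Printf("failed to walk %s: %v", f, err)
			}
		}
	}()

	// Read and count files concurrently with the walk.
	go fileProcessorWorker(fileListQueue, fileSummaryJobQueue)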

	result := fileSummarize(fileSummaryJobQueue)

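The basic architecture appears to be the following pipeline: a directory walker goroutine puts the files it finds on fileListQueue, fileProcessorWorker takes files from that queue, reads and counts them, and puts the results on fileSummaryJobQueue, and fileSummarize drains that queue to build the final report.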

The sizes of the queues (which are simple Go channels) and of the worker pool are calculated as follows:

// FileListQueueSize is the queue of files found and ready to be read into memory
var FileListQueueSize = runtime.NumCPU()

// FileProcessJobWorkers is the number of workers that process the file collecting stats
var FileProcessJobWorkers = runtime.NumCPU() * 4

// FileSummaryJobQueueSize is the queue used to hold processed file statistics before formatting
var FileSummaryJobQueueSize = runtime.NumCPU()

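In the example below, we see about 13s of user time compared to about 1s of real time. This is interesting because the test ran on a 12-core CPU: user time that is roughly 12 times the elapsed time suggests that scc keeps every core busy for the whole run.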

[cbrake@ceres linux]$ time scc ./
Language                 Files     Lines   Blanks  Comments     Code Complexity
C                        33534  23764063  3417782   2665054 17681227    2350392
C Header                 24529   9557281   730121   1434616  7392544      51415
Device Tree               5041   1512839   196326     76350  1240163         36
YAML                      3905    433840    72002     17975   343863          0
ReStructuredText          3472    711541   171655         0   539886          0
Makefile                  2946     76882    11549     12358    52975        431
Plain Text                1741    151231    27041         0   124190          0
Assembly                  1320    375225    34474       737   340014       6015
Shell                      896    172442    28608     21202   122632       9284
JSON                       788    443135        2         0   443133          0
gitignore                  332      1836       28       323     1485          0
Python                     203     68504     3381      3436    61687       2603
SVG                         74     49420       90      1171    48159          0
Perl                        67     50471     5153      3879    41439       3039
Rust                        64     20887     1538      9857     9492        824
BASH                        60      2304      357       339     1608        233
DOT                         20       853       78        42      733          0
AWK                         13      1951      255       187     1509          0
CSV                         11       766       73         0      693          0
Extensible Styleshe…        10       200       26         0      174          0
Happy                       10      6069      710         0     5359          0
LEX                         10      2874      360       312     2202          0
LD Script                    8       376       59        29      288          0
Autoconf                     5       429       30        26      373         10
C++                          5      2138      217        61     1860        309
License                      5       422       84         0      338          0
Unreal Script                5       707      104       158      445         23
CSS                          3       295       54        69      172          0
Systemd                      3       167       25         0      142          0
Bazel                        2       695      152       149      394          6
C++ Header                   2       125       11        55       59          2
CMake                        2         8        0         0        8          0
HEX                          2       173        0         0      173          0
HTML                         2        33        3         8       22          0
INI                          2        13        2         5        6          0
Module-Definition            2       128       15         0      113          4
Gherkin Specificati…         1       291       34        58      199          0
MATLAB                       1        89       17        37       35          3
Markdown                     1        36        9         0       27          0
Ruby                         1        29        4         0       25          1
TOML                         1        12        1         9        2          0
TeX                          1       236        6        74      156          0
Vim Script                   1        42        3        12       27          5
sed                          1        12        2         5        5          0
Total                    79102  37411070  4702441   4248593 28460036    2424635
Estimated Cost to Develop (organic) $1,283,933,538
Estimated Schedule Effort (organic) 208.77 months
Estimated People Required (organic) 546.38
Processed 1380310615 bytes, 1380.311 megabytes (SI)

real    0m1.092s
user    0m13.319s
sys     0m2.853s