Scc is a super fast source code counter. In the example below, it counts the lines in the 79,102 files of the Linux kernel source in about 1 second. Scc is written in Go, yet it is faster than similar projects written in C and Rust. For the Linux kernel:
Summary
  'scc -c linux' ran
    1.07 ± 0.22 times faster than 'loc linux'
    1.39 ± 0.05 times faster than 'tokei linux'
    1.41 ± 0.05 times faster than 'scc linux'
    2.35 ± 0.07 times faster than 'polyglot linux'
How does it do this? This is the core of the program:
fileListQueue := make(chan *FileJob, FileListQueueSize) // Files ready to be read from disk
fileSummaryJobQueue := make(chan *FileJob, FileSummaryJobQueueSize) // Files ready to be summarised
go func() {
	directoryWalker := NewDirectoryWalker(fileListQueue)
	for _, f := range DirFilePaths {
		err := directoryWalker.Start(f)
		if err != nil {
			fmt.Printf("failed to walk %s: %v", f, err)
			os.Exit(1)
		}
	}
	directoryWalker.Run()
}()
go fileProcessorWorker(fileListQueue, fileSummaryJobQueue)
result := fileSummarize(fileSummaryJobQueue)
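The basic architecture appears to be the following pipeline: the directory walker feeds the files it finds into fileListQueue, fileProcessorWorker reads each file and pushes its statistics into fileSummaryJobQueue, and fileSummarize collects the results.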
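To make the shape of that pipeline concrete, here is a minimal sketch of three stages connected by two buffered channels. It is not scc's implementation: the names fileJob, walk, process, summarize and runPipeline are hypothetical, the "files" are in-memory strings rather than paths on disk, and the buffer size of 4 is arbitrary.

```go
package main

import (
	"fmt"
	"strings"
)

// fileJob is a hypothetical stand-in for scc's FileJob.
type fileJob struct {
	name    string
	content string
	lines   int
}

// walk plays the role of the directory walker: it emits one job per
// file and closes the queue when there is nothing left to send.
func walk(files map[string]string, out chan<- *fileJob) {
	for name, content := range files {
		out <- &fileJob{name: name, content: content}
	}
	close(out)
}

// countLines counts lines, treating a final line without a trailing
// newline as a line of its own.
func countLines(s string) int {
	if s == "" {
		return 0
	}
	n := strings.Count(s, "\n")
	if !strings.HasSuffix(s, "\n") {
		n++
	}
	return n
}

// process plays the role of the file processor: it fills in the
// statistics for each job and forwards it to the next stage.
func process(in <-chan *fileJob, out chan<- *fileJob) {
	for job := range in {
		job.lines = countLines(job.content)
		out <- job
	}
	close(out)
}

// summarize drains the final queue and totals the line counts.
func summarize(in <-chan *fileJob) int {
	total := 0
	for job := range in {
		total += job.lines
	}
	return total
}

// runPipeline wires the three stages together with two channels.
func runPipeline(files map[string]string) int {
	fileListQueue := make(chan *fileJob, 4)
	fileSummaryQueue := make(chan *fileJob, 4)
	go walk(files, fileListQueue)
	go process(fileListQueue, fileSummaryQueue)
	return summarize(fileSummaryQueue)
}

func main() {
	files := map[string]string{
		"a.go": "package a\n\nfunc A() {}\n",
		"b.go": "package b\n",
	}
	fmt.Println("total lines:", runPipeline(files))
}
```

Each stage signals completion by closing its output channel, so the final stage can simply range over its input until the pipeline drains.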
The sizes of the queues (which are simple Go channels) and of the worker pool are calculated as follows:
// FileListQueueSize is the queue of files found and ready to be read into memory
var FileListQueueSize = runtime.NumCPU()
// FileProcessJobWorkers is the number of workers that process the file collecting stats
var FileProcessJobWorkers = runtime.NumCPU() * 4
// FileSummaryJobQueueSize is the queue used to hold processed file statistics before formatting
var FileSummaryJobQueueSize = runtime.NumCPU()
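As a rough illustration of how such sizing might be used, the sketch below starts a pool of runtime.NumCPU() * 4 goroutines that all read from one channel and write to another. It assumes that fileProcessorWorker launches its workers in a similar way, which the excerpt above does not show; processAll is a hypothetical name, and squaring a number stands in for reading and counting a file. Running more workers than cores presumably lets some of them compute while others wait on disk reads, but that is a guess rather than something the source states.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// processAll fans the inputs out to a pool of workers and returns the
// sum of their results. The channel capacities mirror the NumCPU-sized
// queues shown above.
func processAll(inputs []int, workers int) int {
	jobs := make(chan int, runtime.NumCPU())
	results := make(chan int, runtime.NumCPU())

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- n * n // stand-in for processing one file
			}
		}()
	}

	// Producer: feed the jobs, then close the queue so workers exit.
	go func() {
		for _, n := range inputs {
			jobs <- n
		}
		close(jobs)
	}()

	// Close the results queue once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	workers := runtime.NumCPU() * 4
	total := processAll([]int{1, 2, 3, 4}, workers)
	fmt.Printf("workers=%d total=%d\n", workers, total)
}
```

The sync.WaitGroup is what allows the results channel to be closed exactly once, after the last worker has returned.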
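In the example below, we see 13 s of user time compared to 1 s of real time, so the CPU spent roughly 12 times as long working as the wall clock shows. This is interesting because the test is running on a 12-core CPU: scc appears to keep every core busy.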
[cbrake@ceres linux]$ time scc ./
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
C                        33534  23764063  3417782   2665054 17681227    2350392
C Header                 24529   9557281   730121   1434616  7392544      51415
Device Tree               5041   1512839   196326     76350  1240163         36
YAML                      3905    433840    72002     17975   343863          0
ReStructuredText          3472    711541   171655         0   539886          0
Makefile                  2946     76882    11549     12358    52975        431
Plain Text                1741    151231    27041         0   124190          0
Assembly                  1320    375225    34474       737   340014       6015
Shell                      896    172442    28608     21202   122632       9284
JSON                       788    443135        2         0   443133          0
gitignore                  332      1836       28       323     1485          0
Python                     203     68504     3381      3436    61687       2603
SVG                         74     49420       90      1171    48159          0
Perl                        67     50471     5153      3879    41439       3039
Rust                        64     20887     1538      9857     9492        824
BASH                        60      2304      357       339     1608        233
DOT                         20       853       78        42      733          0
AWK                         13      1951      255       187     1509          0
CSV                         11       766       73         0      693          0
Extensible Styleshe…        10       200       26         0      174          0
Happy                       10      6069      710         0     5359          0
LEX                         10      2874      360       312     2202          0
LD Script                    8       376       59        29      288          0
Autoconf                     5       429       30        26      373         10
C++                          5      2138      217        61     1860        309
License                      5       422       84         0      338          0
Unreal Script                5       707      104       158      445         23
CSS                          3       295       54        69      172          0
Systemd                      3       167       25         0      142          0
Bazel                        2       695      152       149      394          6
C++ Header                   2       125       11        55       59          2
CMake                        2         8        0         0        8          0
HEX                          2       173        0         0      173          0
HTML                         2        33        3         8       22          0
INI                          2        13        2         5        6          0
Module-Definition            2       128       15         0      113          4
Gherkin Specificati…         1       291       34        58      199          0
MATLAB                       1        89       17        37       35          3
Markdown                     1        36        9         0       27          0
Ruby                         1        29        4         0       25          1
TOML                         1        12        1         9        2          0
TeX                          1       236        6        74      156          0
Vim Script                   1        42        3        12       27          5
sed                          1        12        2         5        5          0
───────────────────────────────────────────────────────────────────────────────
Total                    79102  37411070  4702441   4248593 28460036    2424635
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $1,283,933,538
Estimated Schedule Effort (organic) 208.77 months
Estimated People Required (organic) 546.38
───────────────────────────────────────────────────────────────────────────────
Processed 1380310615 bytes, 1380.311 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
real 0m1.092s
user 0m13.319s
sys 0m2.853s
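The timing figures above can be turned into a rough estimate of how many cores were busy on average by dividing CPU time by wall-clock time. The short sketch below does that arithmetic with the real, user and sys values from this run; busyCores is a hypothetical helper, and the result is only an average over the whole run.

```go
package main

import "fmt"

// busyCores estimates the average number of cores kept busy as
// total CPU time (user + sys) divided by wall-clock time.
func busyCores(user, sys, wall float64) float64 {
	return (user + sys) / wall
}

func main() {
	// Values taken from the `time scc ./` output above, in seconds.
	const wall, user, sys = 1.092, 13.319, 2.853

	fmt.Printf("user/real:       %.1f\n", user/wall)
	fmt.Printf("(user+sys)/real: %.1f\n", busyCores(user, sys, wall))
}
```

User time alone works out to about 12.2 times the wall-clock time, and about 14.8 times once kernel time is included, which is consistent with the work being spread across all of the machine's cores.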