cerebras.modelzoo.common.model_utils.count_lines.count_lines

cerebras.modelzoo.common.model_utils.count_lines.count_lines(filename_pattern)

Returns the line count of the given file, or the sum of the line counts over all files matching the pattern.

Takes a filename pattern and globs it to get all matching filenames. The function then reads each file in turn, in raw format, one fixed-size buffer at a time until EOF is reached. It counts the number of ``\n`` characters in each buffer and returns the sum. Using takewhile and repeat provides a built-in speedup compared to writing a custom while/for loop to handle EOF.
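The approach described above can be sketched roughly as follows. This is a minimal illustration, not the Model Zoo source: the name `count_lines_sketch` and the `buffer_size` keyword are hypothetical, and the real function may differ in its details.

```python
import glob
from itertools import repeat, takewhile

# Assumption: default mirrors the 1024*1024 buffer size mentioned in this page's note.
DEFAULT_BUFFER_SIZE = 1024 * 1024


def count_lines_sketch(filename_pattern, buffer_size=DEFAULT_BUFFER_SIZE):
    """Hypothetical re-creation: sum newline counts over all files matching a glob."""
    total = 0
    for filename in sorted(glob.glob(filename_pattern)):
        with open(filename, "rb") as f:
            # repeat(None) drives an endless stream of raw reads;
            # takewhile stops at the first empty chunk, which signals EOF.
            chunks = takewhile(
                lambda chunk: chunk,
                (f.raw.read(buffer_size) for _ in repeat(None)),
            )
            total += sum(chunk.count(b"\n") for chunk in chunks)
    return total
```

Because the sketch counts newline bytes, a final line that lacks a trailing newline is not included in the total, and a pattern that matches no files yields 0.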

Note: The buffer size is currently set to 1024*1024, but this is not optimized for all files. Some files can be read faster with a different buffer size. This value is also suboptimal for memory usage on a local dev instance.

Parameters

filename_pattern (str): filename glob pattern (or filename)

Returns

integer number of lines in the file, or the total over all files matching the pattern