cerebras.modelzoo.common.model_utils.count_lines.count_lines
- cerebras.modelzoo.common.model_utils.count_lines.count_lines(filename_pattern)
Returns the line count of the given file, or the sum of the line counts over all files matching the pattern.
Takes a filename pattern and globs it to get all matching filenames. It then reads each file in raw (binary) mode, one fixed-size buffer at a time, until EOF is reached. It counts the number of newline characters (``\n``) in each buffer and returns the sum. Driving the read loop with ``itertools.takewhile`` and ``itertools.repeat`` is faster than a hand-written while/for loop that checks for EOF.
Note: The buffer size is currently set to 1024*1024 bytes (1 MiB), which is not optimal for all files; some files can be read faster with a different buffer size. This value may also be suboptimal for memory usage on a local development instance.
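The counting loop described above can be sketched as follows. This is a minimal sketch, not the library's source. The name ``count_lines_sketch`` and its ``buffer_size`` parameter are hypothetical. It also assumes that the pattern is expanded with ``glob.glob`` and that "raw format" means an unbuffered binary file object (``open(..., "rb", buffering=0)``).

```python
import glob
from itertools import repeat, takewhile

# Hypothetical default mirroring the documented 1024*1024 buffer size.
BUFFER_SIZE = 1024 * 1024


def count_lines_sketch(filename_pattern, buffer_size=BUFFER_SIZE):
    """Sum the newline bytes over all files matching filename_pattern."""
    total = 0
    # Assumption: the pattern is expanded with glob.glob; a plain filename
    # matches itself if it exists.
    for filename in glob.glob(filename_pattern):
        # buffering=0 returns an unbuffered raw FileIO object, so read()
        # bypasses Python's buffering layer.
        with open(filename, "rb", buffering=0) as f:
            # repeat(None) yields forever and the generator performs one read
            # per step; takewhile stops at the first empty bytes object (EOF).
            buffers = takewhile(bool, (f.read(buffer_size) for _ in repeat(None)))
            # bytes.count runs in C, so the Python-level loop runs once per
            # buffer rather than once per line.
            total += sum(buf.count(b"\n") for buf in buffers)
    return total
```

Because only newline bytes are counted, a final line without a trailing newline is not included in the total. For example, a file containing ``1\n2`` counts as one line. A newline is a single byte and can never be split across two buffers, so changing ``buffer_size`` affects only how many reads are made, not the result.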
- Parameters
filename_pattern (str): filename glob pattern, or a single filename
- Returns
Integer number of lines in the file, or the total number of lines over all matching files
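The standard-library snippet below shows how a ``filename_pattern`` value is interpreted. It assumes the pattern is expanded with Python's ``glob.glob``, because this page says only that the pattern is globbed. A plain filename matches only itself, a wildcard matches every file that fits it, and a pattern with no matches yields an empty list. Under that assumption, a pattern that matches nothing would contribute no files to the sum. The file names used here are hypothetical.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Hypothetical shard files, each containing a single line.
    for name in ("train_0.txt", "train_1.txt", "val.txt"):
        with open(os.path.join(d, name), "w") as f:
            f.write("line\n")

    # A plain filename is a valid glob pattern that matches only itself.
    single = glob.glob(os.path.join(d, "val.txt"))
    # A wildcard expands to every matching file; glob.glob returns paths in
    # arbitrary order, so sort the base names for a stable listing.
    shards = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(d, "train_*.txt"))
    )
    # A pattern with no matches yields an empty list rather than an error.
    missing = glob.glob(os.path.join(d, "*.csv"))

    print(len(single), shards, missing)
    # prints: 1 ['train_0.txt', 'train_1.txt'] []
```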