Detailed analysis of trajectories in the Morris water maze (Gehring et al. 2015)

Accession:185090
MATLAB code that can be used for detailed behavioural analyses of the trajectories of animals by means of a semi-supervised clustering algorithm. The method is applied here to trajectories in the Morris Water Maze (see Gehring, T. V. et al., Scientific Reports, 2015), but the code can easily be adapted to other types of experiments. For more information and the latest version of the code please refer to https://bitbucket.org/tiagogehring/mwm_trajectories
Reference:
1. Gehring TV, Luksys G, Sandi C, Vasilaki E (2015) Detailed classification of swimming paths in the Morris Water Maze: multiple strategies within one trial. Sci Rep 5:14562
Model Information
Model Type:
Brain Region(s)/Organism:
Cell Type(s):
Channel(s):
Gap Junctions:
Receptor(s):
Gene(s):
Transmitter(s):
Simulation Environment: MATLAB;
Model Concept(s): Methods;
Implementer(s):
function [A,ffn,numHeader,repChar,hl,fpos] = txt2mat(varargin)

% TXT2MAT reads an ascii file and converts a data table to a matrix
%
% Syntax:
%  A = txt2mat
%  A = txt2mat(fn)
%  [A,ffn,nh,SR,hl,fpos] = txt2mat(fn [,nh,nc,fmt,SR,SX] )
%  [A, ...]              = txt2mat(fn, ... 'param', value, ...)
%  [A, ...]              = txt2mat(fn, instruct)
%
% with
%
% A     output data matrix
% ffn   full file name
% nh    number of header lines
% hl    header lines (as a string)
% fpos  file position of last character read and converted from ascii file
%
% fn    file or path name ('*' is allowed as wildcard in file name)
% nh    number of header lines
% nc    number of data columns
% fmt   format string
% SR    cell array of replacement strings  sr<i>, SR = {sr1,sr2,...}
% SX    cell array of invalid line strings sx<i>, SX = {sx1,sx2,...}
%
% 'param', value    see below for input parameter/value-pairs
%
% instruct          input struct (each field name corresponds to an input
%                   parameter name)
%
% TXT2MAT reads the ascii file <fn> and extracts the values found in a 
% data table with <nc> columns to a matrix, skipping <nh> header lines. 
% When extracting the data, <fmt> is used as format string for each line   
% (see sscanf online doc for details about the format string). 
%
% If <fn> is an existing directory, or contains an asterisk wildcard in the
% file name, or is an empty string, a file selection dialogue is displayed.
%
% Additional strings <sr1>,<sr2>,.. can be supplied within a cell array
% <SR> to perform single character substitution before the data is
% converted: each of the first n-1 characters of an <n> character string is
% replaced by the <n>-th character.
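%
% A minimal sketch of this rule in plain MATLAB, for illustration only (it
% is not TXT2MAT's internal code, and the variable names are hypothetical):
%
%   sr  = ')Rx ';                               % blank ')', 'R' and 'x'
%   txt = '3) 7,2 Rx 0300';
%   txt(ismember(txt, sr(1:end-1))) = sr(end);  % first n-1 chars -> n-th
%   % txt is now '3  7,2    0300'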
%
% A further optional input argument is a cell array <SX> containing strings
% <sx1>,<sx2>,.. that mark "bad" lines containing invalid data. If every
% line containing invalid data can be caught by the <SX>, TXT2MAT will
% speed up significantly (see EXAMPLE 3a). Any lines that are recognized to
% be invalid are completely ignored (and there is no corresponding row in
% A). 
%
% If the number of header lines <nh> or the number of data columns <nc> are
% not provided, TXT2MAT performs some automatic analysis of the file format.
% This will need the numbers in the file to be decimals (with decimal point
% or comma) and the data arrangement to be more or less regular (see also
% remark 1). 
% If <nc> is negative, TXT2MAT internally initializes the output matrix <A>
% with |<nc>| columns, but allows for expanding <A> if more numeric values
% are found in any line of the file. To this end, TXT2MAT is forced to
% switch to line by line conversion.
%
% If some lines of the data table can not be (fully) converted, the
% corresponding rows in A are padded with NaNs. 
%
% For further options and to facilitate the argument assignment, the
% param/value notation or an input struct can be used instead of the single
% argument syntax txt2mat(fn,nh,nc,fmt,SR,SX). For usage see EXAMPLE 3a.
% The following table lists the param/value-pairs and their corresponding
% single argument, if existing:
%
%  Param-string      Value type  Example value                  single arg.
%  'NumHeaderLines'  Scalar      13                                      nh
%  'NumColumns'      Scalar      9                                       nc
%  'Format'          String      ['%d.%d.%d' repmat('%f',1,6)]          fmt
%  'ReplaceChar'     Cell        {')Rx ',';: '}                          SR    
%  'BadLineString'   Cell        {'Warng', 'Bad'}                        SX     
%  'GoodLineString'  Cell        {'2009-08-17'}                           -
%  'SelectLineFun'   FunHandle   @(lineNo) rem(lineNo,2) == 0             -
%  'ReplaceStr'      Cell        {{'True','1'},{'#NaN','#Inf','NaN'}}     -
%  'ReplaceRegExpr'  Cell        {{';\s*(?=;)','; NaN'}}                  -
%  'NumericType'     String      'single'                                 -
%  'RowRange'        2x1-vector  [2501 5000]                              -
%  'FilePos'         Scalar      0                                        -
%  'ReadMode'        String      'auto'                                   -
%  'DialogString'    String      'Now choose a log file'                  -
%  'InfoLevel'       Scalar      1                                        -
%  'MemPar'          Scalar      2^17                                     -
%
% The param/value-pairs may follow the usual arguments in any order, e.g.
% txt2mat('file.txt',13,9,'BadLineString',{'Bad'},'Format','%f'). Only the
% single file name argument must be given as the first input.
%
% Param/value-pairs with additional functionality:
%
% · 'GoodLineString': ignore all lines that do not contain at least one of
%   the strings in the cell array (line filtering analogous to
%   'BadLineString'; see EXAMPLE 3b). 
%
% · 'SelectLineFun': a single argument element-wise Boolean function that
%   is applied to the line numbers. If the function returns 'false' for a
%   certain line number, that line is skipped. Line number counting starts
%   with 1 (one) after the header lines. When using this option, the number
%   of header lines should be passed to txt2mat, too. See EXAMPLE 3c.
%
% · The 'ReplaceStr' argument works similarly to the 'ReplaceChar'
%   argument, but replaces character sequences instead of single
%   characters. A cell array containing at least one cell array of strings
%   must be provided. Such a cell array of strings consists of <n> strings;
%   each of the first <n-1> strings is replaced by the <n>-th string. For
%   example, with {{'R1a','R1b','S1'}, {'R2a','R2b','R2c','S2'}}
%   all the 'R<n>'-strings are replaced by the corresponding 'S<n>' string.
%   In general, replacing whole strings takes more time than 'ReplaceChar',
%   especially if the strings differ in size.
%   String replacements are performed before character replacements.
%
% · With the help of the 'ReplaceRegExpr' argument, regular expressions can
%   be replaced. The usage is analogous to 'ReplaceStr'. Regular expression
%   replacements are carried out before any other replacement (see 
%   EXAMPLE 4 and EXAMPLE 5).
%
% · 'NumericType' is one of 'int8', 'int16', 'int32', 'int64', 'uint8',
%   'uint16', 'uint32', 'uint64', 'single', or 'double' (default),
%   determining the numeric class of the output matrix A. If the numeric
%   class does not support NaNs, missing elements are padded with zeros
%   instead. Reduce memory  consumption by choosing an appropriate numeric
%   class, if needed. 
% 
% · The 'RowRange' value is a sorted positive integer two element vector
%   defining an interval of data rows to be converted (header lines do not
%   count, but lines that will be recognized as invalid - see above - do). 
%   If the vector's second element exceeds the number of valid data rows in
%   the file, the data is extracted up to the end of the file. Inf is
%   allowed as second element. It may save memory and computation time if
%   only a small part of data has to be extracted from a huge text file. 
% 
% · The 'FilePos' value <fp> is a nonnegative integer scalar. <fp>
%   characters from the beginning of the file will be ignored, i.e. not be
%   read. If you run TXT2MAT with a 'RowRange' argument, you may
%   use the <fpos> output as a 'FilePos' input during the next run in
%   order to continue from where you stopped. By that you can split up the
%   conversion process e.g. when the file is too big to be read as a whole
%   (see EXAMPLE 6). 
% 
% · 'ReadMode':
%   'matrix'  Read and convert sections of multiple lines simultaneously, 
%             requiring each line to contain the same number of values.
%             Finding an improper number of values in such a section will
%             cause an error (see also remark 2).
%   'line'    Read and convert text line by line, allowing different
%             numbers of values per line (slower than 'matrix' mode).
%   'auto'    Try 'matrix' first, continue with 'line' if an error occurs
%             (default). 
%   'block'   Read and convert sections of multiple lines simultaneously
%             and fill up the data matrix regardless of how many values
%             occur in each text line (EXAMPLE 7). Only a warning is issued
%             if a section's number of values is not a multiple of the
%             number of columns of the output data matrix. This is the
%             fastest mode involving numeric conversion.
%   'char'    Do not convert into a numeric array, but return char vector
%             of the contents including omission of the header lines,
%             replacements, line filtering, and file position and row range
%             selection. Useful for reading and manipulating text files
%             with non-numeric contents, see EXAMPLE 3b.
%             With read mode 'char' the file format analysis is disabled.
%   'cell'    Same as 'char', but put each line of text into a separate
%             cell of the output, see EXAMPLE 3b.
%
% · The 'DialogString' argument provides the text shown in the title bar of
%   the file selection dialogue that may appear.
%
% · The 'InfoLevel' argument controls the verbosity of TXT2MAT's outputs in
%   the command window and the message boxes. Currently known values are: 
%   0, 1, 2 (default)
%
% · The 'MemPar' argument provides the minimum amount of characters TXT2MAT
%   will process simultaneously as an internal text section (= a set of
%   text lines). It must be a positive integer. The value does not affect
%   the outputs, but computation time and memory usage. The roughly
%   optimized default is 65536; usually there is no need to change it. 
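%
% As a rough illustration of the 'SelectLineFun' and 'ReplaceStr' options
% described above (plain MATLAB, not TXT2MAT's internal code; the variable
% names are hypothetical): an element-wise Boolean function maps a vector
% of line numbers to a logical mask of the lines to keep, and a string
% replacement set maps its first n-1 strings to the n-th one.
%
%   selFun  = @(lineNo) rem(lineNo,2) == 0;   % keep even data lines
%   lineNos = 1:6;                            % counted after the header
%   keep    = selFun(lineNos);                % [0 1 0 1 0 1]
%   lineNos(keep)                             % 2 4 6
%
%   rs  = {'#NaN','#Inf','NaN'};              % first n-1 strings -> n-th
%   txt = '1 #NaN 3 #Inf';
%   for k = 1:numel(rs)-1
%       txt = strrep(txt, rs{k}, rs{end});
%   end
%   % txt is now '1 NaN 3 NaN'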
%
% -------------------------------------------------------------------------
%
% REMARKS
%
% 1) prerequisites for the automatic file format analysis before the
%    numeric conversion (if the number of header lines and data columns is
%    not given): 
%    · header lines can be detected by either non-numeric characters or
%      a strongly deviating number of numeric items in relation to the
%      data section (<10%)
%    · tab, space, slash, comma, colon, and semicolon are accepted as
%      delimiters (e.g. "10/11/2006 08:30 1; 3.3; 0.52" is ok)
%    · after the optional line filtering and user supplied replacements
%      have been carried out, the data section must contain the delimiters
%      and the decimal numbers only (point or comma are accepted as decimal
%      character).
%    Note I: if you do not trigger the internal file format analysis, i.e.
%    you do provide both the number of header lines and the number of data
%    columns, you also have to take care of a possible decimal _comma_ and
%    of non-whitespace delimiters. Such a comma can be replaced with a '.',
%    and the delimiters can either be included in a suitable format string
%    or be replaced with whitespaces (see e.g. the 'ReplaceChar' argument).
%    Note II: if only the number of header lines is given, any character
%    except '+-1234567890aAeEfFiInN.,' (signs, decimals, NaN, Inf, dot, and
%    comma) that is found during file analysis is regarded as a possible
%    separator and therefore replaced by ' ' (space).
%
% 2) In matrix mode, txt2mat checks that the format string is suitable
%    and that the number of values read from a section of the file is the
%    product of the number of text lines and the number of columns. This
%    may be true even if the number of values per line is not uniform and
%    txt2mat may be misled. So using matrix mode you should be sure that
%    all lines that can't be sorted out by a bad line marker string contain
%    the same number of values.
%
% 3) Since txt2mat.m is a comparatively large file, generating a preparsed
%    file txt2mat.p once will speed up the first call during a matlab
%    session. Set the current directory to where you saved txt2mat.m and
%    type
%    >> pcode txt2mat
%    For further information, see the 'pcode' documentation.
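%
% Remark 2 can be illustrated with a small plain-MATLAB sketch (an
% illustration under simplified assumptions, not TXT2MAT's actual check):
% three lines with 3, 2 and 4 values still yield 3*3 = 9 values in total,
% so a test on the total count alone cannot detect the irregular layout.
%
%   txt  = sprintf('1 2 3\n4 5\n6 7 8 9\n');
%   vals = sscanf(txt, '%f');                 % 9 values
%   numel(vals) == 3*3                        % true, although lines differ
%   reshape(vals, 3, []).'                    % rows no longer match lines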
%
% =========================================================================
% EXAMPLE 1:        basic usage
% -------------------------------------------------------------------------
%
% A = txt2mat;      % choose a file and let TXT2MAT analyse its format
%                 
% =========================================================================
% EXAMPLE 2:        automatic file format analysis
% -------------------------------------------------------------------------
%
% Suppose your ascii file C:\mydata.log contains the following lines 
% »
% 10 11 2006 08 35.225 1  3.3  0.52
% 31 05 2008 12 12     0  0.0  0.00
%  7 01 2010 15 23.5  -1  3.3  0.535
% «
% type
%
% A = txt2mat('C:\mydata.log',0,8);
%
% or just
%
% A = txt2mat('C:\mydata.log');
%
% In the second call, TXT2MAT uses its automatic file layout detection, as
% the numbers of header lines and columns are not given. With the file
% looking like this:
% » 
% some example data
% plus another header line
% 10/11/2006 08:35,225 1; 3,3; 0,52
% 31/05/2008 12:12     0; 0,0; 0,00
% 7/01/2010  15:23,5  -1; 3,3; 0,535
% «
% txt2mat('C:\mydata.log') returns the same output data matrix as above.
%
% =========================================================================
% EXAMPLE 3a:     	line filtering by 'bad' markers; replacements
% -------------------------------------------------------------------------
%
% »
% ;$FILEVERSION=1.1
% ;$STARTTIME=38546.6741619815
% ;---+--   ----+----  --+--  ----+---  +  -+ -- -- -- 
%      3)         7,2  Rx         0300  8  01 A3 58 4D 
%      4)         7,3  Rx         0310  8  06 6E 2B 9F 
%      5)         9,5  Warng  FFFFFFFF  4  00 00 00 08  BUSHEAVY 
%      6)        12,9  Rx         0320  8  02 E1 F6 EF 
% «
% 
% You may specify 
% nh   = 3              % header lines, 
% nc   = 12             % data columns,
% fmt  = '%f %f %x %x %x %x %x %x'  	% as format string for floats
%                                       % and hexadecimals,  
% sr1  = ')Rx '         % as first replacement string to blank the
%                       % characters ')','R', and 'x' (if you don't want to
%                       % include them in the format string), and
% sr2  = ',.'           % to replace the decimal comma with a dot, and
% sx1  = 'Warng'        % as a marker for invalid lines
%
% A = txt2mat('C:\mydata.log', nh, nc, fmt, {sr1,sr2}, {'Warng'});
%
%   A =
% 		3    7.2    768      8      1    163     88     77
% 		4    7.3    784      8      6    110     43    159
% 		6   12.9    800      8      2    225    246    239
% 		...
% 
% If you make use of the param/value-pairs, you can also write more clearly
%
% t2mOpts = {'NumHeaderLines', 3                         , ...
%            'NumColumns'    , 12                        , ...
%            'ReplaceChar'   , {')Rx ',',.'}             , ...
%            'Format'        , '%f %f %x %x %x %x %x %x' , ...
%            'BadLineString' , {'Warng'}                    };
%        
% A = txt2mat('C:\mydata.log', t2mOpts{:});
% 
% ... or you simply use an input struct
%
% t2mIns.NumHeaderLines  = 3;
% t2mIns.NumColumns      = 12;
% t2mIns.ReplaceChar     = {')Rx ',',.'};
% t2mIns.Format          = '%f %f %x %x %x %x %x %x';
% t2mIns.BadLineString   = {'Warng'};
%        
% A = txt2mat('C:\mydata.log', t2mIns);
% 
% Without the {'Warng'} argument, A would have been
%
% 		3    7.2    768      8      1    163     88     77
% 		4    7.3    784      8      6    110     43    159
% 		5    9.5    NaN    NaN    NaN    NaN    NaN    NaN
% 		6   12.9    800      8      2    225    246    239
% 		...
%
% =========================================================================
% EXAMPLE 3b:       line filtering by 'good' markers; return char or cell
% -------------------------------------------------------------------------
%
% »
% Some colours and numbers
% 1 yellow 1 0 0
% 2 green  7 8 10
% 3 red    0 0 0
% 4 green  8 8 9
% 5 green  9 7 7
% 6 yellow 0 2 1
% «
%
% If you only want the data from the lines containing the string 'green':
%
% t2mOpts = {'NumHeaderLines', 1                , ...
%            'NumColumns'    , 4                , ...
%            'Format'        , '%f %*s %f %f %f', ...
%            'GoodLineString', {'green'}           };
%        
% A = txt2mat('C:\mydata.log', t2mOpts{:});
%
%   A =
%       2     7     8    10
%       4     8     8     9
%       5     9     7     7
%
% If you want to obtain those lines as text, use read mode 'char':
%
% t2mOpts = {'NumHeaderLines', 1         , ...
%            'GoodLineString', {'green'} , ...
%            'ReadMode'      , 'char'       };
%
% [A,ffn,nh,SR,hl] = txt2mat('C:\mydata.log', t2mOpts{:});
%
%   A =
%       2 green  7 8 10
%       4 green  8 8 9
%       5 green  9 7 7
%
% whos A
%   Name      Size            Bytes  Class    Attributes
%   A         1x49               98  char
%
% Some examples of what you could do with the char vector A:
%
%     - write it to a new file:
% 
%     fid = fopen('C:\mynewdata.log','w');
%     fwrite(fid, hl);  % write header
%     fwrite(fid, A);   % write data
%     fclose(fid);
% 
%     »
%     Some colours and numbers
%     2 green  7 8 10
%     4 green  8 8 9
%     5 green  9 7 7
%     «
% 
%     - proceed with functions like textscan:
% 
%     C = textscan(A,'%f %s %f %f %f');
% 
%      C = {[2;4;5], {'green';'green';'green'}, [7;8;9], [8;8;7], [10;9;7]}
%
% To put each line into a separate cell of a cell array of strings, use the
% very similar read mode 'cell':
%
% t2mOpts = {'NumHeaderLines', 1         , ...
%            'GoodLineString', {'green'} , ...
%            'ReadMode'      , 'cell'       };
%
% [A,ffn,nh,SR,hl] = txt2mat('C:\mydata.log', t2mOpts{:});
%
%   A = 
%           '2 green  7 8 10'
%           '4 green  8 8 9'
%           '5 green  9 7 7'
%
% =========================================================================
% EXAMPLE 3c:       line filtering by line number
% -------------------------------------------------------------------------
%
% »
% line number and magic
%  1    30    39    48     1    10    19    28
%  2    38    47     7     9    18    27    29
%  3    46     6     8    17    26    35    37
%  4     5    14    16    25    34    36    45
%  5    13    15    24    33    42    44     4
%  6    21    23    32    41    43     3    12
%  7    22    31    40    49     2    11    20
% «
% 
% If you only want to read every 3rd line starting from line 4:
%
% N  = 3;
% n1 = 4;
% selFun  = @(L) rem(L,N)==rem(n1,N) & L>=n1;
%
% fn      = 'C:\mydata.txt';
% t2mOpts = {'NumHeaderLines', 1      , ...
%            'SelectLineFun' , selFun    };
%        
% [A,ffn,nh,SR,hl] = txt2mat(fn, t2mOpts{:});
%
% A =
%      4     5    14    16    25    34    36    45
%      7    22    31    40    49     2    11    20
%
%
% Reading every 2nd line from line 3 to 6:
%
% N  = 2;
% selFun  = @(L) rem(L,N)==1;
% 
% t2mOpts = {'NumHeaderLines', 1       , ...
%            'RowRange'      , [3,6]   , ...
%            'SelectLineFun' , selFun  };
%        
% [A,ffn,nh,SR,hl] = txt2mat(fn, t2mOpts{:});
%
% A =
%      3    46     6     8    17    26    35    37
%      5    13    15    24    33    42    44     4
%
% =========================================================================
% EXAMPLE 4:        regular expression replacements
% -------------------------------------------------------------------------
%
% Suppose your ascii file C:\mydata.log begins with the following lines:
% »
% datetime	%	ppm	%	ppm	Nm
% datetime	real8	real8	real8	
% 30.10.2006 14:24:06,131	6,4459	478,519	6,5343	
% 30.10.2006 14:24:17,400	6,4093	484,959	6,5343	
% 30.10.2006 14:24:17,499	6,4093	484,959	6,5343	
% «
% you might specify 
% nh   = 2          % header lines, 
% nc   = 9          % data columns,
% fmt  = ['%d.%d.%d' repmat('%f',1,6)] % as format string for
%                                      % integers and floats,  
% sr1  = ': '       % as first replacement string to blank the ':', and
% sr2  = ',.'       % to replace the decimal comma with a dot
%
% A = txt2mat('C:\mydata.log', nh, nc, fmt, {sr1,sr2});
%
%   A =
% 		30  10  2006  14  24   6.131  6.4459  478.519  6.5343
% 		30  10  2006  14  24  17.4    6.4093  484.959  6.5343
% 		30  10  2006  14  24  17.499  6.4093  484.959  6.5343
%       ...
% 
% 
% A = txt2mat('C:\mydata.log','ReplaceRegExpr',{{'\.(\d+)\.',' $1 '}});
%
% yields the same result, but uses the built-in file layout analysis to
% determine the number of header lines, the number of columns, the
% delimiters, and the decimal character. You only help TXT2MAT by
% telling it to replace dots surrounding the month number with spaces via
% the regular expression replacement. So you can use the latter command on
% similar files which have a different or previously unknown number of
% header lines etc., too. 
%
% =========================================================================
% EXAMPLE 5:        regular expression replacements
% -------------------------------------------------------------------------
%
% If the data table of your file contains some gaps that can be identified
% by some repeated delimiters (here ;)
% »
% ; 02; 03; 04; 05;
% 11; ; 13; 14; 15;
% 21; ; 23; ;;
% ; 32; 33; 34; 35;
% «
% you can fill them with NaNs with the help of 'ReplaceRegExpr':
%
% A = txt2mat('C:\mydata.log','ReplaceRegExpr',...
%                       {{'((?<=;\s*);)|(^\s*;)','NaN;'}});
%
%   A =
%        NaN     2     3     4     5
%         11   NaN    13    14    15
%         21   NaN    23   NaN   NaN
%        NaN    32    33    34    35
%    
% =========================================================================
% EXAMPLE 6:        processing a file in chunks
% -------------------------------------------------------------------------
% 
% If you want to process the contents of mydata.log step by step,
% converting one million lines at a time:
%
% fp  = 0;          % file position to start with (beginning of file)
% A   = NaN;        % initialize output matrix
% nhl = 12;         % number of header lines for the first call
% 
% while numel(A)>0
%     [A,ffn,nh,SR,hl,fp] = txt2mat('C:\mydata.log','RowRange',[1,1e6], ...
%                                   'FilePos',fp,'NumHeaderLines',nhl);
%     nhl = 0;      % there are no further header lines
%
%     % process intermediate results...
% end
% 
% =========================================================================
% EXAMPLE 7:        read mode 'block' and 'line'
% -------------------------------------------------------------------------
% 
% You can use the read mode 'block' on very large files with a constant
% number of values per line to save some import time compared to the
% 'matrix' mode. Besides, since TXT2MAT then does not check for line breaks
% within the (internally processed) sections of a file, you can use the
% block mode to fill up any output matrix with a fixed number of columns.
% »
%  1  2  3  4  5
%  6  7  8  9 10
%    
% 11 12 13 14 15
% 16 17 18 19 20
% 21 22
% 23 24 25
% 26 27 28 29 30
%
% «
% 
% A = txt2mat('C:\mydata.txt',0,5,'ReadMode','block')
% 
% A =
%      1     2     3     4     5
%      6     7     8     9    10
%     11    12    13    14    15
%     16    17    18    19    20
%     21    22    23    24    25
%     26    27    28    29    30
%
%
% Instead, if you want to preserve the line break information, use the
% (slower) read mode 'line': 
%
% A = txt2mat('C:\mydata.txt',0,5,'ReadMode','line')
%
% or
%
% A = txt2mat('C:\mydata.txt',0,-1)
%
% A =
%      1     2     3     4     5
%      6     7     8     9    10
%    NaN   NaN   NaN   NaN   NaN
%     11    12    13    14    15
%     16    17    18    19    20
%     21    22   NaN   NaN   NaN
%     23    24    25   NaN   NaN
%     26    27    28    29    30
%
% The first command reads up to 5 elements per line, starting from the
% first, and puts them to a Nx5 matrix, whereas the second one
% automatically expands the column size of the output to fit in the maximum
% number of elements occurring in a line. This is effected by the negative
% column number argument that also implies read mode 'line' here.
%  
% =========================================================================
%
%   See also SSCANF


% --- Author: -------------------------------------------------------------
%   Copyright 2005-2014 Andres
%   $Revision: 6.60.0 $  $Date: 2014/03/23 21:52:03 $
% --- E-Mail: -------------------------------------------------------------
% x=-2:3;
% disp(char(round([polyval([-0.32,0.43,1.75,-5.90,-0.95,116],x),...
%                  polyval([-4.44,9.12,29.8,-33.6,-52.9, 98],x)])))
% you may also contact me via the author page
% http://www.mathworks.com/matlabcentral/fileexchange/authors/30255
% --- History -------------------------------------------------------------
% 05.61
%   · fixed bug: possible wrong headerlines output when using 'FilePos'
%   · fixed bug: produced an error if a bad line marker string was already
%     found in the first data line 
%   · corrected user information if sscanf fails in matrix mode
%   · added some more help lines
% 05.62
%   · allow negative NumColumns argument to capture a priori unknown
%     numbers of values per line
% 05.82 beta
%   · support regular expression replacements ('ReplaceRegExpr' argument)
%   · consider user supplied replacements when analysing the file layout
% 05.86 beta
%   · some code clean-up (argincheck subfunction, ...)
% 05.86.1
%   · fixed bug: possibly wrong numeric matlab version number detection
% 05.90
%   · consider skippable lines when analysing the file layout
%   · code rearrangements (subfun for line termination detection, ...)
% 05.96
%   · subfuns to find line breaks / bad-line pos and to initialize output A
%   · better handling of errors and 'degenerate' files, e.g. exit without
%     an error if the file selection dialogue was cancelled 
% 05.97
%   · fixed bug: error in file analysis if first line contains bad line
%     marker
%   · fixed bug: a bad line marker is ignored if the string is split up by
%     two consecutive internal sections
%   · better code readability in findLineBreaks subfunction
% 05.97.1
%   · simplifications by skipping the header when reading from the file;
%     the header is now read separately and is not affected by any
%     replacements
%   · corrected handling of bad line markers that already appear in header
% 05.98
%   · corrected search for long bad line marker strings that could exceed
%     text dimensions
%   · speed-up by improved finding of line break positions
% 06.00
%   · introduction of 'high speed' read mode "block" requiring less line
%     break information
%   · 'MemPar' buffer value changed to scalar
%   · reduced memory demand by translating smaller text portions to char
%   · modified help
% 06.01
%   · fixed bug: possible error message in file analysis when only header
%     line number is given
% 06.04
%   · better handling of replacement strings containing line breaks
%   · allow '*' in file name to use file name as open file dialogue filter
% 06.12
%   · 'good line' filter as requested by Val Schmidt
% 06.17.1
%   · enable 'good line' filtering during automatic file analysis
%   · new read modes 'char' and 'cell' to provide txt2mat's preprocessing
%     features esp. for non-numeric data, too
% 06.17.3
%   · version number workaround for MCR execution (Leonard's remark)
% 06.40
%   · input argument check by inputparser (R2007a), allowing input struct
%   · minor changes in code and documentation
% 06.60
%   · added option to select lines by line number ('SelectLineFun'), e.g. 
%     to skip every n-th line as suggested by Kaare 
%   · reduced memory footprint and improved speed during good/bad line
%     filtering by working in chunks
%
% --- Wish list -----------------------------------------------------------


%% Get input arguments

% check the arguments by argincheck:
arg = argincheck(varargin);
% returns
%       arg.val.(argname)  ->  value of the input
%       arg.has.(argname)  ->  T/F argument was given
%       arg.num.(argname)  ->  number of values for some non-scalar inputs

% some abbreviations
ffn       = arg.val.FileName;
numHeader = arg.val.NumHeaderLines;
numColon  = arg.val.NumColumns;
readMode  = arg.val.ReadMode;
formatStr = arg.val.Format;
repChar   = arg.val.ReplaceChar;
filePos   = arg.val.FilePos;
memPar    = arg.val.MemPar;
numRC     = arg.num.ReplaceChar;
numRS     = arg.num.ReplaceStr;
numRR     = arg.num.ReplaceRegExpr;
numBL     = arg.num.BadLineString;
numGL     = arg.num.GoodLineString;

% ~~~~~ special handling of file name argument arg.val.FileName ~~~~~~~~~~~
% 1) no file or path name is given -> open file dialogue
if ~arg.has.FileName || isempty(ffn)
    [fn,pn] = uigetfile('*.*', arg.val.DialogString);
    ffn = fullfile(pn,fn);
% 2) a path name is given -> open file dialogue with *.* filter spec
elseif exist(ffn,'dir') == 7
    curcd = cd;
    cd(ffn);                   
    [fn,pn] = uigetfile('*.*', arg.val.DialogString);
    ffn = fullfile(pn,fn);
    cd(curcd);
% 3) a valid file name is given -> take it as it is
elseif exist(ffn,'file') 
    [~,fname,ext] = fileparts(ffn);   % the path part is not needed here
    fn = [fname,ext];
% 4) an asterisk in the file name -> open file dialogue, use filter spec
%    - OR -
%    nonexisting file -> produce error message and return
else
    [pathstr, fname, ext] = fileparts(ffn);
    doOpenDialog = (isempty(pathstr) || exist(pathstr,'dir')==7) && ...
                   numel(strfind([fname, ext], '*')) > 0;
               
    if doOpenDialog
        if ~isempty(pathstr)
            curcd = cd;
            cd(pathstr)
        end
        
        [fn,pn] = uigetfile({[fname, ext];'*.*'}, arg.val.DialogString);
        ffn = fullfile(pn,fn);
        
        if ~isempty(pathstr)
            cd(curcd);
        end
    else
        % wrong name
        error('txt2mat:invalidFileName','no such file or directory'); 
    end
end

% recheck file name (necessary e.g. after ESC in open file dialogue)
if exist(ffn,'file')~=2
    [A,ffn,numHeader,repChar,hl,fpos] = deal([]);
    if arg.val.InfoLevel>=1
        disp('Exiting txt2mat: No existing file given.')
    end
    return
end

% generate a shortened form of the file name:
if length(fn) < 28
    fnShort = fn;
else
    fnShort = ['...' fn(end-24:end)];
end

arg.val.FileName = ffn;
% ~~~~~ special handling of file name argument arg.val.FileName ~~~~~~~end~

clear varargin

%% Analyze data format

% try some automatic data format analysis if needed (by function anatxt)
doAnalyzeFile = ~all([arg.has.NumHeaderLines, arg.has.NumColumns]); %, is_argin_conv_str]); % commented out as so far anatxt's formatStr is only '%f'
% switch off file analysis if read mode is 'char' or 'cell'
doAnalyzeFile = doAnalyzeFile && ~strcmpi(arg.val.ReadMode,'char') && ~strcmpi(arg.val.ReadMode,'cell');

if doAnalyzeFile 
    % call subfunction anatxt:
    [anaNumHeader, anaNumColon, ~, anaRepChar, anaReadMode, ...
        anaNumAnalyzed, anaHeader, anaFileErr, anaErr] = anatxt(arg); 
    % quit if errors occurred
    if ~isempty(anaErr)
        [A,repChar,fpos,hl] = deal([]);
        numHeader = anaNumHeader;
        if arg.val.InfoLevel>=1
            disp(['Exiting txt2mat: file analysis: ' anaErr])
        end
        return
    end
        
    % accept required results from anatxt:
    if ~arg.has.NumHeaderLines
        numHeader = anaNumHeader;
    end
    if ~arg.has.NumColumns
        numColon = anaNumColon;
    end
    %if ~arg.has.Format      % unused
    %    formatStr = anaFormat;
    %end
    if ~arg.has.ReadMode
        readMode = anaReadMode;
    end
    % add new replacement character strings from anatxt:
    isNewRC	= ~ismember(anaRepChar, repChar);
    numRC   = numRC + sum(isNewRC);
    repChar = [repChar,anaRepChar(isNewRC)];
    % display information:
    if arg.val.InfoLevel >= 1
        disp(repmat('*',1,length(ffn)+2));
        disp(['* ' ffn]);
        if numel(anaFileErr)==0
            sr_display_str = '';
            for idx = 1:numRC
                sr_display_str = [sr_display_str ' »' repChar{idx} '«']; %#ok<AGROW>
            end
            disp(['* read mode: ' readMode]);
            disp(['* ' num2str(anaNumAnalyzed)        ' data lines analysed' ]);
            disp(['* ' num2str(numHeader)     ' header line(s)']);
            disp(['* ' num2str(abs(numColon)) ' data column(s)']);
            disp(['* ' num2str(numRC)         ' string replacement(s)' sr_display_str]);
        else
            disp(['* fread error: ' anaFileErr '.']);
        end
        disp(repmat('*',1,length(ffn)+2));
    end % if
    
    % return if anatxt did not detect valid data
    if anaNumColon==0
        A = [];
        hl = '';
        fpos = filePos;
        return
    end
end


%% Detect line termination character

if arg.val.InfoLevel >= 1
    hw = waitbar(0,'detect line termination character ...');
    set(hw,'Name',[mfilename ' - ' fnShort]);
    hasWaitbar = true;
else
    hasWaitbar = false;
end

lbfull = detectLineBreakCharacters(ffn);
%   lbfull  line break character(s) as uint8, i.e.
%           [13 10]     (cr+lf) for standard DOS / Windows files
%           [10]        (lf) for Unix files
%           [13]        (cr) for classic Mac OS files
% The DOS style values are returned as defaults if no such line breaks are
% found.

lbuint = lbfull(end);      
lbchar = char(lbuint);
numLbfull = numel(lbfull);     
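% Illustrative example (a sketch added for clarity, not part of the original
% logic): for a DOS / Windows file the variables above take these values:
%   lbfull    = uint8([13 10]);   % cr+lf
%   lbuint    = lbfull(end);      % 10, the final byte of the line break
%   lbchar    = char(lbuint);     % the line feed character
%   numLbfull = numel(lbfull);    % 2
% Line ends are located by searching for lbuint only; numLbfull is used
% further below where the complete line break has to be cut off a line.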

%% Open file and set position indicator to end of header
% ... and extract header separately if not already done

logfid = fopen(ffn);
if numHeader > 0
    if doAnalyzeFile % header lines have already been extracted
        hl = anaHeader;
        lenHeader = numel(hl);
        fseek(logfid,filePos+lenHeader,'bof');
    else
        if arg.has.FilePos
            fseek(logfid,filePos,'bof');
        end

        %*% todo: use function getLines here
        read_len = 65536;   % (quite small) size of text sections just for header line extraction
        do_read  = true;
        num_lb_curr = 0;
        countLoop = 0;
        while do_read
            [f8p,lenf8p]    = fread(logfid,read_len,'*uint8');	% current text section

            ldcp_curr       = find(f8p==lbuint);                % line break positions in current text section
            num_lb_curr     = num_lb_curr + numel(ldcp_curr);   % number of line breaks so far
            
            do_read         = (lenf8p == read_len) && (num_lb_curr < numHeader);
            countLoop       = countLoop + 1;
        end
        
        if num_lb_curr >= numHeader
            lenHeader = ldcp_curr(end-(num_lb_curr-numHeader)) + (countLoop-1)*read_len;
            if countLoop == 1
                % take the complete header from the first section
                hl = char(f8p(1:lenHeader)).';
                fseek(logfid,filePos+lenHeader,'bof');
            else
                % the header did not fit into a single section, so re-read
                % it as a whole
                fseek(logfid,filePos,'bof');
                hl = char(fread(logfid,lenHeader).');
            end
        else 
            % exit here as we have found fewer line breaks than the given
            % number of header lines!
            fseek(logfid,filePos,'bof');
            hl = char(fread(logfid).');
            fpos = ftell(logfid);
            fclose(logfid);
         	[A,repChar] = deal([]);
            if arg.val.InfoLevel>=1
                disp(['Exiting txt2mat: '  num2str(numHeader) ' header lines expected, but only ' num2str(num_lb_curr) ' line breaks found.'])
                close(hw)
            end
          	return
            
        end
    end
else
    lenHeader = 0;
    hl = '';
    if arg.has.FilePos
     	fseek(logfid,filePos,'bof');
    end
end

%% Read in ASCII file - case 1: portions only, as RowRange is given.
% RowRange should be given if the file is too huge to be read at once by
% fread. In this case multiple freads are used to read in consecutive
% sections of the text. By counting the line breaks those rows of the text
% that match the RowRange argument are added to the 'core' variable f8 that
% is later used for the numeric conversion.

% By definition, a line begins with its first character and ends with its
% last termination character.
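%
% Illustrative example (a sketch added for clarity; the values are made up):
% suppose the first text section read by fread is
%   f8p = uint8(['a' 10 'bb' 10 'ccc' 10 'dd' 10]).';
% with RowRange = [2 3] and num_lb_prev = 0. The loop below then computes
%   ldcp_curr         = find(f8p==10);      % [2; 5; 9; 12]
%   num_lines_to_omit = 2-1-0;              % 1
%   sdx               = ldcp_curr(1)+1;     % 3
%   num_lines_to_add  = 3-0;                % 3
%   edx               = ldcp_curr(3);       % 9
% so that char(f8p(sdx:edx)).' is ['bb' 10 'ccc' 10], i.e. rows 2 and 3
% including their line breaks.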

if hasWaitbar
    waitbar(0.01,hw,'reading file ...');
end

if arg.has.RowRange
    do_read             = true;     % loop condition
    num_lb_prev         = 0;
    read_len            = memPar;
    f8                  = [];
    while do_read
        [f8p,lenf8p]  = fread(logfid,read_len,'*uint8');  	% current text section

        ldcp_curr       = find(f8p==lbuint);                % line break positions in current text section
        num_lb_curr     = numel(ldcp_curr);

        % add lines of interest to f8
        if (arg.val.RowRange(1) <= num_lb_prev+num_lb_curr+1) && (num_lb_prev < arg.val.RowRange(2))

            if arg.val.RowRange(1) <= num_lb_prev + 1	% lines of interest started before current section
                sdx = 1;                                        % start index is beginning of section => the part of the section to be added to f8 includes the start of the section 
            else                                                % lines of interest start within current section
                num_lines_to_omit = arg.val.RowRange(1)-1-num_lb_prev;  % how many lines not to add
                sdx = ldcp_curr(num_lines_to_omit)+1;         	% start right after the omitted lines
            end

            if arg.val.RowRange(2) > num_lb_curr+num_lb_prev    % lines of interest end beyond current section
                edx = lenf8p;                                   % end index is length of section => the part of the section to be added to f8 includes the end of the section 
            else                                                % lines of interest end within current section
                num_lines_to_add = arg.val.RowRange(2)-num_lb_prev;     % how many lines to add
                edx = ldcp_curr(num_lines_to_add);             	% corresponding end index
            end

            f8 = [f8; f8p(sdx:edx)]; %#ok<AGROW>
            fpos = ftell(logfid)-lenf8p+edx;       % position of the latest added character 
        end

        % quit loop if all rows of interest are read or if end of file is reached 
        if num_lb_prev >= arg.val.RowRange(2) || lenf8p<read_len
            do_read = false;
        end
        num_lb_prev          = num_lb_prev + num_lb_curr;  	% absolute number of detected line breaks
    end
    
end
%% Read in ASCII file - case 2: full file. Then close file.

if ~arg.has.RowRange
    [f8,fcount]  = fread(logfid,Inf,'*uint8');
    fpos = fcount + filePos + lenHeader;
end

if ftell(logfid) == -1
    error(ferror(logfid, 'clear'));   % use the handle opened above ('fid' is undefined here)
end

fclose(logfid); 

if numel(f8)==0
    A = [];
    if arg.val.InfoLevel>=1
        disp('Exiting txt2mat: no numeric data found.')
        close(hw)
    end
    return
end


%% Clean up whitespaces at the end of file

f8 = cleanUpFinalWhitespace(f8,lbfull);


%% check line break position awareness

hasReplacements = any([numRC,numRS,numRR] > 0 );

% as finding the line breaks is time-critical, "LbAwareness" is
% introduced to tell us what we know about line break positions:
% 0: nothing
% 1: the positions of the final line break in every section
% 2: the above + the number of lines up to each of those line breaks
% 3: all line break positions

% determine the minimum required LbAwareness, and set a waitbar progress
% factor >1 if there's no sscanf read: 
wbFactor = 1;
switch lower(readMode)
    case 'char'
        minLbAwareness = double(hasReplacements);
        wbFactor = 2;
    case 'block'
        minLbAwareness = 1;
    case {'matrix','auto'}
        minLbAwareness = 2;
    case 'line'
        minLbAwareness = 3;
    case 'cell'
        minLbAwareness = 3;
        wbFactor = 1.9;
end


%% filter lines (rows)

% select lines by line number and 'bad' and/or 'good' marker strings

if arg.has.SelectLineFun || (numBL + numGL > 0)
    if hasWaitbar
        waitbar(wbFactor*0.10,hw,'filtering lines ...');
    end
    [f8, idcLb, cntLb, secLbIdc] = filterLines(f8, lbuint, memPar, arg);
    LbAwareness = 3;
else
    LbAwareness = 0;
end


%% Find line break positions if necessary

if LbAwareness < minLbAwareness
    
    if hasWaitbar
        waitbar(wbFactor*0.20,hw,'updating line break positions ...');
    end
    
    % Find out if we have to expect text length changes due to the
    % replacements
    doExpectLengthChange = false;   % default
    if numRR > 0
        % always expect changes by regular expressions
        doExpectLengthChange = true;
    else
        % check for string replacements that will change the length
        for edx = 1:numRS
            if any(diff(cellfun('length', arg.val.ReplaceStr{edx})))
                doExpectLengthChange = true;
                break
            end
        end
    end
    
    if doExpectLengthChange || strcmpi(readMode,'block')
        % - make K1
        doFindAll = false;
        doCount   = false;
        LbAwareness = 1;
    else
        if strcmpi(readMode,'line') || strcmpi(readMode,'cell')
        	% - make K3
            doFindAll = true;
            doCount   = true;
            LbAwareness = 3;
        else  % readmode is 'auto' or 'matrix'
            % - make K2
            doFindAll = false;
            doCount   = true;
            LbAwareness = 2;
        end
    end

    [idcLb,cntLb,secLbIdc] = findLineBreaks(f8, lbuint, memPar, doFindAll, doCount);
end

%% Replace (regular) expressions and characters

doReplaceLb = false;   % default, to be checked below

if numRR > 0
    has_length_changed = true;
else
    has_length_changed = false; % flag for changes of length of f8 by replacements
end

if hasReplacements
    if hasWaitbar
        waitbar(wbFactor*0.20,hw,'replacing strings ...');
    end

    numSectionLb = numel(secLbIdc);

    % If a ReplaceStr begins with a line break character, such a character
    % will temporarily be prepended to each replacement section to apply
    % the replacement to the _first_ line of a section, too.
    % Besides, check for any occurrence of the break character in the
    % ReplaceStr in order to preventively trigger an update of the line
    % break positions afterwards.
    % Set defaults before checking:
    doPrependLb = false;   
    numPrepend  = 0;       
    if numRS>0
        % put all the characters from the ReplaceStr strings into an
        % uint8-array:
        uint8Replace = uint8(char([arg.val.ReplaceStr{:}]));
        % check if any row starts with a line break:
        if any(uint8Replace(:,1)==lbuint)
            doPrependLb = true;
            numPrepend  = 1;
        end
        if any(uint8Replace(:)==lbuint)
            doReplaceLb = true;
        end
    end
    
    for sdx = 2:numSectionLb
        
        if doPrependLb
            f8_akt = char([lbuint, f8(idcLb(secLbIdc(sdx-1))+1 : idcLb(secLbIdc(sdx))).']);
        else
            f8_akt = char(f8(idcLb(secLbIdc(sdx-1))+1 : idcLb(secLbIdc(sdx))).');
        end
        
        if numRS > 0 || numRR > 0
            len_f8_akt = idcLb(secLbIdc(sdx)) - idcLb(secLbIdc(sdx-1));  % length of current section before replacements

            % Replacements, e.g. {'odd','one','1'} replaces 'odd' and 'one' by '1'
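            % Illustrative example (a sketch added for clarity) of what the
            % string replacement loop below does with that argument:
            %   srarg  = {'odd','one','1'};
            %   f8_akt = 'odd one out';
            %   for xdx = 1:(numel(srarg)-1)
            %       f8_akt = strrep(f8_akt, srarg{xdx}, srarg{end});
            %   end
            %   % f8_akt is now '1 1 out', i.e. its length changed from 11
            %   % to 7 characters, so the line break positions must be
            %   % updated afterwards.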

            % Regular Expression Replacements: ============================
            for vdx = 1:numRR                  % step through replacements arguments
                srarg = arg.val.ReplaceRegExpr{vdx};    	% pick a single argument...

                for xdx = 1:(numel(srarg)-1)
                    f8_akt = regexprep(f8_akt, srarg{xdx}, srarg{end});     % ... and perform replacements
                end % for

            end % for

            % Expression Replacements: ====================================
            for vdx = 1:numRS                  % step through replacements arguments
                srarg = arg.val.ReplaceStr{vdx};    	% pick a single argument...

                for xdx = 1:(numel(srarg)-1)
                    f8_akt = strrep(f8_akt, srarg{xdx}, srarg{end});        % ... and perform replacements
                    if ~has_length_changed && (len_f8_akt~=numel(f8_akt))
                        has_length_changed = true;                          % detect a change of length of f8
                    end
                end % for

            end % for

            % update f8-sections by f8_akt ================================
            exten = numel(f8_akt) - len_f8_akt;	% extension by replacements
            
            if exten == 0   
                if doPrependLb
                    f8( idcLb(secLbIdc(sdx-1))+1 : idcLb(secLbIdc(sdx)) ) = uint8(f8_akt(1+numPrepend:end)).';
                else
                    f8( idcLb(secLbIdc(sdx-1))+1 : idcLb(secLbIdc(sdx)) ) = uint8(f8_akt).';
                end
            else   
                if doPrependLb
                    f8 = [f8(1:idcLb(secLbIdc(sdx-1))); uint8(f8_akt(1+numPrepend:end)).'; f8(idcLb(secLbIdc(sdx))+1:end)];
                else
                    f8 = [f8(1:idcLb(secLbIdc(sdx-1))); uint8(f8_akt).'                  ; f8(idcLb(secLbIdc(sdx))+1:end)];
                end
                % update linebreak indices of the following sections
                % (but we don't know the lb indices of the current one anymore):
                idcLb(secLbIdc(sdx:end)) = idcLb(secLbIdc(sdx:end)) + exten;
            end
            
        end % if numRS > 0 || numRR > 0
        
        % Character Replacements: =========================================
        for vdx = 1:numRC                  % step through replacement arguments
            srarg = repChar{vdx};       % pick a single argument
            for xdx = 1:(numel(srarg)-1)
                rep_idx = idcLb(secLbIdc(sdx-1))+strfind(f8_akt,srarg(xdx))-numPrepend;
                f8(rep_idx) = uint8(srarg(end));   % perform replacement
            end % for
        end
        
        if hasWaitbar && ~mod(sdx,256)
            waitbar(wbFactor*(0.20+0.25*((sdx-1)/(numSectionLb-1))),hw)
        end
        
    end

    clear f8_akt
end % if


%% ReadMode 'char': exit here with char array

if strcmpi(readMode,'char')
    A=char(f8.');
    if arg.val.InfoLevel>=1
        close(hw)
    end
    return
end


%% Update linebreak indices
% see above...

% if the final line break might have changed, clean up trailing whitespaces
% here again
if doReplaceLb || numRR > 0
    f8 = cleanUpFinalWhitespace(f8,lbfull);
end

if has_length_changed || (LbAwareness < minLbAwareness) || doReplaceLb
    if hasWaitbar
        waitbar(0.45,hw,'updating line break positions ...');
    end

    if strcmpi(readMode,'block')
        % - make K1
        doFindAll = false;
        doCount   = false;
        LbAwareness = 1;
    elseif strcmpi(readMode,'line') || strcmpi(readMode,'cell')
        % - make K3
        doFindAll = true;
        doCount   = true;
        LbAwareness = 3;
    else  % readmode is 'auto' or 'matrix'
        % - make K2
        doFindAll = false;
        doCount   = true;
        LbAwareness = 2;
    end

    [idcLb,cntLb,secLbIdc] = findLineBreaks(f8, lbuint, memPar, doFindAll, doCount);
end

% Determine the total number of line breaks (including the leading 'zero'
% line break and the possibly added final line break) depending on
% LbAwareness. If LbAwareness is less than 2, we can't know that number.
if LbAwareness == 2
    num_lf = cntLb(end)+1;
elseif LbAwareness == 3
    num_lf = numel(idcLb);
else
    num_lf = NaN;
end

%% ReadMode 'cell': return lines in a cell array

if strcmpi(readMode,'cell')
    
    f8 = char(f8).';
    
    % A = arrayfun(@(m,n) {f8(m:n)}, ...
    %                         idcLb(1:end-1)+1, idcLb(2:end)-numLbfull);
    % but arrayfun is slower here, so use V6 code (for loop) only:
    
    A = repmat({''},num_lf-1,1);
    for m = 1:num_lf-1
        A{m} = f8(idcLb(m)+1:idcLb(m+1)-numLbfull);
    end

    if arg.val.InfoLevel>=1
        close(hw)
    end
    return

end

%% ReadMode 'block': wilfully fill up output matrix

if strcmpi(readMode,'block')
    
    if hasWaitbar
        waitbar(0.5,hw,'converting in ''block'' mode ...');
    end
    
    numColonBlock   = abs(numColon);    % number of columns in output matrix
    isNumelOk       = true;             % initialize flag "in every section the number of elements is a multiple of number of columns"
    numSectionLb    = numel(secLbIdc);  % 1 + number of sections to process
    doSetNan        = true;             % flag "output matrix will be initialized with NaNs"
    
    % convert first section ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    startIdcF8 = idcLb(secLbIdc(1))+1;
    endIdcF8   = idcLb(secLbIdc(2));
    
    % THE conversion of this section by sscanf:
    [Atmp,count,errmsg,nextindex] = ...
            sscanf(char(f8(startIdcF8 : endIdcF8)), formatStr); %#ok<ASGLU>
    numAtmp = numel(Atmp);
	
    % examine how many elements we found in this section
    numRowsCurr      = ceil(numAtmp/numColonBlock);         % how many rows will contain these elements
    numelMissing     = numRowsCurr*numColonBlock-numAtmp;   % how many elements are missing to fill up the last of these rows
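    % Illustrative example (a sketch added for clarity; the values are made
    % up): with numColonBlock = 3 and Atmp = (1:7).' we get
    %   numRowsCurr  = ceil(7/3);       % 3
    %   numelMissing = 3*3-7;           % 2
    % and, assuming a floating-point NumericType with NaN as the filling
    % value,
    %   reshape([Atmp; NaN; NaN], 3, 3).'
    % yields [1 2 3; 4 5 6; 7 NaN NaN].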

    a = initializeMatrix(1,1,arg.val.NumericType,doSetNan);
    
    if numSectionLb < 3
        % there is only one section, so just generate the final output
        % matrix here:
        A = reshape([Atmp;repmat(a,numelMissing,1)],numColonBlock,numRowsCurr).';
        if numelMissing>0
            isNumelOk = false;
        end
    else
        % there are multiple sections, so initialize the output matrix
        % first ...
        if isnan(num_lf)
              % guess final size of A for preallocating
              expandFactor   = diff(idcLb(secLbIdc([1,end])))/diff(idcLb(secLbIdc([1,2])));
              numRowsGuessed = round(numRowsCurr * expandFactor);
        else
            numRowsGuessed = num_lf;
        end
        A = initializeMatrix(numRowsGuessed,numColonBlock,arg.val.NumericType,doSetNan);

        % ... and put the first elements to it:
        startRow = 1;
        endRow   = numRowsCurr;
        Atmp = reshape([Atmp;repmat(a,numelMissing,1)],numColonBlock,numRowsCurr).';
        A(startRow:endRow,1:numColonBlock) = Atmp;

        % If the first section was incomplete, the first elements of the
        % second section will be added to the last row of the first
        % section. So keep in mind the elements of the incomplete row here:
        if numelMissing>0
            isNumelOk = false;
            repeatRow = 1;
            ARepeat = A(endRow,1:(numColonBlock-numelMissing)).';
        else
            repeatRow = 0;
            ARepeat = [];
        end

        % now step through the following sections
        for sdx = 2:numSectionLb-1

            % the text positions of the current section:
            startIdcF8 = idcLb(secLbIdc(sdx))+1;
            endIdcF8   = idcLb(secLbIdc(sdx+1));

            % THE conversion of this section by sscanf:
            [Atmp,count,errmsg,nextindex] = ...
                sscanf(char(f8(startIdcF8 : endIdcF8)), formatStr); %#ok<ASGLU>
            numAtmp = numel(Atmp);
            if numAtmp == 0
                Atmp = double(Atmp);
            end

            % as with the first section, add the new values to the output
            % matrix
            numRowsCurr  = ceil( (numAtmp-numelMissing) / numColonBlock );
            numelMissing = numRowsCurr*numColonBlock-(numAtmp-numelMissing);
            startRow     = endRow+1-repeatRow;
            endRow       = endRow+numRowsCurr;
            
            Atmp = reshape([ARepeat;Atmp;repmat(a,numelMissing,1)],numColonBlock,numRowsCurr+repeatRow).';
            A(startRow:endRow,1:numColonBlock) = Atmp;     
            
            % remember elements of an incomplete row for the next section
            if numelMissing>0
                isNumelOk = false;
                repeatRow = 1;
                ARepeat = A(endRow,1:(numColonBlock-numelMissing)).';
            else
                repeatRow = 0;
                ARepeat = [];
            end
            
            if hasWaitbar && ~mod(sdx,256)
                waitbar(0.5+0.5*((sdx-1)/(numSectionLb-1)),hw)
            end
            
        end
        
        if numRowsGuessed > endRow
            A = A(1:endRow,:);
            % A(endRow+1:numRowsGuessed,:) = [];
        end
        
    end
    
    if ~isNumelOk
        warning('txt2mat:NumberOfElements', 'Number of elements did not fill up a complete row')
    end
        
end

%% ReadMode 'matrix': try converting large sections
% sscanf will be applied to consecutive working sections consisting of
% <ldx_rng> rows. The number of numeric values must then be a multiple of
% the number of columns. Otherwise, or if sscanf produces an error, inform
% the user and, unless 'matrix' mode was enforced, proceed to the (slower)
% line-by-line conversion.
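%
% Illustrative example (a sketch added for clarity, assuming the format
% string '%f' and a section of two rows with three columns):
%   txt  = sprintf('1 2 3\n4 5 6\n');
%   Atmp = sscanf(txt, '%f');       % 6-by-1 column vector
%   Atmp = reshape(Atmp, 3, 2).';   % [1 2 3; 4 5 6]
% If the section held 5 or 7 values instead of 6, the 'numel' error branch
% below would be taken.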


errmsg = '';    % Init. error message variable
if strcmpi(readMode,'auto') || strcmpi(readMode,'matrix') 
    if hasWaitbar
        waitbar(0.5,hw,'converting in ''matrix'' mode ...');
    end
    
    try
        numColonMatrix  = abs(numColon);
        errorType = 'none';         % no conversion error detected so far
        A = initializeMatrix(num_lf-1,numColonMatrix,arg.val.NumericType,false);
        
        % Usually, in 'matrix' mode, we have LbAwareness == 2. As the way
        % we calculate the number of rows in a section depends on
        % LbAwareness, we check that here: 
        hasNotAllLb = LbAwareness < 3;
        
        numSectionLb = numel(secLbIdc);
        
        %*% for testing purposes: aggregate multiple sections to a larger one 
        %sectionStep =1;    % how many sections to aggregate
        %selectedSectionIdc = min(2:sectionStep:numSectionLb+sectionStep-1, numSectionLb);
        %*% in this case, use max(1,sdx-sectionStep) instead of sdx-1 below 
        
        selectedSectionIdc = 2:numSectionLb;
    
        for sdx = selectedSectionIdc
            
            % start and end indices of the current section in the text:
            startIdcF8 = idcLb(secLbIdc(sdx-1))+1;
            endIdcF8   = idcLb(secLbIdc(sdx));
            
            % THE conversion of this section by sscanf:
            [Atmp,count,errmsg,nextindex] = ...
                    sscanf(char(f8(startIdcF8 : endIdcF8)), formatStr); 

            % the corresponding row indices of the output matrix:
            if hasNotAllLb
                startRow       = cntLb(sdx-1)+1;
                endRow         = cntLb(sdx);
            else
                startRow       = secLbIdc(sdx-1);
                endRow         = secLbIdc(sdx)-1;
            end
            num_lines_loop = endRow - startRow + 1;
            
            %~% error handling ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            if ~isempty(errmsg) 
                % there's an sscanf error message
                errorType = 'sscanf';
                break
            elseif numel(Atmp) ~= numColonMatrix * num_lines_loop
                % we did not read the expected number of numeric elements
                errorType = 'numel';
                numelExpected = numColonMatrix * num_lines_loop;
                numelFound    = numel(Atmp);
                break
            end
            %~% end error handling ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            
            % put the values to the right dimensions and add them to A
            Atmp = reshape(Atmp,numColonMatrix,num_lines_loop)';
            A(startRow:endRow,:) = Atmp;
            
            if hasWaitbar && ~mod(sdx,256)
                waitbar(0.5+0.5*((sdx-1)/(numSectionLb-1)),hw)
            end

        end % for sdx = 2:numSectionLb
        
        % error diagnosis and user information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        switch errorType
            case 'sscanf'
                if (arg.val.InfoLevel >= 2) && ( nextindex <= endIdcF8 - startIdcF8 + 1 )  
                    % If sscanf did not process the whole string, display
                    % the text line where it stopped.
                    
                    % line break indices in the current section
                    idcLbCurr = [0, strfind(f8(startIdcF8 : endIdcF8).', lbchar)];
                    % find line break index of the abortion line
                    idxErrorLine = find(idcLbCurr-nextindex > 0, 1 );    
                    % text content of the abortion line (converted to char so
                    % that disp shows text rather than byte values)
                    errorLineText = char(f8(startIdcF8 + (idcLbCurr(idxErrorLine-1):idcLbCurr(idxErrorLine)-numLbfull-1) ).');
                    % display information about the error cause
                    disp(['Sscanf error after reading ' num2str((startRow-1)*numColonMatrix+count) ' numeric values.'])
                    disp(['Text content of the critical row (no. ' num2str(numHeader+startRow-1+idxErrorLine-1) ' without deleted lines): '])
                    disp(errorLineText)
                end % if
                
            case 'numel'
                if arg.val.InfoLevel >= 2
                    % We don't know the exact lines containing the wrong
                    % number of values. As a guess, just display the
                    % positions of the longest or the shortest text lines
                    % (by simply counting characters).
                    
                    % line break indices in the current section
                    idcLbCurr = [0, strfind(f8(startIdcF8 : endIdcF8).', lbchar)];
                    % corresponding text line lengths
                    lenLine = diff(idcLbCurr);
                    [lenLineSorted,idclenLineSorted] = sort(lenLine);
                    maxNumDisplayed = min(5,numel(lenLine));

                    if numelFound < numelExpected
                        disp(['Found fewer elements (' num2str(numelFound) ') than expected (' num2str(numelExpected) ') in the current section.'])
                        disp('As a hint, these are the text lines containing the fewest characters:')
                        disp(['lines no. [' num2str(numHeader+startRow-1+idclenLineSorted(1:maxNumDisplayed)) '] having [' num2str(lenLineSorted(1:maxNumDisplayed),' %i') '] characters, resp.'])
                    else
                        disp(['Found more elements (' num2str(numelFound) ') than expected (' num2str(numelExpected) ') in the current section.'])
                        disp('As a hint, these are the text lines containing the most characters:')
                        disp(['lines no. [' num2str(numHeader+startRow-1+idclenLineSorted(end:-1:end-maxNumDisplayed+1)) '] having [' num2str(lenLineSorted(end:-1:end-maxNumDisplayed+1),' %i') '] characters, resp.'])
                    end
                end
                error('Unexpected number of elements in read mode ''matrix''.')
        end
        % end error diagnosis and user information ~~~~~~~~~~~~~~~~~~~~~~~~        
        
    catch   %#ok<CTCH> % catch further errors (old catch style)
        if ~exist('errmsg','var') || isempty(errmsg)
            errmsg = lasterr; %#ok<LERR> (old catch style)
        end
    end % try
end

% Quit on error if 'matrix'-mode was enforced: 
if strcmpi(readMode,'matrix') && ~isempty(errmsg)
    if arg.val.InfoLevel >= 1
        close(hw)
    end
    error(errmsg);
end


%% ReadMode 'line': convert line-by-line

clear Atmp

if strcmpi(readMode,'line') || ~isempty(errmsg) 
    num_data_per_row = zeros(num_lf-1,1);
    
    if ~strcmpi(readMode,'line')
        numColon = -abs(numColon);
        if arg.val.InfoLevel >= 2
            disp('Due to error')
            disp(strrep(['  ' errmsg],char(10),char([10 32 32])))
            disp('txt2mat will now try to read line by line...')
        end % if
    end
    
    if LbAwareness < 3
        idcLb = findLineBreaks(f8, lbuint, memPar, true, false);
        num_lf = numel(idcLb);
    end

    % initialize result matrix A depending on matlab version:
    width_A = max(abs(numColon),1);
    [A,A1] = initializeMatrix(num_lf-1,width_A,arg.val.NumericType,true);

    if hasWaitbar
        if strcmpi(readMode,'line')
            waitbar(0.5,hw,{'reading line-by-line ...'})
        else
            poshw = get(hw,'Position');
            set(hw,'Position',[poshw(1), poshw(2)-4/7*poshw(4), poshw(3), 11/7*poshw(4)]);
            waitbar(0.5,hw,{'now reading line-by-line because of error:';['[' errmsg ']']})
            set(findall(hw,'Type','text'),'interpreter','none');
        end
        drawnow
    end
	
	% extract numeric values line-by-line:
	for ldx = 1:(num_lf-1)
        a = sscanf(char(f8( (idcLb(ldx)+1) : idcLb(ldx+1)-1 )),formatStr)';
        num_data_per_row(ldx) = numel(a);
        % If necessary, expand A along second dimension (allowed if
        % numColon < 0)
        if (num_data_per_row(ldx) > width_A) && (numColon < 0)
            A = [A, repmat(A1,size(A,1),...
                 num_data_per_row(ldx)-width_A)]; %#ok<AGROW>
            width_A = num_data_per_row(ldx);
        end
        A(ldx,1:min(num_data_per_row(ldx),width_A)) = a(1:min(num_data_per_row(ldx),width_A));
        
        % display waitbar:
        if hasWaitbar && ~mod(ldx,10000)
                waitbar(0.5+0.5*(ldx./(num_lf-1)),hw)
        end % if
	end % for
    
    % display info about number of numeric values per line
    if arg.val.InfoLevel >= 2
        if numColon>=0
            reference = numColon;
        elseif numColon == -1
            reference = width_A;
        else
            reference = -numColon;
        end
        
        disp('txt2mat row length info:')
        idc_less_data = find(num_data_per_row<reference);
        idc_more_data = find(num_data_per_row>reference);
        num_less_data = numel(idc_less_data);
        num_more_data = numel(idc_more_data);
        num_equal_data = num_lf-1 - num_less_data - num_more_data;
        info_ca(1:3,1) = {['  ' num2str(num_equal_data)];['  ' num2str(num_less_data)];['  ' num2str(num_more_data)]};
        info_ca(1:3,2) = {[' row(s) found with ' num2str(reference) ' values'],...
                           ' row(s) found with fewer values',...
                           ' row(s) found with more values'};
        info_ca(1:3,3) = {' ';' ';' '};
        if num_less_data>0
            info_ca{2,3} = [' (row no. ', num2str(numHeader+idc_less_data(1:min(10,num_less_data))'), repmat(' ...',1,num_less_data>10), ')'];
        end
        if num_more_data>0
            info_ca{3,3} = [' (row no. ', num2str(numHeader+idc_more_data(1:min(10,num_more_data))'), repmat(' ...',1,num_more_data>10), ')'];
        end
        disp(strcatcell(info_ca));

    end % if arg.val.InfoLevel >= 2
    
end % if

if arg.val.InfoLevel >= 1
    close(hw)
end


%% : : : : : subfunction ANATXT : : : : : 

function [anaNumHeader, anaNumColon, anaFormat, anaRepChar, anaReadMode, ...
    anaNumAnalyzed, anaHeader, anaFileErr, anaErr] = anatxt(arg)

% ANATXT analyse data layout in a text file for txt2mat
% 
% Usage:
% [nh, nc, fmt, SR, RM, llta, hl, ferrmsg, aerrmsg] = ...
%       anatxt(arg);
%
% nh            number of header lines
% nc            number of columns
% fmt           format string (curr. always '%f')
% SR            character replacement string
% RM            recommended read mode
% llta          lines analysed after header
% hl            header line characters
% ferrmsg       file operation error message
% aerrmsg       other error messages from this function
%
% arg           txt2mat's input argument struct

%   Copyright 2006-2014 Andres
%   $Revision: 4.00 $  $Date: 2014/03/18 14:05:08 $
%   todo: especially this function needs a cleanup...

% some preparations
ffn             = arg.val.FileName;
filePos         = arg.val.FilePos;
repChar         = arg.val.ReplaceChar;
repStr          = arg.val.ReplaceStr;
repReg          = arg.val.ReplaceRegExpr;
numHeader       = arg.val.NumHeaderLines;

numRR           = arg.num.ReplaceRegExpr;
numRS           = arg.num.ReplaceStr;
numRC           = arg.num.ReplaceChar;
numBL           = arg.num.BadLineString;
numGL           = arg.num.GoodLineString;

[anaNumColon, anaNumAnalyzed] = deal(0);
[anaReadMode, anaHeader, anaErr] = deal('');
anaRepChar = {};
anaNumHeader = numHeader;


%% Read in file

% definitions
numCharRead = 65536;    % minimum number of characters to read
minLines    = 10;       % minimum number of lines to read
if isfinite(numHeader)
    minLines = minLines + numHeader;
end
valueRatio  = 0.1;      % this ratio will tell if a row has enough values
anaFormat   = '%f';     % assume floats only (so far)

hasFileErr  = false;    % init
anaFileErr  = '';       % init

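fid = fopen(ffn);
if fid < 0
    % fopen failed (e.g. missing file or no read permission)
    anaFileErr = ['could not open file: ' ffn];
    anaErr     = 'file operation error';
    return
end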
if filePos > 0
    status = fseek(fid,filePos,'bof');
    if status ~= 0
        hasFileErr = true;
        anaFileErr = ferror(fid,'clear');
    end
end

if ~hasFileErr
    % detect line termination character
    lbfull = detectLineBreakCharacters(ffn);
    lbuint = lbfull(end);        
    lbchar = char(lbuint);
    % read in the first part of the file
    [f8,numLb,posLb] = getLines(fid, minLines, numCharRead, 0, 0, false, lbfull);
    % getLines get a set of consecutive lines from file
    % [hl,numLb,posLb,isAtEnd] = getLines(fid, minLines, minChars, offset, ...
    %                                  origin, inclWsAtEnd, lbfull, lenSection)

end
fclose(fid); 

% care for some exceptions
if hasFileErr
    anaErr = 'file operation error';
    return
end
if isempty(f8)
    anaErr = 'empty file';
    return
end
if numLb <= numHeader
    anaErr = 'file has not more lines than given number of header lines';
    return
end

% remember the original text before deletions and replacements
f8Orig    = char(f8.');
posLbOrig = posLb;

if numHeader > 0
    % select post-header-part of f8
    f8 = f8(posLb(numHeader+1):end);    
    numLb = numLb - numHeader;
    posLb = posLb(numHeader+1:end) - posLb(numHeader+1);
end

%% filter lines (rows)

doFilter = arg.has.SelectLineFun || (numBL + numGL > 0);
if doFilter
    [f8, posLb, numLb, ~, isOk] = filterLines(f8, lbuint, numCharRead, arg);
end


%% Replace regular expressions, strings, and characters, if needed
        
if numRS>0 || numRC>0 || numRR>0
    
    % If a ReplaceStr begins with a line break character, such a character
    % will temporarily be prepended to apply the replacement to the _first_
    % line, too.
    prependChar = '';       % prepend nothing by default
    if numRS>0
        % put all the characters from the ReplaceStr strings into an
        % uint8-array:
        uint8Replace = uint8(char([repStr{:}]));
        % check if any row starts with a line break:
        if any(uint8Replace(:,1)==lbuint)
            prependChar = lbchar;
        end
    end
    numPrepend = numel(prependChar);
    
    f8=[prependChar, char(f8.')];
    
    if numRR>0
        for vdx = 1:numRR       % step through regex replacement arguments 
            srarg = repReg{vdx};    % pick a single replacement argument
            for sdx = 1:(numel(srarg)-1)
                f8 = regexprep(f8, srarg{sdx}, srarg{end}); % replace it
            end
        end
    end

    if numRS>0
        for vdx = 1:numRS     	% step through string replacement arguments 
            srarg = repStr{vdx};    % pick a single replacement argument
            for sdx = 1:(numel(srarg)-1)
                f8 = strrep(f8, srarg{sdx}, srarg{end});    % replace it
            end
        end
    end

    if numRC>0
        for vdx = 1:numRC     	% step through char replacement arguments
            srarg = repChar{vdx}; 	% pick a single replacement argument
            for sdx = 1:(numel(srarg)-1)
                f8( strfind(f8,srarg(sdx)) ) = srarg(end);  % replace it
            end
        end
    end
    
    f8 = uint8(f8(1+numPrepend:end).');
    % update line break indices
    isLB   = f8==lbuint;
    posLb  = [0;find(isLB)];
    numLb  = numel(posLb)-1;
end

%% Find character types

% further representations of the text as required below
f8c      = char(f8.');
f8d      = double(f8.');

% types of characters:
prnAscii = uint8([32:127, 128+32:255]);                 % printable ASCIIs
dec_nr_p = sort(uint8('+-1234567890dDeE.NanIiFfA'));    % decimals with NaN, Inf, signs and .
sep_wo_k = uint8([9 32    47 58 59]);   	% separators excluding comma  
sep_wi_k = uint8([9 32 44 47 58 59]);   	% separators including comma (Tab Space ,/:;)
komma    = uint8(',');               	% ,
other    = setdiff(prnAscii, [sep_wi_k, dec_nr_p]); % printables without separators and decimals

% characters not expected to appear in the data lines:
is_othr = ismembc(f8d,double(other));       % switch to double for compatibility 
is_beg_othr = diff([false, is_othr]);       % true where groups of such characters begin
idc_beg_othr = find(is_beg_othr==1);        % start indices of these groups
% sort the line break positions together with the group start indices; in
% sidx, the entries 1:numLb stand for the line breaks, and the entries
% between two of them belong to the groups starting on that line
[~, sidx] = sort([posLb(2:end).',idc_beg_othr]);
num_beg_othr_per_line = diff([0,find(sidx<=numLb)]) - 1;   % number of character groups per line
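% Worked example of the counting scheme above (illustrative values only,
% not taken from a real file): two lines ending at positions 5 and 10, and
% three item start indices.
%   posLbEx = [0;5;10];  idcEx = [2 4 7];  numLbEx = 2;
%   [~, sidxEx] = sort([posLbEx(2:end).', idcEx]);    % sidxEx = [3 4 1 5 2]
%   diff([0, find(sidxEx<=numLbEx)]) - 1              % = [2 1]
% i.e. two items start on the first line and one on the second.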

% numbers enclosing a dot:
% idc_digdotdig = regexp(f8c, '[\+\-]?\d+\.\d+([deDE][\+\-]?\d+)?', 'start');
idc_digdotdig = regexp(f8c, '[\+\-]?\d+\.\d+([deDE][\+\-]?\d+)?');
[~, sidx] = sort([posLb(2:end).',idc_digdotdig]);
num_beg_digdotdig_per_line = diff([0,find(sidx<=numLb)]) - 1;

% numbers enclosing a comma:
% idc_digkomdig = regexp(f8c, '[\+\-]?\d+,\d+([eE][\+\-]?\d+)?', 'start');
idc_digkomdig = regexp(f8c, '[\+\-]?\d+,\d+([eE][\+\-]?\d+)?');
[~, sidx] = sort([posLb(2:end).',idc_digkomdig]);
num_beg_digkomdig_per_line = diff([0,find(sidx<=numLb)]) - 1;

% numbers without a dot or a comma:
% idc_numbers = regexp(f8c, '[\+\-]?\d+([eE][\+\-]?\d+)?', 'start');
idc_numbers = regexp(f8c, '[\+\-]?\d+([eE][\+\-]?\d+)?');
[~, sidx] = sort([posLb(2:end).',idc_numbers]);
num_beg_numbers_per_line = diff([0,find(sidx<=numLb)]) - 1;

% NaN and Inf items :
idc_nan = regexpi(f8c, '\<[\+\-]?(nan|inf)\>');
[~, sidx] = sort([posLb(2:end).',idc_nan]);
num_beg_nan_per_line = diff([0,find(sidx<=numLb)]) - 1;

% commas enclosed by numeric digits
% idc_kombd = regexp(f8c, '(?<=[\d]),(?=[\d])', 'start');
% if compareversion(vn,7)
%     idc_kombd = regexp(f8c, '(?<=[\d]),(?=[\d])');  % lookaround new to v7.0??
% else
    idc_kombd = 1+regexp(f8c, '\d,\d');
% end
[~, sidx] = sort([posLb(2:end).',idc_kombd]);
num_beg_kombd_per_line = diff([0,find(sidx<=numLb)]) - 1;

% two sequential commas without a (different) separator in between
% idc_2kom  = regexp(f8c, ',[^\s:;],', 'start');
idc_2kom  = regexp(f8c, ',[^\s:;/],');

% commas:
is_kom  = f8.'==komma;
idc_kom = find(is_kom);
[~, sidx] = sort([posLb(2:end).',idc_kom]);
num_kom_per_line = diff([0,find(sidx<=numLb)]) - 1;


%% Analyze

if isnan(numHeader) % ~~~~~ there's no user-supplied number of header lines
    % determine number of header lines:
    numHeader = max([0, find(num_beg_othr_per_line>0)]); % for now, take the last line containing an 'other'-character 
    if numHeader>=numLb
        anaErr = 'no numeric data found';
        if numHeader>0
            anaHeader = char(f8(1:posLb(numHeader+1)));
        end
        return
    end
    % numeric items (numbers plus NaN/Inf) per line after the preliminary header
    num_beg_numbers_ph = num_beg_numbers_per_line(numHeader+1:end)+num_beg_nan_per_line(numHeader+1:end);
    % by definition, a line is a valid data line if it contains enough
    % numbers compared to the average:
    has_enough_numbers = num_beg_numbers_ph>valueRatio.*mean(num_beg_numbers_ph);  
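    idxFirstData = find(has_enough_numbers, 1);
    if isempty(idxFirstData)
        % no line holds any number, so there is nothing to convert
        anaErr = 'no numeric data found';
        return
    end
    numHeader = numHeader + idxFirstData - 1;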
    % extract header and data section
    if numHeader>0    
        f8v_idx1 = posLb(numHeader+1)+1; % beginning of the data section in f8
        if doFilter
            % reconstruct number of header lines from the original text
            anaNumHeader = find(cumsum(isOk)==numHeader,1,'first');
        else
            anaNumHeader = numHeader;
        end
        anaHeader = f8Orig(1:posLbOrig(anaNumHeader+1));
    else
        f8v_idx1 = 1;
        anaHeader = [];
        anaNumHeader = 0;
    end
    f8 = f8(f8v_idx1:end);	% valid data section of f8
    anaNumAnalyzed = numLb - numHeader;	% number of non-header lines to analyse
else % ~~~~~~~~~~~~~~~ a number of header lines was given as input argument
    if numHeader>0
        anaHeader = f8Orig(1:posLbOrig(numHeader+1));
    else
        anaHeader = [];
    end
    anaNumAnalyzed = numLb;
end

% find out decimal separator character ('.' or ',')
anaRepChar = {};    % Init. replacement character string
SR_idx     = 0;     % Init. counter of the above
sepchar    = '';    % Init. separator (delimiter) character
decchar    = '.';   % Init. decimal character (default)

num_values_per_line = -num_beg_digdotdig_per_line + num_beg_numbers_per_line;

% Are there commas? If yes, are they decimal commas or delimiters?
if any( num_kom_per_line(numHeader+1:end) > 0 ) 
    sepchar = ',';  % preliminarily take the comma as delimiter
    % Decimal commas are neighboured by two numeric digits ...
    % and between two commas there has to be another separator
    if  all(num_kom_per_line(numHeader+1:end) == num_beg_kombd_per_line(numHeader+1:end)) ... % Are all commas enclosed by numeric digits?
        && ~any(num_beg_digdotdig_per_line(numHeader+1:end) > 0) ...   % There are no numbers with dots?
        && ~any(idc_2kom(numHeader+1:end) > 0)                         % There is no pair of commas with no other separator in between?

        decchar = ',';
        sepchar = '';
        
        num_values_per_line = -num_beg_digkomdig_per_line + num_beg_numbers_per_line; % number of values per line
    end
end

% replacement string for replacements by spaces
% other separators
is_wo_k_found = ismember(sep_wo_k, f8);  % Tab Space / : ;
is_other_found= ismember(other,f8);      % other printable ASCIIs

% possible replacement string to replace / : and ;
sr1 = [sepchar, char(sep_wo_k([0 0 1 1 1]&is_wo_k_found))];   
% possible replacement string to replace other characters
sr2 = char(other(is_other_found));        % Empty when the header length is
                                          % detected automatically, as lines
                                          % with such characters then count
                                          % as header lines; may be non-empty
                                          % if the number of header lines is
                                          % given by the user.

if numel([sr1,sr2])>0
    SR_idx = SR_idx + 1;
    anaRepChar{SR_idx} = [sr1, sr2, ' '];
end

% possible replacement string to replace the decimal character
if strcmp(decchar,',')
    SR_idx = SR_idx + 1;
    anaRepChar{SR_idx} = ',.';
end
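
% Sketch of how the proposed strings are meant to be read, assuming they
% are applied like the ReplaceChar inputs above (every character but the
% last is replaced by the last one). For a hypothetical data line
% '1,5;2,5' without dots, the analysis would suggest
%   anaRepChar = {'; ', ',.'}
% so that ';' becomes ' ' and then ',' becomes '.', giving '1.5 2.5'.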

num_items_per_line = num_values_per_line + num_beg_nan_per_line;

anaNumColon = max(num_items_per_line(numHeader+1:end));    % proposed number of columns

if isempty(anaNumColon)
    anaErr = 'no numeric data found';
    return
end

% suggest a proper read mode depending on uniformity of the number of values per
% line
if numel(unique(num_items_per_line(numHeader+1:end))) > 1
    anaReadMode = 'line';
    anaNumColon = -anaNumColon;
else
    anaReadMode = 'auto';
end

%% : : : : : further subfunctions : : : : : 

function s = strcatcell(C)

% STRCATCELL Concatenate strings of a 1D/2D cell array of strings
%
% C = {'a ','123';'b','12'}
%   C = 
%     'a '    '123'
%     'b'     '12' 
% s = strcatcell(C)
%   s =
%     a 123
%     b 12 

num_col = size(C,2);
D = cell(1,num_col);
for idx = 1:num_col
    D{idx} = char(C{:,idx});
end
s = [D{:}];


function [w, newidcoi, vi] = cutvec(v,li,hi,doKeep,varargin)

% CUTVEC remove multiple sections from a vector by linear index intervals
%
% Syntax:
%   w = cutvec(v,li,hi,doKeep)
% OR
%   [w, new_idc_oi, vi] = ...
%       cutvec(v,li,hi,doKeep,old_idc_oi)
%
% v             input vector
% li            lower endpoints of linear index intervals (sorted vector)
% hi            upper endpoints of linear index intervals (sorted vector)
% doKeep        true:   remove values outside all intervals
%               false:  remove values within all intervals
% old_idc_oi    indices of interest in v (optional)
% 
% w             output vector consisting of v-sections
% new_idc_oi    corresponding indices of interest in w
% vi            logical index vector with w=v(vi)
%
% Inputs li, hi and doKeep may also be cell arrays of equal size holding
% multiple sets of index endpoints and logicals.
%
% EXAMPLE:
%
% w = cutvec([1:20],[3,10,16],[7,12,19],1)
%
%   =>  w = [3 4 5 6 7   10 11 12   16 17 18 19]
%
% w = cutvec([1:20],[3,10,16],[7,12,19],0)
%
%   =>  w = [1 2   8 9   13 14 15   20]
%
% w = cutvec([1:20],{[3,10,16],[1,15]},...
%                    {[7,12,19],[5,20]},{0,1})
%
%   =>  w = [1 2   15   20]
%
% tic, w = cutvec([1:5000000]',[100:500:5000000],[200:500:5000000],0); toc
% 
% % Elapsed time is 0.202949 seconds.
%
% v = 1:20;
% li= [10,18];
% hi= [12,19];
% doKeep = 0;
% idcoi = [1,4,7,10,13,18,20];
% 
% [w, newidcoi, vi] = cutvec(v,li,hi,doKeep,idcoi)

%   $Revision: 1.23 $ 

lenV   = numel(v);
has_idcoi = false;
newidcoi=[];

if nargin == 5  % indices of interest are provided
    idcoi   = int32(varargin{1});
    if ~issorted(idcoi)
        error([mfilename ': vector of indices of interest must be sorted!'])
    end
    has_idcoi = true;
end

if iscell(li)
    vi = endpoint2logical(lenV,li{1},hi{1},logical(doKeep{1}));
    for ci = 2:numel(li)
        vi = vi & endpoint2logical(lenV,li{ci},hi{ci},logical(doKeep{ci}));
    end
else
    vi = endpoint2logical(lenV,li,hi,logical(doKeep));
end

if has_idcoi
    remidc   = int32(find(vi));
    newidcoi = ismembc2(idcoi,remidc);
end

w = v(vi);


function vi = endpoint2logical(len,li,hi,doInclude)

% ENDPOINT2LOGICAL convert endpoints of index intervals to logical index
%
% Syntax:
%   vi = endpoint2logical(len,li,hi,doInclude)
%
% with
%
% len           length of logical index vector
% li            vector with lower endpoints of linear index intervals
% hi            vector with upper endpoints of linear index intervals
% doInclude     true:  logical indices are 1 only inside  the intervals
%               false: logical indices are 1 only outside the intervals
%
% vi            logical index vector
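%
% Example (values chosen for illustration):
%   vi = endpoint2logical(10,[2,7],[3,8],true);
%   vi.'    % = [0 1 1 0 0 0 1 1 0 0]
%   vi = endpoint2logical(10,[2,7],[3,8],false);
%   vi.'    % = [1 0 0 1 1 1 0 0 1 1]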

% initialize output:
if doInclude
    vi = false(len,1);
else
    vi = true(len,1);
end

for i = 1:numel(li)
    vi(li(i):hi(i)) = doInclude;
end


function arg = argincheck(allargin)

% ARGINCHECK get input arguments for txt2mat
%
% arg = argincheck(allargin)
% provides input argument information in struct arg with fields
%       arg.val.(argname)  ->  value of the input
%       arg.has.(argname)  ->  T/F argument was given
%       arg.num.(argname)  ->  number of values for some non-scalar inputs
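%
% Example (a sketch of the expected result; the file name 'data.txt' is
% hypothetical):
%   arg = argincheck({'data.txt', 2, 'InfoLevel', 0});
%   arg.val.FileName         % 'data.txt'
%   arg.val.NumHeaderLines   % 2
%   arg.has.NumHeaderLines   % true,  given by the caller
%   arg.has.NumColumns       % false, default value in use
%   arg.val.InfoLevel        % 0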

% Check input argument occurrence (Property/Value-pairs)
%  1 'NumHeaderLines',     Scalar,     13
%  2 'NumColumns',         Scalar,     100
%  3 'Format',             String,     ['%d.%d.%d' repmat('%f',1,6)]
%  4 'ReplaceChar',        CellAString {')Rx ',';: '}
%  5 'BadLineString'       CellAString {'Warng', 'Bad'}
%  6 'ReplaceStr',         CellAString {{'True','1'},{'False','0'},{'#Inf','Inf'}}
%  7 'DialogString'        String      'Now choose a Labview-Logfile'
%  8 'MemPar'              Scalar      65536
%  9 'InfoLevel'           Scalar      2
% 10 'ReadMode'            String      'Auto'
% 11 'NumericType'         String      'single'
% 12 'RowRange'            2x1-Vector  [1,Inf]
% 13 'FilePos'             Scalar      1e5
% 14 'ReplaceRegExpr'      CellArOfStr {{'True','1'},{'False','0'},{'#Inf','Inf'}} 
% 15 'GoodLineString'      CellAString {'OK'}
% 16 'SelectLineFun'       FunHandle   @(rowNo) rem(rowNo-1,2) < 1

% check for validated argument struct as last input to bypass further input
% parsing (undocumented, untested -> todo)
hasValidatedArgStruct = false;  % default
if ~isempty(allargin) && isstruct(allargin{end})
    argStruct = allargin{end};
    if isfield(argStruct,'isValidated') && argStruct.isValidated
        hasValidatedArgStruct = true;
    end
end

if hasValidatedArgStruct
    % carry over validated inputs
    arg = argStruct;
else
    %-- main input parsing
    p = inputParser;
    p.KeepUnmatched = true;
    p.FunctionName = 'txt2mat';
    %-- optional inputs that follow the file name
    p.addOptional(  'NumHeaderLines', NaN , @(x)isempty(x)||(isnumeric(x)&&isscalar(x)))
    p.addOptional(  'NumColumns'    , []  , @(x)isempty(x)||(isnumeric(x)&&isscalar(x)))
    p.addOptional(  'Format'        , '%f', @(x)isempty(x)||(ischar(x)&&any(x=='%')))
    p.addOptional(  'ReplaceChar'   , {}  , @(x)isempty(x)||iscellstr(x)||ischar(x))
    p.addOptional(  'BadLineString' , {}  , @(x)isempty(x)||iscellstr(x))
    %-- param/value only inputs:
    p.addParamValue('SelectLineFun' , {}  , @(x)isa(x,'function_handle'))
    p.addParamValue('GoodLineString', {}  , @(x)isempty(x)||iscellstr(x))
    p.addParamValue('ReplaceStr'    , {}  , @(x)isempty(x)||iscell(x))
    p.addParamValue('ReplaceRegExpr', {}  , @(x)isempty(x)||iscell(x))
    p.addParamValue('NumericType'   , 'double', @(x)ischar(x))
    p.addParamValue('RowRange'      , [1,Inf] , @(x)isnumeric(x)&&(numel(x)==2))
    p.addParamValue('FilePos'       , 0   , @(x)isnumeric(x)&&isscalar(x))
    p.addParamValue('ReadMode'      , 'auto', @(x)ischar(x))
    p.addParamValue('DialogString'  , 'Select File', @(x)ischar(x))
    p.addParamValue('InfoLevel'     , 2   , @(x)isnumeric(x)&&isscalar(x))
    p.addParamValue('MemPar'        , 65536, @(x)isnumeric(x)&&isscalar(x))
    %-- older param names, still accepted
    p.addParamValue('ConvString'    , '%f', @(x)isempty(x)||ischar(x))
    p.addParamValue('ReplaceExpr'   , {}  , @(x)isempty(x)||iscell(x))
    %-- parse inputs:
    p.parse(allargin{2:end})

    % rearrange input argument parsing results to a nested struct called
    % 'arg', with
    % arg.val.(name) holding the values of the inputs
    % arg.has.(name) indicating whether the input was given (t/f)
    if ~isempty(fieldnames(p.Unmatched))
        ufn = fieldnames(p.Unmatched);
     	warning('txt2mat:unmatchedArg', ...
                ['Unmatched input parameter names were ignored ("'  ufn{1} '").'])
    end
    arg.val   = p.Results;
    defnames  = p.UsingDefaults;
    argnames  = fieldnames(arg.val);
    isdefcell = [argnames.'; repmat({true},1,numel(argnames))];
    arg.has   = struct(isdefcell{:});
    for k = defnames
        arg.has.(k{:}) = false;
    end
    
    % ~~~ additional checks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    % error if both old and new param names occur, accept if solely old one
    % 'ConvString' -> 'Format'
    if arg.has.ConvString && arg.has.Format
        error('txt2mat:deprecatedConvString', ...
              'use param name ''Format'' only (instead of ''ConvString'')');
    elseif arg.has.ConvString
        arg.has.Format = arg.has.ConvString;
        arg.val.Format = arg.val.ConvString;
    end
    % 'ReplaceExpr' -> 'ReplaceStr'
    if arg.has.ReplaceExpr && arg.has.ReplaceStr
        error('txt2mat:deprecatedReplaceExpr', ...
              'use param name ''ReplaceStr'' only (instead of ''ReplaceExpr'')');
    elseif arg.has.ReplaceExpr
        arg.has.ReplaceStr = arg.has.ReplaceExpr;
        arg.val.ReplaceStr = arg.val.ReplaceExpr;
    end
    
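    % NumHeaderLines must be a nonnegative integer (NaN, the default,
    % requests automatic detection and is therefore accepted).
    if arg.has.NumHeaderLines && ~isnan(arg.val.NumHeaderLines) && ...
            ( arg.val.NumHeaderLines < 0 || ...
              arg.val.NumHeaderLines ~= round(arg.val.NumHeaderLines) )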
        error('txt2mat:wrongNumHeaderLines', ...
              'NumHeaderLines must be a nonnegative integer.')
    end
    
    % NumColumns must be an integer scalar.
    if arg.has.NumColumns && ...
            arg.val.NumColumns ~= round(arg.val.NumColumns)
        error('txt2mat:wrongColumns', ...
              'NumColumns must be integer.')
    end

    % change empty format string to default
    if isempty(arg.val.Format)
        arg.val.Format = '%f';
    end

    % wrap a single string ReplaceChar into a cell
    if ischar(arg.val.ReplaceChar)
        arg.val.ReplaceChar = {arg.val.ReplaceChar};
        %warning('txt2mat:ineptReadmode', ...
        %    'for future versions, please use a cell array of strings for character replacements.')
    end
    arg.num.ReplaceChar    = numel(arg.val.ReplaceChar);

    % add number of bad and good line strings
    arg.num.BadLineString  = numel(arg.val.BadLineString);
    arg.num.GoodLineString = numel(arg.val.GoodLineString);
    
    % add numbers of string and regular expression replacements
    arg.num.ReplaceStr     = numel(arg.val.ReplaceStr);
    arg.num.ReplaceRegExpr = numel(arg.val.ReplaceRegExpr);
    
    % check if ReplaceStr is empty or has correct Find+Replace string pairs
    if arg.has.ReplaceStr && ~isempty(arg.val.ReplaceStr) && ~( ...
       ( iscellstr(arg.val.ReplaceStr)&&(numel(arg.val.ReplaceStr)==2) ) ...
       || all(cellfun(@(x)iscellstr(x)&&(numel(x)==2),arg.val.ReplaceStr)) )
        error('txt2mat:ReplaceStr', ...
              'ReplaceStr must be a cell array of two-element cell arrays of strings.')
    end

    % check if ReplaceRegExpr is empty or has correct Find+Replace string pairs
    if arg.has.ReplaceRegExpr && ~isempty(arg.val.ReplaceRegExpr) && ~( ...
       (iscellstr(arg.val.ReplaceRegExpr)&&(numel(arg.val.ReplaceRegExpr)==2)) ...
       || all(cellfun(@(x)iscellstr(x)&&(numel(x)==2),arg.val.ReplaceRegExpr)) )
        error('txt2mat:ReplaceRegExpr', ...
              'ReplaceRegExpr must be a cell array of two-element cell arrays of strings.')
    end

    % force ReadMode to 'line' if NumColumns < 0
    if arg.has.NumColumns && arg.val.NumColumns < 0
        % warn before overwriting ReadMode, otherwise the comparison below
        % could never detect a differing user input
        if arg.has.ReadMode && ~strcmpi(arg.val.ReadMode,'line')
            warning('txt2mat:changedReadmode', ...
                'ReadMode is changed to ''line'' as NumColumns is negative.')
        end
        arg.val.ReadMode = 'line';
    end
    
    % further checks on RowRange
    if arg.has.RowRange  && ( ~issorted(arg.val.RowRange) || ...
       any(arg.val.RowRange ~= round(arg.val.RowRange)) || ...
       arg.val.RowRange(1) < 1 )
        error('txt2mat:wrongRowRange', ...
              'RowRange must be a sorted positive integer 2x1 vector.')
    end
        
    % further checks on FilePos
    if arg.has.FilePos  &&  ( arg.val.FilePos < 0 || ...
       arg.val.FilePos ~= round(arg.val.FilePos) )
        error('txt2mat:wrongFilePos', ...
              'FilePos must be a nonnegative integer.')
    end
    
    % further checks on SelectLineFun
    if arg.has.SelectLineFun
        try
            SlfTest = arg.val.SelectLineFun((1:8).');
        catch err
            error('txt2mat:errorSelectLineFun', ...
                  'SelectLineFun error on test data (1:8).'': %s', err.message)
        end
        if numel(SlfTest)~=8
            error('txt2mat:wrongSelectLineFun', ...
              'SelectLineFun must preserve length of input (test data: (1:8).'')')
        elseif ~islogical(SlfTest)
            warning('txt2mat:SelectLineFunNonLogical', ...
                ['SelectLineFun output should be logical, but it is ' class(SlfTest) ])
        end
    end
        
    % ~~~ additional checks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ end ~
            
    % confirm arg struct validation for repeated usage (future version)
    arg.isValidated = true;
end

% add file name
if numel(allargin) >= 1
    arg.val.FileName = allargin{1};
    arg.has.FileName = true;
else
    arg.val.FileName = '';
    arg.has.FileName = false;
end


function lb = detectLineBreakCharacters(ffn)

% DETECTLINEBREAKCHARACTERS find out type of line termination of a file
%
% lb = detectLineBreakCharacters(ffn)
%
% with
%   ffn     ascii file name
%   lb      line break character(s) as uint8, i.e.
%           [13 10]     (cr+lf) for standard DOS / Windows files
%           [10]        (lf) for Unix files
%           [13]        (cr) for classic Mac OS files
%
% The DOS style values are returned as defaults if no such line breaks are
% found.

% www.editpadpro.com/tricklinebreak.html :
% Line Breaks in Windows, UNIX & Macintosh Text Files
% A problem that often bites people working with different platforms, such
% as a PC running Windows and a web server running Linux, is the different
% character codes used to terminate lines in text files. 
% 
% Windows, and DOS before it, uses a pair of CR and LF characters to
% terminate lines. UNIX (Including Linux and FreeBSD) uses an LF character
% only. The Apple Macintosh, finally, uses a CR character only. In other
% words: a complete mess.

lfuint   = uint8(10);       % LineFeed
cruint   = uint8(13);       % CarriageReturn
crlfuint = [cruint,lfuint];
lfchar   = char(10);
crchar   = char(13);
crlfchar = [crchar,lfchar];
readlen  = 16384;

% Cycle through file and read until we find line termination characters or
% we reach the end of file. 
% Possible line breaks are: cr+lf (default), lf, cr

logfid = fopen(ffn); 
has_found_lbs = false;
while ~has_found_lbs

    [f8,cntr] = fread(logfid,readlen,'*char');

    pos_crlf = strfind(f8',crlfchar);
    pos_lf   = strfind(f8',lfchar);
    pos_cr   = strfind(f8(1:end-1)',crchar);
    % here we ignored a cr at the end as it might belong to a cr+lf
    % combination (later we'll step back one byte in the file position to
    % avoid overlooking such a single cr)

    num_lbs = [numel(pos_crlf),numel(pos_lf),numel(pos_cr)];

    if all(num_lbs==0)
        fseek(logfid, -1, 0);    % step back one byte
        
        % if we reached the end of file without finding any special
        % character, set the endmost line break character and the complete
        % line break character to DOS values as defaults
        if cntr < readlen
            has_found_lbs = true;   % just to exit the while loop
            lb = crlfuint;          % complete line break character set
        end
    elseif num_lbs(1)>0
        has_found_lbs = true;
        lb = crlfuint;
    elseif num_lbs(2)>0
        has_found_lbs = true;
        lb = lfuint;
    elseif num_lbs(3)>0
        has_found_lbs = true;
        lb = cruint;
    end
end
fclose(logfid); 

function [txt, posLb0, cntLbMod, idxSecEndLb, isOk] = ...
    filterLines(txt, uintLb, startLenSbs, arg)

% FILTERLINES loop through sections of txt and remove unwanted lines
% 
% [txt, posLb0, cntLbMod, idxSecEndLb, isOk] = ...
%                                filterLines(txt, uintLb, startLenSbs, arg)
% 
% Inputs: 
%   txt         uint8 representation of the original char string
%   uintLb      line break character as uint8
%   startLenSbs initial subsection length
%   arg         struct with fields
%       has.SelectLineFun   logical, tells if a selection function exists
%       val.SelectLineFun   function for line selection
%       val.GoodLineString  cell with good line marker strings
%       val.BadLineString   cell with bad line marker strings
%
% Outputs:
%   txt         uint8 representation of modified char string
%   posLb0      line break positions in modified txt (starting with 0)
%   cntLbMod    number of lines in modified txt
%   idxSecEndLb posLb0(idxSecEndLb) are the positions of the line breaks
%               at the section borders
%   isOk        true: line from input txt is kept, false: line is removed

% some abbreviations:
goodStr = arg.val.GoodLineString;
badStr  = arg.val.BadLineString;
numGood = numel(goodStr);
numBad  = numel(badStr);

% initializations for the text section loop
doRead   = true;
idxRdLo  = 0;	% idxRdLo: start index of a full section to be read from 
                %     inside original txt (equals last line break position
                %     from previous section)  
                % iTxtLo, iTxtHi: indices of a subsection inside txt
             	% iSecLo, iSecHi: indices of a subsection inside a section 
idxWrLo  = 0;   % start index of a modified section to be written to txt
cntLbTxt = 0;   % counts lines in original txt
cntLbMod = 0;   % counts lines in modified txt
cntSec   = 0;   % counts sections in original txt
cntSecMod= 0;   % counts sections in modified txt
lenWork  = max(2,startLenSbs); % set initial subsection length to at least 2
numTxt   = numel(txt);
while doRead        % loop through text sections
    cntSec  = cntSec + 1;
    
    % prepare building a section of txt that contains line breaks
    workSec = zeros(lenWork,1, 'uint8');    % initialize content of section
    isLbSec = false(lenWork,1);      % will be true at line break positions

    % loop through subsections until at least one line break is found to
    % ensure we have complete lines in the current section
    hasNoLb = true;
    iSecHi  = 0;
    while hasNoLb   
        iSecLo 	= iSecHi+1;
        iSecHi	= iSecLo + lenWork - 1;
        iTxtLo  = idxRdLo + iSecLo;
        [iTxtHi,ci] = min([numTxt,idxRdLo + iSecHi]);
        workSbs = txt(iTxtLo:iTxtHi);               % current subsection
        isLbSbs = workSbs==uintLb;                  
        workSec(iSecLo:iTxtHi-idxRdLo) = workSbs; 	% add text to section
        isLbSec(iSecLo:iTxtHi-idxRdLo) = isLbSbs;  	% add line break t/f
        doRead  = ci > 1;
        hasNoLb = ~any(isLbSbs) & ci > 1;
    end
    lenWork    = iTxtHi-idxRdLo;  	% adapt future length of subsections
    posLbSec   = find(isLbSec);   	% line break positions in current section
    posLbSec0  = [0; posLbSec];   	% same, with a zero prepended
    lenLineSec = diff(posLbSec0);  	% length of line (in characters)
    numLbSec   = numel(posLbSec); 	% number of line breaks in current section
    lenSec     = posLbSec(end);   	% position of last line break
    workSec    = workSec(1:lenSec);	% crop section to last line break
    idxRdLo    = idxRdLo + lenSec;  % set start index for next iteration
    
    % initialize vector holding all line break positions in modified txt,
    % including a zero at the beginning (posLb0), and a vector indexing the
    % end-of-section line breaks inside it (idxSecEndLb)
    if cntSec == 1	
        posLb0 = zeros(ceil(numLbSec/lenSec*numTxt)+1,1);
        idxSecEndLb = ones(max(2,ceil(numTxt/lenSec)),1);
    end
        
    % ~~~ line selection by function and good/bad line strings ~~~~~~~~~~~~

    % apply the selection function on the current line numbers
    if arg.has.SelectLineFun
        % test line numbers to decide which lines (i.e. rows) to keep
        % we must use the original line numbers from txt here
        isLineSel = arg.val.SelectLineFun((cntLbTxt+1:cntLbTxt+numLbSec).');
    else
        isLineSel = true(numLbSec,1);   % do not remove any lines
    end
  
    % Find lines marked good or bad.
    % Start with the good line marker strings.
    if numGood > 0
        idcGoodCurr = cell(numGood,1);
        for k = 1:numGood
            % find positions of the current marker in the current section:
            idcGoodCurr{k} = strfind(char(workSec.'),goodStr{k}).';
        end
        % ~~ get the corresponding line break positions... ~~~~~~~~~~~~~~~~
        % sort all marker positions found and remove doublets
        idcGoodAll = unique(cat(1,idcGoodCurr{:}));
        % see how they sort into the sorted vector of line break positions
        [~,ix] = sort([idcGoodAll; posLbSec0]);
        % a line break position that is no longer followed directly by
        % another line break position but by one or more marker positions
        % denotes a line containing at least one marker string
        [~,ix] = sort(ix);
        isLineGood = diff(ix(numel(idcGoodAll)+1:end))>1;
        % ~~ ...done. (Is there a faster solution??) ~~~~~~~~~~~~~~~~~~~~~~
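        % Worked example of the double-sort ranking above (illustrative
        % values, not taken from a real file):
        %   posLbSec0  = [0; 5; 10; 15];   % three lines of 5 characters each
        %   idcGoodAll = 7;                % one marker found, inside line 2
        %   [~,ix] = sort([idcGoodAll; posLbSec0]);   % ix = [2;3;1;4;5]
        %   [~,ix] = sort(ix);                        % ranks: [3;1;2;4;5]
        %   diff(ix(2:end)) > 1                       % [0;1;0], line 2 is good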
    else
        isLineGood = true(numLbSec,1);
    end
    % Then do the same for the bad line marker strings.
    if numBad > 0
        idcBadCurr = cell(numBad,1);
        for k = 1:numBad
            idcBadCurr{k} = strfind(char(workSec.'),badStr{k}).';
        end
        idcBadAll = unique(cat(1,idcBadCurr{:}));
        [~,ix] = sort([idcBadAll; posLbSec0]);
        [~,ix] = sort(ix);
        isLineNotBad = diff(ix(numel(idcBadAll)+1:end))==1;
    else
        isLineNotBad = true(numLbSec,1);
    end
    
    % ~~~ combine selection criteria and update txt ~~~~~~~~~~~~~~~~~~~~~~~
    
    isLineOk = isLineSel & isLineGood & isLineNotBad;
    
    if nargout > 4
        if cntSec == 1	
            isOk = false(ceil(numLbSec/lenSec*numTxt)+1,1);
        end
        isOk(cntLbTxt+1:cntLbTxt+numLbSec) = isLineOk;
    end
    
    % update line break counter for original txt (to keep track of the line
    % numbers for the selection function and the indices for isOk output)
    cntLbTxt = cntLbTxt + numLbSec; 
    
    
    if all(isLineOk)	% no lines of the current section will be removed
        if cntSec > 1
            % write section to new position into txt
            txt(idxWrLo+1:idxWrLo+lenSec) = workSec;
        end
        % collect line break positions of modified txt
        posLb0(cntLbMod+2:cntLbMod+numLbSec+1) = idxWrLo+posLbSec;
        % count line breaks in modified txt
        cntLbMod    = cntLbMod + numLbSec;
        % start index of next txt write
        idxWrLo = idxWrLo + lenSec;
        % define a new section inside the modified txt
        cntSecMod = cntSecMod + 1;
        % index for the section ends in posLb0
        idxSecEndLb(cntSecMod+1) = idxSecEndLb(cntSecMod)+numLbSec;
    elseif any(isLineOk)	% some lines will be removed
        lenLineOk = lenLineSec(isLineOk);
        % start and end indices of groups of contiguous lines that will remain 
        selL = posLbSec0( [isLineOk(1:end) & ~[false;isLineOk(1:end-1)]; false]  );
        selR = posLbSec0( [false; isLineOk(1:end) & ~[isLineOk(2:end); false]] );
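        % Worked example (illustrative values): four lines of 5 characters,
        % of which the third one is to be removed:
        %   posLbSec0 = [0; 5; 10; 15; 20];
        %   isLineOk  = [true; true; false; true];
        %   selL is then [0; 15] and selR is [10; 20], so the character
        %   ranges selL+1:selR that remain are 1:10 and 16:20.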
        % update section:
        workSec     = cutvec(workSec,selL+1,selR,true);
        posLbMod    = cumsum(lenLineOk);
        numLbMod    = numel(lenLineOk);
        % write modified section back into txt
        txt(idxWrLo+1:idxWrLo+posLbMod(end)) = workSec;
        % collect line break positions of modified txt
        posLb0(cntLbMod+2:cntLbMod+numLbMod+1) = idxWrLo+posLbMod;
        % count line breaks in modified txt
        cntLbMod    = cntLbMod + numLbMod;
        % start index of next txt write
        idxWrLo = idxWrLo + posLbMod(end);   
        % define a new section inside the modified txt
        cntSecMod = cntSecMod + 1;
        % index for the section ends in posLb0
        idxSecEndLb(cntSecMod+1) = idxSecEndLb(cntSecMod)+numLbMod;
    end
end

% remove overallocated parts from the outputs:
txt         = txt(1:idxWrLo);
posLb0      = posLb0(1:cntLbMod+1);
idxSecEndLb = idxSecEndLb(1:cntSecMod+1);


function [idcLb, cntLb, secLbIdc] = ...
    findLineBreaks(f8, uintLb, lenSection, doFindAll, doCount) 

% FINDLINEBREAKS find line break indices
%
% [idcLb, cntLb, secLbIdc] = ...
%               findLineBreaks(f8, uintLb, lenSection, doFindAll, doCount)
%
% This function cycles through a text by manageable sections and finds line
% break characters - either all or just the last one in each section. If
% only the last line break in each section is to be found, findLineBreaks
% can provide the corresponding consecutive number of this line break in
% the text. 
%
% idcLb     	(nx1)-vector. Zero + some or all line break positions in f8
% cntLb         empty or (nx1)-vector. If not all line breaks have to be
%               found, but doCount is true, this is the consecutive number
%               of each line break in f8 that is listed in idcLb (with a
%               zero put in front). Otherwise cntLb is left empty, as cntLb
%               would just be trivially (0:numel(idcLb)-1).'
% secLbIdc      idcLb(secLbIdc) are the positions of the last line
%               break in each section (including the "zero" line break)
%
% f8            the text as an uint8 (Nx1)-vector
% uintLb        uint8-scalar representation of the line break character to
%               be found (10 or 13; could actually be any character). 
% lenSection   	character length of a section
% doFindAll     true: find and index every line break; false: find only the
%               last one in a section
% doCount       count number of every line break in cntLb - this is active
%               only in the non-trivial case when only the last line
%               break in a section has to be found 
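%
% Example (a sketch with made-up input; findLineBreaks is a local function
% and can only be called from within this file):
%   f8 = uint8(['ab' 10 'cde' 10 'f' 10]).';   % line breaks at 3, 7 and 8
%   idcLb = findLineBreaks(f8, uint8(10), 4, true, false);
%   % idcLb is [0; 3; 7; 8]
%   [idcLb, cntLb] = findLineBreaks(f8, uint8(10), 4, false, true);
%   % idcLb is [0; 3; 8] (last line break of each 4-character section)
%   % cntLb is [0; 1; 3] (consecutive numbers of these line breaks)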

%   $Revision: 4.00 $ 

lenF8   = numel(f8);
idxLo 	= 1;   % init., start index of a section processed in a loop
cntLb   = [];

numSection = ceil(lenF8/lenSection);

if doFindAll    % ~~~~~~~~~~~~ find all line break positions ~~~~~~~~~~~~~~
    % In what follows, the text will repeatedly be processed in consecutive
    % sections of length <lenSection> to help avoid memory problems.
    secLbIdc = ones(numSection+1,1); 
    loopCntr = 0;
    lbCntr   = 0;
    while idxLo <= lenF8
        loopCntr = loopCntr + 1;
        idxHi = min(idxLo - 1 + lenSection,lenF8);	% end index of current section

        % find line breaks in current section
        isLb = f8(idxLo:idxHi)==uintLb;
        crPosLb = find(isLb)+idxLo-1;
        numCrLb = numel(crPosLb);
        if loopCntr == 1
            % preallocate idcLb with estimated number of line breaks
            idcLb = zeros((1+numSection+1)*(numCrLb+1), 1);
        end

        % collect line break indices
        idcLb(lbCntr+2:lbCntr+numCrLb+1) = crPosLb;
        
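        % index (in idcLb) of the last line break found up to this section;
        % numel(idcLb) would return the preallocated length instead
        secLbIdc(loopCntr+1) = lbCntr+numCrLb+1;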
        
        idxLo = idxHi + 1;                      % start index for the following loop
        lbCntr = lbCntr + numCrLb;
    end % while
    
    idcLb = idcLb(1:lbCntr+1);
    
else    % ~~~~~~ find last line break position of each section only ~~~~~~~

    % Preallocate maximum space for output variables:
    if doCount
        cntLb = zeros(numSection+1,1);
    end
    idcLb = zeros(numSection+1,1);
    
    lbCntr = 0; % keep in mind how many line breaks have been found,
               	% as some sections might not contain a line break at all

    % Find line break indices within lenSection distance
    while idxLo <= lenF8
        idxHi = min(idxLo - 1 + lenSection,lenF8);   % end index of current section

        % parse backwards to find the last line break of the section
        cntr = 0;
        doKeepOnLooking = true;
        while doKeepOnLooking
            hasNotFound = (f8(idxHi-cntr) ~= uintLb);
            cntr = cntr+1;
            doKeepOnLooking = hasNotFound && (cntr < lenSection);
        end
        
        if ~hasNotFound
            lbCntr = lbCntr + 1;
            % add the line break to the list
            idcLb(lbCntr+1) = idxHi-cntr+1;
            
            % if desired, count all line breaks of the section
            if doCount
                cntLb(lbCntr+1)= cntLb(lbCntr) + sum(f8(idxLo:idxHi)==uintLb); %#ok<AGROW>
            end
        end
        idxLo = idxHi + 1;
    end % while 
    
    % if too much space was preallocated, shorten the outputs:
    if lbCntr<numSection
        idcLb(lbCntr+2:numSection+1) = [];
        if doCount
            cntLb(lbCntr+2:numSection+1) = [];
        end
    end
    
    secLbIdc = (1:numel(idcLb)).';
    
end     % ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

function [A,a] = initializeMatrix(numRows,numColumns,numericType,doSetNan)

% INITIALIZEMATRIX initialize result matrix A with NaNs or zeros
%
% [A,a] = initializeMatrix(numRows,numColumns,numericType,doSetNan);
%
% A                     numRows x numColumns - Matrix
% a                     scalar holding the initial value of A (NaN or 0)
%
% numRows               nonnegative integer
% numColumns            nonnegative integer
% numericType           numeric type string ('double','single',...)
% doSetNan              logical - if true, set outputs to NaNs rather than
%                       zeros if the numericType allows NaNs
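%
% Example (illustrative calls):
%   [A,a] = initializeMatrix(2,3,'single',true);  % A: 2x3 single NaNs, a: NaN
%   [A,a] = initializeMatrix(2,3,'int16',true);   % A: 2x3 int16 zeros, a: 0
%   [A,a] = initializeMatrix(2,3,'double',false); % A: 2x3 double zeros, a: 0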

if doSetNan && (strcmpi(numericType,'double') || ...
        strcmpi(numericType,'single'))
    A = NaN(numRows,numColumns,numericType);
    a = NaN;
else
    A = zeros(numRows,numColumns,numericType);
    a = 0;
end


function f8 = cleanUpFinalWhitespace(f8,lbfull)

% CLEANUPFINALWHITESPACE replace final whitespaces by spaces + line break
%
% f8 = cleanUpFinalWhitespace(f8,lbfull)
% with
% f8        text as uint8-vector
% lbfull    full line break characters as uint8-vector
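%
% Example (illustrative values, using CR+LF as the full line break):
%   f8 = cleanUpFinalWhitespace(uint8(['12 34' 9 32]).', uint8([13 10]));
%   % char(f8.') is ['12 34' 13 10]: trailing tab and space became CR+LF
%   f8 = cleanUpFinalWhitespace(uint8('12 34').', uint8([13 10]));
%   % char(f8.') is ['12 34' 13 10]: CR+LF was appended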

spuint   = uint8(32);   % Space (= ascii whitespace limit) as uint8
num_lbfull = numel(lbfull); 
cnt_trail_white = 0;
is_ws_at_end = true;

while is_ws_at_end  % step through the endmost characters
    if f8(end-cnt_trail_white) <= spuint        % is it a whitespace?
        cnt_trail_white = cnt_trail_white + 1;
    else
        f8(end-cnt_trail_white+1:end) = spuint;	% fill with spaces
        if cnt_trail_white >= num_lbfull
            % replace endmost space(s) by a line break:
            f8(end-num_lbfull+(1:num_lbfull))  = lbfull;    
        else
            % append a final line break:
            f8(end+(1:num_lbfull))  = lbfull;               
        end
        is_ws_at_end = false;
    end
end % while


function [f8, l, pLb, isAtEnd] = getLines(fid,minLines,varargin)

% GETLINES get a set of consecutive lines from file
% 
% [hl,numLb,posLb,isAtEnd] = getLines(fid, minLines, minChars, offset, ...
%                                  origin, inclWsAtEnd, lbfull, lenSection)
%
% fid           file identifier
% minLines      minimum number of lines to retrieve
% minChars      minimum number of characters to retrieve
%               (optional, default 0)
% offset        file position to start at, relative to origin
%               (optional, default 0)
% origin        (optional) a string whose legal values are
%               'bof'  Beginning of file
%               'cof'  Current position in file (default)
%               'eof'  End of file
% inclWsAtEnd   include a trailing line in the file that consists of white-
%               space only and that is not terminated by a line break
%               (optional, default false)
% lbfull        line termination character(s) as uint8
%               (optional, default [13,10])
% lenSection    (internal) length of a processed section
%               (optional, default 65536)
% 
% hl            the lines from the file as uint8 vector - each line is
%               terminated by a line break, even a final line that was not
%               terminated in the file
% numLb         number of lines, i.e. number of line breaks in hl
% posLb         line break positions in hl, including an added leading zero 
% isAtEnd       true if the end of hl corresponds to the end of file
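%
% Example (a sketch; 'data.txt' is a hypothetical file name and is assumed
% to contain at least three lines):
%   fid = fopen('data.txt','r');
%   [hl,numLb,posLb,isAtEnd] = getLines(fid, 3);  % read three lines
%   fclose(fid);
%   firstLine = char(hl(1:posLb(2)).');   % first line incl. its line break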

%   $Revision: 2.11 $ 

% set defaults for opt. inputs:
% minChars, offset,origin,inclWsAtEnd,lbfull,lenSection
optargs = {0, 0, 'cof', false, uint8([13 10]),  65536};

% skip any new inputs if they are empty
isEmptyArg = cellfun('isempty', varargin);

% overwrite defaults with non-empty arguments in varargin
optargs(~isEmptyArg) = varargin(~isEmptyArg);
 
% assign to variables
[minChars, offset, origin, inclWsAtEnd, lbfull, lenSection] = optargs{:};
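% Example of the default handling above (illustrative call): with
%   getLines(fid, 5, [], [], 'bof')
% the empty arguments keep minChars = 0 and offset = 0, origin is set to
% 'bof', and the remaining options (inclWsAtEnd, lbfull, lenSection) keep
% their defaults.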

% move to requested file position
if offset ~= 0 || ~strcmpi(origin,'cof')
	fseek(fid,offset,origin);
end

% get number of the file's remaining bytes from current position:
bytePos = ftell(fid);
fseek(fid, 0, 'eof');
byteEnd = ftell(fid);
fseek(fid, bytePos, 'bof');
numByte = byteEnd-bytePos;
spuint  = uint8(32);   % Space (= ascii whitespace limit) as uint8
lbuint  = lbfull(end);
lenLb   = numel(lbfull);

[f8c,nF8c]	= fread(fid,lenSection,'*uint8');	% current text section

isIn = nF8c < numByte;      % not at the end of text?
pLbc = find(f8c==lbuint); 	% current line break positions
nLbc = numel(pLbc);         % number of line breaks so far
gLb  = max([0;pLbc]);       % greatest line break position

% continue reading if not enough line breaks or characters have been read
% and if we're not at the end of the file
doRead = ((nLbc < minLines) || gLb < minChars) && isIn;

if doRead       % estimate output sizes and preallocate
     f8 = zeros(min( max(ceil(nF8c/nLbc*minLines), minChars), ...
                     numByte),1,'uint8');
     pLb = zeros(max(1+minLines+ceil(nLbc/nF8c), ...
                       ceil(nLbc/nF8c*minChars) ), 1);
end

% start to write to outputs
f8(1:nF8c,1)    = f8c;
pLb(1:nLbc+1,1) = [0;pLbc];


f = nF8c;     % counts number of characters read from fid
l = nLbc;     % counts number of line breaks
while doRead
    % continue to read
    [f8c,nF8c]	= fread(fid,lenSection,'*uint8');	% current text section
    pLbc     	= find(f8c==lbuint);	% position of current line breaks
    nLbc    	= numel(pLbc);          % number   of current line breaks

    % continue to write to outputs
    f8(f+1:f+nF8c)      = f8c;
    pLb(l+2:l+nLbc+1,1)	= f   + pLbc;

    % prepare reading next section
    if nLbc > 0                 % if there are new line breaks...
        gLb = pLb(l+nLbc+1);    % ...update greatest line break position
    end
    f       = f + nF8c;
    l       = l + nLbc;
    isIn    = f < numByte;      % end of text?
    doRead  = ((l < minLines) || (gLb < minChars) ) && isIn;
end

if ((l < minLines) || (gLb < minChars)) && f > 0 && ...
        f8(f) ~= lbuint && ...
            (  inclWsAtEnd || any(f8(pLb(l+1)+1:f) > spuint ) )
    % We're at the end of the file but do not yet have enough lines/chars.
    % Add a line break to the last line if it has none and if it has
    % at least some non-white-space characters (i.e. ignore a final line
    % without a line break that contains only white-space-chars)
    f8(f+1:f+lenLb) = lbfull;
    f          = f+lenLb;
    l          = l + 1;
    pLb(l+1,1) = f;
end

if l < minLines || pLb(1+l) < minChars
    % having read up to the end of the file, delete overallocated parts of
    % pLb and f8 
    pLb = pLb(1:l+1);
    f8  = f8(1:pLb(l+1));
    isAtEnd = true;
else
    % first line break index satisfying minLines and minChars 
    k = minLines + find( pLb(1+minLines:end) >= minChars, 1, 'first');
    % select desired parts and correct file position
    f8  = f8(1:pLb(k));
    pLb = pLb(1:k);
    l   = k-1;
    fseek(fid,pLb(k)-f,'cof');
    isAtEnd = pLb(k) >= numByte;
end
