CsvTicFactory

CsvTicFactory(self, tic_dat_factory)

Primary class for reading and writing csv files with TicDat objects. Your system will need the csv package if you want to use this class. Don't create this object explicitly. A CsvTicFactory will automatically be associated with the csv attribute of the parent TicDatFactory.

create_tic_dat

CsvTicFactory.create_tic_dat(dir_path,
                             dialect='excel',
                             headers_present=True,
                             freeze_it=False,
                             encoding=None)

Create a TicDat object from the csv files in a directory.

:param dir_path: the directory containing the .csv files.

:param dialect: the csv dialect. Consult csv documentation for details.

:param headers_present: Boolean. Does the first row of data contain the column headers?

:param freeze_it: Boolean. Should the returned object be frozen?

:param encoding: see the documentation for Python's built-in open function.

:return: a TicDat object populated by the matching files.

caveats: Missing files resolve to an empty table, but missing fields on matching files raise an Exception. By default, data field values (but not primary key values) are coerced into floats where possible. This includes coercing "inf" and "-inf" into +/- float("inf"). Which fields are and are not coerced can be controlled via hints from the default values or data types. Default values and data types can also be used to control whether the empty string is coerced into None. The infinity_io_flag rules are applied after this coercion.
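The default coercion described in the caveats can be pictured with a small standard-library sketch. This is not ticdat's implementation: the helper name coerce_data_value and its empty_to_none flag are hypothetical, and the sketch covers only the float and empty-string rules, not the infinity_io_flag step.

```python
def coerce_data_value(raw, empty_to_none=False):
    """Illustrative only: mimic the default data-field coercion.

    coerce_data_value and empty_to_none are hypothetical names,
    not part of the ticdat API.
    """
    if raw == "" and empty_to_none:
        return None
    try:
        # float() also parses "inf" and "-inf" into infinities.
        return float(raw)
    except ValueError:
        # Text that is not numeric is left unchanged.
        return raw


row = ["1.5", "inf", "-inf", "apple", ""]
coerced = [coerce_data_value(value) for value in row]
```

In this sketch the empty string stays a string unless empty_to_none is set, which loosely mirrors how default values and data types steer the coercion into None.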

Note - pandas doesn't handle date types especially well either, since it coerces every value in a column to the same type. If that's the behavior you want, you can use PanDatFactory.

JSON is a better file format than csv for a variety of reasons, including its typing of data.

find_duplicates

CsvTicFactory.find_duplicates(dir_path,
                              dialect='excel',
                              headers_present=True,
                              encoding=None)

Find the row counts for duplicated rows.

:param dir_path: the directory containing .csv files.

:param dialect: the csv dialect. Consult csv documentation for details.

:param headers_present: Boolean. Does the first row of data contain the column headers?

:param encoding: see the documentation for Python's built-in open function.

:return: A dictionary whose keys are the table names for the primary key tables. Each value of the return dictionary is itself a dictionary. The inner dictionary is keyed by the primary key values encountered in the table, and the value is the count of records in the csv file with this primary key. Row counts smaller than 2 are pruned off, as they aren't duplicates.

caveats: Missing files resolve to an empty table, but missing fields (data or primary key) on matching files raise an Exception.
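The shape of the inner dictionary can be sketched for a single table with the standard library. This is a minimal illustration, not the ticdat code path: count_duplicate_keys is a hypothetical helper, it reads csv text from memory rather than from a directory, and reporting single-field keys as scalars is an assumption made here for readability.

```python
import csv
import io
from collections import Counter


def count_duplicate_keys(csv_text, key_fields):
    """Return {primary key: row count} for keys seen at least twice.

    Hypothetical helper for illustration; not part of ticdat.
    """
    reader = csv.DictReader(io.StringIO(csv_text), dialect="excel")
    # A missing key field surfaces as a KeyError on the first row.
    counts = Counter(tuple(row[f] for f in key_fields) for row in reader)
    # Prune counts below 2, since those keys are not duplicated.
    return {
        (key[0] if len(key) == 1 else key): n
        for key, n in counts.items()
        if n >= 2
    }


sample = "name,cost\napple,1\npear,2\napple,3\n"
duplicates = count_duplicate_keys(sample, ["name"])
```

Here only "apple" survives the pruning, because "pear" appears once.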

write_directory

CsvTicFactory.write_directory(tic_dat,
                              dir_path,
                              allow_overwrite=False,
                              dialect='excel',
                              write_header=True,
                              case_space_table_names=False)

Write the TicDat data to a collection of csv files.

:param tic_dat: the data object

:param dir_path: the directory in which to write the csv files

:param allow_overwrite: Boolean. Are we allowed to overwrite existing files?

:param dialect: the csv dialect. Consult csv documentation for details.

:param write_header: Boolean. Should the header information be written as the first row?

:param case_space_table_names: Boolean. Make best guesses about how to add spaces and upper case characters to table names.

:return:
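The roles of allow_overwrite and write_header can be illustrated with a standard-library sketch. It is not ticdat's implementation: write_tables and its input layout are hypothetical, and naming each file after its table with a .csv suffix is an assumption.

```python
import csv
import os
import tempfile


def write_tables(tables, dir_path, allow_overwrite=False, write_header=True):
    """Write {table name: (field names, rows)} as one csv file per table.

    Hypothetical helper for illustration; not part of ticdat.
    """
    os.makedirs(dir_path, exist_ok=True)
    for name, (fields, rows) in tables.items():
        path = os.path.join(dir_path, name + ".csv")
        # Refuse to clobber an existing file unless explicitly allowed.
        if os.path.exists(path) and not allow_overwrite:
            raise FileExistsError(path)
        with open(path, "w", newline="") as f:
            writer = csv.writer(f, dialect="excel")
            if write_header:
                writer.writerow(fields)
            writer.writerows(rows)


tables = {"foods": (["name", "cost"], [["apple", 1], ["pear", 2]])}
out_dir = tempfile.mkdtemp()
write_tables(tables, out_dir)
with open(os.path.join(out_dir, "foods.csv"), newline="") as f:
    written = list(csv.reader(f))
```

Reading the file back returns every value as a string, which is the typing limitation of csv that the coercion rules in create_tic_dat work around.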