CsvPanFactory

CsvPanFactory(self, pan_dat_factory)

Primary class for reading/writing csv files with PanDat objects. Don't create this object explicitly. A CsvPanFactory will automatically be associated with the csv attribute of the parent PanDatFactory.

create_pan_dat

CsvPanFactory.create_pan_dat(dir_path,
                             fill_missing_fields=False,
                             **kwargs)

Create a PanDat object from a directory of csv files.

:param dir_path: the directory containing the .csv files.

:param fill_missing_fields: boolean. If truthy, missing fields will be filled in with their default value. Otherwise, missing fields throw an Exception.

:param kwargs: additional named arguments to pass to pandas.read_csv

:return: a PanDat object populated by the matching tables.

caveats: Missing tables always throw an Exception. Table names are matched with case-space insensitivity, but spaces are respected for field names. (ticdat supports whitespace in field names but not table names).
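The case-space matching rule can be pictured with a small sketch. The helper names below (case_space_key, match_tables) are hypothetical. The normalization they use, lower-casing and treating spaces like underscores, is an assumption about what "case-space insensitivity" means and is not ticdat's actual code. Field names would still be compared exactly, since spaces are respected there.

```python
import os


def case_space_key(name):
    # Assumed normalization: ignore case and treat spaces like underscores.
    return name.strip().lower().replace(" ", "_")


def match_tables(schema_tables, file_names):
    """Map each schema table to a matching .csv file name, or None if nothing matches."""
    by_key = {}
    for file_name in file_names:
        stem, ext = os.path.splitext(file_name)
        if ext.lower() == ".csv":
            # Keep the first file seen for each normalized name.
            by_key.setdefault(case_space_key(stem), file_name)
    return {table: by_key.get(case_space_key(table)) for table in schema_tables}


matches = match_tables(["parts_list", "suppliers"],
                       ["Parts List.csv", "SUPPLIERS.csv", "notes.txt"])
print(matches)  # {'parts_list': 'Parts List.csv', 'suppliers': 'SUPPLIERS.csv'}
```

A None entry corresponds to the missing-table case, which create_pan_dat treats as an error for csv directories.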

Note that if you save a DataFrame to csv and then recover it, the type of the data might change. For example,

import pandas as pd
df = pd.DataFrame({"a":["100", "200", "300"]})
df.to_csv("something.csv", index=False)
df2 = pd.read_csv("something.csv")

results in a numeric column in df2. To address this, either use set_data_type for your PanDatFactory, or specify "dtype" in kwargs. (The former is preferable.)

This problem is even worse with df = pd.DataFrame({"a":["0100", "1200", "2300"]}), since the leading zeros are lost as well.
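The type change can be reproduced with pandas alone, using an in-memory buffer instead of a file. The sketch also shows the dtype fix. Passing dtype={"a": str} through the create_pan_dat kwargs should have the same effect, since those kwargs are forwarded to pandas.read_csv.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": ["0100", "1200", "2300"]})
text = df.to_csv(index=False)  # csv text carries no type information

# Default inference turns the column into integers and drops the leading zeros.
inferred = pd.read_csv(io.StringIO(text))
# An explicit dtype keeps the column as text.
preserved = pd.read_csv(io.StringIO(text), dtype={"a": str})

print(inferred["a"].tolist())   # [100, 1200, 2300]
print(preserved["a"].tolist())  # ['0100', '1200', '2300']
```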

write_directory

CsvPanFactory.write_directory(pan_dat,
                              dir_path,
                              case_space_table_names=False,
                              index=False,
                              **kwargs)

Write the PanDat data to a collection of csv files.

:param pan_dat: the PanDat object to write

:param dir_path: the directory in which to write the csv files.

:param case_space_table_names: boolean - make best guesses about how to add spaces and upper case characters to table names

:param index: boolean - whether or not to write the index.

:param kwargs: additional named arguments to pass to pandas.to_csv

:return:

caveats: The row names (index) aren't written unless index is truthy.
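Plain pandas shows why the index is omitted by default: a written index comes back as an extra, unnamed column rather than as row names. A minimal sketch with made-up data:

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["a", "b"], "Cost": [1.0, 2.0]}, index=[10, 20])

# With the index written, it is read back as an extra column named "Unnamed: 0".
with_index = pd.read_csv(io.StringIO(df.to_csv(index=True)))
# With index=False, only the real fields make the round trip.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(with_index.columns))     # ['Unnamed: 0', 'Name', 'Cost']
print(list(without_index.columns))  # ['Name', 'Cost']
```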

JsonPanFactory

JsonPanFactory(self, pan_dat_factory)

Primary class for reading/writing json data with PanDat objects. Don't create this object explicitly. A JsonPanFactory will automatically be associated with the json attribute of the parent PanDatFactory.

create_pan_dat

JsonPanFactory.create_pan_dat(path_or_buf,
                              fill_missing_fields=False,
                              orient='split',
                              **kwargs)

Create a PanDat object from a JSON file or string.

:param path_or_buf: a valid JSON string or file-like object

:param fill_missing_fields: boolean. If truthy, missing fields will be filled in with their default value. Otherwise, missing fields throw an Exception. Doesn't work with list-of-lists format.

:param orient: Indication of expected JSON string format. See pandas.read_json for more details.

:param kwargs: additional named arguments to pass to pandas.read_json

:return: a PanDat object populated by the matching tables.

caveats: Missing tables always resolve to an empty table. Table names are matched with case-space insensitivity, but spaces are respected for field names. (ticdat supports whitespace in field names but not table names).

Note that if you save a DataFrame to json and then recover it, the type of the data might change. Specifically, text that looks numeric might be recovered as a number, including the loss of leading zeros. To address this, either use set_data_type for your PanDatFactory, or specify "dtype" in kwargs. (The former is preferable.)
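A pandas-only sketch of the same pitfall for JSON, using the default "split" orient and made-up data. Whether the inferred column loses its leading zeros can depend on the pandas version, so only the explicit-dtype result is relied on here. Passing the same dtype through the create_pan_dat kwargs should behave alike, since those kwargs are forwarded to pandas.read_json.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": ["0100", "1200", "2300"]})
text = df.to_json(orient="split", index=False)

# Without a dtype, pandas may infer numbers here and drop the leading zeros.
inferred = pd.read_json(io.StringIO(text), orient="split")
# An explicit dtype keeps the column as text.
preserved = pd.read_json(io.StringIO(text), orient="split", dtype={"a": str})

print(inferred["a"].tolist())
print(preserved["a"].tolist())  # ['0100', '1200', '2300']
```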

write_file

JsonPanFactory.write_file(pan_dat, json_file_path)

Write the PanDat data to a json file (or json string). Writes each table as a list-of-lists. See write_file_pd for other formats.

:param pan_dat: the PanDat object to write

:param json_file_path: the json file into which the data is to be written. If falsey, will return a JSON string.

:return: A JSON string if json_file_path is falsey, otherwise None

write_file_pd

JsonPanFactory.write_file_pd(pan_dat,
                             json_file_path,
                             case_space_table_names=False,
                             orient='split',
                             index=False,
                             indent=2,
                             sort_keys=False,
                             **kwargs)

Write the PanDat data to a json file (or json string). Use this routine to write json text that is consistent with what pandas.to_json produces. The list-of-lists format is created with write_file. In older ticdat releases, write_file implemented the functionality now provided by write_file_pd.

:param pan_dat: the PanDat object to write

:param json_file_path: the json file into which the data is to be written. If falsey, will return a JSON string.

:param case_space_table_names: boolean - make best guesses about how to add spaces and upper case characters to table names

:param orient: Passed through to pandas.to_json. Default of "split", combined with index=False, writes a smaller json file.

:param index: boolean - whether or not to write the index.

:param indent: defaults to 2. See json.dumps

:param sort_keys: See json.dumps

:param kwargs: additional named arguments to pass to pandas.to_json

:return:
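The size claim for orient="split" with index=False can be checked directly with pandas.to_json on made-up data. The json.dumps step only illustrates what indent and sort_keys control. It is an assumption that write_file_pd applies them in a similar way.

```python
import json

import pandas as pd

df = pd.DataFrame({"Name": [f"item{i}" for i in range(100)],
                   "Cost": [float(i) for i in range(100)]})

# pandas' default DataFrame orient is "columns", which repeats every index label per column.
default_text = df.to_json()
# "split" lists the column names once, then one list per row; index=False drops the labels.
split_text = df.to_json(orient="split", index=False)
print(len(split_text) < len(default_text))  # True

# indent and sort_keys are json.dumps options; here they re-render pandas' compact output.
pretty = json.dumps(json.loads(split_text), indent=2, sort_keys=False)
print(pretty[:40])
```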

NB - pandas seems stubbornly unable to write Infinity into json, but it can read Infinity from json. We work around this by writing a GUID-based flag string in place of float("inf") and float("-inf").
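The flag-string workaround can be sketched with the standard library. This illustrates the idea and is not ticdat's implementation: the flag names and helper functions are hypothetical. A random GUID makes the flags very unlikely to collide with real data. Passing allow_nan=False to json.dumps makes it refuse a raw infinity, which is a quick way to confirm the output is strict JSON.

```python
import json
import math
import uuid

# Hypothetical flag strings, not ticdat's actual constants.
_GUID = uuid.uuid4().hex
POS_INF_FLAG = "inf_" + _GUID
NEG_INF_FLAG = "-inf_" + _GUID


def encode_infinities(rows):
    """Replace +/- infinity with flag strings so the result is strict JSON."""
    def enc(v):
        if isinstance(v, float) and math.isinf(v):
            return POS_INF_FLAG if v > 0 else NEG_INF_FLAG
        return v
    return [[enc(v) for v in row] for row in rows]


def decode_infinities(rows):
    """Turn flag strings back into float infinities after reading."""
    flags = {POS_INF_FLAG: float("inf"), NEG_INF_FLAG: float("-inf")}
    return [[flags.get(v, v) if isinstance(v, str) else v for v in row] for row in rows]


rows = [["a", 1.5], ["b", float("inf")], ["c", float("-inf")]]
text = json.dumps(encode_infinities(rows), allow_nan=False)  # a raw inf would raise ValueError
restored = decode_infinities(json.loads(text))
print(restored == rows)  # True
```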

SqlPanFactory

SqlPanFactory(self, pan_dat_factory)

Primary class for reading/writing SQLite files with PanDat objects. Don't create this object explicitly. A SqlPanFactory will automatically be associated with the sql attribute of the parent PanDatFactory.

create_pan_dat

SqlPanFactory.create_pan_dat(db_file_path,
                             con=None,
                             fill_missing_fields=False)

Create a PanDat object from a SQLite database file.

:param db_file_path: A SQLite DB file. Set to falsey if using con argument.

:param con: A connection object that can be passed to pandas.read_sql. Set to falsey if using db_file_path argument.

:param fill_missing_fields: boolean. If truthy, missing fields will be filled in with their default value. Otherwise, missing fields throw an Exception.

:return: a PanDat object populated by the matching tables.

caveats: Missing tables always resolve to an empty table, but missing fields on matching tables throw an Exception (unless fill_missing_fields is truthy). Table names are matched with case-space insensitivity, but spaces are respected for field names. (ticdat supports whitespace in field names but not table names).
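To make the con argument concrete, here is a pandas-only sketch using an in-memory SQLite database; the parts table and its fields are made up. Under the signature documented above, the same connection would be handed to ticdat as pdf.sql.create_pan_dat(None, con=con), where pdf is a hypothetical PanDatFactory. That call is not executed here.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database stands in for a real file.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE parts ("Name" TEXT, "Unit Cost" REAL)')
con.executemany("INSERT INTO parts VALUES (?, ?)", [("bolt", 0.25), ("nut", 0.1)])
con.commit()

# Any connection that pandas.read_sql accepts could be supplied as con.
df = pd.read_sql("SELECT * FROM parts", con)
print(df.columns.tolist())  # ['Name', 'Unit Cost'] (the space in the field name survives)
con.close()
```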

write_file

SqlPanFactory.write_file(pan_dat,
                         db_file_path,
                         con=None,
                         if_exists='replace',
                         case_space_table_names=False)

Write the PanDat data to a SQLite file.

:param pan_dat: the PanDat object to write

:param db_file_path: The file path of the SQLite file to create. Set to falsey if using con argument.

:param con: A connection object that can be passed to pandas.to_sql. Set to falsey if using db_file_path argument.

:param if_exists: 'fail', 'replace' or 'append'. How to behave if the table already exists.

:param case_space_table_names: boolean - make best guesses about how to add spaces and upper case characters to table names

:return:

caveats: The row names (index) aren't written. The default pandas schema generation is used, so foreign key relationships aren't written.
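The if_exists values and the missing foreign keys come from pandas.to_sql itself. This pandas-only sketch, with a made-up table, shows what each mode does against an in-memory SQLite database, assuming write_file forwards if_exists to pandas.to_sql as its parameter description suggests.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
df = pd.DataFrame({"Name": ["bolt", "nut"], "Cost": [0.25, 0.1]})

df.to_sql("parts", con, if_exists="replace", index=False)
df.to_sql("parts", con, if_exists="replace", index=False)  # drops and recreates: still 2 rows
rows_after_replace = len(pd.read_sql("SELECT * FROM parts", con))

df.to_sql("parts", con, if_exists="append", index=False)   # keeps existing rows: now 4
rows_after_append = len(pd.read_sql("SELECT * FROM parts", con))

try:
    df.to_sql("parts", con, if_exists="fail", index=False)  # table exists, so pandas refuses
    fail_raised = False
except ValueError:
    fail_raised = True

# The schema pandas generates carries no foreign key constraints.
schema = con.execute("SELECT sql FROM sqlite_master WHERE name = 'parts'").fetchone()[0]
print(rows_after_replace, rows_after_append, fail_raised)  # 2 4 True
print("FOREIGN KEY" in schema.upper())  # False
con.close()
```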

XlsPanFactory

XlsPanFactory(self, pan_dat_factory)

Primary class for reading/writing Excel files with PanDat objects. Don't create this object explicitly. An XlsPanFactory will automatically be associated with the xls attribute of the parent PanDatFactory.

create_pan_dat

XlsPanFactory.create_pan_dat(xls_file_path, fill_missing_fields=False)

Create a PanDat object from an Excel file.

:param xls_file_path: An Excel file containing sheets whose names match the table names in the schema.

:param fill_missing_fields: boolean. If truthy, missing fields will be filled in with their default value. Otherwise, missing fields throw an Exception.

:return: a PanDat object populated by the matching sheets.

caveats: Missing sheets resolve to an empty table, but missing fields on matching sheets throw an Exception (unless fill_missing_fields is truthy). Table names are matched to sheets with case-space insensitivity, but spaces and case are respected for field names. (ticdat supports whitespace in field names but not table names).

Note that if you save a DataFrame to Excel and then recover it, the type of the data might change. For example,

import pandas as pd
df = pd.DataFrame({"a":["100", "200", "300"]})
df.to_excel("something.xlsx", index=False)
df2 = pd.read_excel("something.xlsx")

results in a numeric column in df2. To address this, you need to use set_data_type for your PanDatFactory.

This problem is even worse with df = pd.DataFrame({"a":["0100", "1200", "2300"]}), since the leading zeros are lost as well.

write_file

XlsPanFactory.write_file(pan_dat,
                         file_path,
                         case_space_sheet_names=False)

Write the PanDat data to an Excel file.

:param pan_dat: the PanDat object to write

:param file_path: The file path of the excel file to create

:param case_space_sheet_names: boolean - make best guesses about how to add spaces and upper case characters to sheet names

:return:

caveats: The row names (index) aren't written.
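The case_space_sheet_names option only promises a best guess. One plausible guess is shown below as a hypothetical helper (case_space_name is not a ticdat function): it turns underscores into spaces and capitalizes each word. Because create_pan_dat matches sheet names with case-space insensitivity, such a name should still map back to its table. The matching key used here is also an assumption.

```python
def case_space_name(table_name):
    """Hypothetical best guess: underscores become spaces and each word is capitalized."""
    return " ".join(word.capitalize() for word in table_name.split("_") if word)


def case_space_key(name):
    # Assumed matching key: ignore case and treat spaces like underscores.
    return name.strip().lower().replace(" ", "_")


for table in ["parts_list", "supplier_costs"]:
    sheet = case_space_name(table)
    print(table, "->", sheet)  # e.g. parts_list -> Parts List
    assert case_space_key(sheet) == table  # the pretty name still matches the table
```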