CsvPanFactory
CsvPanFactory(self, pan_dat_factory)
Primary class for reading/writing csv files with PanDat objects. Don't create this object explicitly. A CsvPanFactory will automatically be associated with the csv attribute of the parent PanDatFactory.
create_pan_dat
CsvPanFactory.create_pan_dat(dir_path,
fill_missing_fields=False,
**kwargs)
Create a PanDat object from a directory of csv files.
:param dir_path: the directory containing the .csv files.
:param fill_missing_fields: boolean. If truthy, missing fields will be filled in with their default value. Otherwise, missing fields throw an Exception.
:param kwargs: additional named arguments to pass to pandas.read_csv
:return: a PanDat object populated by the matching tables.
caveats: Missing tables always throw an Exception. Table names are matched with case-space insensitivity, but spaces are respected for field names. (ticdat supports whitespace in field names but not table names).
Note that if you save a DataFrame to csv and then recover it, the type of data might change. For example
df = pd.DataFrame({"a":["100", "200", "300"]})
df.to_csv("something.csv")
df2 = pd.read_csv("something.csv")
results in a numeric column in df2. To address this, you need to either use set_data_type for your PanDatFactory, or specify "dtype" in kwargs. (The former is obviously better).
This problem is even worse with df = pd.DataFrame({"a":["0100", "1200", "2300"]}), because the leading zero in "0100" is also lost.
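The type change described above can be reproduced with pandas alone. The following is a minimal sketch that calls pandas.read_csv directly rather than going through CsvPanFactory, and it uses an in-memory buffer in place of a csv file; the variable names are illustrative only. It shows the effect of the "dtype" argument that could be supplied via kwargs.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": ["0100", "1200", "2300"]})
buf = io.StringIO()
df.to_csv(buf, index=False)

# Default type inference reads the text column as integers,
# so the leading zero of "0100" is dropped.
inferred = pd.read_csv(io.StringIO(buf.getvalue()))

# Supplying dtype (as one might through **kwargs) keeps the column as text.
as_text = pd.read_csv(io.StringIO(buf.getvalue()), dtype={"a": str})
```

Using set_data_type on the PanDatFactory remains the approach recommended above; the dtype argument is shown here only because it can be demonstrated without a schema.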
write_directory
CsvPanFactory.write_directory(pan_dat,
dir_path,
case_space_table_names=False,
index=False,
**kwargs)
write the PanDat data to a collection of csv files
:param pan_dat: the PanDat object to write
:param dir_path: the directory in which to write the csv files.
:param case_space_table_names: boolean - make best guesses how to add spaces and upper case characters to table names
:param index: boolean - whether or not to write the index.
:param kwargs: additional named arguments to pass to pandas.DataFrame.to_csv
:return:
caveats: The row names (index) aren't written (unless kwargs indicates they should be).
JsonPanFactory
JsonPanFactory(self, pan_dat_factory)
Primary class for reading/writing json data with PanDat objects. Don't create this object explicitly. A JsonPanFactory will automatically be associated with the json attribute of the parent PanDatFactory.
create_pan_dat
JsonPanFactory.create_pan_dat(path_or_buf,
fill_missing_fields=False,
orient='split',
**kwargs)
Create a PanDat object from a JSON file or string
:param path_or_buf: a valid JSON string or file-like
:param fill_missing_fields: boolean. If truthy, missing fields will be filled in with their default value. Otherwise, missing fields throw an Exception. Doesn't work with list-of-lists format.
:param orient: Indication of expected JSON string format. See pandas.read_json for more details.
:param kwargs: additional named arguments to pass to pandas.read_json
:return: a PanDat object populated by the matching tables.
caveats: Missing tables always resolve to an empty table.
Table names are matched with case-space insensitivity, but spaces
are respected for field names.
(ticdat supports whitespace in field names but not table names).
Note that if you save a DataFrame to json and then recover it, the type of data might change. Specifically, text that looks numeric might be recovered as a number, to include the loss of leading zeros. To address this, you need to either use set_data_type for your PanDatFactory, or specify "dtype" in kwargs. (The former is obviously better).
write_file
JsonPanFactory.write_file(pan_dat, json_file_path)
Write the PanDat data to a json file (or json string). Writes each table as a list-of-lists. See write_file_pd for other formats.
:param pan_dat: the PanDat object to write
:param json_file_path: the json file into which the data is to be written. If falsey, will return a JSON string
:return: A JSON string if json_file_path is falsey, otherwise None
write_file_pd
JsonPanFactory.write_file_pd(pan_dat,
json_file_path,
case_space_table_names=False,
orient='split',
index=False,
indent=2,
sort_keys=False,
**kwargs)
Write the PanDat data to a json file (or json string). Use this routine to write json text that is consistent with what pandas.DataFrame.to_json produces. The list-of-lists format is created with write_file. In older ticdat releases, write_file implemented the functionality now provided by write_file_pd.
:param pan_dat: the PanDat object to write
:param json_file_path: the json file into which the data is to be written. If falsey, will return a JSON string
:param case_space_table_names: boolean - make best guesses how to add spaces and upper case characters to table names
:param orient: Passed through to pandas.DataFrame.to_json. Default of "split", combined with index=False, writes a smaller json file.
:param index: boolean - whether or not to write the index.
:param indent: See json.dumps. Defaults to 2.
:param sort_keys: See json.dumps
:param kwargs: additional named arguments to pass to pandas.DataFrame.to_json
:return:
NB - pandas seems stubbornly unable to inject Infinity into json, but it can read Infinity from json. We work around this with a GUID-based flagging string that is substituted when encountering float("inf") or float("-inf").
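The flag-and-replace idea in the note above might look roughly like the following. This is a sketch under stated assumptions, not ticdat's actual implementation: the flag strings, the swap helper and the variable names are all hypothetical, and the real routine may differ in its details.

```python
import json
import uuid

import pandas as pd

df = pd.DataFrame({"a": [1.0, float("inf"), float("-inf")]})

# Hypothetical flag strings, made unique with a GUID so they cannot
# collide with genuine text in the data.
guid = uuid.uuid4().hex
pos_flag, neg_flag = "inf_" + guid, "neg_inf_" + guid


def swap(value):
    """Replace +/- infinity with the corresponding flag string."""
    if value == float("inf"):
        return pos_flag
    if value == float("-inf"):
        return neg_flag
    return value


flagged = pd.DataFrame({col: [swap(v) for v in df[col]] for col in df.columns})
text = flagged.to_json(orient="split", index=False)

# The flags were written as quoted strings; swap them for bare Infinity tokens.
text = text.replace(f'"{pos_flag}"', "Infinity").replace(f'"{neg_flag}"', "-Infinity")

# Python's json module accepts Infinity and -Infinity when reading.
parsed = json.loads(text)
```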
SqlPanFactory
SqlPanFactory(self, pan_dat_factory)
Primary class for reading/writing SQLite files with PanDat objects. Don't create this object explicitly. A SqlPanFactory will automatically be associated with the sql attribute of the parent PanDatFactory.
create_pan_dat
SqlPanFactory.create_pan_dat(db_file_path,
con=None,
fill_missing_fields=False)
Create a PanDat object from a SQLite database file
:param db_file_path: A SQLite DB File. Set to falsey if using con argument
:param con: A connection object that can be passed to pandas read_sql. Set to falsey if using db_file_path argument.
:param fill_missing_fields: boolean. If truthy, missing fields will be filled in with their default value. Otherwise, missing fields throw an Exception.
:return: a PanDat object populated by the matching tables.
caveats: Missing tables always resolve to an empty table, but missing fields on matching tables throw an exception (unless fill_missing_fields is truthy).
Table names are matched with case-space insensitivity, but spaces
are respected for field names.
(ticdat supports whitespace in field names but not table names).
write_file
SqlPanFactory.write_file(pan_dat,
db_file_path,
con=None,
if_exists='replace',
case_space_table_names=False)
Write the PanDat data to a SQLite database file
:param pan_dat: the PanDat object to write
:param db_file_path: The file path of the SQLite file to create. Set to falsey if using con argument.
:param con: A connection object that can be passed to pandas to_sql. Set to falsey if using db_file_path argument
:param if_exists: 'fail', 'replace' or 'append'. How to behave if the table already exists
:param case_space_table_names: boolean - make best guesses how to add spaces and upper case characters to table names
:return:
caveats: The row names (index) aren't written. The default pandas schema generation is used, and thus foreign key relationships aren't written.
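As an illustration of what a connection object and the if_exists options mean at the pandas level, here is a minimal sketch that uses pandas.DataFrame.to_sql and pandas.read_sql directly with an in-memory SQLite connection. It does not call SqlPanFactory, and the table and column names are made up for the example.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite connection; the same kind of object could be
# supplied as the con argument in place of a database file path.
con = sqlite3.connect(":memory:")

df = pd.DataFrame({"name": ["a", "b"], "qty": [1, 2]})

# 'replace' drops any existing table first; the index is not written.
df.to_sql("parts", con, if_exists="replace", index=False)
after_replace = pd.read_sql("SELECT * FROM parts", con)

# 'append' adds rows to the existing table instead.
df.to_sql("parts", con, if_exists="append", index=False)
after_append = pd.read_sql("SELECT * FROM parts", con)

con.close()
```

With if_exists='fail', the second to_sql call would instead raise an error because the table already exists.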
XlsPanFactory
XlsPanFactory(self, pan_dat_factory)
Primary class for reading/writing Excel files with PanDat objects. Don't create this object explicitly. An XlsPanFactory will automatically be associated with the xls attribute of the parent PanDatFactory.
create_pan_dat
XlsPanFactory.create_pan_dat(xls_file_path, fill_missing_fields=False)
Create a PanDat object from an Excel file
:param xls_file_path: An Excel file containing sheets whose names match the table names in the schema.
:param fill_missing_fields: boolean. If truthy, missing fields will be filled in with their default value. Otherwise, missing fields throw an Exception.
:return: a PanDat object populated by the matching sheets.
caveats: Missing sheets resolve to an empty table, but missing fields on matching sheets throw an Exception (unless fill_missing_fields is truthy). Table names are matched to sheets with case-space insensitivity, but spaces and case are respected for field names. (ticdat supports whitespace in field names but not table names).
Note that if you save a DataFrame to excel and then recover it, the type of data might change. For example
df = pd.DataFrame({"a":["100", "200", "300"]})
df.to_excel("something.xlsx")
df2 = pd.read_excel("something.xlsx")
results in a numeric column in df2. To address this, you need to use set_data_type for your PanDatFactory.
This problem is even worse with df = pd.DataFrame({"a":["0100", "1200", "2300"]}), because the leading zero in "0100" is also lost.
write_file
XlsPanFactory.write_file(pan_dat,
file_path,
case_space_sheet_names=False)
Write the PanDat data to an Excel file
:param pan_dat: the PanDat object to write
:param file_path: The file path of the excel file to create
:param case_space_sheet_names: boolean - make best guesses how to add spaces and upper case characters to sheet names
:return:
caveats: The row names (index) aren't written.