doc/source/io.rst (+33 -18)
@@ -1001,7 +1001,7 @@ Objects can be written to the file just like adding key-value pairs to a dict:
1001
1001
store['wp'] = wp
1002
1002
1003
1003
# the type of stored data
1004
-
store.handle.root.wp._v_attrs.pandas_type
1004
+
store.root.wp._v_attrs.pandas_type
1005
1005
1006
1006
store
1007
1007
@@ -1037,8 +1037,7 @@ Storing in Table format
1037
1037
1038
1038
``HDFStore`` supports another ``PyTables`` format on disk, the ``table`` format. Conceptually a ``table`` is shaped
1039
1039
very much like a DataFrame, with rows and columns. A ``table`` may be appended to in the same or other sessions.
1040
-
In addition, delete & query type operations are supported. You can create an index with ``create_table_index``
1041
-
after data is already in the table (this may become automatic in the future or an option on appending/putting a ``table``).
1040
+
In addition, delete & query type operations are supported.
1042
1041
1043
1042
.. ipython:: python
1044
1043
:suppress:
@@ -1061,11 +1060,7 @@ after data is already in the table (this may become automatic in the future or a
1061
1060
store.select('df')
1062
1061
1063
1062
# the type of stored data
1064
-
store.handle.root.df._v_attrs.pandas_type
1065
-
1066
-
# create an index
1067
-
store.create_table_index('df')
1068
-
store.handle.root.df.table
1063
+
store.root.df._v_attrs.pandas_type
1069
1064
1070
1065
Hierarchical Keys
1071
1066
~~~~~~~~~~~~~~~~~
@@ -1090,8 +1085,7 @@ Storing Mixed Types in a Table
1090
1085
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1091
1086
1092
1087
Storing mixed-dtype data is supported. Strings are stored as fixed-width using the maximum size of the appended column. Subsequent appends will truncate strings at this length.
1093
-
Passing ``min_itemsize = { column_name : size }`` as a paremeter to append will set a larger minimum for the column. Storing ``floats, strings, ints, bools`` are currently supported.
1094
-
Pass ``min_itemsize`` with a ``column_name`` of values to effect a minimum pre-allocation of space for strings in the dataset.
1088
+
Passing ``min_itemsize = {'values': size}`` as a parameter to ``append`` will set a larger minimum for the string columns. Storing ``floats, strings, ints, bools`` is currently supported.
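As a rough illustration of how a suitable ``size`` might be chosen for the ``min_itemsize`` argument described above, the sketch below takes the longest string in a sample and adds some headroom for later appends. The helper ``suggest_min_itemsize``, its ``headroom`` parameter, and the ``df_mixed`` name in the comment are hypothetical and not part of pandas; measuring length in UTF-8 bytes is also an assumption.

```python
def suggest_min_itemsize(strings, headroom=0):
    """Return a dict shaped like the min_itemsize argument (hypothetical helper).

    strings  : iterable of str values expected in the string columns
    headroom : extra width to reserve for longer strings in later appends
    """
    # Assumption: width is measured in UTF-8 encoded bytes.
    longest = max((len(s.encode("utf-8")) for s in strings), default=0)
    return {"values": longest + headroom}


sample = ["foo", "bar", "a much longer string"]
min_itemsize = suggest_min_itemsize(sample, headroom=10)

# The result could then be passed to append, along the lines of:
#   store.append('df_mixed', df_mixed, min_itemsize=min_itemsize)
```

Reserving headroom on the first append matters because, as noted above, later appends truncate strings at the width fixed when the table was created.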
1095
1089
1096
1090
.. ipython:: python
1097
1091
@@ -1100,11 +1094,14 @@ Pass ``min_itemsize`` with a ``column_name`` of values to effect a minimum pre-a
You can create an index for a table with ``create_table_index`` after data is already in the table (after an ``append/put`` operation). Creating a table index is **highly** encouraged. This will speed up your queries a great deal when you use a ``select`` with the indexed dimension as the ``where``. It is not automagically done now because you may want to index different axes than the default (except in the case of a DataFrame, where it almost always makes sense to index the ``index``).
- You cannot append to, select from, or delete from a non-table (table creation is determined on the first append, or by passing ``table=True`` in a put operation).
1153
1167
- ``HDFStore`` is **not thread-safe for writing**. The underlying ``PyTables`` only supports concurrent reads (via threading or processes). If you need reading and writing *at the same time*, you need to serialize these operations in a single thread in a single process. You will corrupt your data otherwise. See the issue <https://github.com/pydata/pandas/issues/2397> for more information.
1154
1168
1155
-
- ``PyTables`` only supports fixed-width string columns in ``tables``. The sizes of a string based indexing column (e.g. *column* or *minor_axis*) are determined as the maximum size of the elements in that axis or by passing the parameter ``min_itemsize`` on the first table creation (``min_itemsize`` can be an integer or a dict of column name to an integer). If subsequent appends introduce elements in the indexing axis that are larger than the supported indexer, an Exception will be raised (otherwise you could have a silent truncation of these indexers, leading to loss of information).
1169
+
- ``PyTables`` only supports fixed-width string columns in ``tables``. The sizes of a string based indexing column (e.g. *columns* or *minor_axis*) are determined as the maximum size of the elements in that axis or by passing the parameter ``min_itemsize`` on the first table creation (``min_itemsize`` can be an integer or a dict of column name to an integer). If subsequent appends introduce elements in the indexing axis that are larger than the supported indexer, an Exception will be raised (otherwise you could have a silent truncation of these indexers, leading to loss of information). Just to be clear, this fixed-width restriction applies to **indexables** (the indexing columns) and **string values** in a mixed_type table.
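One way to serialize store operations in a single thread, as the thread-safety note above recommends, is to hand every operation to one dedicated worker thread through a queue. The following is a minimal sketch using only the standard library. The class ``SerializedStoreWorker`` is hypothetical, and a plain dict stands in for an open ``HDFStore`` so the example runs without ``PyTables``. It covers a single process only and does nothing about concurrent access from other processes.

```python
import queue
import threading


class SerializedStoreWorker:
    """Run all submitted operations on one dedicated thread (illustrative sketch)."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel: stop the worker
                break
            func, args, done, result = item
            try:
                result["value"] = func(*args)
            except Exception as exc:  # hand the error back to the caller
                result["error"] = exc
            finally:
                done.set()

    def submit(self, func, *args):
        """Queue func(*args) and block until the worker thread has run it."""
        done, result = threading.Event(), {}
        self._tasks.put((func, args, done, result))
        done.wait()
        if "error" in result:
            raise result["error"]
        return result.get("value")

    def close(self):
        self._tasks.put(None)
        self._thread.join()


fake_store = {}          # stand-in for an open HDFStore
writer_threads = set()   # records which thread(s) actually touched the store


def write(key, value):
    writer_threads.add(threading.current_thread().name)
    fake_store[key] = value


worker = SerializedStoreWorker()
callers = [
    threading.Thread(target=worker.submit, args=(write, "k%d" % i, i))
    for i in range(5)
]
for t in callers:
    t.start()
for t in callers:
    t.join()
worker.close()
```

Every caller blocks in ``submit`` until its operation has finished, so reads and writes never overlap even though they are requested from several threads.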
wp = wp.rename_axis(lambda x: x + '_big_strings', axis=2)
1161
1175
store.append('wp_big_strings', wp)
1162
1176
store.select('wp_big_strings')
1163
1177
1178
+
# we have provided a minimum minor_axis indexable size
1179
+
store.root.wp_big_strings.table
1180
+
1164
1181
Compatibility
1165
1182
~~~~~~~~~~~~~
1166
1183
1167
1184
Version 0.10 of ``HDFStore`` is backwards compatible for reading tables created in a prior version of pandas;
1168
-
however, query terms using the prior (undocumented) methodology are unsupported. You must read in the entire
1169
-
file and write it out using the new format to take advantage of the updates.
1185
+
however, query terms using the prior (undocumented) methodology are unsupported. ``HDFStore`` will issue a warning if you try to use a prior-version format file. You must read in the entire
1186
+
file and write it out using the new format to take advantage of the updates. The group attribute ``pandas_version`` contains the version information.
1170
1187
1171
1188
1172
1189
Performance
1173
1190
~~~~~~~~~~~
1174
1191
1175
-
- ``Tables`` come with a performance penalty as compared to regular stores. The benefit is the ability to append/delete and query (potentially very large amounts of data).
1192
+
- ``Tables`` come with a writing performance penalty as compared to regular stores. The benefit is the ability to append/delete and query (potentially very large amounts of data).
1176
1193
Write times are generally longer as compared with regular stores. Query times can be quite fast, especially on an indexed axis.
1177
1194
- ``Tables`` can (as of 0.10.0) be expressed as different types.
1178
1195
1179
1196
- ``AppendableTable`` which is a similar table to past versions (this is the default).
1180
1197
- ``WORMTable`` (pending implementation) - is available to facilitate very fast writing of tables that are also queryable (but CANNOT support appends)
1181
1198
1182
1199
- To delete a lot of data, it is sometimes better to erase the table and rewrite it. ``PyTables`` tends to increase the file size with deletions
1183
-
- In general it is best to store Panels with the most frequently selected dimension in the minor axis and a time/date like dimension in the major axis, but this is not required. Panels can have any major_axis and minor_axis type that is a valid Panel indexer.
1184
-
- No dimensions are currently indexed automagically (in the ``PyTables`` sense); these require an explict call to ``create_table_index``
1185
1200
- ``Tables`` offer better performance when compressed after writing them (as opposed to turning on compression at the very beginning).
1186
1201
Use the ``PyTables`` utility ``ptrepack`` to rewrite the file (it can also change the compression method).
1187
1202
- Duplicate rows can be written, but are filtered out in selection (with the last items being selected; thus a table is unique on major, minor pairs)
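The duplicate-row behaviour described in the last item can be pictured with a small pure-Python sketch. This only mirrors the stated semantics (the last row written for each major, minor pair wins) and is not how pandas or ``PyTables`` implement selection; the function ``select_unique`` and the sample rows are made up for illustration.

```python
rows = [
    ("2000-01-01", "A", 1.0),
    ("2000-01-01", "B", 2.0),
    ("2000-01-01", "A", 3.0),  # same (major, minor) pair, written later
]


def select_unique(rows):
    """Keep only the last value written for each (major, minor) pair."""
    latest = {}
    for major, minor, value in rows:
        latest[(major, minor)] = value  # later rows overwrite earlier ones
    return [(major, minor, value) for (major, minor), value in latest.items()]


selected = select_unique(rows)
```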