source: file: Compression #15

DFFML is hoping to participate in Google Summer of Code (GSoC) under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/. This issue, and any others tagged `gsoc` and `project`, are not generally available bugs but are related to project ideas for GSoC.

# Project Idea: File Source Compression

**Project description:**

DFFML's initial release includes a `FileSource` which saves data to and loads data from files using the `load_fd` and `dump_fd` methods.
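The following is a minimal sketch of that split. It is an illustration under assumptions, not DFFML's code: `SketchFileSource`, `SketchJSONSource` and the plain-dict records are hypothetical stand-ins for `FileSource`, the JSON source and `Repo` objects. The base class owns opening and closing the file, while the subclass only parses and serializes an already-open file object. That separation is what lets compression be added in the base class without touching subclasses.

```python
import json
import os


class SketchFileSource:
    """Hypothetical, simplified stand-in for FileSource (not DFFML's code)."""

    def __init__(self, filename):
        self.filename = filename
        self.mem = {}

    async def _open(self):
        # Nothing to load if the path is missing or not a regular file
        if not os.path.isfile(self.filename):
            self.mem = {}
            return
        with open(self.filename, "r", encoding="utf-8") as fd:
            await self.load_fd(fd)

    async def _close(self):
        # Persist the in-memory records through the subclass's dump_fd
        with open(self.filename, "w", encoding="utf-8") as fd:
            await self.dump_fd(fd)

    async def load_fd(self, fd):
        raise NotImplementedError

    async def dump_fd(self, fd):
        raise NotImplementedError


class SketchJSONSource(SketchFileSource):
    """Only (de)serializes an already-open file; never touches paths."""

    async def load_fd(self, fd):
        self.mem = json.load(fd)

    async def dump_fd(self, fd):
        json.dump(self.mem, fd)
```

Both `_open` and `_close` are coroutines, so they would be awaited from an event loop (for example via `asyncio.run`).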

> JSON Example

https://github.com/intel/dffml/blob/dd8007d0c9f8c58c35c94faf148e2b5d6ce4c101/dffml/source/json.py#L19-L27

The changes go in the `open` method of `FileSource` (named `_open` at the linked commit):

https://github.com/intel/dffml/blob/dd8007d0c9f8c58c35c94faf148e2b5d6ce4c101/dffml/source/file.py#L36-L44

Add support for reading and writing the following compressed file formats, transparently to any source which is a subclass of `FileSource` (that is, without the subclass having to do anything):

- [x] gzip (by @yashlamba)
- [x] bz2
- [x] lzma
- [x] zip
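One way to detect the format is by filename extension. The sketch below rests on several assumptions: `.gz`, `.bz2`, `.xz`/`.lzma` and `.zip` suffixes, UTF-8 text, and, for zip, an archive that holds a single member. `open_compressed` and `STREAM_OPENERS` are hypothetical names. `gzip`, `bz2` and `lzma` each provide an `open()` that accepts text modes (`"rt"`, `"wt"`). `zipfile` is an archive format instead, so a member has to be opened inside the archive and wrapped in `io.TextIOWrapper`.

```python
import bz2
import functools
import gzip
import io
import lzma
import os
import zipfile
from contextlib import ExitStack, contextmanager

# Hypothetical mapping: extension -> opener that supports text modes
STREAM_OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
    # Legacy .lzma container instead of lzma's default .xz container
    ".lzma": functools.partial(lzma.open, format=lzma.FORMAT_ALONE),
}


@contextmanager
def open_compressed(filename, mode="r"):
    """Yield a UTF-8 text file object for filename; mode is "r" or "w"."""
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    ext = os.path.splitext(filename)[1]
    with ExitStack() as stack:
        if ext in STREAM_OPENERS:
            # "rt"/"wt": the opener decodes/encodes text for us
            fd = stack.enter_context(
                STREAM_OPENERS[ext](filename, mode + "t", encoding="utf-8")
            )
        elif ext == ".zip":
            archive = stack.enter_context(
                zipfile.ZipFile(filename, mode, compression=zipfile.ZIP_DEFLATED)
            )
            if mode == "r":
                # Assumption: the data lives in the archive's first member
                member = archive.namelist()[0]
            else:
                # Assumption: name the member after the archive,
                # e.g. "repos.json.zip" -> "repos.json"
                member = os.path.basename(filename)[: -len(ext)]
            # Entered after the archive, so it is closed before the archive
            fd = stack.enter_context(
                io.TextIOWrapper(archive.open(member, mode), encoding="utf-8")
            )
        else:
            # Unrecognized extension: treat as a plain text file
            fd = stack.enter_context(open(filename, mode, encoding="utf-8"))
        yield fd
```

With a helper like this, `FileSource` could swap its plain `open(self.filename, 'r')` call for `open_compressed(self.filename, 'r')` and pass the result to `load_fd` unchanged. The `ExitStack` closes the zip member before the archive itself, which matters in write mode because `ZipFile.close()` is what writes the archive's central directory.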

**Skills:** Python, git
**Difficulty level:** Easy

**Related Readings/Links:**

See https://docs.python.org/3/library/archiving.html for documentation on the standard library's data compression and archiving modules.

**Potential mentors:** @pdxjohnny

**Getting Started:** Figure out how to support one of the file types first, probably gzip (which may be as simple as using https://docs.python.org/3/library/gzip.html#gzip.GzipFile when the filename ends in `.gz`), then move on to the rest. For now, make modifications directly to the `FileSource` class. We may have you split the logic out later, but don't worry about a separate class for now.
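One detail is worth knowing before starting. `gzip.GzipFile` reads and writes bytes, but the JSON source's `dump_fd` writes `str` via `json.dump`, so the compressed stream needs a text layer on top. The sketch below assumes UTF-8 content, and `open_text` is a hypothetical helper name.

```python
import gzip
import io


def open_text(filename, mode="r"):
    """Open filename as text, (de)compressing transparently if it ends in .gz."""
    if filename.endswith(".gz"):
        # GzipFile only handles bytes; TextIOWrapper converts str <-> bytes
        raw = gzip.GzipFile(filename, mode + "b")
        return io.TextIOWrapper(raw, encoding="utf-8")
    # Anything else is opened as an ordinary text file
    return open(filename, mode, encoding="utf-8")
```

`gzip.open(filename, 'rt')` performs the same `GzipFile` plus `TextIOWrapper` wrapping in a single call, so either form would work inside `FileSource`.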

**What we want to see in your application:** Describe how you intend to solve the problem and give us some "stretch goals", for example implementing a remote file source which reads from URLs. Don't forget to include time for writing appropriate tests.
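For the remote-source stretch goal, one possible starting point is to fetch the URL and hand `load_fd` an in-memory text file. This sketch makes several assumptions: `open_url_text` is a hypothetical name, the whole body is read into memory, and compression is detected only from a `.gz` suffix on the URL path. `urllib.request.urlopen` also accepts `file://` URLs, so a source built this way can be tested without a network.

```python
import gzip
import io
import urllib.parse
import urllib.request


def open_url_text(url, encoding="utf-8"):
    """Fetch url and return a text file object, gunzipping .gz paths.

    Reads the whole body into memory: fine for a sketch, not for huge files.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()
    # Decide on decompression from the path part of the URL only
    if urllib.parse.urlparse(url).path.endswith(".gz"):
        data = gzip.decompress(data)
    return io.StringIO(data.decode(encoding))
```

Because the returned object behaves like an open text file, an existing `load_fd` could consume it unchanged.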

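```python
# dffml/source/json.py, lines 19-27 at commit dd8007d (the JSON example above)
async def load_fd(self, fd):
    # Parse the whole file and build a Repo for each top-level key
    repos = json.load(fd)
    self.mem = {src_url: Repo(src_url, data=data)
                for src_url, data in repos.items()}
    LOGGER.debug('%r loaded %d records', self, len(self.mem))

async def dump_fd(self, fd):
    # Serialize every in-memory Repo back into a single JSON object
    json.dump({repo.src_url: repo.dict() for repo in self.mem.values()}, fd)
    LOGGER.debug('%r saved %d records', self, len(self.mem))
```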

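```python
# dffml/source/file.py, lines 36-44 at commit dd8007d (FileSource._open)
async def _open(self):
    # A missing path or a directory means there is nothing to load yet
    if not os.path.exists(self.filename) \
            or os.path.isdir(self.filename):
        LOGGER.debug('%r is not a file, initializing memory to empty dict',
                     self.filename)
        self.mem = {}
        return
    # Plain text open: this is where transparent decompression would hook in
    with open(self.filename, 'r') as fd:
        await self.load_fd(fd)
```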
