Add emldir as source type #760

Open
deajan opened this issue Oct 26, 2024 · 0 comments
Labels
🎁 feature request Not existing yet and need to be implemented 🙏 help wanted I can't do this alone and need contributors

deajan commented Oct 26, 2024

Is your feature request related to a problem? Please describe.

I've got three folders containing a total of 590k .eml files that I exported from Outlook PST/OST files.
I'd like to archive them, but first I want to deduplicate them.

I found your tool and ran the following command:

/mnt/venv/bin/mdedup -b raw --action move-selected --strategy select-biggest --export /mnt/DEDUPED /mnt/EML /mnt/EML-OST /mnt/EML-PST

mdedup failed with the following traceback:

Traceback (most recent call last):
  File "/mnt/venv/bin/mdedup", line 8, in <module>
    sys.exit(main())
  File "/mnt/venv/lib64/python3.9/site-packages/mail_deduplicate/__main__.py", line 45, in main
    mdedup(prog_name=mdedup.name)
  File "/mnt/venv/lib64/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/mnt/venv/lib64/python3.9/site-packages/click_extra/commands.py", line 347, in main
    return super().main(*args, **kwargs)
  File "/mnt/venv/lib64/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/mnt/venv/lib64/python3.9/site-packages/click_extra/commands.py", line 377, in invoke
    return super().invoke(ctx)
  File "/mnt/venv/lib64/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/mnt/venv/lib64/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/mnt/venv/lib64/python3.9/site-packages/cloup/_context.py", line 47, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/mnt/venv/lib64/python3.9/site-packages/mail_deduplicate/cli.py", line 408, in mdedup
    dedup.add_source(source)
  File "/mnt/venv/lib64/python3.9/site-packages/mail_deduplicate/deduplicate.py", line 379, in add_source
    boxes = open_box(path, self.conf.input_format, self.conf.force_unlock)
  File "/mnt/venv/lib64/python3.9/site-packages/mail_deduplicate/mail_box.py", line 163, in open_box
    box_type = autodetect_box_type(path)
  File "/mnt/venv/lib64/python3.9/site-packages/mail_deduplicate/mail_box.py", line 134, in autodetect_box_type
    raise ValueError(msg)
ValueError: Missing sub-directory 'cur'

I had to manually create the {tmp,cur,new} sub-directories in each of my EML directories and move all emails into cur. That is painful, especially when the program's first "greeting" is an obscure error.
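For anyone hitting the same wall, the manual workaround described above can be scripted. This is a minimal sketch, not part of mdedup: the helper name convert_to_maildir is hypothetical, it only moves top-level *.eml files (matched case-sensitively on Linux), and it keeps the original file names rather than generating maildir-style unique names.

```python
import shutil
from pathlib import Path


def convert_to_maildir(eml_dir: str) -> int:
    """Turn a flat folder of .eml files into a minimal maildir layout.

    Hypothetical helper: creates tmp/, cur/ and new/ inside eml_dir, then
    moves every top-level *.eml file into cur/. Returns how many were moved.
    """
    root = Path(eml_dir)
    for sub in ("tmp", "cur", "new"):
        (root / sub).mkdir(exist_ok=True)

    moved = 0
    # sorted() materializes the glob first, so moving files while looping is safe.
    # Only top-level files are handled; nested folders are left untouched.
    for eml in sorted(root.glob("*.eml")):
        if eml.is_file():
            shutil.move(str(eml), str(root / "cur" / eml.name))
            moved += 1
    return moved
```

Running convert_to_maildir("/mnt/EML") (and likewise for the other two folders) before invoking mdedup reproduces the manual steps; it does not fix the underlying detection problem.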

Describe the solution you'd like

The detection logic in autodetect_box_type is somewhat flawed.
Raising a hard error just because the target is a directory without maildir sub-directories is too strict.
At the very least, the error message should explain which input layouts are expected.
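One way the detection could behave is sketched below: check for a maildir first, fall back to a folder of .eml files, and only fail with a message that says what was found and what is accepted. This is an illustration under stated assumptions, not mdedup's actual autodetect_box_type: the function name detect_source_type and the returned labels ("mbox", "maildir", "eml_folder") are hypothetical, and treating every regular file as mbox is a simplification.

```python
from pathlib import Path

MAILDIR_SUBDIRS = ("cur", "new", "tmp")


def detect_source_type(path: str) -> str:
    """Guess the layout of a mail source, failing with an actionable message.

    Hypothetical helper: the returned labels are illustrative only and are
    not necessarily the box type names mdedup uses internally.
    """
    p = Path(path)
    if p.is_file():
        # Simplification: any regular file is assumed to be an mbox.
        return "mbox"
    if not p.is_dir():
        raise ValueError(f"{path!r} does not exist or is not a file/directory")

    missing = [sub for sub in MAILDIR_SUBDIRS if not (p / sub).is_dir()]
    if not missing:
        return "maildir"

    # Not a maildir: accept a plain folder if it contains any .eml file.
    if any(f.is_file() and f.suffix.lower() == ".eml" for f in p.rglob("*")):
        return "eml_folder"

    raise ValueError(
        f"{path!r} is not a maildir (missing sub-directories: "
        f"{', '.join(missing)}) and contains no .eml files. "
        "Expected an mbox file, a maildir with cur/new/tmp, "
        "or a folder of .eml files."
    )
```

The point of the design is that the ValueError only fires after every supported layout has been ruled out, and its text names the missing pieces instead of the first one checked.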

Also, you should definitely add a box_type = 'eml_folder' that simply walks over all sub-folders and reads the .eml files in them.
This could even replace the maildir box_type, since a maildir is essentially a folder with sub-folders containing .eml files.
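A minimal sketch of what the proposed 'eml_folder' source could look like, using only the standard library: it recurses with os.walk, parses each .eml file with email.parser.BytesParser, and needs no cur/new/tmp structure. The generator name iter_eml_folder is hypothetical, and how mdedup would plug it into its own box abstraction is not shown.

```python
import os
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Iterator, Tuple


def iter_eml_folder(root: str) -> Iterator[Tuple[str, EmailMessage]]:
    """Yield (path, message) for every .eml file found under root.

    Hypothetical sketch of an 'eml_folder' source: recurses into all
    sub-folders and matches the .eml extension case-insensitively.
    """
    # policy.default makes parse() return EmailMessage objects.
    parser = BytesParser(policy=policy.default)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if not name.lower().endswith(".eml"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                msg = parser.parse(fh)
            # Yield after closing the file so handles don't pile up
            # while the consumer processes hundreds of thousands of messages.
            yield path, msg
```

Because it is a generator, only one message is held in memory at a time, which matters for a collection of 590k files; each yielded message could then go through the same deduplication logic that maildir messages use today.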

deajan added the 🎁 feature request and 🙏 help wanted labels on Oct 26, 2024