Discussion:
Extract data from email
SteveW
2008-08-06 15:07:35 UTC
I need to extract the data from an email that is in HTML format.

It contains 2, 3 or 4 images. In the source code view in Outlook I see the
following:

<TD><img height=187 src="cid:oofdo1c8f7d4$obe4$obe407$***@AGT"

I presume this is the image name, but I have searched the hard drive and
cannot find it.

How would I get started with this, please?

Any help appreciated.

Cheers

SteveW
Ken White
2008-08-06 15:35:13 UTC
When an HTML email is sent, it is accompanied by a plain text
version of the message, and any images are contained in this
text version as attachments. The URL for the img source above
is a reference to this embedded image attachment. It has nothing
to do with a file on the hard disk; it indicates an image
attached to the text copy of the email.

The Indy email components have the capability of working with
the plain text version of the mail, including access to the
attachments. They're included with Delphi out of the box.

HTH,

Ken
Remy Lebeau (TeamB)
2008-08-06 17:42:01 UTC
Post by Ken White
When an HTML email is sent, it is accompanied by a plain text
version of the message
Not always. That depends on how the email sender is implemented.
Post by Ken White
any images are contained in this text version as attachments.
That is not quite how it works. The plain-text version, if present, is
wrapped inside the same MIME container that holds the HTML and its related
attachments. The plain-text part itself does not contain the attachments.
Post by Ken White
The URL for the img source above is a reference to this
embedded image attachment.
That much is true, but only if the image URL is using the "cid" scheme.
Post by Ken White
It has nothing to do with a file on the hard disk
That depends on how the email receiver is implemented. It might be storing
the attachments in temporary files on the hard disk rather than holding them
in memory.


Gambit
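To make the MIME structure Remy describes concrete, here is a minimal sketch using Python's standard `email` package (not the Indy components mentioned above). It builds one common layout of an HTML message with a plain-text alternative and one inline image, then prints the MIME tree. The addresses, the placeholder image bytes and the helper names `build_html_mail` and `describe` are made up for illustration, and real mailers such as Outlook may nest the parts differently (for example, with multipart/related as the outer container).

```python
from email.message import EmailMessage, MIMEPart
from email.utils import make_msgid
from typing import List


def build_html_mail() -> EmailMessage:
    """Build a sample HTML mail with a plain-text alternative and one inline image."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"        # hypothetical addresses
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Inline image demo"

    # Plain-text version: a sibling of the HTML, not a holder of attachments.
    msg.set_content("Plain-text version of the message.")

    # make_msgid() returns "<...@example.com>"; the cid: URL omits the brackets.
    cid = make_msgid(domain="example.com")
    html = f'<html><body><img height=187 src="cid:{cid[1:-1]}"></body></html>'
    msg.add_alternative(html, subtype="html")

    # Turn the HTML part into multipart/related and attach the image next to it,
    # tagged with a Content-ID header that matches the cid: reference.
    placeholder_png = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a real image
    msg.get_payload()[1].add_related(placeholder_png, "image", "png", cid=cid)
    return msg


def describe(part: MIMEPart, depth: int = 0) -> List[str]:
    """Return the MIME tree as a list of indented content types."""
    lines = ["  " * depth + part.get_content_type()]
    for sub in part.iter_parts():   # yields nothing for non-multipart parts
        lines.extend(describe(sub, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(describe(build_html_mail())))
```

With this layout the tree is multipart/alternative containing text/plain and a multipart/related part, which in turn holds text/html and image/png. The image travels inside the message itself, which is why it never appears as a separate file on disk unless the mail client chooses to write one.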
Roy Lambert
2008-08-06 15:44:42 UTC
SteveW


Looking at that, it's probably an email in MIME format with the images embedded inline. Does the email header have a MIME-Version line in it? If you search the body of the email, can you find a reference to oofdo1c8f7d4$obe4$obe407$0201a8coAGT outside of an image tag?

Roy Lambert
SteveW
2008-08-06 17:56:50 UTC
No to both questions.

Cheers

SteveW
Roy Lambert
2008-08-07 06:24:07 UTC
SteveW


Can you post a zip of the email into attachments so I can have a look?

Roy Lambert
Remy Lebeau (TeamB)
2008-08-06 17:38:08 UTC
Post by SteveW
I presume this is the image name but I have searched the
hard drive and cannot find it.
Because that is not where the images are stored. They are attached to the
email itself. In the case above, loop through the email attachments looking
for the one that has a "Content-ID" header of
"oofdo1c8f7d4$obe4$obe407$***@AGT" or
"<oofdo1c8f7d4$obe4$obe407$***@AGT>". That is what the "cid" in the
"img" tag is referring to.


Gambit
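Remy's approach (walk the parts and compare each Content-ID header with the id in the img tag) can be sketched in Python with the standard `email` package; this is not the Indy API the thread refers to. It assumes the message is available as a raw RFC 822/MIME file such as an .eml export (Outlook's .msg format is a different container and is not handled here), and the function names and command-line arguments are hypothetical. Both the bracketed and unbracketed forms of the id compare equal after normalisation.

```python
import sys
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Optional
from urllib.parse import unquote


def normalize_cid(value: str) -> str:
    """Reduce a "cid:..." URL or a "<...>" Content-ID header value to the bare id."""
    value = value.strip()
    if value[:4].lower() == "cid:":
        # cid: URLs (RFC 2392) may %-encode characters of the Content-ID.
        value = unquote(value[4:])
    return value.strip().strip("<>")


def find_part_by_cid(msg: EmailMessage, src: str) -> Optional[EmailMessage]:
    """Return the first MIME part whose Content-ID matches the img src, or None."""
    wanted = normalize_cid(src)
    for part in msg.walk():  # walk() visits the message itself and every subpart
        content_id = part.get("Content-ID")
        if content_id is not None and normalize_cid(str(content_id)) == wanted:
            return part
    return None


if __name__ == "__main__":
    # Hypothetical usage: python find_cid.py message.eml "cid:..." out.bin
    with open(sys.argv[1], "rb") as fh:
        message = BytesParser(policy=policy.default).parse(fh)
    found = find_part_by_cid(message, sys.argv[2])
    if found is None:
        print("no part with that Content-ID")
    else:
        data = found.get_content()  # bytes for image/* parts, transfer encoding decoded
        with open(sys.argv[3], "wb") as out:
            out.write(data)
        print(f"wrote {len(data)} bytes of {found.get_content_type()}")
```

The output file name is taken from the command line rather than from the part's own filename parameter, so a malformed message cannot choose where the data is written.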
Mike Shkolnik
2008-08-06 19:11:39 UTC
Steve,

In what format is your message stored: .eml or .msg?
Why are you searching for the image as a file on the hard disk? The image is
part of the message file (.eml/.msg).
--
With best regards, Mike Shkolnik
Scalabium Software
http://www.scalabium.com
SteveW
2008-08-07 10:25:58 UTC
I have saved the email from Outlook and posted it in the attachments section.

The email appears with the images within the document, not as separate
attachments. File > Save Attachments in Outlook is greyed out.

I can forward the email if needed. Email ***@nospam__googlemail.com

Cheers
Roy Lambert
2008-08-07 12:18:34 UTC
SteveW


What you posted isn't an email. It's an HTML document which looks as though it's an email when displayed. It may be caused by the export from Outlook (which I don't use).

Unless someone's changed the standard whilst I wasn't looking (or there's an RFC I don't know about), any headers should be outside the HTML wrapper. If this is what you have, then you have no chance of extracting any images: there aren't any.

Try this: send yourself an email with an image embedded, export it and have a look at the result in Notepad. There should be a header block, separated from the message by a blank line, and at the bottom will be a load of "garbage" that is your encoded image.

Roy Lambert
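Roy's Notepad check can also be approximated programmatically. The following Python sketch (standard library only; the function names are hypothetical) treats a file as a raw message only if every line before the first blank line is a header field or a folded continuation line, and separately reports whether a MIME-Version header is present, which was Roy's earlier question. An HTML-only export like the one described above fails both checks. This is a heuristic, not a full RFC 5322 validator.

```python
import re
import sys
from email import policy
from email.parser import BytesParser
from typing import Tuple

# RFC 5322 field name: printable ASCII except ':' (0x21-0x39, 0x3B-0x7E), then ':'.
_FIELD = re.compile(rb"^[\x21-\x39\x3b-\x7e]+:")


def split_raw_message(data: bytes) -> Tuple[bytes, bytes]:
    """Split at the first empty line into (header block, body); CRLF normalised to LF."""
    norm = data.replace(b"\r\n", b"\n")
    head, sep, body = norm.partition(b"\n\n")
    return (head, body) if sep else (norm, b"")


def looks_like_raw_email(data: bytes) -> bool:
    """Heuristic: every header-block line is a field or a folded continuation."""
    head, _ = split_raw_message(data)
    lines = head.split(b"\n")
    if not _FIELD.match(lines[0]):
        return False
    return all(_FIELD.match(line) or line[:1] in (b" ", b"\t") for line in lines)


def has_mime_version(data: bytes) -> bool:
    """True if the parsed header block contains a MIME-Version field."""
    msg = BytesParser(policy=policy.default).parsebytes(data, headersonly=True)
    return msg.get("MIME-Version") is not None


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as fh:
        raw = fh.read()
    print("raw message headers:", looks_like_raw_email(raw))
    print("MIME-Version present:", has_mime_version(raw))
```

If both checks pass, the embedded images should be recoverable as MIME parts; if the file starts straight into HTML markup, the export has already discarded them.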
