The process of helping search engines, small businesses and web searchers find agreement.

Extracting Images from MS Word Docs and PDFs

If you’ve been designing websites for a while, you know how often clients have trouble sending text and images in a format you can use. Usually they send MS Word documents with the images pasted in or, even worse, PDFs.

Taking screenshots of the embedded images is tedious at best, and copying text from PDFs introduces odd line breaks. The images in Word documents aren’t full size either: the client has usually shrunk them, and they’re almost always distorted.
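If you do end up pasting text out of a PDF, the stray line breaks can be cleaned up with a small script. Here’s a minimal Python sketch. The function name `unwrap_pdf_text` is hypothetical, and it assumes that blank lines mark real paragraph breaks while every other line break is an artifact of the PDF layout. It doesn’t try to rejoin words that were hyphenated across lines.

```python
import re


def unwrap_pdf_text(text):
    """Rejoin lines that a PDF copy/paste broke in the middle of a paragraph.

    Assumption: a blank line is a real paragraph break; any single newline
    inside a paragraph came from the PDF layout and becomes a space.
    Words hyphenated across lines are left as-is.
    """
    # Split on blank lines (possibly containing spaces) to find paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Collapse each internal line break, plus surrounding spaces, to one space.
        cleaned.append(re.sub(r"\s*\n\s*", " ", para.strip()))
    return "\n\n".join(cleaned)
```

Paste the copied text into a string (or read it from a file), run it through `unwrap_pdf_text`, and the paragraphs come back as single lines ready to drop into your HTML.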

The fix is to save the Word document as an HTML file. You’ll get not only the small, squeezed images (skewed or not) but the full-sized images as well! To extract the images from a PDF, convert the PDF to a Word document first, then save that Word document as HTML, and voilà: you have full access to both the text and the images. The one drawback is that the full-sized images seem to be lost when you start from a PDF. At least, that has been my experience so far.
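A related shortcut, separate from the Save-as-HTML method: a .docx file (Word 2007 and later) is really a ZIP archive, and the embedded pictures sit in its `word/media/` folder, usually at the size they were inserted unless the client ran Word’s Compress Pictures. The minimal Python sketch below copies them out. The helper name `extract_docx_images` and the file names in the usage line are hypothetical, and it won’t work on older binary .doc files.

```python
import zipfile
from pathlib import Path

# Image formats Word commonly stores inside a .docx package (assumed list).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff", ".emf", ".wmf"}


def extract_docx_images(docx_path, out_dir):
    """Copy every embedded image out of a .docx file's word/media/ folder.

    Only works for .docx (a ZIP archive), not legacy binary .doc files.
    Returns the list of paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as zf:
        for name in zf.namelist():
            # Embedded pictures live under word/media/ inside the package.
            if not name.startswith("word/media/"):
                continue
            if Path(name).suffix.lower() not in IMAGE_EXTS:
                continue
            target = out / Path(name).name
            target.write_bytes(zf.read(name))
            written.append(target)
    return written
```

For example, `extract_docx_images("client_pages.docx", "images")` would drop every picture from that (hypothetical) document into an `images` folder.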

I used to get frustrated with Word documents and PDFs, but no more. Now I’m happy to see exactly where the client wants each image, because the page is already laid out. The text is an easy copy and paste, and a quick search of the exported folder turns up the images, which makes the job go much more smoothly. From now on, if a client can produce a Word document, I’ll recommend that they send their pages to me that way.
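The folder search can be scripted too. When Word saves a document as HTML it typically writes the images into a companion folder named after the document (something like `page_files`). This Python sketch uses a hypothetical `find_images` helper to list every image under a folder, largest first, on the assumption that the full-sized originals are the bigger files and the squeezed copies are the smaller ones.

```python
from pathlib import Path

# Web-friendly formats to look for (assumed list; extend as needed).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


def find_images(folder):
    """Return (path, size_in_bytes) for every image under folder, largest first."""
    found = []
    # rglob("*") walks the folder and all of its subfolders.
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            found.append((path, path.stat().st_size))
    # Sort by size, biggest first, so full-sized images appear at the top.
    found.sort(key=lambda item: item[1], reverse=True)
    return found
```

Usage: `for path, size in find_images("client_page_files"): print(size, path)` prints each image with its size, where `client_page_files` stands in for whatever folder Word created.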

I hope this information is useful to you. For PDF to Word conversion I’m using Free PDF to Word Doc Converter. It’s free, but at some point it starts sending you to its site every time you convert a document and asks a rather annoying (and somewhat complicated) arithmetic question. It’s worth paying the $15 so you don’t have to deal with that each time. (It’s actually about $18, because they tack on some extended software nonsense when you reach the final payment.)
