File Format and Conversion Guide

How to submit papers in various formats to ScholarlyCommons@Penn

ScholarlyCommons@Penn provides papers in Adobe's Portable Document Format (PDF). Documents in this format can be read online with programs like Adobe's Acrobat Reader, Apple's Preview, and open source programs like Xpdf and Ghostscript. The text of properly prepared PDF files can also be searched, and copied and pasted into other documents. The search engine of ScholarlyCommons@Penn can search all papers in the collection that are in text-searchable PDF. Ideally, the PDF files will also have all fonts embedded (that is, their specifications are included with the file) so that they look the same on all machines. If you are using special symbols, embedding the fonts may be crucial, since other fonts might not define the same symbols as your font.

To create searchable, good-looking PDF files, follow the steps below. If you have any questions or problems, please don't hesitate to contact the library. You can reach us by email at

How to check and submit a PDF file

If you already have a PDF file for your paper, you can submit it directly to ScholarlyCommons@Penn. You might first want to make sure that the file is searchable and the fonts are generally readable.

  • To see if the file is searchable, open it in your favorite PDF viewer, and search for a word or phrase in the body text of the paper.
  • To see if the file's fonts are generally readable, try opening it on a different machine than the one you used to create it, and see if it still looks correct.
  • If you created the PDF by converting it from another format, check that these things are as you expect:
    • the number of pages (i.e. see that it goes all the way to the end)
    • the appearance and placement of diagrams and other images
    • the appearance of any special symbols
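
The checks above can also be scripted. The following is a minimal sketch, not part of the repository's own tooling: it assumes the free Poppler command-line tools `pdftotext`, `pdfinfo`, and `pdffonts` are installed, and it assumes `pdffonts` prints its usual table with an `emb` column. The function names are hypothetical. It does not replace looking at the file yourself, since diagrams and symbols still need a visual check.

```python
import shutil
import subprocess


def run_tool(tool, *args):
    """Run an external Poppler tool (assumed installed) and return its output."""
    if shutil.which(tool) is None:
        raise RuntimeError(f"{tool} is not installed or not on PATH")
    done = subprocess.run([tool, *args], capture_output=True, text=True, check=True)
    return done.stdout


def contains_phrase(text, phrase):
    """Case-insensitive search that ignores line breaks and repeated spaces."""
    def squash(s):
        return " ".join(s.split()).lower()
    return squash(phrase) in squash(text)


def page_count(pdfinfo_output):
    """Read the 'Pages:' line from pdfinfo output; None if it is missing."""
    for line in pdfinfo_output.splitlines():
        if line.startswith("Pages:"):
            return int(line.split(":", 1)[1])
    return None


def unembedded_fonts(pdffonts_output):
    """List font names whose 'emb' column reads 'no' in pdffonts output."""
    missing = []
    for line in pdffonts_output.splitlines()[2:]:  # skip header and ruler rows
        fields = line.split()
        # Counting from the right: generation, object number, uni, sub, emb.
        if len(fields) >= 6 and fields[-5] == "no":
            missing.append(fields[0])
    return missing


def check_pdf(pdf_path, phrase):
    """Summarize the three checks for one file (needs the Poppler tools)."""
    return {
        "searchable": contains_phrase(run_tool("pdftotext", pdf_path, "-"), phrase),
        "pages": page_count(run_tool("pdfinfo", pdf_path)),
        "unembedded_fonts": unembedded_fonts(run_tool("pdffonts", pdf_path)),
    }
```

For example, `check_pdf("paper.pdf", "a phrase from the body text")` would report whether the phrase was found, how many pages the file has, and which fonts (if any) are not embedded. Compare the page count against your original document.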

How to submit a file created in Microsoft Word

You can also submit Word files (in .doc format) directly to ScholarlyCommons@Penn, and they will be converted to PDF by the repository software.

How to submit a file created in another word processor or editor

Open the document in the appropriate editor, and then select the "Save As…" menu item or command. Select "RTF" or "Rich Text Format" as the save format, and then save the file. Open the RTF file in the editor to make sure it looks the way you want. If it does, submit the RTF file to ScholarlyCommons@Penn, and it will be converted to PDF by the repository software.

If your word processor doesn't have an option to save as RTF, or if the RTF doesn't look right, you can try printing to a file instead. This will usually create a PostScript file, which you can then convert as described below. (Some word processors also have a "Save as PDF" option.)
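
Since printing to a file only usually produces PostScript, it can help to confirm what you actually got before choosing a conversion route. The sketch below guesses a file's format from its first bytes. The function names are hypothetical; the signatures used are the standard leading bytes of PDF, PostScript, and RTF files, plus the OLE2 container used by legacy Word `.doc` files (and by some other legacy office formats, so that label is only a hint).

```python
# Leading byte sequences for the formats discussed in this guide.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"%!PS", "postscript"),
    (b"{\\rtf", "rtf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # container used by .doc
]


def sniff_bytes(head):
    """Return a short format label for the given leading bytes."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def sniff_format(path):
    """Read the first few bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(8))
```

A result of `"postscript"` means the PostScript steps in this guide apply; `"pdf"` means the file can be checked and submitted directly.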

How to submit a TeX or LaTeX document

You'll need to create a PDF file from the document. The easiest way to do this is to use the free program "pdftex" (which might also be run as "pdflatex"). Pdftex processes documents written in TeX or LaTeX, and produces output in PDF. Pdftex also supports new TeX macros that authors can use to enhance the appearance of the PDF version.

If you don't already have pdftex on your machine, you can download it, and read its documentation, at http://www.tug.org/applications/pdftex/.

After you run pdftex, check the resulting PDF as described earlier. If it looks good, then submit the PDF file to ScholarlyCommons@Penn.

If pdftex doesn't work for you, you can use older TeX or LaTeX software to create PostScript instead, and then follow the steps below for PostScript.
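
Both routes can be run from a script. This is a minimal sketch under several assumptions: the document is LaTeX (a plain TeX document would use `pdftex` or `tex` instead), the programs `pdflatex`, `latex`, and `dvips` are on your PATH, and the source file sits in the working directory. The function names and the file name `paper.tex` are hypothetical.

```python
import subprocess
from pathlib import Path


def pdflatex_commands(tex_file, passes=2):
    """Direct route: pdflatex, repeated so cross-references can settle."""
    return [
        ["pdflatex", "-interaction=nonstopmode", str(tex_file)]
        for _ in range(passes)
    ]


def postscript_commands(tex_file):
    """Fallback route: latex writes a DVI file, dvips turns it into PostScript."""
    stem = Path(tex_file).stem
    return [
        ["latex", "-interaction=nonstopmode", str(tex_file)],
        ["dvips", "-o", f"{stem}.ps", f"{stem}.dvi"],
    ]


def run_all(commands, workdir="."):
    """Run each command in order, stopping at the first one that fails."""
    for command in commands:
        subprocess.run(command, cwd=workdir, check=True)
```

Calling `run_all(pdflatex_commands("paper.tex"))` would produce `paper.pdf`; if that fails, `run_all(postscript_commands("paper.tex"))` would produce `paper.ps` for the PostScript steps.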

How to submit a PostScript file

PostScript can usually be converted to PDF without much difficulty, since PDF is in some respects a successor format to PostScript. However, not all PostScript files will convert to PDF files that are searchable and have embedded fonts. So if you can find an earlier file in one of the formats above, it's usually better to try to start from that.

If you do want to convert from PostScript, there are three alternatives to try. We hope the first will be sufficient in most cases, but if it doesn't work well for a particular file, try the others on the list:

  • Use GSView / Ghostscript (free software)
  • Use Adobe Acrobat Standard or Pro (commercial software)
  • Use an online service (see next section)

The free GSView and Ghostscript programs allow you, in many cases, to create a fully searchable version of your PostScript file. You can find more information on GSView, and download the software, at http://www.cs.wisc.edu/~ghost/gsview/.

Here are the steps we recommend for best results with GSView:

  1. Open your PostScript document in GSView.
  2. From the File menu, select Convert.
  3. From the Device list, select pdfwrite.
  4. From the resolution list, select 300 dpi or higher. (If you choose 72 dpi, fonts will look rough, especially when printed. We recommend 600 dpi if it's available.)
  5. Click on the Properties button.
  6. Under Properties, select Compatibility Level.
  7. Under Value, select Level 1.3, then click on OK.
  8. Under Properties, select EmbedAllFonts.
  9. Under Value, select true, then click on OK.
  10. Under Properties, select SubsetAllFonts.
  11. Under Value, select true, then click on OK.
  12. Under Properties, select MaxSubsetPct.
  13. Under Value, enter 100, then click on OK.
  14. Click on OK in the Convert window.
  15. Choose the destination file path, and enter a file name that ends with the extension .pdf.
  16. Check the PDF version for accuracy, as described earlier.
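
If you prefer the command line, the same settings can be passed to Ghostscript directly. The sketch below builds a command that should correspond to the GSView steps above. It assumes the Ghostscript executable is named `gs` (on Windows it is typically `gswin32c` or `gswin64c`). Note that Ghostscript's own switch for font subsetting is `-dSubsetFonts`, which may be labeled differently in GSView. The function names and file names are hypothetical.

```python
import subprocess


def ghostscript_command(ps_path, pdf_path, resolution=600, gs_binary="gs"):
    """Build a Ghostscript command mirroring the GSView conversion settings."""
    return [
        gs_binary,
        "-dBATCH", "-dNOPAUSE",          # run without prompts, exit when done
        "-sDEVICE=pdfwrite",             # step 3: the PDF output device
        f"-r{resolution}",               # step 4: resolution in dpi
        "-dCompatibilityLevel=1.3",      # steps 6-7
        "-dEmbedAllFonts=true",          # steps 8-9
        "-dSubsetFonts=true",            # steps 10-11 (Ghostscript's switch name)
        "-dMaxSubsetPct=100",            # steps 12-13
        f"-sOutputFile={pdf_path}",      # step 15: destination file
        ps_path,
    ]


def convert(ps_path, pdf_path, **options):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(ghostscript_command(ps_path, pdf_path, **options), check=True)
```

For example, `convert("paper.ps", "paper.pdf")` would write `paper.pdf`, which you should then check for accuracy as in the final step.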

Adobe also sells Adobe Acrobat software that supports the creation of PDFs from PostScript, and certain other formats. (This functionality was formerly sold separately as Distiller, and is now integrated into the standard and professional Acrobat products. It is not part of the free Adobe Reader download, though.) We have found that Adobe's software can convert some PostScript documents that are not fully compliant with the PostScript standards, and that cannot be converted with Ghostscript or GSView. (On the other hand, some Ghostscript users report that they can convert PostScript documents that Adobe's software can't.) For more information on Acrobat, see http://www.adobe.com/

How to use an online service to create PDF files

Several online services offer conversion to PDF from PostScript and other formats. You upload the original document to a server, and the server returns a PDF version of the document. This works best for self-contained documents, and may not work well (or at all) for papers that depend on externally linked files (as many TeX documents do). You should check the resulting PDF files as described earlier to make sure the service made a complete, searchable, and good-looking conversion.

Most public PDF conversion services require an account, and charge a fee (sometimes after a few free samples). The best known is probably the one offered by Adobe at http://createpdf.adobe.com/ but others exist as well, with varying prices and quality.

How to prepare supplementary files

You can also submit supplementary files along with your paper. For example, there might be images, audio, video, or data that accompany the paper. While ScholarlyCommons@Penn can accept supplementary files in any format, we strongly recommend that you limit files to our supported supplementary file formats if possible. (You can convert them if necessary before submission, using appropriate software or the online services mentioned above.) Using the supported formats makes it more likely that your readers will be able to view or use the supplementary files, and that we will be able to migrate the files to new technologies when necessary.

Questions, problems, suggestions

If you have any questions or problems creating submission files for ScholarlyCommons@Penn, or suggestions for tips we could add to this guide, please let us know. Write to us at . We'll try to get back to you and help you out as soon as we can.
