How to process Word and Excel files in Java
What I wanted to do was extract some information from an Excel table and use it to create a Word file containing a summary of that information.
Word and Excel files are saved in the Microsoft OLE 2 (Object Linking and Embedding) compound document format. They can be accessed from Java in several ways, but here I will talk about the POI library (http://poi.apache.org/).
POI, short for Poor Obfuscation Implementation, is an Apache project.
The latest release can be downloaded from http://www.apache.org/dyn/closer.cgi/poi/release/. Once it is added to the build path of the project you are working on, it gives you access to the classes that process files in the MS Office formats.
POI terminology:
- POIFS (Poor Obfuscation Implementation File System): Java APIs for reading and writing OLE (Object Linking and Embedding) 2 compound document formats.
- HSSF (Horrible Spreadsheet Format): Java API to read and write Microsoft Excel files.
- HDF (Horrible Document Format): Java API to read and write Microsoft Word 97 files.
- HPSF (Horrible Property Set Format): Java API for reading property sets using (only) Java.
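Before handing a file to POIFS, it can be useful to check that it really is an OLE 2 compound document. The sketch below is only an illustration and uses no POI classes: the class name Ole2Signature and its methods are hypothetical helpers. It relies on the fact that OLE 2 files start with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical helper (not part of POI): detects the OLE 2 file signature. */
final class Ole2Signature {
    // The 8-byte magic number found at the start of an OLE 2 compound document.
    private static final byte[] MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns true if the given bytes begin with the OLE 2 signature. */
    static boolean isOle2Header(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    /** Reads the first 8 bytes of a file and checks them against the signature. */
    static boolean isOle2File(String path) throws IOException {
        byte[] header = new byte[MAGIC.length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            in.readFully(header);
        } catch (EOFException e) {
            return false; // file is shorter than the signature
        }
        return isOle2Header(header);
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + ": " + isOle2File(path));
        }
    }
}
```

Running it with one or more file paths as arguments prints true or false for each file.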
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFRow;
...
// "file" is the classpath resource name of the Excel file to read
InputStream input = ExcelReader.class.getResourceAsStream(file);
if (input == null) {
    System.err.println("Resource not found: " + file);
    System.exit(1);
}
HSSFWorkbook wb = null;
try {
    // POIFS parses the OLE 2 container, HSSF interprets it as a workbook
    POIFSFileSystem fs = new POIFSFileSystem(input);
    wb = new HSSFWorkbook(fs);
} catch (IOException e) {
    e.printStackTrace();
    System.exit(1);
}
int numberOfSheets = wb.getNumberOfSheets();
for (int i = 0; i < numberOfSheets; i++) {
    HSSFSheet sheet = wb.getSheetAt(i);
    Iterator rows = sheet.rowIterator();
    while (rows.hasNext()) {
        HSSFRow row = (HSSFRow) rows.next();
        Iterator cells = row.cellIterator();
        while (cells.hasNext()) {
            HSSFCell cell = (HSSFCell) cells.next();
            switch (cell.getCellType()) {
                case HSSFCell.CELL_TYPE_NUMERIC:
                    System.out.println(cell.getNumericCellValue());
                    // do something
                    break;
                case HSSFCell.CELL_TYPE_STRING:
                    System.out.println(cell.getStringCellValue());
                    // do something
                    break;
                default:
                    System.out.println("unsupported cell type");
                    // do something
                    break;
            } // end switch
        } // end while cells
    } // end while rows
} // end for each sheet
...
For the example above I used the build poi-2.5.1-final-20040804.jar. The latest version available at this moment is poi-bin-3.2-FINAL-20081019.tar.
Writing the extracted text to Word can also be done with POI. The classic example can be found at the following address: http://mail-archives.apache.org/mod_mbox/poi-dev/200311.mbox/%3C6.0.0.22.2.20031113134659.01e6d680@mail.jahia.com%3E
Another method is to use the OutputStreamWriter class to write a document:
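The "do something" placeholders in the loop are where the extracted values can be gathered for the summary. The sketch below shows one possible way to do that with plain Java. The CellSummary class and its method names are hypothetical, not part of POI, and the summary wording is only an assumption about what a summary might look like.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical collector (not part of POI): gathers cell values for a summary. */
final class CellSummary {
    private final List<String> labels = new ArrayList<>();
    private double total = 0.0;
    private int numericCount = 0;

    /** Call this from the numeric branch of the switch. */
    void addNumeric(double value) {
        total += value;
        numericCount++;
    }

    /** Call this from the string branch of the switch; blank cells are ignored. */
    void addText(String value) {
        if (value != null && !value.trim().isEmpty()) {
            labels.add(value.trim());
        }
    }

    /** Builds a one-line text summary of everything collected so far. */
    String summarize() {
        return "Text cells: " + labels.size() + " " + labels
                + "; numeric cells: " + numericCount
                + "; total: " + total;
    }

    public static void main(String[] args) {
        CellSummary summary = new CellSummary();
        summary.addText("January");
        summary.addNumeric(10.5);
        summary.addNumeric(4.5);
        System.out.println(summary.summarize());
    }
}
```

In the reading loop, the numeric branch would pass cell.getNumericCellValue() to addNumeric and the string branch would pass cell.getStringCellValue() to addText.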
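// "file" is the path of the output document
FileOutputStream fs = new FileOutputStream(file);
OutputStreamWriter out = new OutputStreamWriter(fs);
out.write("Any sentence you like");
out.close(); // flushes the buffered text and closes the underlying stream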
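The text written this way can contain HTML markup, which Word can typically render when it opens the file. For HTML formatting, you can find a tutorial at http://www.w3schools.com/html/default.asp.
Good luck!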
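To make the OutputStreamWriter approach more concrete, here is a minimal sketch that builds a small HTML page and writes it to a file. The HtmlDocWriter class, its method names and the default file name summary.doc are assumptions made for illustration only. The sketch also sets the character encoding explicitly, which the short snippet above leaves to the platform default.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: writes an HTML summary with OutputStreamWriter. */
final class HtmlDocWriter {
    /** Escapes the characters that have a special meaning in HTML text. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a simple HTML page with a heading and one paragraph per entry. */
    static String buildHtml(String title, List<String> paragraphs) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head>")
            .append("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">")
            .append("<title>").append(escape(title)).append("</title>")
            .append("</head><body>\n");
        html.append("<h1>").append(escape(title)).append("</h1>\n");
        for (String paragraph : paragraphs) {
            html.append("<p>").append(escape(paragraph)).append("</p>\n");
        }
        html.append("</body></html>\n");
        return html.toString();
    }

    /** Writes the text as UTF-8; try-with-resources flushes and closes the writer. */
    static void write(String path, String html) throws IOException {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8)) {
            out.write(html);
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "summary.doc"; // assumed default name
        String html = buildHtml("Summary", Arrays.asList("Rows read: 3", "Total: 15.0"));
        write(path, html);
    }
}
```

Keeping buildHtml separate from write makes it possible to inspect the generated markup before anything is saved to disk.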