Java CAPS 5.1 and Java CAPS 6 - Streaming Large FTP Transfers

Transferring large payloads, on the order of tens or hundreds of megabytes, between an FTP server and a local file system, in either direction, requires selecting the appropriate features of the Batch FTP and Batch Local File eWays and tuning certain timing parameters.


Default timing parameter values result in timeout exceptions when transferring large payloads.


The Batch FTP eWay and the Batch Local File eWay are typically used to receive the entire payload before writing it out. This results in attempts to allocate memory many times the size of the payload being transferred and, for large files, causes memory exhaustion and application server failures.


The discussion in the attached document points out which timing parameters need to be tuned to facilitate the transfer of large payloads. It also presents sample Java code that uses facilities of the Batch FTP and Batch Local File eWays to stream the payload between the FTP server and the local file system without using an excessive amount of memory.
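For readers without the attachment, the underlying mechanism is ordinary buffered copying: read a fixed-size chunk, write it, repeat, so only one small buffer is ever resident in memory regardless of payload size. A minimal plain-Java sketch of the idea (class and method names are invented for illustration; this is not the eWay code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedCopy {
    // Copy a stream in fixed-size chunks; only one buffer is ever in memory.
    static long copy(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) { // read() returns -1 at end of stream
            out.write(buf, 0, n);          // write only the bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[300_000]; // stands in for a large FTP payload
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(copy(new ByteArrayInputStream(payload), sink, 8192));
    }
}
```

With an 8 KB buffer, memory use stays flat whether the payload is 300 KB or 1 GB; this is the property the streaming eWay configuration exploits.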


The material covered in the document was prepared using Java CAPS projects developed and tested in Java CAPS 5.1.0, exported, imported into Java CAPS Release 6 and tested again. It is expected that the code will work in all versions of Java CAPS from 5.1.0 up.


Streaming Large FTP Transfers with CAPS 5.1 and 6.pdf


FTPtoLocalFileStreaming_5.1.0_project_export.zip


LocalFiletoFTPStreaming_JC6_project_export.zip

Comments:

Hi Michael,
Thanks for the screen shots of the new IDE. In any case, a quick question: if I am using streaming and use a JMS queue to store that message, do I still have to factor in the maximum size of the file I am polling for, or can it be something significantly smaller?

Thanks
Ravi

Posted by Ravi on April 07, 2008 at 05:45 PM EST #

I meant for the segment size setting.

Posted by Ravi on April 07, 2008 at 05:46 PM EST #

Hello, Ravi.

I assume you are referring to my blog entry on FTP to local file streaming. If not then the response may not make sense.

The streaming solution relies on the fact that at no point is the entire payload in memory. The collaboration (or rather the eWay infrastructure) reads the file a chunk at a time and writes each chunk to the local file until finished. If you need to put that payload into a JMS message then it _will_ be in memory, several times. Furthermore, if you need to handle the payload as a message, it does not make much sense to use streaming. Streaming is for either getting the content of the file from a source to a destination file as efficiently as possible, or for getting the content and breaking it up into individual messages as efficiently as possible. For the latter you would also use a Batch Record eWay. If the payload is on the order of a few MB then reading it in its entirety and passing it around as messages is likely not to be harmful. If it gets much bigger than that then you will likely have issues. So, in short, you have to factor in the size of the payload if you need to handle it in its entirety. Depending on the solution you have, how many times the message is processed by different components, and so on, you may have to factor in the size several times over.
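The stream-to-individual-messages path mentioned above can be sketched in plain Java; this is not the Batch Record implementation, and all names are invented for illustration. Only one record is resident at a time while the stream is consumed:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class RecordStreamer {
    // Consume a character stream one CR/LF-delimited record at a time;
    // only the current record is held in memory while reading.
    static List<String> toMessages(BufferedReader reader) throws IOException {
        List<String> messages = new ArrayList<>();
        String record;
        while ((record = reader.readLine()) != null) {
            messages.add(record); // a real flow would send each record as one JMS message
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        String payload = "rec1\r\nrec2\r\nrec3";
        List<String> msgs = toMessages(new BufferedReader(new StringReader(payload)));
        System.out.println(msgs.size());
    }
}
```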

Regards

Michael

Posted by Michael Czapski on April 08, 2008 at 12:04 AM EST #

Hi Michael,
This streaming looks like it can be used only for FTP to Batch Local File.
Can this be done similarly for FTP to a database table? I mean a bulk insert of the received .CSV file into a database table.

Thank you,
Regards,
Meenakshi Mandal

Posted by Meenakshi on May 06, 2008 at 03:34 PM EST #

Hello, Meenakshi.

To accomplish FTP-to-database-table streaming you will need to stream from Batch FTP to Batch Record, then insert into a DB table. Batch Record will give you a record, for example CR/LF delimited. You will need to parse the record, map fields of the record to columns in the DB OTD, then do an Insert and, possibly, a commit after each record or after several records.
Get record, parse, insert will execute in a loop.
This will give you the ability to process arbitrarily large files without having to read their contents into memory for parsing.
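The get-parse-insert loop can be sketched in plain Java. Here a list stands in for the DB table and commit, and all names are invented; this is not the DB OTD API:

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLoader {
    // Split one CSV record into column values.
    // (A real loader would handle quoting; this sketch does not.)
    static String[] parseRecord(String record) {
        return record.split(",", -1);
    }

    public static void main(String[] args) {
        // Stands in for records handed over one at a time by Batch Record.
        String[] records = { "1,Alice,100", "2,Bob,200", "3,Carol,300" };
        List<String[]> table = new ArrayList<>(); // stands in for the DB table
        int sinceCommit = 0;
        for (String record : records) {          // get record ...
            String[] cols = parseRecord(record); // ... parse ...
            table.add(cols);                     // ... insert (map cols to DB OTD columns)
            if (++sinceCommit == 2) {            // a real JCD would commit here,
                sinceCommit = 0;                 // e.g. every 2 records
            }
        }
        System.out.println(table.size());
    }
}
```

Because each iteration touches only one record, the file never has to fit in memory.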

Regards

Michael

Posted by Michael Czapski on May 07, 2008 at 05:38 AM EST #

Hi Michael,

We are actually trying to accomplish the opposite: make sure the file is completely loaded into memory before processing it. We are using the BatchInbound eWay to poll the directory for the file and then use Batch Local File to read the file.

However, the JCD unmarshals the data before the file is completely transferred. Is there any way to make sure the entire file is loaded into memory before processing the data?

Posted by David on October 28, 2008 at 02:14 AM EST #

Hello, David.

By default the Batch Local File eWay does precisely that. The example in this blog entry shows a way to override this default behavior. The Batch eWay documentation for Java CAPS 5.1.3, http://docs.sun.com/app/docs/doc/820-0981, Chapter 7, Using the Batch eWay with Java Collaborations, discusses this and provides examples that may help you.

Regards

Michael

Posted by Michael Czapski on October 28, 2008 at 02:32 AM EST #

Michael,

Thanks for your reply. I wasn't clear in my question - the trigger file is large, and therefore the process gets triggered before the entire file has been created. In this situation, the Batch Local File eWay seems to load the partially transferred file. Can we override this behavior?

Additionally, Batch SFTP doesn't seem to support the data streaming suggested in this post. How can we use data streaming if we need secure FTP?

Posted by David on October 28, 2008 at 06:07 AM EST #

Hello, David.

Alas, the only way to prevent Batch Inbound from triggering the Batch Local File is to make sure the writer of the file writes the file completely and only then renames it to the name the Batch Inbound is looking for.
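The write-then-rename convention can be sketched in plain Java with java.nio (names here are invented for illustration; the atomic-move guarantee holds on most local file systems but not necessarily across file systems):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeWriter {
    // Write the full payload under a temporary name the poller ignores,
    // then rename it to the watched name in one atomic step, so the
    // poller can never observe a half-written file.
    static Path writeThenRename(Path dir, String finalName, byte[] payload) throws IOException {
        Path tmp = dir.resolve(finalName + ".tmp");
        Path target = dir.resolve(finalName);
        Files.write(tmp, payload);
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("drop");
        Path done = writeThenRename(dir, "input.dat", "hello".getBytes());
        System.out.println(Files.exists(done));
    }
}
```

The Batch Inbound pattern would then be configured to match `input.dat` but not `*.tmp`.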

I have not spent the time looking into SFTP so I have no advice to offer on this topic.

Regards

Michael

Posted by Michael Czapski on October 28, 2008 at 07:34 AM EST #

Hi Michael,
Is there any way to stream data from one BatchFTP external system to another BatchFTP external system? My requirement is that the source and target FTP should handle files of unlimited size.

Posted by Naveen on November 21, 2008 at 02:18 PM EST #

Hello, Naveen.

There is no way, of which I am aware, to get a StreamAdapter from the FTP OTD, so there is no way to use a StreamAdapter from one FTP OTD in another FTP OTD. The only way to do this that I know of is to stage through the local file system.

Regards

Michael

Posted by Michael on November 22, 2008 at 02:21 AM EST #

Hi Michael,
Could you please tell me how to calculate the size of a file on an FTP server? The FTP server can be anything: HP-UX, AIX, or Windows. The restriction is that we should not stage the whole file, because the file size is unlimited. I am using data streaming from one Batch FTP external system to Batch Local, and transferring from Batch Local to another Batch FTP external system. My idea is to split the data into equal segments based on the file size, put the segments into the Batch Local system, which in turn puts them into the other Batch FTP external system. But the problem is how to calculate the file size of the input file...?

Posted by naveen on November 24, 2008 at 12:05 PM EST #

Hello, Naveen.

I am not aware of a way to calculate a file size using the Batch eWay. There are a number of assumptions in FTP to Local File (staging) to FTP streaming. One assumption is that the file is 'complete' at the time the transfer starts - that is, whatever wrote the file finished writing it before the Batch eWay gets hold of it. Another assumption is that you can process the complete file - stream it to the local file system - before another Batch eWay instance starts processing it (for the scenario you are describing). Again, this is for the same reason - you must have the complete file, otherwise a premature transfer termination will occur if the writer is slower than the reader and an incomplete file will be transferred. This, in turn, assumes that the staging area has sufficient space to accommodate the file.

There is a way in which you can stream a file and break it into records of a fixed size or at a delimiter. This is described in the product documentation and in the Java CAPS Book, with examples. For the fixed-size transfer the file size must be a multiple of the record 'size', otherwise the transfer will abort on reading the last 'short' record. Since you don't know the size of the file ahead of time, and there may in any case be files whose size is a prime, breaking the file at record-size boundaries does not appear to be appropriate or workable for your case. If the files are not delimited then that option is not available to you either. I don't know what kind of files you are dealing with.
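The short-last-record case is easy to demonstrate in plain Java (names invented for illustration; note that ByteArrayInputStream happens to fill the buffer on every read except the last, which a general InputStream does not guarantee):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FixedSizeRecords {
    // Count fixed-size records in a stream. When the stream length is not
    // a multiple of recordSize, the final read comes back short - the
    // "short last record" that aborts a strict fixed-size transfer.
    static int countRecords(InputStream in, int recordSize) throws IOException {
        byte[] buf = new byte[recordSize];
        int records = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            if (n < recordSize) {
                System.out.println("short last record of " + n + " bytes");
            }
            records++;
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // 10 bytes with a record size of 4: two full records plus one short one
        System.out.println(countRecords(new ByteArrayInputStream(new byte[10]), 4));
    }
}
```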

Regards

Michael

Posted by Michael on November 24, 2008 at 12:22 PM EST #

Hi Michael,
Thanks for your reply.
My file can be of any type and any size. I have to develop an application which is absolutely dynamic, and the FTP server from which we are picking files can be AIX, HP-UX, or Windows. I have an idea for transferring the data from one Batch FTP external system to another (as I described earlier), but it completely depends on the size of the input file from the source FTP external system. My problem is that the file can be of unlimited size. In that case it is not a good approach to stage the entire file in the local system using Batch Local, because it may lead to an 'Out of Memory' exception.
Thanks in advance...

Regards,
Naveen

Posted by Naveen on November 24, 2008 at 12:34 PM EST #

Hello, Naveen.

You get 'Out of Memory' exceptions from loading large payloads into memory, as you normally would with the Batch eWay _NOT_ in streaming mode. In streaming mode the file is transferred from the remote server to the local file system in such a way that at no point in time is the entire file, or even a significant part of it, loaded into memory. As I write in the blog, I transferred payloads ranging from a few KB to over 1 GB without a visible change in Application Server memory utilization. I would have been able to transfer files several gigabytes in size had I had the disk space on my laptop :-) It is disk space you need to accommodate the file in the local staging file system, not memory.

Regards

Michael

Posted by Michael on November 24, 2008 at 12:48 PM EST #

Yes Michael, what you say is right. But in a dynamic environment we may not expect the local staging file system to have enough disk space, because file sizes can be in terabytes, and we can deploy this application anywhere.
Otherwise I could have used something like this:
byte[] b = otdLocalStaging.getConfiguration().getPayload();
int size = b.length;
and divided the payload based on the payload size.

Thanks and Regards,
Naveen

Posted by Naveen on November 24, 2008 at 01:03 PM EST #

Hello, Naveen.

Before you can issue .getPayload() you will have to have the payload in memory via a non-streaming get - precisely _NOT_ what you desire to do.

The Java CAPS Book has an example solution which reads a file a buffer-full at a time, using a programmer-controlled buffer. This solution does not suffer from the limitation of the file size having to be a multiple of the buffer size, or of the file having to be delimited.

Review section 2.6, Data Streaming, in Part II of the Java CAPS Book, specifically Listing 2-44. This may or may not help you.

It sounds like a very strange requirement - transferring files terabytes in size over a network. I think a solution architecture review is in order.

Regards

Michael

Posted by Michael on November 24, 2008 at 01:19 PM EST #

Hi,
Could anyone please tell me how to calculate the size of a file on a Batch FTP external system without reading the contents of the file? The Batch FTP external system can be in an HP-UX, AIX, or Windows environment.

Posted by NaveenPRP on December 03, 2008 at 01:50 PM EST #

Hi Michael,
This is regarding streaming. My requirement is first to stream from BatchFTP to BatchLocal, and then from BatchLocal to BatchFTP, i.e.

BatchFTP1--->BatchLocal
BatchLocal--->BatchFTP2

Is there any way in which we can transfer only a part of the data (a chunk) from BatchFTP to BatchLocal and then from BatchLocal to BatchFTP, then the next chunk, and so on, i.e.:

          chunk of data              chunk of data
BatchFTP1 ---------------> BatchLocal ---------------> BatchFTP2

repeated until the end of the data from BatchFTP1, so that I can transfer the entire data set from BatchFTP1 to BatchFTP2 irrespective of file size.

Awaiting your reply.

Posted by naveenprp on December 11, 2008 at 08:08 AM EST #

Hello, Naveen.

I am afraid I don't know of a way to do what you describe.

Regards

Michael

Posted by Michael Czapski on December 11, 2008 at 08:27 AM EST #

Hi Michael,
Is it possible to transfer a file from a source FTP system to a destination FTP system without using a local Batch Local File? If not, can you explain why it is not possible?

Posted by Nagireddy Patil on May 28, 2009 at 10:58 AM EST #

Hello, Nagireddy.

Using the Batch FTP eWay or Batch FTP JCA Adapter it is possible to transfer a file between two remote FTP systems, but only if the entire payload can be kept in memory during the transfer. This severely limits the size of the payload that can be transferred this way. The basic idea is that the inbound Batch FTP eWay/JCA Adapter reads the content of the remote file (it is represented as a byte array) and writes it to the remote FTP server using another instance of the Batch FTP eWay/JCA Adapter.

Does this help?

Regards

Michael

Posted by Michael Czapski on May 28, 2009 at 11:50 AM EST #

Hello Michael,
Thanks for your reply.
I understand the memory constraints involved in transferring a file between two Batch FTP systems without using data streaming. I wanted to know if streaming could be achieved without involving the Batch Local File, i.e., instead of streaming data to a Batch Local File, can't we use a Batch FTP?
Is it possible to set a Batch FTP OTD instance's OutputStreamAdapter to the value of another Batch FTP OTD instance's OutputStreamAdapter?

Thanks and regards,
Nagireddy.

Posted by Nagireddy Patil on May 28, 2009 at 12:22 PM EST #

Hello, Nagireddy.

Alas, no. This question has popped up before. The only possibility I can think of, and I have not tried this, is to use Batch FTP -> Batch Record -> Batch FTP. That would require the payload to be breakable into records, either on delimiters or on fixed-size boundaries. If your payload is of this kind, perhaps you can give it a try and post a comment here.

Regards

Michael

Posted by Michael Czapski on May 28, 2009 at 12:37 PM EST #

Hello Michael,

Thanks for your reply. As you suggested, I will try with Batch Record; if it succeeds I will definitely post it.

Thanks
Nagireddy Patil

Posted by Nagireddy Patil on May 28, 2009 at 01:01 PM EST #

Hi Michael,

Thanks for the sample project.

I imported your project and added a Scheduler eWay to trigger the collaboration.
It works fine if I define the file name in the Target Location cBatchFTP eWay connector properties. When I changed these properties to:
Target File Name: [Sc]CR.*\rtf
Target File Name Is Pattern: Yes

I got this exception:

message=[BATCH-MSG-M0172: FtpFileClientImpl.get(): No qualified file is available for retrieving.].Nested exception follows: ---
java.io.FileNotFoundException: BATCH-MSG-M0172: FtpFileClientImpl.get(): No qualified file is available for retrieving.--- End of nested exception.

I also added following codes to jcdFTPtoLocalFile:

vFTPIn.getConfiguration().setTargetFileName(vFTPIn.getClient().getResolvedNamesForGet().getTargetFileName());
vFTPIn.getClient().get();

Regards

Kevin

Posted by kevin on July 30, 2009 at 07:30 AM EST #

Hello, Kevin.

The error "No qualified file is available for retrieving" means that your regular expression does not match any files.

The regular expression you are using for the target file name does not seem right. Try testing your regular expression with one of the online regex testing sites, for example http://www.regexplanet.com/simple/index.jsp, to see if it returns the kinds of file names you are expecting. I don't know what kinds of file names you need to match, so I cannot suggest a regular expression that would work for you.

Regards

Michael

Posted by Michael Czapski on July 30, 2009 at 08:04 AM EST #

Hi Michael

Thanks for your reply

I fixed the issue by making the following changes:

1. added $ to the file pattern: [Ss]CR.*\.rtf$
2. in jcdFTPtoLocalFile, set the target file name pattern flag to true:
vFTPIn.getConfiguration().setTargetFileNameIsPattern( true );
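For reference, the effect of the trailing $ can be checked with java.util.regex directly; the file names below are made up for illustration. Without the anchor, names that merely contain ".rtf" would also match:

```java
import java.util.regex.Pattern;

public class NamePatternCheck {
    public static void main(String[] args) {
        // Kevin's corrected pattern: [Ss]CR, anything, then a literal ".rtf"
        // anchored at the end of the name.
        Pattern anchored = Pattern.compile("[Ss]CR.*\\.rtf$");
        System.out.println(anchored.matcher("SCR_20090803.rtf").find());     // matches
        System.out.println(anchored.matcher("SCR_20090803.rtf.bak").find()); // does not

        // Without the $, the renamed/backup file would be picked up too.
        Pattern unanchored = Pattern.compile("[Ss]CR.*\\.rtf");
        System.out.println(unanchored.matcher("SCR_20090803.rtf.bak").find());
    }
}
```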
Regards

Kevin

Posted by Kevin on August 03, 2009 at 06:06 AM EST #

Hi Michael,
Thanks for the good example.
I wonder how we can stream more than one file?

In the payload-based approach, we can do:

BatchLocalFile_in.getClient().getIfExists();
while (BatchLocalFile_in.getClient().getPayload() != null) {
    BatchFTP_out.getClient().setPayload( BatchLocalFile_in.getClient().getPayload() );
    BatchFTP_out.getClient().put();
    BatchLocalFile_in.reset();
    BatchLocalFile_in.getClient().getIfExists();
}

I am not sure how to do the equivalent in the streaming approach.

Thanks.
lt

Posted by las on October 15, 2009 at 02:03 AM EST #

Hello, lt.

The entry "Getting Hundreds of Files using Batch Local File eWay in Java CAPS 6", at http://blogs.sun.com/javacapsfieldtech/entry/getting_hundreds_of_files_using, discusses how multiple files can be processed, one after another, in a single collaboration.
This entry discusses how file content can be streamed from a source to a destination. Putting the two together should address your requirement.

Regards

Michael

Posted by guest on October 15, 2009 at 08:27 AM EST #

Hi Michael,
Thanks for your quick reply.

I have a question: when is the content of the file loaded into memory?
Right after:

G_BLFIn.getClient().get();

or right after:

G_BLFIn.getClient().getPayload();

My guess is right after G_BLFIn.getClient().get(), and if so then combining the two approaches does not help much.

Thanks.
lt

Posted by las on October 16, 2009 at 01:40 AM EST #

The following JCD will stream multiple files, identified by a name pattern, from a local file system to a remote FTP server, as fast as it can, until it runs out of files. The name pattern, the source directory, the target FTP server, and the target directory are all configured in the connectivity map.

Beware: the JCD opens and closes connections for each file. If there is a firewall between the app server and the FTP server, it may consider this a DoS attack and refuse to allow connections.

package Stream100sLocal2FTP;

import com.stc.eways.batchext.LocalFileException;
import com.stc.eways.common.eway.standalone.streaming.StreamingException;
import com.stc.eways.batchext.BatchException;
import java.io.FileNotFoundException;

public class jcdFilesProcessor
{
    long lStartMillis = System.currentTimeMillis();
    int iTimeoutMillis = 40 * 60 * 1000;

    public com.stc.codegen.logger.Logger logger;
    public com.stc.codegen.alerter.Alerter alerter;
    public com.stc.codegen.util.CollaborationContext collabContext;
    public com.stc.codegen.util.TypeConverter typeConverter;

    public void receive( com.stc.connectors.jms.Message input, com.stc.eways.batchext.BatchLocal G_BLFIn, com.stc.eways.batchext.BatchFtp vFTPOut )
        throws Throwable
    {
        long lNow = System.currentTimeMillis();
        java.util.Date dtNow = new java.util.Date( lNow );
        logger.debug( "\n===>>> Received trigger " + input.getTextMessage() + " at " + lNow + ", " + dtNow );

        int i = 0;
        boolean blMore = true;
        com.stc.eways.common.eway.standalone.streaming.InputStreamAdapter isa = null;

        while (blMore) {
            try {
                // log current file
                logger.debug( "\n===>>> got file " + ++i + " " + G_BLFIn.getClient().getResolvedNamesToGet().getTargetFileName() );

                vFTPOut.getConfiguration().setTargetFileName( G_BLFIn.getClient().getResolvedNamesToGet().getTargetFileName() );
                vFTPOut.getConfiguration().setTargetFileNameIsPattern( false );

                vFTPOut.getConfiguration().setDataConnectionTimeout( iTimeoutMillis );
                vFTPOut.getProvider().setDataSocketTimeout( iTimeoutMillis );
                vFTPOut.getProvider().setSoTimeout( iTimeoutMillis );

                isa = G_BLFIn.getClient().getInputStreamAdapter();
                vFTPOut.getClient().setInputStreamAdapter( isa );
                vFTPOut.getClient().put();

                // prepare for the next file
                isa.releaseInputStream( true );

                if (!vFTPOut.reset()) {
                    logger.error( "\n===>>> Failed to reset FTP" );
                    throw new Exception( "Failed to reset FTP" );
                }
                if (!G_BLFIn.getClient().reset()) {
                    logger.error( "\n===>>> Failed to reset Local File" );
                    throw new Exception( "Failed to reset Local File" );
                }
            } catch ( com.stc.eways.batchext.BatchException be ) {
                // File Not Found is expected and benign - it means no more files match.
                // That exception is so deeply nested that the following code
                // is needed to determine whether it is what caused the exception.
                logger.debug( "\n===>>> Nested Exception: " + be.getNestedException().getClass().getName() );
                if (be.getNestedException() instanceof FileNotFoundException) {
                    FileNotFoundException lfe4 = (FileNotFoundException) be.getNestedException();
                    logger.error( "\n===>>> Ignoring expected File Not Found Exception: " + lfe4.getMessage() );
                    blMore = false;
                } else {
                    logger.error( "\n===>>> Unexpected BatchException:" + be.getClass() + "\n", be );
                    blMore = false;
                }
            } catch ( Exception e ) {
                logger.debug( "\n===>>> Exception name: " + e.getClass().getName() );
                logger.error( "\n===>>> Exception getting file " + e.getCause() + "\n", e );
                blMore = false;
            }
        }
    }
}

Posted by Michael Czapski on October 16, 2009 at 06:40 AM EST #

Thanks Michael! It works great.

lt

Posted by las on October 17, 2009 at 12:54 AM EST #

Where can I get the attachments to this blog post? The URLs do not work.

Thanks.

Posted by FrankCSC on March 24, 2011 at 08:17 PM EST #

Hello, Frank.

blogs.sun.com is being decommissioned, as far as I am aware. mediacast.sun.com is already gone.
I moved all my blog articles and the related archives to my own blog site, blogs.czapski.id.au. The old articles are there with the same titles. This article is there as well, at http://blogs.czapski.id.au/2008/04/java-caps-5-1-and-java-caps-6-streaming-large-ftp-transfers

Regards

Michael

Posted by Michael Czapski on March 25, 2011 at 01:53 AM EST #
