TPT 14.10 output to named pipe and then gzip to final files - response (2) by feinholz
TPT 14.10 output to named pipe and then gzip to final files - response (3) by ericsun2
Oh, I have an old wrapper shell script that compresses TPT output. I recently upgraded to TTU 14.10, and it no longer works.
How do I WRITE directly to *.gz? I tried specifying FileName = xxx.gz, but gzip can't read the output file.
$ cat tpt/export.fact_abc_segment.jobvars
UserName='tpt_reader'
UserPassword='tpt_password'
TdpId='tddev'
TechnicalSubjectArea='segment'
DataFilePath='/mnt/data/fastload_local/FACT_ABC_SEGMENT'
DataFileName='fact_abc_segment.gz'
SourceTableName='FACT.FACT_ABC_SEGMENT'
Output files look like:
$ ls -1
fact_abc_segment.gz-1
fact_abc_segment.gz-2
fact_abc_segment.gz-3
fact_abc_segment.gz-4
fact_abc_segment.gz-5
fact_abc_segment.gz-6
fact_abc_segment.gz-7
fact_abc_segment.gz-8
$ gzip -d -c fact_abc_segment.gz-2
gzip: fact_abc_segment.gz-2: not in gzip format
The file content looks like an uncompressed FastLoad-formatted indicator file:
$ hexdump -C -n 128 fact_abc_segment.gz-1
00000000  12 00 00 dd 42 11 00 e5  4b f0 03 00 00 00 00 7a  |....B...K......z|
00000010  00 00 00 04 0a 12 00 00  dd 42 11 00 32 8e 2d 12  |.........B..2.-.|
00000020  00 00 00 00 7a 00 00 00  04 0a 12 00 00 dd 42 11  |....z.........B.|
00000030  00 99 cd 8b 12 00 00 00  00 7a 00 00 00 04 0a 12  |.........z......|
00000040  00 00 dd 42 11 00 c6 d6  38 01 00 00 00 00 7a 00  |...B....8.....z.|
00000050  00 00 04 0a 12 00 00 dd  42 11 00 3e 6f de 03 00  |........B..>o...|
00000060  00 00 00 7a 00 00 00 04  0a 12 00 00 dd 42 11 00  |...z.........B..|
00000070  7e e5 f7 03 00 00 00 00  7a 00 00 00 04 0a 12 00  |~.......z.......|
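The "not in gzip format" message can be confirmed without gzip itself: every gzip stream begins with the two magic bytes 0x1f 0x8b (RFC 1952), whereas the hexdump of fact_abc_segment.gz-1 begins with 12 00. A minimal Python sketch that checks the first two bytes of each output file (the script and function names are hypothetical):

```python
import glob
import os
import sys

# Every gzip member begins with these two bytes (RFC 1952, ID1 and ID2).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip_header(data: bytes) -> bool:
    """Return True if `data` starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def looks_like_gzip(path: str) -> bool:
    """Read only the first two bytes of `path` and test them."""
    with open(path, "rb") as f:
        return is_gzip_header(f.read(2))


if __name__ == "__main__":
    # Example: python check_gz.py 'fact_abc_segment.gz-*'
    pattern = sys.argv[1] if len(sys.argv) > 1 else "*.gz*"
    for name in sorted(p for p in glob.glob(pattern) if os.path.isfile(p)):
        print(f"{name}: {'gzip' if looks_like_gzip(name) else 'NOT gzip'}")
```

Reading two bytes is enough to tell a compressed file from raw Formatted output, so this stays fast even on large export files.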
TPT 14.10 output to named pipe and then gzip to final files - response (4) by ericsun2
Hi Steven,
The original goal is to generate *.gz files directly, to save disk space now and network transfer overhead later. Generally, the *.gz files are 3 to 5 times smaller than the plain FastLoad output.
Here is the TPT template:
USING CHARACTER SET UTF8
DEFINE JOB EXPORT_TO_FASTLOAD_FORMAT
DESCRIPTION 'Export from ' || @SourceTableName || ' to the INDICDATA file: ' || @DataFileName
(
  DEFINE SCHEMA DATA_FILE_SCHEMA
  (
    "DATE_ID" IntDate,
    "MEMBER_ID" BigInt,
    "CUSTOM_SEGMENT_ID" Int,
    "PRIORITY" ByteInt
  );

  DEFINE OPERATOR EXPORT_OPERATOR
  TYPE EXPORT
  SCHEMA DATA_FILE_SCHEMA
  ATTRIBUTES
  (
    VARCHAR PrivateLogName = @SourceTableName || '_log',
    VARCHAR TdpId = @TdpId,
    VARCHAR UserName = @UserName,
    VARCHAR UserPassword = @UserPassword,
    VARCHAR QueryBandSessInfo = 'Action=TPT_EXPORT; Format=Fastload;',
    VARCHAR SpoolMode = 'noSpool',
    INTEGER MaxDecimalDigits = 18,
    VARCHAR DateForm = 'INTEGERDATE',
    VARCHAR SelectStmt = 'select * from ' || @SourceTableName
  );

  DEFINE OPERATOR FILE_WRITER
  TYPE DATACONNECTOR CONSUMER
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR PrivateLogName = 'indicdata_writor_log',
    VARCHAR DirectoryPath = @DataFilePath,
    VARCHAR FileName = @DataFileName,
    VARCHAR Format = 'Formatted',
    VARCHAR OpenMode = 'Write',
    VARCHAR IndicatorMode = 'Y'
  );

  APPLY TO OPERATOR (FILE_WRITER[@DataFileCount])
  SELECT * FROM OPERATOR (EXPORT_OPERATOR[@NumOfReader]);
);
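To confirm that the .gz-n files really are plain Formatted output, the hexdump shown earlier can be decoded against DATA_FILE_SCHEMA from this template. A minimal Python sketch, assuming the layout the hexdump appears to show: a 2-byte little-endian record length, one NULL-indicator byte (4 fields fit in one byte), the fixed-width fields, and a 0x0A end-of-record marker. Byte order and the indicator-bit convention are assumptions for an x86 Linux client; the function names are hypothetical.

```python
import datetime
import struct
from typing import Iterator, Optional, Tuple

# Assumed field layout, taken from DATA_FILE_SCHEMA in the template above:
#   DATE_ID IntDate (4 bytes), MEMBER_ID BigInt (8),
#   CUSTOM_SEGMENT_ID Int (4), PRIORITY ByteInt (1); little-endian integers.
FIELDS = [("DATE_ID", "<i"), ("MEMBER_ID", "<q"),
          ("CUSTOM_SEGMENT_ID", "<i"), ("PRIORITY", "<b")]
END_OF_RECORD = 0x0A


def iter_formatted_rows(buf: bytes) -> Iterator[Tuple[Optional[int], ...]]:
    """Yield one tuple per record: 2-byte length, indicators + fields, 0x0A."""
    n_ind = (len(FIELDS) + 7) // 8  # one NULL-indicator bit per field
    pos = 0
    while pos < len(buf):
        (length,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        record = buf[pos:pos + length]
        pos += length
        if len(record) != length or pos >= len(buf) or buf[pos] != END_OF_RECORD:
            raise ValueError(f"truncated or malformed record ending at offset {pos}")
        pos += 1
        indicators = int.from_bytes(record[:n_ind], "big")
        offset, values = n_ind, []
        for i, (_, fmt) in enumerate(FIELDS):
            # Assumed: the high-order bit of the first indicator byte is field 1.
            is_null = (indicators >> (8 * n_ind - 1 - i)) & 1
            value = struct.unpack_from(fmt, record, offset)[0]
            values.append(None if is_null else value)  # NULL fields still take space
            offset += struct.calcsize(fmt)
        yield tuple(values)


def intdate_to_date(value: int) -> datetime.date:
    """Teradata integer date: (year - 1900) * 10000 + month * 100 + day."""
    return datetime.date(value // 10000 + 1900, value // 100 % 100, value % 100)
```

Fed the first 42 bytes of the hexdump, this yields two rows with DATE_ID 1131229 (2013-12-29), CUSTOM_SEGMENT_ID 122 and PRIORITY 4, which is consistent with the file being uncompressed.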
Thank you so much for the help!
Zip/GZip Support in TTU 14? - response (3) by ericsun2
TDCH can read compressed HDFS files such as part-*.gz and part-*.deflate directly. "internal.fastload" is subject only to the workload management session limit.
TPT 14.10 output to named pipe and then gzip to final files - response (5) by ericsun2
It seems that a gzip file is generated if I write to a single output file with a ".gz" extension.
But I need to generate multiple *.gz files, in order to:
- utilize multiple CPU cores to compress the data stream
  - gzip is quite CPU-intensive; a single gzip process can easily use 100% of a core
- load more efficiently with TPT's multiple readers than from a single *.gz file
  - the readers don't all have to read from the beginning of the same big *.gz file
TPT used to allow writing to named pipes on Linux, so we could launch multiple gzip processes in the background to compress the data stream.
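The named-pipe approach can be sketched in Python, used here only as a driver (every name below is hypothetical): create one FIFO per writer instance and start a background `gzip -c` reading from each, so compression runs in N separate processes and can use N cores. The FIFO names follow the `<FileName>-<n>` pattern seen in the ls output earlier in this thread, assuming the DataConnector writer will open an existing FIFO under that name; whether a given TTU level writes to pipes correctly is exactly the "getpos" question raised later in the thread.

```python
import os
import shlex
import subprocess
from typing import List, Tuple


def start_gzip_pipes(directory: str, base: str,
                     count: int) -> Tuple[List[str], List[subprocess.Popen]]:
    """Create `count` FIFOs named <base>-<i> and start one background gzip per FIFO.

    Each gzip writes <base>.<i>.gz. Both naming patterns are illustrative
    assumptions; point the TPT file writer instances at the FIFO paths.
    """
    fifos, procs = [], []
    for i in range(1, count + 1):
        fifo = os.path.join(directory, f"{base}-{i}")
        out = os.path.join(directory, f"{base}.{i}.gz")
        os.mkfifo(fifo)
        # Let the child shell open the FIFO: opening it here would block
        # this process until a writer shows up.
        cmd = f"gzip -c < {shlex.quote(fifo)} > {shlex.quote(out)}"
        procs.append(subprocess.Popen(["sh", "-c", cmd]))
        fifos.append(fifo)
    return fifos, procs


def wait_for_gzip(procs: List[subprocess.Popen]) -> None:
    """Block until every gzip has seen EOF on its FIFO and finished writing."""
    for p in procs:
        if p.wait() != 0:
            raise RuntimeError(f"gzip exited with status {p.returncode}")
```

Typical use: call `start_gzip_pipes(dir, "fact_abc_segment", 8)` before starting tbuild, run the export with FileName pointing at `fact_abc_segment` and 8 writer instances, then call `wait_for_gzip(procs)`. Each gzip exits once its writer closes the pipe.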
TPT 14.10 output to named pipe and then gzip to final files - response (6) by ericsun2
nmon 12c   Hostname=xxxx-xxxxx   Refresh=2secs   21:03.35

CPU Utilisation
CPU   User%   Sys%   Wait%   Idle%
  1     1.0    6.1     0.5    92.4
  2     2.0    2.0     3.0    93.1
  3     0.5    0.5     0.5    98.5
  4     1.0    2.0     0.0    97.0
  5     0.0    0.6    57.4    42.0
  6     3.0    1.0     4.5    91.5
  7     0.0    0.0     0.0   100.0
  8     1.0    0.5     0.0    98.5
  9   100.0    0.0     0.0     0.0
 10     2.5    0.5     0.0    97.0
 11     0.0    0.0     0.0   100.0
 12     0.5    0.0     0.0    99.5
 13     0.5    0.5     0.0    99.0
 14     3.0    0.5    96.5     0.0
 15     1.2    2.5     1.2    95.1
 16     5.2    5.2     2.1    87.6
 17     0.0    1.1     0.0    98.9
 18     3.0    4.0     1.5    91.6
 19     0.0    0.5     0.0    99.5
 20     3.0    1.5     0.0    95.5
 21     0.0    0.0     0.0   100.0
 22     0.0    1.6     0.0    98.4
 23     0.0    0.0     0.0   100.0
 24     0.0    0.0     0.0   100.0
Avg     4.8    1.1     7.7    86.5

Network I/O
I/F Name  Recv=KB/s  Trans=KB/s   packin  packout  insize  outsize  Peak->Recv     Trans
lo              0.0         0.0      0.0      0.0     0.0      0.0        14.7      14.7
eth0        26250.0     83880.6  46598.8  59032.9   576.8   1455.0    749912.6  290677.8
eth1            0.1         0.1      1.5      0.5    60.0    179.0         1.8       0.1
bond0       26250.1     83880.7  46600.3  59033.4   576.8   1455.0    749914.4  290677.8
This nmon output shows TPT writing to a single *.gz file without a named pipe. One core (CPU 9) is maxed out at 100% user time, while the Teradata server has plenty of headroom to push more data to the client machine.
TPT 14.10 output to named pipe and then gzip to final files - response (7) by feinholz
You can specify multiple instances of the DataConnector operator (as the file writer) and use the -C option on the command line. TPT will round-robin the data to each instance and each instance will write to its own file.
This will help you generate multiple .gz files (all should be roughly the same size).
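The round-robin idea can be pictured with a small Python sketch that deals rows to N gzip writers in turn (the function and file names are hypothetical, and whether TPT distributes individual rows or larger buffers is an assumption here). With evenly sized rows, the files come out roughly equal in size, which is the property described above.

```python
import gzip
import itertools
import os
from typing import Iterable, List


def round_robin_gzip(rows: Iterable[bytes], directory: str, base: str,
                     instances: int) -> List[str]:
    """Deal rows to `instances` gzip files in turn: row 0 -> file 1, row 1 -> file 2, ...

    This only mimics the distribution. It runs in one process, so unlike
    separate TPT writer instances it does not spread compression across cores.
    """
    paths = [os.path.join(directory, f"{base}.{i}.gz") for i in range(1, instances + 1)]
    writers = [gzip.open(p, "wb") for p in paths]
    try:
        # zip() stops as soon as the rows run out; cycle() repeats the writers.
        for writer, row in zip(itertools.cycle(writers), rows):
            writer.write(row)
    finally:
        for w in writers:
            w.close()
    return paths
```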
Zip/GZip Support in TTU 14? - response (4) by feinholz
TPT's implementation with TDCH will most likely not support .gz files.
The TPT implementation of the HDFS API will support .gz files.
TPT 14.10 output to named pipe and then gzip to final files - response (8) by ericsun2
Hi Steven,
When I use "APPLY TO OPERATOR (FILE_WRITER[@DataFileCount])" with -C to generate multiple output files, all the files get a *.gz-n extension, but their contents are uncompressed Formatted FastLoad data instead of compressed Formatted FastLoad data.
However, when I output to a single file, the content is indeed compressed with the gzip codec.
Is this a bug or intended behavior?
How to Run Fast Export - forum topic by Jugalkishorebhatt1
Hello All,
I am trying to run the below script in fxp:
.LOGTABLE DB.errors
.LOGON jugalDB/JBhatt,jugal
.BEGIN EXPORT
.EXPORT OUTFILE samples111
Sel top 10 * from db.table1
.END EXPORT
.LOGOFF
In interactive mode, I ran it as follows:
fxp < fxpExample.txt
I am getting an error. This is my first time using FastExport; please help me figure out how to proceed. Thanks
SELECT statement in a TPT EXPORT script - forum topic by terasum
Hi,
I am trying to export data from a SQL query to a file using the TPT Export operator.
My SQL query runs fine in SQL Assistant, but when it is run through TPT, it reports syntax errors that are not actually in the query.
I have used two single quotes wherever I have one single quote in my query.
My requirements:
1) Want a TAB delimited file
2) The input is a UNION of two SQL queries
Query Snippet:
############################################
VARCHAR SelectStmt = '
SELECT
c1,c2,
CASE WHEN c3 IS NOT NULL AND c4<>0 THEN c4
ELSE c5
END AS AGE
from tablea
join tableb on a.c1=b.c1
union
sel c1,c2,c4 from tableg
where c2=''abc'' ;'
############################################
Error: something is expected between c5END and AS
Thanks,
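The error text "c5END" suggests that the line break between c5 and END was lost when the script was parsed, so the two tokens ran together; that diagnosis is an inference and is not confirmed in this thread. One defensive option is to generate the SelectStmt value as a single line, with every whitespace run turned into one space and single quotes doubled as the poster already does by hand. A minimal Python sketch (the helper name is hypothetical):

```python
def to_tpt_string_literal(sql: str) -> str:
    """Turn multi-line SQL into a single-line, quoted TPT attribute value.

    - Whitespace runs (newlines included) become one space, so tokens on
      adjacent lines such as `c5` and `END` cannot run together.
    - Embedded single quotes are doubled.
    Caveat: this also collapses repeated spaces inside SQL string literals.
    """
    one_line = " ".join(sql.split())
    return "'" + one_line.replace("'", "''") + "'"


if __name__ == "__main__":
    query = """SELECT c1, c2,
CASE WHEN c3 IS NOT NULL AND c4<>0 THEN c4
ELSE c5
END AS AGE
FROM tablea"""
    print("VARCHAR SelectStmt = " + to_tpt_string_literal(query))
```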
How to Run Fast Export - response (1) by dnoeth
You should always specify which error you get.
FastExport commands must start with a period. Both Fexp and SQL commands must be terminated by a semicolon:
.LOGTABLE DB.errors;
.LOGON jugalDB/JBhatt,****;
.BEGIN EXPORT;
.EXPORT OUTFILE samples111;
Sel top 10 * from db.table1;
.END EXPORT;
.LOGOFF;
How to Run Fast Export - response (2) by Adeel Chaudhry
Dieter's example should work; otherwise, what's the error? Could it be something to do with the HOSTS file?
How to Run Fast Export - response (3) by Jugalkishorebhatt1
Hi Dnoeth,
I forgot to include the semicolons when posting this issue. Sorry about that.
When running the script on my PC, I did include the leading periods and the trailing semicolons.
I found the answer via Google:
fxp < /home/jugal/fexport.txt
TPT 14.10 output to named pipe and then gzip to final files - response (9) by feinholz
Yup. Found a bug.
The DC operator is looking at the .gz-x extension instead of the original .gz extension (the one before the instance number is appended).
Plus, putting the instance number on the file extension means the user would have to rename their files prior to loading. Not ideal. We will have to fix that too.
Thanks for your patience.
TPT 14.10 output to named pipe and then gzip to final files - response (10) by ericsun2
Hi Steven,
It would be great if TPT generated file names like the following (instead of abc.gz-3, abc.gz-12, ...):
- abc.1.gz
- abc.3.gz
- abc.8.gz
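Until the naming changes, the rename step mentioned above (users having to rename their files before loading) can produce the requested pattern. A minimal Python sketch (function names are hypothetical); note that on 14.10.00.02 this fixes only the names, since the multi-instance files are still uncompressed:

```python
import os
import re

# Matches the 14.10 instance naming seen in this thread, e.g. "abc.gz-3".
_INSTANCE_SUFFIX = re.compile(r"^(?P<stem>.+)\.gz-(?P<n>\d+)$")


def preferred_name(name: str) -> str:
    """Map 'abc.gz-3' to 'abc.3.gz'; leave any other name unchanged."""
    m = _INSTANCE_SUFFIX.match(name)
    return f"{m.group('stem')}.{m.group('n')}.gz" if m else name


def rename_instance_files(directory: str) -> None:
    """Rename every '<stem>.gz-<n>' file in `directory` to '<stem>.<n>.gz'."""
    for name in os.listdir(directory):
        new_name = preferred_name(name)
        if new_name != name:
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new_name))
```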
I look forward to a fix in 14.10.00.07 or later.
So for now, would you be able to comment on named pipe usage?
TPT 14.10 output to named pipe and then gzip to final files - response (11) by feinholz
Not sure on the exact naming convention yet.
We have to be careful with backwards compatibility issues.
What specifically do you want to know about named pipes?
Are you using the named pipe access module?
Studio Express installer can't find java - response (4) by sizormohanty
I just selected C:\Program Files\Java\jre7 and it worked fine.
Thank you, Ramesh.
TPT 14.10 output to named pipe and then gzip to final files - response (12) by feinholz
Pipe issues resulting in "getpos" errors were fixed in 14.10.00.003 and 14.00.00.011.
TPT 14.10 output to named pipe and then gzip to final files - response (13) by ericsun2
The named pipe access module can be used for the READER only; it does not work with the WRITER.
My 14.10.00.02 clearly has the "getpos" problem with named pipes. I will ask our DBA to install the latest TTU soon.
Thanks a lot, Steven. You are always prompt and helpful!
What are you trying to accomplish here?
TPT can both read and write gzip files.
Might be easier to do that than to use pipes.