Data exchange package
To enable cooperation or an exchange of services, whether in processing or in using information, as between BIREME and SciELO, secure data exchange must be addressed. The result is the data exchange package.
Experience has shown that plain FTP (protocol and tool) does not guarantee that data arrives with the desired quality after its journey through the medium. In some cases it is better to continue, or complete, certain processes with a data set that is outdated but not corrupt.
- 1 Objectives
- 2 Required Resources and Technologies
- 3 General Boundaries
- 4 Applicable standards
- 5 Applicability
- 6 Basic Protocol Description
- 7 User Interface
- 8 Development Plan
- 8.1 Step 1 - Send Data Command
- 8.2 Step 2 - Receive Data Command
- 8.3 Step 3 - Complementary Activities
- 9 Deployment and Configuration
- 10 Conclusion
Objectives
- Make the exchange of data between BIREME and SciELO independent of human intervention
- Ensure fault-free data exchange
- Keep data available even in the absence of connectivity
- Define a protocol for data exchange between entities
Required Resources and Technologies
- An active Internet connection to query the URL http://homolog.webservices.scielo.org/scieloorg/_design/couchdb/_view/network, which reports the SciELO instances and their status;
- A dedicated FTP server for each institution (SciELO and BIREME);
- At least 100 GB of disk space on the servers;
- Linux operating environment (bash shell version 3 or higher);
- Minimum packages: tar, gzip, md5sum; CISIS *; python (with JSON module).
* The CISIS utilities, in whichever 'flavor' is needed, must be in a directory on the machine's 'path'
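The software requirements above can be checked before the package is used. The following is a minimal sketch, not part of the package: the function names `have_commands` and `bash_is_v3_or_newer` are hypothetical, and the list of commands passed in is up to the caller.

```shell
#!/usr/bin/env bash
# Sketch: verify the minimum software environment listed above.
# Both function names are hypothetical; they are not part of the package.

# have_commands CMD...: return 0 only if every CMD can be resolved by the shell.
# Note that "command -v" also accepts shell builtins and functions.
have_commands() {
    local rc=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            rc=1
        fi
    done
    return "$rc"
}

# bash_is_v3_or_newer: return 0 when running under bash 3 or higher.
bash_is_v3_or_newer() {
    case "${BASH_VERSION:-}" in
        ''|[12].*) return 1 ;;   # not bash, or bash 1.x / 2.x
        *)         return 0 ;;
    esac
}

# Example call (id2i stands for the CISIS utilities):
# have_commands tar gzip md5sum python id2i && bash_is_v3_or_newer
```

The disk-space and FTP-server requirements are not covered by this sketch.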
General Boundaries
- Human or automatic operation (by crontab)
- Separate send and receive routines (forming a package)
- Configuration files (as plain text)
- Call arguments may be used
- Self-adjusting operation by sensing the environment (adding / deleting data directories)
Applicable standards
Apply the internal development standards for shell scripts: provide calls to the LOG mechanism (source log or . log), interrupt the flow on failure or runtime error, and follow the standard header template file found in the miscellaneous shell directory, /usr/local/BIREME/misc.
Applicability
This package can be used in two data exchange situations:
- Data exchanges between BIREME and SciELO;
- Data exchanges between SciELO and the components of its network.
To do so, just adjust the configuration file mentioned above.
Currently, part of Envia2Medline.bat is used for this purpose. It will be replaced by envia2ORG.sh, which will be part of the distribution package.
Basic Protocol Description
The protocol being implemented for sending and receiving data packets aims to ensure the delivery of data from sender to receiver. If a malfunction occurs in the process and can be detected, the last data packet successfully sent is delivered in place of the current one. This keeps the processes that depend on data received through this protocol in operation.
For this purpose, a sequence of actions must be performed on each side of the transmission medium. These are the tasks of the sender:
- Check whether the destination directory exists;
- If it does not, create it;
- Test whether the "IN-USE" semaphore is active in the destination directory;
- If it is, retest this condition up to "X" (configurable) times;
- If it is still "IN-USE", terminate execution with an error exit;
- Delete the "DATA-READY" semaphore (if present);
- For each item on the "SEND LIST", generate an MD5;
- Package the components of the send list, together with the generated MD5 files, in a compressed tarball (tgz);
- Write the compressed tarball to the directory on the FTP server;
- Turn on the "DATA-READY" semaphore;
- End of the send job.
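The sender tasks above can be sketched in bash as follows. This is an illustration under stated assumptions, not the code of envia2.sh: a local directory stands in for the FTP area, the semaphores are modeled as empty files named `IN-USE` and `DATA-READY`, and the archive name `package.tgz` and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the sender tasks; a local directory stands in for the FTP area.
# File names (IN-USE, DATA-READY, package.tgz) and function names are assumptions.

# wait_until_free DIR MAX_TRIES [SLEEP_SECONDS]
# Return 0 as soon as DIR/IN-USE is absent, 1 if it is still there after MAX_TRIES checks.
wait_until_free() {
    local dir="$1" max="$2" pause="${3:-1}" try=0
    while [ -e "$dir/IN-USE" ]; do
        try=$((try + 1))
        [ "$try" -ge "$max" ] && return 1
        sleep "$pause"
    done
    return 0
}

# send_package DEST_DIR SEND_LIST [MAX_TRIES] [SLEEP_SECONDS]
# SEND_LIST is a text file with one file name per line.
send_package() {
    local dest="$1" list="$2" max="${3:-31}" pause="${4:-1}" work item name
    mkdir -p "$dest" || return 1                      # create the destination if missing
    wait_until_free "$dest" "$max" "$pause" || {
        echo "destination still IN-USE: $dest" >&2
        return 1
    }
    rm -f "$dest/DATA-READY"                          # withdraw the previous signal
    work=$(mktemp -d) || return 1
    while IFS= read -r item || [ -n "$item" ]; do
        [ -n "$item" ] || continue                    # skip blank lines
        name=$(basename "$item")
        cp "$item" "$work/$name" || { rm -rf "$work"; return 1; }
        # Run md5sum inside $work so the recorded file name is relative.
        ( cd "$work" && md5sum "$name" > "$name.md5" ) || { rm -rf "$work"; return 1; }
    done < "$list"
    # Data files and their .md5 files travel together in one compressed tarball.
    tar -czf "$dest/package.tgz" -C "$work" . || { rm -rf "$work"; return 1; }
    rm -rf "$work"
    touch "$dest/DATA-READY"                          # signal that data is ready
}
```

For example, `send_package /srv/ftp/scl lista.txt 31 1` would check the `IN-USE` file up to 31 times, one second apart, before giving up. The path is illustrative only.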
These are the tasks of the receiver:
- Test whether the directory expected to receive the data (given as a call parameter) exists;
- If it does not, terminate execution with an error exit;
- Test whether the "DATA-READY" semaphore is active in the directory;
- If not, retest this condition up to "Y" (configurable) times;
- If it is still absent, terminate execution with an error exit;
- Turn on the "IN-USE" semaphore;
- Remove the "DATA-READY" semaphore (not implemented, because there are two data receivers);
- Read the compressed tarball available in the directory;
- After reading the compressed tarball, remove the "IN-USE" semaphore;
- Decompress and unpack the tarball;
- Check the MD5 of each component received;
- Components with an OK result are sent to the FTP server, into the subdirectory specific to the data buffer;
- Components with a NOK result are discarded, and the corresponding component found in the data buffer subdirectory on the FTP server is used in their place.
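The receiver tasks above can be sketched the same way. This is an illustration, not the code of recebf.sh: local directories stand in for the FTP area and the data buffer, and the file names `IN-USE`, `DATA-READY`, and `package.tgz`, the function name `receive_package`, and the return code 2 for a partially rejected package are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the receiver tasks; local directories stand in for the FTP area.
# Names (IN-USE, DATA-READY, package.tgz, receive_package) are assumptions.

# receive_package SRC_DIR BUFFER_DIR [MAX_TRIES] [SLEEP_SECONDS]
# Returns 0 if every component verified, 2 if some failed their MD5, 1 on other errors.
receive_package() {
    local src="$1" buffer="$2" max="${3:-6}" pause="${4:-1}"
    local try=1 work sum name status=0
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    # Wait for the DATA-READY signal, checking at most MAX_TRIES times.
    while [ ! -e "$src/DATA-READY" ]; do
        [ "$try" -ge "$max" ] && { echo "no DATA-READY in $src" >&2; return 1; }
        try=$((try + 1))
        sleep "$pause"
    done
    touch "$src/IN-USE"                       # keep the sender away while reading
    work=$(mktemp -d) || { rm -f "$src/IN-USE"; return 1; }
    cp "$src/package.tgz" "$work/"; status=$?
    rm -f "$src/IN-USE"                       # reading finished, release the channel
    [ "$status" -eq 0 ] || { rm -rf "$work"; return 1; }
    mkdir -p "$buffer" "$work/data" || { rm -rf "$work"; return 1; }
    tar -xzf "$work/package.tgz" -C "$work/data" || { rm -rf "$work"; return 1; }
    for sum in "$work"/data/*.md5; do
        [ -e "$sum" ] || continue             # no .md5 files at all
        name=$(basename "$sum" .md5)
        if ( cd "$work/data" && md5sum -c "$name.md5" >/dev/null 2>&1 ); then
            cp "$work/data/$name" "$buffer/$name"   # OK: refresh the buffered copy
        else
            echo "MD5 check failed for $name, keeping the buffered copy" >&2
            status=2                                # NOK: the old buffered copy stays in use
        fi
    done
    rm -rf "$work"
    return "$status"
}
```

As in the task list, the `DATA-READY` file is left in place, so a second receiver can still pick up the same package.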
User Interface
Command-line options and call parameters can be used when calling both the sending and the receiving commands, as shown in the following syntax:
envia2.sh [[-h|--help]|[-v|--version]|[-c config_file]] <Sigla3_SciELO> <arquivo_controle>
envia2.sh: shell script that performs all the tasks of sending data packets, including MD5 calculation and packaging into a compressed tar file
options:
  -h | --help       shows the help screen for the command
  -v | --version    shows the version of the command being run, its date, and the person responsible
  -c <file>         applies the given settings file instead of the default file (vaivem.conf)
parameters:
  Sigla3_SciELO     (mandatory) one of the three-letter acronyms that identify SciELO instances (scl = Brazil; spa = Public Health; sss = Social Sciences; sza = South Africa, ...)
  arquivo_controle  (mandatory) text file with the names of the files to be sent, one per line
recebf.sh [[-h|--help]|[-v|--version]|[-c config_file]] <Sigla3_SciELO>
recebf.sh: shell script that performs all the tasks of receiving the data package(s), including decompressing and unpacking the tar file and checking the MD5s
options:
  -h | --help       shows the help screen for the command
  -v | --version    shows the version of the command being run, its date, and the person responsible
  -c <file>         applies the given settings file instead of the default file (vaivem.conf)
parameters:
  Sigla3_SciELO     (mandatory) one of the three-letter acronyms that identify SciELO instances (scl = Brazil; spa = Public Health; sss = Social Sciences; sza = South Africa, ...)
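The option handling described by the syntax above can be sketched in bash as below. This is one possible arrangement, assuming the two-parameter form of envia2.sh; the function name `parse_call` and the variables `ACTION`, `CONFIG`, `INSTANCE`, and `CONTROL` are hypothetical and do not come from the package.

```shell
#!/usr/bin/env bash
# Sketch of option handling for the envia2.sh call syntax.
# parse_call and the ACTION/CONFIG/INSTANCE/CONTROL variables are hypothetical names.

parse_call() {
    ACTION=run
    CONFIG=vaivem.conf          # default configuration file named in the text
    INSTANCE=""
    CONTROL=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -h|--help)    ACTION=help; return 0 ;;
            -v|--version) ACTION=version; return 0 ;;
            -c)
                [ $# -ge 2 ] || { echo "-c requires a file name" >&2; return 1; }
                CONFIG="$2"
                shift           # consume the file name as well
                ;;
            -*) echo "unknown option: $1" >&2; return 1 ;;
            *)  break ;;        # first positional parameter reached
        esac
        shift
    done
    # Both positional parameters are mandatory for the sending command.
    [ $# -eq 2 ] || {
        echo "usage: envia2.sh [-c config_file] <Sigla3_SciELO> <arquivo_controle>" >&2
        return 1
    }
    INSTANCE="$1"
    CONTROL="$2"
}

# Example: parse_call -c bireme.conf scl lista.txt
```

A receiving command would follow the same pattern with a single positional parameter.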
Development Plan
The development of the pair of routines is divided into three steps:
- Step 1 - Send Data Command;
- Step 2 - Receive Data Command;
- Step 3 - Complementary Activities.
Step 1 - Send Data Command
The development of the command to send data had three (3) phases (each with its battery of functional tests):
- Phase 1 - implementation of ancillary functions;
- Phase 2 - basic packaging core;
- Phase 3 - automation of sending.
Phase 1 - Implementation of Ancillary Functions
This phase implemented the option-handling capabilities: use of a configuration file other than the default (vaivem.conf), display of the command's help, and display of its version.
Phase 2 - Basic Packaging Core
This phase interprets the second call parameter (the control file, 'arquivo_controle'), which lists the pieces of data to be sent. It tests the availability of those pieces, calculates the MD5 of each, and packages and compresses the data set.
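The availability test on the pieces listed in the control file could look like the sketch below. It assumes the control file format given earlier (one file name per line); the function name `missing_items` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: test the availability of every piece listed in the control file.
# missing_items is a hypothetical helper name, not part of the package.

# missing_items CONTROL_FILE
# Print each listed file that cannot be read; return 1 if any is missing.
missing_items() {
    local control="$1" item rc=0
    [ -r "$control" ] || { echo "cannot read control file: $control" >&2; return 1; }
    # "|| [ -n ... ]" keeps a last line that has no trailing newline.
    while IFS= read -r item || [ -n "$item" ]; do
        [ -n "$item" ] || continue        # skip blank lines
        if [ ! -r "$item" ]; then
            echo "$item"
            rc=1
        fi
    done < "$control"
    return "$rc"
}

# Example: missing_items lista.txt || echo "some pieces are not available" >&2
```

Running this check before any MD5 is calculated lets the sender stop early, with nothing half-packaged.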
Phase 3 - Automation of Sending
This phase covered interpretation of the first call parameter (the target instance), the FTP connection, sending the data packet, and guaranteeing its placement on the server.
On 29 June 2011, development of the routines for sending data was declared complete, after the joint tests with the receiving routines.
(On 20 June 2011, development of the routines for sending data was declared complete, pending completion of the receiving routines before the start of massive functional tests (stress test).)
Step 2 - Receive Data Command
The development of the command to receive data had five (5) phases (each with its battery of functional tests):
- Phase 1 - interpretation of options;
- Phase 2 - defining the instance;
- Phase 3 - reception of packages;
- Phase 4 - unpacking the data;
- Phase 5 - activation of reserve data and output signaling.
Phase 1 - Interpretation of Options
This phase implemented the option-handling capabilities, mirroring the sending package and allowing for the necessary differences: use of a configuration file other than the default (vaivem.conf), display of the command's help, and display of its version.
Phase 2 - Defining the Instance
This phase interprets the call parameter (Sigla3_SciELO), which restricts the reception of data packets to the instance specified in the call.
Phase 3 - Reception of Packages
This phase covered receiving the package of the instance determined in Phase 2. All signaling and semaphore handling was addressed here.
Phase 4 - Unpacking the Data
This phase provided the checks for successful decompression of the data, verification of the MD5 codes, and comparison with the data from the last set received successfully.
Phase 5 - Activation of Reserve Data and Output Signaling
If Phase 4 determines that the data packet is valid, the data is saved in the reserve and a success return code is given. If, on the other hand, Phase 4 determines that the data packet is not valid, the last set stored in the reserve is retrieved and an error return code is given.
On 29 June 2011, development of the receiving routines was declared complete, along with the battery of tests performed together with the routines for sending data.
(Development of the routines for receiving data was to start on 21 June 2011, followed by modular functional tests and then massive tests together with the sending routines.)
Step 3 - Complementary Activities
The development of complementary activities (assembling pre-configured calls, additional documentation, packaging, general tests on the 'target' machines, etc.) resulted in the following calls for end use:
- bir2sci.sh - data sender in the direction BIREME -> SciELO
- envia2ORG.sh - replacement for the current Envia2Medline.bat
- envia2.sh - generic data sender, configured by default for the direction SciELO -> Network
- recebf.sh - generic data receiver
- seguro.sh - password file encoder
- segura.sh - decoder of the password file for the CDS-ISIS database (for verification)
They also resulted in the README.1ST document, which describes the technical components of the package for exchanging data via FTP, and finally in the compressed tarball tr_x.tgz, which contains all the parts and components needed to deploy the package.
Deployment and Configuration
The instructions contained here are minimal and are not intended to replace a technical manual for the package, but rather to serve as a guide to putting the package into use.
Once the tarball of this package (tr_x.tgz) has been obtained, it should be unpacked and its components transferred to the directory of use, in accordance with the routines practiced in Operation of Sources, typically a tpl.xxx. In principle, all components must be in the same directory and cannot (without customizing the shell scripts) be split across different directories.
The only operation that resembles an installation (besides the copy just mentioned) is recreating the Master File (M/F) of the gizmo database called 'gunians'. This should be done with the CISIS utility id2i, as shown in the following command (assuming that the CISIS utilities are on the 'path' of the machine running it):
id2i gunians.id create=gunians
The package configuration is simple and based on a plain text file, following the style of the configuration files used by Apache, in which a statement associates a term with a value by means of an equals sign (TERM = value). Comments may be interspersed in the file as documentation if desired, but are not required. The statements themselves are also optional: when one is absent, a default value previously programmed into the shell scripts is assumed.
|Term||Explanation||Default value||Limits / Scope|
|PY||Allows the use of lescielos.py (faster)||(Undeclared)||TRUE or undeclared|
|HOSTSERVER||Identification of the server executing the command||SciELO||BIREME / SciELO / NETWORK|
|TXTIMO||Maximum number of attempts to gain control of the transmission channel||31||Integer value greater than zero|
|RXTIMO||Maximum number of attempts for data-ready in the receive channel||6||Integer value greater than zero|
|TXSERVER||URL of the FTP server used in the transmission channel||ftp.scielo.br||Any valid FTP URL|
|RXSERVER||URL of the FTP server used in the receive channel||ftp.scielo.br||Any valid FTP URL|
|TXUSER||Username to connect to the sending FTP server||usr.bireme||Any valid username|
|TXPASS||Password of the sending FTP server user||123deoliveira4||Any valid password|
|RXUSER||Username to connect to the receiving FTP server||joao.silva||Any valid username|
|RXPASS||Password of the receiving FTP server user||b!r3n3||Any valid password|
|USER||Username to connect to the FTP server (sender and receiver)||jane.Doe||Any valid username|
|PASS||Password of the FTP server user (sender and receiver)||#s3nh@f0rt3!||Any valid password|
The package contains eight (8) configuration files, identified by the .conf extension (not mandatory), which can and should be taken as examples for creating other configuration files for using the package to exchange data via FTP.
Below is the content of the configuration file vaivem.conf, which is the file assumed when no other is specified with the -c option in the call to the sending or receiving shell script:
# Category of server serving data exchange (TX or RX)
HostServ = SciELO
# Time-out for trying to send with 'channel' busy in seconds
TXTIMO = 31
# Time-out for reception in an attempt to 'channel' unavailable in seconds
RXTIMO = 6
# Address of the FTP server to be used
TXSERVER = ftp.scielo.br
RXSERVER = ftp.scielo.br
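A file in this "TERM = value" style can be read from bash with a short loop. The following is a minimal sketch, not the package's own parser: the function names `trim` and `load_config` are hypothetical, only five of the terms are handled, and accepting both spellings `HOSTSERVER` (from the table) and `HostServ` (from the sample file) is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: read "TERM = value" statements, falling back to defaults for absent terms.
# trim and load_config are hypothetical helpers; defaults follow the table above.

# trim STRING: print STRING without leading and trailing whitespace.
trim() {
    local s="$1"
    s="${s#"${s%%[![:space:]]*}"}"      # strip leading whitespace
    s="${s%"${s##*[![:space:]]}"}"      # strip trailing whitespace
    printf '%s' "$s"
}

# load_config FILE: set HOSTSERVER, TXTIMO, RXTIMO, TXSERVER, and RXSERVER.
load_config() {
    local file="$1" line key value
    # Defaults assumed when a statement is absent.
    HOSTSERVER="SciELO"
    TXTIMO=31
    RXTIMO=6
    TXSERVER="ftp.scielo.br"
    RXSERVER="ftp.scielo.br"
    [ -r "$file" ] || return 0          # assumption: a missing file means "all defaults"
    while IFS= read -r line || [ -n "$line" ]; do
        line=$(trim "$line")
        case "$line" in
            ''|'#'*) continue ;;        # blank line or whole-line comment
            *=*)     ;;                 # a statement: handled below
            *)       continue ;;        # anything else is ignored
        esac
        key=$(trim "${line%%=*}")       # text before the first "="
        value=$(trim "${line#*=}")      # text after the first "="
        case "$key" in
            HOSTSERVER|HostServ) HOSTSERVER="$value" ;;
            TXTIMO)              TXTIMO="$value" ;;
            RXTIMO)              RXTIMO="$value" ;;
            TXSERVER)            TXSERVER="$value" ;;
            RXSERVER)            RXSERVER="$value" ;;
        esac
    done < "$file"
}
```

Only whole-line comments are recognized, so a value that itself contains "#" (as a password might) is kept intact.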
Conclusion
At the end of the development cycle, an email will be sent to the "OFI list" giving notice that the task is complete and that the product has entered service, so that it can be included in the institution's versioning scheme.