Data exchange package


Cooperation or exchange of services between institutions, whether in processing or in using information, as is the case for BIREME and SciELO, requires a secure exchange of data. The data exchange package was created to address this need.

Experience has shown that the simple use of FTP (protocol and tool) does not guarantee that the data has the desired quality after its journey through the medium. It is sometimes better to continue or complete certain processes with a data set that is outdated but not corrupted.


Objectives

  • Make the exchange of data between BIREME and SciELO independent of human intervention
  • Ensure fault-free data exchange
  • Keep data available even in the absence of connectivity
  • Define a protocol for data exchange between entities

Required Resources and Technologies

  • Active Internet connection to query the URL http://homolog.webservices.scielo.org/scieloorg/_design/couchdb/_view/network, which reports the SciELO instances and their status;
  • Its own FTP server for each institution (SciELO and BIREME);
  • At least 100 GB of disk space on the servers;
  • Linux operating environment (bash shell version 3 or higher);
    • Minimum packages: tar, gzip, md5sum; CISIS *; python (with JSON module).
* The CISIS utilities, in whichever 'flavor' is needed, must be in a directory on the machine's 'path'

General Boundaries

  • Human or automatic operation (by crontab)
  • Separate send and receive routines (forming a package)
  • Configuration files (as plain text)
  • Possibility of passing arguments on the call
  • Self-adjusting operation by sensing the environment (adding / deleting data directories)

Applicable standards

Apply the internal standards for shell-script development: provide calls to the LOG mechanism (source log or . log), interrupt the flow on failure / runtime error, and use the standardized header from the shell template file in the miscellaneous directory /usr/local/BIREME/misc

Applicability

This package can be used in two data exchange situations:

  1. Data exchanges between BIREME and SciELO;
  2. Data exchanges between SciELO and the components of its network.

To do so, just adjust the configuration file mentioned above.

It is currently used to perform part of the work of Envia2Medline.bat, which will be replaced by envia2ORG.sh, itself part of the distribution package.

Basic Protocol Description

The protocol for sending and receiving data packets aims to ensure the delivery of data from sender to receiver. If a malfunction occurs in the process and can be detected, the last data packet sent successfully is delivered in place of the current one, so that the processes that depend on data received through this protocol keep operating.

For this purpose a sequence of actions should be performed on each side of the transmission medium. These are the tasks of the sender:

  1. Check whether the destination directory exists;
    1. If it does not, create it;
  2. Test whether the "IN-USE" semaphore is active in the destination directory;
    1. If it is, test this condition up to "X" (configurable) times;
      1. If it is still "IN-USE", terminate execution with an error exit;
  3. Delete the "DATA READY" semaphore (if present);
  4. For each item on the "SEND LIST", generate an MD5;
  5. Package the components of the send list, together with the generated MD5 files, in a compressed tar-ball (tgz);
  6. Write the compressed tar-ball to the directory on the FTP server;
  7. Turn on the "DATA READY" semaphore;
  8. End of the send job.
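
The signalling part of the sender's tasks (steps 1 to 3, 6 and 7) can be sketched in bash as follows. This is a minimal sketch under stated assumptions, not the code of envia2.sh: the "IN-USE" and "DATA READY" signals are assumed to be empty files named IN-USE and DATA-READY, a local directory stands in for the directory on the FTP server, and the function name publish_package and its arguments are hypothetical.

```bash
# Sketch of the sender side of the handshake (all names are hypothetical).
# Signals are assumed to be empty files; a local directory stands in for
# the directory on the FTP server.

# publish_package <package.tgz> <dest_dir> [max_tries] [wait_seconds]
publish_package() {
    local pkg="$1" dest="$2" tries="${3:-31}" pause="${4:-1}" n=0

    # Step 1: create the destination directory if it does not exist
    mkdir -p "$dest" || return 1

    # Step 2: test the IN-USE signal up to <max_tries> times
    while [ -e "$dest/IN-USE" ]; do
        n=$((n + 1))
        if [ "$n" -ge "$tries" ]; then
            echo "publish_package: $dest still IN-USE after $tries tries" >&2
            return 2
        fi
        sleep "$pause"
    done

    # Step 3: withdraw DATA-READY before touching the package
    rm -f "$dest/DATA-READY"

    # Step 6: write the tar-ball; step 7: raise DATA-READY
    cp "$pkg" "$dest/" || return 3
    : > "$dest/DATA-READY"
}
```

A call such as publish_package pkg.tgz /srv/ftp/scl 31 1 would return 0 on success and 2 if the directory stayed in use; the path and the one-second wait are illustrative only.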

These are tasks of the receiver:

  1. Test whether the directory expected to receive the data (given by the call parameter) exists;
    1. If it does not, end execution with an error exit;
  2. Test whether the "DATA-READY" semaphore is active in the directory;
    1. If not, test this condition up to "Y" (configurable) times;
      1. If it is still not active, end execution with an error exit;
  3. Turn on the "IN-USE" semaphore;
  4. Delete the "DATA READY" semaphore (not implemented, because there are two data receivers);
  5. Read the compressed tar-ball available in the directory;
  6. After reading the compressed tar-ball, delete the "IN-USE" semaphore;
  7. Decompress and open the tar-ball;
  8. Test the MD5 of each component received;
    1. Those with an OK result are sent to the specific subdirectory of the data buffer on the FTP server;
    2. Those with a NOK result are discarded, and the corresponding component already in the data buffer subdirectory on the FTP server is used in their place.
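
The signalling part of the receiver's tasks (steps 1 to 6) can be sketched in the same way. Again this is only a sketch under assumptions, not the code of recebf.sh: the signals are assumed to be empty files named IN-USE and DATA-READY, a local directory stands in for the FTP directory, and fetch_package and its arguments are hypothetical names. Step 4 is left out, as in the description above.

```bash
# Sketch of the receiver side of the handshake (all names are hypothetical).

# fetch_package <src_dir> <package_name> <work_dir> [max_tries] [wait_seconds]
fetch_package() {
    local src="$1" name="$2" work="$3" tries="${4:-6}" pause="${5:-1}" n=0 rc

    # Step 1: the directory expected to hold the data must exist
    if [ ! -d "$src" ]; then
        echo "fetch_package: no directory $src" >&2
        return 1
    fi

    # Step 2: test the DATA-READY signal up to <max_tries> times
    while [ ! -e "$src/DATA-READY" ]; do
        n=$((n + 1))
        if [ "$n" -ge "$tries" ]; then
            echo "fetch_package: no DATA-READY after $tries tries" >&2
            return 2
        fi
        sleep "$pause"
    done

    # Step 3: raise IN-USE so that the sender waits
    : > "$src/IN-USE"

    # Step 5: read the tar-ball; step 6: drop IN-USE whatever the outcome
    rc=0
    { mkdir -p "$work" && cp "$src/$name" "$work/"; } || rc=$?
    rm -f "$src/IN-USE"
    return "$rc"
}
```

Dropping IN-USE even when the copy fails keeps a failed reception from blocking the sender on its next run.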

User Interface

Both the sending and the receiving commands accept command-line options and call parameters, as shown in the following syntax:

Sending Data

 envia2.sh [[-h|--help]|[-v|--version]|[-c config_file]] <Sigla3_SciELO> <arquivo_controle>

envia2.sh: shell script that performs all the tasks of sending data packets, including calculating the MD5s and packaging the compressed tar file

options:
 -h | --help    - show the help screen for the command
 -v | --version - show the version of the command being run, its date and the person responsible
 -c <file>      - apply the settings in <file> instead of those in the default file (vaivem.conf)
 
parameters:
 Sigla3_SciELO    - (mandatory) one of the three-letter acronyms that identify SciELO instances
                    (scl = Brazil; spa = Public Health; sss = Social Sciences; sza = South Africa, ...)
 arquivo_controle - (mandatory) text file with the names of the files to be sent, one per line
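
One way to interpret a call with this syntax is sketched below. It is an illustration only, not the parser of envia2.sh: the function name parse_call, the variables CONFIG, SIGLA and CONTROLE, the help and version texts and the return code 10 are all hypothetical. The sketch follows the syntax line above in using -v for the version option.

```bash
# Sketch of option and parameter handling for a call like envia2.sh
# (function name, variable names and return codes are hypothetical).
# On success sets CONFIG, SIGLA and CONTROLE. Returns 10 after printing
# help or version so the caller can stop without reporting an error.
parse_call() {
    CONFIG=vaivem.conf SIGLA='' CONTROLE=''
    while [ $# -gt 0 ]; do
        case $1 in
            -h|--help)
                echo "usage: envia2.sh [-h|--help] [-v|--version] [-c config_file] <Sigla3_SciELO> <arquivo_controle>"
                return 10 ;;
            -v|--version)
                echo "envia2.sh (sketch, no real version information)"
                return 10 ;;
            -c)
                if [ $# -lt 2 ]; then
                    echo "option -c needs a file name" >&2
                    return 1
                fi
                CONFIG=$2
                shift 2 ;;
            -*)
                echo "unknown option: $1" >&2
                return 1 ;;
            *)
                break ;;        # first non-option: start of the parameters
        esac
    done
    if [ $# -ne 2 ]; then
        echo "expected <Sigla3_SciELO> <arquivo_controle>" >&2
        return 1
    fi
    SIGLA=$1 CONTROLE=$2
}
```

For example, parse_call -c outro.conf scl lista.txt would leave CONFIG=outro.conf, SIGLA=scl and CONTROLE=lista.txt; the file names here are made up for the illustration.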

Receive Data

 recebf.sh [[-h|--help]|[-v|--version]|[-c config_file]] <Sigla3_SciELO>

recebf.sh: shell script that performs all the tasks of receiving the data package(s), including decompressing and unpacking the tar file and checking the MD5s

options:
 -h | --help    - show the help screen for the command
 -v | --version - show the version of the command being run, its date and the person responsible
 -c <file>      - apply the settings in <file> instead of those in the default file (vaivem.conf)

parameters:
 Sigla3_SciELO - (mandatory) one of the three-letter acronyms that identify SciELO instances
                 (scl = Brazil; spa = Public Health; sss = Social Sciences; sza = South Africa, ...)

Development Plan

The development of the pair of routines is divided into three steps:

  • Step 1 - Send Data Command;
  • Step 2 - Receive Data Command;
  • Step 3 - Additional Activities.

Step 1 - Send Data Command

The development of the command to send data had three (3) phases (each with its battery of functional tests):

  • Phase 1 - implementation of ancillary functions;
  • Phase 2 - basic core packing;
  • Phase 3 - Automation of transmission.

Phase 1 - Implementation of ancillary functions

This phase implemented the interpretation of options, that is, the use of a configuration file other than the default (vaivem.conf), the display of the command's help, and the display of its version.

Phase 2 - Basic Packaging Core

This phase interprets the second call parameter (the control file list, 'arquivo_controle'), which indicates the pieces of data to be sent. It tests the availability of each piece, calculates the MD5 of each, and packages and compresses the data set.
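
The packaging described in this phase can be sketched as follows. This is a minimal sketch, not the code of envia2.sh: it assumes GNU tar and md5sum, relative file names in the control file, one <name>.md5 file written next to each piece, and it is meant to be run from the directory that holds the data. The function name build_package is hypothetical.

```bash
# Sketch of the packaging core (hypothetical names; assumes GNU tar and
# md5sum and relative file names in the control file).

# build_package <control_file> <output.tgz>
build_package() {
    local list="$1" out="$2" item members rc
    members=$(mktemp) || return 1

    while IFS= read -r item || [ -n "$item" ]; do
        [ -n "$item" ] || continue          # skip blank lines
        if [ ! -r "$item" ]; then           # test availability of the piece
            echo "build_package: missing $item" >&2
            rm -f "$members"
            return 2
        fi
        # One MD5 file per piece, in the format that "md5sum -c" reads back
        if ! md5sum "$item" > "$item.md5"; then
            rm -f "$members"
            return 3
        fi
        printf '%s\n%s\n' "$item" "$item.md5" >> "$members"
    done < "$list"

    # Pack the pieces and their MD5 files into a gzip-compressed tar-ball
    rc=0
    tar -czf "$out" -T "$members" || rc=$?
    rm -f "$members"
    return "$rc"
}
```

Stopping at the first missing piece, rather than sending a partial package, is a design choice of this sketch; the text does not say how the real script reacts.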

Phase 3 - Automation of Transmission

This phase added the interpretation of the first call parameter (the target instance), the FTP connection, the sending of the data packet, and the guarantee of its placement on the server.

Current Status

On 29 June 2011 the development of the routines for sending data was declared complete, after the battery of tests run together with the receiving routines.

(On 20 June 2011 the development of the routines for sending data was declared complete; the start of the massive functional tests (stress test) was pending completion of the receiving routines.)

Step 2 - Receive Data Command

The development of the command to receive data had five (5) phases (each with its battery of functional tests):

  • Phase 1 - interpretation of options;
  • Phase 2 - defining instance;
  • Phase 3 - reception of packets;
  • Phase 4 - unpacking the data;
  • Phase 5 - activation of reserve data and exit signaling.

Phase 1 - Interpretation Options

This phase implemented (reproducing the sending routine, with the due differences) the interpretation of options, that is, the use of a configuration file other than the default (vaivem.conf), the display of the command's help, and the display of its version.

Phase 2 - Scoping Instance

This phase interprets the call parameter (Sigla3_SciELO), which restricts the reception of data packets to the instance specified in the call.

Phase 3 - Reception Packages

This phase added the reception of the package of the instance determined in Phase 2. All the signaling and semaphore handling is covered here.

Phase 4 - Unpacking Data

This phase provided the checks for successful decompression of the data, verification of the MD5 codes, and comparison with the data of the last set received successfully.

Phase 5 - Activation of Reserve Data and Signaling Output

If Phase 4 determines that the data packet is valid, the data are saved in the reserve and a success return code is provided. If, on the other hand, Phase 4 determines that the data packet is not valid, the last set recorded in the reserve is used instead and an error return code is provided.
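
The check and fallback of Phases 4 and 5 can be sketched as follows. This is a sketch under assumptions, not the code of recebf.sh: it assumes GNU tar and md5sum, a flat tar-ball (no subdirectories) in which every component travels with a <name>.md5 file, and a local directory playing the role of the reserve. The function name accept_package and the return codes are hypothetical.

```bash
# Sketch of MD5 verification with fallback to the reserve (hypothetical
# names; assumes a flat tar-ball with one <name>.md5 per component).

# accept_package <package.tgz> <reserve_dir>
# Returns 0 if every component was valid, 1 otherwise.
accept_package() {
    local pkg="$1" reserve="$2" scratch sum status=0
    mkdir -p "$reserve" || return 1
    scratch=$(mktemp -d) || return 1

    # A tar-ball that cannot be decompressed leaves the reserve untouched
    if ! tar -xzf "$pkg" -C "$scratch" 2>/dev/null; then
        rm -rf "$scratch"
        return 1
    fi

    for sum in "$scratch"/*.md5; do
        if [ ! -e "$sum" ]; then            # no MD5 files at all
            status=1
            break
        fi
        if (cd "$scratch" && md5sum -c "$(basename "$sum")" >/dev/null 2>&1); then
            cp "${sum%.md5}" "$reserve/"    # OK: becomes the new reserve copy
        else
            status=1                        # NOK: discard, keep the old copy
        fi
    done

    rm -rf "$scratch"
    return "$status"
}
```

After a call, the reserve directory always holds the most recent valid copy of each component, and the return code tells the caller whether the current package was fully accepted.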

Current Status

On 29 June 2011 the development of the receiving routine was declared complete, together with the battery of tests performed in conjunction with the routines for sending data.

(On 21 June 2011 the development of the routines for receiving data was to start, followed by modular functional tests and then mass tests together with the sending routines.)

Step 3 - Complementary Activities

The complementary activities (assembling pre-configured calls, additional documentation, packaging, general testing on the 'target' machines, etc.) resulted in the following calls for end use:

  • Bir2sci.sh - data sender in the direction BIREME -> SciELO
  • envia2ORG.sh - replacement for the current Envia2Medline.bat
  • envia2.sh - generic data sender, configured by default for the direction SciELO -> Network
  • recebf.sh - generic data receiver
  • Seguro.sh - password file encoder
  • Segura.sh - decoder of the password file for the CDS-ISIS database (for verification)

They also resulted in the README.1ST document, which describes the technical components of the package for data exchange via FTP, and finally in the compressed tar-ball tr_x.tgz, which contains all the parts and components needed to deploy the package.

Deployment and Configuration

The instructions given here are minimal and are not intended to replace a technical manual for the package, but rather to serve as a guide for putting the package into use.

Installation

Once obtained, the tar-ball (tr_x.tgz) of this package should be opened and its components transferred to the directory of use, in accordance with the routines practiced in Operation of Sources, typically a tpl.xxx. In principle, all components must be in the same directory and cannot be separated into different directories without customizing the shell scripts.

The only operation that resembles an installation (apart from the copy itself) is recreating the Master File (M/F) of the gizmo database called 'gunians', which should be done with the CISIS utility id2i, as shown in the following command (assuming that the CISIS utilities are on the path of the machine):

 id2i gunians.id create=gunians

Configuration

Package configuration is simple and based on a plain text file, following the style of the configuration files used by Apache: each statement associates a term with a value by means of an equal sign (TERM = value). Comments may be interspersed in the file as documentation if desired, but are not required. The statements themselves are also optional; when one is absent, the default value previously programmed in the shell scripts is assumed.

Terms for use in a config-file (case-sensitive)

Term | Explanation | Default value | Limits / Scope
PY | Allows the use of lescielos.py (faster) | (undeclared) | TRUE or undeclared
HOSTSERVER | Identification of the server executing the command | SciELO | BIREME / SciELO / NETWORK
TXTIMO | Maximum number of attempts to gain control of the transmission channel | 31 | Integer value greater than zero
RXTIMO | Maximum number of attempts waiting for data-ready on the receive channel | 6 | Integer value greater than zero
TXSERVER | URL of the FTP server used on the transmission channel | ftp.scielo.br | Any valid FTP URL
RXSERVER | URL of the FTP server used on the receive channel | ftp.scielo.br | Any valid FTP URL
TXUSER | Username to connect to the sending FTP server | usr.bireme | Any valid username
TXPASS | Password of the user on the sending FTP server | 123deoliveira4 | Any valid password
RXUSER | Username to connect to the receiving FTP server | joao.silva | Any valid username
RXPASS | Password of the user on the receiving FTP server | b!r3n3 | Any valid password
USER | Username to connect to the FTP server (sender and receiver) | jane.Doe | Any valid username
PASS | Password of the user on the FTP server (sender and receiver) | #s3nh@f0rt3! | Any valid password

The package contains eight (8) configuration files, identified by the extension .conf (not compulsory), which can and should be taken as examples for creating other configuration files for use with the package for data exchange via FTP.

Shown below are the contents of the configuration file vaivem.conf, the file assumed when no other is specified with the -c option in the call to the sending or receiving shell script:

# Category of server serving data exchange (TX or RX)
HostServ = SciELO

# Time-out, in seconds, for trying to send while the 'channel' is busy
TXTIMO = 31

# Time-out, in seconds, for trying to receive while the 'channel' is unavailable
RXTIMO = 6

# Address of the FTP server to be used
TXSERVER = ftp.scielo.br
RXSERVER = ftp.scielo.br
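
A file in this style can be read with a few lines of bash. The sketch below is an illustration, not the loader used by the package: the function name load_config and the CFG_ prefix on the resulting variables are hypothetical, and only the four defaults that appear in the file above are preset. Terms are stored exactly as spelled, since they are case-sensitive.

```bash
# Sketch of a reader for "TERM = value" files (hypothetical names).
# Each statement found becomes a shell variable CFG_<TERM>; statements
# that are absent keep the defaults set at the top.

# load_config <config_file>
load_config() {
    local file="$1" line term value
    CFG_TXTIMO=31 CFG_RXTIMO=6
    CFG_TXSERVER=ftp.scielo.br CFG_RXSERVER=ftp.scielo.br

    [ -r "$file" ] || return 0          # no file: keep the defaults

    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *=*) ;;                     # looks like a statement
            *) continue ;;              # blank line or comment
        esac
        # Term: text before the first "=", without blanks
        term=$(printf '%s' "${line%%=*}" | tr -d '[:space:]')
        # Value: text after the first "=", trimmed at both ends
        value=$(printf '%s' "${line#*=}" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        case $term in
            ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) continue ;;   # not a valid term
        esac
        eval "CFG_$term=\$value"
    done < "$file"
}
```

After load_config vaivem.conf, a script could read "$CFG_TXSERVER" or "$CFG_TXTIMO" directly. Values are kept verbatim after the equal sign, so a password containing "#" is not mistaken for a comment.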

Conclusion

At the end of the development cycle an email will be sent to the "OFI list" announcing the completion of the task and the start of the life of the product, with a view to its inclusion in the institution's versioning system.