7.2. Reference converters¶
7.2.1. Summary¶
|
|
|
Convert PDB to FAA format |
7.2.2. All converters documentation¶
Convert ABI format to FASTA format
- class ABI2FASTA(infile, outfile, *args, **kargs)[source]¶
Convert ABI file to FASTQ file
ABI files are created by ABI sequencing machine and includes PHRED quality scores for base calls. This allows the creation of FastA files.
Method implemented is based on BioPython [BIOPYTHON].
constructor
- _default_method = 'biopython'¶
Default value
Convert ABI format to FASTQ format
- class ABI2FASTQ(infile, outfile, *args, **kargs)[source]¶
Convert ABI file to FASTQ file
ABI files are created by ABI sequencing machine and includes PHRED quality scores for base calls. This allows the creation of FastQ files.
Method implemented is based on BioPython [BIOPYTHON].
constructor
- _default_method = 'biopython'¶
Default value
Convert ABI format to QUAL format
- class ABI2QUAL(infile, outfile, *args, **kargs)[source]¶
-
ABI files are created by ABI sequencing machine and includes PHRED quality scores for base calls. This allows the creation of QUAL files.
Method implemented is based on BioPython [BIOPYTHON].
constructor
- _default_method = 'biopython'¶
Default value
Convert BAM format to BEDGRAPH format
- class BAM2BEDGRAPH(infile, outfile)[source]¶
Convert sorted BAM file into BEDGRAPH file
Compute the coverage (depth) in BEDGRAPH. Regions with zero coverage are also reported.
Note that this BEDGRAPH format is of the form:
chrom chromStart chromEnd dataValue
Note that consecutive positions with same values are compressed.
chr1 0 75 0 chr1 75 176 1 chr1 176 177 2
Warning
the BAM file must be sorted. This can be achieved with bamtools.
Methods available are based on bedtools [BEDTOOLS] and mosdepth [MOSDEPTH].
Constructor
- Parameters:
- _default_method = 'bedtools'¶
Default value
Convert BAM format to COV format
- class BAM2COV(infile, outfile)[source]¶
Convert sorted BAM file into COV file
Note that the COV format is of the form:
chr1 1 0 chr1 2 0 chr1 3 0 chr1 4 0 chr1 5 0
that is contig name, position, coverage.
Warning
the BAM file must be sorted. This can be achieved with bamtools using bamtools sort -in INPUT.bam
Methods available are based on samtools [SAMTOOLS] or bedtools [BEDTOOLS].
Constructor
- Parameters:
Convert BAM file to BIGWIG format
- class BAM2BIGWIG(infile, outfile, *args, **kargs)[source]¶
Convert BAM file to BIGWIG file
Convert BAM into a binary version of WIG format.
Methods are base on bamCoverage [DEEPTOOLS] and bedGraphToBigWig from wiggletools [WIGGLETOOLS]. Wiggletools method requires an extra argument (--chrom-sizes) therefore default one is bamCoverage for now.
Moreover, the two methods do not return exactly the same info!
You can check this by using bioconvert to convert into a human readable file such as wiggle. We will use the bamCoverage as our default conversion.
constructor
- _default_method = 'bamCoverage'¶
Default value
Convert BAM file to CRAM format
- class BAM2CRAM(infile, outfile, *args, **kargs)[source]¶
-
The conversion requires the reference corresponding to the input file It can be provided as an argument with the standalone (--reference). Otherwise, users are asked to provide it.
Methods available are based on samtools [SAMTOOLS].
constructor
- _default_method = 'samtools'¶
Default value
Convert BAM format to FASTA format
- class BAM2FASTA(infile, outfile)[source]¶
Convert sorted BAM file into FASTA file
Methods available are based on samtools [SAMTOOLS] or bedtools [BEDTOOLS].
Warning
Using the bedtools method, the R1 and R2 reads must be next to each other so that the reads are sorted similarly
Warning
there is no guarantee that the R1/R2 output file are sorted similarly in paired-end case due to supp and second reads
constructor
- _default_method = 'samtools'¶
Default value
Convert BAM format to FASTQ foarmat
- class BAM2FASTQ(infile, outfile)[source]¶
Convert sorted BAM file into FASTQ file
Methods available are based on samtools [SAMTOOLS] or bedtools [BEDTOOLS].
Warning
Using the bedtools method, the R1 and R2 reads must be next to each other so that the reads are sorted similarly
Warning
there is no guarantee that the R1/R2 output file are sorted similarly in paired-end case due to supp and second reads
constructor
- _default_method = 'samtools'¶
Default value
Convert BAM format to JSON format
- class BAM2JSON(infile, outfile)[source]¶
Convert BAM format to JSON file
Methods available are based on bamtools [BAMTOOLS].
constructor
- _default_method = 'bamtools'¶
Default value
Convert SAM file to BAM format
- class BAM2SAM(infile, outfile, *args, **kargs)[source]¶
-
Methods available are based on samtools [SAMTOOLS] , sam-to-bam [SAMTOBAM] , sambamba [SAMBAMBA] and pysam [PYSAM].
constructor
- _default_method = 'sambamba'¶
default value
- _method_sambamba(*args, **kwargs)[source]¶
Here we use the Sambamba tool. This is the default method because it is the fastest.
Convert BAM file to TSV format
- class BAM2TSV(infile, outfile, *args, **kargs)[source]¶
Convert sorted BAM file into TSV stats
This is not a conversion per se but the extraction of BAM statistics saved into a TSV format. The 4 columns of the TSV file are:
Reference sequence name, Sequence length,Mapped reads, Unmapped reads
Methods are based on samtools [SAMTOOLS] and pysam [PYSAM].
constructor
Methods are based on samtools [SAMTOOLS] and pysam [PYSAM].
- _default_method = 'samtools'¶
Default value
- class BAM2WIGGLE(infile, outfile)[source]¶
Convert sorted BAM file into WIGGLE file
Methods available are based on wiggletools [WIGGLETOOLS].
- Parameters:
Convert BCF file to VCF format
- class BCF2VCF(infile, outfile, *args, **kargs)[source]¶
-
Methods available are based on bcftools [BCFTOOLS].
constructor
Convert BCF format to WIGGLE format
- class BCF2WIGGLE(infile, outfile)[source]¶
Convert sorted BCF file into WIGGLE file
Methods available are based on wiggletools [WIGGLETOOLS].
- Parameters:
- _default_method = 'wiggletools'¶
Default value
Convert BED format to WIGGLE format
- class BED2WIGGLE(infile, outfile)[source]¶
Convert sorted BED file into WIGGLE file
Methods available are based on wiggletools [WIGGLETOOLS].
- Parameters:
Convert BEDGRAPH file to COV format
- class BEDGRAPH2COV(infile, outfile)[source]¶
Converts a BEDGRAPH (4 cols) to COV format (3 cols)
Input example:
chr19 49302000 4930205 -1 chr19 49302005 4930210 1
becomes:
chr19 4930201 -1 chr19 4930202 -1 chr19 4930203 -1 chr19 4930204 -1 chr19 4930205 -1 chr19 4930206 1 chr19 4930207 1 chr19 4930208 1 chr19 4930209 1 chr19 4930210 1
Method available is a Bioconvert implementation (Python).
constructor
- _default_method = 'python'¶
Default value
Convert BEDGRAPH to BIGWIG format
- class BEDGRAPH2BIGWIG(infile, outfile)[source]¶
Converts BEDGRAPH format to BIGWIG format
Conversion is based on bedGraph2BigWig tool. Note that an argument --chrom-sizes is required.
constructor
- _default_method = 'ucsc'¶
Default value
Convert BEDGRAPH format to WIGGLE format
- class BEDGRAPH2WIGGLE(infile, outfile)[source]¶
Convert sorted BEDGRAPH file into WIGGLE file
Methods available are based on wiggletools [WIGGLETOOLS].
- Parameters:
- _default_method = 'wiggletools'¶
Default value
Convert BIGBED format to WIGGLE format
- class BIGBED2WIGGLE(infile, outfile)[source]¶
Convert sorted BIGBED file into WIGGLE file
Methods available are based on wiggletools [WIGGLETOOLS].
- Parameters:
- _default_method = 'wiggletools'¶
Default value
Convert BIGBED format to BED format
- class BIGBED2BED(infile, outfile)[source]¶
Converts a sequence alignment in BIGBED format to BED4 format
Methods available are based on pybigwig [DEEPTOOLS].
constructor
- _default_method = 'pybigwig'¶
Default value
Convert BIGWIG to BEDGRAPH format
- class BIGWIG2BEDGRAPH(infile, outfile)[source]¶
Converts a sequence alignment in BIGWIG format to BEDGRAPH format
Conversion is based on ucsc bigWigToBedGraph tool or pybigwig (default) [DEEPTOOLS].
constructor
- _default_method = 'pybigwig'¶
Default value
- _method_pybigwig(*args, **kwargs)[source]¶
In this method we use the python extension written in C, pyBigWig.
Convert BIGWIG format to WIGGLE format
- class BIGWIG2WIGGLE(infile, outfile)[source]¶
Convert sorted BIGWIG file into WIGGLE file
Methods available are based on pybigwig [DEEPTOOLS].
- Parameters:
- _default_method = 'wiggletools'¶
Default value
Convert BPLINK to PLINK format
- class BPLINK2PLINK(infile, outfile=None, *args, **kwargs)[source]¶
Converts a genotype dataset bed+bim+fam in BPLINK format to ped+map PLINK format.
Conversion is based on plink [PLINK] executable.
Warning
plink takes several inputs and outputs and does not need extensions. What is required is a prefix. Bioconvert usage is therefore:
bioconvert bplink2plink plink_toy
Since there is no extension, you must be explicit by providing the conversion name (bplink2plink). This command will search for 3 input files plink_toy.bed, plink_toy.bim and plink_toy.fam. It will then create two output files named plink_toy.ped and plink_toy.map
constructor
- _default_method = 'plink'¶
Default value
- class BZ22GZ(infile, outfile, *args, **kargs)[source]¶
-
Methods based on bunzip2 or zlib/bz2 Python libraries.
constructor
- _default_method = 'bz2_gz'¶
Default value
Convert CLUSTAL to FASTA format
- class CLUSTAL2FASTA(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment from CLUSTAL to FASTA format.
Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON], and goalign [GOALIGN].
constructor
- _default_method = 'biopython'¶
Default value
Convert CLUSTAL to PHYLIP format
- class CLUSTAL2PHYLIP(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment from CLUSTAL format to PHYLIP format.
Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON], and goalign [GOALIGN].
constructor
- _default_method = 'biopython'¶
Default value
Convert CLUSTAL to STOCKHOLM format
- class CLUSTAL2STOCKHOLM(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment from CLUSTAL format to STOCKHOLM format.
Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON], and goalign [GOALIGN].
constructor
- _default_method = 'biopython'¶
Default value
Convert CRAM file to BAM format
- class CRAM2BAM(infile, outfile, *args, **kargs)[source]¶
-
The conversion requires the reference corresponding to the input file It can be provided as an argument with the standalone (--reference). Otherwise, users are asked to provide it.
Methods available are based on samtools [SAMTOOLS].
constructor
- _default_method = 'samtools'¶
Default value
Convert CRAM file to FASTQ format
- class CRAM2FASTA(infile, outfile, *args, **kargs)[source]¶
Convert CRAM file to FASTA file
Methods available are based on samtools [SAMTOOLS].
constructor
- _default_method = 'samtools'¶
Default value
Convert CRAM file to FASTQ format
- class CRAM2FASTQ(infile, outfile, *args, **kargs)[source]¶
Convert CRAM file to FASTQ file
Methods available are based on samtools [SAMTOOLS].
constructor
- _default_method = 'samtools'¶
Default value
Convert CRAM file to SAM format
- class CRAM2SAM(infile, outfile, *args, **kargs)[source]¶
-
The conversion requires the reference corresponding to the input file It can be provided as an argument with the standalone (--reference). Otherwise, users are asked to provide it.
Methods available are based on samtools [SAMTOOLS].
constructor
- _default_method = 'samtools'¶
Default value
Convert CSV format to TSV format
- class CSV2TSV(infile, outfile)[source]¶
Convert CSV file into TSV file
Available methods: Python, Pandas
Methods available are based on python or Pandas [PANDAS].
See also
TSV2CSV
Constructor
- _default_method = 'python'¶
Default value
- _method_pandas(in_sep=',', out_sep='\t', line_terminator='\n', *args, **kwargs)[source]¶
- _method_python(in_sep=',', out_sep='\t', line_terminator='\n', *args, **kwargs)[source]¶
- class CSV2XLS(infile, outfile, *args, **kargs)[source]¶
-
Methods available are based on python, pyexcel [PYEXCEL], or pandas [PANDAS].
constructor
- _default_method = 'pandas'¶
Default value
- _method_pandas(in_sep=',', sheet_name='Sheet 1', *args, **kwargs)[source]¶
Convert a compressed FASTQ from DSRC to FASTQ format
- class DSRC2GZ(infile, outfile, *args, **kargs)[source]¶
Convert a compressed FASTQ from DSRC to GZ format
Methods available are based on dsrc [DSRC] and pigz [PIGZ].
constructor
- _default_method = 'dsrcpigz'¶
Default value
- _method_dsrcpigz(*args, **kwargs)[source]¶
Do the conversion dsrc -> GZ. Method that uses pigz and dsrc.
pigz documentation dsrc documentation
option threadig does not work with the dsrc version from conda so we do not add the -t threads option
Convert EMBL file to FASTA format
- class EMBL2FASTA(infile, outfile, *args, **kargs)[source]¶
Convert EMBL file to FASTA file
Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON].
constructor
- _default_method = 'biopython'¶
Default value
Convert EMBL file to GENBANK format
- class EMBL2GENBANK(infile, outfile, *args, **kargs)[source]¶
Convert EMBL file to GENBANK file
Methods available are based on squizz [SQUIZZ] and biopython [BIOPYTHON].
constructor
- _default_method = 'biopython'¶
Default value
Convert FASTA format to FASTQ format
- class FASTA_QUAL2FASTQ(infile, outfile)[source]¶
Convert FASTA and QUAL back into a FASTQ file
Method based on pysam [PYSAM].
- Parameters:
- _default_method = 'pysam'¶
Default value
Convert FASTA to CLUSTAL format
- class FASTA2CLUSTAL(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment from FASTA to CLUSTAL format
Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON], and goalign [GOALIGN].
constructor
- _default_method = 'biopython'¶
Default value
- _method_biopython(*args, **kwargs)[source]¶
Convert FASTA interleaved file in CLUSTAL format using biopython.
Convert FASTA format to FAA format
- class FASTA2FAA(infile, outfile)[source]¶
Methods available is a bioconvert implementation.
- Parameters:
- _default_method = 'bioconvert'¶
Default value
Convert FASTA format to FASTQ format
- class FASTA2FASTQ(infile, outfile)[source]¶
Methods available are based on pysam [PYSAM].
- Parameters:
- _default_method = 'pysam'¶
Default value
Convert FASTA to GENBANK format
- class FASTA2GENBANK(infile, outfile, *args, **kargs)[source]¶
Convert FASTA file to GENBANK file
Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON] or Bioconvert pure implementation (default).
constructor
- _default_method = 'bioconvert'¶
Default value
- class FASTA2NEXUS(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment in FASTA format to NEXUS format
Methods available are based on squizz [GOALIGN].
constructor
- _default_method = 'goalign'¶
Default value
Convert FASTA to PHYLIP format
- class FASTA2PHYLIP(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment in FASTA format to PHYLIP format
Conversion is based on Bio Python modules
Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON] or goalign [GOALIGN]. Squizz is the default (https://github.com/bioconvert/bioconvert/issues/149). Phylip created is a strict phylip that is with 10 characters on the first column.
constructor
- _default_method = 'biopython'¶
Default value
Convert FASTA to TWOBIT format
- class FASTA2TWOBIT(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment in FASTA format to TWOBIT format
Methods available are based on UCSC faToTwoBit [UCSC].
constructor
- _default_method = 'ucsc'¶
default value
- class FASTQ2FASTA(infile, outfile)[source]¶
-
This converter has lots of methods. Some of them have also been removed or commented with time. BioPython for instance is commented due to poo performance compared to others. Does not mean that it is not to be considered. Performances are decrease due to lot of sanity checks.
Similarly, bioawk and python_external method are commented because redundant with other equivalent method.
- Parameters:
- _default_method = 'bioconvert'¶
default value
- _method_awk(*args, **kwargs)[source]¶
Here we are using the awk method.
Note
Another method with awk has been tested but is less efficient. Here is which one was used:
box.awkcmd = """awk '{{if(NR%4==1) {{printf(">%s\n",substr($0,2));}} else if(NR%4==2) print;}}' """
- _method_bioconvert(*args, **kwargs)[source]¶
Bioconvert implementation in pure Python. This is the default method because it is the fastest.
- _method_mappy(*args, **kwargs)[source]¶
This method provides a fast and accurate C program to align genomic sequences and transcribe nucleotides.
- _method_mawk(*args, **kwargs)[source]¶
This variant of the awk method uses mawk, a lighter and faster implementation of awk.
Note
Other methods with mawk have been tested but are less efficient. Here are which ones were used:
mawkcmd_v2 = """mawk '{{if(NR%4==1) {{printf(">%s\n",substr($0,2));}} else if(NR%4==2) print;}}' """ mawkcmd_v3 = """mawk '(++n<=0){next}(n!=1){print;n=-2;next}{print">"substr($0,2)}'"""
- _method_perl(*args, **kwargs)[source]¶
This method uses the perl command which will call the "fastq2fasta.pl" script.
- _method_sed(*args, **kwargs)[source]¶
This method uses the UNIX function sed which is a non-interactive editor.
Note
Another method with sed has been tested but is less efficient. Here is which one was used:
cmd = """sed -n 's/^@/>/p;n;p;n;n'"""
Convert GENBANK to EMBL format
- class GENBANK2EMBL(infile, outfile, *args, **kargs)[source]¶
Convert GENBANK file to EMBL file
Some description.
constructor
- _default_method = 'biopython'¶
Default value
Convert GENBANK to EMBL format
- class GENBANK2FASTA(infile, outfile, *args, **kargs)[source]¶
Convert GENBANK file to FASTA file
Methods are based on biopython [BIOPYTHON], squizz [SQUIZZ] and our own Bioconvert implementation.
constructor
- _default_method = 'biopython'¶
Default value
Convert GENBANK to GFF3 format
- class GENBANK2GFF3(infile, outfile, *args, **kargs)[source]¶
Convert GENBANK file to GFF3 file
Method based on biocode.
constructor
- _default_method = 'biocode'¶
Default value
- _method_biocode(*args, **kwargs)[source]¶
Uses scripts from biocode copied and modified in bioconvert.utils.biocode
Please see Main entry
- class GFF22GFF3(infile, outfile, *args, **kargs)[source]¶
-
constructor
Method available is pure Python.
- _default_method = 'bioconvert'¶
Default value
- class GFF32GFF2(infile, outfile, *args, **kargs)[source]¶
-
Method available is Python-based.
constructor
- _default_method = 'bioconvert'¶
Default value
- class GFA2FASTA(infile, outfile)[source]¶
Convert sorted GFA file into FASTA file
Available methods are based on awk or python (default)
See also
bioconvert.simulator.gfa
- Parameters:
- _default_method = 'python'¶
Default value
- _method_awk(*args, **kwargs)[source]¶
For this method, we use the awk tools.
- Returns:
the standard output
- Return type:
io.StringIO
object.
Note
this method fold the sequence to 80 characters
- class GZ2BZ2(infile, outfile, *args, **kargs)[source]¶
-
Unzip input file using pigz or gunzip and compress using pbzip2. Default is pigz/pbzip2.
constructor
- _default_method = 'pigz_pbzip2'¶
Default value
- _method_gunzip_bzip2(*args, **kwargs)[source]¶
Single theaded conversion. Method that uses gunzip bzip2.
- class GZ2DSRC(infile, outfile, *args, **kargs)[source]¶
Convert compressed fastq.gz file into DSRC compressed file
constructor
- _default_method = 'pigzdsrc'¶
Default value
- _method_pigzdsrc(*args, **kwargs)[source]¶
do the conversion gz -> DSRC
- Returns:
the standard output
- Return type:
io.StringIO
object.
Method that uses pigz and dsrc.
- class JSON2YAML(infile, outfile, *args, **kargs)[source]¶
Convert JSON file into YAML file
Conversion is based on yaml and json standard Python modules Indentation is set to 4 by default and affects the sections (not the list). For example:
fruits_list: - apple - orange section1: do: true misc: 1
constructor
- _default_method = 'yaml'¶
Default value
Convert MAF file to SAM format
- class MAF2SAM(infile, outfile)[source]¶
This is the Multiple alignment format or MIRA assembly format
This is not Mutation Annotation Format (somatic)
pbsim creates this kind of data
Some references:
https://github.com/arq5x/nanopore-scripts/master/maf-convert.py
http://bioperl.org/formats/alignment_formats/MAF_multiple_alignment_format.html
Those two codes were in Py2 at the time of this implementation. We re-used some of the information from maf-convert but the code in bioconvert.io.maf can be considered original.
constructor
- Parameters:
- _default_method = 'python'¶
Default value
Converts NEWICK file to NEXUS format.
- class NEWICK2NEXUS(infile, outfile=None, *args, **kwargs)[source]¶
Converts a tree file from NEWICK format to NEXUS format.
Methods available are based on gotree [GOTREE].
constructor
- _default_method = 'gotree'¶
Default value
Converts NEWICK file to PHYLOXML format.
- class NEWICK2PHYLOXML(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a tree file from NEWICK format to PHYLOXML format.
Methods available are based on gotree [GOTREE].
constructor
- _default_method = 'gotree'¶
Default value
- class NEXUS2FASTA(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment from NEXUS format to FASTA format.
constructor
- _default_method = 'biopython'¶
Default value
- _method_biopython(*args, **kwargs)[source]¶
- Convert NEXUS interleaved or sequential file in FASTA format using biopython.
The FASTA output file will be an aligned FASTA file.
For instance:
We have a Nexus input file that look like
#NEXUS [TITLE: Test file] begin data; dimensions ntax=3 nchar=123; format interleave datatype=DNA missing=N gap=-; matrix read3 -AT--------CCCGCTCGATGGGCCTCATTGCGTCCACTAGTTGATCTT read2 -----------------------GGAAGCCCACGCCACGGTCTTGATACG read4 ---------------------AGGGATGAACGATGCTCGCAGTTGATGCT read3 CTGGAGTAT---T----TAGGAAAGCAAGTAAACTCCTTGTACAAATAAA read2 AATTTTTCTAATGGCTATCCCTACATAACCTAACCGGGCATGTAATGTGT read4 CAGAAGTGCCATTGCGGTAGAAACAAATGTTCCCAGATTGTTGACTGATA read3 GATCTTA-----GATGGGCAT-- read2 CACCGTTGTTTCGACGTAAAGAG read4 AGTAGGACCTCAGTCGTGACT-- ; end; begin assumptions; options deftype=unord; end;
the output file will look like
>read3 -AT--------CCCGCTCGATGGGCCTCATTGCGTCCACTAGTTGATCTTCTGGAGTAT- --T----TAGGAAAGCAAGTAAACTCCTTGTACAAATAAAGATCTTA-----GATGGGCA T-- >read2 -----------------------GGAAGCCCACGCCACGGTCTTGATACGAATTTTTCTA ATGGCTATCCCTACATAACCTAACCGGGCATGTAATGTGTCACCGTTGTTTCGACGTAAA GAG >read4 ---------------------AGGGATGAACGATGCTCGCAGTTGATGCTCAGAAGTGCC ATTGCGGTAGAAACAAATGTTCCCAGATTGTTGACTGATAAGTAGGACCTCAGTCGTGAC T--
and not
>read3 ATCCCGCTCGATGGGCCTCATTGCGTCCACTAGTTGATCTTCTGGAGTATTTAGGAAAGC AAGTAAACTCCTTGTACAAATAAAGATCTTAGATGGGCAT >read2 GGAAGCCCACGCCACGGTCTTGATACGAATTTTTCTAATGGCTATCCCTACATAACCTAA CCGGGCATGTAATGTGTCACCGTTGTTTCGACGTAAAGAG >read4 AGGGATGAACGATGCTCGCAGTTGATGCTCAGAAGTGCCATTGCGGTAGAAACAAATGTT CCCAGATTGTTGACTGATAAGTAGGACCTCAGTCGTGACT
Converts NEXUS file to NEWICK format.
- class NEXUS2NEWICK(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a tree file from NEXUS format to NEWICK format.
Methods available are based on biopython [BIOPYTHON] or goalign [GOALIGN].
constructor
- _default_method = 'gotree'¶
Default value
Converts NEXUS file to PHYLIP format.
- class NEXUS2PHYLIP(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment from NEXUS format to PHYLIP format.
Methods available are based on goalign [GOALIGN].
constructor
- _default_method = 'goalign'¶
Default value
Converts NEXUS file to PHYLOXML format.
- class NEXUS2PHYLOXML(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a tree file from NEXUS format to PHYLOXML format.
Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON], and goalign [GOALIGN].
constructor
- _default_method = 'gotree'¶
Default value
Convert XLS format to CSV format
- class ODS2CSV(infile, outfile)[source]¶
Convert XLS file into CSV file
Method based on pyexcel [PYEXCEL].
constructor
- _default_method = 'pyexcel'¶
Default value
Converts PHYLIP file to CLUSTAL format.
- class PHYLIP2CLUSTAL(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment from PHYLIP format to CLUSTAL format
Methods available are based on biopython [BIOPYTHON], squiz [SQUIZZ].
constructor
- _default_method = 'biopython'¶
Default value
Converts PHYLIP file to FASTA format.
- class PHYLIP2FASTA(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment in PHYLIP format to FASTA format
Methods available are based on biopython [BIOPYTHON], squiz [SQUIZZ].
constructor
- _default_method = 'biopython'¶
default value
Converts PHYLIP file to NEXUS format.
- class PHYLIP2NEXUS(infile, outfile=None, *args, **kwargs)[source]¶
Converts a sequence alignment from PHYLIP format to NEXUS format.
Methods available are based on goalign [GOALIGN].
constructor
- _default_method = 'goalign'¶
Default value
Converts PHYLIP file to STOCKHOLM format.
- class PHYLIP2STOCKHOLM(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment from PHYLIP interleaved to STOCKHOLM
Methods available are based on biopython [BIOPYTHON], squiz [SQUIZZ].
constructor
- _default_method = 'biopython'¶
Default value
Converts PHYLIP file to XMFA format.
- class PHYLIP2XMFA(infile, outfile=None, *args, **kwargs)[source]¶
Converts a sequence alignment from PHYLIP format to XMFA
Methods available are based on biopython [BIOPYTHON].
constructor
- _default_method = 'biopython'¶
Default value
Converts PHYLOXML file to NEWICK format.
- class PHYLOXML2NEWICK(infile, outfile=None, *args, **kwargs)[source]¶
Converts a tree file from PHYLOXML format to NEWICK format.
Methods available are based on gotree [GOTREE].
constructor
- _default_method = 'gotree'¶
Default value
Converts PHYLOXML file to NEXUS format.
- class PHYLOXML2NEXUS(infile, outfile=None, *args, **kwargs)[source]¶
Converts a tree file from PHYLOXML format to NEXUS format.
Methods available are based on gotree [GOTREE].
constructor
- _default_method = 'gotree'¶
Default value
- class PLINK2BPLINK(infile, outfile=None, *args, **kwargs)[source]¶
Converts a genotype dataset ped+map in PLINK format to bed+bim+fam BPLINK format
Conversion is based on plink executable
constructor
- _default_method = 'plink'¶
Default value
Convert SAM file to BAM format
- class SAM2BAM(infile, outfile, *args, **kargs)[source]¶
-
constructor
- _default_method = 'samtools'¶
Default value
Convert SAM file to CRAM format
- class SAM2CRAM(infile, outfile, reference=None, *args, **kargs)[source]¶
-
The conversion requires the reference corresponding to the input file It can be provided as an argument with the standalone (--reference). Otherwise, users are asked to provide it.
Methods available are based on samtools [SAMTOOLS].
constructor
- _default_method = 'samtools'¶
Default value
- class SAM2PAF(infile, outfile, *args, **kargs)[source]¶
-
The SAM and PAF formats are described in the Formats section.
Description:
The header of the SAM file (lines starting with @) are dropped. However, the length of the target is retrieved from the @SQ line that must be present.
Consider this SAM file with two alignements only. One is aligned on the target (first) while the other is not (indicated by the
*
characters):@SQ SN:ENA|K01711|K01711.1 LN:15894 @PG ID:minimap2 PN:minimap2 VN:2.5-r572 CL:minimap2 -a measles.fa Hm2_GTGAAA_L005_R1_001.fastq.gz HISEQ:426:C5T65ACXX:5:2302:1943:2127 0 ENA|K01711|K01711.1 448 60 101M * 00 CTTACCTTCGCATCAAGAGGTACCAACATGGAGGATGAGGCGGACCAATACTTTTCACATGATGATCCAATTAGTAGTGATCAATCCAGGTTCGGATGGTT BCCFFFFFHHHHHIIJJJJJJIIJJJJJJJJFHIHIJJJIJIIIIGHFFFFFFEEEEEEEDDDDDFDDDDDDDDD>CDDEDEEDDDDDDCCDDDDDDDDCD NM:i:0 ms:i:202 AS:i:202 nn:i:0 tp:A:P cm:i:14 s1:i:94 s2:i:0 HISEQ:426:C5T65ACXX:5:2302:4953:2090 4 * 0 0 * * 0 0 AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAACAACCAAAAAGAGACGAACAA CCCFFDDFAFFBHJHGGGIHIJBGGHIIJJJJJJJHGEIJGIFIIIHCBGHIJIIIIIJJHHHHEF@D@=;=,0)0&5&))+(((+((((&+(((()&&)(
The equivalent PAF file is
HISEQ:426:C5T65ACXX:5:2302:1943:2127 101 0 101 + ENA|K01711|K01711.1 15894 447 548 101 101 60 NM:i:0 ms:i:202 AS:i:202 nn:i:0 tp:A:P cm:i:14 s1:i:94 s2:i:0 cg:Z:101M
In brief, the sequences are dropped. The final file is therefore smaller. Extra fields (starting from NM:i:0) can be dropped or kept using the keep_extra_field argument. Alignement with
*
characters are dropped. The first line (@SQ) is used to retrieve the length of the contigs that is stored in the PAF file (column 6).The 12 compulsary PAF fields are:
Col
Type
Description
1
string
Query sequence name
2
int
Query sequence length
3
int
Query start (0-based)
4
int
Query end (0-based)
5
char
Relative strand: "+" or "-"
6
string
Target sequence name
7
int
Target sequence length
8
int
Target start on original strand (0-based)
9
int
Target end on original strand (0-based)
10
int
Number of residue matches
11
int
Alignment block length
12
int
Mapping quality (0-255; 255 for missing)
For developesr:
Get the measles data from Sequana library (2 paired fastq files):
minimap2 measles.fa R1.fastq > approx-mapping.paf
You can ask minimap2 to generate CIGAR at the cg tag of PAF with:
minimap2 -c measles.fa R1.fastq > alignment.paf
or to output alignments in the SAM format:
minimap2 -a measles.fa R1.fastq > alignment.sam
The SAM lines must contains 11 positional element and the NM:i and nn:i fields (see example above).
constructor
- Parameters:
- Reference:
This function is a direct translation of https://github.com/lh3/miniasm/blob/master/misc/sam2paf.js (Dec. 2017).
- _default_method = 'python'¶
Default value
Convert SCF file to FASTA file
- class SCF2FASTA(infile, outfile)[source]¶
Converts a binary SCF/ABI file to Fasta format.
constructor
- Parameters:
- _default_method = 'python'¶
Default value
Convert SCF file to FASTQ file
- class SCF2FASTQ(infile, outfile)[source]¶
Converts a binary SCF file to FastQ file
constructor
- Parameters:
- _default_method = 'python'¶
Default value
Convert SRA format to FASTA format
- class SRA2FASTQ(infile, outfile, test=False)[source]¶
Download FASTQ from SRA archive
bioconvert sra2fastq ERR043367
This may take some times since the files are downloaded from SRA website.
constructor
https://edwards.flinders.edu.au/fastq-dump/
library used: sra-toolkit
- _default_method = 'fastq_dump'¶
Default value
Converts STOCKHOLM file to CLUSTAL file.
- class STOCKHOLM2CLUSTAL(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment from STOCKHOLM format to CLUSTAL format
Methods available are based on squizz [SQUIZZ] and biopython [BIOPYTHON].
constructor
- _default_method = 'biopython'¶
Default value
Converts STOCKHOLM to PHYLIP format.
- class STOCKHOLM2PHYLIP(infile, outfile=None, *args, **kwargs)[source]¶
Converts a sequence alignment from STOCKHOLM format to PHYLIP interleaved format
Methods available are based on squizz [SQUIZZ], and biopython [BIOPYTHON].
constructor
- _default_method = 'biopython'¶
Default value
Convert TSV format to CSV format
- class TSV2CSV(infile, outfile)[source]¶
Convert TSV file into CSV file
Available methods: Python, Pandas
Methods available are based on python or Pandas [PANDAS].
See also
CSV2TSV
Constructor
- _default_method = 'python'¶
Default value
- _method_pandas(in_sep='\t', out_sep=',', line_terminator='\n', *args, **kwargs)[source]¶
- _method_python(in_sep='\t', out_sep=',', line_terminator='\n', *args, **kwargs)[source]¶
Conversion from TWOBIT to FASTA format
- class TWOBIT2FASTA(infile, outfile=None, alphabet=None, *args, **kwargs)[source]¶
Converts a sequence alignment in TWOBIT format to FASTA format
Conversion is based on UCSC [UCSC] and py2bit.
constructor
- _default_method = 'py2bit'¶
Default value
- class VCF2BCF(infile, outfile=None, *args, **kwargs)[source]¶
Convert VCF file to BCF format
Method based on bcftools [BCFTOOLS].
- Parameters:
- _default_method = 'bcftools'¶
Default value
- class VCF2BED(infile, outfile)[source]¶
Convert VCF file to BED3 file by extracting positions.
The awk method implemented here below reports an interval of 1 for SNP, the length of the insertion or the length of the deleted part in case of deletion.
constructor
- Parameters:
- _default_method = 'awk'¶
Default value
- _method_awk(*args, **kwargs)[source]¶
do the conversion VCF -> BED using awk
- Returns:
the standard output
- Return type:
io.StringIO
object.
Convert VCF format to WIGGLE format
- class VCF2WIGGLE(infile, outfile)[source]¶
Convert sorted VCF file into WIGGLE file
- Parameters:
- _default_method = 'wiggletools'¶
Default value
Convert XLS format to CSV format
- class XLS2CSV(infile, outfile)[source]¶
Convert XLS file into CSV file
Extra arguments when using Bioconvert executable.
name
Description
--sheet-name
The name or id of the sheet to convert
--out-sep
The separator used in the output file
--line-terminator
The line terminator used in the output file
Methods available are based on pandas [PANDAS] and pyexcel [PYEXCEL].
Constructor
- _method_pandas(out_sep=',', line_terminator='\n', sheet_name=0, *args, **kwargs)[source]¶
Convert XLS format to CSV format
- class XLSX2CSV(infile, outfile)[source]¶
Convert XLSX file into CSV file
Extra arguments when using Bioconvert executable.
name
Description
--sheet-name
The name or id of the sheet to convert
--out-sep
The separator used in the output file
--line-terminator
The line terminator used in the output file
Methods available are based on pandas [PANDAS] and pyexcel [PYEXCEL].
Constructor
- _default_method = 'pandas'¶
Default value
- _method_pandas(out_sep=',', line_terminator='\n', sheet_name=0, *args, **kwargs)[source]¶
- class XMFA2PHYLIP(infile, outfile=None, *args, **kwargs)[source]¶
Converts a sequence alignment from XMFA to PHYLIP format.
Method available based on biopython [BIOPYTHON].
constructor
- _default_method = 'biopython'¶
Default value