Title: | R Utilities for GFF Files |
---|---|
Description: | R utilities for GFF files, formatted as either General Feature Format (GFF3) or Gene Transfer Format (GTF). This package includes functions for producing summary statistics, checking for consistency and sorting errors, converting from GTF to GFF3 format, sorting files, visualizing and plotting the feature hierarchy, and exporting user-defined feature subsets to SAF format. This tool was developed by the BioinfoGP core facility at CNB-CSIC. |
Authors: | Juan Antonio Garcia-Martin [cre, aut] , Juan Carlos Oliveros [aut, ctb] , Rafael Torres-Perez [aut, ctb] |
Maintainer: | Juan Antonio Garcia-Martin <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.6 |
Built: | 2024-12-07 06:57:50 UTC |
Source: | CRAN |
This function tests the consistency and order of a GFF file.
check_gff(inFile, fileType = c("AUTO", "GFF3", "GTF"))
inFile |
Path to the input GFF file |
fileType |
Version of the input file (GTF/GFF3). Default AUTO: determined from the file name. |
The following list indicates the code and description of the issues detected in GFF3 files
Input file contains lines with more than 9 fields
Input file contains lines with less than 9 fields
Input file contains too many (more than 100) different feature types
ID attribute not found in any feature
There are duplicated IDs
The same ID has been found in more than one chromosome
Parent attribute not found in any feature
There are missing Parent IDs
There are features whose Parent is located in a different chromosome
Feature ids referenced in Parent attribute before being defined as ID
Features are not grouped by chromosome
Features are not sorted by start coordinate
File cannot be recognized as valid GFF3. Parsing warnings.
File cannot be recognized as valid GFF3. Parsing errors.
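A few of the GFF3 checks listed above can be sketched in base R. This is only an illustration under stated assumptions, not the implementation used by check_gff: the helper names get_attr and check_gff3_lines are hypothetical, the issue descriptions are paraphrased, and the input is a toy set of lines held in memory.

```r
# Hypothetical helper (not part of Rgff): extract one attribute from column 9.
get_attr <- function(attrs, key) {
  padded <- paste0(";", attrs)
  hits <- regmatches(padded, regexec(paste0(";", key, "=([^;]*)"), padded))
  vapply(hits, function(x) if (length(x) == 2) x[2] else NA_character_,
         character(1))
}

# Hypothetical helper (not part of Rgff): run a few consistency checks.
check_gff3_lines <- function(lines) {
  # Drop blank lines and comment/directive lines
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  n_fields <- lengths(fields)
  issues <- character(0)

  if (any(n_fields > 9)) issues <- c(issues, "lines with more than 9 fields")
  if (any(n_fields < 9)) issues <- c(issues, "lines with fewer than 9 fields")

  ok <- fields[n_fields == 9]
  if (length(ok) > 0) {
    chr   <- vapply(ok, `[`, character(1), 1)
    start <- as.integer(vapply(ok, `[`, character(1), 4))
    ids   <- get_attr(vapply(ok, `[`, character(1), 9), "ID")
    ids   <- ids[!is.na(ids)]

    if (anyDuplicated(ids) > 0) {
      issues <- c(issues, "duplicated IDs")
    }
    # A chromosome that starts more than one run of lines is not grouped
    if (anyDuplicated(rle(chr)$values) > 0) {
      issues <- c(issues, "features not grouped by chromosome")
    }
    if (any(tapply(start, chr, is.unsorted), na.rm = TRUE)) {
      issues <- c(issues, "features not sorted by start coordinate")
    }
  }
  data.frame(description = issues, stringsAsFactors = FALSE)
}

# Toy input (assumed data): a duplicated ID and decreasing start coordinates
unsorted <- c(
  "##gff-version 3",
  "chr1\tsrc\tgene\t300\t500\t.\t+\t.\tID=g1",
  "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1"
)
issues <- check_gff3_lines(unsorted)
print(issues)
```

Like check_gff, the sketch returns an empty data frame when nothing is flagged, so callers can test nrow(issues) == 0.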
The following list indicates the code and description of the issues detected in GTF files
Input file contains lines with more than 9 fields
Input file contains lines with less than 9 fields
Input file contains too many (more than 100) different feature types
gene_id attribute not found in any feature
There are features without gene_id attribute
Gene features are not included in this GTF file
There are duplicated gene_ids
The same gene_id has been found in more than one chromosome
transcript_id attribute not found in any feature
There are features without transcript_id attribute
Transcript features are not included in this GTF file
There are duplicated transcript_ids
The same transcript_id has been found in more than one chromosome
Same id has been defined as gene_id and transcript_id
Features are not grouped by chromosome
Features are not sorted by start coordinate
File cannot be recognized as valid GTF. Parsing warnings.
File cannot be recognized as valid GTF. Parsing errors.
A data frame of detected issues, each with a short code name, a description and an estimated severity. If no issues are detected, the function returns an empty data frame.
test_gff3 <- system.file("extdata", "eden.gff3", package="Rgff")
check_gff(test_gff3)
Based on the feature type hierarchy of a GFF file, this function creates and returns a feature tree or a feature dependency table.
get_features( inFile, includeCounts = FALSE, outFormat = c("tree", "data.frame", "JSON"), fileType = c("AUTO", "GFF3", "GTF") )
inFile |
Path to the input GTF/GFF3 features file |
includeCounts |
Include number of occurrences of each feature and subfeature |
outFormat |
Output format of the function. Available formats are: tree (DEFAULT), data.frame and JSON. |
fileType |
Version of the input file (GTF/GFF3). Default AUTO: determined from the file name. |
Depending on the selected outFormat, returns a feature tree (tree), a feature dependency table as a data frame (data.frame) or a feature dependency table as a JSON object (JSON).
test_gff3 <- system.file("extdata", "AthSmall.gff3", package="Rgff")
get_features(test_gff3)
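To make the idea of a feature dependency table concrete, the sketch below derives parent/child feature type pairs from ID and Parent values in base R. It is an illustration only: the toy records and the column names (type, id, parent, Freq) are assumptions, and the table returned by get_features may be structured differently.

```r
# Toy GFF3 records (assumed data, not taken from the package's example files)
gff <- data.frame(
  type   = c("gene", "mRNA", "exon", "exon", "CDS"),
  id     = c("g1", "t1", "e1", "e2", "c1"),
  parent = c(NA, "g1", "t1", "t1", "t1"),
  stringsAsFactors = FALSE
)

# Resolve each feature's parent type through its Parent ID
parent_type <- gff$type[match(gff$parent, gff$id)]
has_parent  <- !is.na(parent_type)

# Count every (parent type, child type) pair that actually occurs
dep <- as.data.frame(
  table(parent = parent_type[has_parent], child = gff$type[has_parent]),
  stringsAsFactors = FALSE
)
dep <- dep[dep$Freq > 0, ]
print(dep)
```

The Freq column plays a role similar to what includeCounts = TRUE is described as adding: the number of occurrences of each subfeature under its parent type.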
This function summarizes the number of features of each type in a GFF file and returns the statistics
gff_stats(inFile)
inFile |
Path to the input GFF file |
A tibble with the summary data
test_gff3 <- system.file("extdata", "AthSmall.gff3", package="Rgff")
gff_stats(test_gff3)
This function summarizes the number of features of each type in each chromosome of a GFF file and returns the statistics
gff_stats_by_chr(inFile)
inFile |
Path to the input GFF file |
A tibble with the summary data
test_gff3 <- system.file("extdata", "AthSmall.gff3", package="Rgff")
gff_stats_by_chr(test_gff3)
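The two summaries, per feature type and per feature type within each chromosome, amount to one-way and two-way frequency tables. The base R sketch below shows that idea on toy data; the column names are assumptions, and the tibbles returned by gff_stats and gff_stats_by_chr may contain different or additional columns.

```r
# Toy feature table (assumed data)
gff <- data.frame(
  chr  = c("Chr1", "Chr1", "Chr1", "Chr2", "Chr2"),
  type = c("gene", "exon", "exon", "gene", "exon"),
  stringsAsFactors = FALSE
)

# Number of features of each type across the whole file
by_type <- as.data.frame(table(type = gff$type), stringsAsFactors = FALSE)

# Number of features of each type within each chromosome
by_chr <- as.data.frame(table(chr = gff$chr, type = gff$type),
                        stringsAsFactors = FALSE)

print(by_type)
print(by_chr)
```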
This function converts a GTF file into a GFF3 file, maintaining the feature hierarchy defined by the gene_id and transcript_id attributes. The remaining attributes of each feature are kept with the same name and value.
gtf_to_gff3(gtfFile, outFile, forceOverwrite = FALSE)
gtfFile |
Path to the input GTF file |
outFile |
Path to the output GFF3 file. If not provided, the output will be gtfFile.gff3 |
forceOverwrite |
If output file exists, overwrite the existing file. (default FALSE) |
Path to the generated GFF3 file
## Not run:
test_gtf <- system.file("extdata", "AthSmall.gtf", package="Rgff")
gtf_to_gff3(test_gtf)
## End(Not run)
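As a rough illustration of how a GTF attribute column can be rewritten in GFF3 syntax while keeping the gene/transcript hierarchy, consider the base R sketch below. The function name gtf_attrs_to_gff3 is hypothetical, and the mapping rule (genes get ID from gene_id, transcripts get ID from transcript_id and Parent from gene_id, everything else gets Parent from transcript_id) is an assumption about a reasonable conversion, not a description of what gtf_to_gff3 does internally. Values containing semicolons inside quotes are not handled.

```r
# Hypothetical helper (not part of Rgff): convert one GTF attribute string.
gtf_attrs_to_gff3 <- function(type, attr_string) {
  # Split the 'key "value";' pairs
  parts <- trimws(strsplit(attr_string, ";", fixed = TRUE)[[1]])
  parts <- parts[nzchar(parts)]
  keys   <- sub(" .*$", "", parts)
  values <- gsub('"', "", sub("^[^ ]+ +", "", parts), fixed = TRUE)
  names(values) <- keys

  # Assumed mapping of the GTF hierarchy onto GFF3 ID/Parent attributes
  hierarchy <- if (type == "gene") {
    c(ID = values[["gene_id"]])
  } else if (type == "transcript") {
    c(ID = values[["transcript_id"]], Parent = values[["gene_id"]])
  } else {
    c(Parent = values[["transcript_id"]])
  }

  # Keep the original attributes with the same names and values
  all_attrs <- c(hierarchy, values)
  paste(paste0(names(all_attrs), "=", all_attrs), collapse = ";")
}

exon_attrs <- gtf_attrs_to_gff3(
  "exon", 'gene_id "g1"; transcript_id "t1"; exon_number "2";'
)
print(exon_attrs)
```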
This function plots the feature tree from a GFF file or, if an output file name is provided, exports an image of it in the desired format ("png", "pdf" or "svg"). The packages "DiagrammeR", "DiagrammeRsvg" and "rsvg" must be installed to use this function.
plot_features( inFile, outFile, includeCounts = FALSE, fileType = c("AUTO", "GFF3", "GTF"), exportFormat = c("png", "pdf", "svg") )
inFile |
Path to the input GFF file |
outFile |
Path to the output features image file, if not provided the tree will be plotted |
includeCounts |
Include number of occurrences of each subfeature |
fileType |
Version of the input file (GTF/GFF3). If not provided it will be determined from the file name. |
exportFormat |
Output image format when it is not possible to deduce it from the extension of outFile ("png", "pdf" or "svg"). Default, "png" |
Path of the output features image file
test_gff3 <- system.file("extdata", "AthSmall.gff3", package="Rgff")
plot_features(test_gff3)
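Because plot_features depends on three optional packages, it can help to check for them before calling it. The sketch below uses only base R; the variable names are illustrative.

```r
# Packages that the plot_features documentation lists as required
required <- c("DiagrammeR", "DiagrammeRsvg", "rsvg")

# requireNamespace() returns TRUE when a package can be loaded
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Install before calling plot_features(): ",
          paste(missing_pkgs, collapse = ", "))
}
```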
This function creates a SAF file from a GTF/GFF3 features file for the given features and blocks.
saf_from_gff( inFile, outFile, fileType = c("AUTO", "GFF3", "GTF"), forceOverwrite = FALSE, features = c("gene > exon"), sep = ">" )
inFile |
Path to the input GFF file |
outFile |
Path to the output SAF file, if not provided the output path will be the input path with the suffix ".feature1-block1.feature2-block2(...).saf" |
fileType |
Version of the input file (GTF/GFF3). Default AUTO: determined from the file name. |
forceOverwrite |
If output file exists, overwrite the existing file. (default FALSE) |
features |
Vector of pairs of features/blocks, separated by '>' (see sep argument). In the case of features without defined blocks, only the feature is needed (see example) |
sep |
Separator between each "feature" and "block" provided in the features argument (default '>') |
Path to the generated SAF file
test_gff3 <- system.file("extdata", "AthSmall.gff3", package="Rgff")
## Default usage, extract gene features by exon blocks
saf_from_gff(test_gff3)
## Define only feature without block to count reads within the whole genomic locus
saf_from_gff(test_gff3, features=c("gene"))
## Define multiple features for counting reads overlapping only in exonic regions
saf_from_gff(test_gff3, features=c("gene > exon", "ncRNA_gene > exon"))
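The difference between a feature with blocks ("gene > exon") and a feature alone ("gene") can be sketched in base R. SAF, as read by featureCounts, is a tab-delimited table with the columns GeneID, Chr, Start, End and Strand. Everything else here is an assumption for illustration: the to_saf helper is hypothetical, and the toy table already carries the owning gene in a gene column, whereas a real GFF3 file requires following Parent links to find it.

```r
# Toy feature table (assumed data); 'gene' holds the owning feature's ID
gff <- data.frame(
  chr    = "Chr1",
  type   = c("gene", "exon", "exon"),
  start  = c(100, 100, 400),
  end    = c(900, 250, 900),
  strand = "+",
  gene   = c("g1", "g1", "g1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of Rgff): one SAF row per block,
# labelled with the ID of the feature the block belongs to
to_saf <- function(gff, block) {
  rows <- gff[gff$type == block, ]
  data.frame(GeneID = rows$gene, Chr = rows$chr, Start = rows$start,
             End = rows$end, Strand = rows$strand, stringsAsFactors = FALSE)
}

saf_exon <- to_saf(gff, "exon")  # like "gene > exon": exonic regions only
saf_gene <- to_saf(gff, "gene")  # like "gene": the whole genomic locus

# Write the table as a tab-delimited file with a header row
saf_path <- tempfile(fileext = ".saf")
write.table(saf_exon, file = saf_path, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

With exon blocks the gene g1 contributes two intervals (100-250 and 400-900); without blocks it contributes the single interval 100-900, so reads falling in the intron would also be counted.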
This function produces a sorted GFF file from an unsorted GFF file. The default order is by chromosome, start, end (descending) and feature type (based on its precedence in the feature tree).
sort_gff( inFile, outFile, fileType = c("AUTO", "GFF3", "GTF"), forceOverwrite = FALSE )
inFile |
Path to the input GFF file |
outFile |
Path to the output sorted file, if not provided the output will be the input path (without extension) with the suffix sorted.gtf/gff3 |
fileType |
Version of the input file (GTF/GFF3). Default AUTO: determined from the file name. |
forceOverwrite |
If output file exists, overwrite the existing file. (default FALSE) |
Path to the sorted feature file
test_gff3 <- system.file("extdata", "eden.gff3", package="Rgff")
sort_gff(test_gff3)
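The sort order described for sort_gff can be illustrated with base R's order(). This is a sketch on toy data, not the package's code: the feature ranking vector is a made-up stand-in for the precedence taken from the feature tree, and chromosomes are compared as plain strings here, which may differ from how sort_gff orders them.

```r
# Toy feature table (assumed data)
gff <- data.frame(
  chr   = c("Chr2", "Chr1", "Chr1", "Chr1"),
  type  = c("gene", "exon", "gene", "mRNA"),
  start = c(50, 100, 100, 100),
  end   = c(80, 200, 900, 900),
  stringsAsFactors = FALSE
)

# Assumed precedence: parents before children
type_rank <- match(gff$type, c("gene", "mRNA", "exon"))

# Chromosome, then start ascending, then end descending, then feature rank
sorted <- gff[order(gff$chr, gff$start, -gff$end, type_rank), ]
print(sorted)
```

Sorting by descending end after start places a gene and its transcript ahead of the shorter exons that begin at the same coordinate, and the feature rank breaks the remaining tie between gene and mRNA.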