gget: A Powerful and Efficient Query Tool for Genomic Reference Databases

Source: cnblogs (博客园)

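The open-source Python and command-line program gget gives efficient, easy programmatic access to information stored in a range of large public genomic reference databases. Used alongside existing tools that process user-generated sequencing data, gget replaces the inefficient and potentially error-prone manual web queries that genomic data analysis otherwise requires. Although the gget modules were inspired by tedious tasks in single-cell RNA-seq data analysis, we expect them to be useful for a wide range of bioinformatics tasks.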

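gget can be installed from the command line by running "pip install gget". The figure below illustrates one use case, with its corresponding output, for each gget tool. Every gget tool comes with a detailed manual, available as function documentation in a Python environment or printed to standard output on the command line with the help flag [-h].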

gget links

gget: https://pachterlab.github.io/gget/
gget examples repository: https://github.com/pachterlab/gget_examples



Installing gget

pip install --upgrade gget

or

conda install -c bioconda gget

Using gget in Jupyter Lab / Google Colab

import gget
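In a fresh Colab or Jupyter kernel it can help to confirm that the install step actually took effect before running the quick-start calls. The sketch below is not part of gget: `installed_gget_version` is a hypothetical helper that relies only on the standard-library `importlib.metadata` module and reads package metadata without importing gget itself.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_gget_version() -> Optional[str]:
    """Return the installed gget version string, or None if gget is not installed.

    Hypothetical helper: it only queries installed package metadata and does
    not import gget.
    """
    try:
        return version("gget")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_gget_version()
    if found is None:
        print("gget is not installed; try: pip install --upgrade gget")
    else:
        print(f"gget {found} is installed")
```

Once gget imports successfully, calling help(gget.ref) (or any other module) shows the built-in manual mentioned above.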

gget modules

    • gget ref: Fetch File Transfer Protocols (FTPs) and metadata for reference genomes and annotations from Ensembl by species.

    • gget search: Fetch genes and transcripts from Ensembl using free-form search terms.

    • gget info: Fetch extensive gene and transcript metadata from Ensembl, UniProt, and NCBI using Ensembl IDs.

    • gget seq: Fetch nucleotide or amino acid sequences of genes or transcripts from Ensembl or UniProt, respectively.

    • gget blast: BLAST a nucleotide or amino acid sequence against any BLAST database.

    • gget blat: Find the genomic location of a nucleotide or amino acid sequence using BLAT.

    • gget muscle: Align multiple nucleotide or amino acid sequences to each other using Muscle5.

    • gget enrichr: Perform an enrichment analysis on a list of genes using Enrichr.

    • gget archs4: Find the genes most correlated with a gene of interest, or find the gene's tissue expression atlas, using ARCHS4.

    • gget pdb: Get the structure and metadata of a protein from the RCSB Protein Data Bank.

    • gget alphafold: Predict the 3D structure of a protein from its amino acid sequence using a simplified version of DeepMind’s AlphaFold2.
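Every module in the list above is invoked on the command line as "gget <module> ...". When scripting those calls from Python, one option is to build the argument list explicitly and hand it to `subprocess`. The helper below, `build_gget_command`, is hypothetical (not part of gget); it only checks the module name against the list above plus `setup`, which the quick start uses to prepare alphafold, and passes every other argument through unchanged.

```python
from typing import List

# Module names from the gget module overview, plus "setup" (used by the quick
# start as "gget setup alphafold"). This set is an assumption, not read from gget.
GGET_COMMANDS = frozenset({
    "ref", "search", "info", "seq", "blast", "blat",
    "muscle", "enrichr", "archs4", "pdb", "alphafold", "setup",
})


def build_gget_command(module: str, *args: str) -> List[str]:
    """Return an argv list such as ["gget", "ref", "homo_sapiens"].

    Hypothetical helper: flags and positional arguments are not validated.
    """
    if module not in GGET_COMMANDS:
        raise ValueError(f"unknown gget module: {module!r}")
    return ["gget", module, *args]
```

For example, subprocess.run(build_gget_command("ref", "-h"), check=True) prints the manual for gget ref, provided gget is on PATH. Passing a list rather than a single shell string keeps multi-word search terms such as "angiotensin converting enzyme 2" intact without extra quoting.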

gget quick start

Command line

# Fetch all Homo sapiens reference and annotation FTPs from the latest Ensembl release
$ gget ref homo_sapiens

# Get Ensembl IDs of human genes with "ace2" or "angiotensin converting enzyme 2" in their name/description
$ gget search -s homo_sapiens "ace2" "angiotensin converting enzyme 2"

# Look up gene ENSG00000130234 (ACE2) and its transcript ENST00000252519
$ gget info ENSG00000130234 ENST00000252519

# Fetch the amino acid sequence of the canonical transcript of gene ENSG00000130234
$ gget seq --translate ENSG00000130234

# Quickly find the genomic location of (the start of) that amino acid sequence
$ gget blat MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS

# BLAST (the start of) that amino acid sequence
$ gget blast MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS

# Align nucleotide or amino acid sequences stored in a FASTA file
$ gget muscle path/to/file.fa

# Use Enrichr for an ontology analysis of a list of genes
$ gget enrichr -db ontology ACE2 AGT AGTR1 ACE AGTRAP AGTR2 ACE3P

# Get the human tissue expression of gene ACE2
$ gget archs4 -w tissue ACE2

# Get the protein structure (in PDB format) of ACE2 as stored in the Protein Data Bank (PDB ID returned by gget info)
$ gget pdb 1R42 -o 1R42.pdb

# Predict the protein structure of GFP from its amino acid sequence
$ gget setup alphafold  # setup only needs to be run once
$ gget alphafold MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
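gget muscle takes a FASTA file as input. If the sequences to align are held in Python, a small writer like the one below can produce that file first. `format_fasta` and `write_fasta` are hypothetical helpers, not gget functions; wrapping at 60 characters per line is a common FASTA convention assumed here, not a documented gget requirement.

```python
from pathlib import Path
from typing import Dict, Union


def format_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Render {header: sequence} pairs as FASTA text, wrapping sequences at `width`."""
    if width <= 0:
        raise ValueError("width must be positive")
    lines = []
    for header, seq in records.items():
        lines.append(f">{header}")
        # Split each sequence into chunks of at most `width` characters.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def write_fasta(records: Dict[str, str], path: Union[str, Path], width: int = 60) -> Path:
    """Write the records to `path` as FASTA and return the path as a Path object."""
    out = Path(path)
    out.write_text(format_fasta(records, width))
    return out
```

For instance, write_fasta({"toy_1": "MSSSSWLLLSLVAVTAAQST", "toy_2": "MSSSSWLLLSLVAVTAAQ"}, "toy.fa") writes two toy fragments cut from the start of the quick-start peptide, after which "gget muscle toy.fa" aligns them.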

Python (Jupyter Lab / Google Colab):

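import gget

# Reference genome and annotation FTPs for Homo sapiens (latest Ensembl release)
gget.ref("homo_sapiens")
# Human genes matching "ace2" or "angiotensin converting enzyme 2"
gget.search(["ace2", "angiotensin converting enzyme 2"], "homo_sapiens")
# Metadata for gene ENSG00000130234 (ACE2) and transcript ENST00000252519
gget.info(["ENSG00000130234", "ENST00000252519"])
# Amino acid sequence of the canonical transcript of ENSG00000130234
gget.seq("ENSG00000130234", translate=True)
# Genomic location of (the start of) that amino acid sequence
gget.blat("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
# BLAST (the start of) that amino acid sequence
gget.blast("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
# Align the sequences stored in a FASTA file
gget.muscle("path/to/file.fa")
# Ontology enrichment analysis of a gene list, with a plot of the results
gget.enrichr(["ACE2", "AGT", "AGTR1", "ACE", "AGTRAP", "AGTR2", "ACE3P"], database="ontology", plot=True)
# Human tissue expression of ACE2
gget.archs4("ACE2", which="tissue")
# Download the PDB structure 1R42 (ACE2)
gget.pdb("1R42", save=True)
# Predict the structure of GFP from its amino acid sequence
gget.setup("alphafold")  # setup only needs to be run once
gget.alphafold("MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK")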
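gget info and gget seq take Ensembl IDs, while gene symbols such as ACE2 first have to be resolved with gget search. A quick local check can sort a mixed list before calling gget.info. The sketch below is an assumption, not gget's own validation: `classify_ensembl_ids` is hypothetical, and its regular expression is a simplified reading of the Ensembl stable-ID format ("ENS", an optional three-letter species code that is empty for human, a feature letter, eleven digits, and an optional version suffix).

```python
import re
from typing import Dict, Iterable, List

# Simplified Ensembl stable-ID pattern (assumption): e.g. ENSG00000130234 (human
# gene), ENST00000252519 (human transcript), ENSMUSG... (mouse gene), with an
# optional ".N" version suffix. Only G/T/P/E feature letters are handled.
ENSEMBL_ID = re.compile(r"^ENS(?:[A-Z]{3})?([GTPE])\d{11}(?:\.\d+)?$")
FEATURES = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}


def classify_ensembl_ids(ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group identifiers by feature type; anything not matching goes to "invalid"."""
    groups: Dict[str, List[str]] = {name: [] for name in FEATURES.values()}
    groups["invalid"] = []
    for raw in ids:
        ident = raw.strip()
        match = ENSEMBL_ID.match(ident)
        groups[FEATURES[match.group(1)] if match else "invalid"].append(ident)
    return groups
```

With ids = classify_ensembl_ids(["ENSG00000130234", "ENST00000252519", "ACE2"]), the list ids["gene"] + ids["transcript"] could be passed to gget.info, and the symbols left in ids["invalid"] could be looked up with gget.search first.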

Call gget from R using reticulate:

system("pip install gget")
install.packages("reticulate")
library(reticulate)
gget <- import("gget")

gget$ref("homo_sapiens")
gget$search(list("ace2", "angiotensin converting enzyme 2"), "homo_sapiens")
gget$info(list("ENSG00000130234", "ENST00000252519"))
gget$seq("ENSG00000130234", translate=TRUE)
gget$blat("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget$blast("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget$muscle("path/to/file.fa", out="path/to/out.afa")
gget$enrichr(list("ACE2", "AGT", "AGTR1", "ACE", "AGTRAP", "AGTR2", "ACE3P"), database="ontology")
gget$archs4("ACE2", which="tissue")
gget$pdb("1R42", save=TRUE)
