LLMs in Focus: The Easy Guide & Your AI Glossary — 2024
“The future is not something that happens to us, but something we create.” — Vivek
Hey everyone! 👋
Imagine having a chat with a computer that really gets you. That’s what LLMs, or Large Language Models, are all about. It’s 2024, and these smart programs are changing the game. They can write stories, answer questions, and even make jokes!
I’ve been busy gathering all the latest LLMs out there and I’m putting together a guide for each one. It’s going to be super straightforward and clear, so everyone can understand.
I’ve had the privilege of being guided and mentored by Dr. Alan D. Thompson. Dr. Thompson is a renowned expert in artificial intelligence, particularly known for his work in augmenting human intelligence and advancing the evolution of ‘integrated AI’. His extensive background includes serving as the former chairman for Mensa International’s gifted families and providing AI advisory to Fortune 500 companies, major governments, and intergovernmental entities.
Dr. Thompson’s contributions to the field of AI are vast, including the creation of a complete database of large language models and authoring the world’s most comprehensive analysis of datasets used for OpenAI GPT. His technical papers and resources on LLMs are used by all major AI labs, and his insights are featured across international media.
It’s his rigorous approach to AI and his commitment to making complex concepts accessible that made Dr. Thompson an invaluable mentor for this project. His guidance has ensured that the information presented here is not only accurate but also understandable for everyone, regardless of their background in AI.
For those interested in delving deeper into Dr. Thompson’s work and his contributions to the AI community, I encourage you to visit Life Architect. His expertise has been instrumental in shaping this guide, and for that, I extend my deepest gratitude.
So, if you’re curious about these AI wonders, hang tight. I’m working on making all this information easy to digest and fun to explore.
Ready to dive into the world of LLMs? Let’s go!
Let’s Start:
Grok-1.5:
Origin: Developed by xAI.
Description: xAI’s follow-up to Grok-1, with improved reasoning and coding performance and a 128K-token context window.
Notable Features: Long-context capability; unlike the open-sourced Grok-1, its weights have not been released.
Llama 3:
Origin: Developed by Meta.
Description: An open-source LLM designed to empower developers and advance safety.
Notable Features: Powering innovation through access, suitable for research and commercial use.
Apple GPT (Ajax):
Origin: Reported to be Apple’s internal LLM project, built on its “Ajax” framework.
Description: A large language model developed in-house, reportedly intended to compete with OpenAI’s GPT models.
Notable Features: Expected to power generative AI features across Apple’s products and services (per press reports).
G3PO:
Origin: Developed by OpenAI.
Description: An open-source LLM, rumored to be in development to compete with Meta’s LLaMA and Google’s models.
Notable Features: Not yet officially released.
Arrakis (GPT-4.5?):
Origin: Reportedly an internal OpenAI project.
Description: Potentially an iteration of the GPT series.
Notable Features: Unknown.
Source: Not available.
Gobi (GPT-5?):
Origin: Reportedly OpenAI.
Description: A rumored multimodal model said to be designed natively for both text and images, speculated to feed into a future GPT release.
Notable Features: Unconfirmed; details come from press reports rather than any official announcement.
GPT-5:
Origin: OpenAI.
Description: The next iteration in the GPT series.
Notable Features: Unknown.
Source: Not available.
Olympus:
Origin: Amazon.
Description: Being developed for corporate customers, expected to power features across Amazon services.
Notable Features: Larger than GPT-4.
EvoLLM-JP:
Origin: Developed by Sakana AI.
Description: An experimental general-purpose Japanese LLM created using the Evolutionary Model Merge method.
Notable Features: Merged from various source models, including Shisa Gamma 7B, WizardMath 7B V1.1, and Abel 7B 002.
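To make the “model merge” half of that recipe concrete, here is a minimal sketch of per-layer weight interpolation with a toy search loop. All names, shapes, and the fitness function are illustrative placeholders; Sakana AI’s actual Evolutionary Model Merge also evolves data-flow (layer routing) and scores candidates on real benchmarks.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for two source models' parameters, keyed by layer name.
model_a = {"layer0": rng.standard_normal((4, 4)), "layer1": rng.standard_normal((4,))}
model_b = {"layer0": rng.standard_normal((4, 4)), "layer1": rng.standard_normal((4,))}

def merge(models, coeffs):
    """Per-layer weighted average of parameter tensors (one candidate in the search)."""
    merged = {}
    for name in models[0]:
        merged[name] = sum(c * m[name] for c, m in zip(coeffs, models))
    return merged

def fitness(params):
    # Placeholder: a real search would score the merged model on evaluation tasks.
    return -abs(params["layer0"].mean())

# A tiny grid search standing in for the evolutionary loop.
best = max((merge([model_a, model_b], [w, 1 - w]) for w in np.linspace(0, 1, 11)), key=fitness)
print(fitness(best))
```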
Parakeet:
Origin: Not specified.
Description: Details about Parakeet are limited, but it is likely a language model with specific characteristics.
Notable Features: Unknown.
Source: Not available.
RWKV-v5 EagleX:
Origin: Developed by the RWKV community (Recursal AI).
Description: A continuation of the RWKV-v5 “Eagle” line: a 7.52B linear-attention (RNN-style) model trained on additional multilingual data.
Notable Features: Transformer-level quality with constant-memory, attention-free inference and strong multilingual benchmark results.
MM1:
Origin: Developed by Apple.
Description: A multimodal LLM trained on both text and images, aiming to achieve human-like reasoning capabilities.
Notable Features: Combines large-scale multimodal pre-training with strong few-shot and multi-image reasoning.
Source: MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training.
RFM-1:
Origin: Developed by Covariant.
Description: A Robotics Foundation Model trained on multimodal data, bridging the gap between language understanding and physical world interaction.
Notable Features: Accurate simulation and operation in demanding real-world conditions.
Source: Introducing RFM-1: Giving robots human-like reasoning capabilities.
DeepSeek-VL:
Origin: Developed by DeepSeek AI.
Description: An open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. It possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.
Feature: Multimodal understanding across various data types.
Source: DeepSeek-VL on GitHub
AnyGPT:
Origin: Developed by a team of researchers.
Description: A unified multimodal LLM with discrete sequence modeling. It can handle various modalities, including speech, text, images, and music.
Feature: Discrete representations for unified processing of different modalities.
Source: AnyGPT on arXiv
Stable Beluga 2.5:
Origin: Developed by Stability AI.
Description: A fine-tuned version of LLaMA 2 70B, outperforming ChatGPT 3.5 in select benchmarks.
Feature: High performance in specific evaluation tasks.
Source: Stability AI News
Inflection-2.5:
Origin: Developed by Inflection AI.
Description: Inflection AI’s upgraded in-house model powering the Pi assistant, reported to approach GPT-4-level performance while using a fraction of the training compute.
Feature: High efficiency, improved performance in coding and mathematics.
Source: Inflection AI
Apollo:
Origin: Developed by Anthropic.
Description: A lightweight multilingual medical LLM designed for democratizing medical AI to 6 billion people.
Feature: High comprehension and fluency on complex medical tasks.
Source: Apollo on arXiv
Claude 3 Opus:
Origin: Developed by Anthropic.
Description: An advanced model in the Claude 3 family, excelling in analysis, forecasting, content creation, and multilingual conversations.
Feature: High intelligence levels.
Source: Anthropic News
Samba-1:
Origin: Developed by SambaNova Systems.
Description: A model enabling trillion+ parameter models while delivering access control, security, and data privacy.
Feature: High accuracy, lower cost of ownership, and fine-tuning capabilities.
Source: SambaNova Systems
StarCoder 2:
Origin: Developed by BigCode.
Description: A transparently trained open code LLM, trained on The Stack v2 dataset.
Feature: Trained on 4 trillion tokens, supporting code generation.
Source: Geeky Gadgets
Megatron 530B:
Origin: Developed by NVIDIA in collaboration with Microsoft (also known as Megatron-Turing NLG 530B).
Description: The world’s largest customizable language model, optimized for large-scale accelerated computing infrastructure.
Feature: Scalability and performance for sophisticated NLP models.
Source: NVIDIA News
Mistral Small:
Origin: Developed by Mistral AI.
Description: A decoder-based LLM with 7 billion parameters, optimized for low latency and cost efficiency.
Feature: Utilizes sliding window attention, grouped query attention, and byte-fallback BPE tokenizer.
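The sliding window attention mentioned above limits each token to a fixed-size window of previous tokens. A minimal sketch of such a mask, with an illustrative window size rather than Mistral’s actual configuration:

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Causal mask where each token attends only to itself and the previous `window-1` tokens."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

print(sliding_window_mask(6, 3).astype(int))
```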
Mistral Large:
Origin: Also developed by Mistral AI.
Description: A high-performance model for multilingual reasoning, text comprehension, and code generation.
Feature: Incorporates innovations like RAG-enablement and function calling.
Source: Mistral AI unveils Mistral Large.
Hanooman:
Origin: Developed by the BharatGPT group in collaboration with Seetha Mahalaxmi Healthcare (SML).
Description: Responds in 11 Indian languages, including Hindi, Tamil, and Marathi.
Feature: Designed for healthcare, governance, financial services, and education.
Source: BharatGPT group unveils ‘Hanooman’.
Reka Edge:
Origin: Reka AI.
Description: An LLM optimized for local deployments.
Feature: Outperforms comparable models like Llama 2 7B and Mistral 7B.
Source: Reka AI Releases Reka Flash.
Reka Flash:
Origin: Reka AI.
Description: An efficient and capable multimodal LLM.
Feature: Achieves competitive results against models like GPT-4, Claude, and Gemini Pro.
Source: Reka AI Releases Reka Flash.
Gemma:
Origin: Google.
Description: A family of state-of-the-art open LLMs.
Feature: Comes in 2B and 7B parameter sizes, with base and instruction-tuned variants.
Source: Google: Introducing Gemma.
Gemini 1.5 Pro:
Origin: Google.
Description: A mid-size multimodal model optimized for scaling across various tasks.
Feature: Achieves similar performance to Gemini 1.0 Ultra while using less compute.
Source: Introducing Gemini 1.5.
Qwen-1.5:
Origin: Alibaba Cloud.
Description: Part of the Qwen AI series, with varying sizes from 0.5 billion to 72 billion parameters.
Feature: Enhancements in language model technology.
Source: Alibaba Qwen 1.5.
GOODY-2
Origin: Created by the art-and-technology studio Brain as a satirical project; GOODY-2 bills itself as an AI model built with next-gen adherence to ethical principles.
Description: It’s designed to avoid answering controversial, offensive, or problematic queries.
Feature: Unbreakable ethical adherence, ensuring conversations stay within bounds.
Source: GOODY-2
Natural-SQL-7B
Origin: Developed by ChatDB, Natural-SQL-7B excels in converting natural language queries into SQL commands.
Description: It bridges the gap between non-technical users and complex database interactions.
Feature: Strong text-to-SQL performance, converting plain-English questions into executable SQL queries.
Source: GitHub
Sea-Lion
Origin: Developed by AI Singapore, Sea-Lion focuses on Southeast Asia’s diverse contexts and languages.
Description: It’s designed to better represent the region’s breadth of cultures and languages.
Feature: Specialized vocabulary for SEA languages, pre-trained on roughly 980B tokens spanning 11 Southeast Asian languages.
Source: AI Singapore
OLMo (Open Language Model)
Origin: Developed by AI2, OLMo is intentionally designed for open research.
Description: It provides access to data, training code, models, and evaluation code.
Feature: Full pretraining data, training code, and evaluation suite.
Source: AI2
FLOR-6.3B
Origin: Developed by Aina Tech (Projecte Aina), FLOR-6.3B has 6.3 billion parameters.
Description: It allows advanced users and developers to explore AI capabilities.
Feature: Tailored for AI needs based on local languages and requirements.
Source: Projecte Aina Tech
Weaver
Origin: Developed by AIWaves, Weaver is a family of LLMs dedicated to content creation.
Description: Pre-trained on a corpus to improve writing capabilities.
Feature: Models of different sizes (Mini, Base, Pro, Ultra).
Source: Hugging Face
Miqu 70B
Origin: A leaked 70-billion-parameter model, later acknowledged by Mistral AI’s CEO as a quantized early version of one of their models (reportedly based on the Llama 2 70B architecture), rather than an official Meta release.
Description: Quantized aggressively so it can run on less than 24GB of VRAM (see the rough arithmetic below).
Feature: More accessible to users without high-end hardware.
Source: Cheatsheet
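Back-of-the-envelope arithmetic for why a 70B model only fits in 24 GB of VRAM under very aggressive quantization (weights only; a real deployment also needs memory for the KV cache and activations):

```python
def weight_memory_gib(n_params_billion, bits_per_weight):
    """Rough lower bound: parameter storage only, ignoring KV cache and activations."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 2**30

for bits in (16, 8, 4, 2):
    print(f"70B weights at {bits}-bit: ~{weight_memory_gib(70, bits):.0f} GiB")
```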
iFlytekSpark-13B
Origin: Developed by iFlytek and Huawei, SparkDesk helps build LLMs.
Description: Allows enterprises to create exclusive LLMs.
Feature: Parameters include 13 billion, 65 billion, and 175 billion.
Source: Yicai
Xinghuo 3.5 (Spark)
Origin: Developed by iFlytek; the company claims Spark v3.5 surpasses GPT-3.5.
Description: Claimed by iFlytek to approach or outpace GPT-4 Turbo in language, math, coding, and multimodal tasks on its own benchmarks.
Feature: Synthesizes speech conveying different emotions and tones.
Source: Live Science
MGIE:
Origin: Developed by Apple.
Description: MGIE (MLLM-Guided Image Editing) is designed to facilitate edit instructions and provide explicit guidance for image editing. It learns to derive expressive instructions and performs manipulation through end-to-end training.
Feature: Focuses on image editing tasks.
Source: GitHub Repository
CodeLlama-70B:
Origin: Developed by Meta (hosted on platforms such as Clarifai).
Description: CodeLlama-70B-Instruct is a state-of-the-art LLM specialized in code synthesis and understanding. It represents the pinnacle of the Code Llama series with 70 billion parameters, optimized for processing and generating code based on natural language instructions.
Feature: Code synthesis and understanding.
Source: Clarifai Model Page
RWKV-v5 Eagle 7B:
Origin: RWKV.
Description: Eagle 7B is a 7.52B parameter model built on the RWKV-v5 architecture. It ranks as the world’s greenest 7B model per token and is trained on 1.1 trillion tokens across 100+ languages. It outperforms other 7B class models in multi-lingual benchmarks.
Feature: Multilingual capabilities and energy efficiency.
Source: Hugging Face Model Page
MaLA-500:
Origin: MaLA-LM.
Description: MaLA-500 is designed to cover an extensive range of 534 languages. It builds upon LLaMA 2 7B, integrates continued pretraining with vocabulary extension, and boasts an extended vocabulary size of 260,164.
Feature: Multilingual adaptation and vocabulary extension.
Source: Hugging Face Model Page
MambaByte:
Origin: Mamba.
Description: MambaByte is a token-free adaptation of the Mamba state space model, trained autoregressively on byte sequences. It achieves computational efficiency compared to other byte-level models and competes with state-of-the-art subword Transformers.
Feature: Token-free language modeling.
Source: arXiv Paper
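Token-free simply means the model’s vocabulary is the 256 possible byte values; a quick illustration of the input representation (this shows only the encoding step, not the Mamba state-space model itself):

```python
text = "Token-free models read raw bytes, even beyond ASCII: é, 日本語, 😀"
byte_ids = list(text.encode("utf-8"))   # the vocabulary is just the 256 possible byte values
print(len(text), "characters ->", len(byte_ids), "byte tokens")   # byte sequences run longer
print(byte_ids[:12])
```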
DeepSeek-Coder:
Origin: DeepSeek-AI.
Description: DeepSeek-Coder is a series of code language models trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. It provides various sizes of code models, ranging from 1B to 33B versions.
Feature: Code synthesis and understanding.
Source: GitHub Repository
FuseLLM:
Origin: Proposed by an academic research team.
Description: FuseLLM focuses on merging and stacking LLMs to create more powerful models. It explores fusing models from a probabilistic distribution perspective and aims to overcome individual knowledge gaps.
Feature: Knowledge fusion across multiple LLMs.
Source: Medium Article
Fuyu-Heavy:
Origin: Adept AI.
Description: Adept’s larger multimodal model for digital agents, built on the same architecture as the open Fuyu-8B (a smaller version of the model that powers their product).
Feature: Multimodal capabilities.
Source: Adept AI Blog
DeepSeek:
Origin: Developed by DeepSeek AI.
Description: An advanced language model with 67 billion parameters, trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.
Features:
Outperforms Llama2 70B Base in reasoning, coding, math, and Chinese comprehension.
Proficient in coding and math (HumanEval Pass@1: 73.78, GSM8K 0-shot: 84.1, Math 0-shot: 32.6).
Mastery in Chinese language, surpassing GPT-3.5.
Source: GitHub
LLaMA Pro:
Origin: Developed by Tencent Applied Research Center (ARC).
Description: An extension of LLaMA that adds new transformer blocks (“block expansion”) so the model can absorb code and math knowledge during continued pretraining without forgetting its general abilities.
Features:
Expands the model with additional blocks trained only on new-domain data.
Strong results on both general-purpose and code/math benchmarks.
Source: GitHub
TinyLlama:
Origin: Derived from Llama2 LLM.
Description: A compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs.
Features:
Leverages advances like FlashAttention for computational efficiency.
Remarkable performance in downstream tasks.
Source: GitHub
DocLLM:
Origin: Developed by JPMorgan AI Research for reasoning over visual documents.
Description: Focuses on bounding box information to incorporate spatial layout structure.
Features:
Avoids expensive image encoders.
Enhances document layout understanding.
Source: GitHub
Unified-IO 2:
Origin: Developed by the Allen Institute for AI (AI2) and collaborators.
Description: An autoregressive multimodal model capable of understanding and generating images, text, audio, and action.
Features:
Tokenizes inputs and outputs into a shared semantic space.
Achieves state-of-the-art performance on various benchmarks.
Source: GitHub
WaveCoder-DS-6.7B:
Origin: Developed by Microsoft researchers.
Description: Fine-tuned Code LLM with widespread and versatile enhanced instruction tuning.
Feature: Demonstrates impressive generalization ability across different code-related tasks.
Source: Research Paper
YunShan:
Origin: Introduced by Huawei researchers alongside the PanGu-Pi work.
Description: A domain-specialized LLM built on the PanGu-Pi architecture, reported to target finance and legal applications.
Feature: Strong performance within its specialized domains.
Source: Research Paper
PanGu-Pi:
Origin: Developed by Huawei Noah’s Ark Lab researchers.
Description: Enhances language model architectures via nonlinearity compensation.
Feature: Achieves state-of-the-art performance in terms of accuracy and efficiency.
Source: Research Paper
YAYI 2:
Origin: Developed by an AI research team.
Description: Multilingual open-source LLM with a focus on improving Chinese language performance.
Feature: Pre-trained on meticulously cleaned and filtered data.
Source: Research Paper
Emu2:
Origin: Developed by the Beijing Academy of Artificial Intelligence (BAAI).
Description: Generative multimodal model for text and images.
Feature: Seamlessly generates images and texts in multimodal context.
Source: Research Paper
MedLM:
Origin: Developed by Google, built on its Med-PaLM 2 research.
Description: Fine-tuned for the healthcare industry, answering medical questions and drafting summaries.
Feature: High-quality generative AI output in the medical domain.
Source: Google Cloud Blog
SOLAR-10.7B:
Origin: Developed by Upstage.
Description: Advanced LLM with 10.7 billion parameters, demonstrating superior performance.
Feature: Compact yet powerful, outperforming models with up to 30B parameters.
Source: Deci Model Zoo
Mistral-medium:
Origin: Developed by Mistral AI.
Description: Mistral AI’s mid-tier model, available through their API rather than as open weights.
Feature: Stronger benchmark performance than the open Mistral 7B, at lower cost than frontier models.
Source: Medium Article
Mixtral-8x7B-32kseqlen:
Origin: Developed by Mistral AI.
Description: Sparse mixture-of-experts LLM.
Feature: Handles up to 32k tokens of context, multilingual, and follows instructions well.
Source: News Article
DeciLM-7B:
Origin: Developed by Deci.
Description: A 7.04 billion parameter decoder-only text generation model, outperforming other 7B base models in accuracy and computational efficiency.
Feature: Utilizes variable Grouped-Query Attention (GQA) for a balance between accuracy and efficiency.
Source: Deci/DeciLM-7B
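Grouped-query attention shares a small number of key/value heads across many query heads, shrinking the KV cache. A toy NumPy sketch (head counts and sizes are illustrative, and masking is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(2)
seq, head_dim = 5, 8
n_q_heads, n_kv_heads = 8, 2          # several query heads share each KV head
group = n_q_heads // n_kv_heads

q = rng.standard_normal((n_q_heads, seq, head_dim))
k = rng.standard_normal((n_kv_heads, seq, head_dim))
v = rng.standard_normal((n_kv_heads, seq, head_dim))

outputs = []
for h in range(n_q_heads):
    kv = h // group                    # map each query head to its shared KV head
    scores = q[h] @ k[kv].T / np.sqrt(head_dim)
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    outputs.append(weights @ v[kv])
print(np.stack(outputs).shape)         # (8, 5, 8) while storing only 2 KV heads
```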
StripedHyena 7B:
Origin: Developed by Together Research.
Description: Focuses on new architectures for long context, improved training, and inference performance.
Feature: Incorporates features from recurrent, convolutional, and continuous-time models.
Source: StripedHyena-Hessian-7B
NexusRaven-V2 13B:
Origin: Developed by Nexusflow.
Description: Surpasses GPT-4 in zero-shot function calling, converting natural language instructions into executable code.
Feature: Generalization to unseen functions.
Source: NexusRaven-V2–13B
Gemini Ultra 1.0:
Origin: Developed by Google.
Description: Google’s most capable AI model, optimized for complex tasks.
Feature: High-quality translation, efficient scaling, and versatility.
Source: Gemini Advanced
Mamba:
Origin: Developed by researchers at Carnegie Mellon and Princeton (Albert Gu and Tri Dao).
Description: A selective state-space sequence model that replaces attention, offering linear-time inference and strong quality at long context lengths.
Feature: Input-dependent (selective) state-space dynamics computed with a hardware-aware scan (a toy recurrence sketch follows below).
Source: Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv)
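For intuition, here is a toy, non-selective linear state-space recurrence; Mamba’s contribution is making the A/B/C dynamics input-dependent (“selective”) and computing the scan efficiently on GPUs, which this sketch does not attempt:

```python
import numpy as np

rng = np.random.default_rng(5)
d_state = 4
A = np.diag(rng.uniform(0.8, 0.99, d_state))   # how strongly the hidden state persists
B = rng.standard_normal((d_state, 1)) * 0.1    # how the input writes into the state
C = rng.standard_normal((1, d_state))          # how the state is read out

h = np.zeros((d_state, 1))
xs = rng.standard_normal(6)                    # a toy input sequence
ys = []
for x in xs:                                   # one linear-time pass, no attention matrix
    h = A @ h + B * x                          # state update
    ys.append((C @ h).item())                  # readout
print(np.round(ys, 3))
```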
LVM-3B:
Origin: Developed by researchers at UC Berkeley and Johns Hopkins (reported).
Description: A 3-billion-parameter large vision model trained with sequential modeling over “visual sentences”, learning from images alone without any text.
Feature: Solves vision tasks specified purely through visual prompting.
Source: Deciphering LLMs
SeaLLM-13b:
Origin: Developed by Alibaba DAMO Academy.
Description: Inclusive AI language models for Southeast Asian languages.
Feature: Provides accurate, up-to-date, and factual responses.
Source: TechWire Asia
pplx-70b-online:
Origin: Developed by Perplexity.
Description: Online LLMs with up-to-date information from the web.
Feature: Provides helpful, factual, and fresh responses.
Source: Perplexity
SeamlessM4T-Large v2:
Origin: Developed by Seamless Communication team from Meta AI.
Description: Enables expressive and streaming speech translation.
Feature: End-to-end expressive and multilingual translations.
Source: SeamlessM4T-v2
Q-Transformer:
Origin: Developed by Google DeepMind robotics researchers.
Description: A scalable offline reinforcement-learning method that represents Q-values with a Transformer, letting robots learn from large, mixed-quality demonstration datasets.
Feature: Autoregressive, per-dimension discretization of actions for Q-learning at scale.
Source: Q-Transformers
Yuan 2.0:
Origin: Not specified.
Description: Not provided.
Feature: Not provided.
Source: Not specified.
MEDITRON
Origin: MEDITRON is a suite of open-source medical Large Language Models (LLMs).
Description: MEDITRON-70B is a 70 billion parameters model adapted to the medical domain from Llama-2–70B through continued pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and a new dataset.
Feature: It is specifically tailored for medical applications.
Source: MEDITRON-70B on Hugging Face
Transformers-Arithmetic
Origin: Not specified.
Description: Transformers-Arithmetic is a model designed for arithmetic tasks using transformer-based architecture.
Feature: It specializes in arithmetic operations.
Source: Not specified.
Starling-7B
Origin: Developed by researchers at UC Berkeley.
Description: Starling-7B is an open-source LLM trained by Reinforcement Learning from AI Feedback (RLAIF). It harnesses the power of the GPT-4 labeled ranking dataset, Nectar, and a new reward training and policy tuning pipeline.
Feature: Strong reasoning abilities and performance on various tasks.
Source: Starling-7B on Hugging Face
Inflection-2
Origin: Developed by Inflection AI.
Description: Inflection-2 is a successor to Inflection-1, significantly more capable. It has an upgraded knowledge base for accurate user query responses.
Feature: Improved capabilities over its predecessor.
Source: Inflection-2 details
Claude 2.1
Origin: Developed by Anthropic.
Description: Claude 2.1 is an advanced LLM with a 200K token context window, reduced hallucination rates, and improved accuracy.
Feature: Enhanced reasoning abilities.
Source: Anthropic News
TÜLU 2
Origin: Developed by the Allen Institute for AI (AI2).
Description: TÜLU 2 is a suite of Llama-2-based models fine-tuned on an improved instruction-tuning mixture, with DPO-trained variants.
Feature: Strong open instruction-following performance at 7B, 13B, and 70B scales.
Source: Not specified.
Orca 2
Origin: Developed by Microsoft Research.
Description: Orca 2 is a small language model from Microsoft Research that is taught to reason step by step and to pick different solution strategies for different tasks.
Feature: Reasoning performance that rivals much larger models on complex benchmarks.
Source: Orca-2–7b on Hugging Face
Phi-2
Origin: Developed by Microsoft Research.
Description: Phi-2 is a small language model (SLM) with 2.7 billion parameters.
Feature: Improved reasoning abilities.
Source: Phi-2 on Hugging Face
Florence-2
Origin: Developed by Microsoft.
Description: Florence-2 is a vision foundation model addressing task diversity in computer vision and vision-language tasks.
Feature: Unified representation for various vision tasks.
Source: Florence-2 details
Mirasol3B
Origin: Developed by Google Research and Google DeepMind.
Description: Mirasol3B is a multimodal autoregressive model for time-aligned and contextual modalities.
Feature: Combines audio, video, and text modalities.
Source: Mirasol3B on Hugging Face
OtterHD-8B:
Origin: Developed by the team behind the Otter multimodal models (S-Lab, Nanyang Technological University).
Description: OtterHD-8B is an innovative multimodal model evolved from Fuyu-8B. It’s engineered to interpret high-resolution visual inputs with granular precision. Unlike conventional models, OtterHD-8B can handle flexible input dimensions.
Feature: It excels in discerning minute details and spatial relationships of small objects.
Source: ArXiv preprint
Gauss:
Origin: Developed by Samsung Electronics.
Description: Gauss is named after Carl Friedrich Gauss, a 19th-century German mathematician. It’s designed to answer questions with a touch of humor and suggest relevant questions.
Feature: Gauss Language (text generation), Gauss Code (code generation), and Gauss Image (image generation).
Source: Computerworld
Grok-1:
Origin: Developed by xAI.
Description: Grok-1 is a mixture-of-experts (MoE) LLM with 314B parameters, released openly by xAI. It performs well on reasoning and coding benchmarks.
Feature: Apache 2.0 weights, with roughly a quarter of the parameters active per token (a toy routing sketch follows below).
Source: Prompt Engineering Guide
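A mixture-of-experts layer routes each token to only a few expert feed-forward networks, so only a fraction of the parameters is active per token. A toy top-2 routing sketch (expert count, sizes, and router are illustrative, not xAI’s configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

# Toy "experts": independent feed-forward weight matrices.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02   # router projection

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                       # (tokens, n_experts)
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t, p in enumerate(probs):
        chosen = np.argsort(p)[-top_k:]       # indices of the top-k experts
        weights = p[chosen] / p[chosen].sum()
        for e, w in zip(chosen, weights):
            out[t] += w * (x[t] @ experts[e])
    return out

tokens = rng.standard_normal((3, d_model))
print(moe_layer(tokens).shape)   # (3, 16): same shape out, but only k experts ran per token
```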
Yi-34B:
Origin: Developed by 01.AI.
Description: Yi-34B is a 34-billion-parameter bilingual (English and Chinese) language model.
Feature: Supports up to 200K context window.
Source: NVIDIA NGC
GPT-4 Turbo:
Origin: Developed by OpenAI.
Description: GPT-4 Turbo is an updated version of GPT-4 that is faster, cheaper, and trained on more recent data.
Feature: 128K context window.
Source: Internet Public Library
Kimi Chat:
Origin: Developed by Moonshot AI.
Description: Kimi is a Chinese ChatGPT, designed for understanding complex questions and generating coherent answers.
Feature: Understands long messages (up to 2 million Chinese characters).
Source: Dataconomy
ERNIE 4.0:
Origin: Developed by Baidu.
Description: ERNIE 4.0 is an NLP model that understands complex questions, applies logic, remembers information, and generates relevant answers.
Feature: Strong reasoning and generation capabilities.
Source: Medium
Fuyu:
Origin: Developed by Adept AI.
Description: Fuyu-8B is a unique foundation model that simplifies handling complex visual data.
Feature: Supports arbitrary image resolutions.
Source: Towards AI
ERNIE 4.0:
Origin: Developed by Baidu.
Description: ERNIE 4.0 has improved core capacities in understanding, generation, reasoning, and memory.
Feature: Demonstrated capabilities in generative skills and reasoning abilities.
Source: Shanghaiist
Zephyr:
Origin: Developed by Hugging Face, Zephyr is part of a series of language models trained to act as helpful assistants.
Description: Zephyr-7B-α, the first model in the series, is fine-tuned on publicly available synthetic datasets using Direct Preference Optimization (DPO).
Feature: A 7B parameter GPT-like model primarily trained on English.
Source: GitHub Repository | Demo
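Direct Preference Optimization trains directly on preference pairs instead of fitting a separate reward model. A minimal sketch of the per-pair loss (the log-probabilities here are made-up numbers standing in for sums over response tokens):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: push the policy to prefer the chosen answer
    more strongly than the frozen reference model does."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return -math.log(1 / (1 + math.exp(-margin)))   # -log(sigmoid(margin))

# A pair where the policy already prefers the chosen answer more than the reference does:
print(dpo_loss(-20.0, -30.0, -25.0, -28.0))
```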
PaLI-3:
Origin: Developed by Google Research, PaLI-3 is a multilingual language-image model.
Description: It generates text based on visual and textual inputs, performing vision, language, and multimodal tasks.
Feature: Jointly scales language and vision components.
Source: GitHub Repository | Demo
Retro 48B:
Origin: Introduced by NVIDIA, Retro 48B surpasses the original GPT model in perplexity.
Description: After instruction tuning (InstructRetro), it enhances zero-shot question answering.
Feature: Pre-trained with retrieval, suitable for QA tasks.
Source: MarkTechPost
Ferret:
Origin: Developed by Apple Inc. in collaboration with Cornell University.
Description: Ferret integrates language understanding with image analysis.
Feature: Understands and interprets both text and visual data.
Source: AIToolMall
Lemur:
Origin: Open Foundation Models for Language Agents.
Description: Lemur is optimized for both natural language and coding capabilities.
Feature: Serves as the backbone of versatile language agents.
Source: GitHub Repository
AceGPT:
Origin: Developed for the Arabic language by an academic research collaboration.
Description: Sets the state-of-the-art standard for open Arabic LLMs across various benchmarks.
Feature: Optimized for Arabic language understanding.
Source: Papers with Code
GAIA-1:
Origin: Wayve’s generative world model for autonomous driving.
Description: Generates realistic driving videos, offering fine-grained control over ego-vehicle behavior and scene features.
Feature: Incorporates video, text, and action inputs.
Source: Wayve Blog
MotionLM:
Origin: Developed by Waymo for multi-agent motion forecasting.
Description: Represents continuous trajectories as sequences of discrete motion tokens.
Feature: Casts multi-agent motion prediction as a language modeling task.
Source: arXiv
Yasa-1
Origin: Developed by Reka AI.
Description: Yasa-1 is a multimodal assistant with visual and auditory sensors that can take actions via code execution. It supports long-context document processing, fast retrieval augmented generation, multilingual support (20 languages), and a search engine interface.
Source: Reka AI Blog
RT-X
Origin: Google DeepMind and a broad group of partner robotics labs (the Open X-Embodiment collaboration).
Description: RT-X refers to robotics transformer models (RT-1-X and RT-2-X) trained on the pooled Open X-Embodiment dataset, which combines demonstrations from many different robot types; training on this cross-embodiment data improves skill transfer across robots.
Source: Google DeepMind
Qwen
Origin: Developed by Alibaba Group.
Description: Qwen is a large language model (LLM) pretrained on large-scale multilingual and multimodal data. It has been upgraded to Qwen1.5 and excels in understanding and generating human-like text.
Source: Qwen Documentation
Llama 2 Long
Origin: Meta AI.
Description: Llama 2 Long is a series of Llama 2 variants continually pretrained on longer training sequences, extending the effective context window (up to 32k tokens) while preserving short-context performance.
Source: IBM
LeoLM
Origin: Developed by LAION and HessianAI.
Description: LeoLM is a German-language foundation model suite built on Llama 2 through continued pretraining on a large German text corpus.
Source: Medium
Mistral 7B
Origin: Developed by Mistral AI.
Description: Mistral 7B is a 7.3-billion-parameter open-weight LLM that leverages grouped-query attention (GQA) and sliding window attention (SWA), outperforming Llama 2 13B on many benchmarks.
Source: arXiv
Kosmos-2.5
Origin: Developed by Microsoft Research.
Description: Kosmos-2.5 is a multimodal literate model trained on large-scale text-intensive images. It performs well in transcription tasks and vertical domains like medicine and law.
Source: arXiv
Baichuan 2
Origin: Developed by Baichuan Intelligence.
Description: Baichuan 2 is a series of large-scale multilingual language models containing 7 billion and 13 billion parameters. It matches or outperforms other open-source models on various benchmarks.
Source: Hugging Face
BOLT2.5B
Origin: ThirdAI.
Description: BOLT2.5B is a CPU-only pre-trained 2.5-billion parameter Generative LLM. It achieves remarkable performance without GPU involvement.
Source: ThirdAI
DeciLM
Origin: Deci AI.
Description: DeciLM-7B is a super-fast and super-accurate 7-billion-parameter LLM. It is licensed under Apache 2.0 and represents a transformative force in language processing.
Source: MarkTechPost
MoLM:
Origin: Not specified.
Description: MoLM is an LLM, but further details are not available.
Feature: Not specified.
Source: Not specified.
NExT-GPT:
Origin: Developed by the National University of Singapore.
Description: NExT-GPT is an any-to-any multimodal LLM that perceives input and generates output in arbitrary combinations of text, image, video, and audio.
Feature: Supports various modalities and is tuned with only a small amount of parameters.
Source: GitHub
Phi-1.5:
Origin: Developed by Microsoft Research.
Description: Phi-1.5 is a Transformer-based LLM with 1.3 billion parameters, trained on a mix of data sources.
Feature: Achieves nearly state-of-the-art performance among models with less than 10 billion parameters.
Source: SuperAnnotate
UniLM:
Origin: Developed by Microsoft Research.
Description: UniLM is a unified pre-trained LLM for various NLP tasks.
Feature: Supports both single-sentence and document-level tasks.
Source: GitHub
Persimmon-8B:
Origin: Developed by Adept.
Description: Persimmon-8B is an open-source LLM with a 16k token context, outperforming known foundation models.
Feature: Handles longer prompts and addresses complex tasks.
Source: Neurohive
FLM-101B:
Origin: Developed by CofeAI.
Description: FLM-101B is a 101B-parameter LLM trained on an unusually small budget, reported to be competitive with established large models on several evaluations.
Feature: Cost-effective training and supports both Chinese and English.
Source: Hugging Face
Hunyuan:
Origin: Developed by Tencent.
Description: Hunyuan is a proprietary LLM with over 100 billion parameters and significant Mandarin comprehension capabilities.
Feature: Integrated into Tencent’s business lines.
Source: Tencent
phi-CTNL:
Origin: Satirical model.
Description: phi-CTNL is a fictional 1 million parameter transformer-based LLM.
Feature: Achieves perfect results across diverse academic benchmarks.
Source: Emergent Mind
Falcon 180B:
Origin: Developed by the Technology Innovation Institute (TII) in Abu Dhabi.
Description: Falcon 180B is the largest openly available language model, boasting 180 billion parameters. It was trained on a massive 3.5 trillion tokens using TII’s RefinedWeb dataset, representing the longest single-epoch pretraining for an open model.
Performance:
Outperforms Llama 2 70B and OpenAI’s GPT-3.5 on various benchmarks.
On par with Google’s PaLM 2-Large (Bard), typically landing between GPT-3.5 and GPT-4.
Capabilities: Achieves state-of-the-art results across natural language tasks.
License: Permits commercial usage with certain restrictions.
Hardware Requirements: Inference requires 640GB of memory (quantized to half-precision FP16) or 320GB (quantized to int4).
Source: Hugging Face Blog
Jais:
Origin: Developed by Inception (a G42 company), MBZUAI, and Cerebras Systems.
Description: A powerful large language model (LLM) with 13 billion parameters, built on a GPT-3-style decoder architecture. It integrates English text and code while centering strong Arabic language capabilities.
Feature: State-of-the-art Arabic performance among open models at its release.
Code Llama 34B:
Origin: Built on Code Llama, an LLM developed by Meta.
Description: Specialized in generating code and natural language about code from both code and natural language prompts.
Feature: Supports code completion and infilling, with instruction-tuned and Python-specialized variants available.
IDEFICS:
Origin: Developed by Hugging Face as an open reproduction of Flamingo, a visual language model from DeepMind.
Description: A multimodal model that accepts arbitrary sequences of image and text inputs, generating coherent text as output. Used for image-text tasks.
Feature: Answers questions about images, describes visual content, and creates stories grounded in multiple images.
Raven:
Origin: Developed by RWKV.
Description: An RNN with transformer-level LLM performance, combining the best of RNN and transformer.
Feature: Great performance, fast inference, and free sentence embedding.
DukunLM:
Origin: Developed by Boston University.
Description: Refines LLMs using a small amount of data to improve performance on specific tasks.
Feature: Corrects LLM predictions by training a smaller model to predict errors made by the LLM.
WizardLM:
Origin: Developed by nlpxucan.
Description: Empowers LLMs to follow complex instructions. Suitable for chat-like applications and code generation.
Feature: Superior performance in quantitative LLM metrics.
Japanese StableLM Alpha 7B:
Origin: Developed by Stability AI.
Description: Pre-trained on diverse Japanese and English datasets, maximizing Japanese language modeling performance.
Feature: Reported by Stability AI as the best-performing openly available Japanese language model at its release.
StableCode:
Origin: Developed by Stability AI.
Description: Assists programmers with daily work, emphasizing code quality and correctness.
Feature: Provides autocomplete suggestions and handles larger code contexts.
Med-Flamingo:
Origin: Developed by researchers at Stanford University.
Description: A multimodal medical few-shot learner adapted to the medical domain. Pre-trained on paired and interleaved medical image-text data from publications and textbooks.
Features: Enables few-shot generative medical visual question answering (VQA) and rationale generation.
Alfred-40B-0723:
Origin: Developed by LightOn.
Description: An open-source language model based on Falcon-40B, designed for seamless integration of Generative AI into business workflows.
Features: Prompt engineering, no-code application development, and classic LLM tasks.
LLaMA-2–7B-32K:
Origin: Developed by Together.
Description: An open-source long-context language model fine-tuned from Meta’s original LLaMA-2 7B model.
Features: A 32K-token context window, fine-tuned for long-context tasks such as multi-document question answering and long-text summarization.
Med-PaLM M:
Origin: Developed by Google Research.
Description: A large multimodal generative model for the medical domain that flexibly encodes and interprets clinical language, medical imaging, and genomics.
Features: A single set of weights handling a diverse range of biomedical tasks.
BTLM-3B-8K:
Origin: Developed by Cerebras Systems.
Description: A 3 billion parameter language model with an 8k context length trained on 627B tokens of SlimPajama.
Features: Licensed for commercial use, state-of-the-art performance, and supports 8k sequence length.
Stable Beluga 2:
Origin: Developed by Stability AI.
Description: Built upon the LLaMA 2 70B foundation model, achieving industry-leading performance.
Features: Exceptional reasoning ability, understanding linguistic subtleties, and answering complex questions.
Meta-Transformer:
Origin: Developed by researchers at CUHK and Shanghai AI Lab (reported).
Description: A unified multimodal learning framework that uses a shared, frozen encoder to process many modalities, from text and images to audio and point clouds, without paired multimodal training data.
Features: Handles a dozen modalities with a single backbone.
Llama 2:
Origin: Developed by Meta AI in 2023.
Description: A family of pre-trained and fine-tuned LLMs, released for research and commercial use. Capable of various natural language processing tasks, from text generation to programming code.
Feature: Offers base foundation models and fine-tuned “chat” models, available with 7 billion (7B), 13 billion (13B), or 70 billion (70B) parameters.
WormGPT:
Origin: Created by a Portuguese programmer and initially sold on HackForums.
Description: A malicious AI chatbot built on the open-source GPT-J LLM, designed for cybercrime activities.
Feature: Allows users to engage in illegal activities and create ransomware, phishing scams, etc.
Claude 2:
Origin: Developed by Anthropic.
Description: A language model with improved performance, longer responses, and the ability to handle contexts of up to 100,000 tokens.
Feature: Supports coding, math, reasoning, and fine-tuning.
Source: Anthropic
LongLLaMA:
Origin: Fine-tuned with the Focused Transformer (FoT) method based on OpenLLaMA.
Description: Designed for long sequence modeling tasks.
Feature: Handles contexts far beyond its training length, with evaluations reported up to 256k tokens.
xTrimoPGLM:
Origin: Developed by Biomap in partnership with Tsinghua University.
Description: Unified 100B-scale pre-trained transformer for deciphering the language of proteins.
Feature: Trained on an extended sequence length of 8K, supporting protein understanding and generation tasks.
XGen:
Origin: Salesforce AI Research.
Description: A powerful LLM designed for long sequence modeling tasks.
Feature: Handles input sequences up to 8,000 tokens.
Zhinao (Intellectual Brain):
Origin: Developed by Qihoo 360.
Description: 360’s large language model (“Intellectual Brain”), integrated into the company’s search, browser, and office products.
Feature: Chinese-language generation and knowledge question answering across 360’s product line.
Yasa:
Origin: Developed by Reka AI.
Description: A multimodal AI assistant with visual and auditory sensors.
Feature: Understands videos, audio, executes code, and supports long-context document processing.
Kosmos-2:
Origin: Developed by Microsoft Research.
Description: A Multimodal Large Language Model (MLLM) that integrates grounding and referring capabilities. It can perceive object descriptions (e.g., bounding boxes) and ground text to the visual world.
Features: Multimodal grounding, referring expression comprehension, perception-language tasks, and language understanding/generation.
Source: Kosmos-2 on arXiv
AudioPaLM:
Origin: Introduced by Google.
Description: A unified LLM combining text-based language models and audio prompting techniques. Excels in speech understanding and generation tasks, including voice recognition and voice-to-text conversion.
Features: Speech-related tasks.
Source: AudioPaLM on Times of India
Inflection-1:
Origin: Developed by Inflection AI.
Description: A best-in-class LLM with strong performance across various tasks. Trained using thousands of NVIDIA H100 GPUs.
Features: Multitask language understanding, including academic knowledge, reasoning, math, code, and more.
Source: Inflection-1 on Inflection AI
Phi-1:
Origin: Microsoft’s LLM specialized in Python coding tasks.
Description: Smaller-sized model compared to competing ones, focused on Python coding.
Features: Python code interpretation and data analysis.
Source: Phi-1 on OpenDataScience
InternLM:
Origin: Developed by Shanghai AI Laboratory with SenseTime and university partners.
Description: A multilingual foundation model series, later released openly in base and chat variants with long-context support.
Features: Strong Chinese and English performance, plus open tooling for fine-tuning and deployment.
Source: InternLM on GitHub
PassGPT:
Origin: Developed by academic security researchers.
Description: Trained on password leaks for password generation. Outperforms existing methods based on generative adversarial networks (GAN).
Features: Password modeling and guided password generation.
Source: PassGPT on arXiv
BlenderBot 3x:
Origin: Developed by Facebook AI.
Description: An update on the conversational model BlenderBot 3, trained using organic conversation and feedback data from participating users. It aims to improve both skills and safety.
Feature: Produces safer responses in challenging situations.
Orca (Logical and Linguistic Model):
Origin: Developed by Microsoft.
Description: A language model that learns to reason by imitating the rich, step-by-step explanation traces of a stronger teacher model (GPT-4), a process Microsoft calls explanation tuning.
Feature: Enhanced reasoning abilities from learning explanations rather than plain responses.
PassGPT:
Origin: Developed by academic security researchers.
Description: Trained on password leaks for password generation. Outperforms existing methods based on generative adversarial networks (GAN).
Feature: Guesses twice as many previously unseen passwords.
LTM-1 (Longterm Memory):
Origin: Developed by Magic.
Description: A Large Language Model with a 5,000,000 token context window. Enables AI to reference vast amounts of context when generating suggestions.
Feature: Can see an entire code repository, making it powerful for AI programming.
GPT-4 MathMix:
Origin: Developed by OpenAI.
Description: A Transformer-based model pre-trained to predict the next token in a document. Post-training alignment process improves factuality and adherence to desired behavior.
Feature: Improved performance on measures of factuality and behavior.
PandaGPT:
Origin: Developed by an academic research team, building on Meta’s ImageBind encoder and Vicuna.
Description: Empowers large language models with visual and auditory instruction-following capabilities. Can perform tasks like image description generation, writing stories inspired by videos, and answering audio-related questions.
Feature: Multimodal inputs and natural composition of semantics.
Falcon:
Origin: Developed by Technology Innovation Institute (UAE).
Description: A family of open models (7B and 40B at launch) trained from scratch on the curated RefinedWeb dataset rather than fine-tuned from LLaMA. Achieves strong performance on various tasks.
Feature: Topped the open-model leaderboards at release; instruction-tuned variants compete with models like Vicuna.
202305-refact2b-mqa-lion:
Origin: Developed by Refact AI.
Description: An experimental code-completion checkpoint; the name suggests a roughly 2-billion-parameter model with multi-query attention trained using the Lion optimizer.
Feature: Aimed at fast, lightweight code completion.
Source: Refact AI
Guanaco:
Origin: Developed by researchers at the University of Washington (the QLoRA authors).
Description: LLaMa-based model fine-tuned using QLoRA on 1K data points. Achieves competitive results with top LLMs.
Feature: Strong performance on Vicuna benchmark.
LIMA:
Origin: Developed by Meta AI researchers.
Description: A 65B parameter LLaMa language model fine-tuned with standard supervised loss on only 1,000 curated prompts and responses.
Feature: High-quality output with minimal instruction tuning data.
Formosa (FFM):
Origin: Developed by Taiwan Web Service (TWS).
Description: A large language model powered by the Taiwania 2 supercomputer with an impressive scale of 176 billion parameters.
Feature: Comprehends and generates text with traditional Chinese semantics, offering enterprise-level generative AI solutions.
Source: TWS Showcases Enterprise-level Large-scale Traditional Chinese Language Models.
CodeT5+:
Origin: Developed by Salesforce AI Research.
Description: An encoder-decoder LLM for code understanding and generation. Combines flexible architecture, diverse pretraining objectives, and efficient scaling.
Feature: Strong results on code understanding and generation benchmarks, including zero-shot code generation.
Source: CodeT5+: Open Code Large Language Models for Code Understanding and Generation.
PaLM 2:
Origin: Developed by Google AI.
Description: A state-of-the-art language model with improved multilingual, reasoning, and coding capabilities.
Feature: Strong performance in understanding nuanced text, logic, common sense reasoning, mathematics, and code generation.
Source: Google AI PaLM 2.
StarCoder:
Origin: Developed by the BigCode project (Hugging Face and ServiceNow).
Description: A Large Language Model for Code (Code LLM) trained on permissively licensed data from GitHub, including over 80 programming languages, Git commits, GitHub issues, and Jupyter notebooks.
Feature: Outperforms existing open Code LLMs on popular programming benchmarks.
GPT-2B-001:
Origin: Developed by NVIDIA, GPT-2B-001 is a transformer-based language model.
Description: It is a GPT-style decoder-only model with 2 billion trainable parameters.
Features:
SwiGLU activation function.
Rotary positional embeddings (RoPE).
Maximum sequence length of 4,096 tokens.
No dropout, no bias terms in linear layers.
Source: Hugging Face
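The SwiGLU activation mentioned in the feature list replaces the usual feed-forward block with a gated variant. A minimal NumPy sketch with toy dimensions (illustrative, not NVIDIA’s actual implementation):

```python
import numpy as np

rng = np.random.default_rng(4)
d_model, d_ff = 8, 16
W_gate, W_up = rng.standard_normal((d_model, d_ff)), rng.standard_normal((d_model, d_ff))
W_down = rng.standard_normal((d_ff, d_model))

def silu(x):
    return x / (1 + np.exp(-x))

def swiglu_ffn(x):
    """Gated feed-forward: SiLU(x W_gate) elementwise-times (x W_up), then project back down."""
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

x = rng.standard_normal((3, d_model))
print(swiglu_ffn(x).shape)  # (3, 8)
```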
Titan:
Origin: Developed by Amazon, Titan Text is a family of proprietary large language models (LLMs) designed for enterprise use cases.
Description: Titan Text LLMs generate text output in response to a given prompt.
Features:
Content creation, summarization, information extraction, and question answering.
Built-in support for responsible AI.
Easy customization through fine-tuning.
Source: Amazon Titan Text
WizardLM:
Origin: Developed by Microsoft researchers (the nlpxucan team), WizardLM is a series of language models.
Description: WizardLM enhances LLMs to follow complex instructions.
Features:
Evolved instructions for training.
Fine-tuned variants for specific tasks (e.g., coding, chat, storywriting).
Source: GitHub
MPT (MosaicPretrainedTransformer):
Origin: Developed by MosaicML, MPT models are pre-trained on 1T tokens.
Description: GPT-style decoder-only transformers with performance optimizations.
Features:
ALiBi for context length flexibility.
Efficient training and inference.
Source: Hugging Face
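ALiBi skips positional embeddings and instead adds a per-head linear penalty to attention scores based on how far back a token is, which is what gives MPT its context-length flexibility. A small sketch:

```python
import numpy as np

def alibi_bias(seq_len, n_heads):
    """Linear distance penalties added to attention scores; steeper slopes for later heads."""
    slopes = np.array([2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)])
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    distance = np.where(j <= i, i - j, 0)          # causal: only look back
    return -slopes[:, None, None] * distance       # (heads, seq, seq), added to q·k scores

print(alibi_bias(4, 2)[0])
```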
StableLM:
Origin: Developed by Stability AI, StableLM focuses on stability, compactness, and efficiency.
Description: Open-source LLMs with 3B and 7B parameters.
Features:
Efficient training and handling long inputs.
Instruction fine-tuning variants (e.g., chat, coding).
Source: GitHub
Dolly 2.0:
Origin: Developed by Databricks, an AI company.
Description: Dolly 2.0 is the first open-source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.
Feature: It can generate human-like responses based on instructions and exhibits ChatGPT-like interactivity.
Source: Dolly Blog Post
Pythia:
Origin: Developed by EleutherAI.
Description: Pythia is a suite of 16 LLMs for in-depth research, spanning various model sizes.
Feature: It enables analysis of large language models across training and scaling.
Source: Pythia Paper
Koala-13B:
Origin: Developed by Berkeley AI Research.
Description: Koala is a chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web.
Feature: It can effectively respond to user queries and outperforms Stanford’s Alpaca in some cases.
Source: Koala Blog Post
BloombergGPT:
Origin: Developed by Bloomberg LP.
Description: BloombergGPT is a 50-billion parameter LLM, purpose-built from scratch for finance.
Feature: It outperforms similarly-sized open models on financial NLP tasks.
Source: BloombergGPT Press Release
OpenFlamingo-9B:
Origin: An open-source replication of DeepMind’s Flamingo models.
Description: OpenFlamingo is a family of autoregressive vision-language models ranging from 3B to 9B parameters.
Feature: It combines vision and language understanding for various tasks.
Source: OpenFlamingo Paper
GPT4All-LoRa:
Origin: Developed by Nomic AI.
Description: An auto-regressive language model based on the transformer architecture and fine-tuned from LLaMA.
Features: English language support, GPL-3.0 license.
Cerebras-GPT:
Origin: Developed by Cerebras Systems.
Description: A family of GPT-3-like models scaled from 111M to 13B parameters, trained on the Eleuther Pile dataset.
Features: Trained compute-optimally following Chinchilla scaling laws, with openly released weights and training details.
PanGu-Sigma:
Origin: Developed by Huawei.
Description: A trillion-parameter language model trained on Ascend 910 AI processors using sparse heterogeneous computing.
Features: Sparse heterogeneity, strong performance in zero-shot learning for Chinese NLP tasks.
CoLT5:
Origin: Developed by Google Research.
Description: A long-input Transformer model with conditional computation for handling long documents efficiently.
Features: Routes only the most important tokens through heavier attention and feed-forward layers, giving strong quality on inputs up to 64k tokens at lower cost.
Med-PaLM 2:
Origin: Developed by Google Research.
Description: A Large Language Model (LLM) designed specifically for the medical domain.
Features: High-quality answers to medical questions.
GPT-4:
Origin: Developed by OpenAI.
Description: A large multimodal model exhibiting human-level performance on professional and academic benchmarks.
Features: Improved factuality, steerability, and safety.
Kosmos-1:
Origin: Developed by Microsoft Research.
Description: A Multimodal Large Language Model (MLLM) that can perceive images, follow instructions, and perform in-context learning across language and vision tasks.
Features: Multimodal perception, visual question answering, and zero-shot image understanding.
LLaMA-65B:
Origin: Developed by Meta AI.
Description: LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. It is trained on publicly available datasets exclusively, without proprietary data.
Features: LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with other large models.
Source: LLaMA on arXiv
MOSS:
Origin: Developed by the NLP group at Fudan University.
Description: An open-source conversational language model supporting both Chinese and English, among the first ChatGPT-style models released in China.
Features: Dialogue ability, open-sourced weights, and plugin/tool use in later versions.
Source: GitHub
Luminous Supreme Control:
Origin: Developed by Aleph Alpha.
Description: Part of Aleph Alpha’s Luminous series of large language models; “Supreme” is the largest tier, and the “Control” variants are tuned to follow instructions. The Luminous family also offers multimodal extensions handling both text and images.
Features: Multimodal capabilities, fine-tuned for specific use-cases.
Source: Aleph Alpha API
Toolformer+Atlas 11B+NLLB 54B:
Origin: Developed by Meta AI.
Description: Toolformer is trained to use external tools via simple APIs; in this configuration it can call models such as Atlas 11B (retrieval-based QA) and NLLB 54B (translation) as tools.
Features: Self-supervised learning to decide which APIs to call, when to call them, and how to incorporate results into token prediction.
Source: Toolformer on arXiv
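The core Toolformer idea is that the model emits an inline API call, the call is executed, and the result is spliced back into the text before generation continues. A toy sketch with an invented marker syntax (the real model uses learned special tokens and a self-supervised filtering step):

```python
import datetime

def calendar():
    return datetime.date(2024, 1, 1).isoformat()   # fixed date so the example is deterministic

TOOLS = {"Calendar": calendar}

def expand_api_calls(text):
    """Replace inline [Tool()] markers with tool output, Toolformer-style (marker syntax is illustrative)."""
    for name, fn in TOOLS.items():
        marker = f"[{name}()]"
        if marker in text:
            text = text.replace(marker, f"[{name}() -> {fn()}]")
    return text

print(expand_api_calls("Today is [Calendar()], so the deadline is next week."))
```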
Multimodal-CoT:
Origin: Proposed by Zhuosheng Zhang et al.
Description: Incorporates both language (text) and vision (images) modalities into a two-stage framework for reasoning.
Features: Improved performance on ScienceQA benchmark, surpassing human performance.
Source: Multimodal-CoT on arXiv
FLAME:
Origin: Developed by Microsoft researchers (reported).
Description: A small language model trained specifically on spreadsheet formulas.
Features: Formula completion, repair, and retrieval despite a very compact model size.
Source: FLAME on arXiv
Med-PaLM 1:
Origin: Developed by Google Research.
Description: Large language model designed for medical questions, surpassing the pass mark in the U.S. Medical Licensing Examination (USMLE) style questions.
Features: Accurate answers to medical queries, long-form answers to consumer health questions.
Source: Med-PaLM
OPT-IML:
Origin: Developed by Meta AI.
Description: OPT models instruction-tuned on a large benchmark of roughly 2,000 NLP tasks (instruction meta-learning).
Features: Improved zero-shot and few-shot generalization to held-out tasks.
Source: OPT-IML on arXiv
RL-CAI:
Origin: Developed by Anthropic; RL-CAI is a 175B language assistant fine-tuned using reinforcement learning with human-feedback helpfulness data and AI-feedback harmlessness data (Constitutional AI).
Description: RL-CAI is designed to follow instructions in a prompt and provide detailed responses.
Features:
Responds concisely and accurately, even in zero-shot scenarios.
Supports a longer context window (max prompt+completion length) compared to other models.
Source: Anthropic
ERNIE-Code:
Origin: Developed by Baidu; ERNIE-Code is a unified pre-trained language model for 116 natural languages (NLs) and 6 programming languages (PLs).
Description: ERNIE-Code bridges the gap between multilingual NLs and multilingual PLs for large language models.
Features:
Universal cross-lingual pre-training using span-corruption language modeling and pivot-based translation language modeling.
Outperforms previous multilingual LLMs for PL or NL across various code intelligence tasks.
Source: arXiv
RT-1:
Origin: Developed by Google Research (Robotics at Google).
Description: Robotics Transformer 1 (RT-1) maps camera images and natural-language instructions directly to robot actions.
Features:
Trained on a large-scale, real-world robotics dataset collected over many months on a fleet of robots.
Strong generalization to new tasks, objects, and environments.
Efficient enough to run in real time for closed-loop robot control.
Source: Google Research
ChatGPT (gpt-3.5-turbo):
Origin: ChatGPT is a sibling model to InstructGPT, designed for conversational interactions.
Description: ChatGPT interacts in a conversational way, answering follow-up questions, admitting mistakes, and challenging incorrect premises.
Features:
Dialogue format for natural conversations.
Fine-tuned using reinforcement learning from human feedback.
Available for free during the research preview.
Source: OpenAI
GPT-JT:
Origin: GPT-JT (6B) is a variant forked off GPT-J (6B), fine-tuned on 3.53 billion tokens.
Description: GPT-JT performs exceptionally well on text classification and other tasks.
Features:
Supports a longer context window than GPT-J.
Trained on a more recent dataset.
Source: Together AI
RWKV-4
Origin: RWKV Language Model is an RNN with transformer-level performance.
Description: It combines the best of RNN and transformer architectures, offering great performance, fast inference, and training while saving VRAM.
Features: RWKV-4 is attention-free, supports “infinite” context length, and provides free text embedding.
Source: RWKV Official Website
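The reason an RNN-style model like RWKV can claim “infinite” context is that it carries a fixed-size running state instead of a growing attention matrix. A loose sketch of that constant-memory recurrence (this is not the actual RWKV formulation, which adds per-channel time decay and other details; all weights and sizes here are toy values):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
xs = rng.standard_normal((10, d))                      # a toy token stream
Wk, Wv, Wr = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

num = np.zeros(d)                                      # running weighted sum of values
den = np.zeros(d) + 1e-8                               # running sum of weights
for x in xs:
    k, v = x @ Wk, x @ Wv
    r = 1 / (1 + np.exp(-(x @ Wr)))                    # "receptance" gate
    w = np.exp(k - k.max())                            # positive mixing weights from the key
    num, den = num + w * v, den + w                    # state update: memory stays constant
    out = r * (num / den)                              # output depends only on the running state
print(out.shape)                                       # (8,) — no attention matrix was ever built
```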
Galactica
Origin: Developed by Meta AI researchers and detailed in a paper on arXiv.
Description: A large language model focused on scientific knowledge, capable of tasks such as citation prediction and mathematical reasoning.
Features: Outperforms existing models on technical knowledge probes and reasoning tasks.
Source: arXiv Paper
SED
Origin: SED-ML is an XML-based format for encoding simulation setups.
Description: It ensures the exchangeability and reproducibility of simulation experiments.
Features: Not a language model but a tool for describing simulations in a standardized format.
Source: SED-ML Website
mT0
Origin: Part of the BLOOM & mT5 family of models.
Description: A multilingual language model fine-tuned on crosslingual task mixtures.
Features: Capable of following human instructions in dozens of languages zero-shot.
Source: Hugging Face Model Page
BLOOMZ
Origin: An enhanced version of the BLOOM model, fine-tuned on a mixture of tasks.
Description: It’s part of the BLOOM & mT0 model family, recommended for prompting in English.
Features: Strong few-shot performance even compared to larger models.
Source: Hugging Face Model Page
PACT
Origin: PACT is a smart contract language purpose-built for blockchains.
Description: It facilitates transactional logic with a mix of functionality in authorization, data management, and workflow.
Features: Open-source, Turing-incomplete language.
Source: GitHub Repository
Flan-T5
Origin: An enhanced version of T5, fine-tuned on a mixture of tasks.
Description: FLAN-T5 can be used directly without further fine-tuning.
Features: Improved performance on a variety of NLP tasks.
Source: Hugging Face Documentation
Flan-PaLM
Origin: Google Research’s approach to more generalizable language models.
Description: Fine-tuned on a large set of varied instructions for better task-solving abilities.
Features: Improved generalization to unseen tasks.
Source: Google Research Blog
U-PaLM
Origin: Introduced by Google Research.
Description: U-PaLM continues PaLM’s training with the UL2 mixture-of-denoisers objective (a procedure known as UL2R), improving quality for a small amount of extra compute.
Features: Reported to match or improve on PaLM across many benchmarks at roughly 0.1% additional training compute.
Source: arXiv Paper
VIMA
Origin: Officially introduced in an ICML’23 paper.
Description: A robot manipulation model with multimodal prompts, combining textual and visual tokens.
Features: Encodes input sequences and decodes robot control actions autoregressively.
Source: GitHub Repository
OpenChat
Origin: Developed by a student team at Tsinghua University.
Description: OpenChat is an open-source library of language models fine-tuned with C-RLFT, learning from mixed-quality data.
Features: It has two modes: Coding and Generalist, and supports mathematical reasoning.
Source: Hugging Face
WeLM
Origin: Presented in a paper on arXiv.
Description: A well-read pre-trained language model for Chinese, capable of zero or few-shot demonstrations.
Features: Outperforms existing models on monolingual tasks and exhibits strong multi-lingual capabilities.
Source: arXiv Paper
CodeGeeX
Origin: Introduced by Tsinghua University.
Description: A multilingual code generation model pre-trained on various programming languages.
Features: Offers code generation, translation, and comment generation capabilities.
Source: Tsinghua University
Sparrow
Origin: Developed by DeepMind.
Description: A dialogue model that aims to be helpful, correct, and harmless.
Features: Follows 23 rules during dialogue to ensure safety and appropriateness.
Source: DeepMind
PaLI
Origin: Introduced by Google Research.
Description: A jointly-scaled multilingual language-image model that generates text based on visual and textual inputs.
Features: Performs vision, language, and multimodal tasks across many languages.
Source: Google Research
NeMo Megatron-GPT 20B
Origin: Developed by NVIDIA.
Description: A transformer-based language model with 20 billion parameters.
Features: Part of the NeMo Megatron series, optimized for large-scale language tasks.
Source: Hugging Face
Z-Code++
Origin: Created by Microsoft Azure AI and Microsoft Research.
Description: Optimized for abstractive summarization, it extends encoder-decoder models with new techniques.
Features: Outperforms larger models on summarization tasks and offers bilingual support.
Source: arXiv
Atlas
Origin: Developed by Facebook Research.
Description: A retrieval-augmented language model that excels in few-shot learning.
Features: Demonstrates strong performance on knowledge-intensive tasks.
Source: GitHub
BlenderBot 3
Origin: Released by Meta AI.
Description: A publicly available chatbot with 175 billion parameters that improves over time.
Features: Capable of internet searches and conversational long-term memory.
Source: Meta AI Blog
GLM-130B
Origin: Developed by Tsinghua University and other collaborators.
Description: A bilingual (English & Chinese) pre-trained model with 130 billion parameters.
Features: Supports fast inference and is quantization-friendly for efficient use on consumer GPUs.
Source: GitHub
AlexaTM 20B
Origin: Developed by Amazon.
Description: A large-scale multilingual seq2seq language model designed for few-shot learning.
Features: Outperforms GPT-3 in zero-shot learning tasks and excels in 1-shot summarization and machine translation, especially for low-resource languages.
Source: Amazon SageMaker JumpStart
6.9B FIM
Origin: Introduced by OpenAI.
Description: An autoregressive language model trained to fill in the middle of texts, maintaining left-to-right generative capabilities.
Features: Efficient training with a large fraction of data transformed for infill tasks without harming original generative performance.
Source: OpenAI Research
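To make the fill-in-the-middle idea concrete, here is a hedged sketch of the kind of data transformation the entry describes: a document is split into a prefix, a middle, and a suffix, then reordered so an ordinary left-to-right model learns to generate the middle last. The sentinel strings below are placeholders for illustration, not necessarily OpenAI's actual tokens.

```python
import random

# Sketch of a fill-in-the-middle (FIM) training transformation.
# Sentinel strings are illustrative placeholders.
PRE, SUF, MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def fim_transform(document: str) -> str:
    """Split a document into prefix/middle/suffix and reorder it so a
    left-to-right model is trained to produce the middle section last."""
    a, b = sorted(random.sample(range(len(document)), 2))
    prefix, middle, suffix = document[:a], document[a:b], document[b:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

print(fim_transform("def add(x, y):\n    return x + y\n"))
```

Because only the data is rearranged, the same model keeps its normal left-to-right generation ability on untransformed text.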
‘monorepo-Transformer’
Origin: The term ‘monorepo’ refers to a development strategy that involves a single repository containing multiple projects.
Description: While not a specific language model, the term may relate to training transformer models on a monorepo, which is a large, single codebase containing many projects.
Features: Benefits include simplified dependency management and easier collaboration across different projects within the same repository.
Source: Google Research Blog
PanGu-Coder
Origin: Developed by researchers and detailed in a paper on arXiv.
Description: A pretrained decoder-only language model designed for text-to-code generation.
Features: Demonstrates equivalent or better performance than similarly sized models like CodeX, while attending a smaller context window and training on less data.
Source: arXiv Paper
NLLB
Origin: Created by Meta AI.
Description: A machine translation model capable of translating sentences between any of the 202 language varieties.
Features: Aims to provide high-quality translations directly between 200 languages, including low-resource languages.
Source: Meta AI Research
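A hedged sketch of direct translation with an NLLB checkpoint through transformers is shown below. The checkpoint name, language codes, and token-id lookup are assumptions based on common usage and may differ across library versions.

```python
# Sketch: English-to-French translation with an NLLB checkpoint
# (checkpoint name and language codes are illustrative assumptions).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("Large language models are changing how we work.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),  # target language
    max_new_tokens=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```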
J-1 RBG
Origin: Released by AI21.
Description: A high-quality, affordable language model with 17B parameters.
Features: Offers supreme quality text generation at a more affordable rate compared to larger models.
Source: AI21 Studio
BLOOM (tr11–176B-ml)
Origin: A collaboration of hundreds of researchers.
Description: An open-access multilingual language model with 176B parameters.
Features: Outputs coherent text in 46 languages and 13 programming languages.
Source: Hugging Face
Minerva
Origin: Introduced by Google Research.
Description: A language model capable of solving mathematical and scientific questions using step-by-step reasoning.
Features: Achieves state-of-the-art performance on technical benchmarks without the use of external tools.
Source: Google Research Blog
GODEL-XL
Origin: Developed by Microsoft.
Description: A large-scale pre-trained model for goal-directed dialog.
Features: Trained on multi-turn dialogs and instruction and knowledge grounded dialogs.
Source: GitHub Repository
YaLM 100B
Origin: Developed by Yandex.
Description: A GPT-like neural network for generating and processing text with 100 billion parameters.
Features: Can be used freely by developers and researchers worldwide.
Source: GitHub Repository
Unified-IO
Origin: Proposed by AI2.
Description: A unified model for vision, language, and multi-modal tasks.
Features: Performs a large variety of AI tasks spanning classical computer vision tasks, vision-and-language tasks, to natural language processing tasks.
Source: arXiv Paper
Perceiver AR
Origin: Developed by Google Research.
Description: An autoregressive, modality-agnostic architecture that uses cross-attention to map long-range inputs to a small number of latents.
Features: Can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation.
Source: GitHub Repository
LIMoE
Origin: Introduced by Google Research.
Description: A sparse mixture of experts model capable of multimodal learning.
Features: Accepts both images and text simultaneously, trained using a contrastive loss.
Source: arXiv Paper
GPT-4chan
Origin: Created by Yannic Kilcher by fine-tuning EleutherAI’s GPT-J.
Description: A controversial AI model fine-tuned on a dataset from the /pol/ board of 4chan, known for its extremist content.
Features: Generates text mimicking the style and tone of /pol/ users, including offensive and nihilistic content.
Source: Wikipedia
Diffusion-LM
Origin: Created by researchers at Stanford University.
Description: A non-autoregressive language model based on continuous diffusions.
Features: Enables complex, controllable text generation tasks.
Source: arXiv Paper
UL2 20B
Origin: Developed by Google.
Description: A unified framework for pretraining models universally effective across datasets and setups.
Features: Combines diverse pre-training paradigms and introduces mode switching for downstream fine-tuning.
Source: Hugging Face
Gato (Cat)
Origin: Developed by DeepMind.
Description: A multi-modal, multi-task, multi-embodiment generalist policy.
Features: Can play Atari, caption images, chat, and more with the same network and weights.
Source: arXiv
LaMDA 2
Origin: Unveiled by Google.
Description: Uses text from various sources to formulate unique “natural conversations”.
Features: Engages in free-flowing conversation on numerous topics.
Source: Wikipedia
OPT-175B
Origin: Released by Meta AI Research.
Description: A language model trained on a dataset containing 180B tokens.
Features: Comparable performance with GPT-3 but with a lower training carbon footprint.
Source: Meta AI Blog
Tk-Instruct
Origin: Introduced by AI researchers.
Description: A transformer model trained to follow a variety of in-context instructions.
Features: Outperforms existing instruction-following models.
Source: GitHub
InCoder
Origin: Presented by Facebook AI researchers.
Description: A generative model for code infilling and synthesis.
Features: Hosts example code showing how to use the model using HuggingFace’s transformers library.
Source: GitHub
NOOR
Origin: Developed by the Technology Innovation Institute.
Description: The world’s largest Arabic NLP model.
Features: Capable of varied cross-domain tasks and learning from natural language instructions.
Source: NOOR Official Website
mGPT
Origin: Introduced by AI Forever.
Description: A multilingual variant of GPT-3, pretrained on 61 languages.
Features: Intrinsic and extrinsic evaluation on cross-lingual NLU datasets and benchmarks.
Source: Hugging Face
PaLM-Coder
Origin: Not found
Description: Not found
Features: Not found
Source: Not found
PaLM
Origin: Developed by Google AI.
Description: A large language model that excels at advanced reasoning tasks, including code and math, classification, and question answering.
Features: Pre-trained on parallel multilingual text and a larger corpus of different languages than its predecessor.
Source: Google AI
SeeKeR
Origin: Introduced by Facebook AI Research.
Description: A modular language model that uses a search engine to stay relevant and up-to-date.
Features: Outperforms BlenderBot 2 in dialogue consistency, knowledge, factual correctness, and engagingness.
Source: Facebook AI Research
CodeGen
Origin: Proposed by Salesforce.
Description: An autoregressive language model for program synthesis trained on The Pile, BigQuery, and BigPython.
Features: Outperforms OpenAI’s Codex on the HumanEval benchmark.
Source: Hugging Face
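Program synthesis with a small CodeGen checkpoint can be sketched as below. The checkpoint name Salesforce/codegen-350M-mono and the comment-style prompt are assumptions for illustration, not the configuration used in the paper's benchmarks.

```python
# Sketch: completing a Python function from a short natural-language prompt
# with a small CodeGen checkpoint (checkpoint name is illustrative).
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "# Return the nth Fibonacci number\ndef fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```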
VLM-4
Origin: Not found
Description: Not found
Features: Not found
Source: Not found
CM3
Origin: Developed by Facebook AI Research.
Description: A causally masked generative model trained over a large corpus of structured multi-modal documents containing text and image tokens.
Features: Enables rich structured, multi-modal outputs while conditioning on arbitrary masked document contexts.
Source: arXiv
Luminous
Origin: Developed by Aleph Alpha.
Description: A family of large language models capable of processing and producing human text in multiple languages.
Features: Multimodal capabilities, working with images as well as text.
Source: Aleph Alpha API
Chinchilla
Origin: Developed by DeepMind.
Description: A 70-billion-parameter, compute-optimal language model reported to outperform GPT-3 while being cheaper to fine-tune and run downstream.
Features: Trained under the finding that, for every doubling of model size, the number of training tokens should also be doubled.
Source: Wikipedia
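The scaling rule in this entry can be illustrated with a tiny back-of-the-envelope calculation: if parameters and training tokens are scaled together, each doubling of model size also doubles the token budget. The starting point and the 20-tokens-per-parameter ratio below are purely illustrative assumptions, not DeepMind's published training budgets.

```python
# Illustrative arithmetic for the "scale tokens with parameters" heuristic.
params = 1e9      # hypothetical 1B-parameter starting point
tokens = 20e9     # assume ~20 training tokens per parameter

for _ in range(4):
    print(f"{params / 1e9:6.0f}B params -> {tokens / 1e9:7.0f}B tokens")
    params *= 2   # doubling model size...
    tokens *= 2   # ...doubles the token budget under this heuristic
```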
GPT-NeoX-20B
Origin: Developed by EleutherAI.
Description: A 20 billion parameter autoregressive language model trained on the Pile.
Features: Its architecture intentionally resembles that of GPT-3 and is nearly identical to that of GPT-J-6B.
Source: Hugging Face
ERNIE 3.0 Titan
Origin: Developed by Baidu.
Description: A hundred-billion-parameter model trained to generate credible and controllable texts.
Features: Employs self-supervised adversarial loss and controllable language modeling loss.
Source: arXiv
XGLM
Origin: Developed by Facebook AI Research.
Description: A family of multilingual autoregressive language models trained on a balanced corpus covering a diverse set of languages.
Features: Sets new state of the art in few-shot learning in more than 20 representative languages.
Source: GitHub
Fairseq
Origin: Developed by Facebook AI Research.
Description: A sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks.
Features: Supports multi-GPU training, fast generation, and a variety of search algorithms for text generation.
Source: GitHub Repository
Gopher
Origin: Created by DeepMind.
Description: A 280-billion-parameter AI natural language processing model based on the Transformer architecture.
Features: Trained on a 10.5 TB corpus called MassiveText; outperformed the prior state of the art on the majority of its evaluation tasks.
Source: arXiv Paper
GLaM
Origin: Introduced by Google.
Description: A sparsely activated mixture-of-experts model named Generalist Language Model (GLaM), which scales model capacity while incurring substantially less training cost compared to dense variants.
Features: The largest GLaM has 1.2 trillion parameters and achieves better overall zero-shot and one-shot performance across 29 NLP tasks.
Source: arXiv Paper
Anthropic-LM 52B
Origin: Developed by Anthropic.
Description: Part of an effort to red team language models to reduce potentially harmful outputs.
Features: Investigates scaling behaviors across different model sizes and types, including plain LMs and those trained with reinforcement learning from human feedback.
Source: Anthropic Research
RETRO
Origin: Developed by DeepMind.
Description: Enhances auto-regressive language models by conditioning on document chunks retrieved from a large corpus.
Features: With a 2 trillion token database, RETRO obtains comparable performance to GPT-3 and Jurassic-1 despite using significantly fewer parameters.
Source: arXiv Paper
BERT-480
Origin: Not found
Description: Not found
Features: Not found
Source: Not found
BERT-200
Origin: Not found
Description: Not found
Features: Not found
Source: Not found
Cedille FR-Boris
Origin: Developed by Coteries.
Description: A 6B parameter autoregressive language model based on the GPT-J architecture.
Features: Trained on around 78B tokens of French text from the C4 dataset, named after French writer Boris Vian.
Source: Hugging Face
MT-NLG
Origin: A joint effort between Microsoft and NVIDIA.
Description: The Megatron-Turing Natural Language Generation model (MT-NLG) is a 530-billion-parameter monolithic transformer language model, the largest of its kind at the time of its release.
Features: Demonstrates unmatched accuracy in a broad set of natural language tasks such as completion prediction, reading comprehension, and commonsense reasoning.
Source: NVIDIA Technical Blog
FLAN
Origin: Introduced by Google Research.
Description: Fine-tuned LAnguage Net (FLAN) uses instruction fine-tuning to make models more amenable to solving NLP tasks in general.
Features: Performs various unseen tasks without the need for task-specific fine-tuning.
Source: Google Research Blog
Command xlarge
Origin: Developed by Cohere.
Description: A highly scalable language model that balances high performance with strong accuracy.
Features: Optimized for Retrieval Augmented Generation at production scale, offering leading accuracy for advanced AI applications.
Source: Cohere’s Command Model
PLATO-XL
Origin: Created by Baidu.
Description: A large-scale pre-training dialogue generation model trained on both Chinese and English social media conversations.
Features: Achieves state-of-the-art results across multiple conversational tasks.
Source: arXiv Paper
Macaw
Origin: Developed by Allen Institute for AI.
Description: A high-performance question-answering model capable of outperforming other popular current language models.
Features: Significantly smaller yet more efficient than other models.
Source: Macaw
CodeT5
Origin: Introduced by Salesforce.
Description: Encoder-decoder LLMs for code that can be flexibly combined to suit a wide range of downstream code tasks.
Features: State-of-the-art performance on various code-related tasks.
Source: arXiv Paper
Codex
Origin: Released by OpenAI.
Description: An AI system that translates natural language to code, proficient in over a dozen programming languages.
Features: Can interpret simple commands in natural language and execute them on the user’s behalf.
Source: OpenAI Codex
Jurassic-1
Origin: Launched by AI21 Labs.
Description: A 178B-parameter autoregressive language model trained on roughly 300B tokens of publicly available text.
Features: Highly versatile, capable of human-like text generation and solving complex tasks.
Source: AI21 Studio
BlenderBot 2.0
Origin: Built and open-sourced by Facebook AI Research.
Description: The first chatbot that can build long-term memory and search the internet.
Features: Can engage in sophisticated conversations on nearly any topic.
Source: Blender Bot 2.0
GPT-J
Origin: Developed by EleutherAI.
Description: A GPT-3-like causal language model trained on the Pile dataset.
Features: Capable of generating human-like text that continues from a prompt.
Source: GPT-J
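Because GPT-J is openly available, prompt continuation can be sketched with transformers as below. The checkpoint name, prompt, and decoding settings are illustrative, and the full 6B model needs substantial memory to load.

```python
# Sketch: continuing a prompt with GPT-J (the 6B checkpoint needs roughly
# 24 GB of RAM in full precision; settings are illustrative).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

inputs = tokenizer("Once upon a time, a language model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```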
LaMDA
Origin: Developed by Google.
Description: A family of conversational large language models aiming to make interactions with technology more natural.
Features: Can engage in a free-flowing way about a seemingly endless number of topics.
Source: LaMDA
ruGPT-3
Origin: Not found
Description: Not found
Features: Not found
Source: Not found
Switch
Origin: Developed by Google.
Description: A mixture-of-experts model that scales up to 1.6 trillion parameters.
Features: Improves training time up to 7x compared to the T5 NLP model, with comparable accuracy.
Source: Medium Article
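To show what "mixture-of-experts" means mechanically, here is a deliberately simplified, framework-free sketch of top-1 routing: a small router scores each token, and only the single highest-scoring expert processes it. This is an illustration of the general idea with made-up shapes and weights, not Google's Switch Transformer implementation.

```python
import numpy as np

# Simplified top-1 mixture-of-experts routing (illustrative only).
rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 5

tokens = rng.normal(size=(n_tokens, d_model))           # token representations
router = rng.normal(size=(d_model, n_experts))          # router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

logits = tokens @ router
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
choice = probs.argmax(axis=1)                            # each token picks one expert

outputs = np.stack([
    probs[i, choice[i]] * (tokens[i] @ experts[choice[i]])  # gate-weighted expert output
    for i in range(n_tokens)
])
print("expert chosen per token:", choice)
```

Because only one expert runs per token, total parameters can grow far faster than the compute spent on any single input, which is the source of the speedups the entry mentions.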
GPT-3
Origin: Released by OpenAI.
Description: A large language model with 175 billion parameters.
Features: Demonstrates strong “zero-shot” and “few-shot” learning abilities on many tasks.
Source: Wikipedia
Megatron-11B
Origin: Introduced by Facebook AI Research labs.
Description: A unidirectional language model with 11B parameters based on Megatron-LM.
Features: Trained using intra-layer model parallelism with each layer’s parameters split across 8 GPUs.
Source: GitHub
Meena
Origin: Developed by Google.
Description: An end-to-end, neural conversational model that learns to respond sensibly to a given conversational context.
Features: Uses the Evolved Transformer seq2seq architecture, aiming to minimize perplexity.
Source: Google Research Blog
T5
Origin: Presented by Google.
Description: An encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks.
Features: Converts every language problem into a text-to-text format.
Source: Hugging Face
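The text-to-text format mentioned above means every task is phrased as a text input with a task prefix and a text output. A minimal sketch with the small public T5 checkpoint is shown below; the checkpoint name and prefix are illustrative.

```python
# Sketch of T5's text-to-text format: the task is named in the input prefix
# (assumes `transformers` and `sentencepiece` are installed).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The book is on the table.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```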
RoBERTa
Origin: Proposed by Facebook AI.
Description: A robustly optimized BERT pretraining approach.
Features: Removes the next-sentence pretraining objective and trains with much larger mini-batches and learning rates.
Source: Hugging Face
GPT-2
Origin: Released by OpenAI.
Description: A large transformer-based language model with 1.5 billion parameters.
Features: Trained to predict the next word in sentences.
Source: Hugging Face
BERT
Origin: Introduced by researchers at Google.
Description: A language model based on the transformer architecture, notable for its dramatic improvement over previous state-of-the-art models.
Features: Pre-trained on the Toronto BookCorpus and English Wikipedia.
Source: Wikipedia
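BERT is pre-trained with masked language modelling, and that objective can be probed directly with the fill-mask pipeline. The checkpoint name and the example sentence below are illustrative.

```python
# Sketch: querying BERT's masked-language-model head (checkpoint is illustrative).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill("Large language models can [MASK] text."):
    print(prediction["token_str"], round(prediction["score"], 3))
```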
GPT-1
Origin: Created by OpenAI.
Description: OpenAI’s first generative pre-trained transformer (GPT) language model, commonly referred to as GPT-1.
Features: A causal (unidirectional) transformer pre-trained using language modeling on a large corpus.
Source: Wikipedia
ULMFiT
Origin: Developed by fast.ai.
Description: An architecture and transfer learning method for NLP tasks involving a 3-layer AWD-LSTM architecture.
Features: Uses discriminative fine-tuning and Slanted Triangular Learning Rates (STLR).
Source: Papers With Code
As we close this chapter on our exploration of Large Language Models (LLMs) up to 2024, we hope you’ve found the above information to be a clear window into the complex world of AI language technology. From the intricate workings of GPT-3 to the nuanced capabilities of BERT and beyond, each LLM we’ve discussed holds the potential to revolutionize how we interact with machines and data.
Our journey through the landscape of LLMs has been one of discovery and understanding, made possible by the invaluable insights from Dr. Alan D Thompson. His expertise has been a guiding light in demystifying the technicalities and presenting them in a way that’s accessible to all.
As we look to the future, we anticipate even more innovative breakthroughs in AI that will continue to shape our digital lives.
The documentation for each LLM, arriving in the coming week, will serve as a comprehensive resource to help you navigate these advancements with ease.
Thank you for joining us on this enlightening journey. We invite you to keep the conversation going, share your thoughts, and stay curious. The world of AI is vast and ever-growing, and together, we’ll keep learning and growing with it.
Until next time, embrace the possibilities that LLMs bring to our world, and may your path through the digital age be filled with knowledge and wonder.
Let’s Continue Learning! — Vivek!!