In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the free-form semantics or relationships underlying these entities. Since the development of the Semantic Web, knowledge graphs have often been associated with linked open data projects, focusing on the connections between concepts and entities. They are also historically associated with and used by search engines such as Google, Bing, and Yahoo; knowledge engines and question-answering services such as WolframAlpha, Apple's Siri, and Amazon Alexa; and social networks such as LinkedIn and Facebook. Recent developments in data science and machine learning, particularly in graph neural networks, representation learning, and machine learning, have broadened the scope of knowledge graphs beyond their traditional use in search engines and recommender systems. They are increasingly used in scientific research, with notable applications in fields such as genomics, proteomics, and systems biology. == History == The term was coined as early as 1972 by the Austrian linguist Edgar W. Schneider, in a discussion of how to build modular instructional systems for courses. In the late 1980s, the University of Groningen and University of Twente jointly began a project called Knowledge Graphs, focusing on the design of semantic networks with edges restricted to a limited set of relations, to facilitate algebras on the graph. In subsequent decades, the distinction between semantic networks and knowledge graphs was blurred. Some early knowledge graphs were topic-specific. In 1985, Wordnet was founded, capturing semantic relationships between words and meanings – an application of this idea to language itself. In 2005, Marc Wirk founded Geonames to capture relationships between different geographic names and locales and associated entities. In 1998, Andrew Edmonds of Science in Finance Ltd in the UK created a system called ThinkBase that offered fuzzy-logic based reasoning in a graphical context. In 2007, both DBpedia and Freebase were founded as graph-based knowledge repositories for general-purpose knowledge. DBpedia focused exclusively on data extracted from Wikipedia, while Freebase also included a range of public datasets. Neither described themselves as a 'knowledge graph' but developed and described related concepts. In 2012, Google introduced their Knowledge Graph, building on DBpedia and Freebase among other sources. They later incorporated RDFa, Microdata, JSON-LD content extracted from indexed web pages, including the CIA World Factbook, Wikidata, and Wikipedia. Entity and relationship types associated with this knowledge graph have been further organized using terms from the schema.org vocabulary. The Google Knowledge Graph became a complement to string-based search within Google, and its popularity online brought the term into more common use. Since then, several large multinationals have advertised their use of knowledge graphs, further popularising the term. These include Facebook, LinkedIn, Airbnb, Microsoft, Amazon, Uber and eBay. In 2019, IEEE combined its annual international conferences on "Big Knowledge" and "Data Mining and Intelligent Computing" into the International Conference on Knowledge Graph. The development of large language models expanded interest in knowledge graphs as a way to structure information from unstructured text, with advances in language processing enabling their automatic or semi-automatic generation and expansion. The term knowledge graph has since broadened to include the dynamically constructed and adaptive graph structures, which support retrieval, reasoning, and summarization in generative systems. Microsoft Research's GraphRAG (2024) exemplified this development by integrating LLM-generated graphs into retrieval-augmented generation. == Definitions == There is no single commonly accepted definition of a knowledge graph. Most definitions view the topic through a Semantic Web lens and include these features: Flexible relations among knowledge in topical domains: A knowledge graph (i) defines abstract classes and relations of entities in a schema, (ii) mainly describes real world entities and their interrelations, organized in a graph, (iii) allows for potentially interrelating arbitrary entities with each other, and (iv) covers various topical domains. General structure: A network of entities, their semantic types, properties, and relationships. To represent properties, categorical or numerical values are often used. Supporting reasoning over inferred ontologies: A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge. There are, however, many knowledge graph representations for which some of these features are not relevant. For those knowledge graphs, this simpler definition may be more useful: A digital structure that represents knowledge as concepts and the relationships between them (facts). A knowledge graph can include an ontology that allows both humans and machines to understand and reason about its contents. === Implementations === In addition to the above examples, the term has been used to describe open knowledge projects such as YAGO and Wikidata; federations like the Linked Open Data cloud; a range of commercial search tools, including Yahoo's semantic search assistant Spark, Google's Knowledge Graph, and Microsoft's Satori; and the LinkedIn and Facebook entity graphs. The term is also used in the context of note-taking software applications that allow a user to build a personal knowledge graph. The popularization of knowledge graphs and their accompanying methods have led to the development of graph databases such as Neo4j, GraphDB and AgensGraph. These graph databases allow users to easily store data as entities and their interrelationships, and facilitate operations such as data reasoning, node embedding, and ontology development on knowledge bases. In contrast, virtual knowledge graphs do not store information in specialized databases. They rely on an underlying relational database or data lake to answer queries on the graph. Such a virtual knowledge graph system must be properly configured in order to answer the queries correctly. This specific configuration is done through a set of mappings that define the relationship between the elements of the data source and the structure and ontology of the virtual knowledge graph. == Using a knowledge graph for reasoning over data == A knowledge graph formally represents semantics by describing entities and their relationships. Knowledge graphs may make use of ontologies as a schema layer. By doing this, they allow logical inference for retrieving implicit knowledge rather than only allowing queries requesting explicit knowledge. In order to allow the use of knowledge graphs in various machine learning tasks, several methods for deriving latent feature representations of entities and relations have been devised. These knowledge graph embeddings allow them to be connected to machine learning methods that require feature vectors like word embeddings. This can complement other estimates of conceptual similarity. Models for generating useful knowledge graph embeddings are commonly the domain of graph neural networks (GNNs). GNNs are deep learning architectures that comprise edges and nodes, which correspond well to the entities and relationships of knowledge graphs. The topology and data structures afforded by GNNs provide a convenient domain for semi-supervised learning, wherein the network is trained to predict the value of a node embedding (provided a group of adjacent nodes and their edges) or edge (provided a pair of nodes). These tasks serve as fundamental abstractions for more complex tasks such as knowledge graph reasoning and alignment. === Entity alignment === As new knowledge graphs are produced across a variety of fields and contexts, the same entity will inevitably be represented in multiple graphs. However, because no single standard for the construction or representation of knowledge graph exists, resolving which entities from disparate graphs correspond to the same real world subject is a non-trivial task. This task is known as knowledge graph entity alignment, and is an active area of research. Strategies for entity alignment generally seek to identify similar substructures, semantic relationships, shared attributes, or combinations of all three between two distinct knowledge graphs. Entity alignment methods use these structural similarities between generally non-isomorphic graphs to predict which nodes correspond to the same entity. In 2023, researchers found success in using large language models (LLMs) in the task of entity alignment. This was in particul
Anaconda (Python distribution)
Anaconda is an open source data science and artificial intelligence distribution platform for the Python programming language. Developed by Anaconda, Inc., an American company founded in 2012, the platform is used to develop and manage data science and AI projects. In 2024, Anaconda Inc. has about 300 employees and 45 million users. == History == Co-founded in Austin, Texas in 2012 as Continuum Analytics by Peter Wang and Travis Oliphant, Anaconda Inc. operates from the United States and Europe. Anaconda Inc. developed Conda, a cross-platform, language-agnostic binary package manager. It also launched PyData community workshops and the Jupyter Cloud Notebook service (Wakari.io). In 2013, it received funding from DARPA. In 2015, the company had two million users including 200 of the Fortune 500 companies and raised $24 million in a Series A funding round led by General Catalyst and BuildGroup. Anaconda secured an additional $30 million in funding in 2021. Continuum Analytics rebranded as Anaconda in 2017. That year, it announced the release of Anaconda Enterprise 5, an integration with Microsoft Azure, and had over 13 million users by year's end. In 2022, it released Anaconda Business; new integrations with Snowflake and others; and the open-source PyScript. It also acquired PythonAnywhere, while Anaconda's user base exceeded 30 million in 2022. In 2023, Anaconda released Python in Excel, a new integration with Microsoft Excel, and launched PyScript.com. The company made a series of investments in AI during 2024. That February, Anaconda partnered with IBM to import its repository of Python packages into Watsonx, IBM's generative AI platform. The same year, Anaconda joined IBM's AI Alliance and released an integration with Teradata and Lenovo. In 2024, Anaconda's user base reached 45 million users and Barry Libert was named company CEO, after serving on Anaconda's board of directors. He was succeeded as CEO in October 2025 by David DeSanto, who also became a company director. In May 2025, the company introduced the first unified AI platform for Open Source, Anaconda AI Platform, a central control for AI workflows that enables customization in Python-based enterprise AI development. That July, after reaching over $150 million in a Series C funding round, Anaconda was evaluated at about $1.5 billion. == Overview == Anaconda distribution comes with over 300 packages automatically installed, and over 7,500 additional open-source packages can be installed from the Anaconda repository as well as the Conda package and virtual environment manager. It also includes a GUI, Anaconda Navigator, as a graphical alternative to the command-line interface (CLI). Conda was developed to address dependency conflicts native to the pip package manager, which would automatically install any dependent Python packages without checking for conflicts with previously installed packages (until its version 20.3, which later implemented consistent dependency resolution). The Conda package manager's historical differentiation analyzed and resolved these installation conflicts. Anaconda is a distribution of the Python programming language (and previously also R) for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.), that aims to simplify package management and deployment. Anaconda distribution includes data-science packages suitable for Windows, Linux, and macOS. Other company products include Anaconda Free, and subscription-based Starter, Business and Enterprise. Anaconda's business tier offers Package Security Manager. Package versions in Anaconda are managed by the package management system Conda, which was spun out as a separate open-source package as useful both independently and for applications other than Python. There is also a small, bootstrap version of Anaconda called Miniconda, which includes only Conda, Python, the packages they depend on, and a small number of other packages. Open source packages can be individually installed from the Anaconda repository, Anaconda Cloud (anaconda.org), or the user's own private repository or mirror, using the conda install command. Anaconda, Inc. compiles and builds the packages available in the Anaconda repository itself, and provides binaries for Windows 32/64 bit, Linux 64 bit and MacOS 64-bit (Intel, Apple Silicon). Anything available on PyPI may be installed into a Conda environment using pip, and Conda will keep track of what it has installed and what pip has installed. Custom packages can be made using the conda build command, and can be shared with others by uploading them to Anaconda Cloud, PyPI or other repositories. The default installation of Anaconda2 includes Python 2.7 and Anaconda3 includes Python 3.7. However, it is possible to create new environments that include any version of Python packaged with Conda. === Anaconda Navigator === Anaconda Navigator is a desktop graphical user interface (GUI) included in Anaconda distribution that allows users to launch applications and manage Conda packages, environments and channels without using command-line commands. Navigator can search for packages on Anaconda Cloud or in a local Anaconda Repository, install them in an environment, run the packages and update them. It is available for Windows, macOS and Linux. The following applications are available by default in Navigator: JupyterLab Jupyter Notebook QtConsole Spyder Glue Orange RStudio Visual Studio Code === Conda === Conda is an open source, cross-platform, language-agnostic package manager and environment management system that installs, runs, and updates packages and their dependencies. It was created for Python programs, but it can package and distribute software for any language, including multi-language projects. The Conda package and environment manager is included in all versions of Anaconda, Miniconda, and Anaconda Repository. == Anaconda.org == Anaconda Cloud is a package management service by Anaconda where users can find, access, store and share public and private notebooks, environments, and Conda and PyPI packages. Cloud hosts useful Python packages, notebooks and environments for a wide variety of applications. Users do not need to log in or to have a Cloud account, to search for public packages, download and install them. Users can build new Conda packages using Conda-build and then use the Anaconda Client CLI to upload packages to Anaconda.org. Notebooks users can be aided with writing and debugging code with Anaconda's AI Assistant.
AI agent
In the context of generative artificial intelligence, AI agents (also referred to as compound AI systems or agentic AI) are a class of intelligent agents that can pursue goals, use tools, and take actions with varying degrees of autonomy. In practice, they usually operate within human-defined objectives, constraints, and available tools. == Overview == AI agents possess several key attributes, including goal-directed behavior, natural language interfaces, the capacity to use external tools, and the ability to perform multi-step tasks. Their control flow is frequently driven by large language models (LLMs). Agent systems may also include memory components, planning logic, tool interfaces, and orchestration software for coordinating agent components. AI agents do not have a standard definition. NIST describes agentic AI as an emerging area requiring standards for secure operation, interoperability, and reliable interaction with external systems. A common application of AI agents is task automation: for example, booking travel plans based on a user's prompted request. Companies such as Google, Microsoft and Amazon Web Services have offered platforms for deploying pre-built AI agents. Several protocols have been proposed for standardizing inter-agent communication, with examples including the Model Context Protocol, Gibberlink, and many others. Some of these protocols are also used for connecting agents to external applications. In December 2025, Linux Foundation announced the formation of the Agentic AI Foundation (AAIF), with the goal of ensuring agentic AI evolves transparently and collaboratively. == History == AI agents have been traced back to research from the 1990s, with Harvard professor Milind Tambe noting that the definition of an AI agent was not clear at the time. Researcher Andrew Ng has been credited with spreading the term "agentic" to a wider audience in 2024. == Training and testing == Researchers have attempted to build world models and reinforcement learning environments to train or evaluate AI agents. For example, video games such as Minecraft and No Man's Sky as well as replicas of company websites, have also been used for training such agents. == Autonomous capabilities == The Financial Times compared the autonomy of AI agents to the SAE classification of self-driving cars, likening most applications to level 2 or level 3, with some achieving level 4 in highly specialized circumstances, and level 5 being theoretical. == Cognitive architecture == The following are some internal design options for reasoning within an agent: Retrieval-augmented generation ReAct (Reason + Act) pattern is an iterative process in which an AI agent alternates between reasoning and taking actions, receives observations from the environment or external tools, and integrates these observations into subsequent reasoning steps. Reflexion, which uses an LLM to create feedback on the agent's plan of action and stores that feedback in a memory cache. A tool/agent registry, for organizing software functions or other agents that the agent can use. One-shot model querying, which queries the model once to create the plan of action. === Reference architecture === Ken Huang proposed an AI agent reference architecture, which consists of seven interconnected layers, with each layer building on the functionality of the layers beneath it: Layer 1: Foundation models - provide the core AI engines to power agent capabilities. Layer 2: Data operations - manage the complex data infrastructure required for AI agent operations, including Vector database, data loaders, RAG. Layer 3: Agent frameworks - sophisticated software and tools that simplify the development and management of the AI agents. Layer 4: Deployment and infrastructure - provide the robust technical foundation for running AI agents. Layer 5: Evaluation and observability - focus on assessing the safety and performance of AI agents. Layer 6: Security and compliance - a crucial protective framework ensuring AI agents operate safely, securely, and conform to regulatory boundaries. At this layer security and compliance features embedded into all the AI agent stack layers are integrated together. Layer 7: Agent ecosystem - represents the AI agents' interface with real-world applications and users. == Orchestration patterns == To execute complex tasks, autonomous agents are often integrated with other agents or specialized tools. These configurations, known as orchestration patterns or workflows, include the following: Prompt chaining: A sequence where the output of one step serves as the input for the next. Routing: The classification of an input to direct it to a specialized downstream task or tool. Parallelization: The simultaneous execution of multiple tasks. Sequential processing: A fixed, linear progression of tasks through a predefined pipeline. Planner-critic: An iterative pattern where one agent generates a proposal and another evaluates it to provide feedback for refinement. == Multimodal AI agents == In addition to large language models (LLMs), vision-language models (VLMs) and multimodal foundation models can be used as the basis for agents. In September 2024, Allen Institute for AI released an open-source vision-language model. Nvidia released a framework for developers to use VLMs, LLMs and retrieval-augmented generation for building AI agents that can analyze images and videos, including video search and video summarization. Microsoft released a multimodal agent model – trained on images, video, software user interface interactions, and robotics data – that the company claimed can manipulate software and robots. == Applications == As of April 2025, per the Associated Press, there are few real-world applications of AI agents. As of June 2025, per Fortune, many companies are primarily experimenting with AI agents. The Information divided AI agents into seven archetypes: business-task agents, for acting within enterprise software; conversational agents, which act as chatbots for customer support; research agents, for querying and analyzing information (such as OpenAI Deep Research); analytics agents, for analyzing data to create reports; software developer or coding agents (such as Cursor); domain-specific agents, which include specific subject matter knowledge; and web browser agents (such as OpenAI Operator). By mid-2025, AI agents have been used in video game development, gambling (including sports betting), cryptocurrency wallets (including cryptocurrency trading and meme coins) and social media. In August 2025, New York Magazine described software development as the most definitive use case of AI agents. Likewise, by October 2025, noting a decline in expectations, The Information noted AI coding agents and customer support as the primary use cases by businesses. In November 2025, The Wall Street Journal reported that few companies that deployed AI agents have received a return on investment. === Applications in government === Several government bodies in the United States and United Kingdom have deployed or announced the deployment of agents, at the local and national level. The city of Kyle, Texas deployed an AI agent from Salesforce in March 2025 for 311 customer service. In November 2025, the Internal Revenue Service stated that it would use Agentforce, AI agents from Salesforce, for the Office of Chief Counsel, Taxpayer Advocate Services and the Office of Appeals. That same month, Staffordshire Police announced that they would trial Agentforce agents for handling non-emergency 101 calls in the United Kingdom starting in 2026. In December 2025, the Department of Neighborhoods in Detroit, Michigan, in partnership with a local business, deployed a pilot project in two Detroit districts for an AI agent to be used for customer service calls. In February 2025, Thomas Shedd, the director of the Technology Transformation Services, proposed using AI coding agents across the United States federal government. A recruiter for the Department of Government Efficiency proposed in April 2025 to use AI agents to automate the work of about 70,000 United States federal government employees, as part of a startup with funding from OpenAI and a partnership agreement with Palantir. This proposal was criticized by experts for its impracticality, if not impossibility, and the lack of corresponding widespread adoption by businesses. In December 2025, the Food and Drug Administration announced that it would offer "agentic AI capabilities" to its staff for "meeting management, pre-market reviews, review validation, post-market surveillance, inspections and compliance and administrative functions." That same month, the United States Department of Defense launched GenAI.mil, an internal platform for American military personnel to use generative AI-based applications based on Google Gemini, including "intelligent agentic workflows". Defense Secretary Pete Hegseth listed applications such as "[conducting] deep r
Comparison of machine learning software
The following tables are a comparison of machine learning software such as software frameworks, libraries, and computer programs used for machine learning. == Machine learning software == == Other comparisons == == Machine learning helper libraries and platforms == Apache OpenNLP — natural language processing toolkit CUDA — GPU computing platform used to accelerate machine learning and deep learning workloads Horovod — distributed training framework for deep learning Hugging Face Transformers — library of pretrained transformer models built on other machine learning frameworks Kubeflow — machine learning platform for Kubernetes Mallet — toolkit for natural language processing and text analysis NumPy — numerical computing library used in machine learning OpenCV — computer vision library with machine learning functions ONNX — open format for representing machine learning models pandas — data analysis and data preparation library used in machine learning PlaidML — tensor compiler and backend for machine learning frameworks Polars — Dataframe library used for machine learning data preparation and analysis PyArrow — columnar data library used in machine learning data processing ROOT (TMVA) — data analysis framework with machine learning tools SciPy — scientific computing and optimization library used in machine learning == Online development environments for machine learning == Google Colab — hosted Jupyter Notebook environment commonly used for machine learning and deep learning JupyterLab — notebook-based development environment for machine learning and data science Jupyter Notebook — interactive notebook environment used for machine learning and data science Kaggle — online data science and machine learning platform
Proximal gradient methods for learning
Proximal gradient (forward backward splitting) methods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable. One such example is ℓ 1 {\displaystyle \ell _{1}} regularization (also known as Lasso) of the form min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + λ ‖ w ‖ 1 , where x i ∈ R d and y i ∈ R . {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{1},\quad {\text{ where }}x_{i}\in \mathbb {R} ^{d}{\text{ and }}y_{i}\in \mathbb {R} .} Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory with penalties that are tailored to a specific problem application. Such customized penalties can help to induce certain structure in problem solutions, such as sparsity (in the case of lasso) or group structure (in the case of group lasso). == Relevant background == Proximal gradient methods are applicable in a wide variety of scenarios for solving convex optimization problems of the form min x ∈ H F ( x ) + R ( x ) , {\displaystyle \min _{x\in {\mathcal {H}}}F(x)+R(x),} where F {\displaystyle F} is convex and differentiable with Lipschitz continuous gradient, R {\displaystyle R} is a convex, lower semicontinuous function which is possibly nondifferentiable, and H {\displaystyle {\mathcal {H}}} is some set, typically a Hilbert space. The usual criterion of x {\displaystyle x} minimizes F ( x ) + R ( x ) {\displaystyle F(x)+R(x)} if and only if ∇ ( F + R ) ( x ) = 0 {\displaystyle \nabla (F+R)(x)=0} in the convex, differentiable setting is now replaced by 0 ∈ ∂ ( F + R ) ( x ) , {\displaystyle 0\in \partial (F+R)(x),} where ∂ φ {\displaystyle \partial \varphi } denotes the subdifferential of a real-valued, convex function φ {\displaystyle \varphi } . Given a convex function φ : H → R {\displaystyle \varphi :{\mathcal {H}}\to \mathbb {R} } an important operator to consider is its proximal operator prox φ : H → H {\displaystyle \operatorname {prox} _{\varphi }:{\mathcal {H}}\to {\mathcal {H}}} defined by prox φ ( u ) = arg min x ∈ H φ ( x ) + 1 2 ‖ u − x ‖ 2 2 , {\displaystyle \operatorname {prox} _{\varphi }(u)=\operatorname {arg} \min _{x\in {\mathcal {H}}}\varphi (x)+{\frac {1}{2}}\|u-x\|_{2}^{2},} which is well-defined because of the strict convexity of the ℓ 2 {\displaystyle \ell _{2}} norm. The proximal operator can be seen as a generalization of a projection. We see that the proximity operator is important because x ∗ {\displaystyle x^{}} is a minimizer to the problem min x ∈ H F ( x ) + R ( x ) {\displaystyle \min _{x\in {\mathcal {H}}}F(x)+R(x)} if and only if x ∗ = prox γ R ( x ∗ − γ ∇ F ( x ∗ ) ) , {\displaystyle x^{}=\operatorname {prox} _{\gamma R}\left(x^{}-\gamma \nabla F(x^{})\right),} where γ > 0 {\displaystyle \gamma >0} is any positive real number. === Moreau decomposition === One important technique related to proximal gradient methods is the Moreau decomposition, which decomposes the identity operator as the sum of two proximity operators. Namely, let φ : X → R {\displaystyle \varphi :{\mathcal {X}}\to \mathbb {R} } be a lower semicontinuous, convex function on a vector space X {\displaystyle {\mathcal {X}}} . We define its Fenchel conjugate φ ∗ : X → R {\displaystyle \varphi ^{}:{\mathcal {X}}\to \mathbb {R} } to be the function φ ∗ ( u ) := sup x ∈ X ⟨ x , u ⟩ − φ ( x ) . {\displaystyle \varphi ^{}(u):=\sup _{x\in {\mathcal {X}}}\langle x,u\rangle -\varphi (x).} The general form of Moreau's decomposition states that for any x ∈ X {\displaystyle x\in {\mathcal {X}}} and any γ > 0 {\displaystyle \gamma >0} that x = prox γ φ ( x ) + γ prox φ ∗ / γ ( x / γ ) , {\displaystyle x=\operatorname {prox} _{\gamma \varphi }(x)+\gamma \operatorname {prox} _{\varphi ^{}/\gamma }(x/\gamma ),} which for γ = 1 {\displaystyle \gamma =1} implies that x = prox φ ( x ) + prox φ ∗ ( x ) {\displaystyle x=\operatorname {prox} _{\varphi }(x)+\operatorname {prox} _{\varphi ^{}}(x)} . The Moreau decomposition can be seen to be a generalization of the usual orthogonal decomposition of a vector space, analogous with the fact that proximity operators are generalizations of projections. In certain situations it may be easier to compute the proximity operator for the conjugate φ ∗ {\displaystyle \varphi ^{}} instead of the function φ {\displaystyle \varphi } , and therefore the Moreau decomposition can be applied. This is the case for group lasso. == Lasso regularization == Consider the regularized empirical risk minimization problem with square loss and with the ℓ 1 {\displaystyle \ell _{1}} norm as the regularization penalty: min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + λ ‖ w ‖ 1 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{1},} where x i ∈ R d and y i ∈ R . {\displaystyle x_{i}\in \mathbb {R} ^{d}{\text{ and }}y_{i}\in \mathbb {R} .} The ℓ 1 {\displaystyle \ell _{1}} regularization problem is sometimes referred to as lasso (least absolute shrinkage and selection operator). Such ℓ 1 {\displaystyle \ell _{1}} regularization problems are interesting because they induce sparse solutions, that is, solutions w {\displaystyle w} to the minimization problem have relatively few nonzero components. Lasso can be seen to be a convex relaxation of the non-convex problem min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + λ ‖ w ‖ 0 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{0},} where ‖ w ‖ 0 {\displaystyle \|w\|_{0}} denotes the ℓ 0 {\displaystyle \ell _{0}} "norm", which is the number of nonzero entries of the vector w {\displaystyle w} . Sparse solutions are of particular interest in learning theory for interpretability of results: a sparse solution can identify a small number of important factors. === Solving for L1 proximity operator === For simplicity we restrict our attention to the problem where λ = 1 {\displaystyle \lambda =1} . To solve the problem min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + ‖ w ‖ 1 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\|w\|_{1},} we consider our objective function in two parts: a convex, differentiable term F ( w ) = 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 {\displaystyle F(w)={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}} and a convex function R ( w ) = ‖ w ‖ 1 {\displaystyle R(w)=\|w\|_{1}} . Note that R {\displaystyle R} is not strictly convex. Let us compute the proximity operator for R ( w ) {\displaystyle R(w)} . First we find an alternative characterization of the proximity operator prox R ( x ) {\displaystyle \operatorname {prox} _{R}(x)} as follows: u = prox R ( x ) ⟺ 0 ∈ ∂ ( R ( u ) + 1 2 ‖ u − x ‖ 2 2 ) ⟺ 0 ∈ ∂ R ( u ) + u − x ⟺ x − u ∈ ∂ R ( u ) . {\displaystyle {\begin{aligned}u=\operatorname {prox} _{R}(x)\iff &0\in \partial \left(R(u)+{\frac {1}{2}}\|u-x\|_{2}^{2}\right)\\\iff &0\in \partial R(u)+u-x\\\iff &x-u\in \partial R(u).\end{aligned}}} For R ( w ) = ‖ w ‖ 1 {\displaystyle R(w)=\|w\|_{1}} it is easy to compute ∂ R ( w ) {\displaystyle \partial R(w)} : the i {\displaystyle i} th entry of ∂ R ( w ) {\displaystyle \partial R(w)} is precisely ∂ | w i | = { 1 , w i > 0 − 1 , w i < 0 [ − 1 , 1 ] , w i = 0. {\displaystyle \partial |w_{i}|={\begin{cases}1,&w_{i}>0\\-1,&w_{i}<0\\\left[-1,1\right],&w_{i}=0.\end{cases}}} Using the recharacterization of the proximity operator given above, for the choice of R ( w ) = ‖ w ‖ 1 {\displaystyle R(w)=\|w\|_{1}} and γ > 0 {\displaystyle \gamma >0} we have that prox γ R ( x ) {\displaystyle \operatorname {prox} _{\gamma R}(x)} is defined entrywise by ( prox γ R ( x ) ) i = { x i − γ , x i > γ 0 , | x i | ≤ γ x i + γ , x i < − γ , {\displaystyle \left(\operatorname {prox} _{\gamma R}(x)\right)_{i}={\begin{cases}x_{i}-\gamma ,&x_{i}>\gamma \\0,&|x_{i}|\leq \gamma \\x_{i}+\gamma ,&x_{i}<-\gamma ,\end{cases}}} which is known as the soft thresholding operator S γ ( x ) = prox γ ‖ ⋅ ‖ 1 ( x ) {\displaystyle S_{\gamma }(x)=\operatorname {prox} _{\gamma \|\cdot \|_{1}}(x)} . === Fixed point iterative schemes === To finally solve the lasso problem we consider the fixed point equation shown earlier: x ∗ = prox γ R ( x ∗ − γ ∇ F ( x ∗ ) ) . {\displaystyle x^{}=\operatorname {prox} _{\gamma R}\left(x^{}-\gamma \nabla F(x^{})\right).} Given that we have computed the form of the proximity operator explicitly, then we can define a standard fixed point iteration procedure. Namely, fix some initial w 0 ∈ R d {\displaystyle w^{0}\in \mathbb {R} ^{d}} , and for k = 1 , 2 , … {\displaystyle k=1,2,\ldots } define w k + 1 = S γ ( w k − γ ∇ F ( w k ) ) . {\displaystyle w^{k+1}=S_{\gamma }\left(w^{k}-\gamma \nabla F\l
Empirical dynamic modeling
Empirical dynamic modeling (EDM) is a framework for analysis and prediction of nonlinear dynamical systems. Applications include population dynamics, ecosystem service, medicine, neuroscience, dynamical systems, geophysics, and human-computer interaction. EDM was originally developed by Robert May and George Sugihara. It can be considered a methodology for data modeling, predictive analytics, dynamical system analysis, machine learning and time series analysis. == Description == Mathematical models have tremendous power to describe observations of real-world systems. They are routinely used to test hypothesis, explain mechanisms and predict future outcomes. However, real-world systems are often nonlinear and multidimensional, in some instances rendering explicit equation-based modeling problematic. Empirical models, which infer patterns and associations from the data instead of using hypothesized equations, represent a natural and flexible framework for modeling complex dynamics. Donald DeAngelis and Simeon Yurek illustrated that canonical statistical models are ill-posed when applied to nonlinear dynamical systems. A hallmark of nonlinear dynamics is state-dependence: system states are related to previous states governing transition from one state to another. EDM operates in this space, the multidimensional state-space of system dynamics rather than on one-dimensional observational time series. EDM does not presume relationships among states, for example, a functional dependence, but projects future states from localised, neighboring states. EDM is thus a state-space, nearest-neighbors paradigm where system dynamics are inferred from states derived from observational time series. This provides a model-free representation of the system naturally encompassing nonlinear dynamics. A cornerstone of EDM is recognition that time series observed from a dynamical system can be transformed into higher-dimensional state-spaces by time-delay embedding with Takens's theorem. The state-space models are evaluated based on in-sample fidelity to observations, conventionally with Pearson correlation between predictions and observations. == Methods == Primary EDM algorithms include Simplex projection, Sequential locally weighted global linear maps (S-Map) projection, Multivariate embedding in Simplex or S-Map, Convergent cross mapping (CCM), and Multiview Embeding, described below. Nearest neighbors are found according to: NN ( y , X , k ) = ‖ X N i E − y ‖ ≤ ‖ X N j E − y ‖ if 1 ≤ i ≤ j ≤ k {\displaystyle {\text{NN}}(y,X,k)=\|X_{N_{i}}^{E}-y\|\leq \|X_{N_{j}}^{E}-y\|{\text{ if }}1\leq i\leq j\leq k} === Simplex === Simplex projection is a nearest neighbor projection. It locates the k {\displaystyle k} nearest neighbors to the location in the state-space from which a prediction is desired. To minimize the number of free parameters k {\displaystyle k} is typically set to E + 1 {\displaystyle E+1} defining an E + 1 {\displaystyle E+1} dimensional simplex in the state-space. The prediction is computed as the average of the weighted phase-space simplex projected T p {\displaystyle Tp} points ahead. Each neighbor is weighted proportional to their distance to the projection origin vector in the state-space. Find k {\displaystyle k} nearest neighbor: N k ← NN ( y , X , k ) {\displaystyle N_{k}\gets {\text{NN}}(y,X,k)} Define the distance scale: d ← ‖ X N 1 E − y ‖ {\displaystyle d\gets \|X_{N_{1}}^{E}-y\|} Compute weights: For{ i = 1 , … , k {\displaystyle i=1,\dots ,k} } : w i ← exp ( − ‖ X N i E − y ‖ / d ) {\displaystyle w_{i}\gets \exp(-\|X_{N_{i}}^{E}-y\|/d)} Average of state-space simplex: y ^ ← ∑ i = 1 k ( w i X N i + T p ) / ∑ i = 1 k w i {\displaystyle {\hat {y}}\gets \sum _{i=1}^{k}\left(w_{i}X_{N_{i}+T_{p}}\right)/\sum _{i=1}^{k}w_{i}} === S-Map === S-Map extends the state-space prediction in Simplex from an average of the E + 1 {\displaystyle E+1} nearest neighbors to a linear regression fit to all neighbors, but localised with an exponential decay kernel. The exponential localisation function is F ( θ ) = exp ( − θ d / D ) {\displaystyle F(\theta )={\text{exp}}(-\theta d/D)} , where d {\displaystyle d} is the neighbor distance and D {\displaystyle D} the mean distance. In this way, depending on the value of θ {\displaystyle \theta } , neighbors close to the prediction origin point have a higher weight than those further from it, such that a local linear approximation to the nonlinear system is reasonable. This localisation ability allows one to identify an optimal local scale, in-effect quantifying the degree of state dependence, and hence nonlinearity of the system. Another feature of S-Map is that for a properly fit model, the regression coefficients between variables have been shown to approximate the gradient (directional derivative) of variables along the manifold. These Jacobians represent the time-varying interaction strengths between system variables. Find k {\displaystyle k} nearest neighbor: N ← NN ( y , X , k ) {\displaystyle N\gets {\text{NN}}(y,X,k)} Sum of distances: D ← 1 k ∑ i = 1 k ‖ X N i E − y ‖ {\displaystyle D\gets {\frac {1}{k}}\sum _{i=1}^{k}\|X_{N_{i}}^{E}-y\|} Compute weights: For{ i = 1 , … , k {\displaystyle i=1,\dots ,k} } : w i ← exp ( − θ ‖ X N i E − y ‖ / D ) {\displaystyle w_{i}\gets \exp(-\theta \|X_{N_{i}}^{E}-y\|/D)} Reweighting matrix: W ← diag ( w i ) {\displaystyle W\gets {\text{diag}}(w_{i})} Design matrix: A ← [ 1 X N 1 X N 1 − 1 … X N 1 − E + 1 1 X N 2 X N 2 − 1 … X N 2 − E + 1 ⋮ ⋮ ⋮ ⋱ ⋮ 1 X N k X N k − 1 … X N k − E + 1 ] {\displaystyle A\gets {\begin{bmatrix}1&X_{N_{1}}&X_{N_{1}-1}&\dots &X_{N_{1}-E+1}\\1&X_{N_{2}}&X_{N_{2}-1}&\dots &X_{N_{2}-E+1}\\\vdots &\vdots &\vdots &\ddots &\vdots \\1&X_{N_{k}}&X_{N_{k}-1}&\dots &X_{N_{k}-E+1}\end{bmatrix}}} Weighted design matrix: A ← W A {\displaystyle A\gets WA} Response vector at T p {\displaystyle Tp} : b ← [ X N 1 + T p X N 2 + T p ⋮ X N k + T p ] {\displaystyle b\gets {\begin{bmatrix}X_{N_{1}+T_{p}}\\X_{N_{2}+T_{p}}\\\vdots \\X_{N_{k}+T_{p}}\end{bmatrix}}} Weighted response vector: b ← W b {\displaystyle b\gets Wb} Least squares solution (SVD): c ^ ← argmin c ‖ A c − b ‖ 2 2 {\displaystyle {\hat {c}}\gets {\text{argmin}}_{c}\|Ac-b\|_{2}^{2}} Local linear model c ^ {\displaystyle {\hat {c}}} is prediction: y ^ ← c ^ 0 + ∑ i = 1 E c ^ i y i {\displaystyle {\hat {y}}\gets {\hat {c}}_{0}+\sum _{i=1}^{E}{\hat {c}}_{i}y_{i}} === Multivariate Embedding === Multivariate Embedding recognizes that time-delay embeddings are not the only valid state-space construction. In Simplex and S-Map one can generate a state-space from observational vectors, or time-delay embeddings of a single observational time series, or both. === Convergent Cross Mapping === Convergent cross mapping (CCM) leverages a corollary to the Generalized Takens Theorem that it should be possible to cross predict or cross map between variables observed from the same system. Suppose that in some dynamical system involving variables X {\displaystyle X} and Y {\displaystyle Y} , X {\displaystyle X} causes Y {\displaystyle Y} . Since X {\displaystyle X} and Y {\displaystyle Y} belong to the same dynamical system, their reconstructions (via embeddings) M x {\displaystyle M_{x}} , and M y {\displaystyle M_{y}} , also map to the same system. The causal variable X {\displaystyle X} leaves a signature on the affected variable Y {\displaystyle Y} , and consequently, the reconstructed states based on Y {\displaystyle Y} can be used to cross predict values of X {\displaystyle X} . CCM leverages this property to infer causality by predicting X {\displaystyle X} using the M y {\displaystyle M_{y}} library of points (or vice versa for the other direction of causality), while assessing improvements in cross map predictability as larger and larger random samplings of M y {\displaystyle M_{y}} are used. If the prediction skill of X {\displaystyle X} increases and saturates as the entire M y {\displaystyle M_{y}} is used, this provides evidence that X {\displaystyle X} is casually influencing Y {\displaystyle Y} . === Multiview Embedding === Multiview Embedding is a Dimensionality reduction technique where a large number of state-space time series vectors are combitorially assessed towards maximal model predictability. == Extensions == Extensions to EDM techniques include: Generalized Theorems for Nonlinear State Space Reconstruction Extended Convergent Cross Mapping Dynamic stability S-Map regularization Visual analytics with EDM Convergent Cross Sorting Expert system with EDM hybrid Sliding windows based on the extended convergent cross-mapping Empirical Mode Modeling Accounting for missing data and variable step sizes Accounting for observation noise Hierarchical Bayesian EDM via Gaussian processes Intelligent and Adaptive Control Optimal control via Empirical dynamic programming Multiview distance regularised S-map
Principle of rationality
The principle of rationality (or rationality principle) was coined by Karl R. Popper in his Harvard Lecture of 1963, and published in his book Myth of Framework. It is related to what he called the 'logic of the situation' in an Economica article of 1944/1945, published later in his book The Poverty of Historicism. According to Popper's rationality principle, agents act in the most adequate way according to the objective situation. It is an idealized conception of human behavior which he used to drive his model of situational analysis. Cognitive scientist Allen Newell elaborated on the principle in his account of knowledge level modeling. == Popper == Popper called for social science to be grounded in what he called situational analysis or situational logic. This requires building models of social situations which include individual actors and their relationship to social institutions, e.g. markets, legal codes, bureaucracies, etc. These models attribute certain aims and information to the actors. This forms the 'logic of the situation', the result of reconstructing meticulously all circumstances of an historical event. The 'principle of rationality' is the assumption that people are instrumental in trying to reach their goals, and this is what drives the model. Popper believed that this model could be continuously refined to approach the objective truth. Popper called his principle of rationality nearly empty (a technical term meaning without empirical content) and strictly speaking false, but nonetheless tremendously useful. These remarks earned him a lot of criticism because seemingly he had swerved from his famous Logic of Scientific Discovery. Among the many philosophers having discussed Popper's principle of rationality from the 1960s up to now are Noretta Koertge, R. Nadeau, Viktor J. Vanberg, Hans Albert, E. Matzner, Ian C. Jarvie, Mark A. Notturno, John Wettersten, Ian C. Böhm. == Newell == In the context of knowledge-based systems, Newell (in 1982) proposed the following principle of rationality: "If an agent has knowledge that one of its actions will lead to one of its goals, then the agent will select that action." This principle is employed by agents at the knowledge level to move closer to a desired goal. An important philosophical difference between Newell and Popper is that Newell argued that the knowledge level is real in the sense that it exists in nature and is not made up. This allowed Newell to treat the rationality principle as a way of understanding nature and avoid the problems Popper ran into by treating knowledge as non physical and therefore non empirical.