Artificial intelligence has made tremendous advances since its inception about seventy years ago. Self-driving cars, programs beating experts at complex games, and smart robots capable of assisting people who need care are just some of the successful examples of machine intelligence. This kind of progress might entice us to envision a near-future society populated by autonomous robots capable of performing the same tasks humans do. This prospect seems limited only by the power and complexity of current computational devices, which are improving fast. However, there are several significant obstacles on this path. General intelligence involves situational reasoning, taking perspectives, choosing goals, and an ability to deal with ambiguous information. We observe that all of these characteristics are connected to the ability to identify and exploit new affordances: opportunities (or impediments) on the path of an agent to achieve its goals. A general example of an affordance is the use of an object in the hands of an agent. We show that it is impossible to predefine a list of such uses. Therefore, they cannot be treated algorithmically. This means that “AI agents” and organisms differ in their ability to leverage new affordances. Only organisms can do this. This implies that true AGI is not achievable in the current algorithmic frame of AI research. It also has important consequences for the theory of evolution. We argue that organismic agency is strictly required for truly open-ended evolution through radical emergence. We discuss the diverse ramifications of this argument, not only in AI research and evolution, but also for the philosophy of science.
Addressing the need for novel insect observation and control tools, the Photonic Fence detects and tracks mosquitoes and other flying insects and can apply lethal doses of laser light to them. Previously, we determined lethal exposure levels for a variety of lasers and pulse conditions on anesthetized Anopheles stephensi mosquitoes. In this work, similar studies were performed while the subjects were freely flying within transparent cages two meters from the optical system; a proof-of-principle demonstration of a 30 m system was also performed. Dose–response curves of mortality were created as a function of beam diameter, pulse width, and power at visible and near-infrared wavelengths; the visible wavelengths required significantly lower laser exposure than the near-infrared wavelengths to disable subjects, though near-infrared sources remain attractive given their cost and retina safety. The flight behavior of the subjects and the performance of the tracking system were found to have no impact on the mortality outcomes for pulse durations up to 25 ms, which appears to be the ideal duration to minimize required laser power. The results of this study affirm the practicality of using optical approaches to protect people and crops from pestilent flying insects.
The modern project of creating human-like artificial intelligence (AI) started after World War II, when it was discovered that electronic computers are not just number-crunching machines, but can also manipulate symbols. It is possible to pursue this goal without assuming that machine intelligence is identical to human intelligence. This is known as weak AI. However, many AI researchers have pursued the aim of developing artificial intelligence that is in principle identical to human intelligence, called strong AI. Weak AI is less ambitious than strong AI, and therefore less controversial. However, there are important controversies related to weak AI as well. This paper focuses on the distinction between artificial general intelligence (AGI) and artificial narrow intelligence (ANI). Although AGI may be classified as weak AI, it is close to strong AI because one chief characteristic of human intelligence is its generality. Although AGI is less ambitious than strong AI, it has had critics almost from the very beginning. One of the leading critics was the philosopher Hubert Dreyfus, who argued that computers, which have no body, no childhood and no cultural practice, could not acquire intelligence at all. One of Dreyfus’ main arguments was that human knowledge is partly tacit, and therefore cannot be articulated and incorporated in a computer program. However, today one might argue that new approaches to artificial intelligence research have made his arguments obsolete. Deep learning and Big Data are among the latest approaches, and advocates argue that they will be able to realize AGI. A closer look reveals that although the development of artificial intelligence for specific purposes (ANI) has been impressive, we have not come much closer to developing artificial general intelligence (AGI). The article further argues that this is in principle impossible, and it revives Hubert Dreyfus’ argument that computers are not in the world.
Going without sleep for too long kills animals, but scientists haven’t known why. Newly published work suggests that the answer lies in an unexpected part of the body.
A transcompiler, also known as a source-to-source translator, is a system that converts source code from one high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules applied to the abstract syntax tree of the source code. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive.
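As a rough illustration of the rewrite-rule approach described above (a generic sketch, not the internals of any particular transcompiler), the following uses Python's ast module to apply one handcrafted rule to a parsed syntax tree; the single rule shown, rewriting Python 2's xrange to range, stands in for the large rule sets such tools chain together.

```python
# A minimal sketch of a rule-based source-to-source rewrite, assuming Python 3.9+
# for ast.unparse. One rule (Python 2's xrange -> range) stands in for the
# hand-crafted rule sets real transcompilers apply over the abstract syntax tree.
import ast

class XRangeToRange(ast.NodeTransformer):
    """Rewrite calls to xrange(...) as range(...)."""
    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "xrange":
            node.func = ast.Name(id="range", ctx=ast.Load())
        return node

source = "total = sum(i * i for i in xrange(10))"
tree = ast.fix_missing_locations(XRangeToRange().visit(ast.parse(source)))
print(ast.unparse(tree))  # -> total = sum(i * i for i in range(10))
```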
Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy.
Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin.
We show that for thousands of years, humans have concentrated in a surprisingly narrow subset of Earth’s available climates, characterized by mean annual temperatures of ∼13 °C. This distribution likely reflects a human temperature niche related to fundamental constraints. We demonstrate that, depending on scenarios of population growth and warming, over the coming 50 y, 1 to 3 billion people are projected to be left outside the climate conditions that have served humanity well over the past 6,000 y. Absent climate mitigation or migration, a substantial part of humanity will be exposed to mean annual temperatures warmer than nearly anywhere today.
All species have an environmental niche, and despite technological advances, humans are unlikely to be an exception. Here, we demonstrate that for millennia, human populations have resided in the same narrow part of the climatic envelope available on the globe, characterized by a major mode at ∼11 °C to 15 °C mean annual temperature (MAT). Supporting the fundamental nature of this temperature niche, current production of crops and livestock is largely limited to the same conditions, and the same optimum has been found for agricultural and nonagricultural economic output of countries through analyses of year-to-year variation. We show that in a business-as-usual climate change scenario, the geographical position of this temperature niche is projected to shift more over the coming 50 y than it has moved since 6000 BP. Populations will not simply track the shifting climate, as adaptation in situ may address some of the challenges, and many other factors affect decisions to migrate. Nevertheless, in the absence of migration, one third of the global population is projected to experience a MAT >29 °C, currently found in only 0.8% of the Earth’s land surface, mostly concentrated in the Sahara. As the potentially most affected regions are among the poorest in the world, where adaptive capacity is low, enhancing human development in those areas should be a priority alongside climate mitigation.
In this short essay, written for a symposium in the San Diego Law Review, Professor Daniel Solove examines the nothing to hide argument. When asked about government surveillance and data mining, many people respond by declaring: "I've got nothing to hide." According to the nothing to hide argument, there is no threat to privacy unless the government uncovers unlawful activity, in which case a person has no legitimate justification to claim that it remain private. The nothing to hide argument and its variants are quite prevalent, and thus are worth addressing. In this essay, Solove critiques the nothing to hide argument and exposes its faulty underpinnings.
We describe the vision of being able to reason about the design space of data structures.
We break this down into two questions: 1) Can we know all data structures that it is possible to design? 2) Can we compute the performance of arbitrary designs on a given hardware and workload without having to implement the design or even access the target hardware?
If those questions can be answered, then an array of exciting opportunities becomes feasible, such as interactive what-if design to improve the productivity of data systems researchers and engineers, and informed decision making in industrial settings with regard to critical hardware/workload/data structure design issues. Then, even fully automated discovery of new data structure designs becomes possible. Furthermore, the structure of the design space itself provides numerous insights and opportunities, such as the existence of design continuums that can lead to data systems with deep adaptivity, and a new understanding of the possible performance trade-offs. Given the universal presence of data structures at the very core of any data-driven field across all sciences and industries, reasoning about their design can have significant benefits, making it more feasible (easier, faster and cheaper) to adopt tailored state-of-the-art storage solutions. And this effect is going to become increasingly critical as data keeps growing, hardware keeps changing and more applications/fields realize the transformative power and potential of data analytics.
This paper presents this vision and surveys first steps that demonstrate its feasibility.
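To make the what-if idea concrete, here is a deliberately toy cost-model sketch (hypothetical textbook approximations, not the paper's actual design calculator) that compares rough point-read and insert costs for a B-tree-like and an LSM-tree-like design from a few workload parameters.

```python
# A toy sketch of "what-if" design reasoning: estimate per-operation page I/O for
# two classic designs from a handful of parameters. The formulas are rough
# textbook approximations, not the paper's model.
import math

def btree_costs(n_entries: int, fanout: int) -> tuple[float, float]:
    height = math.ceil(math.log(n_entries, fanout))      # page reads per lookup
    read_io = height
    write_io = height                                     # read-modify-write path to a leaf
    return read_io, write_io

def lsm_costs(n_entries: int, growth_factor: int, entries_per_page: int) -> tuple[float, float]:
    levels = math.ceil(math.log(n_entries / entries_per_page, growth_factor))
    read_io = levels                                      # one probe per level (no filters)
    write_io = growth_factor * levels / entries_per_page  # amortized compaction cost
    return read_io, write_io

for name, (r, w) in {
    "B-tree": btree_costs(n_entries=10**8, fanout=256),
    "LSM-tree": lsm_costs(n_entries=10**8, growth_factor=10, entries_per_page=256),
}.items():
    print(f"{name:8s}  ~{r:.1f} page reads/lookup, ~{w:.2f} page writes/insert")
```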
VoLoc is a system that uses the microphone array on Alexa, as well as room echoes of the human voice, to infer the user’s location inside the home.
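For background on how a microphone array can reason about where a voice came from (a generic signal-processing illustration, not VoLoc's actual algorithm), the sketch below estimates the inter-microphone delay of a simulated signal with GCC-PHAT; systems of this kind combine such delays with echo geometry to infer position.

```python
# Generic illustration of array-based localization (not VoLoc's algorithm):
# estimate the sample delay between two microphones with GCC-PHAT, the usual
# first step before intersecting direction/echo constraints to get a position.
import numpy as np

def gcc_phat(sig: np.ndarray, ref: np.ndarray) -> int:
    """Return the delay (in samples) of `sig` relative to `ref`."""
    n = sig.size + ref.size
    SIG, REF = np.fft.rfft(sig, n=n), np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12                     # PHAT weighting: keep phase only
    corr = np.fft.irfft(cross, n=n)
    corr = np.concatenate((corr[-(ref.size - 1):], corr[:sig.size]))
    return int(np.argmax(corr)) - (ref.size - 1)

rng = np.random.default_rng(0)
voice = rng.standard_normal(4096)                      # stand-in for a voice snippet
mic1 = voice
mic2 = np.concatenate((np.zeros(7), voice[:-7]))       # mic 2 hears it 7 samples later
print(gcc_phat(mic2, mic1))                            # -> 7
```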
Computer simulations are invaluable tools for scientific discovery. However, accurate simulations are often slow to execute, which limits their applicability to extensive parameter exploration, large-scale data analysis, and uncertainty quantification. A promising route to accelerate simulations by building fast emulators with machine learning requires large training datasets, which can be prohibitively expensive to obtain with slow simulations. Here we present a method based on neural architecture search to build accurate emulators even with a limited number of training data. The method successfully accelerates simulations by up to 2 billion times in 10 scientific cases including astrophysics, climate science, biogeochemistry, high energy density physics, fusion energy, and seismology, using the same super-architecture, algorithm, and hyperparameters. Our approach also inherently provides emulator uncertainty estimation, adding further confidence in their use. We anticipate this work will accelerate research involving expensive simulations, allow more extensive parameter exploration, and enable new, previously unfeasible computational discovery.
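As a generic illustration of the emulator idea (a minimal sketch assuming scikit-learn is available, not the paper's neural-architecture-search method), the example below fits a small network to input–output pairs from a toy stand-in "simulator"; once trained, predictions are effectively instantaneous.

```python
# A minimal emulator sketch: fit a small MLP to (parameter, output) pairs from a
# toy stand-in "simulator". The paper's method additionally searches over
# architectures and estimates uncertainty; this sketch does neither.
import numpy as np
from sklearn.neural_network import MLPRegressor

def toy_simulator(params: np.ndarray) -> np.ndarray:
    """Stand-in for an expensive code: a smooth nonlinear response."""
    x, y = params[:, 0], params[:, 1]
    return np.sin(3 * x) * np.exp(-y**2) + 0.1 * x * y

rng = np.random.default_rng(0)
train_params = rng.uniform(-1, 1, size=(2000, 2))      # a limited set of training runs
train_out = toy_simulator(train_params)

emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
emulator.fit(train_params, train_out)

test_params = rng.uniform(-1, 1, size=(500, 2))
rmse = np.sqrt(np.mean((emulator.predict(test_params) - toy_simulator(test_params)) ** 2))
print(f"emulator RMSE on held-out parameters: {rmse:.3f}")
```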
Detection and attribution typically aims to find long-term climate signals in internal, often short-term variability. Here, common methods are extended to high-frequency temperature and humidity data, detecting instantaneous, global-scale climate change since 1999 for any year and 2012 for any day.
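A common device in detection-and-attribution work is to project an observed anomaly pattern onto a model-derived fingerprint and compare the projection with internal variability; the following is a schematic of that step on synthetic data, not the study's actual analysis.

```python
# Schematic fingerprint detection on synthetic data (not the study's analysis):
# project a daily anomaly map onto a forced-response pattern and ask whether the
# projection exceeds what unforced internal variability alone would produce.
import numpy as np

rng = np.random.default_rng(0)
n_grid = 2000                                   # flattened global grid cells
fingerprint = rng.standard_normal(n_grid)
fingerprint /= np.linalg.norm(fingerprint)      # unit-norm forced-response pattern

def projection(anomaly_map: np.ndarray) -> float:
    return float(anomaly_map @ fingerprint)     # signal strength along the fingerprint

# Internal variability: noise-only maps define the null distribution.
null = np.array([projection(rng.standard_normal(n_grid)) for _ in range(5000)])

# "Observed" day: a weak forced signal buried in the same kind of noise.
observed = 3.0 * fingerprint + rng.standard_normal(n_grid)
snr = (projection(observed) - null.mean()) / null.std()
print(f"signal-to-noise of today's map: {snr:.1f} sigma")
```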
This post overviews the paper Confident Learning: Estimating Uncertainty in Dataset Labels authored by Curtis G. Northcutt, Lu Jiang, and Isaac L. Chuang.
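To make the idea concrete, below is a small from-scratch sketch of the confident-learning heuristic: per-class probability thresholds are used to flag examples whose given label the model confidently disagrees with. It follows the spirit of the paper rather than the exact cleanlab implementation.

```python
# A from-scratch sketch of the confident-learning idea: use per-class average
# self-confidence as thresholds and flag examples whose probability for a
# different class clears that class's threshold. Spirit of the paper, not cleanlab's API.
import numpy as np

def find_label_issues(labels: np.ndarray, pred_probs: np.ndarray) -> np.ndarray:
    """Return indices of examples whose given label looks wrong."""
    n_classes = pred_probs.shape[1]
    # Threshold t_j: mean predicted probability of class j among examples labeled j.
    thresholds = np.array([pred_probs[labels == j, j].mean() for j in range(n_classes)])
    issues = []
    for i, (y, probs) in enumerate(zip(labels, pred_probs)):
        confident = np.flatnonzero(probs >= thresholds)   # classes the model is confident about
        if confident.size and y not in confident:
            issues.append(i)
    return np.array(issues, dtype=int)

# Tiny demo: example 2 is labeled 0 but the model is confident it is class 1.
labels = np.array([0, 0, 0, 1, 1])
pred_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]])
print(find_label_issues(labels, pred_probs))   # -> [2]
```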
Topical keyphrase extraction is used to summarize large collections of text documents. However, traditional methods cannot properly reflect the intrinsic semantics and relationships of keyphrases because they rely on a simple term-frequency-based process. Consequently, these methods are not effective in obtaining significant contextual knowledge. To resolve this, we propose a topical keyphrase extraction method based on a hierarchical semantic network and multiple centrality network measures that together reflect the hierarchical semantics of keyphrases. We conduct experiments on real data to examine the practicality of the proposed method and to compare its performance with that of existing topical keyphrase extraction methods. The results confirm that the proposed method outperforms state-of-the-art topical keyphrase extraction methods in terms of the representativeness of the selected keyphrases for each topic. The proposed method can effectively reflect intrinsic keyphrase semantics and interrelationships.
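As a simplified illustration of centrality-based keyphrase ranking (a generic co-occurrence-graph sketch assuming networkx, not the proposed hierarchical semantic network), the example below scores candidate terms with a centrality measure.

```python
# Generic sketch of graph-centrality keyphrase ranking (not the paper's
# hierarchical semantic network): build a word co-occurrence graph over
# sentences and rank terms by centrality, assuming networkx is installed.
from itertools import combinations
import networkx as nx

sentences = [
    ["keyphrase", "extraction", "summarizes", "document", "collections"],
    ["semantic", "network", "captures", "keyphrase", "relationships"],
    ["centrality", "measures", "rank", "keyphrase", "candidates"],
]

graph = nx.Graph()
for tokens in sentences:
    for a, b in combinations(set(tokens), 2):   # co-occurrence within a sentence
        weight = graph.get_edge_data(a, b, {}).get("weight", 0) + 1
        graph.add_edge(a, b, weight=weight)

scores = nx.pagerank(graph, weight="weight")    # swap in degree/betweenness centrality to compare
for term, score in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{term:15s} {score:.3f}")
```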
N-grams have been a common tool for information retrieval and machine learning applications for decades. In nearly all previous works, only a few values of $n$ are tested, with $n > 6$ being exceedingly rare. Larger values of $n$ are not tested due to computational burden or the fear of overfitting.
In this work, we present a method to find the top-$k$ most frequent $n$-grams that is 60$\times$ faster for small $n$, and can tackle large $n\geq1024$. Despite the unprecedented size of $n$ considered, we show how these features still have predictive ability for malware classification tasks. More importantly, large $n$-grams provide benefits in producing features that are interpretable by malware analysts, and can be used to create general-purpose signatures compatible with industry-standard tools like Yara. Furthermore, the counts of common $n$-grams in a file may be added as features to publicly available human-engineered features, rivaling the efficacy of professionally developed features when used to train gradient-boosted decision tree models on the EMBER dataset.
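For orientation, a naive top-$k$ byte $n$-gram counter looks like the sketch below; the paper's contribution is doing this scalably for very large $n$ and corpora via hashing, which this simple version does not attempt, and the sample directory in the usage comment is hypothetical.

```python
# Naive top-k byte n-gram counting over files. The paper's method extracts the
# same kind of features scalably for very large n via hashing; this sketch is
# only meant to show what the extracted features are.
from collections import Counter
from pathlib import Path

def top_k_ngrams(paths: list[Path], n: int = 8, k: int = 10) -> list[tuple[bytes, int]]:
    counts: Counter[bytes] = Counter()
    for path in paths:
        data = path.read_bytes()
        counts.update(data[i:i + n] for i in range(len(data) - n + 1))
    return counts.most_common(k)

# Hypothetical usage on a directory of samples:
# for gram, freq in top_k_ngrams(sorted(Path("samples").glob("*.bin")), n=64, k=20):
#     print(freq, gram.hex())
```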
With the increasing number of scientific publications, the analysis of the trends and the state-of-the-art in a given scientific field is becoming a very time-consuming and tedious task. In response to urgent needs for information, which the existing systematic review model does not serve well, several other review types have emerged, namely rapid reviews and scoping reviews.
The paper proposes an NLP-powered tool that automates most of the review process by automatically analyzing articles indexed in the IEEE Xplore, PubMed, and Springer digital libraries. We demonstrate the applicability of the toolkit by analyzing articles related to Enhanced Living Environments and Ambient Assisted Living, in accordance with the PRISMA surveying methodology. The relevant articles were processed by the NLP toolkit to identify articles that contain up to 20 properties clustered into 4 logical groups.
The analysis showed increasing attention from the scientific communities towards Enhanced and Assisted Living environments over the last 10 years and revealed several trends in the specific research topics that fall within this scope. The case study demonstrates that the NLP toolkit can ease and speed up the review process and surface valuable insights from the surveyed articles even without manually reading most of the articles. Moreover, it pinpoints the most relevant articles, which contain more properties, and therefore significantly reduces the manual work, while also generating informative tables, charts and graphs.
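To give a flavor of one automatable review step (a hypothetical sketch against the public NCBI E-utilities endpoint, not the authors' toolkit; the query string and date-field syntax are illustrative), the example below counts PubMed hits per year for a search term.

```python
# A hypothetical sketch of one automatable review step: count PubMed hits per
# year for a query via the public NCBI E-utilities esearch endpoint (not the
# authors' toolkit; the query term is illustrative only).
import json
import urllib.parse
import urllib.request

def pubmed_count(query: str, year: int) -> int:
    params = urllib.parse.urlencode({
        "db": "pubmed",
        "term": f"{query} AND {year}[dp]",     # [dp] restricts to publication date
        "retmode": "json",
    })
    url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?{params}"
    with urllib.request.urlopen(url) as resp:
        return int(json.load(resp)["esearchresult"]["count"])

for year in range(2015, 2020):
    print(year, pubmed_count('"ambient assisted living"', year))
```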
Anonymization has been the main means of addressing privacy concerns in sharing medical and socio-demographic data. Here, the authors estimate the likelihood that a specific person can be re-identified in heavily incomplete datasets, casting doubt on the adequacy of current anonymization practices.
Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today's research practice in applied machine learning, e.g., in terms of the reproducibility of the results or the choice of the baselines when proposing new models.
In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in recent years. Only 7 of them could be reproduced with reasonable effort. It however turned out that 6 of these 7 can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method.
Overall, our work sheds light on a number of potential problems in today's machine learning scholarship and calls for improved scientific practices in this area. Source code of our experiments and full results are available at: https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation.
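The comparably simple heuristic methods referred to above are of the following flavor: a generic item-based nearest-neighbor recommender over a user–item interaction matrix (a sketch, not the paper's exact baselines).

```python
# A generic item-based nearest-neighbor baseline of the kind the study compares
# against (not the paper's exact implementation): cosine similarity between item
# interaction vectors, then score unseen items for a user.
import numpy as np

def item_knn_scores(interactions: np.ndarray, user: int, k: int = 2) -> np.ndarray:
    """interactions: binary user x item matrix; returns a score per item."""
    norms = np.linalg.norm(interactions, axis=0) + 1e-12
    sim = (interactions.T @ interactions) / np.outer(norms, norms)   # item-item cosine
    np.fill_diagonal(sim, 0.0)
    # Keep only each item's k most similar neighbours.
    top_k = np.argsort(sim, axis=1)[:, :-k - 1:-1]
    pruned = np.zeros_like(sim)
    rows = np.arange(sim.shape[0])[:, None]
    pruned[rows, top_k] = sim[rows, top_k]
    scores = interactions[user] @ pruned.T          # aggregate over items the user has seen
    scores[interactions[user] > 0] = -np.inf        # do not re-recommend seen items
    return scores

ratings = np.array([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 0, 1, 1]], dtype=float)     # 3 users x 4 items
print(np.argsort(-item_knn_scores(ratings, user=0)))  # items ranked for user 0
```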