Addressing the need for novel insect observation and control tools, the Photonic Fence detects and tracks mosquitoes and other flying insects and can apply lethal doses of laser light to them. Previously, we determined lethal exposure levels for a variety of lasers and pulse conditions on anesthetized Anopheles stephensi mosquitoes. In this work, similar studies were performed while the subjects were freely flying within transparent cages two meters from the optical system; a proof-of-principle demonstration of a 30 m system was also performed. Dose–response curves of mortality data were generated as a function of beam diameter, pulse width, and power at visible and near-infrared wavelengths; the visible wavelengths required significantly lower laser exposure than the near-infrared wavelengths to disable subjects, though near-infrared sources remain attractive given their cost and retina safety. The flight behavior of the subjects and the performance of the tracking system were found to have no impact on the mortality outcomes for pulse durations up to 25 ms, which appears to be the ideal duration to minimize the required laser power. The results of this study affirm the practicality of using optical approaches to protect people and crops from pestilent flying insects.
The modern project of creating human-like artificial intelligence (AI) started after World War II, when it was discovered that electronic computers are not just number-crunching machines, but can also manipulate symbols. It is possible to pursue this goal without assuming that machine intelligence is identical to human intelligence. This is known as weak AI. However, many AI researchers have pursued the aim of developing artificial intelligence that is in principle identical to human intelligence, called strong AI. Weak AI is less ambitious than strong AI, and therefore less controversial. However, there are important controversies related to weak AI as well. This paper focuses on the distinction between artificial general intelligence (AGI) and artificial narrow intelligence (ANI). Although AGI may be classified as weak AI, it is close to strong AI because one chief characteristic of human intelligence is its generality. Although AGI is less ambitious than strong AI, there have been critics almost from the very beginning. One of the leading critics was the philosopher Hubert Dreyfus, who argued that computers, which have no body, no childhood, and no cultural practice, could not acquire intelligence at all. One of Dreyfus' main arguments was that human knowledge is partly tacit, and therefore cannot be articulated and incorporated into a computer program. However, today one might argue that new approaches to artificial intelligence research have made his arguments obsolete. Deep learning and Big Data are among the latest approaches, and advocates argue that they will be able to realize AGI. A closer look reveals that although the development of artificial intelligence for specific purposes (ANI) has been impressive, we have not come much closer to developing artificial general intelligence (AGI). The article further argues that this is in principle impossible, and it revives Hubert Dreyfus' argument that computers are not in the world.
Going without sleep for too long kills animals, but scientists haven't known why. Newly published work suggests that the answer lies in an unexpected part of the body.
A transcompiler, also known as a source-to-source translator, is a system that converts source code from one high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the abstract syntax tree of the source code. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive.
Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy.
Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin.
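The unit-test-based evaluation mentioned above ("is the translated function correct?") is easy to picture with a small sketch. The function names below (`reference_gcd`, `translated_gcd`) and the test inputs are illustrative assumptions, not part of the released test set.

```python
# A minimal sketch of unit-test-based evaluation of a translated function:
# a translation counts as correct if it matches a reference implementation
# on every test input.

def reference_gcd(a: int, b: int) -> int:
    """Ground-truth implementation (stand-in)."""
    while b:
        a, b = b, a % b
    return a

def translated_gcd(a: int, b: int) -> int:
    """Imagine this was produced by a transcompiler from a C++ source function."""
    return a if b == 0 else translated_gcd(b, a % b)

def passes_unit_tests(candidate, reference, test_inputs) -> bool:
    return all(candidate(*args) == reference(*args) for args in test_inputs)

if __name__ == "__main__":
    tests = [(12, 18), (7, 13), (100, 0), (270, 192)]
    print(passes_unit_tests(translated_gcd, reference_gcd, tests))  # True
```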
We show that for thousands of years, humans have concentrated in a surprisingly narrow subset of Earth’s available climates, characterized by mean annual temperatures around ∼13 °C. This distribution likely reflects a human temperature niche related to fundamental constraints. We demonstrate that depending on scenarios of population growth and warming, over the coming 50 y, 1 to 3 billion people are projected to be left outside the climate conditions that have served humanity well over the past 6,000 y. Absent climate mitigation or migration, a substantial part of humanity will be exposed to mean annual temperatures warmer than nearly anywhere today.
All species have an environmental niche, and despite technological advances, humans are unlikely to be an exception. Here, we demonstrate that for millennia, human populations have resided in the same narrow part of the climatic envelope available on the globe, characterized by a major mode around ∼11 °C to 15 °C mean annual temperature (MAT). Supporting the fundamental nature of this temperature niche, current production of crops and livestock is largely limited to the same conditions, and the same optimum has been found for agricultural and nonagricultural economic output of countries through analyses of year-to-year variation. We show that in a business-as-usual climate change scenario, the geographical position of this temperature niche is projected to shift more over the coming 50 y than it has moved since 6000 BP. Populations will not simply track the shifting climate, as adaptation in situ may address some of the challenges, and many other factors affect decisions to migrate. Nevertheless, in the absence of migration, one third of the global population is projected to experience a MAT >29 °C currently found in only 0.8% of the Earth’s land surface, mostly concentrated in the Sahara. As the potentially most affected regions are among the poorest in the world, where adaptive capacity is low, enhancing human development in those areas should be a priority alongside climate mitigation.
In this short essay, written for a symposium in the San Diego Law Review, Professor Daniel Solove examines the nothing to hide argument. When asked about government surveillance and data mining, many people respond by declaring: "I've got nothing to hide." According to the nothing to hide argument, there is no threat to privacy unless the government uncovers unlawful activity, in which case a person has no legitimate justification to claim that it remain private. The nothing to hide argument and its variants are quite prevalent, and thus are worth addressing. In this essay, Solove critiques the nothing to hide argument and exposes its faulty underpinnings.
We describe the vision of being able to reason about the design space of data structures.
We break this down into two questions: 1) Can we know all data structures that it is possible to design? 2) Can we compute the performance of arbitrary designs on a given hardware and workload without having to implement the design or even access the target hardware?
If those challenges can be met, then an array of exciting opportunities would become feasible, such as interactive what-if design to improve the productivity of data systems researchers and engineers, and informed decision making in industrial settings with regard to critical hardware/workload/data structure design issues. Then, even fully automated discovery of new data structure designs becomes possible. Furthermore, the structure of the design space itself provides numerous insights and opportunities, such as the existence of design continuums that can lead to data systems with deep adaptivity, and a new understanding of the possible performance trade-offs. Given the universal presence of data structures at the very core of any data-driven field across all sciences and industries, reasoning about their design can have significant benefits, making it more feasible (easier, faster, and cheaper) to adopt tailored state-of-the-art storage solutions. And this effect is going to become increasingly more critical as data keeps growing, hardware keeps changing, and more applications/fields realize the transformative power and potential of data analytics.
This paper presents this vision and surveys first steps that demonstrate its feasibility.
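As a toy illustration of the second question above (estimating performance without implementing a design), the sketch below compares point-lookup costs of three simple designs using back-of-the-envelope analytical formulas. The formulas and constants are illustrative assumptions, not the paper's cost model.

```python
import math

# Toy analytical cost model: estimate expected point-lookup cost (roughly, the
# number of cache-miss-dominated memory accesses) for three simple designs,
# without implementing any of them. Node and key sizes are assumptions.

KEY_BYTES = 8

def sorted_array_lookup_cost(n_keys: int) -> float:
    # Binary search touches ~log2(n) scattered positions.
    return math.log2(max(n_keys, 2))

def btree_lookup_cost(n_keys: int, node_bytes: int = 4096) -> float:
    # One node access per tree level, with fanout set by the node size.
    fanout = node_bytes // KEY_BYTES
    return math.ceil(math.log(max(n_keys, 2), fanout))

def hash_table_lookup_cost(load_factor: float = 0.75) -> float:
    # One bucket access plus expected extra probes under linear probing.
    return 1.0 + load_factor / (1.0 - load_factor)

for n in (10**6, 10**9):
    print(n, sorted_array_lookup_cost(n), btree_lookup_cost(n), hash_table_lookup_cost())
```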
VoLoc is a system that uses the microphone array on Alexa, as well as room echoes of the human voice, to infer the user's location inside the home.
Computer simulations are invaluable tools for scientific discovery. However, accurate simulations are often slow to execute, which limits their applicability to extensive parameter exploration, large-scale data analysis, and uncertainty quantification. A promising route to accelerate simulations by building fast emulators with machine learning requires large training datasets, which can be prohibitively expensive to obtain with slow simulations. Here we present a method based on neural architecture search to build accurate emulators even with a limited amount of training data. The method successfully accelerates simulations by up to 2 billion times in 10 scientific cases including astrophysics, climate science, biogeochemistry, high energy density physics, fusion energy, and seismology, using the same super-architecture, algorithm, and hyperparameters. Our approach also inherently provides emulator uncertainty estimation, adding further confidence in their use. We anticipate this work will accelerate research involving expensive simulations, allow more extensive parameter exploration, and enable new, previously infeasible computational discovery.
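As a rough illustration of the emulator idea (not the paper's architecture-search method), the sketch below fits a small fixed-architecture neural network to a limited number of outputs from a toy stand-in simulator. The `simulate` function, network size, and dataset sizes are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Surrogate-model sketch: replace a slow simulator with a small neural network
# trained on a limited set of (parameters -> output) pairs.

rng = np.random.default_rng(0)

def simulate(theta: np.ndarray) -> np.ndarray:
    """Stand-in for an expensive simulation: a nonlinear parameter-to-output map."""
    return (np.sin(3 * theta[:, :1]) * np.exp(-theta[:, 1:2] ** 2)).ravel()

# Small training set, as in the limited-data regime the paper targets.
X_train = rng.uniform(-1, 1, size=(200, 2))
y_train = simulate(X_train)

emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
emulator.fit(X_train, y_train)

# The emulator is then queried instead of the simulator.
X_test = rng.uniform(-1, 1, size=(1000, 2))
err = np.mean((emulator.predict(X_test) - simulate(X_test)) ** 2)
print(f"emulator mean squared error: {err:.2e}")
```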
Detection and attribution typically aims to find long-term climate signals amid internal, often short-term variability. Here, common methods are extended to high-frequency temperature and humidity data, detecting instantaneous, global-scale climate change in any single year since 1999 and in any single day since 2012.
This post overviews the paper Confident Learning: Estimating Uncertainty in Dataset Labels authored by Curtis G. Northcutt, Lu Jiang, and Isaac L. Chuang.
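At its core, confident learning estimates a "confident joint" between the observed noisy labels and the latent true labels from out-of-sample predicted probabilities, and reads likely label errors off its off-diagonal entries. The sketch below follows that idea in simplified form; the released cleanlab implementation adds calibration, pruning, and ranking steps not shown here, and the toy labels and probabilities are made up.

```python
import numpy as np

def confident_joint(labels: np.ndarray, pred_probs: np.ndarray) -> np.ndarray:
    """Simplified confident-joint estimate from noisy labels and predicted probabilities."""
    n_classes = pred_probs.shape[1]
    # Per-class threshold: average self-confidence of examples given that label.
    thresholds = np.array([pred_probs[labels == j, j].mean() for j in range(n_classes)])
    C = np.zeros((n_classes, n_classes), dtype=int)
    for noisy_label, probs in zip(labels, pred_probs):
        above = np.flatnonzero(probs >= thresholds)
        if above.size:
            true_guess = above[np.argmax(probs[above])]
            C[noisy_label, true_guess] += 1
    return C

# Off-diagonal mass in C suggests label errors of that type.
labels = np.array([0, 0, 1, 1, 1, 0])
pred_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9],
                       [0.8, 0.2], [0.3, 0.7], [0.95, 0.05]])
print(confident_joint(labels, pred_probs))
```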
Topical keyphrase extraction is used to summarize large collections of text documents. However, traditional methods cannot properly reflect the intrinsic semantics and relationships of keyphrases because they rely on a simple term-frequency-based process. Consequently, these methods are not effective in obtaining significant contextual knowledge. To resolve this, we propose a topical keyphrase extraction method based on a hierarchical semantic network and multiple network centrality measures that together reflect the hierarchical semantics of keyphrases. We conduct experiments on real data to examine the practicality of the proposed method and to compare its performance with that of existing topical keyphrase extraction methods. The results confirm that the proposed method outperforms state-of-the-art topical keyphrase extraction methods in terms of the representativeness of the selected keyphrases for each topic. The proposed method can effectively reflect intrinsic keyphrase semantics and interrelationships.
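As a rough sketch of ranking keyphrases by network centrality rather than raw term frequency, the snippet below builds a keyphrase co-occurrence graph and combines several centrality measures. The toy documents, candidate phrases, and the particular combination of measures are assumptions for illustration, not the paper's exact pipeline.

```python
import itertools
import networkx as nx

# Toy corpus: each "document" is a list of candidate keyphrases.
docs = [
    ["keyphrase extraction", "semantic network", "centrality"],
    ["semantic network", "topic model", "centrality"],
    ["topic model", "keyphrase extraction"],
]

# Build a weighted co-occurrence graph over candidate keyphrases.
G = nx.Graph()
for phrases in docs:
    for u, v in itertools.combinations(sorted(set(phrases)), 2):
        if G.has_edge(u, v):
            G[u][v]["weight"] += 1
        else:
            G.add_edge(u, v, weight=1)

# Combine multiple centrality measures into a single ranking score.
degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)
closeness = nx.closeness_centrality(G)
scores = {node: degree[node] + betweenness[node] + closeness[node] for node in G}

for phrase, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{phrase}: {score:.3f}")
```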
N-grams have been a common tool for information retrieval and machine learning applications for decades. In nearly all previous works, only a few values of $n$ are tested, with $n > 6$ being exceedingly rare. Larger values of $n$ are not tested due to computational burden or the fear of overfitting.
In this work, we present a method to find the top-$k$ most frequent $n$-grams that is 60$\times$ faster for small $n$, and can tackle large $n \geq 1024$. Despite the unprecedented size of $n$ considered, we show how these features still have predictive ability for malware classification tasks. More importantly, large $n$-grams provide benefits in producing features that are interpretable by malware analysts, and can be used to create general-purpose signatures compatible with industry-standard tools like Yara. Furthermore, the counts of common $n$-grams in a file may be added as features to publicly available human-engineered features that rival the efficacy of professionally developed features when used to train gradient-boosted decision tree models on the EMBER dataset.
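A brute-force version of the $n$-gram feature representation is easy to sketch; the paper's contribution is a far faster hashing-based selection algorithm that scales to very large $n$, which is not reproduced here. The byte strings below are in-memory placeholders for executable contents.

```python
from collections import Counter

def ngrams(data: bytes, n: int):
    """All length-n byte substrings of a sample."""
    return (data[i:i + n] for i in range(len(data) - n + 1))

def top_k_ngrams(samples, n: int, k: int):
    """Select the k byte n-grams present in the most samples."""
    counts = Counter()
    for data in samples:
        counts.update(set(ngrams(data, n)))  # count each n-gram once per sample
    return [g for g, _ in counts.most_common(k)]

def featurize(data: bytes, vocabulary, n: int):
    """Per-sample counts of the selected n-grams."""
    counts = Counter(ngrams(data, n))
    return [counts[g] for g in vocabulary]

# Placeholder samples; in practice these are file contents of binaries.
samples = [b"\x4d\x5a\x90\x00this program cannot be run in dos mode" * 4,
           b"\x7fELF\x02\x01\x01\x00hello world section .text" * 4]
vocab = top_k_ngrams(samples, n=8, k=1000)
X = [featurize(s, vocab, n=8) for s in samples]  # feature matrix, e.g. for a GBDT
print(len(vocab), len(X[0]))
```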
With the increasing number of scientific publications, the analysis of trends and the state-of-the-art in a certain scientific field is becoming a very time-consuming and tedious task. In response to urgent information needs that the existing systematic review model does not serve well, several other review types have emerged, namely rapid reviews and scoping reviews.
The paper proposes an NLP-powered tool that automates most of the review process by automatically analyzing articles indexed in the IEEE Xplore, PubMed, and Springer digital libraries. We demonstrate the applicability of the toolkit by analyzing articles related to Enhanced Living Environments and Ambient Assisted Living, in accordance with the PRISMA surveying methodology. The relevant articles were processed by the NLP toolkit to identify articles that contain up to 20 properties clustered into 4 logical groups.
The analysis showed increasing attention from the scientific communities towards Enhanced and Assisted Living environments over the last 10 years and revealed several trends in the specific research topics that fall within this scope. The case study demonstrates that the NLP toolkit can ease and speed up the review process and surface valuable insights from the surveyed articles even without manually reading most of them. Moreover, it pinpoints the most relevant articles (those containing the most properties) and therefore significantly reduces the manual work, while also generating informative tables, charts, and graphs.
Anonymization has been the main means of addressing privacy concerns in sharing medical and socio-demographic data. Here, the authors estimate the likelihood that a specific person can be re-identified in heavily incomplete datasets, casting doubt on the adequacy of current anonymization practices.
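As a back-of-the-envelope illustration of why a few attributes can single people out (not the authors' generative copula model for incomplete datasets), the sketch below computes the fraction of records that are unique on a handful of quasi-identifiers. The column names and data are made up.

```python
import pandas as pd

# Toy "anonymized" dataset with a few quasi-identifiers.
df = pd.DataFrame({
    "zip_code":  ["02139", "02139", "10001", "10001", "02139"],
    "birthdate": ["1985-03-02", "1990-07-11", "1985-03-02", "1972-12-30", "1985-03-02"],
    "gender":    ["F", "M", "F", "M", "F"],
})

quasi_identifiers = ["zip_code", "birthdate", "gender"]

# Size of each quasi-identifier combination; size 1 means the record is unique.
sizes = df.groupby(quasi_identifiers).size().reset_index(name="n")
merged = df.merge(sizes, on=quasi_identifiers)
uniqueness = (merged["n"] == 1).mean()
print(f"fraction of records unique on {quasi_identifiers}: {uniqueness:.2f}")
```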
Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today's research practice in applied machine learning, e.g., in terms of the reproducibility of the results or the choice of baselines when proposing new models.
In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in recent years. Only 7 of them could be reproduced with reasonable effort. It then turned out that 6 of these 7 can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method.
Overall, our work sheds light on a number of potential problems in today's machine learning scholarship and calls for improved scientific practices in this area. Source code of our experiments and full results are available at: https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation.
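One of the simple baselines the study refers to, an item-based nearest-neighbor recommender, fits in a few lines: score unseen items by similarity-weighted sums over the items a user has already interacted with. The tiny interaction matrix and hyperparameters below are illustrative.

```python
import numpy as np

def cosine_item_similarity(R: np.ndarray) -> np.ndarray:
    """Cosine similarity between item columns of the user-item matrix R."""
    norms = np.linalg.norm(R, axis=0, keepdims=True)
    norms[norms == 0] = 1.0
    S = (R / norms).T @ (R / norms)
    np.fill_diagonal(S, 0.0)  # an item should not recommend itself
    return S

def recommend(R: np.ndarray, user: int, k: int = 10, top_n: int = 3):
    S = cosine_item_similarity(R)
    # Keep only the k most similar neighbors per item.
    for j in range(S.shape[1]):
        keep = np.argsort(S[:, j])[-k:]
        mask = np.ones(S.shape[0], dtype=bool)
        mask[keep] = False
        S[mask, j] = 0.0
    scores = R[user] @ S              # similarity-weighted sum over seen items
    scores[R[user] > 0] = -np.inf     # exclude already-seen items
    return np.argsort(scores)[::-1][:top_n]

# Rows are users, columns are items; 1 = observed interaction.
R = np.array([[1, 1, 0, 0, 1],
              [0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0]], dtype=float)
print(recommend(R, user=0, k=2, top_n=2))
```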
Natural language processing algorithms applied to three million materials science abstracts uncover relationships between words, material compositions and properties, and predict potential new thermoelectric materials.
The overwhelming majority of scientific knowledge is published as text, which is difficult to analyse by either traditional statistical analysis or modern machine learning methods. By contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases, which encompass only a small fraction of the knowledge present in the research literature. Beyond property values, publications contain valuable knowledge regarding the connections and relationships between data items as interpreted by the authors. To improve the identification and use of this knowledge, several studies have focused on the retrieval of information from scientific literature using supervised natural language processing, which requires large hand-labelled datasets for training. Here we show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings (vector representations of words) without human labelling or supervision. Without any explicit insertion of chemical knowledge, these embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure–property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery. This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications. Our findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.
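A minimal sketch of the unsupervised-embedding idea follows: train word vectors on tokenized abstracts and inspect the neighbors of an application word such as "thermoelectric". The three toy "abstracts", the hyperparameters, and the use of gensim >= 4.0 (where the keyword is `vector_size`) are illustrative assumptions; the study trains on roughly three million real abstracts with domain-specific preprocessing.

```python
from gensim.models import Word2Vec

# Toy stand-ins for tokenized materials science abstracts.
corpus = [
    "bi2te3 is a well known thermoelectric material with high zt".split(),
    "heat can be converted to electricity by thermoelectric devices".split(),
    "perovskite solar cells convert light to electricity".split(),
]

# Train word embeddings without any labels or supervision.
model = Word2Vec(sentences=corpus, vector_size=50, window=5, min_count=1,
                 sg=1, epochs=200, seed=0)

# Candidate materials can then be ranked by cosine similarity of their
# embeddings to the embedding of the target application word.
print(model.wv.most_similar("thermoelectric", topn=5))
```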
We present HotStuff, a leader-based Byzantine fault-tolerant replication protocol for the partially synchronous model.
Once network communication becomes synchronous, HotStuff enables a correct leader to drive the protocol to consensus at the pace of actual (vs. maximum) network delay--a property called responsiveness--and with communication complexity that is linear in the number of replicas. To our knowledge, HotStuff is the first partially synchronous BFT replication protocol exhibiting these combined properties. HotStuff is built around a novel framework that forms a bridge between classical BFT foundations and blockchains. It allows the expression of other known protocols (DLS, PBFT, Tendermint, Casper), and ours, in a common framework.
Our deployment of HotStuff over a network with over 100 replicas achieves throughput and latency comparable to that of BFT-SMaRt, while enjoying linear communication footprint during leader failover (vs. quadratic with BFT-SMaRt).
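As a heavily simplified sketch of one ingredient (not the full three-phase protocol), the snippet below shows a leader collecting votes on a proposal and forming a quorum certificate once 2f + 1 of n = 3f + 1 replicas have voted. Real HotStuff chains certificates across phases and aggregates signatures (e.g. threshold signatures) to keep communication linear; the classes and values here are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class QuorumCertificate:
    block_hash: str
    view: int
    voters: frozenset

@dataclass
class Leader:
    n: int                          # total replicas, n = 3f + 1
    votes: dict = field(default_factory=dict)

    @property
    def quorum(self) -> int:
        f = (self.n - 1) // 3
        return 2 * f + 1            # equivalently n - f

    def on_vote(self, replica_id: int, block_hash: str, view: int):
        """Record a vote; return a QC once a quorum has voted for (block, view)."""
        key = (block_hash, view)
        self.votes.setdefault(key, set()).add(replica_id)
        if len(self.votes[key]) >= self.quorum:
            return QuorumCertificate(block_hash, view, frozenset(self.votes[key]))
        return None

leader = Leader(n=4)                # tolerates f = 1 Byzantine replica
for r in range(3):                  # 3 = 2f + 1 votes form a QC
    qc = leader.on_vote(r, block_hash="abc123", view=7)
print(qc)
```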
Daniel J. Bernstein, Bo-Yin Yang. "Fast constant-time gcd computation and modular inversion."