# Data Collection
# Grade 2 (ages 5-7): expected level of mastery
Everyday digital devices collect and display data over time. The collection and use of data about individuals and the world around them is a routine part of life and influences how people live.
Many everyday objects, such as cell phones, digital toys, and cars, can contain tools (such as sensors) and computers to collect and display data from their surroundings.
Crosscutting Concept: Human–Computer Interaction
Connection Within Framework: K–2.Networks and the Internet.Network Communication and Organization
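Digital devices are constantly collecting, processing, and displaying data. Collecting and using data about individuals and the world around them has become part of everyday life and directly affects how people live.
Many everyday objects, such as cell phones, digital toys, and cars, have built-in sensors and computer systems that continuously collect and display data.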
# Grade 5 (ages 8-11): expected level of mastery
People select digital tools for the collection of data based on what is being observed and how the data will be used. For example, a digital thermometer is used to measure temperature and a GPS sensor is used to track locations.
There is a wide array of digital data collection tools; however, only some are appropriate for certain types of data. Tools are chosen based upon the type of measurement they use as well as the type of data people wish to observe. Data scientists use the term observation to describe data collection, whether or not a human is involved in the collection.
Crosscutting Concept: Abstraction
Connections Within Framework: 3–5.Algorithms and Programming.Variables; 3–5.Algorithms and Programming.Algorithms
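We choose a data collection tool based on what is being observed and how the data will be used. For example, a digital thermometer measures temperature, and a GPS sensor tracks location.
Many kinds of digital data collection tools exist, but only some of them suit a given type of data. The final choice depends on the type of measurement a tool makes and on the type of data people want to observe.
Note that data scientists use the term "observation" for data collection whether a person takes part in the process or a digital device collects the data automatically.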
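As a small illustration of matching tools to data, and of treating human and sensor readings alike as observations, here is a minimal Python sketch. The lookup table `TOOL_FOR_QUANTITY`, the `Observation` record, and the sample values are all hypothetical and are not part of the framework text.

```python
from dataclasses import dataclass

# Hypothetical lookup table: which tool suits which kind of measurement.
TOOL_FOR_QUANTITY = {
    "temperature": "digital thermometer",
    "location": "GPS sensor",
}


@dataclass(frozen=True)
class Observation:
    quantity: str      # what is being observed
    value: float       # the recorded measurement
    unit: str
    collected_by: str  # "human" or "sensor"; both count as observations


def choose_tool(quantity: str) -> str:
    """Return a tool suited to the quantity, or raise if none is known."""
    try:
        return TOOL_FOR_QUANTITY[quantity]
    except KeyError:
        raise ValueError(f"no suitable tool known for {quantity!r}") from None


# Made-up readings: one taken automatically, one written down by a person.
observations = [
    Observation("temperature", 21.5, "°C", collected_by="sensor"),
    Observation("temperature", 22.0, "°C", collected_by="human"),
]

print(choose_tool("location"))
```

Asking for a quantity that is not in the table raises an error instead of returning a tool that does not fit, which mirrors the point that only some tools are appropriate for a given type of data.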
# Grade 8 (ages 11-14): expected level of mastery
People design algorithms and tools to automate the collection of data by computers. When data collection is automated, data is sampled and converted into a form that a computer can process. For example, data from an analog sensor must be converted into a digital form. The method used to automate data collection is influenced by the availability of tools and the intended use of the data.
Data can be collected from either individual devices or systems. The method of data collection (for example, surveys versus sensor data) can affect the accuracy and precision of the data. Some types of data are more difficult to collect than others. For example, emotions must be subjectively evaluated on an individual basis and are thus difficult to measure across a population. Access to tools may be limited by factors including cost, training, and availability.
Crosscutting Concept: Human–Computer Interaction
Connection Within Framework: 6–8.Computing Systems.Hardware and Software
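People design algorithms and tools so that computers can collect data automatically. When collection is automated, the data is sampled and converted into a form that a computer can process. For example, data from an analog sensor must be converted into digital form.
The choice of an automated collection method depends on which tools are available and on how the data is intended to be used.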
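The sampling and analog-to-digital conversion described above can be sketched in a few lines of Python. This is an idealized model, not any particular device's behavior: the 3.3 V reference, the 10-bit resolution, the 8 Hz sampling rate, and the sine-wave `analog_sensor` are all assumptions chosen for illustration.

```python
import math


def sample(signal, duration_s, rate_hz):
    """Read a continuous signal(t) at a fixed rate; returns (time, value) pairs."""
    n = int(duration_s * rate_hz)
    return [(i / rate_hz, signal(i / rate_hz)) for i in range(n)]


def quantize(voltage, v_ref=3.3, bits=10):
    """Map a voltage in [0, v_ref] to an integer code, like an idealized ADC."""
    levels = 2 ** bits
    clamped = min(max(voltage, 0.0), v_ref)  # out-of-range inputs saturate
    return min(int(clamped / v_ref * levels), levels - 1)


def analog_sensor(t):
    """Made-up analog sensor: a 1 Hz sine wave centered at 1.65 V."""
    return 1.65 + 1.0 * math.sin(2 * math.pi * t)


# One second of data, sampled 8 times and converted to 10-bit integer codes.
readings = [(t, quantize(v)) for t, v in sample(analog_sensor, duration_s=1.0, rate_hz=8)]
for t, code in readings:
    print(f"t={t:.3f}s code={code}")
```

Raising `rate_hz` or `bits` captures more detail at the cost of more data, which is one way the intended use of the data shapes the collection method.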
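Data can be collected from individual devices or from systems.
The collection method (for example, in-person surveys versus sensors) affects the accuracy and precision of the data.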
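To make the difference between accuracy and precision concrete, the following Python sketch compares two invented sets of repeated readings against an assumed true value. The numbers are made up for illustration and say nothing about how real surveys or sensors perform; `accuracy_error` and `precision_spread` are hypothetical helper names.

```python
from statistics import mean, stdev


def accuracy_error(readings, true_value):
    """Bias: how far the average reading lies from the true value."""
    return mean(readings) - true_value


def precision_spread(readings):
    """Spread: sample standard deviation of the repeated readings."""
    return stdev(readings)


TRUE_TEMP = 20.0  # assumed reference value in °C

# Invented data: tightly clustered but offset (precise, not accurate).
method_a = [20.9, 21.0, 21.1, 21.0]
# Invented data: centered on the true value but scattered (accurate, not precise).
method_b = [18.0, 22.0, 19.0, 21.0]

for name, data in (("A", method_a), ("B", method_b)):
    print(name, round(accuracy_error(data, TRUE_TEMP), 3), round(precision_spread(data), 3))
```

Method A shows a bias of about 1 °C with very little spread, while method B shows no bias and a much larger spread.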
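Some types of data are harder to collect than others. For example, data about emotions depends on each person's subjective evaluation, so it is difficult to measure across a large population.
Access to data collection tools may be limited by factors such as cost, training, and availability.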
# Grade 12 (ages 14-18): expected level of mastery
Data is constantly collected or generated through automated processes that are not always evident, raising privacy concerns. The different collection methods and tools that are used influence the amount and quality of the data that is observed and recorded.
Data can be collected and aggregated across millions of people, even when they are not actively engaging with or physically near the data collection devices. This automated and nonevident collection can raise privacy concerns, such as social media sites mining an account even when the user is not online. Other examples include surveillance video used in a store to track customers for security or information about purchase habits or the monitoring of road traffic to change signals in real time to improve road efficiency without drivers being aware. Methods and devices for collecting data can differ by the amount of storage required, level of detail collected, and sampling rates. For example, ultrasonic range finders are good at long distances and are very accurate, as compared to infrared range finders, which are better for short distances. Computer models and simulations produce large amounts of data used in analysis.
Crosscutting Concept: Privacy and Security
Connections Within Framework: 9–12.Computing Systems.Devices; 9–12.Impacts of Computing.Safety, Law, and Ethics
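Huge amounts of data are constantly collected or generated by automated processes. These processes are sometimes hidden, which raises privacy concerns.
The collection methods and tools that are used affect both the amount and the quality of the data.
Data can be collected and aggregated across millions of people, even when they are not actively using the collection devices or are nowhere near them. This automated, non-obvious collection can raise privacy concerns. For example, a social media site may keep mining an account while its user is offline. Store surveillance video may be used to track customers for security and also to analyze their purchase habits. Road traffic may be monitored without drivers being aware, so that signals can be changed in real time to improve road efficiency.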
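Data collection methods and devices can differ in the amount of storage required, the level of detail collected, and the sampling rate. For example, ultrasonic range finders work well at long distances and are very accurate, whereas infrared range finders are better suited to short distances.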
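The link between sampling rate, level of detail, and storage is simple arithmetic: bytes stored = samples per second × bytes per sample × duration × number of channels. A minimal Python sketch follows; the rates, the 2-byte sample size, and the helper name `storage_bytes` are assumptions for illustration only.

```python
def storage_bytes(sample_rate_hz, bytes_per_sample, duration_s, channels=1):
    """Estimate raw storage needed for uncompressed samples."""
    return round(sample_rate_hz * bytes_per_sample * duration_s * channels)


ONE_DAY_S = 24 * 60 * 60  # 86,400 seconds

low_rate = storage_bytes(1, 2, ONE_DAY_S)     # 1 sample per second
high_rate = storage_bytes(100, 2, ONE_DAY_S)  # 100 samples per second

print(low_rate)   # 172800 bytes, about 0.17 MB per day
print(high_rate)  # 17280000 bytes, about 17 MB per day
```

The estimate ignores compression and file-format overhead, so it is an upper-level planning figure and not an exact file size.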
In addition to real-world data, computer models and simulations can produce large amounts of data for analysis.
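As a small example of a simulation producing data for analysis, the sketch below generates thousands of simulated rolls of two dice and tallies the sums. The dice scenario, the function name `simulate_dice_sums`, and the fixed seed are illustrative choices, not something taken from the framework.

```python
import random
from collections import Counter


def simulate_dice_sums(num_rolls, seed=0):
    """Simulate rolling two six-sided dice and tally how often each sum occurs."""
    rng = random.Random(seed)  # fixed seed so the generated data is reproducible
    sums = (rng.randint(1, 6) + rng.randint(1, 6) for _ in range(num_rolls))
    return Counter(sums)


counts = simulate_dice_sums(10_000)
for total in sorted(counts):
    print(total, counts[total])
```

No physical dice are involved, yet the resulting table can be analyzed like any observed data set, for instance to check that a sum of 7 appears most often.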