Five Core Skills for Software Development Beginners: Basic Programming Ability

Date: 2026-05-30 21:47


Chapter 5 Arrays and Collections: Bulk Data Management and Data Structure Basics

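When it comes to managing data in bulk, Java's arrays and collections and Python's lists and dictionaries are data structures that developers work with almost every day. These basics look simple, but whether they are used well directly affects how fast the code runs and how readable it is. We start with the classic array: its design is traditional, but its memory layout is clear and its performance is predictable, which makes it an ideal starting point for understanding what happens underneath.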

5.1 Arrays in Depth: Declaration Styles and the Memory Model


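Let's look first at the ways to declare a Java array. You will typically meet three forms: allocating space directly, assigning values at declaration, and declaring in separate steps. The key point to remember: for a local array variable, the reference is stored on the stack while the actual elements are stored on the heap. Elements of a newly allocated array start at their type's default value, 0 for numeric types and null for reference types. This is a basic memory-management concept, and it will also help by comparison when we get to collections.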

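import java.util.Arrays;  // needed for the Arrays utility calls below

// Declaring and initializing arrays
// Form 1: declare and allocate space
int[] numbers = new int[5];  // each element defaults to 0
String[] names = new String[3];  // each element defaults to null
// Form 2: declare and assign values
int[] scores = {85, 92, 78, 90, 88};
// Form 3: declare first, assign later
int[] ages;
ages = new int[]{18, 20, 22};
// Memory layout of an array
// int[] arr = new int[3];
// Stack: arr -> heap address 0x1000
// Heap:  0x1000: [0, 0, 0] (initial values)
// Traversing an array
int[] data = {1, 2, 3, 4, 5};
// Classic for loop
for (int i = 0; i < data.length; i++) {
    System.out.println(data[i]);
}
// Enhanced for loop
for (int num : data) {
    System.out.println(num);
}
// Multidimensional array
int[][] matrix = {{1, 2, 3},{4, 5, 6},{7, 8, 9}};
// Irregular (jagged) array: each row has its own length
int[][] jagged = new int[3][];
jagged[0] = new int[2];
jagged[1] = new int[4];
jagged[2] = new int[3];
// Common array operations
int[] arr = {5, 2, 8, 1, 9};
Arrays.sort(arr);  // sort in place: [1,2,5,8,9]
int index = Arrays.binarySearch(arr, 5);  // binary search (array must be sorted): 2
int[] copy = Arrays.copyOf(arr, 10);  // copy and extend; the extra slots are 0
boolean equal = Arrays.equals(arr, copy); // compare contents (false here: the lengths differ)
Arrays.fill(arr, 0); // fill with 0
// Limitations of arrays
// 1. Fixed length; cannot grow dynamically
// 2. Can only store elements of one type
// 3. Insertion and deletion are slow (elements must be shifted)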
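
To make the first and third limitations listed above concrete, here is a minimal Python sketch. `FixedArray` is a hypothetical class written only for this illustration (it is not a standard library type); it imitates a fixed-length backing store and counts how many elements must shift on each insertion.

```python
class FixedArray:
    """Hypothetical fixed-capacity array, for illustration only."""

    def __init__(self, capacity):
        self._slots = [None] * capacity  # backing store never changes length
        self._size = 0

    def insert(self, index, value):
        """Insert value at index and return how many elements had to shift."""
        if self._size == len(self._slots):
            raise OverflowError("array is full; a fixed-length array cannot grow")
        if not 0 <= index <= self._size:
            raise IndexError("index out of range")
        shifted = 0
        # Move every element from the tail down to `index` one slot to the right.
        for i in range(self._size, index, -1):
            self._slots[i] = self._slots[i - 1]
            shifted += 1
        self._slots[index] = value
        self._size += 1
        return shifted

    def to_list(self):
        return self._slots[:self._size]


arr = FixedArray(4)
# Inserting at the end shifts nothing.
shifts_tail = [arr.insert(i, v) for i, v in enumerate([10, 20, 30])]
# Inserting at the front shifts every existing element.
shifts_head = arr.insert(0, 5)
```

Inserting at the tail costs no shifts, while inserting at the head moves all existing elements, and a fifth insertion into this four-slot array fails outright. Dynamic arrays hide the capacity problem by reallocating, but the shifting cost for insertions in the middle remains.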

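Arrays have three obvious limitations: fixed length, a single element type, and insertions and deletions that require shifting many elements. That is why dynamic arrays (such as Java's ArrayList or Python's list) come into play whenever data is added, removed, or modified frequently. Python's list in particular is flexible enough to lift almost all of these constraints, and slicing is a typical example.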

# Python list (a dynamic array)
lst = [1, 2, 3, 4, 5]
# Slicing (a powerful feature)
print(lst[1:4])  # [2, 3, 4]
print(lst[:3])   # [1, 2, 3]
print(lst[2:])   # [3, 4, 5]
print(lst[::2])  # [1, 3, 5] (step of 2)
print(lst[::-1]) # [5, 4, 3, 2, 1] (reversed)
# Modifying an element
lst[2] = 99  # [1, 2, 99, 4, 5]
# Adding elements
lst.append(6)    # append to the end
lst.insert(0, 0) # insert at a given position
lst.extend([7, 8]) # append several elements
# Removing elements
lst.pop()    # remove the last element and return it
lst.pop(2)   # remove the element at index 2
lst.remove(99) # remove the first element equal to 99
del lst[0]   # delete the element at index 0
# Common list operations
numbers = [3, 1, 4, 1, 5, 9, 2]
numbers.sort()        # sort in place
sorted_numbers = sorted(numbers)  # return a new sorted list
numbers.reverse()     # reverse in place
print(len(numbers))   # length
print(max(numbers))   # maximum
print(min(numbers))   # minimum
print(sum(numbers))   # sum
# List comprehensions (a concise way to build lists)
squares = [x**2 for x in range(10)]
even_squares = [x**2 for x in range(20) if x % 2 == 0]
matrix = [[i*j for j in range(5)] for i in range(5)]
# Lists as stacks and queues
# Stack (last in, first out)
stack = []
stack.append(1)
stack.append(2)
stack.pop()  # 2
# Queue (first in, first out); list.pop(0) is O(n), so collections.deque is more efficient
from collections import deque
queue = deque([1, 2, 3])
queue.append(4)
queue.popleft()  # 1
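
One detail the slicing examples above do not show is that a slice of a list is a new list, and that the copy is shallow. The following sketch illustrates this; the variable names are chosen only for this example.

```python
import copy

original = [1, 2, 3, 4, 5]
part = original[1:4]   # a new list object holding [2, 3, 4]
part[0] = 99           # changes only the copy; original is untouched

grid = [[1, 2], [3, 4]]
shallow = grid[:]      # new outer list, but the inner lists are shared
shallow[0][0] = 100    # visible through grid as well

deep = copy.deepcopy(grid)  # copies the inner lists too
deep[1][1] = -1             # grid is unaffected
```

For flat lists of numbers or strings a slice behaves like an independent copy. For nested lists, `copy.deepcopy` is the usual choice when the inner lists must not be shared.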

5.2 Advanced Use of the Collections Framework: ArrayList, HashMap, and More Data Structures

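Once you move on to the Java Collections Framework, you will find ready-made implementations of many data structures. ArrayList is a dynamic array, suited to random access. LinkedList is a doubly linked list, suited to frequent insertion and removal, especially at either end. HashSet is good at removing duplicates, and TreeSet keeps its elements sorted automatically. HashMap is the workhorse for key-value lookups. Do not forget thread safety, though: HashMap is not thread-safe, so multithreaded code needs extra care, for example ConcurrentHashMap or external synchronization.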

import java.util.*;
public class CollectionDemo {
    public static void main(String[] args) {
        // ArrayList (dynamic array)
        List<String> arrayList = new ArrayList<>();
        arrayList.add("Apple");
        arrayList.add("Banana");
        arrayList.add(1, "Orange");   // insert at index 1
        String fruit = arrayList.get(0);
        arrayList.remove("Banana");

        // LinkedList (doubly linked list)
        LinkedList<Integer> linkedList = new LinkedList<>();
        linkedList.addFirst(1);
        linkedList.addLast(3);
        linkedList.add(1, 2);
        int first = linkedList.getFirst();
        int last = linkedList.getLast();

        // HashSet (unordered, no duplicates)
        Set<String> hashSet = new HashSet<>();
        hashSet.add("Java");
        hashSet.add("Python");
        hashSet.add("Java");  // duplicate, not added
        System.out.println(hashSet.size());  // 2

        // TreeSet (sorted, backed by a red-black tree)
        TreeSet<Integer> treeSet = new TreeSet<>();
        treeSet.add(5);
        treeSet.add(2);
        treeSet.add(8);
        treeSet.add(1);
        System.out.println(treeSet);  // [1, 2, 5, 8]

        // HashMap
        Map<String, Integer> hashMap = new HashMap<>();
        hashMap.put("Alice", 85);
        hashMap.put("Bob", 92);
        hashMap.put("Charlie", 78);
        // Iterate over the map
        for (Map.Entry<String, Integer> entry : hashMap.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
        // Look up a value with a default
        int score = hashMap.getOrDefault("David", 0);

        // TreeMap (map sorted by key)
        TreeMap<String, Integer> treeMap = new TreeMap<>();
        treeMap.put("Banana", 3);
        treeMap.put("Apple", 5);
        treeMap.put("Cherry", 2);
        System.out.println(treeMap);  // {Apple=5, Banana=3, Cherry=2}

        // Iterator
        // Arrays.asList returns a fixed-size list whose iterator cannot remove,
        // so copy it into an ArrayList first.
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {
            int num = it.next();
            if (num % 2 == 0) {
                it.remove();  // safe removal during iteration
            }
        }
    }
}
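
The iterator example above removes elements safely while traversing a Java list. Python lists have no equivalent of `Iterator.remove()`, so a rough counterpart, sketched here with illustrative variable names, is to filter into a new list or to assign the filtered result back through a slice.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Option 1: build a new list that keeps only the wanted elements.
odds = [n for n in numbers if n % 2 != 0]

# Option 2: filter in place with slice assignment, so every other
# reference to the same list object sees the change.
alias = numbers
numbers[:] = [n for n in numbers if n % 2 != 0]

# Pitfall: removing from a list while iterating over it skips elements,
# because the remaining items shift left under the loop's index.
buggy = [2, 4, 6, 8]
for n in buggy:
    if n % 2 == 0:
        buggy.remove(n)
# buggy still contains even numbers at this point
```

The last loop is the mistake that `Iterator.remove()` exists to prevent in Java: the loop never raises an error, it just silently leaves some elements behind.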

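Now look at Python's dictionaries and sets, whose syntax is very intuitive. Dictionary lookups can use the get method with a default value, and merging can use the `|` operator (Python 3.9+) or `**` unpacking, both of which are concise. Set operations (union, intersection, difference, symmetric difference) map directly onto mathematical symbols, so the code reads almost like pseudocode. One thing to note in particular: deduplication with sets is a frequent interview topic, but it relies on the elements being hashable, so mutable objects such as lists cannot be put in a set.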

# Dictionary (dict), implemented as a hash table
person = {
    "name": "Alice",
    "age": 25,
    "city": "New York"
}
# Accessing elements
print(person["name"])         # Alice
print(person.get("age"))      # 25
print(person.get("country", "USA"))  # USA (the default, since the key is missing)
# Adding / modifying
person["email"] = "alice@example.com"
person["age"] = 26
# Deleting
del person["city"]
email = person.pop("email")    # remove the key and return its value
last_item = person.popitem()   # remove and return the most recently inserted item
# Iterating over a dictionary
for key in person:
    print(f"{key}: {person[key]}")
for key, value in person.items():
    print(f"{key}: {value}")
for key in person.keys():
    print(key)
for value in person.values():
    print(value)
# Dictionary comprehension
squares_dict = {x: x**2 for x in range(5)}  # {0:0, 1:1, 2:4, 3:9, 4:16}
# Merging dictionaries
dict1 = {"a": 1, "b": 2}
dict2 = {"c": 3, "d": 4}
merged = {**dict1, **dict2}  # Python 3.5+
# or
merged = dict1 | dict2       # Python 3.9+
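
The merge example above uses dictionaries with no keys in common. When keys do overlap, the value from the right-hand dictionary wins. A small sketch, using made-up configuration keys purely for illustration:

```python
defaults = {"host": "localhost", "port": 8080}
overrides = {"port": 9090, "debug": True}

# Unpacking builds a new dict; for the shared key "port" the later value wins.
merged = {**defaults, **overrides}

# update() merges in place instead, so copy first to keep defaults intact.
in_place = dict(defaults)
in_place.update(overrides)
```

Both forms leave `defaults` and `overrides` unchanged, and the `|` operator on Python 3.9+ follows the same right-hand-wins rule.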

# Set: unordered, no duplicates
numbers = {1, 2, 3, 4, 5}
numbers.add(6)
numbers.remove(3)    # raises KeyError if the element is missing
numbers.discard(10)  # no error even if the element is missing
# Set operations
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
print(A | B)  # union: {1,2,3,4,5,6}
print(A & B)  # intersection: {3,4}
print(A - B)  # difference: {1,2}
print(A ^ B)  # symmetric difference: {1,2,5,6}
# Checking subset / superset
print({1, 2}.issubset(A))     # True
print(A.issuperset({1, 2}))   # True
# Set comprehension
even_set = {x for x in range(10) if x % 2 == 0}  # {0,2,4,6,8}
# Deduplication
duplicates = [1, 2, 2, 3, 3, 3, 4]
unique = list(set(duplicates))  # contains 1, 2, 3, 4; the order is not guaranteed
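
Two points about set-based deduplication are worth a short sketch: a set does not promise to keep the original order, and its elements must be hashable. The variable names below are illustrative only.

```python
duplicates = [3, 1, 3, 2, 1]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this removes duplicates while preserving first-seen order.
ordered_unique = list(dict.fromkeys(duplicates))

# A list is mutable and therefore unhashable: it cannot be a set element.
try:
    bad = {[1, 2], [3, 4]}
except TypeError as exc:
    error_message = str(exc)

# Immutable stand-ins work: tuples for sequences, frozenset for sets.
points = {(1, 2), (3, 4), (1, 2)}
groups = {frozenset({1, 2}), frozenset({2, 1})}
```

Tuples and frozensets are hashable because they cannot change after creation, which is what lets a set detect that two of them are equal.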

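One last reminder: neither arrays nor collections are better in absolute terms, and the choice depends entirely on the use case. If the amount of data is fixed and you need maximum performance, an array is usually the best choice. If the business logic changes frequently, the collections framework can save you a great deal of development time. With these basics in hand, more complex data structures (such as graphs, heaps, and Bloom filters) will be much easier to understand later on.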

Source: https://developer.aliyun.com/article/1738602
